Machine Learning Keywords
https://www.coursera.org/learn/machine-learning/resources/JXWWS
Supervised vs. Unsupervised Learning
Supervised
- data set where the correct output is already known
- categorized into "regression" and "classification" problems
Unsupervised
- little or no idea what the results should look like
- clustering
Linear Regression
trying to fit the output onto a continuous expected result function
univariate linear regression: predict a single output y from a single input x
Hypothesis Function
$h_\theta(x) = \theta_0 + \theta_1 x$
Cost Function
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2$
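A minimal NumPy sketch of the hypothesis and cost function above; the function names and toy data are illustrative, not from the course.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    # J(theta0, theta1) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x
print(cost(0.0, 2.0, x, y))    # 0.0 for a perfect fit
print(cost(0.0, 1.0, x, y))    # ~2.33 for a poor fit
```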
Gradient Descent
repeat until convergence:
$\theta_j := \theta_j - \alpha \cdot [\text{slope of tangent, i.e. the derivative in the } j\text{-th dimension}]$
repeat until convergence: {
$\quad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)$
$\quad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( (h_\theta(x_i) - y_i) x_i \right)$
}
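A sketch of the loop above in NumPy, with simultaneous updates of both parameters; the learning rate, iteration count, and toy data are assumptions for illustration.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = theta0 + theta1 * x - y
        grad0 = np.sum(errors) / m      # (1/m) * sum(h(x_i) - y_i)
        grad1 = np.sum(errors * x) / m  # (1/m) * sum((h(x_i) - y_i) * x_i)
        theta0 -= alpha * grad0         # update both thetas simultaneously
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])   # generated by y = 2x + 1
print(gradient_descent(x, y))   # converges to roughly (1.0, 2.0)
```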
Linear Regression with Multiple Variables
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$
$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} \\ x_0^{(2)} & x_1^{(2)} \\ x_0^{(3)} & x_1^{(3)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}$
$h_\theta(X) = X\theta$
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$
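The vectorized cost, sketched in NumPy; the design matrix carries a leading column of ones for the $x_0 = 1$ bias term, matching the layout above.

```python
import numpy as np

def cost_vectorized(X, theta, y):
    # J(theta) = 1/(2m) * (X@theta - y)^T (X@theta - y)
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual) / (2 * m)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])       # first column is x_0 = 1
theta = np.array([1.0, 2.0])
y = np.array([3.0, 5.0, 7.0])
print(cost_vectorized(X, theta, y))  # 0.0 (this theta fits exactly)
```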
repeat until convergence: {
$\quad \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)}$ for j := 0..n
}
Vectorized:
$\theta := \theta - \alpha \nabla J(\theta)$
$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$
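A sketch of the vectorized update rule; alpha and the iteration count are illustrative choices, and a fixed iteration budget stands in for a real convergence check.

```python
import numpy as np

def gradient_descent_vec(X, y, alpha=0.1, iters=2000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # theta := theta - (alpha/m) * X^T (X@theta - y)
        theta -= alpha / m * (X.T @ (X @ theta - y))
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])      # generated by theta = (1, 2)
print(gradient_descent_vec(X, y))  # ≈ [1. 2.]
```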
Feature Normalization
feature scaling and mean normalization
$x_i := \frac{x_i - \mu_i}{s_i}$
Where $\mu_i$ is the average of all the values for feature (i) and $s_i$ is the range of values (max - min), or alternatively $s_i$ is the standard deviation.
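A sketch of mean normalization with standard-deviation scaling (one of the two choices of $s_i$ above); the housing-style numbers are made up.

```python
import numpy as np

def normalize_features(X):
    # x_i := (x_i - mu_i) / s_i, applied per feature column
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)   # here s_i is the standard deviation
    return (X - mu) / sigma, mu, sigma

X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_norm, mu, sigma = normalize_features(X)
print(X_norm.mean(axis=0))  # ≈ [0, 0] after normalization
# Keep mu and sigma: new inputs must be scaled the same way at prediction time.
```

Note that the $x_0 = 1$ bias column must be excluded from normalization, since its standard deviation is zero.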
Normal Equation
$\theta = (X^T X)^{-1} X^T y$
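The closed form in NumPy; np.linalg.pinv is used instead of a plain inverse so the sketch also behaves when $X^T X$ is singular (see the noninvertibility note below).

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^{-1} X^T y, with the pseudoinverse for robustness
    return np.linalg.pinv(X.T @ X) @ X.T @ y

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])
print(normal_equation(X, y))  # ≈ [1. 2.] (no alpha, no iterations)
```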
| Gradient Descent | Normal Equation |
|---|---|
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| $O(kn^2)$ | $O(n^3)$, need to calculate inverse of $X^T X$ |
| Works well when n is large | Slow if n is very large |
$X^T X$ may be noninvertible. The common causes are:
- Redundant features, where two features are very closely related (i.e. they are linearly dependent); see the sketch after this list
- Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).
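A quick illustration of the redundant-feature cause: duplicating a column makes $X^T X$ singular, so a plain inverse fails while the pseudoinverse still yields a solution; the numbers are made up for demonstration.

```python
import numpy as np

X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])  # last two columns are linearly dependent
y = np.array([5.0, 7.0, 9.0])

A = X.T @ X
print(np.linalg.matrix_rank(A))  # 2 < 3, so A is noninvertible
# np.linalg.inv(A) raises LinAlgError here; the pseudoinverse still works:
print(np.linalg.pinv(A) @ X.T @ y)
```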