Machine Learning Keywords
https://www.coursera.org/learn/machine-learning/resources/JXWWS
Supervised vs. Unsupervised Learning
Supervised
- data set where the correct output is already known
- categorized into "regression" and "classification" problems
Unsupervised
- little or no idea what the results should look like
- clustering
Linear Regression
trying to fit the output onto a continuous expected result function
univariate linear regression: predict a single output y from a single input x
Hypothesis Function
$h_\theta(x) = \theta_0 + \theta_1 x$
Cost Function
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2$
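A minimal NumPy sketch of the hypothesis and cost function above; the function names and toy data are illustrative, not from the course.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    # J(theta0, theta1) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x
print(cost(0.0, 2.0, x, y))    # 0.0 for a perfect fit
print(cost(0.0, 1.0, x, y))    # ~2.33 for a poor fit
```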
Gradient Descent
repeat until convergence:
$\theta_j := \theta_j - \alpha \cdot [\text{slope of tangent, i.e. the derivative in the } j\text{-th dimension}]$
repeat until convergence: {
$\quad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)$
$\quad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( (h_\theta(x_i) - y_i) x_i \right)$
}
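A sketch of the loop above in NumPy, with simultaneous updates of both parameters; the learning rate, iteration count, and toy data are assumptions for illustration.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = theta0 + theta1 * x - y
        grad0 = np.sum(errors) / m      # (1/m) * sum(h(x_i) - y_i)
        grad1 = np.sum(errors * x) / m  # (1/m) * sum((h(x_i) - y_i) * x_i)
        theta0 -= alpha * grad0         # update both thetas simultaneously
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])   # generated by y = 2x + 1
print(gradient_descent(x, y))   # converges to roughly (1.0, 2.0)
```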
Linear Regression with Multiple Variables
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$
$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} \\ x_0^{(2)} & x_1^{(2)} \\ x_0^{(3)} & x_1^{(3)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}$
$h_\theta(X) = X\theta$
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$
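The vectorized cost, sketched in NumPy; the design matrix carries a leading column of ones for the $x_0 = 1$ bias term, matching the layout above.

```python
import numpy as np

def cost_vectorized(X, theta, y):
    # J(theta) = 1/(2m) * (X@theta - y)^T (X@theta - y)
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual) / (2 * m)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])       # first column is x_0 = 1
theta = np.array([1.0, 2.0])
y = np.array([3.0, 5.0, 7.0])
print(cost_vectorized(X, theta, y))  # 0.0 (this theta fits exactly)
```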
repeat until convergence: {
$\quad \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)}$ for j := 0..n
}
Vectorized:
$\theta := \theta - \alpha \nabla J(\theta)$
$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$
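A sketch of the vectorized update rule; alpha and the iteration count are illustrative choices, and a fixed iteration budget stands in for a real convergence check.

```python
import numpy as np

def gradient_descent_vec(X, y, alpha=0.1, iters=2000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # theta := theta - (alpha/m) * X^T (X@theta - y)
        theta -= alpha / m * (X.T @ (X @ theta - y))
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])      # generated by theta = (1, 2)
print(gradient_descent_vec(X, y))  # ≈ [1. 2.]
```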
Feature Normalization
feature scaling and mean normalization
$x_i := \frac{x_i - \mu_i}{s_i}$
Where $\mu_i$ is the average of all the values for feature (i) and $s_i$ is the range of values (max - min), or alternatively $s_i$ is the standard deviation.
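A sketch of mean normalization with standard-deviation scaling (one of the two choices of $s_i$ above); the housing-style numbers are made up.

```python
import numpy as np

def normalize_features(X):
    # x_i := (x_i - mu_i) / s_i, applied per feature column
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)   # here s_i is the standard deviation
    return (X - mu) / sigma, mu, sigma

X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_norm, mu, sigma = normalize_features(X)
print(X_norm.mean(axis=0))  # ≈ [0, 0] after normalization
# Keep mu and sigma: new inputs must be scaled the same way at prediction time.
```

Note that the $x_0 = 1$ bias column must be excluded from normalization, since its standard deviation is zero.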
Normal Equation
$\theta = (X^T X)^{-1} X^T y$
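The closed form in NumPy; np.linalg.pinv is used instead of a plain inverse so the sketch also behaves when $X^T X$ is singular (see the noninvertibility note below).

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^{-1} X^T y, with the pseudoinverse for robustness
    return np.linalg.pinv(X.T @ X) @ X.T @ y

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])
print(normal_equation(X, y))  # ≈ [1. 2.] (no alpha, no iterations)
```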
| Gradient Descent | Normal Equation |
|---|---|
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| $O(kn^2)$ | $O(n^3)$, need to calculate inverse of $X^T X$ |
| Works well when n is large | Slow if n is very large |
$X^T X$ may be noninvertible. The common causes are:
- Redundant features, where two features are very closely related (i.e. they are linearly dependent); see the sketch after this list
- Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).
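A quick illustration of the redundant-feature cause: duplicating a column makes $X^T X$ singular, so a plain inverse fails while the pseudoinverse still yields a solution; the numbers are made up for demonstration.

```python
import numpy as np

X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])  # last two columns are linearly dependent
y = np.array([5.0, 7.0, 9.0])

A = X.T @ X
print(np.linalg.matrix_rank(A))  # 2 < 3, so A is noninvertible
# np.linalg.inv(A) raises LinAlgError here; the pseudoinverse still works:
print(np.linalg.pinv(A) @ X.T @ y)
```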