Machine Learning Summary I: Vectorized Implementation of Algorithm Formulas (Vectorization)
Alternatively:
Y = (labels*ones(1,m) == ones(k,1)*yT);
hΘ(X) ∈ IR^(k×m);
predicts ∈ IR^m;
[~, predicts] = max(hΘ(X));
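The one-hot matrix Y and the column-wise max prediction can be sketched in NumPy (the sizes and the toy activations H are illustrative, not from the original notes):

```python
import numpy as np

m, k = 5, 3                        # number of examples, number of classes
y = np.array([1, 3, 2, 1, 3])      # labels in 1..k, one per example
labels = np.arange(1, k + 1)       # labels = [1; 2; ...; k]

# Y(i,j) = 1 if example j has class i (k x m one-hot matrix),
# the NumPy analogue of (labels*ones(1,m) == ones(k,1)*y')
Y = (labels[:, None] == y[None, :]).astype(int)

# given activations H in IR^(k x m), the predicted class of each
# example is the row index of the largest entry in its column
H = np.eye(k)[:, [0, 2, 1, 0, 2]]  # toy activations that peak at the true class
predicts = H.argmax(axis=0) + 1    # +1 converts to 1-based class labels
```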
Formulas:
Vectorization: Neural network
A(j+1)=g(Θ(j)*A(j));
nu: number of users;
nm: number of movies;
r(i,j): 1 if user j has rated movie i (otherwise 0); matrix form R;
y(i,j): rating given by user j to movie i (defined only if r(i,j)=1); matrix form Y;
set labels = [1; 2; …; k];
labels = [1:k]T;
δi(j) = 'error' of node i in layer j;
set p is the last layer,
hΘ(x(j)) = a(p); for the case m = 1;
δi(p) = hΘ(x(j))i - Y(i,j) = ai(p) - Y(i,j); writing yi here instead of Y(i,j) would blur the definitions; see the dimension analysis for details
μ ∈ IR^(1×n);
Σ ∈ IR^(n×n); Sigma2 = Σ;
sigma2 = var(X); (note: var(X) normalizes by m-1; use var(X,1) for the 1/m version)
μ = mean(X);
Sigma2 = 1/m*(X-μ)T*(X-μ);
or
Sigma2 = diag(sigma2);
sigma2 = sum(Sigma2);
p(X) ∈ IR^m;
II-3 Recommender Systems
Definition
a(j+1) = g(Θ(j)*a(j)); forward propagation, from a(j) to a(j+1);
add a0(j+1) = 1; add the bias (constant) term;
a(1) = x or x(i);
A(1) = XT;
S1 = n;
K: number of output units;
set p is the last layer, K = Sp;
θ := θ - α/m*XT*(X*θ-y);
grad = 1/m*XT*(X*θ-y);
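The batch gradient-descent update above can be sketched in NumPy (the toy data, learning rate, and iteration count are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.5, iters=2000):
    """Batch gradient descent for linear regression:
    theta := theta - alpha/m * X'*(X*theta - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta

# toy noiseless data: y = 1 + 2*x, with a bias column prepended to X
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
theta = gradient_descent(X, y)
```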
I-2 Logistic regression:
hθ(x(i)) = g(θT x(i));
g is the logistic function / sigmoid function.
g(z) = sigmoid(z) = 1/(1+exp(-z));
Ureduce ∈ IR^(n×k);
Z ∈ IR^(m×k);
Xrec ∈ IR^(m×n);
II-2 Anomaly detection
Original model
Vectorization:
p(X)=prod((((2*π*sigma2).^(-0.5)).*exp(-((X-μ).^2)./(2*sigma2))),2);
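The row-wise product of per-feature Gaussian densities can be sketched in NumPy (the example points, mean, and variances are illustrative):

```python
import numpy as np

def p_gaussian(X, mu, sigma2):
    """Row-wise product of independent per-feature Gaussian densities,
    the NumPy analogue of
    prod((2*pi*sigma2).^(-0.5) .* exp(-(X-mu).^2 ./ (2*sigma2)), 2)."""
    d = (2 * np.pi * sigma2) ** -0.5 * np.exp(-(X - mu) ** 2 / (2 * sigma2))
    return d.prod(axis=1)

# two 2-D points evaluated under a standard normal in each feature
X = np.array([[0.0, 0.0], [1.0, -1.0]])
mu = np.zeros(2)
sigma2 = np.ones(2)
density = p_gaussian(X, mu, sigma2)
```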
Contents (Octave syntax is used throughout)
I Supervised learning
I-1 Linear regression
I-2 Logistic regression
I-3 Regularized logistic regression
I-4 Neural network
Delta(j-1) = Delta(j-1)(2:end,:); remove the bias term.
set reg_Θ(j) = Θ(j); reg_Θ(j)(:,1) = 0;
Grad(j) = 1/m*Delta(j+1)*(A(j))T + λ/m*reg_Θ(j);
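The Delta/Grad recurrences can be sketched in NumPy for a single hidden layer (layer sizes, names, and the data in the example are illustrative, not from the original notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_gradients(Theta1, Theta2, X, Y, lam):
    """Vectorized backprop for one hidden layer, following
    Delta(p) = A(p)-Y, Delta(j-1) = Theta(j-1)'*Delta(j).*A(j-1).*(1-A(j-1)),
    Grad(j) = 1/m*Delta(j+1)*A(j)' + lam/m*reg_Theta(j)."""
    m = X.shape[0]
    A1 = np.vstack([np.ones(m), X.T])          # (n+1) x m, bias row added
    A2 = sigmoid(Theta1 @ A1)                  # S2 x m
    A2 = np.vstack([np.ones(m), A2])           # (S2+1) x m, bias row added
    A3 = sigmoid(Theta2 @ A2)                  # k x m
    Delta3 = A3 - Y                            # k x m
    Delta2 = (Theta2.T @ Delta3) * A2 * (1 - A2)
    Delta2 = Delta2[1:, :]                     # drop the bias row
    reg1 = Theta1.copy(); reg1[:, 0] = 0       # do not regularize bias column
    reg2 = Theta2.copy(); reg2[:, 0] = 0
    Grad1 = Delta2 @ A1.T / m + lam / m * reg1
    Grad2 = Delta3 @ A2.T / m + lam / m * reg2
    return Grad1, Grad2
```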
II Unsupervised learning
II-1 PCA
Sigma=1/m*XT*X;
grad(j)=
Vectorization: Logistic regression
hθ(X)=g(X*θ);
J(θ)=1/m*(-yT*log(hθ(X))-(1-y)T* log(1-hθ(X)));
grad=1/m*XT*(hθ(X)-y);
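The cost and gradient above can be sketched in NumPy (the data in the example is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_grad(theta, X, y):
    """Vectorized logistic-regression cost and gradient:
    J = 1/m*(-y'*log(h) - (1-y)'*log(1-h)),  grad = 1/m*X'*(h - y)."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad
```

At theta = 0 every prediction is 0.5, so the cost is log(2) regardless of the labels, which makes a quick sanity check.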
I-3 Regularized logistic regression:
Multivariate Gaussian
Vectorization:
p(X) = (2*π)^(-n/2) * det(Sigma2)^(-0.5) * ...
exp(-0.5 * sum((X-μ)*pinv(Sigma2).*(X-μ), 2));
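The multivariate density can be sketched in NumPy (the covariance in the example is illustrative):

```python
import numpy as np

def p_multivariate(X, mu, Sigma2):
    """Density of N(mu, Sigma2) at each row of X, the NumPy analogue of
    (2*pi)^(-n/2)*det(Sigma2)^(-0.5)*exp(-0.5*sum((X-mu)*pinv(Sigma2).*(X-mu),2))."""
    m, n = X.shape
    D = X - mu
    return ((2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma2) ** -0.5
            * np.exp(-0.5 * np.sum(D @ np.linalg.pinv(Sigma2) * D, axis=1)))
```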
Dimension analysis:
σ2 ∈ IR^(1×n); sigma2 = σ2;
II Unsupervised learning
II-1 PCA
II-2 Anomaly detection
II-3 Recommender Systems
II-4 Stochastic gradient descent
I Supervised learning:
x(i) ∈ IR^(n+1);
X ∈ IR^(m×(n+1));
y ∈ IR^m;
set p is the last layer,
X = [ones(m,1), X]; add the bias column to X;
A(2) = g(Θ(1)*A(1)) = g(Θ(1)*XT);
A(2) = [ones(1,m); A(2)]; add the bias row to A(2); note this runs along the opposite dimension from the bias column added to X. See the dimension analysis for details;
…
hΘ(X) = A(p) = g(Θ(p-1)*A(p-1));
J(Θ) = -1/m*sum((Y.*log(hΘ(X)) + (1-Y).*log(1-hΘ(X)))(:)) + …
λ/(2m)*(sum(Θ(:).^2) - sum(Θ(p-1)(:,1).^2) - sum(Θ(p-2)(:,1).^2) - … -
sum(Θ(1)(:,1).^2));
Delta(p) = A(p) - Y;
Delta(j-1) = (Θ(j-1))T*Delta(j) .* A(j-1).*(1-A(j-1));
0.5*λ*sum((X.^2)(:)) + ...
0.5*λ*sum((Θ.^2)(:));
X_grad = ((X*ΘT - Y).*R)*Θ + λ*X;
Θ_grad = ((X*ΘT - Y).*R)T*X + λ*Θ;
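The collaborative-filtering gradients can be sketched in NumPy (the sizes and data in the example are illustrative):

```python
import numpy as np

def cofi_gradients(X, Theta, Y, R, lam):
    """Collaborative-filtering gradients:
    X_grad     = ((X*Theta' - Y).*R)*Theta + lam*X
    Theta_grad = ((X*Theta' - Y).*R)'*X   + lam*Theta."""
    E = (X @ Theta.T - Y) * R          # rating errors, only where r(i,j)=1
    X_grad = E @ Theta + lam * X
    Theta_grad = E.T @ X + lam * Theta
    return X_grad, Theta_grad
```

If the predictions already match the observed ratings exactly, both gradients reduce to the regularization terms alone.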
II-4 Stochastic gradient descent
Batch gradient descent
θ := θ - α/m*XT*(X*θ-y);
δ(p) = a(p) - Y(:,j);
δ(p-1) = (Θ(p-1))T*δ(p) .* a(p-1).*(1-a(p-1)); backpropagation algorithm, from δ(p) to δ(p-1);
δ(p-1) = δ(p-1)(2:end,:); remove the bias term.
…
δ(1)= 0;
Dimension analysis:
a(j) ∈ IR^Sj;
grad=1/m*XT*(hθ(X)-y)+λ/m*θ.*vec1;
I-4 Neural network
Definition:
ai(j): unit i in layer j;
Sj: number of units (not counting the bias unit) in layer j;
Θ(j): matrix of weights mapping from layer j to layer j+1; (different Θ(j) have different dimensions, so they cannot be collected into one overall matrix Θ, but each Θ(j) can be unrolled into a vector Θ(j)(:), and these concatenated into one overall vector Θ(:); the gradients Grad are handled the same way)
θ(j): parameter vector of user j; matrix form Θ;
x(i): feature vector of movie i; matrix form X;
p(i,j) = (θ(j))T x(i); predicted rating for user j, movie i;
P = X*ΘT;
Dimension analysis:
θ(j) ∈ IR^n;
θ ∈ IR^(n+1);
J(θ) ∈ IR;
hθ(x(i)) ∈ IR;
hθ(X) ∈ IR^m;
grad ∈ IR^(n+1);
I-1 Linear regression:
hθ(x(i))=θTx(i);
Vectorization: Linear regression
hθ(X) = X*θ;
J(θ) = 1/(2*m)*(X*θ-y)T*(X*θ-y);
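To illustrate what the vectorization buys, the cost above can be sketched in NumPy next to an equivalent example-by-example loop (the data in the example is illustrative):

```python
import numpy as np

def cost_vectorized(theta, X, y):
    """J = 1/(2m) * (X*theta - y)'*(X*theta - y)."""
    m = X.shape[0]
    r = X @ theta - y
    return r @ r / (2 * m)

def cost_loop(theta, X, y):
    """Same cost computed one example at a time, for comparison."""
    m = X.shape[0]
    return sum((X[i] @ theta - y[i]) ** 2 for i in range(m)) / (2 * m)
```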
δ(j) ∈ IR^S(j); delta(j) = δ(j);
Delta(j) ∈ IR^(S(j)×m);
Θ(j) ∈ IR^(S(j+1)×(S(j)+1));
Grad(j) ∈ IR^(S(j+1)×(S(j)+1));
A(j) ∈ IR^(S(j)×m);
y ∈ IR^m;
labels ∈ IR^k;
Y ∈ IR^(k×m);
for i = 1:m
Y(:,i) = (labels == y(i));
endfor
Vectorization: Regularized logistic regression
set: vec1 = ones(n+1,1); vec1(1,1) = 0;
J(θ) = 1/m*(-yT*log(hθ(X)) - (1-y)T*log(1-hθ(X))) + …
λ/(2m)*((θ.*vec1)T*(θ.*vec1));
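The vec1 trick, which leaves the bias parameter unregularized, can be sketched in NumPy (the data in the example is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reg_cost_grad(theta, X, y, lam):
    """Regularized logistic regression; vec1 zeroes out the first entry
    so the bias term theta(1) is not regularized."""
    m, n1 = X.shape
    vec1 = np.ones(n1)
    vec1[0] = 0.0                              # vec1 = ones(n+1,1); vec1(1) = 0
    h = sigmoid(X @ theta)
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m \
        + lam / (2 * m) * ((theta * vec1) @ (theta * vec1))
    grad = X.T @ (h - y) / m + lam / m * theta * vec1
    return J, grad
```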
Stochastic gradient descent
θ := θ - α*(X(i)*θ - y(i))*X(i)T;
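The one-example-at-a-time update can be sketched in NumPy (the toy data, learning rate, and epoch count are illustrative):

```python
import numpy as np

def sgd(X, y, alpha=0.1, epochs=300, seed=0):
    """Stochastic gradient descent for linear regression: one example
    per update, theta := theta - alpha*(X(i)*theta - y(i))*X(i)'."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):           # shuffle the examples each pass
            theta -= alpha * (X[i] @ theta - y[i]) * X[i]
    return theta

# same toy noiseless data as for batch gradient descent: y = 1 + 2*x
x = np.linspace(0, 1, 20)
X1 = np.column_stack([np.ones_like(x), x])
y1 = 1 + 2 * x
theta = sgd(X1, y1)
```

Unlike the batch update, each step here uses a single example, so the path is noisier but each update is m times cheaper.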
[U,S,V] = svd(Sigma);
Ureduce = U(:,1:k);
Z = X*Ureduce;
Xrec = Z*UreduceT; Xrec: reconstructing an approximation of the data
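The PCA pipeline above (covariance, SVD, projection, reconstruction) can be sketched in NumPy (it assumes X is already mean-normalized; the example data is illustrative):

```python
import numpy as np

def pca_project(X, k):
    """PCA via SVD of the covariance Sigma = 1/m * X'*X, then
    Z = X*Ureduce and Xrec = Z*Ureduce'."""
    m = X.shape[0]
    Sigma = X.T @ X / m
    U, S, Vt = np.linalg.svd(Sigma)
    Ureduce = U[:, :k]
    Z = X @ Ureduce                  # projected data, m x k
    Xrec = Z @ Ureduce.T             # reconstruction, m x n
    return Z, Xrec
```

With k equal to the number of features, U is a full orthogonal matrix and the reconstruction is exact, which makes a handy sanity check.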
Dimension analysis: