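I am working through Andrew Ng's Stanford machine learning course and keep these notes for review and consolidation.
My knowledge is limited, so please bear with any errors or omissions, and feel free to point them out.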
7.1 Clustering
7.1.1 K-means algorithm
Intuition
The K-means algorithm alternates between two steps:
- Cluster assignment
- Move centroid
The algorithm is illustrated in the picture below:
Symbols
- $c^{(i)}$: index of the cluster ($1, 2, \dots, K$) to which example $x^{(i)}$ is currently assigned
- $\mu_k$: cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
- $\mu_{c^{(i)}}$: centroid of the cluster to which example $x^{(i)}$ has been assigned
Optimization objective
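Assuming the objective meant here is the distortion cost $J$ that these notes use later when comparing random initializations, it can be written with the symbols defined above as the average squared distance between each example and the centroid of its assigned cluster:

```latex
J\left(c^{(1)}, \cdots, c^{(m)}, \mu_1, \cdots, \mu_K\right)
  = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2
```

Both steps of K-means decrease (or leave unchanged) this quantity: cluster assignment minimizes it over the $c^{(i)}$ with the centroids fixed, and the move centroid step minimizes it over the $\mu_k$ with the assignments fixed.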
K-means algorithm - Algorithm 4
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \cdots, \mu_K$
Repeat {
    for $i = 1$ to $m$
        $c^{(i)}$ := index (from $1$ to $K$) of cluster centroid closest to $x^{(i)}$
    for $k = 1$ to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
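The loop above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the toy data, the choice to initialize centroids from $K$ randomly picked training examples, and the handling of empty clusters (keep the old centroid) are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Run K-means on X (shape m x n); return (c, mu)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Assumption: initialize centroids as K distinct, randomly chosen examples.
    mu = X[rng.choice(m, size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Cluster assignment: squared distance from every point to every centroid.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        c = dists.argmin(axis=1)
        # Move centroid: mean of the points assigned to each cluster.
        new_mu = mu.copy()
        for k in range(K):
            members = X[c == k]
            if len(members) > 0:  # assumption: keep the old centroid if a cluster is empty
                new_mu[k] = members.mean(axis=0)
        if np.allclose(new_mu, mu):  # centroids stopped moving
            break
        mu = new_mu
    return c, mu

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
c, mu = kmeans(X, K=2)
print(c)
print(mu)
```

Which group ends up with index 0 depends on the random start; only the grouping itself is meaningful.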
7.1.2 Important tricks
The $K$ cluster centroids are chosen at random, and different initializations can lead to different final solutions, so K-means may end up in a local optimum instead of the global one.
For example:
Random Initialization
For $i = 1$ to $100$ {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \cdots, c^{(m)}, \mu_1, \cdots, \mu_K$.
    Compute cost function (distortion)
    $J(c^{(1)}, \cdots, c^{(m)}, \mu_1, \cdots, \mu_K)$
}
Pick the clustering that gave the lowest cost $J(c^{(1)}, \cdots, c^{(m)}, \mu_1, \cdots, \mu_K)$. For $K = 2$ to $10$, multiple random initializations work well and often find a better optimum; when $K$ is large, a single run is already likely to give a reasonably good solution.
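A rough NumPy sketch of this restart loop is given here. The helper names (`run_kmeans`, `distortion`, `best_of_n`), the fixed iteration count, and the synthetic three-blob data are assumptions for illustration only.

```python
import numpy as np

def run_kmeans(X, K, rng, n_iters=50):
    """One K-means run from a random start; returns (c, mu)."""
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    # Final assignment so that c matches the returned centroids.
    c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return c, mu

def distortion(X, c, mu):
    """Cost J: mean squared distance from each point to its assigned centroid."""
    return float(((X - mu[c]) ** 2).sum(axis=1).mean())

def best_of_n(X, K, n_runs=100, seed=0):
    """Run K-means n_runs times and keep the clustering with the lowest J."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_runs):
        c, mu = run_kmeans(X, K, rng)
        J = distortion(X, c, mu)
        if best is None or J < best[0]:
            best = (J, c, mu)
    return best

# Synthetic data (made up for this example): three blobs of 20 points each.
data_rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([ctr + 0.3 * data_rng.standard_normal((20, 2)) for ctr in centers])

J, c, mu = best_of_n(X, K=3, n_runs=20)
print("lowest distortion:", J)
```

Keeping only the best run means a single unlucky initialization no longer decides the result.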
Number of Clusters
Choosing the number of clusters is a judgment call; it is most often done by hand, based on experience.
One approach to try (not always effective) is the Elbow method: plot the cost $J$ against the number of clusters $K$ and choose the $K$ at the "elbow", where the cost stops dropping quickly.
Sometimes K-means is used for some later/downstream purpose. In that case, evaluate K-means by a metric for how well it serves that later purpose.
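To make the Elbow method concrete, here is a small sketch that computes the cost for several values of $K$; the curve would normally be plotted with a plotting library, but printing the values is enough to see the bend. The helper `kmeans_cost`, the number of restarts, and the synthetic data are assumptions for this example.

```python
import numpy as np

def kmeans_cost(X, K, rng, n_iters=50):
    """Run K-means once from a random start and return the distortion J."""
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())

# Synthetic data (made up for this example): three blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((30, 2)) for ctr in centers])

costs = {}
for K in range(1, 7):
    # Best of several random restarts, to reduce the effect of local optima.
    costs[K] = min(kmeans_cost(X, K, rng) for _ in range(20))
    print(K, round(costs[K], 3))
```

On data like this the cost falls sharply up to $K = 3$ and only slowly afterwards, which is the elbow. On real data the curve is often smooth, which is why the method is not always effective.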
7.2 Dimensionality Reduction
7.2.1 Intuition
The intuition for reducing data from 2D to 1D and from 3D to 2D is shown below:
Applications: data compression, data visualization, …
7.2.2 Principal Component Analysis
To reduce from $n$ dimensions to $k$ dimensions, what PCA does is:
Find $k$ vectors onto which to project the data so as to minimize the projection error.
Principal Component Analysis - Algorithm 5
Preprocessing: apply feature scaling / mean normalization (ensure every feature has zero mean).
Calculate the covariance matrix:
$\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)})(x^{(i)})^T$ (written as Sigma in the code)
Do the singular value decomposition:
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1 : k);
z = Ureduce' * x;
Reconstruction from Compressed Representation
$x_{approx}^{(i)} = U_{reduce} z^{(i)}, \quad i = 1, 2, \cdots, m$
7.2.3 Choose the $k$
Here $k$ (the dimension of $z$) is also called the number of principal components.
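Looking back at Algorithm 5 and the reconstruction step, the whole pipeline can be sketched in NumPy as a counterpart to the Octave lines. The function names and the toy data are assumptions for illustration, and only mean normalization is applied here (no feature scaling).

```python
import numpy as np

def pca_fit(X, k):
    """Return (mean, U_reduce) for data X of shape (m, n)."""
    mean = X.mean(axis=0)
    Xc = X - mean                        # mean normalization
    Sigma = (Xc.T @ Xc) / X.shape[0]     # n x n covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)      # columns of U are the principal directions
    return mean, U[:, :k]

def pca_project(X, mean, U_reduce):
    """z = U_reduce^T x for every (mean-normalized) row of X."""
    return (X - mean) @ U_reduce

def pca_reconstruct(Z, mean, U_reduce):
    """x_approx = U_reduce z, shifted back by the mean."""
    return Z @ U_reduce.T + mean

# Toy data (made up): 2D points lying almost on the line x2 = 2 * x1.
rng = np.random.default_rng(0)
t = rng.standard_normal(100)
X = np.column_stack([t, 2.0 * t + 0.01 * rng.standard_normal(100)])

mean, U_reduce = pca_fit(X, k=1)
Z = pca_project(X, mean, U_reduce)
X_approx = pca_reconstruct(Z, mean, U_reduce)
print(Z.shape, np.abs(X - X_approx).max())
```

Because the points are nearly collinear, one component is enough and the reconstruction error is on the order of the added noise.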
Typically, choose $k$ to be the smallest value so that
$$\frac{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{approx}^{(i)} \right\|^2}{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01$$
The number $0.01$ indicates that $99\%$ of the variance is retained.
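As a sketch of this criterion, the snippet below evaluates the ratio directly by reconstructing the data for each candidate $k$, and also computes one minus the singular-value ratio used in Algorithm 6, so the two can be compared. The function name `error_ratios`, the mixing matrix `W`, and the synthetic data are assumptions for this example.

```python
import numpy as np

def error_ratios(X):
    """For k = 1..n, return the relative squared reconstruction error,
    computed directly and via the singular values of Sigma."""
    Xc = X - X.mean(axis=0)                  # mean normalization
    Sigma = (Xc.T @ Xc) / Xc.shape[0]
    U, S, _ = np.linalg.svd(Sigma)
    total = (Xc ** 2).sum()
    direct, shortcut = [], []
    for k in range(1, Xc.shape[1] + 1):
        U_reduce = U[:, :k]
        X_approx = (Xc @ U_reduce) @ U_reduce.T
        direct.append(((Xc - X_approx) ** 2).sum() / total)
        shortcut.append(1.0 - S[:k].sum() / S.sum())
    return np.array(direct), np.array(shortcut)

# Synthetic data (made up): 5 features driven by 2 latent factors plus tiny noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2)) * np.array([10.0, 3.0])
W = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ W + 0.01 * rng.standard_normal((200, 5))

direct, shortcut = error_ratios(X)
k = int(np.argmax(direct <= 0.01)) + 1       # smallest k retaining 99% of the variance
print(k, np.allclose(direct, shortcut))
```

The two computations agree, but the singular-value version needs only one call to `svd` instead of a reconstruction per candidate $k$.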
An easier way to compute this is shown below:
Choose the $k$ - Algorithm 6
[U, S, V] = svd(Sigma)
Pick the smallest value of $k$ for which
$$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99$$
7.2.4 Advice for applying PCA
Supervised learning speedup
Given a dataset: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})$, with $x^{(i)} \in \mathbb{R}^n$
- Extract inputs and get unlabeled dataset.
- Apply PCA algorithm.
- Get new training set.
Finally, we get the new training set: $(z^{(1)}, y^{(1)}), (z^{(2)}, y^{(2)}), \cdots, (z^{(m)}, y^{(m)})$, with $z^{(i)} \in \mathbb{R}^k$
Note:
- The mapping $x^{(i)} \to z^{(i)}$ should be defined by running PCA only on the training set.
- The same mapping can then be applied to the cross-validation and test sets.
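The note above can be illustrated with a short sketch: the mean and `U_reduce` are computed from the training split only and then reused unchanged on the other splits. The function names, the split sizes, and the random data are assumptions for this example.

```python
import numpy as np

def fit_pca(X_train, k):
    """Learn the mapping (mean and U_reduce) from the training set only."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    Sigma = (Xc.T @ Xc) / X_train.shape[0]
    U, _, _ = np.linalg.svd(Sigma)
    return mean, U[:, :k]

def transform(X, mean, U_reduce):
    """Apply an already-learned mapping x -> z to any dataset."""
    return (X - mean) @ U_reduce

# Random data (made up), split into training, cross-validation, and test sets.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X_train, X_cv, X_test = X[:60], X[60:80], X[80:]

mean, U_reduce = fit_pca(X_train, k=2)        # parameters come from the training set only
Z_train = transform(X_train, mean, U_reduce)
Z_cv = transform(X_cv, mean, U_reduce)        # reuse the same mean and U_reduce
Z_test = transform(X_test, mean, U_reduce)
print(Z_train.shape, Z_cv.shape, Z_test.shape)
```

Refitting PCA on the cross-validation or test data would let information from those sets leak into the features, which is what the note warns against.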
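Bad use of PCA: to prevent overfitting
That is: use $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features to $k < n$
Reason: PCA may throw away valuable information, because it reduces the dimension without looking at the labels $y^{(i)}$. Regularization is usually a better way to address overfitting.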
Consider machine learning without PCA first
Before implementing PCA, first try running whatever you want to do with the raw/original data. Only if that does not work well enough should you implement PCA.




