Abstract
Multidimensional scaling (MDS) is a dimensionality reduction technique that preserves the pairwise distances between samples. It does so by relating the distance matrix of the original high-dimensional space to the inner-product matrix of the samples in the low-dimensional space. MDS first computes the Euclidean distances between the original samples, then constructs a centered inner-product matrix B and applies eigenvalue decomposition to it to obtain the low-dimensional coordinates. Principal component analysis (PCA) is another widely used dimensionality reduction method. It is based on maximizing the variance of the projected samples and finds the best projection directions by computing the eigenvectors of the covariance matrix. Both methods aim to reduce the dimensionality of the data while preserving as much of the original information as possible.
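1. Multidimensional Scaling (MDS)
Goal: the distances between samples in the original space should be preserved in the low-dimensional space.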

Suppose the distance matrix of the $m$ samples in the original space is $\mathbf{D}\in\mathbb{R}^{m\times m}$, whose entry $dist_{ij}$ is the distance between samples $i$ and $j$. Let $\mathbf{B}=\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\in\mathbb{R}^{m\times m}$ be the inner-product matrix of the samples after dimensionality reduction, where $\boldsymbol{z}_i$ is the $i$-th column of $\mathbf{Z}$ and $b_{ij}=\boldsymbol{z}_i^{\mathrm{T}}\boldsymbol{z}_j$. Requiring the distances to be preserved, $\|\boldsymbol{z}_i-\boldsymbol{z}_j\|=dist_{ij}$, gives

$$
\begin{aligned}
dist_{ij}^2 &= \|\boldsymbol{z}_i\|^2+\|\boldsymbol{z}_j\|^2-2\boldsymbol{z}_i^{\mathrm{T}}\boldsymbol{z}_j \\
&= b_{ii}+b_{jj}-2b_{ij}.
\end{aligned}
$$

Let the reduced samples $\mathbf{Z}$ be centered, i.e. $\sum_{i=1}^{m}\boldsymbol{z}_i=\boldsymbol{0}$. Then every row and every column of $\mathbf{B}$ sums to zero, because $\sum_{i=1}^{m}b_{ij}=\left(\sum_{i=1}^{m}\boldsymbol{z}_i\right)^{\mathrm{T}}\boldsymbol{z}_j=0$:

$$
\sum_{i=1}^{m}b_{ij}=\sum_{j=1}^{m}b_{ij}=0.
$$

Summing the expression for $dist_{ij}^2$ over $i$, over $j$, and over both indices then yields

$$
\begin{aligned}
\sum_{i=1}^{m}dist_{ij}^{2} &= \mathrm{tr}(\mathbf{B})+mb_{jj}, \\
\sum_{j=1}^{m}dist_{ij}^{2} &= \mathrm{tr}(\mathbf{B})+mb_{ii}, \\
\sum_{i=1}^{m}\sum_{j=1}^{m}dist_{ij}^{2} &= 2m\,\mathrm{tr}(\mathbf{B}).
\end{aligned}
$$
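The identities above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the sizes `m` and `d`, the random seed, the tolerance, and the helper `close` are illustrative assumptions. The samples are stored as rows (one list per $\boldsymbol{z}_i$) rather than as columns of $\mathbf{Z}$, which does not change $b_{ij}$ or $dist_{ij}$.

```python
import random

random.seed(0)
m, d = 6, 2  # assumed sizes: m samples, d low-dimensional coordinates

# Draw random low-dimensional samples z_1..z_m, then center them so sum_i z_i = 0.
Z = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
mean = [sum(z[k] for z in Z) / m for k in range(d)]
Z = [[z[k] - mean[k] for k in range(d)] for z in Z]

# Inner-product matrix: b_ij = z_i^T z_j.
B = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(m)] for i in range(m)]
# Squared distances: dist_ij^2 = ||z_i - z_j||^2.
D2 = [[sum((a - b) ** 2 for a, b in zip(Z[i], Z[j])) for j in range(m)] for i in range(m)]

trace = sum(B[i][i] for i in range(m))


def close(x, y, tol=1e-9):
    """Hypothetical helper: compare two floats up to rounding error."""
    return abs(x - y) < tol


# dist_ij^2 = b_ii + b_jj - 2 b_ij
identity_ok = all(
    close(D2[i][j], B[i][i] + B[j][j] - 2 * B[i][j])
    for i in range(m) for j in range(m)
)
# Rows and columns of B sum to zero because Z is centered.
rows_zero = all(close(sum(B[i]), 0.0) for i in range(m))
cols_zero = all(close(sum(B[i][j] for i in range(m)), 0.0) for j in range(m))
# sum_i dist_ij^2 = tr(B) + m b_jj, and sum_j dist_ij^2 = tr(B) + m b_ii
col_sums_ok = all(
    close(sum(D2[i][j] for i in range(m)), trace + m * B[j][j]) for j in range(m)
)
row_sums_ok = all(close(sum(D2[i]), trace + m * B[i][i]) for i in range(m))
# sum_i sum_j dist_ij^2 = 2 m tr(B)
total_ok = close(sum(map(sum, D2)), 2 * m * trace)

print(identity_ok, rows_zero, cols_zero, col_sums_ok, row_sums_ok, total_ok)
```

Skipping the centering step should make the row and column sums of `B` nonzero, and the three summation identities then generally fail, which shows why the centering assumption matters.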

