Q1. What are the main motivations for reducing a dataset's dimensionality? What are the main drawbacks?
A1:
Motivations:
- To speed up a subsequent training algorithm.
- To visualize the data and gain insights into the most important features.
- To save space (compression).
Drawbacks:
- Some information is lost, possibly degrading the performance of subsequent training algorithms.
- It can be computationally intensive.
- It adds some complexity to your Machine Learning pipelines.
- Transformed features are often hard to interpret or reconstruct.
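The information-loss drawback can be made concrete with a minimal sketch. It assumes PCA as the reduction method (computed here with NumPy's SVD rather than any particular library's PCA class) and a made-up toy dataset; the names `reconstruction_mse`, `latent`, and `mixing` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data (an assumption for illustration): 200 samples in 3-D
# that lie close to a 2-D plane, plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, -0.4]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 3))

# Center the data; the rows of Vt are the principal axes.
mean = X.mean(axis=0)
X_centered = X - mean
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

def reconstruction_mse(k):
    """Project onto the first k principal axes, map back, and
    return the mean squared distance to the original samples."""
    W = Vt[:k].T                        # shape (3, k)
    X_reduced = X_centered @ W          # compressed representation
    X_recovered = X_reduced @ W.T + mean
    return float(np.mean(np.sum((X - X_recovered) ** 2, axis=1)))

errors = [reconstruction_mse(k) for k in (1, 2, 3)]
```

Keeping fewer axes gives a larger reconstruction error, and the error only vanishes (up to floating-point rounding) when all three axes are kept, which is the sense in which some information is lost.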
Q2. What is the curse of dimensionality?
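A2: The curse of dimensionality refers to the fact that many problems that do not exist in low-dimensional space arise in high-dimensional space. In Machine Learning, one common manifestation is that randomly sampled high-dimensional vectors are generally very sparse, which increases the risk of overfitting and makes it hard to identify patterns in the data without plenty of training data.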
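One way to see this sparsity is to measure how far apart random points are as the number of dimensions grows. The sketch below is only an illustration, assuming points drawn uniformly from the unit hypercube; `mean_pairwise_distance` is a hypothetical helper name.

```python
import numpy as np

def mean_pairwise_distance(n_dims, n_pairs=1000, seed=0):
    """Average Euclidean distance between pairs of points drawn
    uniformly at random from the unit hypercube [0, 1]^n_dims."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, n_dims))
    b = rng.random((n_pairs, n_dims))
    return float(np.linalg.norm(a - b, axis=1).mean())

# Distances for a few dimensionalities (chosen arbitrarily).
distances = {d: mean_pairwise_distance(d) for d in (2, 10, 100, 1000)}
```

The expected squared distance between two such points is d/6, so the typical distance grows roughly like the square root of d/6: in high dimensions most training instances end up far away from each other.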
Q3. Once a dataset's dimensionality has been reduced, is it possible to reverse the

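This article covers the main motivations for dimensionality reduction (speeding up training, visualizing data, and saving space) as well as its drawbacks (information loss, added computational cost, and features that are harder to interpret). It explains the curse of dimensionality, that is, the problems that arise in high-dimensional space. PCA, a common dimensionality reduction method, can be applied to nonlinear datasets, but it may lose a lot of information. The inverse of PCA generally cannot restore the data perfectly, because information is lost during the reduction. When PCA is applied to a 1,000-dimensional dataset with the explained variance ratio set to 95%, the resulting number of dimensions depends on the dataset. Depending on the scenario, you can choose regular PCA, Incremental PCA, Randomized PCA, or Kernel PCA. One way to evaluate a dimensionality reduction algorithm is to measure its reconstruction error, and chaining two different dimensionality reduction algorithms can sometimes achieve similar results in less time.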
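To illustrate why the number of dimensions kept at a 95% explained variance ratio depends on the dataset, here is a rough sketch using NumPy's SVD. Both datasets are synthetic assumptions (100 dimensions instead of 1,000 to keep it fast), and `n_components_for_variance` is a hypothetical helper, not a library function.

```python
import numpy as np

def n_components_for_variance(X, target=0.95):
    """Smallest number of principal components whose cumulative
    explained variance ratio reaches `target`."""
    X_centered = X - X.mean(axis=0)
    s = np.linalg.svd(X_centered, compute_uv=False)
    explained_ratio = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained_ratio)
    # Index of the first cumulative value >= target, converted to a count.
    return int(np.searchsorted(cumulative, target) + 1)

rng = np.random.default_rng(0)

# Dataset A: 100-D points that lie almost on a 3-D subspace.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 100))
X_low_rank = latent @ mixing + 0.01 * rng.normal(size=(500, 100))

# Dataset B: 100-D points with no preferred direction.
X_isotropic = rng.normal(size=(500, 100))

d_low_rank = n_components_for_variance(X_low_rank)
d_isotropic = n_components_for_variance(X_isotropic)
```

The nearly low-rank dataset needs only a handful of components, while the isotropic one needs most of its original dimensions, even though both start with the same dimensionality.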
