Hands-On Machine Learning with Scikit-Learn & TensorFlow Exercise Q&A Chapter08

Published 2019-03-22 20:43:30

Tags

#Machine Learning #HandsOn #Dimensionality Reduction

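This article covers the main motivations for dimensionality reduction, such as speeding up training, visualizing data, and saving space, and discusses its drawbacks: possible information loss, added computational cost, and harder interpretation. It explains the curse of dimensionality, that is, the problems that arise in high-dimensional space. PCA, a common dimensionality reduction method, can be applied to nonlinear datasets, though it may discard a lot of information. The inverse of PCA usually cannot restore the data perfectly, because the reduction loses information. Depending on the dataset, running PCA on 1,000-dimensional data with a 95% explained variance ratio can yield anywhere from 1 to 1,000 dimensions. Depending on the scenario, you can choose regular PCA, Incremental PCA, Randomized PCA, or Kernel PCA. A dimensionality reduction algorithm can be evaluated by its reconstruction error, and chaining two different dimensionality reduction algorithms can sometimes reach similar results in less time.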

Q1. What are the main motivations for reducing a dataset's dimensionality? What are the main drawbacks?

A1:

Motivations:

To speed up a subsequent training algorithm.
To visualize the data and gain insight into the most important features.
To save space (compression).

Drawbacks:

Some information is lost, possibly degrading the performance of subsequent training algorithms.
It can be computationally intensive.
It adds some complexity to your Machine Learning pipelines.
Transformed features are often hard to interpret or reconstruct.
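
The trade-off between saving space and losing information can be made concrete with a small experiment. The sketch below is only an illustration under stated assumptions: the data is synthetic (50 features that mostly lie near a 5-dimensional linear subspace), and PCA is implemented by hand with NumPy's SVD rather than with a library PCA class. The helper name pca_fit is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (an assumption for illustration): 500 samples in 50 dimensions
# that lie close to a 5-dimensional linear subspace, plus a little noise.
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 50))


def pca_fit(X, n_components):
    """Hypothetical helper: PCA via SVD of the centered data.

    Returns the feature means, the top principal axes (one per row),
    and the fraction of total variance those axes explain.
    """
    mean = X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return mean, Vt[:n_components], explained[:n_components].sum()


mean, components, kept_variance = pca_fit(X, n_components=5)

# Project down to 5 dimensions (compression) ...
X_reduced = (X - mean) @ components.T
# ... and map back to 50 dimensions (lossy decompression).
X_recovered = X_reduced @ components + mean

compression_ratio = X_reduced.size / X.size
reconstruction_mse = np.mean((X - X_recovered) ** 2)

print(f"stored values: {compression_ratio:.0%} of the original")
print(f"variance kept: {kept_variance:.4f}")
print(f"reconstruction MSE: {reconstruction_mse:.6f}")
```

The reduced dataset stores a tenth of the original values, yet the reconstruction is not exact: the nonzero reconstruction error is the information that was lost. On data without such a low-dimensional structure, the loss would be larger.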

Q2. What is the curse of dimensionality?

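A2: The curse of dimensionality refers to the fact that many problems that do not exist in low-dimensional space arise in high-dimensional space. In Machine Learning, a common manifestation is that randomly sampled high-dimensional vectors are generally very sparse, which increases the risk of overfitting and makes it hard to identify patterns in the data without a large amount of training data.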
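
One way to see this effect is to measure how far apart random points are as the number of dimensions grows. The following is a minimal sketch using only the Python standard library; the function name mean_pairwise_distance, the sample size, and the choice of a unit hypercube are assumptions made for illustration.

```python
import math
import random


def mean_pairwise_distance(n_dims, n_pairs=500, seed=0):
    """Estimate the average Euclidean distance between two points
    drawn uniformly at random from the unit hypercube [0, 1]^n_dims."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = [rng.random() for _ in range(n_dims)]
        b = [rng.random() for _ in range(n_dims)]
        total += math.dist(a, b)
    return total / n_pairs


for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(mean_pairwise_distance(n_dims), 2))
```

The average distance keeps growing with the number of dimensions, so a fixed number of training instances covers the space more and more thinly. A new instance is then likely to be far from every training instance, which makes predictions less reliable.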

Q3. Once a dataset's dimensionality has been reduced, is it possible to reverse the
