1.Three logistic regression APIs in sklearn
In sklearn, the logistic regression (LR) code lives in the linear_model module. The module's __init__ file contains the following:
__all__ = ['ARDRegression',
'BayesianRidge',
'ElasticNet',
'ElasticNetCV',
'Hinge',
'Huber',
'HuberRegressor',
'Lars',
'LarsCV',
'Lasso',
'LassoCV',
'LassoLars',
'LassoLarsCV',
'LassoLarsIC',
'LinearRegression',
'Log',
'LogisticRegression',
'LogisticRegressionCV',
'ModifiedHuber',
'MultiTaskElasticNet',
'MultiTaskElasticNetCV',
'MultiTaskLasso',
'MultiTaskLassoCV',
'OrthogonalMatchingPursuit',
'OrthogonalMatchingPursuitCV',
'PassiveAggressiveClassifier',
'PassiveAggressiveRegressor',
'Perceptron',
'Ridge',
'RidgeCV',
'RidgeClassifier',
'RidgeClassifierCV',
'SGDClassifier',
'SGDRegressor',
'SquaredLoss',
'TheilSenRegressor',
'enet_path',
'lars_path',
'lars_path_gram',
'lasso_path',
'logistic_regression_path',
'orthogonal_mp',
'orthogonal_mp_gram',
'ridge_regression',
'RANSACRegressor']
The listing shows three names related to LR: two classes and one function.
LogisticRegression
LogisticRegressionCV
logistic_regression_path
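The listing above comes from an older release, since it still contains `logistic_regression_path`. To see what your installed version actually exports, you can filter `__all__`. On releases where the deprecated function has been removed, only the two classes should appear. This is a small sketch, and the output depends on your scikit-learn version.

```python
import sklearn
import sklearn.linear_model as linear_model

# Public names exported by linear_model whose name mentions "logistic".
logistic_names = [name for name in linear_model.__all__ if "logistic" in name.lower()]
print(sklearn.__version__, logistic_names)
```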
2.logistic_regression_path
@deprecated('logistic_regression_path was deprecated in version 0.21 and '
            'will be removed in version 0.23.0')
def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
                             max_iter=100, tol=1e-4, verbose=0,
                             solver='lbfgs', coef=None,
                             class_weight=None, dual=False, penalty='l2',
                             intercept_scaling=1., multi_class='auto',
                             random_state=None, check_input=True,
                             max_squared_sum=None, sample_weight=None,
                             l1_ratio=None):
    """Compute a Logistic Regression model for a list of regularization
    parameters.
    This is an implementation that uses the result of the previous model
    to speed up computations along the set of solutions, making it faster
    than sequentially calling LogisticRegression for the different parameters.
    Note that there will be no speedup with liblinear solver, since it does
    not handle warm-starting.
    .. deprecated:: 0.21
        ``logistic_regression_path`` was deprecated in version 0.21 and will
        be removed in 0.23.
    Read more in the :ref:`User Guide <logistic_regression>`.
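First, note that logistic_regression_path was deprecated in 0.21 and is scheduled for removal in 0.23.
The docstring explains its purpose. It fits one model per value in a list of regularization parameters, and it starts each fit from the previous solution (warm starting). This makes it faster than calling LogisticRegression separately for each parameter. The docstring also stresses that the liblinear solver gets no speedup from logistic_regression_path, because liblinear does not support warm starting.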
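Current releases no longer ship logistic_regression_path. You can roughly reproduce its warm-starting idea with the public API: reuse one LogisticRegression with warm_start=True while sweeping C. This is a sketch of the idea, not the deprecated function's implementation. The synthetic dataset, the C grid and the tight tol are arbitrary choices. The sketch uses lbfgs because, as noted above, liblinear cannot warm start. It also fits every C from scratch and checks that both routes reach the same coefficients. Warm starting should change only the starting point, not the solution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an illustrative assumption; any dataset works).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Grid of C values, from strong to weak regularization.
Cs = np.logspace(-3, 2, 6)

# warm_start=True makes each fit() start from the previous coef_/intercept_.
warm = LogisticRegression(solver="lbfgs", warm_start=True, tol=1e-8, max_iter=1000)
warm_path = []
for C in Cs:
    warm.set_params(C=C)
    warm.fit(X, y)
    warm_path.append(warm.coef_.ravel().copy())
warm_path = np.array(warm_path)

# Reference: independent cold-start fits, one per C.
cold_path = np.array([
    LogisticRegression(solver="lbfgs", C=C, tol=1e-8, max_iter=1000).fit(X, y).coef_.ravel()
    for C in Cs
])

print(warm_path.shape)  # one 5-dimensional coefficient vector per C
print(np.abs(warm_path - cold_path).max())  # should be tiny
```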
3.LogisticRegression
class LogisticRegression(BaseEstimator, LinearClassifierMixin,
                         SparseCoefMixin):
    """
    Logistic Regression (aka logit, MaxEnt) classifier.
    In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
    scheme if the 'multi_class' option is set to 'ovr', and uses the
    cross-entropy loss if the 'multi_class' option is set to 'multinomial'.
    (Currently the 'multinomial' option is supported only by the 'lbfgs',
    'sag', 'saga' and 'newton-cg' solvers.)
    This class implements regularized logistic regression using the
    'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note
    that regularization is applied by default**. It can handle both dense
    and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit
    floats for optimal performance; any other input format will be converted
    (and copied).
    The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
    with primal formulation, or no regularization. The 'liblinear' solver
    supports both L1 and L2 regularization, with a dual formulation only for
    the L2 penalty. The Elastic-Net regularization is only supported by the
    'saga' solver.
    Read more in the :ref:`User Guide <logistic_regression>`.
    ...
    def __init__(self, penalty='l2', dual=False, tol=1e-4, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver='lbfgs', max_iter=100,
                 multi_class='auto', verbose=0, warm_start=False, n_jobs=None,
                 l1_ratio=None):
        self.penalty = penalty
        self.dual = dual
        self.tol = tol
        self.C = C
        self.fit_intercept = fit_intercept
        self.intercept_scaling = intercept_scaling
        self.class_weight = class_weight
        self.random_state = random_state
        self.solver = solver
        self.max_iter = max_iter
        self.multi_class = multi_class
        self.verbose = verbose
        self.warm_start = warm_start
        self.n_jobs = n_jobs
        self.l1_ratio = l1_ratio
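Key points from the docstring:
1. Multiclass: if multi_class is set to 'ovr', a one-vs-rest scheme is used; if it is set to 'multinomial', the cross-entropy loss is used. The 'multinomial' option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.
2. The available solvers are the 'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs'. 'newton-cg', 'sag' and 'lbfgs' support only L2 regularization (primal formulation) or no regularization. 'liblinear' supports both L1 and L2, with a dual formulation only for L2. Elastic-Net regularization is supported only by 'saga', which also handles L1 and L2.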
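The sketch below shows the two multiclass strategies side by side on the iris data bundled with scikit-learn. It rebuilds each model's probabilities by hand. For the multinomial (cross-entropy) model, they are the softmax of the per-class scores. For one-vs-rest, they are per-class sigmoids renormalized to sum to 1. It uses OneVsRestClassifier instead of multi_class='ovr'. This assumes a recent scikit-learn, where the multi_class parameter is deprecated.

```python
import numpy as np
from scipy.special import expit, softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# Multinomial: a single model trained with the cross-entropy (softmax) loss.
# With lbfgs and more than two classes, this is the default behavior.
multinomial = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-rest: one independent binary logistic model per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Multinomial probabilities are the softmax of the per-class scores.
p_multi = softmax(multinomial.decision_function(X), axis=1)

# OvR probabilities are per-class sigmoids, renormalized to sum to 1.
ovr_scores = np.column_stack([est.decision_function(X) for est in ovr.estimators_])
p_ovr = expit(ovr_scores)
p_ovr /= p_ovr.sum(axis=1, keepdims=True)

print(np.allclose(p_multi, multinomial.predict_proba(X)))
print(np.allclose(p_ovr, ovr.predict_proba(X)))
```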
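Relevant parameters of the constructor:
1. dual: whether to solve the primal or the dual formulation; default False. The dual formulation is only implemented for the L2 penalty with the liblinear solver.
2. tol: tolerance for the stopping criterion; default 1e-4.
3. C: the inverse of the regularization strength (a positive float); default 1.0. Smaller values mean stronger regularization.
4. solver: the optimization algorithm; default 'lbfgs' in this version (older releases defaulted to 'liblinear').
5. max_iter: the maximum number of solver iterations; default 100.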
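Here is a small sketch of how these parameters behave. It assumes a recent scikit-learn and a synthetic dataset where only 3 of 20 features carry signal. It shows five things:
- The coefficient norm grows as C grows, since C is an inverse regularization strength.
- An L1 penalty with liblinear sets some weights to exactly zero.
- Elastic-Net requires saga.
- dual=True is paired with liblinear and L2.
- lbfgs rejects an L1 penalty with a ValueError.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 3 informative (illustrative assumption).
X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)

# C is the INVERSE of regularization strength: smaller C -> smaller weights.
norms = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)  # default: lbfgs + L2
    norms[C] = np.linalg.norm(clf.coef_)
print(norms)

# L1 works with liblinear (and saga); it drives some weights to exactly 0.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("zero weights with L1:", int(np.sum(l1.coef_ == 0)))

# Elastic-Net needs solver='saga' plus an l1_ratio in [0, 1].
enet = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                          C=0.1, max_iter=5000).fit(X, y)

# dual=True is only implemented for the L2 penalty with liblinear.
dual = LogisticRegression(dual=True, solver="liblinear", max_iter=1000).fit(X, y)

# lbfgs supports only L2 (or no penalty); requesting L1 raises ValueError.
try:
    LogisticRegression(penalty="l1", solver="lbfgs").fit(X, y)
    lbfgs_l1_failed = False
except ValueError:
    lbfgs_l1_failed = True
print("lbfgs + l1 rejected:", lbfgs_l1_failed)

# n_iter_ reports the iterations actually used (never more than max_iter).
print(clf.n_iter_)
```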
4.LogisticRegressionCV
class LogisticRegressionCV(LogisticRegression, BaseEstimator,
                           LinearClassifierMixin):
    """Logistic Regression CV (aka logit, MaxEnt) classifier.
    See glossary entry for :term:`cross-validation estimator`.
    This class implements logistic regression using liblinear, newton-cg, sag
    of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2
    regularization with primal formulation. The liblinear solver supports both
    L1 and L2 regularization, with a dual formulation only for the L2 penalty.
    Elastic-Net penalty is only supported by the saga solver.
    For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter
    is selected by the cross-validator
    :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed
    using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs'
    solvers can warm-start the coefficients (see :term:`Glossary<warm_start>`).
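LogisticRegressionCV is used much like LogisticRegression. The difference is that it chooses the regularization parameter C by cross-validation. With penalty='elasticnet' (saga solver), it also chooses the mix of L1 and L2 regularization from the candidate values in l1_ratios.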
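Below is a minimal sketch of LogisticRegressionCV on synthetic data; the dataset and the max_iter value are arbitrary assumptions. When Cs is an integer, the candidate C values are spaced on a log scale between 1e-4 and 1e4. When cv is an integer, a classifier's folds come from stratified K-fold, as the docstring above says. After fitting, Cs_ holds the searched grid and C_ the selected value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Cs=10: ten C values on a log grid; cv=5: stratified 5-fold for classifiers.
clf = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000).fit(X, y)

print(clf.Cs_)  # the grid that was searched
print(clf.C_)   # the C chosen by cross-validation (refit on all data by default)
```

For the Elastic-Net case, you would additionally pass penalty='elasticnet', solver='saga' and a list of l1_ratios. The chosen mix is then stored in l1_ratio_.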
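This article walked through the logistic regression APIs in scikit-learn: the classes LogisticRegression and LogisticRegressionCV, and the function logistic_regression_path. LogisticRegression is a classifier for binary and multiclass problems (one-vs-rest or multinomial) that supports several solvers and regularization options. LogisticRegressionCV chooses the regularization parameter C automatically by cross-validation. logistic_regression_path is a deprecated function that fits logistic regression models along a sequence of regularization parameters, reusing each solution as a warm start. The article also covered which regularization each solver supports and how multiclass problems are handled.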