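Featuretools is a framework for automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning. Feature construction operations fall into two categories: transformations, which derive new columns from the columns of a single table, and aggregations, which summarize related rows across tables. The example below walks through how to use Featuretools.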
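To make the two categories concrete, here is a minimal sketch in plain pandas rather than Featuretools. The toy tables and column names are assumptions made up for illustration; they are not the data from the example repository.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
clients = pd.DataFrame({
    "client_id": [1, 2],
    "joined": pd.to_datetime(["2015-03-10", "2016-11-25"]),
})
loans = pd.DataFrame({
    "loan_id": [10, 11, 12],
    "client_id": [1, 1, 2],
    "loan_amount": [1000.0, 3000.0, 500.0],
})

# Transformation: derive a new column from a column of the same table.
clients["join_month"] = clients["joined"].dt.month

# Aggregation: summarize the child rows (loans) for each parent row (client).
mean_loan = loans.groupby("client_id")["loan_amount"].mean().rename("mean_loan_amount")
features = clients.join(mean_loan, on="client_id")

print(features)
```

Featuretools automates exactly this kind of bookkeeping across many columns and tables at once.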
Example code:
https://github.com/scottlinlin/auto_feature_demo.git
Installation
pip install featuretools
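Quick start
1. Import featuretools
import pandas as pd  # needed for pd.read_csv below
import featuretools as ft
2. Load the data
# Load the three tables, parsing the date columns as datetimes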
clients = pd.read_csv('data/clients.csv', parse_dates = ['joined'])
loans = pd.read_csv('data/loans.csv', parse_dates = ['loan_start', 'loan_end'])
payments = pd.read_csv('data/payments.csv', parse_dates = ['payment_date'])
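The `parse_dates` argument matters here because the time index columns used later need to be real datetimes rather than strings. As a small sketch of what it does, the snippet below reads an in-memory CSV; the `income` column and the two rows are assumptions for illustration, not the contents of `data/clients.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a clients file; values are made up.
csv_text = """client_id,joined,income
1,2015-03-10,52000
2,2016-11-25,31000
"""

# parse_dates converts the listed columns to datetime64 while reading.
clients_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["joined"])

print(clients_demo.dtypes)
```

Without `parse_dates`, the `joined` column would be read as plain strings.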
3. Create the entities and the entity set
# Create an empty entity set (this walkthrough uses the pre-1.0 Featuretools API)
es = ft.EntitySet(id = 'clients')
# Add the clients entity
es = es.entity_from_dataframe(entity_id = 'clients', dataframe = clients,
                              index = 'client_id', time_index = 'joined')
# Add the loans entity
es = es.entity_from_dataframe(entity_id = 'loans', dataframe = loans,
                              variable_types = {'repaid': ft.variable_types.Categorical},
                              index = 'loan_id',
                              time_index = 'loan_start')
# Add the payments entity; it has no unique id column, so make_index creates one
es = es.entity_from_dataframe(entity_id = 'payments',
                              dataframe = payments,
                              variable_types = {'missed': ft.variable_types.Categorical},
                              make_index = True,
                              index = 'payment_id',
                              time_index = 'payment_date')
# Print the entity set
es
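The payments entity above is created with `make_index = True`, which tells Featuretools that the named index column does not exist yet and should be generated from integers. A rough pandas equivalent, using a made-up payments table, might look like this:

```python
import pandas as pd

# Hypothetical payments table with no unique identifier column.
payments_demo = pd.DataFrame({
    "loan_id": [10, 10, 11],
    "payment_amount": [100.0, 200.0, 300.0],
})

# Add an integer column that uniquely identifies each row.
payments_demo.insert(0, "payment_id", range(len(payments_demo)))

print(payments_demo)
```

Every entity needs such a unique index so that rows can be referenced unambiguously.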
4. Add relationships between entities
# Relate clients (parent) and loans (child) through client_id
r_client_previous = ft.Relationship(es['clients']['client_id'],
                                    es['loans']['client_id'])
es = es.add_relationship(r_client_previous)
# Relate loans (parent) and payments (child) through loan_id
r_payments = ft.Relationship(es['loans']['loan_id'],
                             es['payments']['loan_id'])
es = es.add_relationship(r_payments)
# Print the entity set
es
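Each relationship is one-to-many: the key is unique in the parent table and may repeat in the child table. Before registering a relationship it can help to check this on the raw DataFrames. The helper below is a hypothetical sketch in pandas (the name `check_relationship` and the toy tables are assumptions, not part of Featuretools).

```python
import pandas as pd


def check_relationship(parent, parent_key, child, child_key):
    """Return True if parent_key is unique and every child key has a parent row."""
    unique_parent = parent[parent_key].is_unique
    all_children_matched = child[child_key].isin(parent[parent_key]).all()
    return bool(unique_parent and all_children_matched)


# Hypothetical toy tables.
clients_demo = pd.DataFrame({"client_id": [1, 2]})
loans_demo = pd.DataFrame({"loan_id": [10, 11, 12], "client_id": [1, 1, 2]})
orphan_loans = pd.DataFrame({"loan_id": [13], "client_id": [99]})

valid = check_relationship(clients_demo, "client_id", loans_demo, "client_id")
orphaned = check_relationship(clients_demo, "client_id", orphan_loans, "client_id")
print(valid, orphaned)
```

A child row whose key has no parent would simply not contribute to any aggregated feature, so catching it early avoids silent data loss.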
5. Aggregate features and generate new ones
# Run deep feature synthesis (dfs) with the default primitives, targeting clients
features, feature_names = ft.dfs(entityset = es, target_entity = 'clients')
features.head()
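Deep feature synthesis stacks aggregations along the relationships: payments are summarized per loan, and those loan-level values are summarized again per client. The following pandas sketch reproduces one such stacked feature by hand. The toy tables and the output column names are assumptions for illustration and do not match the names Featuretools generates.

```python
import pandas as pd

# Hypothetical toy tables.
loans_demo = pd.DataFrame({
    "loan_id": [10, 11, 12],
    "client_id": [1, 1, 2],
    "loan_amount": [1000.0, 3000.0, 500.0],
})
payments_demo = pd.DataFrame({
    "loan_id": [10, 10, 11, 12],
    "payment_amount": [100.0, 200.0, 300.0, 50.0],
})

# Depth 1: aggregate payments up to their parent loan.
sum_payments = payments_demo.groupby("loan_id")["payment_amount"].sum().rename("sum_payments")
loans_feat = loans_demo.join(sum_payments, on="loan_id")

# Depth 2: aggregate the loan-level columns up to the client.
client_feat = loans_feat.groupby("client_id").agg(
    mean_loan_amount=("loan_amount", "mean"),
    mean_sum_payments=("sum_payments", "mean"),
)

print(client_feat)
```

`ft.dfs` does this for every applicable primitive and column, which is why the resulting feature matrix is much wider than the original clients table.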
6. Generate new features with specified aggregation and transformation primitives
# Restrict dfs to chosen aggregation (agg_primitives) and transformation (trans_primitives) primitives
features, feature_names = ft.dfs(entityset = es, target_entity = 'clients',
                                 agg_primitives = ['mean', 'max', 'percent_true', 'last'],
                                 trans_primitives = ['years', 'month'])
features.head()
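To give a feel for what some of these primitives compute, the sketch below writes rough pandas equivalents of `mean`, `max`, `percent_true`, `last`, and `month` by hand. The toy loans table and the boolean column `is_large` are assumptions for illustration; the actual Featuretools output uses its own feature names.

```python
import pandas as pd

# Hypothetical toy loans table; is_large is a made-up boolean column.
loans_demo = pd.DataFrame({
    "client_id": [1, 1, 2],
    "loan_start": pd.to_datetime(["2014-01-05", "2015-06-20", "2016-02-01"]),
    "loan_amount": [1000.0, 3000.0, 500.0],
    "is_large": [False, True, False],
})

# Sort by time so that "last" means the most recent loan.
loans_demo = loans_demo.sort_values("loan_start")
by_client = loans_demo.groupby("client_id")

manual = pd.DataFrame({
    "mean_loan_amount": by_client["loan_amount"].mean(),
    "max_loan_amount": by_client["loan_amount"].max(),
    # percent_true: the fraction of True values in a boolean column.
    "percent_true_is_large": by_client["is_large"].mean(),
    "last_loan_amount": by_client["loan_amount"].last(),
})

# month: a transformation that extracts the month from a datetime column.
loans_demo["loan_start_month"] = loans_demo["loan_start"].dt.month

print(manual)
```

Choosing primitives explicitly keeps the feature matrix smaller and limits it to quantities that are meaningful for the problem at hand.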
For more parameters, see the documentation for ft.dfs.
