Use cases:
- Batch-merging Excel files that share the same format, and adding rows or columns to a DataFrame
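concat syntax: pandas.concat(objs, axis=0, join='outer', ignore_index=False)
- combines multiple pandas objects (DataFrame/Series) into one
- along a chosen axis (axis=0/1)
- using a chosen join method (inner/outer)
| Parameter | Description |
| --- | --- |
| objs | A list whose items can be DataFrames or Series; the two can be mixed |
| axis | Defaults to 0, which stacks the objects row-wise; 1 places them side by side as columns |
| join | How the labels on the other axis are aligned; defaults to 'outer' (union), and can also be 'inner' (intersection) |
| ignore_index | Whether to discard the original index labels and renumber the result from 0 |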
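append syntax: DataFrame.append(other, ignore_index=False)
append only merges row-wise, never column-wise; it is shorthand for a row-wise concat. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pandas.concat instead.
| Parameter | Description |
| --- | --- |
| other | A single DataFrame, Series, dict, or list |
| ignore_index | Whether to discard the original index labels |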
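On pandas versions where `DataFrame.append` is unavailable (it was removed in 2.0), the same row-wise append can be written with `pd.concat`. The following is a minimal sketch; the frames `base` and `extra` and the dict `new_row` are made-up sample data, not part of the original examples.

```python
import pandas as pd

# Made-up sample frames for illustration only.
base = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
extra = pd.DataFrame({"A": ["A2"], "B": ["B2"]})

# Plays the role of base.append(extra, ignore_index=True).
appended = pd.concat([base, extra], ignore_index=True)

# To append a single row given as a dict, wrap it in a one-row DataFrame first.
new_row = {"A": "A3", "B": "B3"}
appended = pd.concat([appended, pd.DataFrame([new_row])], ignore_index=True)
print(appended)
```

When many pieces need to be appended, collecting them in a list and calling `pd.concat` once is generally preferable to concatenating inside a loop, since each call copies the data.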
Part 1: Merging data with pandas.concat
import pandas as pd
# 1. Default concat (axis=0, join='outer', ignore_index=False): df1 and df2 are merged row-wise
df1 = pd.DataFrame({'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3'],
'E':['E0','E1','E2','E3']
})
df2 = pd.DataFrame({'A':['A4','A5','A6','A7'],
'B':['B4','B5','B6','B7'],
'C':['C4','C5','C6','C7'],
'D':['D4','D5','D6','D7'],
'F':['F4','F5','F6','F7']
})
print(df1)
print(df2)
A B C D E
0 A0 B0 C0 D0 E0
1 A1 B1 C1 D1 E1
2 A2 B2 C2 D2 E2
3 A3 B3 C3 D3 E3
A B C D F
0 A4 B4 C4 D4 F4
1 A5 B5 C5 D5 F5
2 A6 B6 C6 D6 F6
3 A7 B7 C7 D7 F7
df3 = pd.concat([df1,df2])
print(df3)
A B C D E F
0 A0 B0 C0 D0 E0 NaN
1 A1 B1 C1 D1 E1 NaN
2 A2 B2 C2 D2 E2 NaN
3 A3 B3 C3 D3 E3 NaN
0 A4 B4 C4 D4 NaN F4
1 A5 B5 C5 D5 NaN F5
2 A6 B6 C6 D6 NaN F6
3 A7 B7 C7 D7 NaN F7
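The output above keeps the original index labels, so 0 to 3 each appear twice. A small sketch of what that means in practice, using made-up two-row frames (`left` and `right` are hypothetical names): a label lookup returns more than one row, and the `verify_integrity` parameter of `pd.concat` can be used to turn such duplicates into an error.

```python
import pandas as pd

# Made-up frames; both carry the default index labels 0 and 1.
left = pd.DataFrame({"A": ["A0", "A1"]})
right = pd.DataFrame({"A": ["A2", "A3"]})

stacked = pd.concat([left, right])
print(stacked.index.tolist())  # the original labels are kept, so they repeat
print(stacked.loc[0])          # label 0 now selects two rows

# verify_integrity=True makes concat raise instead of keeping duplicate labels.
try:
    pd.concat([left, right], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)
```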
# 2. Use ignore_index=True to discard the original index and renumber the rows
df4 = pd.concat([df1,df2],ignore_index=True)
print(df4)
A B C D E F
0 A0 B0 C0 D0 E0 NaN
1 A1 B1 C1 D1 E1 NaN
2 A2 B2 C2 D2 E2 NaN
3 A3 B3 C3 D3 E3 NaN
4 A4 B4 C4 D4 NaN F4
5 A5 B5 C5 D5 NaN F5
6 A6 B6 C6 D6 NaN F6
7 A7 B7 C7 D7 NaN F7
# 3. Use join='inner' to drop columns that do not appear in every DataFrame
df1 = pd.DataFrame({'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3'],
'E':['E0','E1','E2','E3']
})
df2 = pd.DataFrame({'A':['A4','A5','A6','A7'],
'B':['B4','B5','B6','B7'],
'C':['C4','C5','C6','C7'],
'D':['D4','D5','D6','D7'],
'F':['F4','F5','F6','F7']
})
print(df1)
print(df2)
A B C D E
0 A0 B0 C0 D0 E0
1 A1 B1 C1 D1 E1
2 A2 B2 C2 D2 E2
3 A3 B3 C3 D3 E3
A B C D F
0 A4 B4 C4 D4 F4
1 A5 B5 C5 D5 F5
2 A6 B6 C6 D6 F6
3 A7 B7 C7 D7 F7
df5 = pd.concat([df1,df2],ignore_index=True, join='inner')
print(df5)
# Columns E and F do not appear in both DataFrames, so the result keeps only columns A, B, C and D
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
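One way to read `join='inner'` is that the result keeps the intersection of the column labels, while `join='outer'` keeps their union and fills the gaps with NaN. The sketch below checks this on two made-up one-row frames (`df_a` and `df_b` are hypothetical names).

```python
import pandas as pd

# Made-up frames that share columns A and B but differ in E and F.
df_a = pd.DataFrame({"A": ["A0"], "B": ["B0"], "E": ["E0"]})
df_b = pd.DataFrame({"A": ["A1"], "B": ["B1"], "F": ["F1"]})

inner = pd.concat([df_a, df_b], join="inner", ignore_index=True)
outer = pd.concat([df_a, df_b], join="outer", ignore_index=True)

# The columns of the inner result match the intersection of the column labels.
shared = df_a.columns.intersection(df_b.columns)
print(list(inner.columns))
print(list(shared))

# The outer result keeps every column and marks the missing cells as NaN.
print(list(outer.columns))
print(outer.isna().sum().sum())
```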
# 4. Use axis=1 to add new columns
df1 = pd.DataFrame({'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3'],
'E':['E0','E1','E2','E3']
})
print(df1)
A B C D E
0 A0 B0 C0 D0 E0
1 A1 B1 C1 D1 E1
2 A2 B2 C2 D2 E2
3 A3 B3 C3 D3 E3
## Create a new Series with the values 0, 1, 2, 3, named new_one
s1 = pd.Series(list(range(4)),name='new_one')
df6 = pd.concat([df1,s1],axis=1) ## add the new column new_one to df1
print(df6)
A B C D E new_one
0 A0 B0 C0 D0 E0 0
1 A1 B1 C1 D1 E1 1
2 A2 B2 C2 D2 E2 2
3 A3 B3 C3 D3 E3 3
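The example above lines up cleanly because df1 and s1 share the index 0 to 3. With `axis=1`, `pd.concat` aligns on index labels rather than on position, so a Series with different labels produces NaN where labels do not match. A small sketch with made-up data (`frame` and `shifted` are hypothetical names):

```python
import pandas as pd

frame = pd.DataFrame({"A": ["A0", "A1", "A2"]})  # index labels 0, 1, 2
shifted = pd.Series([10, 20, 30], index=[1, 2, 3], name="new_one")

# Labels 0 and 3 exist on only one side, so the outer join fills in NaN there.
aligned = pd.concat([frame, shifted], axis=1)
print(aligned)

# If the values should line up by position instead, reset the index first.
by_position = pd.concat([frame, shifted.reset_index(drop=True)], axis=1)
print(by_position)
```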
## Add several Series to df1 as new columns
s2 = df1.apply(lambda x:x['A']+'GG', axis=1) ## each value in column A of df1 with the suffix GG appended
print(s2)
0 A0GG
1 A1GG
2 A2GG
3 A3GG
dtype: object
s2.name='G'
df7 = pd.concat([df1,s1,s2],axis=1)
print(df7)
A B C D E new_one G
0 A0 B0 C0 D0 E0 0 A0GG
1 A1 B1 C1 D1 E1 1 A1GG
2 A2 B2 C2 D2 E2 2 A2GG
3 A3 B3 C3 D3 E3 3 A3GG
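As a side note, the row-wise `apply` used to build s2 can usually be replaced by a vectorized string concatenation on the column, and `Series.rename` sets the name in the same expression. This is only a sketch of an alternative, not the approach used in the walkthrough above; the frame `df` is a cut-down, made-up copy of df1.

```python
import pandas as pd

df = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]})

# Row-wise version, as in the walkthrough.
row_wise = df.apply(lambda row: row["A"] + "GG", axis=1)
row_wise.name = "G"

# Vectorized version: operate on the whole column, then name the result.
vectorized = (df["A"] + "GG").rename("G")

print(vectorized.tolist() == row_wise.tolist())
print(pd.concat([df, vectorized], axis=1))
```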
## The list may contain only Series
df1 = pd.DataFrame({'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3'],
'E':['E0','E1','E2','E3']
})
print(df1)
A B C D E
0 A0 B0 C0 D0 E0
1 A1 B1 C1 D1 E1
2 A2 B2 C2 D2 E2
3 A3 B3 C3 D3 E3
s1 = pd.Series(list(range(4)),name='new_one')
print(s1)
0 0
1 1
2 2
3 3
Name: new_one, dtype: int64
s2 = df1.apply(lambda x:x['A']+'GG', axis=1)
print(s2)
0 A0GG
1 A1GG
2 A2GG
3 A3GG
dtype: object
s2.name='G' ## s2 was rebuilt above without a name; name it so the column is labelled G
df8 = pd.concat([s1,s2], axis=1)
print(df8)
new_one G
0 0 A0GG
1 1 A1GG
2 2 A2GG
3 3 A3GG
## DataFrames and Series can be mixed in the list in any order
s1 = pd.Series(list(range(4)),name='new_one')
print(s1)
0 0
1 1
2 2
3 3
Name: new_one, dtype: int64
df1 = pd.DataFrame({'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3'],
'E':['E0','E1','E2','E3']
})
print(df1)
A B C D E
0 A0 B0 C0 D0 E0
1 A1 B1 C1 D1 E1
2 A2 B2 C2 D2 E2
3 A3 B3 C3 D3 E3
s2 = df1.apply(lambda x:x['A']+'GG', axis=1)
print(s2)
0 A0GG
1 A1GG
2 A2GG
3 A3GG
dtype: object
s2.name='G' ## name the rebuilt Series so the column is labelled G
df9 = pd.concat([s1,df1,s2],axis=1)
print(df9)
new_one A B C D E G
0 0 A0 B0 C0 D0 E0 A0GG
1 1 A1 B1 C1 D1 E1 A1GG
2 2 A2 B2 C2 D2 E2 A2GG
3 3 A3 B3 C3 D3 E3 A3GG
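Returning to the batch-merge use case mentioned at the start, the pattern might look like the sketch below. To stay self-contained it builds three small DataFrames in memory as stand-ins for sheets that would normally be loaded with something like `pd.read_excel`; the file names and the `product`, `amount` and `source` columns are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for files that share one layout. In practice each
# frame would be loaded from disk, e.g. with pd.read_excel(path).
monthly_frames = {
    "sales_jan.xlsx": pd.DataFrame({"product": ["a", "b"], "amount": [10, 20]}),
    "sales_feb.xlsx": pd.DataFrame({"product": ["a", "c"], "amount": [30, 40]}),
    "sales_mar.xlsx": pd.DataFrame({"product": ["b", "c"], "amount": [50, 60]}),
}

# Tag each frame with its source name, then stack them in a single concat call.
tagged = [frame.assign(source=name) for name, frame in monthly_frames.items()]
merged = pd.concat(tagged, ignore_index=True)
print(merged)
```

Recording the source in a column is one possible design choice; it keeps track of where each row came from after the original index labels are discarded.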
