1.函数
df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)#在索引上的合并连接
# 在索引或键上使用其他df连接列;将不同索引列组合
df.join(right, on=key_or_keys)#完全等效于下面
pd.merge(left, right, left_on=key_or_keys, right_index=True,how='left', sort=False)
2.参数
other:DataFrame,Series或s,df列表; 系列必须设置其name属性
on:name,tuple / names of list或array-like列或索引名称
how ='left': {'left', 'right', 'outer', 'inner'}如何处理这两个对象的操作
* left:用调用帧df的索引
*right:用other索引
* outer:df,other索引并集
* inner:df,other索引交集
lsuffix/rsuffix:str,左右数据重叠列使用的后缀
sort=False:True连接键按字典顺序排序
注意:
传递列表时不支持on,lsuffix和rsuffix选项
3.实例
# 实例1:重叠值列
# 必须在df1,df2有相同的列名如key,必须指定后缀lsuffix,rsuffix之一;必须df1的列数量>=df2
df1= pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1'],'B': ['B0', 'B1']})
df1.join(df2, lsuffix='_df1', rsuffix='_df2')#使用索引连接
# df1 df2 result
key A key B key_df1 A key_df2 B
0 K0 A0 0 K0 B0 0 K0 A0 K0 B0
1 K1 A1 1 K1 B1 1 K1 A1 K1 B1
2 K2 A2 2 K2 A2 NaN NaN
============================================================
# 实例2.1:df1=索引,df2=索引
df1= pd.DataFrame({'key1': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'key2': ['K0', 'K1'],'B': ['B0', 'B1']})
df1.set_index('key1').join(df2.set_index('key2'))#用列名连接,将列名:转换为索引
A B
key1
K0 A0 B0
K1 A1 B1
K2 A2 NaN
df2.set_index('key2').join(df1.set_index('key1'))
B A
key2
K0 B0 A0
K1 B1 A1
# 实例2.2:df1=列名; df2=索引名(总是,不能为列)
# join总是使用df2索引,调用者df1可用中任何列。在结果中保留原始调用者的索引
df1= pd.DataFrame({'key1': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'key2': ['K0', 'K1'],'B': ['B0', 'B1']})
df1.join(df2.set_index('key2'), on='key1')
key1 A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 NaN
df2.join(df1.set_index('key1'), on='key2')
key2 B A
0 K0 B0 A0
1 K1 B1 A1
============================================================
# 实例2.3:left=单个索引,right=多索引; 单个索引连接到多索引
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],'B': ['B0', 'B1', 'B2']},
index=pd.Index(['K0', 'K1', 'K2'], name='key'))
index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),('K2', 'Y2'), ('K2', 'Y3')],names=['key', 'Y'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']},index=index)
result = left.join(right, how='inner')
# left right result
A B C D A B C D
key key Y key Y
K0 A0 B0 K0 Y0 C0 D0 K0 Y0 A0 B0 C0 D0
K1 A1 B1 K1 Y1 C1 D1 K1 Y1 A1 B1 C1 D1
K2 A2 B2 K2 Y2 C2 D2 K2 Y2 A2 B2 C2 D2
Y3 C3 D3 Y3 A2 B2 C3 D3
============================================================
# 实例2.4:left=多列,right=多索引; 要连接多个键,传递的DataFrame必须具有MultiIndex:
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], 'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],'B': ['B0', 'B1', 'B2', 'B3']})
index = pd.MultiIndex.from_tuples([('K0', 'K0'), ('K1', 'K0'), ('K2', 'K0'), ('K2', 'K1')])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}, index=index)
result = left.join(right, on=['key1', 'key2'])#等价下面
# left right result
key1 key2 A B C D A B key1 key2 C D
0 K0 K0 A0 B0 K0 K0 C0 D0 0 A0 B0 K0 K0 C0 D0
1 K0 K1 A1 B1 K1 K0 C1 D1 1 A1 B1 K0 K1 NaN NaN
2 K1 K0 A2 B2 K2 K0 C2 D2 2 A2 B2 K1 K0 C1 D1
3 K2 K1 A3 B3 K1 C3 D3 3 A3 B3 K2 K1 C3 D3
result = left.join(right, on=['key1', 'key2'], how='inner')#等价下面
# left right result
key1 key2 A B C D A B key1 key2 C D
0 K0 K0 A0 B0 K0 K0 C0 D0 0 A0 B0 K0 K0 C0 D0
1 K0 K1 A1 B1 K1 K0 C1 D1 2 A2 B2 K1 K0 C1 D1
2 K1 K0 A2 B2 K2 K0 C2 D2 3 A3 B3 K2 K1 C3 D3
3 K2 K1 A3 B3 K1 C3 D3
============================================================
# 实例3:how连接方式
left = pd.DataFrame({'A': [ 'A1', 'A2'], 'B': [ 'B1', 'B2']}, index=['K1', 'K2'])
right = pd.DataFrame({'C': [ 'C2', 'C3'], 'D': [ 'D2', 'D3']}, index=['K2', 'K3'])
result1 = left.join(right)
result2=left.join(right, how='outer')
result3=left.join(right, how='inner')
# left #result1:how=left
A B C D
A B K1 A1 B1 NaN NaN
K1 A1 B1 K2 A2 B2 C2 D2
K2 A2 B2 #result2:how='outer'
A B C D
right K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2
C D K3 NaN NaN C3 D3
K2 C2 D2 #result3: how='inner'
K3 C3 D3 A B D
K2 A2 B2 C2 D2
============================================================
# 实例4:连接多个DataFrame
# DataFrames传递列表或元组,join() 在它们的索引上将它们连接在一起
left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})
left = left.set_index('k')
right = right.set_index('k')
right2 = pd.DataFrame({'v': [7, 8, 9]}, index=['K1', 'K1', 'K2'])
result = left.join([right, right2])
# left right right2 v_x v_y v
v v v K0 1 4.0 NaN
k k K1 7 K0 1 5.0 NaN
K0 1 K0 4 K1 8 K1 2 NaN 7.0
K1 2 K0 5 K2 9 K1 2 NaN 8.0
K2 3 K3 6 K2 3 NaN 9.0
============================================================
---------------------
转载,仅作分享,侵删
作者:tcy23456
原文:https://blog.csdn.net/tcy23456/article/details/85538951
|
|