網(wǎng)站首頁編程語言正文

詳解利用Pandas求解兩個(gè)DataFrame的差集,交集,并集_python

作者：尤而小屋 ? 更新時(shí)間： 2022-09-07 編程語言

模擬數(shù)據(jù)

模擬一份簡單的數(shù)據(jù)：

In [1]:

import?pandas?as?pd

In [2]:

df1?=?pd.DataFrame({"col1":[1,2,3,4,5],
????????????????????"col2":[6,7,8,9,10]
???????????????????})

df2?=?pd.DataFrame({"col1":[1,3,7],
????????????????????"col2":[6,8,10]
???????????????????})

In [3]:

df1

Out[3]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10

In [4]:

df2

Out[4]:

?	col1	col2
0	1	6
1	3	8
2	7	10

兩個(gè)DataFrame的相同部分：

差集

方法1：concat + drop_duplicates

In [5]:

df3?=?pd.concat([df1,df2])
df3

Out[5]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [6]:

#?結(jié)果1

df3.drop_duplicates(["col1","col2"],keep=False)

Out[6]:

?	col1	col2
1	2	7
3	4	9
4	5	10
2	7	10

方法2：append + drop_duplicates

In [7]:

df4?=?df1.append(df2)
df4

Out[7]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [8]:

#?結(jié)果2

df4.drop_duplicates(["col1","col2"],keep=False)

Out[8]:

?	col1	col2
1	2	7
3	4	9
4	5	10
2	7	10

交集

方法1：merge

In [9]:

#?結(jié)果

#?等效：df5 = pd.merge(df1, df2, how="inner")
df5?=?pd.merge(df1,df2)

df5

Out[9]:

?	col1	col2
0	1	6
1	3	8

方法2：concat + duplicated + loc

In [10]:

df6?=?pd.concat([df1,df2])
df6

Out[10]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [11]:

s?=?df6.duplicated(subset=['col1','col2'],?keep='first')
s

Out[11]:

0????False
1????False
2????False
3????False
4????False
0?????True
1?????True
2????False
dtype:?bool

In [12]:

#?結(jié)果
df8?=?df6.loc[s?==?True]
df8

Out[12]:

?	col1	col2
0	1	6
1	3	8

方法3：concat + groupby + query

In [13]:

#?df6?=?pd.concat([df1,df2])

df6

Out[13]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [14]:

df9?=?df6.groupby(["col1",?"col2"]).size().reset_index()
df9.columns?=?["col1",?"col2",?"count"]

df9

Out[14]:

?	col1	col2	count
0	1	6	2
1	2	7	1
2	3	8	2
3	4	9	1
4	5	10	1
5	7	10	1

In [15]:

df10?=?df9.query("count?>?1")[["col1",?"col2"]]
df10

Out[15]:

?	col1	col2
0	1	6
2	3	8

并集

方法1：concat + drop_duplicates

In [16]:

df11?=?pd.concat([df1,df2])
df11

Out[16]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [17]:

#?結(jié)果

#?df12?=?df11.drop_duplicates(subset=["col1","col2"],keep="last")
df12?=?df11.drop_duplicates(subset=["col1","col2"],keep="first")
df12

Out[17]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
2	7	10

方法2：append + drop_duplicates

In [18]:

df13?=?df1.append(df2)

#?df13.drop_duplicates(subset=["col1","col2"],keep="last")
df13.drop_duplicates(subset=["col1","col2"],keep="first")

Out[18]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
2	7	10

方法3：merge

In [19]:

pd.merge(df1,df2,how="outer")

Out[19]:

?	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
5	7	10

原文鏈接：https://mp.weixin.qq.com/s/kmuVEdt13c8qRFA6w5lYFw

上一篇：Go泛型應(yīng)用工廠方法及泛型使用_Golang
下一篇：Python查看Tensor尺寸及查看數(shù)據(jù)類型的實(shí)現(xiàn)_pyt

日本免费高清视频-国产福利视频导航-黄色在线播放国产-天天操天天操天天操天天操|www.shdianci.com

網(wǎng)站首頁編程語言正文

詳解利用Pandas求解兩個(gè)DataFrame的差集,交集,并集_python

目錄

模擬數(shù)據(jù)

差集

方法1：concat + drop_duplicates

方法2：append + drop_duplicates

交集

方法1：merge

方法2：concat + duplicated + loc

方法3：concat + groupby + query

并集

方法1：concat + drop_duplicates

方法2：append + drop_duplicates

方法3：merge

相關(guān)推薦

日本免费高清视频-国产福利视频导航-黄色在线播放国产-天天操天天操天天操天天操|www.shdianci.com

網(wǎng)站首頁 編程語言 正文

詳解利用Pandas求解兩個(gè)DataFrame的差集,交集,并集_python

目錄

模擬數(shù)據(jù)

差集

方法1：concat + drop_duplicates

方法2：append + drop_duplicates

交集

方法1：merge

方法2：concat + duplicated + loc

方法3：concat + groupby + query

并集

方法1：concat + drop_duplicates

方法2：append + drop_duplicates

方法3：merge

相關(guān)推薦

網(wǎng)站首頁編程語言正文