Python DataFrame: Merging and Deduplication in Detail

Author: Coderusher    Updated: 2022-10-04    Category: Programming Languages

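1. Merging

1.1 Structural merging

Combine two datasets that have the same structure (the same columns) by stacking their rows.

1.1.1 The concat function

Function signature:

pd.concat([dataFrame1, dataFrame2, …], ignore_index=False)

Parameters: ignore_index=False (the default) keeps each input's original index labels, so labels in the result may repeat or skip values; ignore_index=True discards them and renumbers the result from 0 to n-1.

Example: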

import pandas as pd
import numpy as np

# Create a 2-D dataset with five rows and two columns
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

# Split the data into two pieces (row 2 is left out) and store them in a list
data_list = [df[0:2], df[3:]]

# Keep the original index labels
df1 = pd.concat(data_list, ignore_index=False)

# Renumber the index from 0
df2 = pd.concat(data_list, ignore_index=True)

Output:

----------------df--------------------------
   A  B
0  7  8
1  7  3
2  5  9
3  4  0
4  1  8
----------------df1--------------------------
   A  B
0  7  8
1  7  3
3  4  0  # <-- label 2 does not appear here; the index is not consecutive
4  1  8
----------------df2--------------------------
   A  B
0  7  8
1  7  3
2  4  0
3  1  8
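
Because the example above relies on random data, a deterministic variant may be easier to check by hand. The following is a minimal sketch, assuming two small hand-written frames (the names `top` and `bottom` and their values are illustrative, not from the original article). It shows that with `ignore_index=False` the original labels can repeat, while `ignore_index=True` renumbers them.

```python
import pandas as pd

# Hypothetical sample data: both frames have the same columns
top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
bottom = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Original labels are kept, so 0 and 1 each appear twice
kept = pd.concat([top, bottom], ignore_index=False)
print(list(kept.index))        # [0, 1, 0, 1]

# Labels are regenerated as 0..n-1
renumbered = pd.concat([top, bottom], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```

Repeated labels are legal in pandas, but label-based lookups such as `kept.loc[0]` then return more than one row, which is usually the reason to pass `ignore_index=True`.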

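1.1.2 The append function

Function signature:

df.append(df1, ignore_index=True)

Parameters: ignore_index=False keeps the original index labels of both frames; ignore_index=True renumbers the result from 0. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this method only works on older versions.

Example: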

import pandas as pd
import numpy as np

# Create a 2-D dataset with five rows and two columns
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

# Create the data to append
narry = np.random.randint(0, 10, (3, 2))
data_list = pd.DataFrame(narry, columns=['A', 'B'])

# Combine the data (DataFrame.append requires pandas < 2.0)
df1 = df.append(data_list, ignore_index=True)

Output:

----------------df--------------------------
   A  B
0  5  6
1  1  2
2  5  3
3  1  8
4  1  2
----------------df1--------------------------
   A  B
0  5  6
1  1  2
2  5  3
3  1  8
4  1  2
5  8  1
6  3  5
7  1  1
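
On pandas 2.0 and later, where `DataFrame.append` is no longer available, the same row-wise result can be obtained with `pd.concat`. The following is a minimal sketch with made-up values (the frame named `extra` is illustrative only):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': [5, 1], 'B': [6, 2]})
extra = pd.DataFrame({'A': [8, 3], 'B': [1, 5]})

# Row-wise combination, equivalent in effect to
# df.append(extra, ignore_index=True) on older pandas versions
combined = pd.concat([df, extra], ignore_index=True)
print(combined)
```

Neither `df` nor `extra` is modified; `combined` is a new frame with four rows.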

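1.2 Column merging

Combine columns from two tables that describe the same records, matching rows on one or more key columns.

Function signature:

pd.merge( left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None, )

Parameters:

Parameter	Description
how	Join type: inner, left, right, or outer; defaults to inner
on	Column name(s) to join on; must exist in both tables
left_on	Column name(s) in the left table to join on
right_on	Column name(s) in the right table to join on
left_index	Whether to use the left table's row index as the join key; defaults to False
right_index	Whether to use the right table's row index as the join key; defaults to False
sort	Whether to sort the merged data by the join keys; defaults to False
copy	Defaults to True, which always copies the data into the resulting structure; setting it to False can improve performance
suffixes	Suffixes appended to column names that appear in both tables; defaults to ('_x', '_y')
indicator	Adds a column showing which table each merged row came from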
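
To make the effect of `how` and `indicator` concrete, here is a minimal sketch, assuming two small tables whose keys only partly overlap (the column names `lval` and `rval` and all values are invented for illustration):

```python
import pandas as pd

# Hypothetical tables that share only the keys 'b' and 'c'
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [20, 30, 40]})

# inner: only keys present in both tables
inner = pd.merge(left, right, on='key', how='inner')
print(inner['key'].tolist())      # ['b', 'c']

# left: every key from the left table; unmatched rval becomes NaN
left_join = pd.merge(left, right, on='key', how='left')
print(left_join['key'].tolist())  # ['a', 'b', 'c']

# outer: keys from either table; indicator=True adds a '_merge' column
# holding 'left_only', 'right_only', or 'both'
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
print(outer[['key', '_merge']])
```

The `_merge` column is a convenient way to audit a join, for example to find keys that exist in only one of the two tables.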

Example 1:

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': range(3)})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data2': range(3)})
df = pd.merge(df1, df2)  # with no `on` given, the columns common to both frames are used as the join key

Output:

----------------df1--------------------------
  key  data1
0   a      0
1   b      1
2   c      2
----------------df2--------------------------
  key  data2
0   a      0
1   b      1
2   c      2
----------------df---------------------------
  key  data1  data2
0   a      0      0
1   b      1      1
2   c      2      2

Example 2:

# To join on several keys, pass the key columns as a list
import pandas as pd

right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'lval': [4, 5, 6, 7]})

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'],
                     'lval': [1, 2, 3]})

df = pd.merge(left, right, on=['key1', 'key2'], how='outer')

Output:

----------------right-------------------------
  key1 key2  lval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7
----------------left--------------------------
  key1 key2  lval
0  foo  one     1
1  foo  two     2
2  bar  one     3
----------------df---------------------------
  key1 key2  lval_x  lval_y
0  foo  one     1.0     4.0
1  foo  one     1.0     5.0
2  foo  two     2.0     NaN
3  bar  one     3.0     6.0
4  bar  two     NaN     7.0
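
The `left_on`, `right_on`, and `suffixes` parameters from the table are not exercised by the two examples, so a short sketch may help. It assumes two invented tables, `orders` and `customers`, whose key columns have different names and which both contain a column called `amount`; none of these names come from the original article.

```python
import pandas as pd

# Hypothetical tables: the key is 'customer_id' on the left and 'id' on the right
orders = pd.DataFrame({'customer_id': [1, 2, 2], 'amount': [10, 20, 30]})
customers = pd.DataFrame({'id': [1, 2], 'amount': [100, 200]})

merged = pd.merge(
    orders, customers,
    left_on='customer_id', right_on='id',  # differently named key columns
    suffixes=('_order', '_limit'),         # disambiguate the shared 'amount' column
)
print(merged.columns.tolist())
# ['customer_id', 'amount_order', 'id', 'amount_limit']
```

Both key columns are kept in the result when their names differ; one of them can be dropped afterwards with `merged.drop(columns='id')` if it is redundant.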

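2. Deduplication

Function signature:

data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)

Parameters:

Parameter	Description
subset	Column name(s) to compare; optional, defaults to None (all columns are compared)
keep	{'first', 'last', False}, defaults to 'first'
first	Keep the first occurrence of each duplicated row and drop the later ones
last	Drop duplicates except for the last occurrence
False	Drop all duplicated rows
inplace	Boolean, defaults to False. With inplace=True the duplicates are removed from the original DataFrame directly; with the default False a deduplicated copy is returned and the original is left unchanged.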
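
The difference between the two `inplace` settings is easy to get wrong, because with `inplace=True` the method returns `None`. The following is a minimal sketch, assuming a small made-up frame named `data`:

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 are identical
data = pd.DataFrame({'A': [1, 1, 2], 'B': ['x', 'x', 'y']})

# Default inplace=False: a new frame is returned, the original is untouched
deduped = data.drop_duplicates()
print(len(data), len(deduped))   # 3 2

# inplace=True: the frame itself is modified and the call returns None
result = data.drop_duplicates(inplace=True)
print(result, len(data))         # None 2
```

Writing `data = data.drop_duplicates(inplace=True)` would therefore replace `data` with `None`; use either the assignment form or the `inplace` form, not both.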

Example:

Remove rows that are duplicated in every column

data.drop_duplicates(inplace=True)

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

df.drop_duplicates()

Output:

---------------df before deduplication---------------------------
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
---------------df after deduplication---------------------------
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

Use subset to remove rows that are duplicated in specific columns

data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)

df.drop_duplicates(subset=['brand'])

Output:

     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

Use keep to drop duplicates and retain the last occurrence

df.drop_duplicates(subset=['brand', 'style'], keep='last')

Output:

     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
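
The remaining `keep` option, `keep=False`, is not shown above. As a sketch that reuses the same sample frame, the code below drops every row whose (brand, style) pair occurs more than once. It also uses the `ignore_index` parameter of `drop_duplicates`, which the article does not cover and which, to my knowledge, is available from pandas 1.0 onward.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# keep=False: no representative is kept for a duplicated (brand, style) pair
unique_only = df.drop_duplicates(subset=['brand', 'style'], keep=False)
print(unique_only)                 # only row 2 (Indomie, cup, 3.5) remains

# ignore_index=True renumbers the surviving rows from 0
renumbered = df.drop_duplicates(ignore_index=True)
print(list(renumbered.index))      # [0, 1, 2, 3]
```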

Original article: https://blog.51cto.com/coderusher/5554275
