
Processing Excel spreadsheets in Python with xlsx libraries and pandas: step by step

Author: 水w · Updated: 2023-02-10 · Programming languages

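Contents
1. Processing Excel spreadsheets in xls and xlsx format
1.1 Opening an Excel workbook with the openpyxl module and listing all sheets
1.2 Getting a worksheet by its name
1.3 Getting the number of rows and columns of a worksheet
Error reading an .xlsx file: xlrd.biffh.XLRDError: Excel xlsx file; not supported
2. Reading xlsx files with pandas
2.1 Reading data
2.2 Finding the elements common to two lists with pandas
Fix: ValueError: Excel file format cannot be determined, you must specify an engine manually.
Fix: but no encoding declared; see https://python.org/dev/peps/pep-0263/ for details
Fix: MatplotlibDeprecationWarning: Support for FigureCanvases without a required_interactive_framework attribute was deprecated in Matplotlib 3.6 and will be removed two minor releases later.
Summary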

1. Processing Excel spreadsheets in xls and xlsx format

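xls is the file format produced by Excel 2003 and earlier;
xlsx is the file format produced by Excel 2007 and later;
(Excel 2007 and later can open both formats, whereas Excel 2003 and earlier can only open xls natively.)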

1.1 Opening an Excel workbook with the openpyxl module and listing all sheets

openpyxl.load_workbook() takes a file name and returns a Workbook object. The Workbook object represents the Excel file, much as a File object represents an open text file.

import xlrd2   # the xlrd2 package, installed as described in the XLRDError section below

workbook = xlrd2.open_workbook("1.xlsx")  # returns a workbook object
sheets = workbook.sheet_names()
print(sheets)
# Result:
# ['Sheet1', 'Sheet2']

Alternatively, with openpyxl:

import openpyxl

workbook = openpyxl.load_workbook("1.xlsx")   # returns a Workbook object
print(workbook.sheetnames)                     # print the names of all sheets in the workbook

# Result:
# ['Sheet1', 'Sheet2']
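Because a Workbook behaves much like an open file, it is worth closing it when you are done. This matters most in read-only mode, where openpyxl keeps the file open until close() is called. A minimal sketch, assuming a throwaway demo.xlsx created in a temporary folder (not the author's 1.xlsx); the helper name list_sheet_names is hypothetical:

```python
import os
import tempfile

import openpyxl


def list_sheet_names(path):
    """Open a workbook read-only, return its sheet names, and always close it."""
    workbook = openpyxl.load_workbook(path, read_only=True)
    try:
        return workbook.sheetnames
    finally:
        # Read-only workbooks keep the underlying file open until close() is called
        workbook.close()


# Usage: create a throwaway two-sheet workbook so the example runs anywhere
demo = openpyxl.Workbook()
demo.active.title = "Sheet1"
demo.create_sheet("Sheet2")
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
demo.save(demo_path)

names = list_sheet_names(demo_path)
print(names)  # ['Sheet1', 'Sheet2']
```

The try/finally mirrors how you would close a text file; a `with open(...)` block is not used because a Workbook is not a context manager.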

1.2 Getting a worksheet by its name

import openpyxl

workbook = openpyxl.load_workbook("數據源總表(1).xlsx")   # returns a Workbook object
print(workbook.sheetnames)                              # print the names of all sheets
sheet = workbook['Sheet1']                              # get a sheet by its name
print(sheet)

# Result:
# ['Sheet1', 'Sheet2']
# <Worksheet "Sheet1">
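Looking up a name that does not exist raises a KeyError, and heading 1.3 also mentions the active sheet. A minimal sketch of a helper that returns the named sheet, or the active sheet when no name is given; the function get_sheet is hypothetical, and the in-memory workbook simply reuses the sheet names from the example above:

```python
import openpyxl


def get_sheet(workbook, name=None):
    """Return the sheet called `name`, or the active sheet when no name is given."""
    if name is None:
        return workbook.active
    if name not in workbook.sheetnames:
        # workbook[name] would also raise KeyError; this message lists the valid names
        raise KeyError(f"no sheet named {name!r}; available sheets: {workbook.sheetnames}")
    return workbook[name]


# Usage on an in-memory workbook with the same sheet names as above
wb = openpyxl.Workbook()
wb.active.title = "Sheet1"
wb.create_sheet("Sheet2")

print(get_sheet(wb, "Sheet2").title)  # Sheet2
print(get_sheet(wb).title)            # Sheet1 (the active sheet)
```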

1.3 Getting the number of rows and columns of a worksheet

Method 1: write your own for loop.
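A minimal sketch of Method 1, assuming "rows and columns" means the last row and column that actually contain a value. This can differ from sheet.max_row / sheet.max_column, which cover every cell openpyxl has created, including empty ones. The function name is hypothetical and the sample sheet is built in memory:

```python
import openpyxl


def count_used_rows_and_columns(sheet):
    """Loop over every cell and remember the last row/column that holds a value."""
    last_row, last_col = 0, 0
    for row in sheet.iter_rows():
        for cell in row:
            if cell.value is not None:
                last_row = max(last_row, cell.row)
                last_col = max(last_col, cell.column)  # integer column index in openpyxl 3.x
    return last_row, last_col


# Usage on a small in-memory sheet
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["id", "name"])
ws.append([1, "Ann"])
ws.cell(row=10, column=5)  # accessing a cell creates it, even though it stays empty

print(count_used_rows_and_columns(ws))  # (2, 2)
print(ws.max_row, ws.max_column)        # 10 5
```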

Method 2: use the worksheet attributes

sheet.max_row gives the number of rows
sheet.max_column gives the number of columns

import openpyxl

workbook = openpyxl.load_workbook("數據源總表(1).xlsx")   # returns a Workbook object
print(workbook.sheetnames)                              # print the names of all sheets
sheet = workbook['1、基本情況']                          # get a sheet by its name
print(sheet)
print('rows', sheet.max_row, 'column', sheet.max_column)   # number of rows and columns

Error reading an .xlsx file: xlrd.biffh.XLRDError: Excel xlsx file; not supported

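Running the code raises the following error:

xlrd.biffh.XLRDError: Excel xlsx file; not supported

(1) Check the version of the third-party library xlrd:

My installed version was xlrd 2.0.1, the latest release, and that is exactly the problem: starting with version 2.0, xlrd only reads legacy .xls files. The fix is to uninstall the latest version and install an older package instead, as follows.

(2) In PyCharm, go to File > Settings > Project > Python Interpreter and reinstall, this time using xlrd2.

After uninstalling xlrd and installing xlrd2 as described above, the error is gone.

Another common remedy is to leave xlrd for .xls files and open .xlsx files with openpyxl, as in section 1.1.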
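Step (1) can also be done from Python. A minimal sketch using the standard importlib.metadata module (Python 3.8+); it relies on the fact that xlrd 2.0 dropped support for every format except .xls, and the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def xlrd_reads_xlsx(version_string):
    """xlrd 2.0 removed support for everything except legacy .xls files."""
    major = int(version_string.split(".")[0])
    return major < 2


def describe_installed_xlrd():
    """Report whether the installed xlrd (if any) can still open .xlsx files."""
    try:
        installed = version("xlrd")
    except PackageNotFoundError:
        return "xlrd is not installed"
    if xlrd_reads_xlsx(installed):
        return f"xlrd {installed} can still open .xlsx files"
    return f"xlrd {installed} only opens .xls files; read .xlsx with openpyxl instead"


print(describe_installed_xlrd())
```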

2. Reading xlsx files with pandas

For installing pandas with pip in PyCharm, see the author's CSDN post "python之 pyCharm pip安裝pandas庫失敗" (on fixing a failed pip install of pandas in PyCharm).

2.1 Reading data

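import pandas as pd

# 1. Read the first n rows
df1 = pd.read_excel('d1.xlsx')   # reads the first sheet of d1.xlsx by default

data1 = df1.head(10)   # the first 10 rows, as a DataFrame
data2 = df1.values     # all data as a 2-D NumPy array, one row per spreadsheet row
# data2 = df1.values()  raises TypeError: 'numpy.ndarray' object is not callable (values is an attribute, not a method)
print("First 10 rows:\n{0}".format(data1))
print("All values:\n{0}".format(data2))

# 2. Read specific rows and columns by position (0-based; the header row is not counted)
data3 = df1.iloc[0].values         # all values in the first data row
data4 = df1.iloc[1, 1]             # the single value at row position 1, column position 1
data5 = df1.iloc[[1, 2]].values    # the rows at positions 1 and 2 (the 2nd and 3rd data rows)
data6 = df1.iloc[:, [0]].values    # every row of the first column

print("Data:\n{0}".format(data3))
print("Data:\n{0}".format(data4))
print("Data:\n{0}".format(data5))
print("Data:\n{0}".format(data6))

# 3. Row labels and column headers
print("Row labels: {}".format(df1.index.values))        # 0, 1, 2, 3, 4, ... by default
print("Column headers: {}".format(df1.columns.values))  # taken from the first row of the sheet

# 4. Convert the rows into a list of dicts
data = []
for i in df1.index.values:   # iterate over the row labels
    # select the listed columns (from the author's d1.xlsx) of row i by label, as a dict
    row_data = df1.loc[i, ['id', 'name', 'class', 'data', 'score']].to_dict()
    data.append(row_data)
print("Final data: {0}".format(data))

# iloc vs. loc: iloc selects by integer position, loc selects by index label.
# Part 2 therefore uses iloc (positions) and part 4 uses loc (the labels from df1.index).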
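With the default index (0, 1, 2, ...), row labels and row positions coincide, so iloc and loc happen to return the same rows. A minimal sketch with a deliberately non-default index shows where they diverge, and shows DataFrame.to_dict(orient="records") as a one-call alternative to the loop in part 4; the sample data is made up:

```python
import pandas as pd

# Made-up data whose index labels deliberately differ from the row positions
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [90, 85, 77]},
    index=[10, 20, 30],
)

by_position = df.iloc[0]  # position 0 -> the first row, whose label is 10
by_label = df.loc[10]     # label 10 -> the same row
# df.loc[0] would raise KeyError here, because no row is labelled 0

# Part 4's loop in one call: a list with one dict per row
records = df[["name", "score"]].to_dict(orient="records")

print(by_position.equals(by_label))  # True
print(records)
```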

Reading only specific columns:

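import pandas as pd

file_path = r'int.xlsx'   # r'' is a raw string: backslashes are not escapes, which matters for Windows paths such as r'C:\data\int.xlsx'
df = pd.read_excel(file_path, header=0, usecols=[3, 4])  # header=0: use the first row as column names; usecols=[3, 4]: the 4th and 5th columns (0-based positions)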
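usecols counts columns from 0, so positions are easy to misread. A minimal sketch comparing three equivalent ways of selecting the same two columns; the five-column sample table and its headers (including i and o) are hypothetical stand-ins for int.xlsx, built in memory so the example runs without the real file (writing it requires openpyxl):

```python
import io

import pandas as pd

# Hypothetical five-column table standing in for int.xlsx (column names are made up)
sample = pd.DataFrame({
    "a": [1, 2], "b": [3, 4], "c": [5, 6], "i": ["x", "y"], "o": ["y", "z"],
})
buffer = io.BytesIO()
sample.to_excel(buffer, index=False, engine="openpyxl")
xlsx_bytes = buffer.getvalue()


def read_cols(usecols):
    """Read the in-memory workbook, keeping only the requested columns."""
    return pd.read_excel(io.BytesIO(xlsx_bytes), header=0, usecols=usecols, engine="openpyxl")


by_position = read_cols([3, 4])    # 0-based positions: the 4th and 5th columns
by_letters = read_cols("D:E")      # Excel column letters; the range includes both ends
by_name = read_cols(["i", "o"])    # header names taken from the first row

print(list(by_position.columns))   # ['i', 'o']
```

Selecting by header name is the most robust of the three when columns might be inserted or reordered in the spreadsheet later.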

2.2 Finding the elements common to two lists with pandas

Approach: to find the elements that two lists have in common, convert both lists to sets and take their intersection.

import pandas as pd

file_path = r'int.xlsx'
df = pd.read_excel(file_path, header=0, usecols=[3, 4])  # first row is the header; read the 4th and 5th columns
i, o = list(df['i']), list(df['o'])   # the two columns, headed 'i' and 'o' in this file

a = set(i)   # convert each list to a set
b = set(o)
c = a & b    # elements contained in both a and b
print(a, '\n', b)
print('Elements common to both lists:', list(c))
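Converting to sets discards duplicates and the original order, so list(c) can come back in any order. A minimal sketch contrasting the set intersection with an order-preserving pandas alternative (Series.isin); the column names i and o follow the example above, but the values are made up:

```python
import pandas as pd

# Made-up values standing in for the 'i' and 'o' columns
df = pd.DataFrame({"i": ["c", "a", "b", "c"], "o": ["b", "d", "c", "e"]})

# Set intersection, as in the text: unordered, duplicates removed
common_set = set(df["i"]) & set(df["o"])

# Order-preserving alternative: keep values of 'i' that also appear in 'o'
mask = df["i"].isin(df["o"])
common_in_order = df.loc[mask, "i"].drop_duplicates().tolist()

print(sorted(common_set))  # ['b', 'c']
print(common_in_order)     # ['c', 'b']  (order of first appearance in 'i')
```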

Fix: ValueError: Excel file format cannot be determined, you must specify an engine manually.

Error: reading a spreadsheet with pandas raised this ValueError. The code was:

import pandas as pd

file_path = 'intersection.xlsx'
df = pd.read_excel(file_path, header=0, usecols=[0])  # header=0: first row holds the column names; usecols=[0]: read only the first column
print(df)

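Cause: the problem most likely lies in the original file's format. pandas inspects a file's contents to choose a reader engine, and raises this error when the contents do not match any Excel format it recognizes (for example, a CSV or HTML file that was merely renamed to .xlsx).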

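Fix: the most direct remedy is to copy the table's contents into a new workbook you create yourself, save it under the original file's path, and then install the third-party library openpyxl:

pip install openpyxl

Rerun the code, and the problem is solved.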
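One way to check whether a file really is an Excel workbook before calling pandas is to look at its first bytes: a legacy .xls file is an OLE2 compound document, and an .xlsx file is a ZIP archive. The sketch below is an assumption-based diagnostic, not the author's fix, and the helper names are hypothetical. When the file is genuine, passing the returned engine as pd.read_excel(file_path, engine=...) skips the format detection that raised the error:

```python
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls: OLE2 compound document
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx is a ZIP archive


def guess_excel_engine(header):
    """Map the first bytes of a file to a pandas reader engine, or None if unrecognised."""
    if header.startswith(XLS_MAGIC):
        return "xlrd"
    if header.startswith(ZIP_MAGIC):
        return "openpyxl"  # note: .xlsb and .ods files are ZIP archives too
    return None


def sniff_excel_file(path):
    """Read just enough bytes from `path` to recognise the container format."""
    with open(path, "rb") as f:
        return guess_excel_engine(f.read(len(XLS_MAGIC)))


# Usage with the file from the example above:
# engine = sniff_excel_file("intersection.xlsx")
# None means the file is not a real workbook: re-save it from Excel as above.
# Otherwise: df = pd.read_excel("intersection.xlsx", engine=engine)
```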

Fix: but no encoding declared; see https://python.org/dev/peps/pep-0263/ for details

Error:

but no encoding declared; see https://python.org/dev/peps/pep-0263/ for details

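Cause: the file xxx contains Chinese characters.

Fix: add an encoding declaration on the first line of the .py file (or on the second line, if the first is a #! shebang):

# -*- coding: utf-8 -*-

The declaration must name the encoding the file is actually saved in. Python 3 already assumes UTF-8 source code, so on Python 3 this error usually means the file was saved in another encoding such as GBK; in that case either re-save the file as UTF-8 or declare that encoding instead (for example # -*- coding: gbk -*-).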
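If the declaration alone does not help on Python 3, the file is probably saved in a non-UTF-8 encoding. A minimal sketch that re-saves such a file as UTF-8, assuming the original encoding is GBK (common on Chinese Windows setups; pass a different fallback_encoding if yours differs); the helper names are hypothetical:

```python
def to_utf8(raw, fallback_encoding="gbk"):
    """Return `raw` unchanged if it is valid UTF-8, else re-encode it from `fallback_encoding`."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        return raw.decode(fallback_encoding).encode("utf-8")


def resave_as_utf8(path, fallback_encoding="gbk"):
    """Rewrite the file at `path` in place as UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, "wb") as f:
        f.write(to_utf8(raw, fallback_encoding))
```

Back up the file first: if the fallback guess is wrong, the decode step either fails or produces garbled text.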

Fix: MatplotlibDeprecationWarning: Support for FigureCanvases without a required_interactive_framework attribute was deprecated in Matplotlib 3.6 and will be removed two minor releases later.

Error: reading a file with pandas printed the following warning:

MatplotlibDeprecationWarning: Support for FigureCanvases without a required_interactive_framework attribute was deprecated in Matplotlib 3.6 and will be removed two minor releases later.

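Cause: the author attributes this to Matplotlib splitting out mpl-data after version 3.2. The message itself says the behaviour was deprecated in Matplotlib 3.6: it is emitted when a plotting backend's FigureCanvas class lacks the required_interactive_framework attribute, which is commonly reported with IDE plot viewers such as PyCharm's built-in plot window. It is a warning, not an error, so the code still runs.

Fix: uninstall the current version and install version 3.1 (any release older than 3.6 predates this deprecation).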

pip uninstall matplotlib  # uninstall the current version
pip install matplotlib==3.1.1 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com  # install 3.1.1 from the Douban PyPI mirror

Summary

Original post: https://blog.csdn.net/qq_45956730/article/details/126976482
