網(wǎng)站首頁(yè) 編程語(yǔ)言正文

Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫(kù)模塊_python

作者：springsnow ? 更新時(shí)間： 2022-07-27 編程語(yǔ)言

支持文件存取操作，支持?jǐn)?shù)據(jù)庫(kù)(sql)、html、json、pickle、csv(txt、excel)、sas、stata、hdf等。
支持增刪改查、切片、高階函數(shù)、分組聚合等單表操作，以及和dict、list的互相轉(zhuǎn)換。
支持多表拼接合并操作。
支持簡(jiǎn)單的繪圖操作。
支持簡(jiǎn)單的統(tǒng)計(jì)分析操作。

一、Series數(shù)據(jù)結(jié)構(gòu)

Series是一種類(lèi)似于一維數(shù)組的對(duì)象，由一組數(shù)據(jù)和一組與之相關(guān)的數(shù)據(jù)標(biāo)簽（索引）組成。

Series比較像列表（數(shù)組）和字典的結(jié)合體

import numpy as np
import pandas as pd

df = pd.Series(0, index=['a', 'b', 'c', 'd'])
print(df)
# a    0
# b    0
# c    0
# d    0
# dtype: int64

print(df.values) # 值
# [0 0 0 0]

print(df.index) # 索引
# Index(['a', 'b', 'c', 'd'], dtype='object')

1、Series的創(chuàng)建

import numpy as np
import pandas as pd

df = pd.Series(np.array([1, 2, 3, 4, np.nan]), index=['a', 'b', 'c', 'd', 'e']) # 1、從ndarray創(chuàng)建Series
print(df)
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# e    NaN
# dtype: float64

df = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': np.nan}) # 2、也可以從字典創(chuàng)建Series

dates = pd.date_range('20190101', periods=6, freq='M')
print(type(dates)) # 
print(dates)
# DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
#                '2019-05-31', '2019-06-30'],
#               dtype='datetime64[ns]', freq='M')

df=pd.Series(0,index=dates) # 3、時(shí)間序列索引
print(df)
# 2019-01-31    0
# 2019-02-28    0
# 2019-03-31    0
# 2019-04-30    0
# 2019-05-31    0
# 2019-06-30    0
# Freq: M, dtype: int64

產(chǎn)生時(shí)間對(duì)象數(shù)組：date_range參數(shù)詳解：

start:開(kāi)始時(shí)間
end:結(jié)束時(shí)間
periods:時(shí)間長(zhǎng)度
freq:時(shí)間頻率，默認(rèn)為'D'，可選H(our),W(eek),B(usiness),S(emi-)M(onth),(min)T(es), S(econd), A(year),…

2、Series屬性

print(df ** 2) # 3、與標(biāo)量運(yùn)算
# a     1.0
# b     4.0
# c     9.0
# d    16.0
# e     NaN
# dtype: float64


print(df + df) # 4、兩個(gè)Series運(yùn)算
# a    2.0
# b    4.0
# c    6.0
# d    8.0
# e    NaN
# dtype: float64

print(df[0] )  # 5、數(shù)字索引； 1.0
print(df[[0, 1, 2]]) # 行索引
# a    1.0
# b    2.0
# c    3.0
# dtype: float64
print(df['a'] )  #  6、鍵索引（行標(biāo)簽） ；1.0
  
print(df[['b','c']])

print('a' in df)  # 7、in運(yùn)算；True

print(df[0:2] ) # 8、切片
# a    1.0
# b    2.0
# dtype: float64

print(np.sin(df)) # 9、通用函數(shù)
# a    0.841471
# b    0.909297
# c    0.141120
# d   -0.756802
# e         NaN
# dtype: float64

print(df[df > 1] ) # 10、布爾值過(guò)濾
# b    2.0
# c    3.0
# d    4.0
# dtype: float64

2、Series缺失數(shù)據(jù)處理

df = pd.Series([1, 2, 3, 4, np.nan], index=['a', 'b', 'c', 'd', 'e'])
print(df)
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# e    NaN
# dtype: float64

print(df.dropna() ) # 1、過(guò)濾掉值為NaN的行
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# dtype: float64

print(df.fillna(5) ) # 2、用指定值填充缺失數(shù)據(jù)
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# e    5.0
# dtype: float64

print(df.isnull() ) #  3、返回布爾數(shù)組，缺失值對(duì)應(yīng)為T(mén)rue
# a    False
# b    False
# c    False
# d    False
# e     True
# dtype: bool

print(df.notnull() ) # 4、返回布爾數(shù)組，缺失值對(duì)應(yīng)為False
# a     True
# b     True
# c     True
# d     True
# e    False
# dtype: bool

二、DataFrame數(shù)據(jù)結(jié)構(gòu)

DataFrame是一個(gè)表格型的數(shù)據(jù)結(jié)構(gòu)，含有一組有序的列。

DataFrame可以被看做是由Series組成的字典，并且共用一個(gè)索引。

1、DataFrame的創(chuàng)建

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4))) # 創(chuàng)建一個(gè)三行四列的DataFrame
print(df1)
#      0    1    2    3
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0

dates = pd.date_range('20190101', periods=6, freq='M')

np.random.seed(1)
arr = 10 * np.random.randn(6, 4)
print(arr)
# [[ 16.24345364  -6.11756414  -5.28171752 -10.72968622]
#  [  8.65407629 -23.01538697  17.44811764  -7.61206901]
#  [  3.19039096  -2.49370375  14.62107937 -20.60140709]
#  [ -3.22417204  -3.84054355  11.33769442 -10.99891267]
#  [ -1.72428208  -8.77858418   0.42213747   5.82815214]
#  [-11.00619177  11.4472371    9.01590721   5.02494339]]

df = pd.DataFrame(arr, index=dates, columns=['c1', 'c2', 'c3', 'c4']) # 自定義index和column
print(df)
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

2、DataFrame屬性

print(df.dtypes) # 1、查看數(shù)據(jù)類(lèi)型
# 0    float64
# 1    float64
# 2    float64
# 3    float64
# dtype: object


print(df.index) # 2、查看行索引
# DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
#                '2019-05-31', '2019-06-30'],
#               dtype='datetime64[ns]', freq='M')

print(df.columns) # 3、查看各列的標(biāo)簽
# Index(['c1', 'c2', 'c3', 'c4'], dtype='object')

print(df.values) # 4、查看數(shù)據(jù)框內(nèi)的數(shù)據(jù)，也即不含行標(biāo)簽和列頭的數(shù)據(jù) 
# [[ 16.24345364  -6.11756414  -5.28171752 -10.72968622]
#  [  8.65407629 -23.01538697  17.44811764  -7.61206901]
#  [  3.19039096  -2.49370375  14.62107937 -20.60140709]
#  [ -3.22417204  -3.84054355  11.33769442 -10.99891267]
#  [ -1.72428208  -8.77858418   0.42213747   5.82815214]
#  [-11.00619177  11.4472371    9.01590721   5.02494339]]

print(df.describe()) # 5、查看數(shù)據(jù)每一列的極值，均值，中位數(shù)，只可用于數(shù)值型數(shù)據(jù)
#               c1         c2         c3         c4
# count   6.000000   6.000000   6.000000   6.000000
# mean    2.022213  -5.466424   7.927203  -6.514830
# std     9.580084  11.107772   8.707171  10.227641
# min   -11.006192 -23.015387  -5.281718 -20.601407
# 25%    -2.849200  -8.113329   2.570580 -10.931606
# 50%     0.733054  -4.979054  10.176801  -9.170878
# 75%     7.288155  -2.830414  13.800233   1.865690
# max    16.243454  11.447237  17.448118   5.828152

print(df.T) # 6、transpose轉(zhuǎn)置，也可用Ｔ來(lái)操作
#     2019-01-31  2019-02-28  2019-03-31  2019-04-30  2019-05-31  2019-06-30
# c1   16.243454    8.654076    3.190391   -3.224172   -1.724282  -11.006192

# c2   -6.117564  -23.015387   -2.493704   -3.840544   -8.778584   11.447237
# c3   -5.281718   17.448118   14.621079   11.337694    0.422137    9.015907
# c4  -10.729686   -7.612069  -20.601407  -10.998913    5.828152    5.024943


print(df.sort_index(axis=0)) # 7、排序，axis=0 可按行標(biāo)簽排序輸出； 按行標(biāo)簽][2019-01-01, 2019-01-02...]從大到小排序
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

print(df.sort_index(axis=1)) # 7、排序，axis=1 可按列頭標(biāo)簽排序輸出；按列標(biāo)簽[c1, c2, c3, c4從大到小排序
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

print(df.sort_values(by='c2')) # 8、按數(shù)據(jù)值來(lái)排序 ；按c2列的值從大到小排序
#                    c1         c2         c3         c4
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

3、DataFrame取值

print(df['c2']) # 1、 通過(guò)columns標(biāo)簽取值
# 2019-01-31    -6.117564
# 2019-02-28   -23.015387
# 2019-03-31    -2.493704
# 2019-04-30    -3.840544
# 2019-05-31    -8.778584
# 2019-06-30    11.447237
# Freq: M, Name: c2, dtype: float64

print(df[['c2', 'c3']])
#                    c2         c3
# 2019-01-31  -6.117564  -5.281718
# 2019-02-28 -23.015387  17.448118
# 2019-03-31  -2.493704  14.621079
# 2019-04-30  -3.840544  11.337694
# 2019-05-31  -8.778584   0.422137
# 2019-06-30  11.447237   9.015907

print(df[0:3])  # 2、 通過(guò)columns索引取值
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407

print(df.loc['20200228':'20200430']) # 3、loc 通過(guò)行標(biāo)簽取值：
#                  c1         c2         c3         c3
# 2020-02-29  8.654076 -23.015387  17.448118  -7.612069
# 2020-03-31  3.190391  -2.493704  14.621079 -20.601407
# 2020-04-30 -3.224172  -3.840544  11.337694 -10.998913

print(df.iloc[1:3]) # 4、iloc 通過(guò)行索引選擇數(shù)據(jù),取第二行到三行。
#                  c1         c2         c3         c3
# 2020-02-29  8.654076 -23.015387  17.448118  -7.612069
# 2020-03-31  3.190391  -2.493704  14.621079 -20.601407

print(df.iloc[2, 1])  # 第三行第二列值：-2.493703754774101

print(df.iloc[1:4, 1:4]) # 第 2-4行與第2-4列：
#                    c2         c3         c4
# 2019-02-28 -23.015387  17.448118  -7.612069
# 2019-03-31  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.840544  11.337694 -10.998913

print(df['c3'] > 10) # 5、 使用邏輯判斷取值
# 2020-01-31       False
# 2020-02-29         True
# 2020-03-31         True
# 2020-04-30         True
# 2020-05-31       False
# 2020-06-30       False
# Freq: M, Name: c3, dtype: bool
print(df[df['c3'] > 10]) # 5、 使用邏輯判斷取值
# c1 c2 c3 c4 
# 2020-02-29 8.654076 -23.015387 17.448118 -7.612069 
# 2020-03-31 3.190391 -2.493704 14.621079 -20.601407 
# 2020-04-30 -3.224172 -3.840544 11.337694 -10.998913
print(df[(df['c1'] > 0) & (df['c2'] > -8)])
#                    c1        c2         c3         c4
# 2019-01-31  16.243454 -6.117564  -5.281718 -10.729686
# 2019-03-31   3.190391 -2.493704  14.621079 -20.601407

4、DataFrame值替換

df.iloc[1:3]=5 # 將2-3行的值設(shè)為5
print(df)

#                    c1         c2         c3         c4
# 2020-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2020-02-29   5.000000   5.000000   5.000000   5.000000
# 2020-03-31   5.000000   5.000000   5.000000   5.000000
# 2020-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2020-05-31  -1.724282  -8.778584   0.422137   5.828152


df.iloc[0:3, 0:2] = 0 # 將1-3行1-2列的值設(shè)為0
print(df)
#                    c1         c2         c3         c4
# 2019-01-31   0.000000   0.000000  -5.281718 -10.729686
# 2019-02-28   0.000000   0.000000  17.448118  -7.612069
# 2019-03-31   0.000000   0.000000  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943


# 針對(duì)行做處理
df[df['c3'] > 10] = 100 # 將C3列的大于10的行數(shù)值設(shè)為0
print(df)
#                     c1          c2          c3          c4
# 2019-01-31    0.000000    0.000000   -5.281718  -10.729686
# 2019-02-28  100.000000  100.000000  100.000000  100.000000
# 2019-03-31  100.000000  100.000000  100.000000  100.000000
# 2019-04-30  100.000000  100.000000  100.000000  100.000000
# 2019-05-31   -1.724282   -8.778584    0.422137    5.828152
# 2019-06-30  -11.006192   11.447237    9.015907    5.024943


# 針對(duì)行做處理
df = df.astype(np.int32)
df[df['c3'].isin([100])] = 1000 # 將C3列的等于100的行數(shù)值設(shè)為1000
print(df)
#               c1    c2    c3    c4
# 2019-01-31     0     0    -5   -10
# 2019-02-28  1000  1000  1000  1000
# 2019-03-31  1000  1000  1000  1000
# 2019-04-30  1000  1000  1000  1000
# 2019-05-31    -1    -8     0     5
# 2019-06-30   -11    11     9     5

5、處理丟失數(shù)據(jù)

print(df.isnull())
#       c1     c2     c3     c4
# 0  False   True  False  False
# 1  False  False  False  False
# 2  False  False   True  False
# 3  False  False  False  False
# 4  False  False  False  False
# 5  False  False  False   True
# 6   True   True   True   True

print(df.isnull().sum()) # 1、通過(guò)在isnull()方法后使用sum()方法即可獲得該數(shù)據(jù)集某個(gè)特征含有多少個(gè)缺失值
# c1    1
# c2    2
# c3    2
# c4    2
# dtype: int64

print(df.dropna(axis=0)) # 2、axis=0刪除有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(axis=1)) # 3、axis=1刪除有NaN值的列
# Empty DataFrame
# Columns: []
# Index: [0, 1, 2, 3, 4, 5, 6]

print(df.dropna(how='all')) # 4、刪除全為NaN值得行或列
#     c1   c2   c3   c4
# 0  5.1  NaN  1.4  0.2
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.dropna(thresh=4)) #5、 保留至少有4個(gè)非NaN數(shù)據(jù)的行，刪除行不為4個(gè)值的，
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(subset=['c2'])) # 6、刪除c2中有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.fillna(value=10)) # 7、用指定值填充nan值
#      c1    c2    c3    c4
# 0   5.1  10.0   1.4   0.2
# 1   4.9   3.0   1.4   0.2
# 2   4.7   3.2  10.0   0.2
# 3   7.0   3.2   4.7   1.4
# 4   6.4   3.2   4.5   1.5
# 5   6.9   3.1   4.9  10.0
# 6  10.0  10.0  10.0  10.0

6、合并數(shù)據(jù)

print(df.isnull())
#       c1     c2     c3     c4
# 0  False   True  False  False
# 1  False  False  False  False
# 2  False  False   True  False
# 3  False  False  False  False
# 4  False  False  False  False
# 5  False  False  False   True
# 6   True   True   True   True

print(df.isnull().sum()) # 1、通過(guò)在isnull()方法后使用sum()方法即可獲得該數(shù)據(jù)集某個(gè)特征含有多少個(gè)缺失值
# c1    1
# c2    2
# c3    2
# c4    2
# dtype: int64

print(df.dropna(axis=0)) # 2、axis=0刪除有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(axis=1)) # 3、axis=1刪除有NaN值的列
# Empty DataFrame
# Columns: []
# Index: [0, 1, 2, 3, 4, 5, 6]

print(df.dropna(how='all')) # 4、刪除全為NaN值得行或列
#     c1   c2   c3   c4
# 0  5.1  NaN  1.4  0.2
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.dropna(thresh=4)) #5、 保留至少有4個(gè)非NaN數(shù)據(jù)的行，刪除行不為4個(gè)值的，
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(subset=['c2'])) # 6、刪除c2中有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.fillna(value=10)) # 7、用指定值填充nan值
#      c1    c2    c3    c4
# 0   5.1  10.0   1.4   0.2
# 1   4.9   3.0   1.4   0.2
# 2   4.7   3.2  10.0   0.2
# 3   7.0   3.2   4.7   1.4
# 4   6.4   3.2   4.5   1.5
# 5   6.9   3.1   4.9  10.0
# 6  10.0  10.0  10.0  10.0

二、讀取CSV文件

import pandas as pd
from io import StringIO

test_data = '''
5.1,,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,,0.2
7.0,3.2,4.7,1.4
6.4,3.2,4.5,1.5
6.9,3.1,4.9,
,,,
'''

test_data = StringIO(test_data)
df = pd.read_csv(test_data, header=None)
df.columns = ['c1', 'c2', 'c3', 'c4']
print(df)
#     c1   c2   c3   c4
# 0  5.1  NaN  1.4  0.2
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN
# 6  NaN  NaN  NaN  NaN

三、導(dǎo)入導(dǎo)出數(shù)據(jù)

pandas的讀寫(xiě)Excel需要依賴xlrd模塊，所以我們需要去安裝一下，命令：pip install xlrd

使用df = pd.read_excel(filename)讀取文件，使用df.to_excel(filename)保存文件。

1、讀取文件導(dǎo)入數(shù)據(jù)

df = pd.read_excel(filename)

讀取文件導(dǎo)入數(shù)據(jù)函數(shù)主要參數(shù)：

sep ：指定分隔符，可用正則表達(dá)式如'\s+'
header=None ：指定文件無(wú)行名
name ：指定列名
index_col ：指定某列作為索引
skip_row ：指定跳過(guò)某些行
na_values ：指定某些字符串表示缺失值
parse_dates ：指定某些列是否被解析為日期，布爾值或列表

2、寫(xiě)入文件導(dǎo)出數(shù)據(jù)

df.to_excel(filename)

寫(xiě)入文件函數(shù)的主要參數(shù)：

sep 分隔符
na_rep 指定缺失值轉(zhuǎn)換的字符串，默認(rèn)為空字符串
header=False 不保存列名
index=False 不保存行索引
cols 指定輸出的列，傳入列表

3、實(shí)例

import pandas as pd
import numpy as np
df = pd.read_excel("http://pbpython.com/extras/excel-comp-data.xlsx")
print(df.head())
print(len(df.index)) # 行數(shù) （不包含表頭，且一下均如此）
print(df.index.values) # 行索引 
print(len(df.columns)) # 列數(shù)
print(df.columns.values) # 列索引

data = df.loc[0].values #  表示第0行數(shù)據(jù)
data = df.loc[[1, 2]].values #  讀取多行數(shù)據(jù)（這里是第1行和第2行）

data = df.iloc[:, 1].values #  讀第1列數(shù)據(jù)
data = df.iloc[:, [1, 2]].values #  讀取多列數(shù)據(jù)（這里是第1列和第2列）
data = df.iloc[1, 2] #  讀取指定單元格數(shù)據(jù)（這里是第1行第一列數(shù)據(jù)）
data = df.iloc[[1, 2], [1, 2]].values #  讀取多行多列數(shù)據(jù)（第1,2行1,2列的數(shù)據(jù)）

#  任務(wù)：輸出滿足成績(jī)大于等于90的數(shù)據(jù)
temp = []
for i in range(len(df.index.values)):
    if df.iloc[i, 3] >= 90:
        temp.append(df.iloc[i].values)
df2 = pd.DataFrame(data=temp, columns=df.columns.values)
writer = pd.ExcelWriter('out_test.xlsx')#  不寫(xiě)index會(huì)輸出索引
df2.to_excel(writer, 'Sheet', index=False)
writer.save()

四、pandas讀取json文件

import pandas as pd

strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000},\
{"ttery":"min","issue":"20130801-3389","code":"5,9,1,2,9","code1":"298329129","code2":null,"time":1013395346000},\
{"ttery":"min","issue":"20130801-3388","code":"3,8,7,3,3","code1":"298588733","code2":null,"time":1013395286000},\
{"ttery":"min","issue":"20130801-3387","code":"0,8,5,2,7","code1":"298818527","code2":null,"time":1013395226000}]'

df = pd.read_json(strtext, orient='records')
print(df)
#   ttery          issue       code      code1  code2           time
# 0   min  20130801-3391  8,4,5,2,9  297734529    NaN  1013395466000
# 1   min  20130801-3390  7,8,2,1,2  298058212    NaN  1013395406000
# 2   min  20130801-3389  5,9,1,2,9  298329129    NaN  1013395346000
# 3   min  20130801-3388  3,8,7,3,3  298588733    NaN  1013395286000
# 4   min  20130801-3387  0,8,5,2,7  298818527    NaN  1013395226000

df = pd.read_json(strtext, orient='records')
df.to_excel('pandas處理json.xlsx', index=False, columns=["ttery", "issue", "code", "code1", "code2", "time"])

orient參數(shù)的五種形式

orient是表明預(yù)期的json字符串格式。orient的設(shè)置有以下五個(gè)值：

1.'split' : dict like {index -> [index], columns -> [columns], data -> [values]}

這種就是有索引，有列字段,和數(shù)據(jù)矩陣構(gòu)成的json格式。key名稱只能是index,columns和data。

s = '{"index":[1,2,3],"columns":["a","b"],"data":[[1,3],[2,8],[3,9]]}'
df = pd.read_json(s, orient='split')
print(df)
#    a  b
# 1  1  3
# 2  2  8
# 3  3  9

2.'records' : list like [{column -> value}, ... , {column -> value}]

這種就是成員為字典的列表。如我今天要處理的json數(shù)據(jù)示例所見(jiàn)。構(gòu)成是列字段為鍵,值為鍵值,每一個(gè)字典成員就構(gòu)成了dataframe的一行數(shù)據(jù)。

strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000}]'

df = pd.read_json(strtext, orient='records')
print(df)
#   ttery          issue       code      code1  code2           time
# # 0   min  20130801-3391  8,4,5,2,9  297734529    NaN  1013395466000
# # 1   min  20130801-3390  7,8,2,1,2  298058212    NaN  1013395406000

3.'index' : dict like {index -> {column -> value}}

以索引為key,以列字段構(gòu)成的字典為鍵值。如：

s = '{"0":{"a":1,"b":2},"1":{"a":9,"b":11}}'
df = pd.read_json(s, orient='index')
print(df)
#    a   b
# 0  1   2
# 1  9  11

4.'columns' : dict like {column -> {index -> value}}

這種處理的就是以列為鍵，對(duì)應(yīng)一個(gè)值字典的對(duì)象。這個(gè)字典對(duì)象以索引為鍵,以值為鍵值構(gòu)成的json字符串。如下圖所示:

s = '{"a":{"0":1,"1":9},"b":{"0":2,"1":11}}'
df = pd.read_json(s, orient='columns')
print(df)
#    a   b
# 0  1   2
# 1  9  11

5.'values' : just the values array。

values這種我們就很常見(jiàn)了。就是一個(gè)嵌套的列表。里面的成員也是列表，2層的。

s = '[["a",1],["b",2]]'
df = pd.read_json(s, orient='values')
print(df)
#    0  1
# 0  a  1
# 1  b  2

五、pandas讀取sql語(yǔ)句

import numpy as np
import pandas as pd
import pymysql


def conn(sql):
    # 連接到mysql數(shù)據(jù)庫(kù)
    conn = pymysql.connect(
        host="localhost",
        port=3306,
        user="root",
        passwd="123",
        db="db1",
    )
    try:
        data = pd.read_sql(sql, con=conn)
        return data
    except Exception as e:
        print("SQL is not correct!")
    finally:
        conn.close()


sql = "select * from test1 limit 0, 10"  # sql語(yǔ)句
data = conn(sql)
print(data.columns.tolist())  # 查看字段
print(data)  # 查看數(shù)據(jù)

原文鏈接：https://www.cnblogs.com/springsnow/p/11970678.html

上一篇：PostgreSQL出現(xiàn)死鎖該如何解決_PostgreSQL
下一篇：Python?pyecharts?Line折線圖的具體實(shí)現(xiàn)_

日本免费高清视频-国产福利视频导航-黄色在线播放国产-天天操天天操天天操天天操|www.shdianci.com

網(wǎng)站首頁(yè) 編程語(yǔ)言正文

Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫(kù)模塊_python

目錄

一、Series數(shù)據(jù)結(jié)構(gòu)

1、Series的創(chuàng)建

2、Series屬性

2、Series缺失數(shù)據(jù)處理

二、DataFrame數(shù)據(jù)結(jié)構(gòu)

1、DataFrame的創(chuàng)建

2、DataFrame屬性

3、DataFrame取值

4、DataFrame值替換

5、處理丟失數(shù)據(jù)

6、合并數(shù)據(jù)

二、讀取CSV文件

三、導(dǎo)入導(dǎo)出數(shù)據(jù)

1、讀取文件導(dǎo)入數(shù)據(jù)

2、寫(xiě)入文件導(dǎo)出數(shù)據(jù)

3、實(shí)例

四、pandas讀取json文件

五、pandas讀取sql語(yǔ)句

相關(guān)推薦

日本免费高清视频-国产福利视频导航-黄色在线播放国产-天天操天天操天天操天天操|www.shdianci.com

網(wǎng)站首頁(yè) 編程語(yǔ)言 正文

Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫(kù)模塊_python

目錄

一、Series數(shù)據(jù)結(jié)構(gòu)

1、Series的創(chuàng)建

2、Series屬性

2、Series缺失數(shù)據(jù)處理

二、DataFrame數(shù)據(jù)結(jié)構(gòu)

1、DataFrame的創(chuàng)建

2、DataFrame屬性

3、DataFrame取值

4、DataFrame值替換

5、處理丟失數(shù)據(jù)

6、合并數(shù)據(jù)

二、讀取CSV文件

三、導(dǎo)入導(dǎo)出數(shù)據(jù)

1、讀取文件導(dǎo)入數(shù)據(jù)

2、寫(xiě)入文件導(dǎo)出數(shù)據(jù)

3、實(shí)例

四、pandas讀取json文件

五、pandas讀取sql語(yǔ)句

相關(guān)推薦

網(wǎng)站首頁(yè) 編程語(yǔ)言正文