日本免费高清视频-国产福利视频导航-黄色在线播放国产-天天操天天操天天操天天操|www.shdianci.com

學(xué)無先后,達(dá)者為師

網(wǎng)站首頁 編程語言 正文

Python使用pandas將表格數(shù)據(jù)進(jìn)行處理_python

作者:IT之一小佬 ? 更新時(shí)間: 2022-10-27 編程語言

前言

任務(wù)描述:

當(dāng)前有一份excel表格數(shù)據(jù),里面存在缺失值,需要對(duì)缺失的數(shù)據(jù)到es數(shù)據(jù)庫中進(jìn)行查找并對(duì)其進(jìn)行把缺失的數(shù)據(jù)進(jìn)行補(bǔ)全。

excel表格數(shù)據(jù)如下所示:

一、構(gòu)建es庫中的數(shù)據(jù)

1.1 創(chuàng)建索引

# 創(chuàng)建physical_examination索引
PUT /physical_examination
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "properties": {
      "nums": {
        "type": "integer"
      },
      "name": {
        "type": "text"
      },
      "sex": {
        "type": "text"
      },
      "phone": {
        "type": "integer"
      },
      "result": {
        "type": "text"
      }
    }
  }
}

1.2 插入數(shù)據(jù)

注意:json數(shù)據(jù)不能格式化換行,否則報(bào)錯(cuò)】

# 向physical_examination索引中添加數(shù)據(jù)
POST physical_examination/_bulk
{"index":{"_id":"1"}}
{"nums":1,"name":"劉一","sex":"男","phone":1234567891,"result":"優(yōu)秀"}
{"index":{"_id":"2"}}
{"nums":2,"name":"陳二","sex":"男","phone":1234567892,"result":"優(yōu)秀"}
{"index":{"_id":"3"}}
{"nums":3,"name":"張三","sex":"男","phone":1234567893,"result":"優(yōu)秀"}
{"index":{"_id":"4"}}
{"nums":4,"name":"李四","sex":"男","phone":1234567894,"result":"優(yōu)秀"}
{"index":{"_id":"5"}}
{"nums":5,"name":"王五","sex":"男","phone":1234567895,"result":"優(yōu)秀"}

1.3 查詢數(shù)據(jù)

注意:默認(rèn)查詢索引下的所有數(shù)據(jù)】

# 查詢索引中的所有數(shù)據(jù)
GET physical_examination/_search
{
  "query": {
    "match_all": {}
  }
}

二、對(duì)excel表格中的數(shù)據(jù)處理操作

2.1 導(dǎo)出es查詢的數(shù)據(jù)

  • 方法一:直接在kibana或postman查詢的結(jié)果中進(jìn)行復(fù)制粘貼到一個(gè)文檔。
  • 方法二:使用kibana導(dǎo)出數(shù)據(jù)。
  • 方法三:使用postman導(dǎo)出數(shù)據(jù)保存到本地。

使用python處理數(shù)據(jù),獲取需要的數(shù)據(jù)。

示例代碼:

# 讀取json中體檢信息
with open('./data/physical_examination.json', 'r', encoding='utf-8') as f:
    data_json = f.read()
print(data_json)
 
# 處理json數(shù)據(jù)中的異常數(shù)據(jù)
if 'false' in data_json:
    data_json = data_json.replace('false', "False")
 
data_json = eval(data_json)
print(data_json)
 
print(data_json['hits']['hits'])
print('*' * 100)
 
valid_data = data_json['hits']['hits']
need_data = []
for data in valid_data:
    print(data['_source'])
    need_data.append(data['_source'])
print(need_data)

讀取缺失數(shù)據(jù)的excel表格,把缺失的數(shù)據(jù)填補(bǔ)進(jìn)去。

# 讀取需要填補(bǔ)數(shù)據(jù)的表格
data_xlsx = pd.read_excel('./data/體檢表.xlsx', sheet_name='Sheet1')
# print(data_xlsx)
 
# 獲取excel表格的行列
row, col = data_xlsx.shape
print(row, col)
 
# 修改表格中的數(shù)據(jù)
for i in range(row):
    bb = data_xlsx.iloc[i]
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    if pd.isnull(bb['手機(jī)號(hào)']):
        bb['手機(jī)號(hào)'] = '666'
        for cc in need_data:
            if cc['name'] == bb['姓名']:
                bb['手機(jī)號(hào)'] = cc['phone']
            data_xlsx.iloc[i, 3] = bb['手機(jī)號(hào)']
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    print("-" * 100)
print(data_xlsx)

將最終處理好的數(shù)據(jù)保存在新建的文件中。

# 保存數(shù)據(jù)到新文件中
data_xlsx.to_excel('./data/new_data.xlsx', sheet_name='Sheet1', index=False, header=True)

完整代碼如下:

import pandas as pd
 
# 讀取json中體檢信息
with open('./data/physical_examination.json', 'r', encoding='utf-8') as f:
    data_json = f.read()
print(data_json)
 
# 處理json數(shù)據(jù)中的異常數(shù)據(jù)
if 'false' in data_json:
    data_json = data_json.replace('false', "False")
 
data_json = eval(data_json)
print(data_json)
 
print(data_json['hits']['hits'])
print('*' * 100)
 
valid_data = data_json['hits']['hits']
need_data = []
for data in valid_data:
    print(data['_source'])
    need_data.append(data['_source'])
print(need_data)
 
# 讀取需要填補(bǔ)數(shù)據(jù)的表格
data_xlsx = pd.read_excel('./data/體檢表.xlsx', sheet_name='Sheet1')
# print(data_xlsx)
 
# 獲取excel表格的行列
row, col = data_xlsx.shape
print(row, col)
 
# 修改表格中的數(shù)據(jù)
for i in range(row):
    bb = data_xlsx.iloc[i]
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    if pd.isnull(bb['手機(jī)號(hào)']):
        bb['手機(jī)號(hào)'] = '666'
        for cc in need_data:
            if cc['name'] == bb['姓名']:
                bb['手機(jī)號(hào)'] = cc['phone']
            data_xlsx.iloc[i, 3] = bb['手機(jī)號(hào)']
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    print("-" * 100)
print(data_xlsx)
 
# 保存數(shù)據(jù)到新文件中
data_xlsx.to_excel('./data/new_data.xlsx', sheet_name='Sheet1', index=False, header=True)

運(yùn)行效果,最終處理好的數(shù)據(jù)如下所示:

原文鏈接:https://blog.csdn.net/weixin_44799217/article/details/126559956

欄目分類
最近更新