
Comparing the data of two Elasticsearch indices

Author: 鐵柱同學 | Updated: 2022-07-26 | Category: Programming languages

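I. Preface

After changing an index mapping, we usually create a new index and then copy the data into it so that the online service is not affected. What is hard to verify is whether the data in the new index is correct and how it differs from the data in the old index.

Following an excellent article, I tried out the two approaches below for comparing the data of the old and new indices. The article: 圖解 | Elasticsearch 獲取兩個索引數據不同之處的四種方案 (Illustrated: four ways to find the differences between two Elasticsearch indices)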

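II. The Kibana approach

1. Comparing two indices in Kibana

Sometimes we need to compare one field across two indices, for example the id field, to find out which records are missing. The query below does that. (It works locally and in any other environment.)

(1) Open Dev Tools in Kibana.
(2) Enter the query below.
(3) index_old and index_new are the names of the indices to compare.
(4) id is the field to compare; ideally it is a field that is unique at the business level.
(5) Run the query and inspect the result.
How it works: the terms aggregation creates one bucket per id, and the cardinality sub-aggregation counts how many distinct indices (_index) contain that id. An id that is present in both indices gets a count of 2. The bucket_selector keeps only the buckets with a count below 2, so the result lists the ids that are missing from one of the two indices.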


POST index_new,index_old/_search
{
  "size": 0,
  "aggs": {
    "group_by_uid": {
      "terms": {
        "field": "id",
        "size": 1000000
      },
      "aggs": {
        "count_indices": {
          "cardinality": {
            "field": "_index"
          }
        },
        "values_bucket_filter_by_index_count": {
          "bucket_selector": {
            "buckets_path": {
              "count": "count_indices"
            },
            "script": "params.count < 2"
          }
        }
      }
    }
  }
}

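Result:

Note: "key" : 6418 here means that the record with id 6418 is part of the difference, that is, it exists in only one of the two indices. You have to investigate yourself why it differs.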

{
  "took" : 1851,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 21969,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_uid" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 6418,
          "doc_count" : 1,
          "count_indices" : {
            "value" : 1
          }
        },
        {
          "key" : 6419,
          "doc_count" : 1,
          "count_indices" : {
            "value" : 1
          }
        }
      ]
    }
  }
}
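
Reading the missing ids off a large response by hand gets tedious. The following is a minimal Go sketch of how the keys could be collected, assuming the response has the shape shown above and that id is numeric. The names aggResponse and missingIDs are made up for illustration, and the embedded JSON is a trimmed copy of the sample result.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aggResponse mirrors only the parts of the search response used here.
// Assumption: bucket keys are numeric, as in the sample result; a keyword
// id would need a string field instead of json.Number.
type aggResponse struct {
	Aggregations struct {
		GroupByUID struct {
			Buckets []struct {
				Key          json.Number `json:"key"`
				CountIndices struct {
					Value int `json:"value"`
				} `json:"count_indices"`
			} `json:"buckets"`
		} `json:"group_by_uid"`
	} `json:"aggregations"`
}

// missingIDs returns the keys of all buckets that were seen in fewer than
// two indices. The bucket_selector already filters on the server side, so
// the check here is only a safeguard.
func missingIDs(raw []byte) ([]string, error) {
	var resp aggResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var ids []string
	for _, b := range resp.Aggregations.GroupByUID.Buckets {
		if b.CountIndices.Value < 2 {
			ids = append(ids, b.Key.String())
		}
	}
	return ids, nil
}

func main() {
	// Trimmed copy of the sample response.
	raw := []byte(`{
	  "aggregations": {
	    "group_by_uid": {
	      "buckets": [
	        {"key": 6418, "doc_count": 1, "count_indices": {"value": 1}},
	        {"key": 6419, "doc_count": 1, "count_indices": {"value": 1}}
	      ]
	    }
	  }
	}`)
	ids, err := missingIDs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```

The resulting ids can then be looked up in each index individually to see which side is missing the record.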

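III. Other tools

GitHub: esdiff (github.com/olivere/esdiff)
P.S. The author of this tool is also the author of olivere/elastic, so it is well worth a try.

1. Local usage

(1) Install
go install github.com/olivere/esdiff@latest

(2) Run the command (go install places the binary in $GOBIN or $GOPATH/bin; run it from there or put that directory on your PATH)
./esdiff -u=true -d=false 'http://localhost:9200/index_old/type' 'http://localhost:9200/index_new/type'

(3) Output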
Unchanged       1
Updated 3       {*diff.Document}.Source["message"]:
        -: "Playing the piano is fun as well"
        +: "Playing the guitar is fun as well"
 
Created 4       {*diff.Document}:
        -: (*diff.Document)(nil)
        +: &diff.Document{ID: "4", Source: map[string]interface {}{"message": "Climbed that mountain", "user": "sandrae"}}

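2. Common flags

When fields have been added or removed, -exclude and -include are handy: they let you check that all the other fields still match.

esdiff [flags] <source-url> <destination-url>

  -dsort string     sort the destination index by a field, e.g. "id" or "-id"
  -ssort string     sort the source index by a field, e.g. "id" or "-id"
  -exclude string   exclude certain fields from _source, e.g. "hash_value,sub.*"
  -include string   include only certain fields from _source, e.g. "obj.*"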
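
To make the flags more concrete, here is a rough Go sketch of how sort and include/exclude values like these could be translated into an Elasticsearch search body (the _source includes/excludes filter and the sort clause). This is an assumption for illustration only, not esdiff's actual code; buildSearchBody and splitList are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// splitList turns "a,b.*" into ["a", "b.*"]; an empty string gives nil.
func splitList(s string) []string {
	if strings.TrimSpace(s) == "" {
		return nil
	}
	parts := strings.Split(s, ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return parts
}

// buildSearchBody is a hypothetical helper that maps flag values to a
// search request body. A leading "-" on sortField means descending order.
func buildSearchBody(include, exclude, sortField string) (string, error) {
	body := map[string]interface{}{}

	// Source filtering: only set the keys that were actually given.
	source := map[string]interface{}{}
	if inc := splitList(include); inc != nil {
		source["includes"] = inc
	}
	if exc := splitList(exclude); exc != nil {
		source["excludes"] = exc
	}
	if len(source) > 0 {
		body["_source"] = source
	}

	if sortField != "" {
		order := "asc"
		if strings.HasPrefix(sortField, "-") {
			order = "desc"
			sortField = strings.TrimPrefix(sortField, "-")
		}
		body["sort"] = []map[string]interface{}{
			{sortField: map[string]string{"order": order}},
		}
	}

	out, err := json.Marshal(body)
	return string(out), err
}

func main() {
	body, err := buildSearchBody("", "hash_value,sub.*", "-id")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Both indices need to be read in the same order for a pairwise comparison to line up, which is why the sort field matters as much as the field filters.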

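3. Custom document ID

My document IDs are currently derived from the index name, for example:

// The id is 1 in both cases, but the document IDs differ, so the documents show up in the diff
index_old_1
index_new_1

What we mainly want is to compare the fields inside _source, so I added a -replace-with flag that names the field to use as the unique ID.
For example:

// Use id in place of the document ID, so that the _source fields are compared and the differences reported

go run main.go -ssort=unit_id -dsort=unit_id -replace-with=id 'http://localhost:9200/index_old/type' 'http://localhost:9200/index_new/type'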
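
The idea behind such a flag can be sketched in a few lines of Go. This is a simplified illustration under assumptions, not the actual patch: Document is a stand-in type with only an ID and a decoded _source, and replaceID is a hypothetical helper.

```go
package main

import "fmt"

// Document is a simplified stand-in for a search hit: the Elasticsearch
// document ID plus the decoded _source.
type Document struct {
	ID     string
	Source map[string]interface{}
}

// replaceID returns doc with its ID taken from the given _source field.
// If the field is missing, the document is returned unchanged.
// Note: fmt.Sprint is fine for small ids like the ones here; large numeric
// ids decoded from JSON as float64 would need explicit formatting.
func replaceID(doc Document, field string) Document {
	v, ok := doc.Source[field]
	if !ok {
		return doc
	}
	doc.ID = fmt.Sprint(v)
	return doc
}

func main() {
	// Same business id, different document IDs (sample values).
	oldDoc := Document{ID: "index_old_1", Source: map[string]interface{}{"id": 1}}
	newDoc := Document{ID: "index_new_1", Source: map[string]interface{}{"id": 1}}

	oldDoc = replaceID(oldDoc, "id")
	newDoc = replaceID(newDoc, "id")

	// After the replacement both documents share one key and can be paired.
	fmt.Println(oldDoc.ID == newDoc.ID)
}
```

Once both sides are keyed by the same business id, the two documents are paired up and only their _source contents decide whether they count as changed.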

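4. How the tool computes the diff

(1) It reads the data from ES in batches according to the flags, using scroll queries, 100 documents per batch by default.
(2) It compares the documents with cmp.Equal(srcDoc.Source, dstDoc.Source) from the go-cmp package.
(3) It prints the created, updated, deleted (and so on) documents according to the flags.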

end

Original link: https://blog.csdn.net/LJFPHP/article/details/125882840
