日本免费高清视频-国产福利视频导航-黄色在线播放国产-天天操天天操天天操天天操|www.shdianci.com

學無先后,達者為師

網站首頁 編程語言 正文

主從同步中斷(sql_thread)問題一例

作者:記錄Python學習 更新時間: 2022-10-11 編程語言

? ? ? ? 在接收到某境外業務項目同步國內分析庫實例的MySQL中轉從庫磁盤大于80%的告警,ECS主機數據盤擴容過程中,偶然發現從庫sql_thread線程狀態中斷。其中查閱主庫的error log無異常,本想通過mysqlbinlog?--base64-output=decode-rows -vv(verify-binlog-checksum) 是否可以正常解析binlog文件,以驗證主庫的binlog是否損壞,發現主庫binlog已經過期刪除。此外,在處理從庫的時候重啟從庫MySQL實例,因為relay_log_recovery=on設置導致從庫的relay-log也因重啟實例被刪除了,當前就只能寄希望于從庫實例上的蛛絲馬跡看是否能跟蹤定位發現問題,從庫實例主機日志也被清理了,其中MySQL錯誤日志如下:

##從庫MySQL錯誤日志如下:
Status information:

Current dir: /data/3306/data/
Running threads: 12  Stack size: 262144
Current locks:
lock: 0x7c673b0:

lock: 0x7bf5a60:

lock: 0x7bd8770:

lock: 0x7be74b0:

lock: 0x7bc6350:

lock: 0x7bbf920:

lock: 0x7bb7c10:

lock: 0x2081480:

lock: 0x20813a0:

lock: 0x20812c0:
......


Events status:
LLA = Last Locked At  LUA = Last Unlocked At
WOC = Waiting On Condition  DL = Data Locked

Event scheduler status:
State      : INITIALIZED
Thread id  : 0
LLA        : n/a:0
LUA        : n/a:0
WOC        : NO
Workers    : 0
Executed   : 0
Data locked: NO

Event queue status:
Element count   : 0
Data locked     : NO
Attempting lock : NO
LLA             : init_queue:103
LUA             : init_queue:111
WOC             : NO
Next activation : never
2022-09-01T11:41:22.841046+08:00 7 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 59, event_type: 101
2022-09-01T11:41:22.841075+08:00 7 [ERROR] Error reading relay log event for channel '': slave SQL thread aborted because of I/O error
2022-09-01T11:41:22.841091+08:00 7 [ERROR] Slave SQL for channel '': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog
' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or sl
ave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
2022-09-01T11:41:22.843607+08:00 7 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000242' position 314110116.
2022-09-01T22:09:25.948621+08:00 2 [Note] Aborted connection 2 to db: 'unconnected' user: 'dm_cnl' host: '10.10.80.99' (Got timeout reading communication packets)

? ? ? ? 從以上從庫實例MySQL錯誤日志,大概能猜測可能已經跨境網絡導致從庫實例接收到的Relay-log傳輸過程中錯誤導致日志文件損壞,進而導致MySQL從庫實例在回放binlog時報錯,即sql_thread_running=no,但在定位和解決問題的幾點教訓總結如下:

  1. 在定位和解決問題之前,"保護現場"!即不管是數據還是日志都需要足夠警惕,事先備份
  2. 重要數據保護措施/參數等:
    1. relay_log_recovery?
    2. expire_logs_days
    3. /var/log/messags
    4. /data/3306/log/error.log?
  3. 任何時候,任何服務都應及時配置/創建服務的重要監控告警項,以及時發現和解決問題
  4. 解決relay-log文件損壞的重要步驟
    1. 定位是主庫binlog還是從庫relaylog損壞,通過mysqlbinlog看是否正常解析日志文件
    2. 查閱主從操作系統日志和MySQL錯誤日志信息,/var/log/messages? error.log?
    3. 如master節點的binlog還保留著,從節點執行reset slave,change master to重新同步
    4. 如master節點的binlog已過期刪除,重新備份主庫數據并重做主從數據同步
  5. MySQL對于binlog和relaylog文件中event事件有效性校驗參數
參數名稱 參數解釋
binlog_checksum binlog完整性校驗參數,默認CRC32,如為none,檢查binlog中event的長度方式來校驗
master_verify_checksum?

從Master中獲取binlog會去校驗binlog的事件完整性,兩種情況會發生,其一:show binlog events,其二:binlog dump向slave中binlog

slave_sql_verify_checksum 從庫sql_thread回放relaylog獲取event時校驗

原文鏈接:https://blog.csdn.net/u014674448/article/details/126833588

欄目分類
最近更新