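The evolution of MySQL semi-sync
5.5 introduced semi-sync: after a transaction commits on the master, the dump thread sends the corresponding binlog to the slaves, and the master returns to the user thread only after it has received an ACK from at least one slave.
Caveats
1 A slave ACK only means the io_thread has written the event to the relay_log; it does not mean the sql_thread has executed it.
2 The master sends the transaction to the slave only after it has committed, so if the master crashes at that point, master and slave data diverge.
3 The dump thread both sends the binlog and receives slave ACKs, and the two cannot run in parallel, which is very inefficient.
4 The dump thread acquires LOCK_log when reading the binlog; while the mutex is held, no other thread can read or write the binlog.
Later versions made a series of improvements to address these issues.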
1 after_sync
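5.7 introduced the rpl_semi_sync_master_wait_point parameter, which lets the DBA choose at which stage the master waits for the slave's ACK: either as before (AFTER_COMMIT), or after the master has flushed the transaction to the binlog but before it commits to the storage engine (AFTER_SYNC).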
AFTER_SYNC (the default): The master writes each transaction to its binary log and the slave, and syncs the binary log to disk. The master waits for slave acknowledgment of transaction receipt after the sync. Upon receiving acknowledgment, the master commits the transaction to the storage engine and returns a result to the client, which then can proceed.
AFTER_COMMIT: The master writes each transaction to its binary log and the slave, syncs the binary log, and commits the transaction to the storage engine. The master waits for slave acknowledgment of transaction receipt after the commit. Upon receiving acknowledgment, the master returns a result to the client, which then can proceed.
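Suppose the master has two client connections, clienta and clientb.
clienta commits a transaction. Before 5.7, MySQL writes it to redo, binlog and redo(commit) in that order, then performs semi-sync, and returns to clienta only after receiving the slave's ACK.
clientb, however, can see clienta's committed data as soon as redo(commit) completes, one step ahead of clienta itself, which makes the data inconsistent across connections.
after_sync avoids this problem: when clienta commits a transaction, MySQL writes redo and binlog in turn, then performs semi-sync, and only after receiving the slave's ACK does it run redo(commit) and return to clienta.
after_commit has a second problem: if the master crashes between redo(commit) and semi-sync, master and slave data are inconsistent.
after_sync at least guarantees that every transaction whose redo(commit) succeeded has already been replicated to a slave, which is a half-step improvement.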
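The ordering difference between the two wait points can be made concrete with a small simulation. This is only a sketch: `commit_timeline`, `visible_before_ack` and the stage labels are hypothetical names for the stages described above, not MySQL internals.

```python
def commit_timeline(wait_point):
    """Return the ordered stages of one commit for the given wait point."""
    steps = ["redo", "binlog"]
    if wait_point == "AFTER_SYNC":
        # Wait for the slave first, then make the data visible.
        steps += ["wait for slave ACK", "redo(commit)"]
    elif wait_point == "AFTER_COMMIT":
        # Make the data visible first, then wait for the slave.
        steps += ["redo(commit)", "wait for slave ACK"]
    else:
        raise ValueError(f"unknown wait point: {wait_point}")
    steps.append("return to client")
    return steps


def visible_before_ack(wait_point):
    """True if other sessions can see the data before any slave has acknowledged it."""
    steps = commit_timeline(wait_point)
    return steps.index("redo(commit)") < steps.index("wait for slave ACK")


for wp in ("AFTER_COMMIT", "AFTER_SYNC"):
    print(wp, ":", " -> ".join(commit_timeline(wp)))
    print("  visible to other sessions before ACK:", visible_before_ack(wp))
```

With AFTER_COMMIT the window between redo(commit) and the ACK is exactly where clientb can read data that no slave has yet, and where a master crash leaves the two sides inconsistent.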
2 ack collector thread
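5.7 introduced this separate thread. The dump thread now only reads and sends binlog events, while receiving slave ACKs is handled by the ACK collector thread.
The dump thread can keep sending events without waiting for an ACK, similar to TCP's sliding window protocol.
The master maintains a list of semisync slaves; the list survives even if the ack thread goes down.
When the dump thread calls transmit_start, it registers the slave with the master; if the slave supports semisync, it is added to the semisync slave list.
The ack thread listens on the semisync slave list via select().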
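A minimal sketch of the decoupling, assuming a hypothetical `AckTracker` that remembers the highest acknowledged binlog position: the sender pushes every event without blocking, acknowledgments arrive on a separate thread, and a committing session waits only for its own position. None of these names come from the MySQL source.

```python
import threading


class AckTracker:
    """Tracks the highest binlog position acknowledged by any slave (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._acked_pos = 0

    def report_ack(self, pos):
        # Called by the ACK thread when a slave confirms `pos`.
        with self._cond:
            if pos > self._acked_pos:
                self._acked_pos = pos
                self._cond.notify_all()

    def wait_for_ack(self, pos, timeout=1.0):
        # Called by a committing session; True once `pos` is acknowledged,
        # False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._acked_pos >= pos, timeout)


tracker = AckTracker()
sent = []


def dump_thread(positions):
    # Sends every event without waiting for acknowledgments.
    for pos in positions:
        sent.append(pos)


def ack_thread(positions):
    # Acknowledgments are processed independently of sending.
    for pos in positions:
        tracker.report_ack(pos)


positions = [100, 200, 300]
dump_thread(positions)                      # all events go out first
t = threading.Thread(target=ack_thread, args=(positions,))
t.start()
acked = tracker.wait_for_ack(300)           # a session waiting on position 300
t.join()
print("sent:", sent, "acked:", acked)
```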
The Ack_receiver class maintains the ACK thread.
The thread has three states:
enum status { ST_UP, ST_STOPPING, ST_DOWN };
ST_UP means ack receive thread is created and is working.
ST_DOWN means ack receive thread is destroyed.
ST_STOPPING means a user is disabling semisync master, and ack receive thread is being destroyed.
- m_slaves
A slave vector which includes slaves' useful information here.
DEFINITION:
Slave_vector m_slaves
- m_mutex
m_slaves and m_status are shared between user sessions(dump threads) and ack thread. So they should be protected by a mutex.
- add_slave()
Add a new semisync slave to slave list.
DEFINITION:
bool add_slave(THD *thd);
LOGIC:
initialize slave information.
acquire m_mutex
add the slave's information into m_slaves.
send a signal to ack receive thread. It may be waiting for a signal.
release m_mutex
- remove_slave()
remove a semisync slave from slave list.
DEFINITION:
void remove_slave(THD *thd)
LOGIC:
acquire m_mutex
remove thd of the slave from m_slaves.
release m_mutex
- run()
The handle function of receive thread.
DEFINITION:
void run();
LOGIC:
initialize pthread related things
while (1)
{
  acquire m_mutex
  if m_status is ST_STOPPING then break the loop.
  wait for a semisync slave to be added if the slave list is empty.
  call select to listen on sockets, timeout is 1s.
  restart and continue the loop if an error or timeout happens.
  receive and report acks to semisync master.
  release m_mutex
}
de-initialize pthread related things
Note: Giving select a timeout lets other threads add/remove slaves or stop the ack receive thread when there is no ack.
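The add_slave / remove_slave / run logic above can be approximated in a few lines of Python. This is a sketch, not the MySQL implementation: `AckReceiver`, its `report_ack` callback and the use of `socket.socketpair()` as a stand-in for a slave connection are all assumptions, and unlike the description above it releases the lock while select() is blocking.

```python
import select
import socket
import threading

ST_UP, ST_STOPPING, ST_DOWN = "ST_UP", "ST_STOPPING", "ST_DOWN"


class AckReceiver:
    def __init__(self, report_ack):
        self._cond = threading.Condition()  # plays the role of m_mutex plus its signal
        self._slaves = []                   # plays the role of m_slaves
        self._status = ST_DOWN              # plays the role of m_status
        self._report_ack = report_ack       # hypothetical callback into the semisync master
        self._thread = None

    def start(self):
        with self._cond:
            self._status = ST_UP
        self._thread = threading.Thread(target=self.run)
        self._thread.start()

    def stop(self):
        with self._cond:
            self._status = ST_STOPPING
            self._cond.notify_all()
        self._thread.join()

    def add_slave(self, sock):
        with self._cond:
            self._slaves.append(sock)
            self._cond.notify_all()         # the receiver may be waiting for a slave

    def remove_slave(self, sock):
        with self._cond:
            self._slaves.remove(sock)

    def run(self):
        while True:
            with self._cond:
                if self._status == ST_STOPPING:
                    break
                if not self._slaves:
                    self._cond.wait(timeout=1.0)
                    continue
                socks = list(self._slaves)
            # 1s timeout so that stop/add/remove are noticed even without acks.
            readable, _, _ = select.select(socks, [], [], 1.0)
            for sock in readable:
                data = sock.recv(64)
                if data:
                    self._report_ack(data.decode())
        with self._cond:
            self._status = ST_DOWN


acks = []
got_ack = threading.Event()


def on_ack(msg):
    acks.append(msg)
    got_ack.set()


receiver = AckReceiver(on_ack)
receiver.start()
master_side, slave_side = socket.socketpair()
receiver.add_slave(master_side)
slave_side.sendall(b"binlog.000001:120")    # the "slave" acknowledges a position
got_ack.wait(timeout=5)
receiver.stop()
master_side.close()
slave_side.close()
print(acks)
```

Copying the slave list and releasing the lock before select() is a deliberate simplification here; it keeps add_slave and remove_slave from blocking for up to the full select timeout.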
3 Removing the dump thread's LOCK_log mutex
The current dump thread logic works as follows:
Foreground thread writing the binlog
  acquire LOCK_log
  write log event to binlog
  release LOCK_log
  signal update
Dump thread
  while client is not killed:
    acquire LOCK_log
    read event from binlog
    release LOCK_log
    if EOF was reached in the previous read:
      acquire LOCK_log
      wait for update signal
      read event from binlog
      release LOCK_log
When a dump thread reads the binlog, it acquires the LOCK_log mutex, which blocks every other read or write request against that binlog for the duration.
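Removing LOCK_log
Events are only appended to the tail of the current binlog, so reading events anywhere else in the file needs no lock.
The only concern is that while a foreground thread is writing the binlog, the dump thread might read an incomplete event.
To handle this, MYSQL_BIN_LOG introduces a variable, binlog_end_pos, which records the position where the last event in the current binlog ends; the dump thread reads only events before it.
write thread: updates this variable after appending an event;
read thread: reads only events before binlog_end_pos.
The variable is protected by LOCK_binlog_end_pos, which must be held for both reads and writes of it.
The dump thread logic then becomes: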
dump thread design:
  end_position = 0
  while client is not killed:
    if current read position == end_position:
      acquire lock_binlog_end
      while end_position == binlog_end and client is not killed:
        wait for update signal
      release lock_binlog_end
    if client is killed:
      break
    read event from binlog
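The same idea in runnable form, as a sketch under stated assumptions: a single writer, an in-memory `bytearray` standing in for the binlog file, and hypothetical names (`Binlog`, `append_event`, `wait_for_update`). For brevity the reader consumes everything up to the published end position in one chunk rather than one event at a time.

```python
import threading
import time


class Binlog:
    def __init__(self):
        self._data = bytearray()                # append-only log contents
        self._end_lock = threading.Condition()  # plays the role of LOCK_binlog_end_pos
        self._binlog_end_pos = 0                # end of the last complete event

    def append_event(self, event):
        # Writer: append the whole event first, then publish the new end position.
        self._data += event
        with self._end_lock:
            self._binlog_end_pos = len(self._data)
            self._end_lock.notify_all()

    def wait_for_update(self, read_pos, timeout=1.0):
        # Reader: block only while nothing new is published; return the end position.
        with self._end_lock:
            self._end_lock.wait_for(lambda: self._binlog_end_pos > read_pos, timeout)
            return self._binlog_end_pos

    def read(self, start, end):
        # No lock held: bytes before the published end position never change.
        return bytes(self._data[start:end])


def dump_thread(binlog, out, stop):
    read_pos = end_pos = 0
    while not stop.is_set():
        if read_pos == end_pos:
            end_pos = binlog.wait_for_update(read_pos)
            continue                            # re-check stop flag and positions
        out.append(binlog.read(read_pos, end_pos))
        read_pos = end_pos


binlog = Binlog()
out, stop = [], threading.Event()
t = threading.Thread(target=dump_thread, args=(binlog, out, stop))
t.start()
for ev in (b"event-1;", b"event-2;"):
    binlog.append_event(ev)

# Give the reader a bounded amount of time to catch up, then stop it.
deadline = time.time() + 5
while len(b"".join(out)) < 16 and time.time() < deadline:
    time.sleep(0.01)
stop.set()
t.join()
print(b"".join(out))
```

The lock is held only while comparing or updating the end position, so a slow reader never blocks the writer for the duration of a read.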
http://dev.mysql.com/worklog/task/?id=5721#tabs-5721-5
Source: ITPUB blog, http://blog.itpub.net/15480802/viewspace-1430221/. Please credit the source when reposting.