[ERROR] [MY-012872] [InnoDB] [FATAL] Semaphore wait has lasted > 600 seconds

darren__chan發表於2021-11-28


InnoDB: ###### Diagnostic info printed to the standard error stream

2021-11-19T14:52:21.671328+08:00 0 [ERROR] [MY-012872] [InnoDB] [FATAL] Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

2021-11-19T14:52:21.671407+08:00 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: ut0ut.cc:532 thread 139928164968192

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

06:52:21 UTC - mysqld got signal 6 ;

Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...



Thread 139932014196480 在 buf0flu.cc line 1357 等待一個訊號量,需要在buf0buf.cc line 781 建立一個 申請SX-lock的 RW-latch  ,但是這時thread id 139928049043200


在這上面已經有獨佔鎖了,

而Thread 139928049043200 在  btr0cur.cc line 233 等待一個訊號量,需要在buf0buf.cc line 781 建立一個 申請X-lock (wait_ex)的 RW-latch ,可是此時Thread 139928049043200

實際上是一直在等待獨佔鎖(wait exclusive),即是說該RW-latch 目前仍未被釋放。



[Warning] [MY-012985] [InnoDB] A long semaphore wait:

--Thread 139932014196480 has waited at buf0flu.cc line 1357 for 784 seconds the semaphore:

SX-lock on RW-latch at 0x7f439b4df988 created in file buf0buf.cc line 781

a writer (thread id 139928049043200) has reserved it in mode  exclusive



2021-11-19T14:51:51.381694+08:00 0 [Warning] [MY-012985] [InnoDB] A long semaphore wait:

--Thread 139928049043200 has waited at btr0cur.cc line 233 for 923 seconds the semaphore:

X-lock (wait_ex) on RW-latch at 0x7f43a36c0108 created in file buf0buf.cc line 781

a writer (thread id 139928049043200) has reserved it in mode  wait exclusive



先說buf0buf.cc ,其就是指的database buffer pool , line 781 對應的是  rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING); 是建立一個讀寫鎖,其所屬的函式是buf_block_init(),其是在初始buffer pool時初始化一個buffer 控制塊,即是指對資料庫buffer 進行鎖控制的。


而buf0flu.cc 是用於mysql 的buffer pool flush的。line 1357 是 rw_lock_sx_lock_gen(rw_lock, BUF_IO_WRITE),看起來是在flush buffer時需要申請的rw_lock。



 btr0cur.cc 的描述是索引樹遊標,作用是所有行操作對一個b-tree或記錄的改變都必須透過這個模組,同時,每一次對聚合索引記錄的修改或插入,都會在這裡寫入undo 日誌記錄。

 而第233 行是獲取btree block的過程,可以看到他需要去申請RW_X_LATCH.

232  get_block = btr_block_get(page_id_t(page_id.space(), left_page_no),

233                          page_size, RW_X_LATCH, cursor->index, mtr);


這裡看起來就是對錶的更改操作才會觸發, 同時在error log中也可以看到Thread 139928049043200  的TRANSACTION 787138017 是一直在進行rollback,這可以進一步印證以上情況。

---TRANSACTION 787138017, ACTIVE 4387 sec rollback <<<<<<<

mysql tables in use 2, locked 1

ROLLING BACK 1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 7891101

MySQL thread id 11963922, OS thread handle 139928049043200, query id 1348226447 30.88.75.45 msvcmdata query end

INSERT INTO `xxl_job_log_copy1` SELECT * FROM `msvcm`.`xxl_job_log`

Trx read view will not see trx with id >= 787138017, sees < 787138017


同時也存在另外一條類似sql正在進行inserting,此時的buffer pool確實會相對緊張。

---TRANSACTION 787233964, ACTIVE 3040 sec inserting

mysql tables in use 2, locked 1

1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 8544482

MySQL thread id 11964240, OS thread handle 139933323695872, query id 1348381276 30.88.75.45 msvcmdata executing

INSERT INTO `xxl_job_log_copy2` SELECT * FROM `msvcm`.`xxl_job_log`

Trx read view will not see trx with id >= 787233964, sees < 787138017


從innodb status 看起,當時Pending flushes  buffer pool 的情況不斷上漲。

Pending flushes (fsync) log: 1; buffer pool: 30

Pending flushes (fsync) log: 1; buffer pool: 33

Pending flushes (fsync) log: 1; buffer pool: 31

Pending flushes (fsync) log: 1; buffer pool: 31

Pending flushes (fsync) log: 1; buffer pool: 31

Pending flushes (fsync) log: 1; buffer pool: 35

Pending flushes (fsync) log: 0; buffer pool: 37

Pending flushes (fsync) log: 0; buffer pool: 48

Pending flushes (fsync) log: 1; buffer pool: 54

Pending flushes (fsync) log: 1; buffer pool: 54

Pending flushes (fsync) log: 1; buffer pool: 62

Pending flushes (fsync) log: 0; buffer pool: 68

Pending flushes (fsync) log: 0; buffer pool: 63

Pending flushes (fsync) log: 1; buffer pool: 63

Pending flushes (fsync) log: 0; buffer pool: 67

Pending flushes (fsync) log: 0; buffer pool: 71

Pending flushes (fsync) log: 1; buffer pool: 70

Pending flushes (fsync) log: 1; buffer pool: 71

Pending flushes (fsync) log: 1; buffer pool: 72

Pending flushes (fsync) log: 1; buffer pool: 73

Pending flushes (fsync) log: 0; buffer pool: 76

Pending flushes (fsync) log: 1; buffer pool: 75

Pending flushes (fsync) log: 1; buffer pool: 75

Pending flushes (fsync) log: 0; buffer pool: 75

Pending flushes (fsync) log: 1; buffer pool: 75

Pending flushes (fsync) log: 1; buffer pool: 76

Pending flushes (fsync) log: 0; buffer pool: 76

Pending flushes (fsync) log: 1; buffer pool: 78

Pending flushes (fsync) log: 0; buffer pool: 82

Pending flushes (fsync) log: 0; buffer pool: 85

Pending flushes (fsync) log: 1; buffer pool: 89

Pending flushes (fsync) log: 1; buffer pool: 95

Pending flushes (fsync) log: 1; buffer pool: 97

Pending flushes (fsync) log: 0; buffer pool: 102



主要是因為存在兩條比較大的insert select 全表掃描的語句,一條在進行insert,一條在進行rollback,加上當時buffer pool相對比較小,buffer pool的資源緊張,資料庫在flush buffer及modify buffer 時對buffer 的latch發生爭用且長時間等待,從而導致訊號量超時。


目前我暫時未找到相關的已知bug描述,這類問題解決方案就是避免SQL長時間佔用Latch:

1,修改select查詢避免全表掃描,避免Latch長期被佔用。

2,加查詢條件,適當的加索引,讓select執行更快,也避免一個select鎖的資料更少。

3,適當加大buffer pool 及 buffer pool instance,每個buffer pool都有自己獨立的Latch,避免latch競爭。

4.避免長時間回滾


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29863023/viewspace-2844506/,如需轉載,請註明出處,否則將追究法律責任。

相關文章