MySQL 5.6 遭遇 OS bug INNODB MONITOR OUTPUT 事件
資料庫主機從庫 報錯了。
持續的報出 INNODB STAUTS OUTPUT 寫到 err log檔案裡面去。
END OF INNODB MONITOR OUTPUT
2019-02-19 02:10:16 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:10:36 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:10:56 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:11:16 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:11:36 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:11:56 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:12:16 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:12:36 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:12:56 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:13:16 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:13:36 7f7dc518b700 INNODB MONITOR OUTPUT
END OF INNODB MONITOR OUTPUT
2019-02-19 02:13:56 7f7dc518b700 INNODB MONITOR OUTPUT
然後從庫無法連線主庫。
監控程式碼報錯:
mysql host:′ó?a??] slave ??3ì ?é?ü3??êìaá?£?show slave status £o??ê?1ê??
Warning: Using a password on the command line interface can be insecure.
Master_Log_File: mysql-bin.000062
Read_Master_Log_Pos: 94626697
Relay_Log_File: mysql- relay-bin.000121
Relay_Master_Log_File: mysql-bin.000062
Slave_IO_Running: No
Slave_SQL_Running: No
Exec_Master_Log_Pos: 94626697
Seconds_Behind_Master: NULL
Slave_SQL_Running_State:
err 日誌的報錯資訊:
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 158651
--Thread 140178133391104 has waited at trx0undo.ic line 191 for 924.00 seconds the semaphore:
S-lock on RW-latch at 0x7f8075840fc0 created in file buf0buf.cc line 1069
a writer (thread id 140178090628864) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1056
Last time write locked in file /export/home/pb2/build/sb_0-19672887-1468230598.75/mysql-5.6.32/storage/innobase/include/trx0undo.
ic line 171
--Thread 140178090628864 has waited at trx0sys.ic line 103 for 924.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7f81bda169c0 created in file buf0buf.cc line 1069
a writer (thread id 140178090628864) has reserved it in mode wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file buf0flu.cc line 1056
Last time write locked in file /export/home/pb2/build/sb_0-19672887-1468230598.75/mysql-5.6.32/storage/innobase/include/trx0sys.i
c line 103
晚上也懶得去細查了, 就重啟了資料庫問題解決。
早上回來調查。
發現了這是一個bug。
檢視手冊的時候發現自適應雜湊索引可能會導致btr0sea.c檔案的rw latch獲取爭用,從而導致SEMAPHORES問題。
詳細連結:
https://dev.mysql.com/doc/refman/5.6/en/innodb-adaptive-hash.html
。
解決方案:set global innodb_adaptive_hash_index=0;
進一步網上搜尋發現
linux 6.6 有個bug 觸發這個問題。
We had this one bite us hard and scare the %$^! out of us, so I figured I'd share the fear...
The linux futex_wait call has been broken for about a year (in upstream since 3.14, around Jan 2014), and has just recently been fixed (in upstream 3.18, around October 2014). More importantly this breakage seems to have been back ported into major distros (e.g. into RHEL 6.6 and its cousins, released in October 2014), and the fix for it has only recently been back ported (e.g. RHEL 6.6.z and cousins have the fix).
The impact of this kernel bug is very simple: user processes can deadlock and hang in seemingly impossible situations. A futex wait call (and anything using a futex wait) can stay blocked forever, even though it had been properly woken up by someone. Thread.park() in Java may stay parked. Etc. If you are lucky you may also find soft lockup messages in your dmesg logs. If you are not that lucky (like us, for example), you'll spend a couple of months of someone's time trying to find the fault in your code, when there is nothing there to find.
This behavior seems to regularly appear in the wild on Haswell servers (all the machines where we have had customers hit it in the field and in labs been Haswells), and since Haswell servers are basically what you get if you buy a new machine now, or run on the cool new amazon EC2/GCE/Azure stuff, you are bound to experience some interesting behavior. I don't know of anyone that will see this as a good thing for production systems. Except for maybe Netflix (maybe we should call this the linux fumonkey).
The commit for the fix is here:
The commit explanation says that it fixes (presumably the bug introduced with that change), which was made in Jan of 2014into 3.14. That 3.14 code added logic to avoid taking a lock if the code knows that there are no waiters. It documents (pretty elaborately) how "…thus preventing tasks sleeping forever if wakers don't acknowledge all possible waiters" with logic that explains how memory barriers guarantee the correct order (see paragraph at line 141), which includes the statement "this is done by the barriers in get_futex_key_refs(), through either ihold or atomic_inc, depending on the futex type." (this assumption is the actual bug). The assumption is further reinforced in the fact that the change added a comment to every calls to get_futex_key_refs() in the code that says "/* implies MB (B) */".
The problem was that get_futex_key_refs() did NOT imply a memory barrier. It only included a memory barrier for two explicit cases in a switch statement that checks the futex type, but did not have a default case handler, and therefor did not apply a memory barrier for other fuxtex types. Like private futexes. Which are a very commonly used type of futex.
The fix is simple, an added default case for the switch that just has an explicit smp_mb() in it. There was a missing memory barrier in the wakeup path, and now (hopefully) it's not missing any more...
So lets be clear: RHEL 6.6 (and CentOS 6.6., and Scientific Linux 6.6.) are certainly broken on Haswell servers. It is likely that recent versions other distros are too (SLES, Ubuntu, Debia, Oracle Linux, etc.). The good news is that fixes are out there (including 6.6.z). But the bad news is that there is not much chatter saying "if you have a Haswell system, get to version X now". For some reason, people seem to not have noticed this or raised the alarm. We certainly haven't seen much "INSTALL PATCHES NOW" fear mongering. And we really need it, so I'm hoping this posting will start a panic.
Bottom line: the bug is very real, but it probably only appeared in the 3.14 upstream version (and distro versions that had backported , presumably after Jan 2014). The bug was fixed in 3.18 in October 2014, but backports probably took a while (and some may still be pending). I now for a fact that RHEL 6.6.z has the fix. I don't know about other distro families and versions (yet), but if someone else does, please post (including when was it broken, and when was it fixed).
Note: I would like to profusely thank @aplokhotnyuk. His tweet originally alerted me to the bug's existence, and started us down the path of figuring out the what//why/where/when behind it. Why this is not being shouted in the streets is a mystery to me, and scary in its own right. We were lucky enough that I had a "that looks suspiciously familiar" moment when I read that tweet, and that I put 3.14 and 1.618 together and thought enough to ask "Umm... have we only been seeing this bug on Haswell servers?". Without @aplokhotnyuk's tweet we'd probably still be searching for the nonexistent bugs in our own locking code... And since the tweet originated from another discussion on this group, it presents a rare "posting and reading twitter actually helps us solve bugs sometimes" example.
而且剛剛好,我們的os 為 centos 6.6
cpu 為 haswell 架構。
model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
stepping : 1
看來要升級os 才能搞定曾問題了。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/133735/viewspace-2636330/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 【MySQL】5.6.x InnoDB Error Table mysql.innodb_table_stats not foundMySqlError
- mysql xtrabackup 遭遇嚴重bugMySql
- innodb_lock_monitor解決mysql死鎖MySql
- [MySQL5.6]Innodb新的監控表INNODB_METRICSMySql
- mysql 5.6效能監控表innodb_metricsMySql
- 遭遇mysql 5.6.10 BUG 一例MySql
- MySQL 5.6 InnoDB儲存引擎體系結構圖MySql儲存引擎
- 遭遇 bug InnoDB: Failing assertion: page_get_n_recs(page) > 1AI
- MySQL:5.6 大事務show engine innodb status故障一例MySql
- MySQL 5.6 innodb_io_capacity引數效能測試MySql
- Percona MySQL 5.6 配置InnoDB優化器永久統計資訊MySql優化
- Solaris 5.6 dual head monitor config
- [ 丹臣]利用innodb_table_monitor來trace innodb內部資訊
- mysql5.6在匯入時報innodb_table_stats不存在MySql
- [Shell] Linux monitor OS process and DB sessionLinuxSession
- 6.Monitor Linux OS process and DB sessionLinuxSession
- oracle 10.2.4 遭遇bug 當機Oracle
- 【MySQL】5.6/5.7並行複製bug導致的故障 ERROR 1755/1756MySql並行Error
- mysql 5.6官方文件MySql
- MySQL InnoDB 索引MySql索引
- Monitor All SQL Queries in MySQL (alias mysql profiler)MySql
- cent os 6.6 x64 自動原始碼安裝mysql 5.6指令碼原始碼MySql指令碼
- delete操作時遭遇ora-600 bug[kntgslm]delete
- MySQL5.6開始可以使用獨立表空間, innodb_file_per_table=1MySql
- Windows 安裝 MySQL 5.6WindowsMySql
- MySQL:5.6 升級 5.7MySql
- mysql5.6備份MySql
- MySQL 5.6主主同步MySql
- CanalBinlogChange(mysql5.6)GCMySql
- CentOS 安裝 mysql 5.6CentOSMySql
- mysql innodb索引高度MySql索引
- MySQL InnoDB update流程MySql
- MySQL Deadlocks in InnoDBMySql
- WebSphere Business Monitor 事件處理內幕揭祕Web事件
- 32 位 oracle 安裝到 64 位os ,遭遇的錯誤Oracle
- 升級Grid Infrastructure到10.2.0.2 遭遇bug 9413827ASTStruct
- 遭遇ORA-600 [kgscLogOff-notempty] bugGo
- MySQL 5.6的表壓縮MySql