MySQL "Too many connections" analysis

Posted by zysql on 2016-12-23

Symptoms

The instance started rejecting clients with "Too many connections":

ERROR 1040 (08004): Too many connections

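After raising max_connections with gdb and checking the processlist, we saw threads in the "Waiting for backup lock" state, the SQL thread blocked, and a large number of SHOW SLAVE STATUS connections: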

| 131945 | system user |                     | mysql              | Connect          |  302156 | Waiting for Slave Worker to release partition                         | NULL                                                     |
| 131946 | system user |                     | NULL               | Connect          |  302832 | Waiting for an event from Coordinator                                 | NULL                                                     |
| 131947 | system user |                     | NULL               | Connect          |  381957 | Waiting for an event from Coordinator                                 | NULL                                                     |
| 131948 | system user |                     | NULL               | Connect          |  302167 | Waiting for an event from Coordinator                                 | NULL                                                     |
| 131949 | system user |                     | NULL               | Connect          |  302520 | Waiting for backup lock                                               | NULL                                                     |
| 131950 | system user |                     | NULL               | Connect          |  302531 | Waiting for backup lock                                               | NULL                                                     |
| 131951 | system user |                     | NULL               | Connect          |  302531 | Waiting for backup lock                                               | NULL                                                     |
| 131952 | system user |                     | NULL               | Connect          |  302537 | Waiting for backup lock                                               | NULL                                                     |
| 131953 | system user |                     | NULL               | Connect          |  302554 | Waiting for backup lock                                               | NULL                                                     |
| 187069 | root        | 127.0.0.1:49991     | NULL               | Sleep            |       9 |                                                                       | NULL                                                     |
| 211141 | root        | 127.0.0.1:49251     | NULL               | Query            |  297261 | init                                                                  | show slave status for channel ``                         |
| 245974 | root        | 127.0.0.1:48726     | NULL               | Query            |  297194 | init                                                                  | SHOW SLAVE STATUS                                        |
| 247341 | aurora      | 10.143.33.57:36949  | NULL               | Query            |  297336 | Killing slave                                                         | stop slave                                               |
| 247346 | root        | 127.0.0.1:58466     | NULL               | Killed           |  297335 | init                                                                  | show slave status                                        |
| 247349 | root        | 127.0.0.1:58565     | NULL               | Killed           |  297327 | init        
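
With a processlist this long, it helps to group rows by state and statement so the pile-up stands out. The following is a minimal sketch, assuming the pipe-delimited layout shown above with eight columns (Id, User, Host, db, Command, Time, State, Info); the function name summarize_processlist and the shortened SAMPLE rows are illustrative only and not part of the original investigation.

```python
from collections import Counter


def summarize_processlist(text):
    """Count processlist rows per (state, statement).

    Assumes pipe-delimited rows with eight columns:
    Id, User, Host, db, Command, Time, State, Info.
    Headers, borders, and truncated rows are skipped.
    """
    counts = Counter()
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 8 or not cells[0].isdigit():
            continue
        state, info = cells[6], cells[7]
        # Lower-case the statement so "SHOW SLAVE STATUS" and
        # "show slave status" fall into the same bucket.
        counts[(state, info.lower())] += 1
    return counts


# A few rows copied from the listing above, with padding removed.
SAMPLE = """\
| 131949 | system user | | NULL | Connect | 302520 | Waiting for backup lock | NULL |
| 131950 | system user | | NULL | Connect | 302531 | Waiting for backup lock | NULL |
| 245974 | root | 127.0.0.1:48726 | NULL | Query | 297194 | init | SHOW SLAVE STATUS |
| 247346 | root | 127.0.0.1:58466 | NULL | Killed | 297335 | init | show slave status |
| 247349 | root | 127.0.0.1:58565 | NULL | Killed | 297327 | init
"""

if __name__ == "__main__":
    for (state, info), n in summarize_processlist(SAMPLE).most_common():
        print(n, state, info)
```

The sketch does not handle statements that themselves contain a pipe character; for real use, querying information_schema.processlist directly would be more robust.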

A backup process was also running:

root      86809  86803  0 May14 ?        00:00:00  innobackupex --defaults-file=/etc/my.cnf ......

Analysis

We had adopted Percona's Backup Locks feature, so the backup runs LOCK TABLES FOR BACKUP.

We used pt-pmp to analyze the thread stacks:

SHOW SLAVE STATUS is waiting for the LOCK_msr_map mutex:

__lll_lock_wait(libpthread.so.0),_L_lock_995(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),inline_mysql_mutex_lock(mysql_thread.h:690),show_slave_status_cmd(mysql_thread.h:690),mysql_execute_command(sql_parse.cc:3347),mysql_parse(sql_parse.cc:7158),dispatch_command(sql_parse.cc:1597),do_handle_one_connection(sql_connect.cc:1006),handle_one_connection(sql_connect.cc:922),start_thread(libpthread.so.0),clone(libc.so.6)

STOP SLAVE holds LOCK_msr_map and waits on stop_cond for the IO and SQL threads to exit:

1 pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1199),terminate_slave_thread(mysql_thread.h:1199),terminate_slave_thread(rpl_slave.cc:1268),terminate_slave_threads(rpl_slave.cc:1268),terminate_slave_threads(rpl_slave.cc:9768),stop_slave(rpl_slave.cc:9768),stop_slave(rpl_slave.cc:611),stop_slave_cmd(rpl_slave.cc:756),mysql_execute_command(sql_parse.cc:3707),mysql_parse(sql_parse.cc:7158),dispatch_command(sql_parse.cc:1597),do_handle_one_connection(sql_connect.cc:1006),handle_one_connection(sql_connect.cc:922),start_thread(libpthread.so.0),clone(libc.so.6)

The SQL thread is waiting for the worker threads to finish their transactions (slave_worker_hash_cond):

  1 pthread_cond_wait,inline_mysql_cond_wait(mysql_thread.h:1162),wait_for_workers_to_finish(mysql_thread.h:1162),slave_stop_workers(rpl_slave.cc:6471),handle_slave_sql(rpl_slave.cc:6997),start_thread(libpthread.so.0),clone(libc.so.6)

The worker is waiting for the backup_tables_lock lock:

pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1199),MDL_wait::timed_wait(mysql_thread.h:1199),MDL_context::acquire_lock(mdl.cc:2416),Global_backup_lock::acquire_protection(lock.cc:1221),open_table(sql_base.cc:3173),open_and_process_table(sql_base.cc:4630),open_tables(sql_base.cc:4630),open_and_lock_tables(sql_base.cc:5735),open_and_lock_tables(sql_base.h:476),Rows_log_event::do_apply_event(sql_base.h:476),slave_worker_exec_job(rpl_rli_pdb.cc:2061),handle_slave_worker(rpl_slave.cc:5696),start_thread(libpthread.so.0),clone(libc.so.6)

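Meanwhile, our backup was holding backup_tables_lock.

This chain of lock waits (SHOW SLAVE STATUS waits for STOP SLAVE, which waits for the SQL thread, which waits for a worker, which waits for the backup) blocked a large number of SHOW SLAVE STATUS statements, and they eventually used up all of the root connections.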
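
The wait chain can be written down as a small wait-for map and walked to find the root blocker. This is only an illustrative model of the stacks above, not anything MySQL exposes; the labels in WAITS_FOR and the function name blocking_chain are made up for this sketch.

```python
# Each waiter maps to (what it waits on, who currently holds or must signal it).
# The entries mirror the pt-pmp stacks shown above.
WAITS_FOR = {
    "SHOW SLAVE STATUS": ("LOCK_msr_map", "STOP SLAVE"),
    "STOP SLAVE": ("stop_cond", "SQL thread"),
    "SQL thread": ("slave_worker_hash_cond", "worker thread"),
    "worker thread": ("backup_tables_lock", "backup (LOCK TABLES FOR BACKUP)"),
}


def blocking_chain(start, waits_for):
    """Follow the wait-for edges from `start` until a thread that waits on nothing."""
    chain = [start]
    seen = {start}
    current = start
    while current in waits_for:
        _resource, holder = waits_for[current]
        if holder in seen:
            # A cycle would mean a true deadlock rather than a long wait.
            raise ValueError("cycle detected at " + holder)
        chain.append(holder)
        seen.add(holder)
        current = holder
    return chain


if __name__ == "__main__":
    print(" -> ".join(blocking_chain("SHOW SLAVE STATUS", WAITS_FOR)))
```

The walk ends at the backup, which waits on nothing in this model. That is why the situation is a long stall rather than a deadlock, and why removing the backup releases everything behind it.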

How to fix it

Killing the backup process releases the lock and clears the blockage.

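How to avoid it

1 Avoid MyISAM tables where possible, to shorten the time the backup holds LOCK TABLES FOR BACKUP. In this case there were more than 200 MyISAM tables.

2 Avoid running STOP SLAVE while a backup is in progress.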
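
One way to enforce the second point is a pre-check in whatever script issues STOP SLAVE. The sketch below is a minimal example under assumptions: it takes lines of ps output (in the same format as the listing earlier) and reports whether a backup tool appears in any command line. The function name backup_in_progress and the default tool names are hypothetical choices for illustration.

```python
def backup_in_progress(ps_lines, backup_names=("innobackupex", "xtrabackup")):
    """Return True if any ps line mentions a known backup executable.

    Each whitespace-separated token is reduced to its base name, so both
    "innobackupex" and "/usr/bin/innobackupex" match. Note that this also
    matches unrelated commands that merely name the tool as an argument
    (for example a grep for it), so treat a True result as "probably".
    """
    for line in ps_lines:
        for token in line.split():
            if token.rsplit("/", 1)[-1] in backup_names:
                return True
    return False


if __name__ == "__main__":
    import subprocess

    # "ps -ef" lists all processes with their full command lines.
    ps_output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    if backup_in_progress(ps_output):
        print("backup appears to be running; postpone STOP SLAVE")
    else:
        print("no backup detected")
```

Erring toward false positives is deliberate here: skipping a STOP SLAVE that would have been safe costs little compared with repeating the stall described above.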

