技術分享 | OceanBase 手滑誤刪了資料檔案怎麼辦

愛可生雲資料庫發表於2023-04-18

作者:張乾

外星人2號,現兼任六位喵星人的資深鏟屎官。

本文來源:原創投稿

*愛可生開源社群出品,原創內容未經授權不得隨意使用,轉載請聯絡小編並註明來源。


手滑誤刪了資料檔案,並且沒有可替換的節點時,先別急著提桶跑路,可以考慮利用引數 server_permanent_offline_time 來重建受影響的節點。

原理:

server_permanent_offline_time 是 OceanBase 資料庫中用於控制節點永久下線時長的引數。當叢集中的某個節點當機後,系統會根據該引數的設定值來進行相應操作。

如果節點當機時間小於該引數設定的值,系統會暫時不做處理,以避免頻繁的資料遷移;如果當機時間超過該引數設定的值,該節點被標記為永久下線,RootService 會將該 OBServer 上包含的資料副本從 Paxos 成員組中刪除,並在同 zone 內其他可用 OBServer 上補充資料,以保證資料副本 Paxos 成員組完整。該引數預設值是 3600 秒,一般設定較大,以避免不必要的副本複製。此外,當永久下線的節點重新被拉起後,其上的全部資料都需要從其他副本重新拉取。

在本場景下,即是透過調低該引數,讓故障節點快速永久下線再重新上線,達到資料重建的目的。

請注意,此過程會佔用叢集一定的資源,可能會影響效能,因此建議在業務低峰期進行。

官方建議

關於 server_permanent_offline_time 的適用場景和建議值,官方提供如下:

  1. OceanBase 資料庫版本升級場景:建議將該配置項的值設定為72h。
  2. OBServer 硬體更換場景:建議將該配置項的值設定為4h。
  3. OBServer 清空上線場景:建議將該配置項的值設定為10m,使叢集快速上線。

準備過程

預備一套環境

使用OBD工具快速部署一套3節點OB以及一個OBProxy,再建立好一個租戶sysbench_tenant,primary_zone為RANDOM。

注:本文基於OB 3.1.2版本,其他版本需注意另作驗證。

準備些資料

使用 sysbench 建立一個表 sbtest1 並插入1W資料。

sysbench ./oltp_insert.lua --mysql-host=10.186.60.3 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant"  --mysql-password=sysbench --tables=1 --table_size=10000 --threads=1 --time=600 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062,5157,4038 prepare

這裡改寫了 sysbench 的建表語句,分了3個區,查詢 sbtest1 表分割槽副本分佈如下

MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             1 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             2 |
| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | leader   |             1 |
| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | follower |             2 |
+-----------------+-------+--------------+----------+----------+---------------+

開始實驗

使用 sysbench 持續寫入資料,維持一定的流量,便於在節點重建後對比各節點資料是否一致。

sysbench ./oltp_insert.lua --mysql-host=10.186.60.3 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant"  --mysql-password=sysbench --tables=1 --table_size=10000 --threads=1 --time=300 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062,5157,4038 run

刪除某節點的資料檔案

選擇zone3下的10.186.64.79節點,將資料檔案刪除。

[root@localhost data]# rm -rf 1/sstable/block_file
[root@localhost data]# cd 1/sstable/
[root@localhost sstable]# ll
total 0

永久下線故障節點

1.調小引數 server_permanent_offline_time ,縮短節點永久下線時間

server_permanent_offline_time 預設值為3600s

MySQL [oceanbase]> alter system set server_permanent_offline_time='60s';
Query OK, 0 rows affected (0.030 sec)
 
MySQL [oceanbase]> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone  | svr_type | svr_ip       | svr_port | name                          | data_type | value | info                                                                                                                              | section      | scope   | source  | edit_level        |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone3 | observer | 10.186.64.79 |     2882 | server_permanent_offline_time | NULL      | 60s   | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone1 | observer | 10.186.64.74 |     2882 | server_permanent_offline_time | NULL      | 60s   | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone2 | observer | 10.186.64.75 |     2882 | server_permanent_offline_time | NULL      | 60s   | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+

2.停止故障節點對外提供服務

在kill ob程式前,建議使用隔離(ISOLATE SERVER)或者停止(STOP SERVER)節點的命令,停掉髮往該節點的請求,轉移副本leader角色。在節點重建恢復後,再開啟流量。

# 停掉79節點服務
MySQL [oceanbase]> ALTER SYSTEM STOP SERVER '10.186.64.79:2882' ZONE='zone3';
  
# 或者隔離
ALTER SYSTEM ISOLATE SERVER '10.186.64.79:2882' ZONE='zone3';

3.kill observer程式

執行kill -9 $observer_pid,等待 server_permanent_offline_time 的時間,該ob進入"永久下線”狀態。判斷ob是否已經永久下線,可以查詢表 __all_rootservice_event_history,存在名為 "permanent_offline "的event記錄,確認時間和ip都一致後,即可認為ob已經永久下線。

MySQL [oceanbase]> select * from __all_rootservice_event_history where event='permanent_offline' ; 

+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+
| gmt_create                 | module | event             | name1  | value1              | name2 | value2 | name3 | value3 | name4 | value4 | name5 | value5 | name6 | value6 | extra_info | rs_svr_ip    | rs_svr_port |
+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+
| 2023-03-29 17:34:09.596035 | server | permanent_offline | server | "10.186.64.79:2882" |       |        |       |        |       |        |       |        |       |        |            | 10.186.64.74 |        2882 |
+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+

查詢分割槽副本分佈如下,已不存在79節點的分割槽副本資訊,進一步確認了79節點已永久下線。

zone2下的75節點有一個從副本升級為leader角色,此時叢集仍然可以繼續對外服務。

MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             2 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             1 |
+-----------------+-------+--------------+----------+----------+---------------+
4 rows in set (0.005 sec)

拉起故障節點,觸發資料自動重建

1.啟動79節點的ob程式,程式啟動後會自動觸發重建。

注:防止ob啟動失敗或存在其他問題,建議啟動前將資料檔案和事務日誌均清空。


[root@localhost data]# rm -rf log1/clog/*
[root@localhost data]# rm -rf log1/ilog/*
[root@localhost data]# rm -rf log1/slog/*
[root@localhost data]# rm -rf 1/sstable/block_file
[root@localhost data]# cd 1/sstable/
[root@localhost sstable]# ll
total 0
[root@localhost sstable]# su admin
bash-4.2$ cd /home/admin/ && ./bin/observer
./bin/observer

程式啟動後,確認ob心跳恢復狀態為active,然後檢視分割槽正在不斷補足中

MySQL [oceanbase]> select svr_ip,zone,with_rootserver,status,stop_time,start_service_time,build_version from __all_server;
+--------------+-------+-----------------+--------+-----------+--------------------+----------------------------------------------------------------------------------------+
| svr_ip       | zone  | with_rootserver | status | stop_time        | start_service_time | build_version                                                                          |
+--------------+-------+-----------------+--------+-----------+--------------------+----------------------------------------------------------------------------------------+
| 10.186.64.74 | zone1 |               1 | active |                0 |   1679984798650860 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |
| 10.186.64.75 | zone2 |               0 | active |                0 |   1679984801289281 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |
| 10.186.64.79 | zone3 |               0 | active | 1680082329964975 |   1680082511964975 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |
+--------------+-------+-----------------+--------+-----------+--------------------+----------------------------------------------------------------------------------------+
3 rows in set (0.002 sec)
 
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;
+----------+-------+
| count(*) | zone  |
+----------+-------+
|     1322 | zone1 |
|     1322 | zone2 |
|      152 | zone3 |
+----------+-------+
3 rows in set (0.228 sec)
  
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;
+----------+-------+
| count(*) | zone  |
+----------+-------+
|     1322 | zone1 |
|     1322 | zone2 |
|      664 | zone3 |
+----------+-------+
3 rows in set (0.113 sec)
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;                                                
+----------+-------+
| count(*) | zone  |
+----------+-------+
|     1322 | zone1 |
|     1322 | zone2 |
|     1179 | zone3 |
+----------+-------+
3 rows in set (0.112 sec)
  
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;
+----------+-------+
| count(*) | zone  |
+----------+-------+
|     1322 | zone1 |
|     1322 | zone2 |
|     1322 | zone3 |
+----------+-------+
3 rows in set (0.116 sec)

當3個zone內的分割槽個數一致後,同時檢視zone3已存在副本資訊,認為重建完畢。

由於79節點處於隔離狀態,所以還沒有leader副本。

MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             2 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             1 |
| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | follower |             3 |
+-----------------+-------+--------------+----------+----------+---------------+
6 rows in set (0.005 sec)

2.開啟故障節點服務

執行命令解除79節點的隔離狀態。

ALTER SYSTEM START SERVER '10.186.64.79:2882' ZONE='zone3';

查詢分割槽副本分佈如下,leader角色已遷回79節點。

MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |
| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             1 |
| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             2 |
| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | leader   |             1 |
| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | follower |             2 |
+-----------------+-------+--------------+----------+----------+---------------+

3.把server_permanent_offline_time引數的預知重新設定為預設的3600s

MySQL [oceanbase]> alter system set server_permanent_offline_time='3600s';
Query OK, 0 rows affected (0.028 sec)
 
MySQL [oceanbase]> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone  | svr_type | svr_ip       | svr_port | name                          | data_type | value | info                                                                                                                              | section      | scope   | source  | edit_level        |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone2 | observer | 10.186.64.75 |     2882 | server_permanent_offline_time | NULL      | 3600s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone1 | observer | 10.186.64.74 |     2882 | server_permanent_offline_time | NULL      | 3600s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone3 | observer | 10.186.64.79 |     2882 | server_permanent_offline_time | NULL      | 3600s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
3 rows in set (0.007 sec)

校驗各ob節點資料量

sysbench已執行結束,直連各observer,校驗資料量是一致的。

[root@localhost ~]#  obclient -h10.186.64.74 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench
Welcome to the OceanBase.  Commands end with ; or \g.
Your MySQL connection id is 3221545401
Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MySQL [sysbenchdb]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    53195 |
+----------+
1 row in set (0.036 sec)
 
MySQL [sysbenchdb]> exit
Bye
[root@localhost ~]#  obclient -h10.186.64.75 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench
Welcome to the OceanBase.  Commands end with ; or \g.
Your MySQL connection id is 3221823448
Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MySQL [sysbenchdb]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    53195 |
+----------+
1 row in set (0.040 sec)
 
MySQL [sysbenchdb]> exit
Bye
[root@localhost ~]#  obclient -h10.186.64.79 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench
Welcome to the OceanBase.  Commands end with ; or \g.
Your MySQL connection id is 3222011907
Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MySQL [sysbenchdb]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    53195 |
+----------+
1 row in set (0.037 sec)
 
MySQL [sysbenchdb]>

總結

資料檔案損壞或者丟失時,可透過調整引數 server_permanent_offline_time 來重建受影響的節點。

1.設小 server_permanent_offline_time 閾值

2.停止故障節點對外服務

3.終止該節點程式。

4.超過閾值後,節點將被標記為永久下線,系統會自動清空副本以及向同zone內其他節點遷移資料。

5.啟動 OB 程式,自動觸發重建節點資料。

6.開啟故障節點服務。

7.把server_permanent_offline_time引數改回原來的值

相關文章