Availability Group On Linux 搭建後記
create availability group [InvestorAG]
with (db_failover=on,cluster_type=external)
for replica on
N'192.168.1.10'
with (
endpoint_url = N'tcp://192.168.1.10:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic
),
N'192.168.1.11'
with (
endpoint_url = N'tcp://192.168.1.11:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic
),
N'192.168.1.12'
with (
endpoint_url = N'tcp://192.168.1.12:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic
);
alter availability group [InvestorAG]
grant create any database ;
使用這份指令碼來搭建 Availability Group 的時候,會爆出如下的錯誤:
訊息 35237,級別 16,狀態 1,第 1 行
None of the specified replicas for availability group InvestorAG maps to the instance of SQL Server to which you are connected. Reenter the command, specifying this server instance to host one of the replicas. This replica will be the initial primary replica.
使用如下命令即可:
create availability group [InvestorAG]
with (db_failover=on,cluster_type=external)
for replica on
N'centos00'
with (
endpoint_url = N'tcp://192.168.1.10:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic
),
N'centos01'
with (
endpoint_url = N'tcp://192.168.1.11:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic
),
N'centos02'
with (
endpoint_url = N'tcp://192.168.1.12:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic
);
新建表以及裝載資料
create table dbo.DimTime(TimeID bigint, DayX int, FinMonth int, FinYear Int, FinDate datetime)
GO
Insert into DimTime(TimeID,DayX,FinMonth,FinYear,FinDate) values(1,1,1,2018,'2018-01-01')
go
SELECT * FROM DimTime
GO
如果 SQL Server 再聰明一點,這一步已經足夠讓其他的副本也啟動起來了, 但遺憾的是,我去檢查 192.168.1.11 和 192.168.1.12 什麼都沒有發生。
因為我並沒有給新建的 Availability Group 新增同步的資料庫 Investor, 只是AG 的名字用了 InvestorAG, 並沒有同時指定資料庫。
通過檢查新建的 hadr session 和內建的 AlwaysOn_health session, 倒是有點收穫:
-- 新建一個 all-in 的 session
CREATE EVENT SESSION [All_AlwaysOn] ON SERVER
ADD EVENT sqlserver.alwayson_ddl_executed,
ADD EVENT sqlserver.availability_group_lease_expired(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.availability_replica_automatic_failover_validation(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.availability_replica_database_fault_reporting(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.availability_replica_manager_state_change(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.availability_replica_state(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.availability_replica_state_change(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.hadr_ddl_failover_execution_state(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.hadr_read_only_route_preconditions(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.hadr_undo_of_redo_log_scan(
ACTION(package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.context_info,sqlserver.database_name,sqlserver.is_system,sqlserver.nt_username,sqlserver.server_instance_name,sqlserver.session_nt_username,sqlserver.session_server_principal_name,sqlserver.sql_text,sqlserver.username))
ADD TARGET package0.event_file(SET filename=N'All_AlwaysOn')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
-- 內建的 AlwaysOn_health session
CREATE EVENT SESSION [AlwaysOn_health] ON SERVER
ADD EVENT sqlserver.alwayson_ddl_executed,
ADD EVENT sqlserver.availability_group_lease_expired,
ADD EVENT sqlserver.availability_replica_automatic_failover_validation,
ADD EVENT sqlserver.availability_replica_manager_state_change,
ADD EVENT sqlserver.availability_replica_state,
ADD EVENT sqlserver.availability_replica_state_change,
ADD EVENT sqlserver.error_reported(
WHERE ([error_number]=(9691) OR [error_number]=(35204) OR [error_number]=(9693) OR [error_number]=(26024) OR [error_number]=(28047) OR [error_number]=(26023) OR [error_number]=(9692) OR [error_number]=(28034) OR [error_number]=(28036) OR [error_number]=(28048) OR [error_number]=(28080) OR [error_number]=(28091) OR [error_number]=(26022) OR [error_number]=(9642) OR [error_number]=(35201) OR [error_number]=(35202) OR [error_number]=(35206) OR [error_number]=(35207) OR [error_number]=(26069) OR [error_number]=(26070) OR [error_number]>(41047) AND [error_number]<(41056) OR [error_number]=(41142) OR [error_number]=(41144) OR [error_number]=(1480) OR [error_number]=(823) OR [error_number]=(824) OR [error_number]=(829) OR [error_number]=(35264) OR [error_number]=(35265) OR [error_number]=(41188) OR [error_number]=(41189))),
ADD EVENT sqlserver.hadr_db_partner_set_sync_state,
ADD EVENT sqlserver.lock_redo_blocked
ADD TARGET package0.event_file(SET filename=N'AlwaysOn_health.xel',max_file_size=(5),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
定義中唯一的區別就是 sqlserver.error_reported 事件,在 AlwaysOn_health 定義中新增了這份事件,而在 All_AlwaysOn 自定義中卻沒有。導致的結局是 AlwaysOn_health 提供了一些蛛絲馬跡可供調查:
message A connection timeout has occurred while attempting to establish a connection to availability replica ‘centos01’ with id [D42B7002-C59F-47A3-9E39-7980F399AC91]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
通過執行命令:
select * from sys.dm_hadr_availability_replica_states
發現 D42B7002-C59F-47A3-9E39-7980F399AC91 在結果集中是脫線的,disconnected.
登入 centos01 ,執行下列命令:
select * from sys.dm_hadr_availability_replica_states
select * from sys.dm_hadr_availability_replica_cluster_states
select * from sys.dm_hadr_availability_replica_cluster_nodes
select * from sys.dm_hadr_availability_group_states
可知,centos01 上沒有新增到 InvestorAG 中來的 replica, group 和 replica 唯一確定了參與 Availability Group 的成員,即 group_id 與 replica_id 是組合唯一鍵。
在每臺從庫上執行下面命令:
ALTER AVAILABILITY GROUP [InvestorAG]
Join WITH(cluster_type=external);
ALTER AVAILABILITY GROUP [InvestorAG]
Grant Create Any Database ;
在 AlwaysOn_health 會話中,找到如下錯誤:
message A connection for availability group ‘InvestorAG’ from availability replica ‘centos01’ with id [D42B7002-C59F-47A3-9E39-7980F399AC91] to ‘centos00’ with id [F7830D02-7E2E-4494-82FD-D1CC700CA108] has been successfully established. This is an informational message only. No user action is required.
但這個錯誤顯然是讓人高興的。因為再次去主庫上查詢狀態時,centos01 從庫已經上線了。
但此時 Investor 資料庫還沒有完全上線,即從庫 centos01 還沒有接收到主庫同步過來的 Investor 資料庫。因為我還沒有把資料庫 Investor 加入到 InvestorAG 中來。
需在 master 為當前資料庫的情況下,執行新增 AG 資料庫的操作,否則有這麼個錯誤:
訊息 35208,級別 16,狀態 241,第 32 行
Availability-group DDL operations are permitted only when you are using the master database. Run the USE MASTER command, and retry your availability-group DDL command.
use master
go
alter availability group InvestorAG
add database investor
看上去很簡單, 我認為此時資料庫已經同步到每臺從庫上了,且每臺從庫也都有建立這個庫了,但是並不能訪問
訊息
The target database, ‘investor’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.
似乎在這裡我忘記了這麼一個概念:高可用架構保障的就是當前叢集中只有一臺伺服器在提供者某一種服務,而當前配置的 Availability Group 架構是三臺實時同步的伺服器,提供了某一個資料庫的7*24不發生故障,持續提供服務的功能。如果需要某一臺副本資料庫可供訪問,那麼配置兩臺實時同步,另一臺只讀,這樣便可提供讀寫分離的服務。
所以還是要思考兩個問題:
1 在高可用的叢集中,衍生出一臺來做讀寫分離,另外兩臺提供高可用服務
2 實現高可用的故障轉移,即主庫失效則自動切換到從庫,以提供持續服務
第一個問題,在配置的時候,只要選擇可供只讀訪問:
create availability group sap
with (db_failover=on,cluster_type=external)
for replica on
N'centos00'
with (
endpoint_url = N'tcp://192.168.1.10:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic
),
N'centos01'
with (
endpoint_url = N'tcp://192.168.1.11:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic,
secondary_role (
allow_connections = read_only
)
),
N'centos02'
with (
endpoint_url = N'tcp://192.168.1.12:5022',
availability_mode=synchronous_commit,
failover_mode=external,
seeding_mode=automatic,
secondary_role (
allow_connections = read_only
)
);
當下最重要的話題,先配置三臺實時同步的高可用叢集,實現自動化故障轉移。
有兩篇外文,值得參考:
https://blog.dbi-services.com/introducing-alwayson-availability-groups-on-linux/
https://blog.dbi-services.com/introducing-high-availability-with-sql-server-on-linux/
在之前第一遍嘗試搭建 Availability Groups 過程中,啟動切換故障的主庫不成功,要分析原因,就要做一些測試點:
[root@centos00 log]# pcs status
Cluster name: crmcluster
Stack: corosync
Current DC: centos00 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition WITHOUT quorum
Last updated: Sun Aug 26 15:26:43 2018
Last change: Tue Aug 21 23:33:46 2018 by root via cibadmin on centos00
3 nodes configured
4 resources configured
Online: [ centos00 ]
OFFLINE: [ centos01 centos02 ]
Full list of resources:
Master/Slave Set: crmag_cluster-master [crmag_cluster]
Stopped: [ centos00 centos01 centos02 ]
virtualip (ocf::heartbeat:IPaddr2): Stopped
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
通過 pcs status 命令看到部分原因了: Master/Slave set 和 virtualip 都停止了。
[root@centos02 ~]# pcs status
Cluster name: crmcluster
Stack: corosync
Current DC: centos00 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Aug 29 06:15:23 2018
Last change: Wed Aug 29 06:15:01 2018 via centos02 on centos00
3 nodes configured
4 resources configured
Online: [ centos00 centos01 centos02 ]
Full list of resources:
Master/Slave Set: crmag_cluster-master [crmag_cluster]
Masters: [ centos01 ]
Slaves: [ centos00 centos02 ]
virtualip (ocf::heartbeat:IPaddr2): Started centos01
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
在每一臺伺服器上開啟叢集服務,即可啟動整個叢集。
pcs cluster start
pcs cluster enable
有趣的是,誤打誤撞開啟了本該是從庫的伺服器為叢集之首(master), 結果訪問 virtual ip (192.168.1.15) 的時候,連上的是 192.168.1.11,並不能開啟可讀可寫的 Investor 資料庫,該資料庫才是我們加入 availability group,保障高可用的目標。
此時我們在 192.168.1.15 的例項上面新建一個資料庫,可以看到該資料庫在 192.168.1.12 上也同時建立了。
關閉 centos02 的叢集,再新建一個資料庫,看看如何:
pcs cluster stop
新建的資料庫,並不能同步到 192.168.1.12 的例項中了。
但卻無意中,將原本一直連線不同的 crm 資料庫給同步起來了。
這樣的話,要測試是否自動切換就能簡單了,直接斷掉 192.168.1.11 (現在是 cluster mater ).
[root@centos01 ~]# pcs cluster stop
Error: Stopping the node will cause a loss of the quorum, use --force to override
[root@centos02 ~]# pcs status
Cluster name: crmcluster
Stack: corosync
Current DC: centos00 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Aug 29 07:11:01 2018
Last change: Tue Aug 21 23:33:46 2018 by root via cibadmin on centos00
3 nodes configured
4 resources configured
Online: [ centos00 centos02 ]
OFFLINE: [ centos01 ]
Full list of resources:
Master/Slave Set: crmag_cluster-master [crmag_cluster]
Masters: [ centos00 ]
Slaves: [ centos02 ]
Stopped: [ centos01 ]
virtualip (ocf::heartbeat:IPaddr2): Started centos00
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
自動切換實現了
所有的連線如果指向的是 192.168.1.15 那麼背後的 sql server instance 自動切換,都是沒有影響的。借用《子彈飛》裡劉嘉玲的那句話,“我只想當縣長夫人,至於縣長是誰,那並不重要!”。
相關文章
- Linux 上配置 SQL Server Always On Availability GroupLinuxSQLServerAI
- Sql Server Linux(Redhat) Distributed Availability Group Setup — step by stepSQLServerLinuxRedhatAI
- SQL Server Availability Group Failover 測試SQLServerAI
- MYSQL Group Replication搭建過程記錄MySql
- [AlwaysOn2017] AlwaysOn的DMV和DMF - Sys.availability_group_listenersAI
- 【Mysql】MySQL5.7.17- Group Replication搭建MySql
- [AlwaysOn] 建立SQL Server高可用性組T-SQL語法:AVAILABILITY GROUP ON子句SQLServerAI
- [AlwaysOn2017] AlwaysOn的DMV和DMF - Sys.dm_hadr_availability_group_statesAI
- MGR(MySQL Group Replication)部署搭建測試MySql
- MySQL-HA MHA(Master High Availability manager)搭建手冊MySqlASTAI
- [AlwaysOn2017] AlwaysOn的DMV和DMF -Sys.availability_group_listener_ip_addressesAI
- Overview of Availability in a CDBViewAI
- Availability Check 概念AI
- 分分鐘搭建MySQL Group Replication測試環境(r11筆記第83天)MySql筆記
- MySQL 5.7.17 Group Relication(組複製)搭建手冊MySql
- Linux下TimesTen主備搭建、重建cache group、重建備機操作[TimesTen運維]Linux運維
- Linux 搭建互信後,仍需要密碼驗證Linux密碼
- 分分鐘搭建MySQL Group Replication測試環境(二)(r12筆記第41天)MySql筆記
- Linux筆記01:概述以及環境搭建Linux筆記
- Linux中建立group是什麼意思?Linux
- SCM600--Availability CheckAI
- vgchange -c y後的【Skipping clustered volume group 】GC
- VPS折騰記七搭建稍後閱讀應用
- 配置Mysql Group Replication遇到的問題筆記MySql筆記
- Performance and High-Availability OptionsORMAI
- oracle之dataguard switch_availabilityOracleAI
- High Availability (HA) in SQL ServerAISQLServer
- 【LINUX】LINUX PHP搭建LinuxPHP
- [MySQL] 分組排序取前N條記錄以及生成自動數字序列,類似group by後 limitMySql排序MIT
- Linux 學習筆記(二):搭建個人Git伺服器Linux筆記Git伺服器
- linux平臺開發自學筆記(一)-環境搭建Linux筆記
- Linux搗鼓記錄:快速搭建alist+aria2+qbittorrentLinux
- sql中的left切割 與 group by後的兩次分組SQL
- (二)《SQL進階教程》學習記錄--GROUP BY、PARTITION BYSQL
- Django筆記十七之group by 分組用法總結Django筆記
- linux搭建FabricLinux
- Linux OpenGrok搭建Linux
- Availability and Optimization of Free Space in a Data Block(五)AIBloC