https://docs.microsoft.com/en-us/archive/blogs/sql_server_team/sql-server-20162017-availability-group-secondary-replica-redo-model-and-performance

SQL Server 2016 enables parallel redo by default.

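Two ways to disable parallel redo
1. Run DBCC TRACEON (3459, -1); no SQL Server service restart is needed.
2. Add -T 3459 to the SQL Server startup parameters (this takes effect at the next service start).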

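Switching back from serial redo to parallel redo always requires a SQL Server restart
If -T 3459 is among the startup parameters, remove it and then restart SQL Server.
If the flag was set with DBCC TRACEON (3459, -1), simply restart SQL Server.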

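Symptom: about 20 databases on wondadb3 are in an AG that replicates to dbprod126, and dbprod126 has only 24 CPUs. AG transfer from wondadb3.wondb to dbprod126.wondb was very slow. We removed every database from the AG and then re-added them, adding wondb first and all the others afterwards. After that, AG transfer from wondadb3.wondb to dbprod126.wondb was much faster. This points to a weakness in the parallel redo design of SQL Server 2016: once there are many databases, the ones at the back of the order can only use serial redo.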

Explanation from the official documentation:
Parallel redo thread usage is well covered in "Thread usage by Availability Groups" here.

A SQL Server instance uses up to 100 threads for parallel redo for secondary replicas. Each database uses up to one-half of the total number of CPU cores, but not more than 16 threads per database. If the total number of required threads for a single instance exceeds 100, SQL Server uses a single redo thread for every remaining database.

When the host server has 32 or more CPU cores, each database will occupy 16 parallel redo worker threads and one helper worker thread. It means that all databases starting with the 7th database (ordered by database id ascending) that has joined availability group it will be in single thread redo or serial redo irrespective which database has actual redo workload. If a SQL Server Instance has a number of databases, and it is desired for a particular database to run under parallel redo model, the database creation order needs to be considered. Same idea can be applied to force a database always runs in serial redo model as well. Again, in SQL Server instance level, the way to switch between parallel redo and serial redo is the TF 3459. All databases in the same SQL Server instance will be switched together. Also, to switch from serial redo to parallel redo by disable TF 3459, a SQL Server service restart is required.

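Calculating the maximum number of parallel redo threads per database
An AG secondary replica instance has at most 100 parallel redo threads. Each database gets at most half the number of CPU cores in threads, and never more than 16. Take a server with 24 CPUs and 20 databases in the AG: each database can use up to 12 threads (24/2), so only about 8 databases (100/12 ≈ 8) can run parallel redo, and the remaining 12 fall back to serial redo.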
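The calculation above can be sketched as a small model. This is a simplified illustration, not SQL Server's actual scheduler: the function names are hypothetical, helper threads are ignored, and it assumes that databases are served strictly in the order given and that a database which cannot get its full share of threads falls back to serial redo.

```python
# Simplified model of parallel redo thread allocation on an AG secondary.
# Assumptions (not confirmed by the source): helper threads are not counted,
# and a database that cannot get its full per-database share runs serial redo.

MAX_REDO_THREADS_PER_INSTANCE = 100  # instance-wide limit quoted from the docs
MAX_REDO_THREADS_PER_DATABASE = 16   # per-database limit quoted from the docs


def threads_per_database(cpu_cores):
    """Half the CPU cores, capped at 16, and at least 1."""
    return max(1, min(cpu_cores // 2, MAX_REDO_THREADS_PER_DATABASE))


def assign_redo_mode(cpu_cores, databases_in_order):
    """Return {database name: 'parallel' or 'serial'} for the given order."""
    per_db = threads_per_database(cpu_cores)
    remaining = MAX_REDO_THREADS_PER_INSTANCE
    modes = {}
    for name in databases_in_order:
        if per_db > 1 and remaining >= per_db:
            modes[name] = "parallel"
            remaining -= per_db
        else:
            modes[name] = "serial"
    return modes


if __name__ == "__main__":
    # 24 CPUs and 20 databases, as in the example above.
    modes = assign_redo_mode(24, [f"db{i}" for i in range(1, 21)])
    parallel = [name for name, mode in modes.items() if mode == "parallel"]
    print(threads_per_database(24), len(parallel))
```

Under these assumptions the 24-CPU case gives 12 threads per database and 8 parallel databases, and a 32-core server gives 6 parallel databases, which matches the "starting with the 7th database" statement in the quoted documentation.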

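Conclusions from practice
1. The "order of joining the AG" here is not the order in which the databases were first added to the AG. It is the order in which the databases join the AG after they start up.
Suppose there are 10 databases named 1 through 10. At first database 1 gets parallel redo, but it generates a particularly large amount of log. One day the secondary restarts unexpectedly, and database 1 on the secondary takes a long time to recover, so it cannot enter the Synchronizing state promptly. Meanwhile the other databases finish recovery normally and enter Synchronizing right away, so database 1 ends up at the back of the order and no longer gets parallel redo.
2. Suppose one database is synchronizing slowly because it is not using parallel redo: 100 GB is still queued, the synchronization rate is only 50 MB/s, and it needs about another hour to finish. In this case, on the primary, remove all databases from the AG, including the slow one. After removal, the secondary shows databases that had finished synchronizing as Restoring and databases that had not as Not Synchronizing. Wait the hour or so until Not Synchronizing changes to Restoring, then add the database back into the AG. The add itself is quick, and the database will show (Initializing/In recovery). The log that the primary generated during that hour between Not Synchronizing and Restoring still has to be synchronized to the secondary, and serially that would take about 30 minutes. But because this database now joins the AG first, it uses parallel redo. With a degree of parallelism of, say, 8, the (Initializing/In recovery) phase also finishes quickly, in roughly 4 minutes (30 minutes / 8).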

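3. To check whether an AG database is using parallel redo, query sys.sysprocesses and look at the cmd column for that database. If it shows "PARALLEL REDO TA", the database is using parallel redo.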
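The catch-up estimate in item 2 is simple division. A minimal sketch of that arithmetic follows, with a hypothetical function name and the loud assumption that redo time scales linearly with the number of redo threads, which real workloads will not achieve exactly.

```python
# Rough catch-up estimate, assuming ideal linear speedup from parallel redo.
# The function name and the linear-scaling assumption are illustrative only.

def parallel_catchup_minutes(serial_minutes, redo_threads):
    """Estimate redo time when the work is spread over redo_threads threads."""
    if redo_threads < 1:
        raise ValueError("redo_threads must be at least 1")
    return serial_minutes / redo_threads


if __name__ == "__main__":
    # 30 minutes of serial redo spread over 8 threads, as in item 2.
    print(parallel_catchup_minutes(30, 8))  # 3.75
```

Treat the result as a lower bound on the time required.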

The problem in short: SQL Server 2016 has parallel redo enabled, but for some databases it does not actually take effect.
