Best Practices for failover during server failures [ID 1323472.1]
Applies to:
Oracle GoldenGate - Version: 10.0.0.0 to 11.1.1.0.0 - Release: 10.0.0 to 11.1.1
Information in this document applies to any platform.
Goal
This document explains the following high-availability scenario and
provides best-practice suggestions:
Simplified scenario:
Machine A is primary, B is standby,
and C is replicated via GoldenGate to A.
- Extract process runs on C
- Datapump process runs on C
- Replicat process runs on A
The datapump process on C always writes its trail file on A.
Therefore, when a failover from A to B happens, the trail file that C uses is
no longer there, and the datapump hangs.
Even if GoldenGate is restarted on
B, the original trail from A is not there anymore.
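To make the dependency concrete, the following Python sketch renders the kind of datapump parameter file the scenario implies on C. Everything in it is an assumption for illustration: the group name `pmpc`, host `hosta`, manager port 7809, remote trail `./dirdat/ra`, the `app.*` table spec, and the use of PASSTHRU. The point is that RMTHOST names A, so when A fails, the datapump has nowhere to write.

```python
def pump_params(group: str, rmthost: str, mgrport: int,
                rmttrail: str, tables: list[str]) -> str:
    """Render a datapump parameter file (all values are hypothetical)."""
    lines = [
        f"EXTRACT {group}",
        "PASSTHRU",  # assumption: the pump does no filtering or mapping
        f"RMTHOST {rmthost}, MGRPORT {mgrport}",  # the target machine (A here)
        f"RMTTRAIL {rmttrail}",  # trail written on the remote host, not on C
    ]
    lines += [f"TABLE {t};" for t in tables]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pump_params("pmpc", "hosta", 7809, "./dirdat/ra", ["app.*"]))
```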
How should this situation be addressed?
Can we simply recopy the trail file from A to B and then start GoldenGate?
What if machine A is no longer accessible, so the trail file on it is
lost?
Solution
Can we simply recopy the trail file from A to B and then start GoldenGate? Yes.
In fact, some customers who do all their processing at night write all their
trails on the local machine, then after hours zip them and ftp them to the
target. They unzip them on the target and start the replicat at the
appropriate spot.
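The zip-and-ship step can be sketched in Python as follows, assuming the trail files are reachable as local paths on each side and that the transfer itself (ftp, scp, or similar) happens between `pack_trail` and `unpack_trail`. Both function names are hypothetical. A SHA-256 digest of the original file travels alongside the archive so the target can reject a damaged copy, for example one sent over ftp in ASCII rather than binary mode.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def pack_trail(trail: Path, out_dir: Path) -> tuple[Path, str]:
    """Gzip one trail file into out_dir; return (archive, digest of original)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / (trail.name + ".gz")
    with trail.open("rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return archive, sha256_of(trail)


def unpack_trail(archive: Path, dest_dir: Path, expected_sha256: str) -> Path:
    """Gunzip on the target, verify the digest, then replace any same-named trail."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / archive.name.removesuffix(".gz")
    tmp = target.with_name(target.name + ".part")  # never expose a half-written trail
    with gzip.open(archive, "rb") as src, tmp.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    if sha256_of(tmp) != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {target.name}")
    tmp.replace(target)  # rename over the existing file of the same name, if any
    return target
```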
Writing the trail locally first is always best. The trail on A could be lost if
something happens to A, but if you have used MINKEEP on C (so trail files are
retained there for a minimum period), the trail is still on C and can be moved
anywhere else.
We also suggest that clients write their first trails locally and have enough
space for a minkeep of 4 days. That way, if you lose the connection or the
target node over a 3-day weekend, there is still one day to move the files
elsewhere and nothing is lost.
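The retention advice maps onto the Manager's PURGEOLDEXTRACTS parameter, which accepts MINKEEPDAYS (as well as MINKEEPHOURS and MINKEEPFILES). The Python sketch below renders such a rule and does the note's arithmetic; the trail path `./dirdat/*` and the daily trail volume are hypothetical inputs to replace with your own.

```python
def purge_rule(trail_glob: str, minkeep_days: int) -> str:
    """Manager parameter line retaining trail files for at least `minkeep_days`.

    USECHECKPOINTS makes Manager consult reader checkpoints before purging;
    MINKEEPDAYS sets the minimum retention period.
    """
    return f"PURGEOLDEXTRACTS {trail_glob}, USECHECKPOINTS, MINKEEPDAYS {minkeep_days}"


def required_space_gb(daily_trail_gb: float, minkeep_days: int) -> float:
    """Disk needed on C to hold `minkeep_days` worth of local trail."""
    return daily_trail_gb * minkeep_days


def spare_days(minkeep_days: int, outage_days: int) -> int:
    """Days left to move trail files elsewhere once an outage ends."""
    return minkeep_days - outage_days


if __name__ == "__main__":
    print(purge_rule("./dirdat/*", 4))
    # The note's case: 4 days of retention, a 3-day weekend outage.
    print(spare_days(minkeep_days=4, outage_days=3))  # 1
```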
Here is a more detailed step-by-step procedure:
Shut down A (this assumes A is no longer accessible).
1- On C, run ggsci and find which trail the datapump on C was writing to A:
ggsci> info exttrail *, detail
2- Zip the trail, ftp it to B, and unzip it there, replacing any existing file of the same name.
(The copy from C should be more complete and is not at risk of having been damaged by whatever caused the abort.)
3- Add HANDLECOLLISIONS to the replicat parameter file.
4- Alter the replicat to read from RBA 0 of the trail you just copied over, and let the replicat process that trail to the end.
5- Then, on C, stop the datapump extract. In the datapump extract parameters, add a new RMTTRAIL to be written to B and comment out the old trail.
6- Change RMTHOST to point to B.
7- Issue an ADD RMTTRAIL command in ggsci to associate the datapump extract on C with the new trail to be written on B.
8- Start the datapump extract on C. It now writes the new trail name, starting at extseqno 0, extrba 0, to B.
9- On B, stop the replicat, alter it to read from the newly written trail name at extseqno 0, extrba 0, and start it.
10- After a few minutes of running, stop the replicat.
11- Remove HANDLECOLLISIONS and restart the replicat.
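As a checklist, steps 3 to 11 can be written down as the GGSCI commands they imply. The Python sketch below only builds and prints that sequence; it does not run GGSCI. The group names (`pmpc`, `repa`), host `hostb`, trail `./dirdat/rb`, and the sequence number are hypothetical. The STOP REPLICAT before each ALTER is added because ALTER expects a stopped group, and the EXTTRAIL option on ALTER REPLICAT in step 9 is an assumption to confirm against the GGSCI reference for your release. Parameter-file edits and waits are shown as manual steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str             # machine where the action is performed
    action: str           # GGSCI command text, or a description of manual work
    manual: bool = False  # True for parameter-file edits and waits


def failover_plan(pump: str, replicat: str, copied_seqno: int,
                  new_host: str, new_trail: str) -> list[Step]:
    """Steps 3-11 of the note as an ordered list (all names hypothetical)."""
    return [
        # Steps 3-4: replay the trail copied from C, tolerating duplicates.
        Step("B", f"Add HANDLECOLLISIONS to the {replicat} parameter file", manual=True),
        Step("B", f"STOP REPLICAT {replicat}"),
        Step("B", f"ALTER REPLICAT {replicat}, EXTSEQNO {copied_seqno}, EXTRBA 0"),
        Step("B", f"START REPLICAT {replicat}"),
        Step("B", f"Wait until {replicat} has processed the copied trail to its end", manual=True),
        # Steps 5-8: repoint the datapump on C at B with a brand-new trail.
        Step("C", f"STOP EXTRACT {pump}"),
        Step("C", f"Set RMTHOST {new_host} and RMTTRAIL {new_trail}; comment out the old RMTTRAIL", manual=True),
        Step("C", f"ADD RMTTRAIL {new_trail}, EXTRACT {pump}"),
        Step("C", f"START EXTRACT {pump}"),
        # Step 9: switch the replicat to the new trail from its beginning.
        Step("B", f"STOP REPLICAT {replicat}"),
        Step("B", f"ALTER REPLICAT {replicat}, EXTTRAIL {new_trail}, EXTSEQNO 0, EXTRBA 0"),
        Step("B", f"START REPLICAT {replicat}"),
        # Steps 10-11: after a few minutes, drop HANDLECOLLISIONS.
        Step("B", "Let the replicat run for a few minutes", manual=True),
        Step("B", f"STOP REPLICAT {replicat}"),
        Step("B", f"Remove HANDLECOLLISIONS from the {replicat} parameter file", manual=True),
        Step("B", f"START REPLICAT {replicat}"),
    ]


def render(plan: list[Step]) -> str:
    """One numbered line per step, tagged with the host and step kind."""
    lines = []
    for i, s in enumerate(plan, 1):
        kind = "(manual)" if s.manual else "ggsci>"
        lines.append(f"{i:2d}. [{s.host}] {kind} {s.action}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(failover_plan("pmpc", "repa", 12, "hostb", "./dirdat/rb")))
```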
This presumes that once A is down, it's down. If you can still run ggsci on A,
things are simpler because you can check the replicat's checkpoints, which
lets you be sure where to restart.
If B has the replicat checkpoints that were current on A when the abend
happened, you can simply replace the existing trail file on B with the zipped
one from C. The copy on C may contain more data that would otherwise be missed.
This also assumes that the replicat was running current or nearly current. If
there was a long lag, you may also need earlier trail files that had not yet
been processed.
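When the replicat was lagging, the files to copy from C are every trail file from the replicat's checkpoint up to the last one the datapump wrote. The Python sketch below lists them, assuming the classic naming used by the releases this note covers (a two-character trail prefix followed by a six-digit sequence number) and assuming, as the procedure above implies, that the copies on C carry the same names and sequence numbers as the files the replicat was reading. The checkpoint sequence number would come from something like `INFO REPLICAT repa, SHOWCH` if ggsci on A still works; `repa` and `./dirdat/ra` are hypothetical.

```python
def trail_file(trail: str, seqno: int) -> str:
    """Classic trail file name: the trail prefix plus a six-digit sequence number."""
    if not 0 <= seqno <= 999_999:
        raise ValueError("sequence number does not fit a six-digit suffix")
    return f"{trail}{seqno:06d}"


def files_to_copy(trail: str, replicat_seqno: int, last_written_seqno: int) -> list[str]:
    """Trail files from the replicat's read checkpoint through the last one written.

    replicat_seqno: sequence number in the replicat's read checkpoint
    last_written_seqno: last sequence number the datapump wrote for A
    """
    if last_written_seqno < replicat_seqno:
        raise ValueError("the replicat cannot be ahead of the trail writer")
    return [trail_file(trail, n) for n in range(replicat_seqno, last_written_seqno + 1)]


if __name__ == "__main__":
    # Replicat checkpoint in file 11, datapump had reached file 13.
    for name in files_to_copy("./dirdat/ra", 11, 13):
        print(name)
```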
This procedure works, but it depends on how much you know about the state of
the replicat's lag and checkpoint. If you understand how these work, you can
fail over safely. Otherwise, in a production scenario, get help from OGG
support so they can make sure no data is missed and as little data as possible
is reprocessed.
From "ITPUB Blog", link: http://blog.itpub.net/161195/viewspace-1056336/. Please credit the source when reposting; otherwise legal action will be taken.