RAC資料庫建立STANDBY(四)

yangtingkun發表於2009-05-28

這篇文章描述為RAC環境建立STANDBY資料庫。

由於篇幅限制,加上碰到了很多的bug,只能將文章拆分成多篇。

由於錯誤太多,導致一篇文章無法完全記錄下來,這一篇繼續記錄STANDBY資料庫建立過程中碰到的問題。

RAC資料庫建立STANDBY(一):http://yangtingkun.itpub.net/post/468/484988

RAC資料庫建立STANDBY(二):http://yangtingkun.itpub.net/post/468/485013

RAC資料庫建立STANDBY(三):http://yangtingkun.itpub.net/post/468/485054

 

 

剛剛在進行RAC環境的DUPLICATE DATABASE的時候,就碰到了很多問題,由於二者命令比較相似,本來認為這次不會碰到太多的問題,沒有想到的是,這次碰到的問題居然比DUPLICATE碰到的問題多出一倍。而且基本上所有碰到的問題都是DUPLICATE操作時不曾遇到的。

執行DUPLICATE DATABASE FOR STANDBY時出現下面的錯誤:

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:28:54 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G (DBID=1712482917)
connected to auxiliary database: RAC11G (not mounted)

RMAN> run
2> {
3> allocate channel c1 device type disk connect sys/test@rac11g1;
4> allocate channel c2 device type disk connect sys/test@rac11g2;
5> allocate auxiliary channel ac1 device type disk;
6> allocate auxiliary channel ac2 device type disk;
7> duplicate target database for standby
8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
9> dorecover
10> from active database
11> spfile
12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
14> set fal_client='RAC11G_S'
15> set fal_server='RAC11G'
16> set log_archive_dest_1='LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'
17> set log_archive_dest_2='SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'
18> set standby_archive_dest='+DATA/RAC11G'
19> set db_unique_name='rac11g_s';
20> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=619 instance=rac11g1 device type=DISK

allocated channel: c2
channel c2: SID=119 instance=rac11g2 device type=DISK

allocated channel: ac1
channel ac1: SID=112 device type=DISK

allocated channel: ac2
channel ac2: SID=39 device type=DISK

Starting Duplicate Db at 09-SEP-08

contents of Memory Script.:
{
   backup as copy reuse
   file  '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format
 '/data/oracle/product/11.1/database/dbs/orapwrac11g1'   file
 '/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format
 '+DATA/rac11g/spfilerac11g.ora'   ;
   sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";
}
executing Memory Script

Starting backup at 09-SEP-08
RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06
ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
continuing other job steps, job failed will not be re-run
released channel: c1
released channel: c2
released channel: ac1
released channel: ac2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:29:08
RMAN-03015: error occurred in stored script. Memory Script

RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06
ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3

RMAN> exit


Recovery Manager complete.

這個錯誤是由於連線源資料庫的服務名採用了RAC的總體配置,因此很可能連線到例項2上進行恢復,有兩種方法解決這個問題,一個是在任意節點上保證密碼檔案orapwrac11g1orapwrac11g2都存在,另一種方法更加簡單,可以連線源資料庫的時候指定唯一一個例項進行連線,比如這裡將RAC11G替換為RAC11G1

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:31:00 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G (DBID=1712482917)
connected to auxiliary database: RAC11G (not mounted)

RMAN> run
2> {
3> allocate channel c1 device type disk connect sys/test@rac11g1;
4> allocate channel c2 device type disk connect sys/test@rac11g2;
5> allocate auxiliary channel ac1 device type disk;
6> allocate auxiliary channel ac2 device type disk;
7> duplicate target database for standby
8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
9> dorecover
10> from active database
11> spfile
12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
14> set fal_client='RAC11G_S'
15> set fal_server='RAC11G'
16> set log_archive_dest_1='LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'
17> set log_archive_dest_2='SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'
18> set standby_archive_dest='+DATA/RAC11G'
19> set db_unique_name='rac11g_s';
20> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=619 instance=rac11g1 device type=DISK

allocated channel: c2
channel c2: SID=119 instance=rac11g2 device type=DISK

allocated channel: ac1
channel ac1: SID=113 device type=DISK

allocated channel: ac2
channel ac2: SID=39 device type=DISK

Starting Duplicate Db at 09-SEP-08

contents of Memory Script.:
{
   backup as copy reuse
   file  '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format
 '/data/oracle/product/11.1/database/dbs/orapwrac11g1'   file
 '/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format
 '+DATA/rac11g/spfilerac11g.ora'   ;
   sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";
}
executing Memory Script

Starting backup at 09-SEP-08
Finished backup at 09-SEP-08

sql statement: alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''

contents of Memory Script.:
{
   sql clone "alter system set  log_file_name_convert =
 ''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment=
 '''' scope=spfile";
   sql clone "alter system set  fal_client =
 ''RAC11G_S'' comment=
 '''' scope=spfile";
   sql clone "alter system set  fal_server =
 ''RAC11G'' comment=
 '''' scope=spfile";
   sql clone "alter system set  log_archive_dest_1 =
 ''LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'' comment=
 '''' scope=spfile";
   sql clone "alter system set  log_archive_dest_2 =
 ''SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'' comment=
 '''' scope=spfile";
   sql clone "alter system set  standby_archive_dest =
 ''+DATA/RAC11G'' comment=
 '''' scope=spfile";
   sql clone "alter system set  db_unique_name =
 ''rac11g_s'' comment=
 '''' scope=spfile";
   shutdown clone immediate;
   startup clone nomount ;
}
executing Memory Script

sql statement: alter system set  log_file_name_convert =  ''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment= '''' scope=spfile
released channel: c1
released channel: c2
released channel: ac1
released channel: ac2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:31:11
RMAN-03015: error occurred in stored script. Memory Script
RMAN-03009: failure of sql command on clone_default channel at 09/09/2008 16:31:11
RMAN-11003: failure during parse/execution of SQL statement: alter system set  log_file_name_convert =  '/dev/vx/rdsk/datavg', '+DATA/RAC11G' comment= '' scope=spfile
ORA-17510: Attempt to do i/o beyond file size
ORA-17510: Attempt to do i/o beyond file size

RMAN> exit


Recovery Manager complete.

檢查了一下,這個問題應該是ASMbug造成的。關於ORA-17510的錯誤有一些,但是沒有和當前情況相似的。

由於找不到解決這個問題的辦法,只好想辦法繞過這個bug。於是從這裡以後,採用了先在本地建立一個SPFILE的方式,將所有需要修改的引數都提前修改完成,避免在DUPLICATE命令的時候設定SPFILE引數。這樣Oracle就不會執行ALTER SYSTEM語句,也就不會導致SPFILE的動態擴充套件。

隨後利用SQLPLUS建立了本地的SPFILE,然後啟動到NOMOUNT狀態,透過RMAN連線卻報錯:

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:44:09 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G (DBID=1712482917)
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04006: error from auxiliary database: ORA-01031: insufficient privileges

這是因為RMAN剛才的恢復操作覆蓋了密碼檔案,所以這裡必須重建。從源資料庫再次複製密碼檔案到本地,再次執行:

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 17:25:24 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G (DBID=1712482917)
connected to auxiliary database: RAC11GS (not mounted)

RMAN> run
2> {
3> allocate channel c1 device type disk connect sys/test@rac11g1;
4> allocate channel c2 device type disk connect sys/test@rac11g2;
5> allocate auxiliary channel ac1 device type disk;
6> allocate auxiliary channel ac2 device type disk;
7> duplicate target database for standby
8> dorecover
9> from active database;
10> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=621 instance=rac11g1 device type=DISK

allocated channel: c2
channel c2: SID=628 instance=rac11g2 device type=DISK

allocated channel: ac1
channel ac1: SID=333 instance=rac11g1 device type=DISK

allocated channel: ac2
channel ac2: SID=306 instance=rac11g1 device type=DISK

Starting Duplicate Db at 09-SEP-08

contents of Memory Script.:
{
   backup as copy reuse
   file  '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format
 '/data/oracle/product/11.1/database/dbs/orapwrac11g1'   ;
}
executing Memory Script

Starting backup at 09-SEP-08
Finished backup at 09-SEP-08

contents of Memory Script.:
{
   backup as copy current controlfile for standby auxiliary format  '+DATA/rac11g/rac11g_control_1';
   restore clone controlfile to  '+DATA/rac11g/rac11g_control_2' from
 '+DATA/rac11g/rac11g_control_1';
   restore clone controlfile to  '+DATA/rac11g/rac11g_control_3' from
 '+DATA/rac11g/rac11g_control_1';
   sql clone 'alter database mount standby database';
}
executing Memory Script

Starting backup at 09-SEP-08
channel c1: starting datafile copy
copying standby control file
released channel: c1
released channel: c2
released channel: ac1
released channel: ac2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/09/2008 17:25:35
RMAN-03015: error occurred in stored script. Memory Script
RMAN-03009: failure of backup command on c1 channel at 09/09/2008 17:25:35
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-01031: insufficient privileges
ORA-17629: Cannot connect to the remote database server

RMAN> exit


Recovery Manager complete.

這個問題導致的原因比較複雜。在本地測試連線,沒有發現任何問題。在遠端用sys測試連線輔助例項,也沒有發現任何異常的情況。

但是隻要執行DUPLICATE命令就會報這個錯誤。

由於沒有其他可供參考的資訊,只能根據現有的錯誤資訊進行推測。首先報錯發生在CHANNEL c1上,而且是連線遠端資料庫伺服器。說明問題是在執行過程中,源資料庫嘗試連線輔助例項時出現的錯誤,懷疑問題可能與密碼檔案有關。

可是透過命令列測試沒有發現任何的問題。而且前面採用SPFILE方式的時候,也經歷了類似的步驟,為什麼沒有報錯。

透過觀察Oracle執行的指令碼發現了問題,Oracle執行了下面的操作:

contents of Memory Script.:
{
   backup as copy reuse
   file  '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format
 '/data/oracle/product/11.1/database/dbs/orapwrac11g1'   ;
}
executing Memory Script

這個操作時前面透過SPFILE方式所沒有的,這也是為什麼原來沒有報錯,而這裡報錯的原因。

不過問題又出現了,Oracle這個操作的目錄是將遠端的密碼檔案同步到本地伺服器上,使得後續的SYS連線可以正常登陸。這個操作本來是沒有任何問題的,而且實際上前面我就是採用了作業系統上面的類似操作,將源資料庫的密碼檔案同步到本地的。那麼為什麼Oraclerman執行了這個操作後,源資料庫再連線輔助例項的時候反而報錯了呢。

檢查了源資料庫的配置後,發現了問題的所在,源資料庫採用了裸裝置的方式,而本地的$ORACLE_HOME/dbs目錄下的密碼檔案只是一個連結而已。

$ ls -l
total 38404
-rw-rw----   1 oracle   oinstall    1552 Sep  5 16:09 hc_rac11g1.dat
-rw-r--r--   1 oracle   oinstall    8385 Sep 11  1998 init.ora
-rw-r--r--   1 oracle   oinstall   12920 May  3  2001 initdw.ora
-rw-r-----   1 oracle   oinstall      43 Jul 16 15:48 initrac11g1.ora
lrwxrwxrwx   1 oracle   oinstall      34 Jul 16 14:39 orapwrac11g1 -> /dev/vx/rdsk/datavg/rac11g_pwdfile
-rw-r-----   1 oracle   oinstall 19611648 Sep  8 18:27 snapcf_rac11g1.f

很可能Oracle在這裡執行備份的時候出現了錯誤,沒有采用裸裝置備份而是當作作業系統檔案進行了複製,導致恢復到輔助例項後,密碼檔案不可用。

於是先去掉了連線,直接將裸裝置中的密碼檔案複製到本地目錄,使得RMAN可以正常的複製這個檔案:

$ dd if=/dev/vx/rdsk/datavg/rac11g_pwdfile f=/data/rac11g_pwfile
204800+0 records in
204800+0 records out
bash-3.00$ rm orapwrac11g1
bash-3.00$ rm orapwrac11g2
bash-3.00$ cp /data/rac11g_pwfile orapwrac11g1

再次執行DUPLICATE,則許可權不足的問題被解決。

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/4227/viewspace-604418/,如需轉載,請註明出處,否則將追究法律責任。

相關文章