Windows下RAC歸檔滿和ORA-04031問題
Windows下RAC歸檔滿和ORA-04031問題
(V 1.0)
文章版權所有Jusin Hao(luckyfriends),支援原創,轉載請註明。
2012年8月
版本資訊
日期 | 版本 | 描述 | 作者/修改人 | 備註 |
2012-8-31 | 1.0 | 建立 | Jusin Hao | |
目 錄
本文件用於記錄******安裝操作過程。
本文件包含****的安裝操作等內容。
:歸檔滿現象及環境資訊:
# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
10.139.5.13 db01
10.139.5.14 db02
10.139.5.15 db01-vip
10.139.5.16 db02-vip
10.139.5.17 scan
10.10.10.1 db01-pri
10.10.10.2 db02-pri
10.139.5.20 dbbackup
10.139.5.21 appbackup
SQL> select inst_id,instance_name from gv$instance;
INST_ID INSTANCE_NAME
---------- ----------------
1 jchr1
2 jchr2
SQL> show parameter db_recovery
NAME TYPE VALUE
------------------------------------ ----------- -----------------
db_recovery_file_dest string
db_recovery_file_dest_size big integer 20G
SQL> archive log list
資料庫日誌模式 存檔模式
自動存檔 啟用
存檔終點 +DATA
最早的聯機日誌序列 31075
下一個存檔日誌序列 31076
當前日誌序列 31076
SQL>
SQL> select flashback_on from v$database;
FLASHBACK_ON
------------------
NO
指令碼delete.bat:
rman target / nocatalog CMDFILE 'D:\d-arch\rmanbackup.txt' LOG 'D:\d-arch\rman_backup_%DATE:~0,4%%DATE:~5,2%%DATE:~8,2%".log'
指令碼D:\d-arch\rmanbackup.txt
delete noprompt archivelog ALL COMPLETED BEFORE "sysdate - 6";
檢視指令碼日誌發現指令碼是執行成功的;
但是客戶經常報歸檔空間滿,由上面引數,我們知道其歸檔日誌是存放在+DATA卷組中的(ASM卷組)
:分析:
1、常規的:我們可以手動備份後清除日誌:
1)backup format '/install_source/rman_bak/arch_%d_%U' archivelog all delete input; --刪除已經備份過的歸檔
2)delete noprompt archivelog ALL COMPLETED BEFORE "sysdate - 6";
執行上面這兩個命令會發現發現
3)delete noprompt archivelog all; --這會清除所有歸檔日誌,包括剛新生成的(慎用)
2、檢視ASM卷組狀態
C:\>asmcmd
ASMCMD>
ASMCMD> ls
DATA/
OCR/
ASMCMD> cd data
ASMCMD> ls
JCHR/
ASMCMD> cd jchr
ASMCMD> ls
ARCHIVELOG/
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
STANDBYLOG/
TEMPFILE/
spfilejchr.ora
spfilejchr2.ora
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files N
ame
MOUNTED EXTERN N 512 4096 1048576 819197 48329 0 48329 0 N D
ATA/
MOUNTED EXTERN N 512 4096 1048576 10237 9841 0 9841 0 N O
CR/
透過上面的資訊,我們可以發現+DATA\jchr\archivelog\下存放的是資料庫的歸檔檔案,但是存在很多2012年和2013年前幾個月的的歸檔日誌檔案,說明以前的歸檔檔案並沒有從ASM卷組上物理刪除;同時,有這些歸檔日誌長期佔用著物理空間,整個ASM卷組的磁碟空間只剩下40多個G(48329M),因此後續的歸檔日誌很容易超出整個空間,導致無法歸檔。為什麼會這樣呢?
Oracle控制檔案以及Oracle RMAN的的備份恢復的原理,Oracle 控制檔案裡邊記錄了資料庫的名字,id,建立的時間戳,以及歸檔記錄和備份資訊。
Oracle RMAN的備份恢復的所有資訊都依賴於:要麼是控制檔案,要麼是恢復目錄(catalog),因為所有的備份與恢復資訊都會依據備份是的方式儲存到這兩個位置。
理所當然的是,對這兩個東東里的備份集、映象副本、歸檔日誌等等所有能備份的物件的任意操作,首先會參這些物件的記錄的資訊,其次是當被記錄的物件發生變化時做相應的更新。
--檢視已經歸檔日誌的統計:
select name,status,count(*) from gv$archived_log group by name,status order by 1 desc;
從上面的查詢可知,當前的兩個節點其歸檔日誌只有2個,其餘的17630個其NAME都是NULL值。
看看關於檢視v$archived_log中NAME列的解釋 :
Archived log file name. If set to NULL, either the log file was cleared before it was archived or an RMAN backup command with the "delete input" option was executed to back up archivelog all (RMAN> backup archivelog all delete input;).
上面的這段話表明當前的這些日誌檔案要麼被手動清除,要麼被rman的delete input選項清除。
其次status列的D欄位也表明了這些個名字為空的歸檔日誌已經被Deleted.也就是說有17630個歸檔日誌已經被刪除了。
--如下檢視查詢,我們可以看到歸檔日誌狀態為D且name為null的,兩個例項上都是8815個:
select inst_id,name,count(*) from gv$archived_log group by inst_id,name order by 1 desc;
--檢視控制檔案已歸檔記錄的相關資訊:
SQL> select * from gv$controlfile_record_section where type='ARCHIVED LOG';
INST_ID TYPE RECORD_SIZE RECORDS_TOTAL RECORDS_USED FIRST_INDEX LAST_INDEX LAST_RECID
----------- ---------- ----------------------------------------------------------------------------
2 ARCHIVED LOG 584 10952 10710 8822 8579 52415
1 ARCHIVED LOG 584 10952 10710 8822 8579 52415
SQL> select count (*) from v$archived_log;
COUNT(*)
----------
10710
RECORDS_TOTAL:Number of records allocated for the section
列RECORDS_TOTAL表明為當前TYPE分配的可儲存的總數,在兩個instance上都為10952條
從最近一次切換日誌的查詢結果可知,被刪除的有8815條,如果下次日誌切換再增加一條往哪裡放呢?
那些已經超出預設保留期的歸檔日誌被覆蓋(在控制檔案中的記錄),即被重用,使用者在控制檔案中儲存ARCHIVED LOG部分的保留時間由誰來決定呢,引數control_file_record_keep_time,預設為7天 ,這意味著7天前的歸檔日誌和備份資訊可能在控制檔案中已經不存在了
SQL> show parameter control
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time integer 7
control_files string +DATA/jchr/controlfile/current
.261.793900303, +DATA/jchr/con
trolfile/current.260.793900303
control_management_pack_access string DIAGNOSTIC+TUNING
# 下面的查詢正好表明為什麼2012_10_23和之前的日誌為什麼沒有被刪除
# 因為20130505 12:00:04之後的歸檔日誌記錄在控制檔案中已經被覆蓋了,所以使用delete archivelog all時是根本無法清除之前的日誌的,即對於rman下的delete archivelog all方式不會刪除控制檔案中對應的歸檔日誌資訊(在控制檔案中設定delete狀態的歸檔日誌)
SQL> alter session set nls_date_format='yyyymmdd hh24:mi:ss';
會話已更改。
SQL> select min (FIRST_TIME), min (COMPLETION_TIME), max (FIRST_TIME), max (COMP
LETION_TIME) from v$archived_log;
MIN(FIRST_TIME) MIN(COMPLETION_TI MAX(FIRST_TIME) MAX(COMPLETION_TI
----------------- ----------------- ----------------- -----------------
20130505 10:06:16 20130519 22:18:57 20130519 22:24:57
SQL> select min (FIRST_TIME), min (COMPLETION_TIME), max (FIRST_TIME), max (COMP
LETION_TIME) from gv$archived_log;
MIN(FIRST_TIME) MIN(COMPLETION_TI MAX(FIRST_TIME) MAX(COMPLETION_TI
----------------- ----------------- ----------------- -----------------
20130505 10:06:16 20130505 12:00:04 20130519 22:18:57 20130519 22:24:57
:方案:
如下手工依次刪除之前廢棄的歸檔日誌即可:
ASMCMD> rm -r 2012*
You may delete multiple files and/or directories.
Are you sure? (y/n) y
:ASM INSTANCE 操作報ORA-04031現象
在命令體是否下執行select、crsctl、crs_stat等命令會報錯,如下:
1) ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","unknown object",) (DBD ERROR: error possibly near indicator at char 44 in '/* ASMCMD */ select name, parent_index from v$asm_alias where reference_index=33582893')
2) ("shared pool","unknown object","sga heap(1,0)","kglsim heap") (DBD ERROR: OCIStmtExecute)
3) SQL> show parameter share
ORA-00604: error occurred at recursive SQL level 2
ORA-04031: unable to allocate 4064 bytes of shared memory ("sharedpool","select x.inst_id,x.indx+1,ks...","sga heap(1,0)","kglsim heap")
:分析
--無法檢視ASM的引數檔案路徑:
SQL> show parameter spfile
ORA-00604: error occurred at recursive SQL level 2
ORA-04031: unable to allocate 4064 bytes of shared memory ("shared
pool","select x.inst_id,x.indx+1,ks...","sga heap(1,0)","kglsim heap")
--備份引數檔案
SQL> create pfile='d:\initasm1.ora' from spfile;
File created.
--引數檔案內容:
*._library_cache_advice=FALSE
+asm1.asm_diskgroups='DATA'#Manual Mount
+asm2.asm_diskgroups='DATA'#Manual Mount
*.asm_power_limit=1
*.diagnostic_dest='D:\app\Administrator'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
--嘗試手動停止個元件:
D:\app\11.2.0\grid\BIN>srvctl.bat status asm
PRCR-1070 : 無法檢查 資源 ora.asm 是否已註冊
Cannot communicate with crsd
D:\app\11.2.0\grid\BIN>srvctl.bat status asm
PRCR-1070 : 無法檢查 資源 ora.asm 是否已註冊
Cannot communicate with crsd
D:\app\11.2.0\grid\BIN>crs_stat.exe -t
Errors in file :
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","begin
dbms_diskgroup.read(:h...","sga heap(1,0)","kglsim object batch")
D:\app\11.2.0\grid\BIN>crs_stat.exe -t -n db01
^C
D:\app\11.2.0\grid\BIN>srvctl.bat status asm -n db01
PRCR-1070 : 無法檢查 資源 ora.asm 是否已註冊
Cannot communicate with crsd
D:\app\11.2.0\grid\BIN>srvctl.bat status nodeapps -n db01
-n
PRCR-1035 : 無法查詢 null 的 CRS 資源 ora.cluster_vip.type
PRCR-1068 : 無法查詢資源
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.net1.network 是否已註冊
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.gsd 是否已註冊
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.ons 是否已註冊
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.eons 是否已註冊
Cannot communicate with crsd
D:\app\11.2.0\grid\BIN>
D:\app\11.2.0\grid\BIN>srvctl.bat status nodeapps
PRCR-1035 : 無法查詢 1 的 CRS 資源 ora.cluster_vip.type
PRCR-1068 : 無法查詢資源
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.net1.network 是否已註冊
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.gsd 是否已註冊
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.ons 是否已註冊
Cannot communicate with crsdPRCR-1070 : 無法檢查 資源 ora.eons 是否已註冊
Cannot communicate with crsd
D:\app\11.2.0\grid\BIN>srvctl.bat status instance
PRKO-2082 : 缺少必需的選項 -d
D:\app\11.2.0\grid\BIN>srvctl.bat status instance -d jchr
PRKO-3132 : 檢查資料庫 jchr 的例項的狀態需要 -i 選項或 -n 選項
D:\app\11.2.0\grid\BIN>srvctl.bat status instance -d jchr -i jchr1
PRCD-1027 : 無法檢索資料庫 jchr
PRCR-1070 : 無法檢查 資源 ora.jchr.db 是否已註冊
Cannot communicate with crsd
D:\app\11.2.0\grid\BIN>srvctl.bat stop instance -d jchr -i jchr1
PRCD-1027 : 無法檢索資料庫 jchr
PRCR-1070 : 無法檢查 資源 ora.jchr.db 是否已註冊
Cannot communicate with crsd
D:\app\11.2.0\grid\BIN>crsctl check crs
CRS-4638: Oracle High Availability Services 聯機
CRS-4535: 無法與叢集就緒服務通訊
CRS-4529: 叢集同步服務聯機
CRS-4533: 事件管理器聯機
D:\app\11.2.0\grid\BIN>
D:\app\11.2.0\grid\BIN>crs_stat
Errors in file :
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","begin
dbms_diskgroup.read(:h...","sga heap(1,0)","kglsim object batch")
D:\app\11.2.0\grid\BIN>set ORACLE_SID=jchr1
D:\app\11.2.0\grid\BIN>crs_stat
Errors in file :
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","begin
dbms_diskgroup.read(:h...","sga heap(1,0)","kglsim object batch")
D:\app\11.2.0\grid\BIN>
D:\app\11.2.0\grid\BIN>crs_stat -t
Errors in file :
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","begin
dbms_diskgroup.read(:h...","sga heap(1,0)","")
--檢視asm的alert日誌
D:\app\Administrator\diag\asm\+asm\+asm1\trace\ alert_+asm1.log
Thu Sep 13 15:17:26 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 Private 10.0.0.0 configured from GPnP Profile for use as a cluster interconnect
Interface type 1 Public 10.139.5.0 configured from GPnP Profile for use as a public interface
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as D:\app\11.2.0\grid\RDBMS
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in client-side pfile D:\APP\11.2.0\GRID\DATABASE\INIT+ASM1.ORA on machine DB01
System parameters with non-default values:
large_pool_size = 12M
instance_type = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_power_limit = 1
diagnostic_dest = "D:\APP\ADMINISTRATOR"
Cluster communication is configured to use the following interface(s) for this instance
10.10.10.1
cluster interconnect IPC version:Oracle 11 Winsock2 TCP/IP IPC
IPC Vendor 1 proto 1
Version 1.0
Thu Sep 13 15:17:26 2012
PMON started with pid=2, OS id=916
Thu Sep 13 15:17:26 2012
VKTM started with pid=3, OS id=5972 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
--檢視對應的trace :+asm1_ora_3912.trc
Memory Utilization of Subpool 1
================================
Allocation Name Size
___________________________ ____________
"free memory " 87682648
"miscellaneous " 0
"KJC dest ctx " 1456
"kfcgx heap " 48376
"kfr group ctx " 736
"ksv work msg " 2272
---同樣例項2上面也無法檢視:
D:\app\11.2.0\grid\BIN>set ORACLE_SID=+ASM2
D:\app\11.2.0\grid\BIN>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on 星期一 5月 20 20:11:38 2013
Copyright (c) 1982, 2010, Oracle. All rights reserved.
連線到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> select name, round((bytes/1024/1024),0) MB from v$sgastat where pool='shar
d pool' order by bytes desc;
select name, round((bytes/1024/1024),0) MB from v$sgastat where pool='shared po
l' order by bytes desc
*
第 1 行出現錯誤:
ORA-00604: error occurred at recursive SQL level 2
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared
pool","select inst_id,'',ksmssnam,k...","sga heap(1,0)","kglsim object batch")
:方案
1、手動修改ASM例項的shared pool,為其指定SGA和shared_pool_size大小;
2、方法:介於Oracle11G+ ASM(ocr在asm上)時ASM例項無法單獨重啟,因此參考RAC_ASM_Practic(ASM例項引數修改).txt 裡的方法修改引數;
參考1:ASM Instance generates ORA-04031 ("shared pool","select name_kfgrp, number_kf...","sga heap(1,0)","kglsim heap" [ID 1450745.1]
Cause
This was researched in Bug 12566932
1. The over allocation occurred in "kglsim allocations" as you can see in these errors
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select name_kfgrp, number_kf...","sga heap(1,0)","kglsim object batch")
ORA-04031: unable to allocate 4064 bytes of shared memory ("shared pool","select name_kfgrp, number_kf...","sga heap(1,0)","kglsim heap")
2. You can also see which compoinent is using the memory by issuing this select statement:
set lines 100
set pages 9999
col mb format 999,999,999
select name, round((bytes/1024/1024),0) MB
from v$sgastat where pool='shared pool'
order by bytes
3. There was memory available at the time of the error which indicates the shared pool is fragmented
Solution
In the Bug the following workaround resolved the errors:
1. Set the following parameter _library_cache_advice=false
alter system set "_library_cache_advice"=false scope=spfile;
2. Bounce the database
如下資料庫的shared pool各元件及空閒大小,由於asm例項執行即報錯,根本無法返回結果;
SET ORACLE_SID=jchr1
SQL> select name, round((bytes/1024/1024),0) MB from v$sgastat where pool='shared pool' order by bytes
NAME MB
-------------------------- ----------
KQR L PO 14
dbwriter coalesce buffer 16
ges resource 18
KCL name table 18
SQLA 19
dbktb: trace buffer 23
ksunfy : SSO free list 26
event statistics per sess 28
gcs shadows 30
ges big msg buffers 30
FileOpenBlock 30
NAME MB
-------------------------- ----------
ges enqueues 31
gcs resources 41
ASH buffers 50
free memory 2914:
已選擇961行。
參考2:How to Start (or stop) 10gR2 or 11gR1 Oracle Clusterware Services Manually in Windows [ID 729512.1]
RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent) [ID 810394.1]
1. Try starting crs with "crsctl start crs" command from $CRS_HOME\bin directory
OR
2. Start the following services from Windows 'Services' or by using the Windows command 'net start
command prompt:
a. If OPMD is present, then start it: Oracle Process Manager Service. In addition to OPMD, start the
OracleClusterVolumeService if you are using Oracle Cluster File System (OCFS) in your environment.
b. If OPMD is not present then start the following services in the order provided:
OracleObjectService
OracleClusterVolumeService (if using OCFS)
OracleCSService
OracleEVMService
OracleCRService
http://cywxzyh.blog.163.com/blog/static/163857248201144112236781/
參考3:ASM & Shared Pool (ORA-4031) [ID 437924.1]
參考4:其他參考:
http://zhang41082.itpub.net/post/7167/463093
查詢了metalink文件,發現果然是Oracle的bug,metalink文件Bug No. 4431215描述了這個問題。如果在ASM上建立的目錄名稱和SID字首相同,則這個目錄下的所有目錄一旦為空,就會被ASM例項自動刪除,不管這個目錄是ASM自動建立,還是使用者手工建立的。
解決這個問題的辦法倒是很多,比如將歸檔目錄設定到日誌檔案的目錄中,或者改變主目錄的名稱使其和ORACLE_SID有所區別。還有更簡單的辦法,在歸檔目錄下複製一個小檔案,使得歸檔被刪除後,歸檔目錄不為空,從而避免路徑被清除。
Oracle在10.2.0.4和11.1.0.6中fixed了這個bug。因此升級或打PATCH也是解決這個問題的一個方法。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/14710393/viewspace-1100971/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- oracle歸檔切換以及歸檔日誌滿報錯問題Oracle
- ORACLE RAC模式下歸檔模式和非歸檔模式的切換方法Oracle模式
- oracle 10g rac+asm 歸檔路徑磁碟組空間滿問題處理Oracle 10gASM
- rac使用預設閃回區歸檔空間滿
- RAC 環境下修改歸檔模式模式
- 【RAC】RAC環境下歸檔日誌格式約定
- RAC下啟動日誌歸檔模式模式
- Oracle 10G RAC歸檔引數格式問題導致歸檔至ASM時出錯Oracle 10gASM
- 手工清除歸檔處理歸檔空間滿
- Oracle RAC 歸檔與非歸檔切換Oracle
- rac更改歸檔模式模式
- Oracle10g RAC下啟用歸檔模式Oracle模式
- RAC下歸檔不放到共享盤的測試
- rac下啟動關閉與更改歸檔模式模式
- RAC環境中非歸檔改為歸檔模式模式
- RAC(ASM)歸檔模式修改ASM模式
- ORACLE RAC 設定歸檔Oracle
- ORACLE RAC 啟用歸檔Oracle
- rac下啟動關閉與更改歸檔模式(zt)模式
- 關於一個歸檔問題?
- dataguard歸檔路徑的問題
- 【轉】雙節點RAC下將資料庫從非歸檔模式更改到歸檔模式資料庫模式
- oracle RAC開啟歸檔模式Oracle模式
- RAC 開啟歸檔日誌
- Oracle歸檔引數設定問題Oracle
- AIX檔案系統滿的問題分析AI
- Oracle日常問題處理ORA-04031Oracle
- git Windows下重新命名檔案,大小寫敏感問題GitWindows
- ORA-00257歸檔日誌滿
- RAC歸檔配置方案:使用NFS共享儲存儲存歸檔檔案NFS
- RAC資料庫大量載入資料造成歸檔日誌空間滿處理資料庫
- windows下vim的使用問題Windows
- RAC開啟資料庫歸檔資料庫
- 10G RAC歸檔設定
- 【archive_dest】歸檔的路徑問題Hive
- rac歸檔模式/非歸檔模式下檔案建至本地儲存修復方法-ORA-01157模式
- 當ORACLE歸檔日誌滿後如何正確刪除歸檔日誌Oracle
- RAC環境下節點歸檔目的地相互獨立的配置和備份要求