Oracle 10g RAC: An Overview of Its Services (CRS) and trc/log Files [final]

Posted by tolywang on 2010-11-16


0. First, a look at some Oracle 10g RAC services and concepts

Cluster Synchronization Services (CSS):
Manages the cluster configuration by controlling which nodes are members of the
cluster and by notifying members when a node joins or leaves the cluster. If
you are using third-party clusterware, then the css process interfaces with your
clusterware to manage node membership information.

Cluster Ready Services (CRS):
The primary program for managing high availability operations within a cluster.
Anything that the crs process manages is known as a cluster resource which could
be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an
application process, and so on. The crs process manages cluster resources based on
the resource's configuration information that is stored in the OCR. This includes
start, stop, monitor and failover operations. The crs process generates events when
a resource status changes. When you have installed Oracle RAC, crs monitors the Oracle
instance, Listener, and so on, and automatically restarts these components when a failure
occurs. By default, the crs process makes five attempts to restart a resource and then
does not make further restart attempts if the resource does not restart.

Event Management (EVM):
A background process that publishes events that crs creates.

Oracle Notification Service (ONS):
A publish and subscribe service for communicating Fast Application Notification
(FAN) events.

RACG:
Extends clusterware to support Oracle-specific requirements and complex resources.
Runs server callout scripts when FAN events occur.

Process Monitor Daemon (OPROCD):
This process is locked in memory to monitor the cluster and provide I/O fencing.
OPROCD performs its check, stops running, and if the wake up is beyond the expected
time, then OPROCD resets the processor and reboots the node. An OPROCD failure results
in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux
platforms.

Voting Disk:
Manages cluster membership by way of a health check and arbitrates cluster
ownership among the instances in case of network failures. Oracle RAC uses the
voting disk to determine which instances are members of a cluster. The voting disk
must reside on shared disk. For high availability, Oracle recommends that you have
multiple voting disks. The Oracle Clusterware enables multiple voting disks but you
must have an odd number of voting disks, such as three, five, and so on. If you define
a single voting disk, then you should use external mirroring to provide redundancy.
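
The odd-number recommendation follows from majority voting: a node has to reach more than half of the configured voting disks to stay in the cluster. The arithmetic can be sketched as below; the function name is hypothetical and is not part of any Oracle tool.

```python
def tolerated_voting_disk_failures(total_disks: int) -> int:
    """Return how many voting disks can be lost while a strict majority remains.

    Assumption: a node must be able to access more than half of the
    configured voting disks.
    """
    if total_disks < 1:
        raise ValueError("at least one voting disk is required")
    majority = total_disks // 2 + 1  # smallest count that is more than half
    return total_disks - majority


for n in (1, 2, 3, 4, 5):
    print(n, "disk(s):", tolerated_voting_disk_failures(n), "failure(s) tolerated")
```

Under this rule an even count buys nothing: four disks tolerate one failure, the same as three.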

Oracle Cluster Registry (OCR):
Maintains cluster configuration information as well as configuration information
about any cluster database within the cluster. The OCR also manages information about
processes that Oracle Clusterware controls. The OCR stores configuration information
in a series of key-value pairs within a directory tree structure. The OCR must reside
on shared disk that is accessible by all of the nodes in your cluster. The Oracle
Clusterware can multiplex the OCR and Oracle recommends that you use this feature
to ensure cluster high availability. You can replace a failed OCR online, and you can
update the OCR through supported APIs such as Enterprise Manager, the Server Control
Utility (SRVCTL), or the Database Configuration Assistant (DBCA).

 

 

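Main CRS services:

Main CRS processes
(1) crsd   - Handles HA operations and manages CRS resources such as the listener, VIP, ONS, and GSD; managed and started by the root user
(2) ocssd  - Manages the relationships between nodes and is used for inter-node communication; run and managed by the oracle user
(3) oprocd - Process monitor for the cluster. Runs only when no vendor clusterware is in use
(4) evmd   - Event detection process; run and managed by the oracle user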

Related log locations
$ORA_CRS_HOME/log/nodename/crsd
$ORA_CRS_HOME/crs/init
$ORA_CRS_HOME/css/log
$ORA_CRS_HOME/css/init
$ORA_CRS_HOME/evm/log
$ORA_CRS_HOME/evm/init
$ORA_CRS_HOME/srvm/log

 

 


1. In this environment ORACLE_BASE=/u01/product and ORACLE_HOME=/u01/product/oracle. The dump directories listed below sit under /u01/product/admin/mxdell.

mxrac05$ls
adump  bdump  cdump  dpdump  hdump  pfile  udump  

 

 

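A. adump holds the audit files with the .aud suffix, which record connections made with SYSDBA/SYSOPER privileges, such as logins by the SYS user.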

Audit file /u01/product/admin/mxdell/adump/ora_24065.aud
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/product/oracle
System name:    Linux
Node name:      mxrac05
Release:        2.6.18-128.el5
Version:        #1 SMP Wed Dec 17 11:41:38 EST 2008
Machine:        x86_64
Instance name: mxdell5
Redo thread mounted by this instance: 5
Oracle process number: 54
Unix process pid: 24065, image:

Mon Sep 27 14:09:34 2010
LENGTH : '153'
ACTION :[7] 'CONNECT'
DATABASE USER:[3] 'SYS'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[12] 'harrison.han'
CLIENT TERMINAL:[11] 'MXWS-004570'
STATUS:[1] '0'
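
The record fields in an audit file like the one above follow a simple `KEY:[length] 'value'` pattern, so they are easy to pull apart when many .aud files need to be scanned. The following is a minimal sketch that assumes the layout shown in this sample; the function and variable names are hypothetical.

```python
import re

# Matches lines such as  ACTION :[7] 'CONNECT'  or  LENGTH : '153'
FIELD = re.compile(r"^(?P<key>[A-Z ]+?)\s*:(?:\[\d+\])?\s*'(?P<value>.*)'$")


def parse_audit_record(text: str) -> dict:
    """Collect the KEY:[n] 'value' fields of one audit record into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group("key")] = match.group("value")
    return fields


SAMPLE = """\
Mon Sep 27 14:09:34 2010
LENGTH : '153'
ACTION :[7] 'CONNECT'
DATABASE USER:[3] 'SYS'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[12] 'harrison.han'
CLIENT TERMINAL:[11] 'MXWS-004570'
STATUS:[1] '0'
"""

record = parse_audit_record(SAMPLE)
print(record["DATABASE USER"], record["PRIVILEGE"], record["CLIENT USER"])
```

Header lines (timestamp, node name, and so on) do not match the pattern and are skipped.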

 

 


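B. bdump holds the trace files of all background processes and the alert log of each instance.

For example, alert_mxdell1.log is the alert log of instance mxdell1 on RAC node 1,
and it sits next to the .trc files of that instance's background processes. There are
also directories such as cdmp_20101005101745 that contain .trw files, which are another
kind of trace file. When such files appear, a matching error entry can usually be found
in the alert log, for example: Tue Jul 13 22:01:16 2010 Trace dumping is performing
id=[cdmp_20100713220116]. These errors produce a timestamped dump directory
bdump/cdmp_timestamp, where timestamp is the time at which the error occurred. Dumps of
this kind are generally caused by bugs.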


The directory cdmp_timestamp contains in-memory traces of Oracle RAC instance
failure information.

Diagnosability Daemon (DIAG)
The Diagnosability Daemon captures diagnostic information related to process and
instance failures. This information can be used by Oracle Worldwide Support to help
analyze and resolve problems with your database and instances.

The DIAG process writes its diagnostic information to files in a subdirectory of
the directory specified by the initialization parameter BACKGROUND_DUMP_DEST. The
subdirectories are named cdmp_timestamp, where timestamp identifies when the
subdirectory, and trace information, was written.
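
The alert-log entry quoted earlier (Tue Jul 13 22:01:16 2010 with id=[cdmp_20100713220116]) suggests the timestamp part is laid out as YYYYMMDDHHMMSS. Assuming that layout, a directory name can be converted back to a time for correlation with the alert log. This is a small sketch; the function name is hypothetical.

```python
from datetime import datetime


def cdmp_time(dirname: str) -> datetime:
    """Decode the timestamp embedded in a cdmp_<timestamp> directory name.

    Assumption: the timestamp is YYYYMMDDHHMMSS, as the sample names suggest.
    """
    prefix = "cdmp_"
    if not dirname.startswith(prefix):
        raise ValueError(f"not a cdmp directory name: {dirname!r}")
    return datetime.strptime(dirname[len(prefix):], "%Y%m%d%H%M%S")


print(cdmp_time("cdmp_20100713220116"))
```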


Example:

mxrac01$ls -alhrt
total 4.1M
drwxr-xr-x 9 oracle dba 4.0K Mar  2  2010 ..
drwxr-x--- 2 oracle dba  24K Sep 17 10:59 cdmp_20100917105859
drwxr-x--- 2 oracle dba  24K Oct  1 22:14 cdmp_20101001221411
drwxr-x--- 2 oracle dba  24K Oct  5 10:17 cdmp_20101005101745
-rw-rw---- 1 oracle dba 1.1K Nov 12 23:00 mxdell1_m001_2876.trc
-rw-rw---- 1 oracle dba 1.1K Nov 13 19:00 mxdell1_m001_8809.trc
-rw-rw---- 1 oracle dba 1.1K Nov 13 21:00 mxdell1_m001_28585.trc
-rw-rw---- 1 oracle dba  964 Nov 14 17:00 mxdell1_m001_15713.trc
-rw-rw---- 1 oracle dba  773 Nov 14 17:10 mxdell1_q002_19652.trc
-rw-rw---- 1 oracle dba  976 Nov 14 17:12 mxdell1_arc0_13125.trc
-rw-rw---- 1 oracle dba  68K Nov 14 17:29 mxdell1_diag_12986.trc
-rw-rw---- 1 oracle dba  984 Nov 14 18:00 mxdell1_m001_7499.trc
-rw-rw---- 1 oracle dba 747K Nov 14 18:51 mxdell1_arc1_13127.trc
-rw-rw---- 1 oracle dba 1.1K Nov 14 19:00 mxdell1_m001_1103.trc
-rw-rw---- 1 oracle dba 432K Nov 15 21:41 mxdell1_lmd0_12992.trc
-rw-rw---- 1 oracle dba 1.1K Nov 15 22:00 mxdell1_m001_2387.trc
-rw-rw---- 1 oracle dba 240K Nov 15 22:01 mxdell1_lms3_13006.trc
-rw-rw---- 1 oracle dba 291K Nov 15 22:12 mxdell1_lms4_13011.trc
-rw-rw---- 1 oracle dba 212K Nov 15 22:26 mxdell1_lms1_12998.trc
drwxr-xr-x 5 oracle dba  12K Nov 15 22:33 .
-rw-r----- 1 oracle dba 1.1M Nov 15 22:43 alert_mxdell1.log
-rw-rw---- 1 oracle dba 1.9K Nov 15 22:43 mxdell1_lgwr_13027.trc
-rw-rw---- 1 oracle dba 229K Nov 15 22:46 mxdell1_lms0_12994.trc
-rw-rw---- 1 oracle dba 239K Nov 15 22:47 mxdell1_lms5_13015.trc
-rw-rw---- 1 oracle dba 253K Nov 15 22:47 mxdell1_lms2_13002.trc

 

Example of a dump directory (the .trw files under a cdmp_timestamp directory):

mxrac01$ls -alhrt 
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_smon_13801.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_reco_13803.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_qmnc_14006.trw
-rw-rw---- 1 oracle dba  32K Sep 17 10:59 mxdell1_pz99_14013.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_psp0_13754.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_pmon_13748.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_994.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_9903.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_9720.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_9552.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_9546.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_9541.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_9436.trw
-rw-rw---- 1 oracle dba  32K Sep 17 10:59 mxdell1_ora_9420.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_9200.trw 
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_10287.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_10090.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_ora_10068.trw
-rw-rw---- 1 oracle dba  38K Sep 17 10:59 mxdell1_mmon_13807.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_mmnl_13809.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_mman_13788.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_lms5_13784.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_lms4_13780.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_lms3_13776.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_lms2_13772.trw
-rw-rw---- 1 oracle dba  32K Sep 17 10:59 mxdell1_lms1_13766.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_lms0_13760.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_lmon_13756.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_lmd0_13758.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_lgwr_13797.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_lck0_13823.trw
-rw-rw---- 1 oracle dba  10K Sep 17 10:59 mxdell1_j005_4826.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_j002_7331.trw
-rw-rw---- 1 oracle dba  30K Sep 17 10:59 mxdell1_j001_10548.trw
-rw-rw---- 1 oracle dba  32K Sep 17 10:59 mxdell1_j000_10521.trw
-rw-rw---- 1 oracle dba 6.0K Sep 17 10:59 mxdell1_diag_13750.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_dbw2_13795.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_dbw1_13793.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_dbw0_13790.trw
-rw-rw---- 1 oracle dba  38K Sep 17 10:59 mxdell1_ckpt_13799.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_cjq0_13805.trw
-rw-rw---- 1 oracle dba  36K Sep 17 10:59 mxdell1_arc1_13937.trw
-rw-rw---- 1 oracle dba  34K Sep 17 10:59 mxdell1_arc0_13935.trw
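
The trace file names in both listings follow the pattern instance_process_pid.trc (or .trw), where `ora` marks a foreground server process and names such as `lms0` or `diag` are background processes. Here is a minimal sketch for splitting such names, assuming only the naming seen in these listings; the helper name is hypothetical.

```python
import re
from collections import Counter

TRACE_NAME = re.compile(
    r"^(?P<instance>.+)_(?P<process>[a-z0-9]+)_(?P<pid>\d+)\.(?P<ext>trc|trw)$"
)


def parse_trace_name(filename: str):
    """Split <instance>_<process>_<pid>.<trc|trw>; return None if it does not fit."""
    match = TRACE_NAME.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info


names = [
    "mxdell1_lms0_12994.trc",
    "mxdell1_ora_9903.trw",
    "mxdell1_ora_994.trw",
    "mxdell1_diag_13750.trw",
    "alert_mxdell1.log",  # not a trace file, so it is skipped
]
parsed = [info for info in map(parse_trace_name, names) if info]
print(Counter(info["process"] for info in parsed))
```

Grouping by process name makes it easy to see at a glance which background processes wrote traces around a given incident.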

 


mxrac01$crs_stat -t 
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.mxdell.db  application    ONLINE    ONLINE    mxrac01    
ora....l1.inst application    ONLINE    ONLINE    mxrac01    
ora....l3.inst application    ONLINE    ONLINE    mxrac03    
ora....l4.inst application    ONLINE    ONLINE    mxrac04    
ora....l5.inst application    ONLINE    ONLINE    mxrac05    
ora....01.lsnr application    ONLINE    ONLINE    mxrac01    
ora....c01.gsd application    ONLINE    ONLINE    mxrac01    
ora....c01.ons application    ONLINE    ONLINE    mxrac01    
ora....c01.vip application    ONLINE    ONLINE    mxrac01    
ora....03.lsnr application    ONLINE    ONLINE    mxrac03    
ora....c03.gsd application    ONLINE    ONLINE    mxrac03    
ora....c03.ons application    ONLINE    ONLINE    mxrac03    
ora....c03.vip application    ONLINE    ONLINE    mxrac03    
ora....04.lsnr application    ONLINE    ONLINE    mxrac04    
ora....c04.gsd application    ONLINE    ONLINE    mxrac04    
ora....c04.ons application    ONLINE    ONLINE    mxrac04    
ora....c04.vip application    ONLINE    ONLINE    mxrac04    
ora....05.lsnr application    ONLINE    ONLINE    mxrac05    
ora....c05.gsd application    ONLINE    ONLINE    mxrac05    
ora....c05.ons application    ONLINE    OFFLINE              
ora....c05.vip application    ONLINE    ONLINE    mxrac05    
mxrac01$
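
In the output above, ora....c05.ons has Target ONLINE but State OFFLINE, which is easy to miss by eye. A short sketch for flagging such rows follows; it assumes the column layout of the `crs_stat -t` output shown here (names are truncated by the tool itself), and the function name is hypothetical.

```python
def parse_crs_stat(text: str) -> list:
    """Parse `crs_stat -t` style rows into dicts; the Host column may be empty."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, the dashed separator, and anything that is not a resource row.
        if len(parts) < 4 or not parts[0].startswith("ora."):
            continue
        name, rtype, target, state = parts[:4]
        host = parts[4] if len(parts) > 4 else None
        rows.append(
            {"name": name, "type": rtype, "target": target, "state": state, "host": host}
        )
    return rows


SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.mxdell.db  application    ONLINE    ONLINE    mxrac01
ora....c05.gsd application    ONLINE    ONLINE    mxrac05
ora....c05.ons application    ONLINE    OFFLINE
ora....c05.vip application    ONLINE    ONLINE    mxrac05
"""

rows = parse_crs_stat(SAMPLE)
mismatched = [row["name"] for row in rows if row["target"] != row["state"]]
print(mismatched)
```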

 


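C. cdump holds many directories whose names start with core_. A core file is the memory
image of a process, and users generally do not need to read these files. The number
after core_ is the process ID.

cdump stores the core information written when Oracle hits an internal error, and a
corresponding file will exist in bdump or udump. The cdump contents are useful to
Oracle Support. The location can be changed through the core_dump_dest parameter.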


mxrac01$ls -alhrt
total 60K
drwxr-x---  2 oracle dba 4.0K Dec  7  2009 core_2662
drwxr-x---  2 oracle dba 4.0K Dec 16  2009 core_20943
drwxr-x---  2 oracle dba 4.0K Dec 21  2009 core_27896
drwxr-x---  2 oracle dba 4.0K Dec 21  2009 core_23068
drwxr-x---  2 oracle dba 4.0K Dec 21  2009 core_21673
drwxr-x---  2 oracle dba 4.0K Dec 21  2009 core_2039
drwxr-x---  2 oracle dba 4.0K Dec 21  2009 core_11681
drwxr-x---  2 oracle dba 4.0K Jan 21  2010 core_18290
drwxr-x---  2 oracle dba 4.0K Jan 22  2010 core_4613
drwxr-x---  2 oracle dba 4.0K Jan 22  2010 core_18850
drwxr-x---  2 oracle dba 4.0K Jan 22  2010 core_5644
drwxr-x---  2 oracle dba 4.0K Feb 16  2010 core_15445
drwxr-xr-x  9 oracle dba 4.0K Mar  2  2010 ..
drwxr-x---  2 oracle dba 4.0K Aug  8 16:52 core_31833
drwxr-xr-x 15 oracle dba 4.0K Aug  8 16:52 .
mxrac01$

The files inside look like this:

mxrac01$ls -alhrt
total 14M
-rw-------  1 oracle dba  16M Dec 21  2009 core.23068

Opening this file shows that it is a binary file.


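D. dpdump: the default Data Pump directory (DATA_PUMP_DIR), which holds Data Pump export/import dump files and their log files.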

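E. hdump: rarely contains anything; it is intended for Oracle High Availability log files.

F. udump: trace files from foreground (user) sessions that were traced manually, for example the session trace file written after enabling SQL trace.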

 

 

2. Logs of the CRS-related services (mxrac01 is the hostname of node 1).

Logs under the CRS directory

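admin => summary information
alertmxrac01.log => summary of changes in the node's CRS state; for details, see the CSS log
client => logs from CRS initialization and from the OCR applications, including CLSCFG, CSS, OCRCHECK, OCRCONFIG, OCRDUMP, and OIFCFG
crsd => crsd logs; CRS waits until CSS is up in fatal mode, then crsd starts and brings up the related resources
cssd => cssd logs: node stop, start, reconfiguration, and so on; every problem is recorded here, so this is the most important log
evmd => evmd logs
racg => logs for ONS and the VIP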

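When a problem occurs, look at ocssd.log first, then check the crsd log as needed for the relevant time window; all resource-related logging is in crsd.log.
If the logs do not reveal the key information, raise the log level of the relevant module (the default log level differs between versions):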

crsctl debug log css CSSD:5
crsctl debug log crs CRSD:3    (and so on)

The modules available for each component can be listed with crsctl lsmodules crs
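
The triage order described above can be written down as a list of paths to open. This sketch assumes the layout seen in this article ($ORA_CRS_HOME/log/<nodename>/ with cssd, crsd, and evmd subdirectories); the CRS home /u01/crs in the usage line is a made-up value, and the function name is hypothetical. Only the first two positions (ocssd.log, then crsd.log) come from the text; the order of the remaining entries is arbitrary.

```python
from pathlib import PurePosixPath


def crs_triage_logs(crs_home: str, node: str) -> list:
    """Return CRS log paths in the order they are usually worth reading."""
    base = PurePosixPath(crs_home) / "log" / node
    return [
        str(base / "cssd" / "ocssd.log"),  # first stop: node membership, reconfig
        str(base / "crsd" / "crsd.log"),   # second: resource start/stop/failover
        str(base / f"alert{node}.log"),    # summary of CRS state changes
        str(base / "evmd" / "evmd.log"),   # event manager
    ]


# "/u01/crs" is a hypothetical CRS home used only for illustration.
for path in crs_triage_logs("/u01/crs", "mxrac01"):
    print(path)
```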

Example:

mxrac01$ls -alh
total 200K
drwxr-xr-x 45 root   dba  4.0K Nov 18  2009 .
drwxrwxr-x  6 oracle dba  4.0K Nov 19  2009 ..
drwxr-xr-x  2 root   dba  4.0K Feb 24  2010 bin
drwxrwxr-x  4 oracle dba  4.0K Nov 18  2009 cdata
drwxrwxr-x  5 oracle dba  4.0K Apr  2  2010 cfgtoollogs
...
drwxr-xr-x  4 oracle dba  4.0K Nov 18  2009 log
drwxrwx--- 10 oracle dba  4.0K Nov 18  2009 network
drwxrwx---  5 oracle dba  4.0K Nov 18  2009 nls
....
drwxrwx---  4 oracle dba  4.0K Nov 18  2009 xdk


mxrac01$ls
admin  alertmxrac01.log  client  crsd  cssd  evmd  racg


mxrac01$ls
crsd.log


mxrac01$ls
cssdOUT.log  mxrac01.pid  oclsmon  oclsomon  ocssd.l05  ocssd.log  ocssd.trc


mxrac01$ls
evmd.log  evmdOUT.log


mxrac01$ls -alh
total 104K
drwxrwxr-t 5 oracle dba  4.0K Nov 15 01:37 .
drwxr-xr-t 8 root   dba  4.0K Jan 24  2010 ..
-rw-r--r-- 1 oracle dba   494 Dec 13  2009 evtf.log
-rw-r--r-- 1 oracle dba  2.1K Dec 13  2009 ora.mxdell.db.log
-rw-r--r-- 1 oracle dba   56K Nov  7 02:21 ora.mxrac01.ons.log
-rw-r--r-- 1 root   root 4.0K Jun 20 19:04 ora.mxrac01.vip.log
-rw-r--r-- 1 root   root 2.4K Jun 20 19:04 ora.mxrac03.vip.log
-rw-r--r-- 1 root   root 1.5K Apr  2  2010 ora.mxrac04.vip.log
-rw-r--r-- 1 root   root  247 Apr  2  2010 ora.mxrac05.vip.log
drwxrwxrwt 2 oracle dba  4.0K Nov 18  2009 racgeut
drwxrwxrwt 2 oracle dba  4.0K Nov 18  2009 racgevtf
drwxrwxrwt 2 oracle dba  4.0K Nov 18  2009 racgmain

 

 

RACG:

mxrac01$ls
admin  alertmxrac01.log  client  crsd  cssd  evmd  racg

In RAC, the CRS log directory has a subdirectory named racg, which holds logs for ONS, the VIP, and GSD.

mxrac01$ls
evtf.log           ora.mxrac01.ons.log  ora.mxrac03.vip.log  ora.mxrac05.vip.log  racgevtf
ora.mxdell.db.log  ora.mxrac01.vip.log  ora.mxrac04.vip.log  racgeut              racgmain

Explanation from the Oracle documentation:
RACG:
Extends clusterware to support Oracle-specific requirements and complex resources.
Runs server callout scripts when FAN events occur.

 

Source: ITPUB blog, link: http://blog.itpub.net/35489/viewspace-678252/. Please credit the source when reposting; otherwise legal action will be pursued.
