Troubleshooting a problem with the ocssd process:

Posted by orchidllh on 2005-03-02

Yesterday a database user reported that on their database server, entries like the following were being written to /var/log/messages every 5 seconds:
Feb 27 08:11:44 bj su(pam_unix)[7692]: session opened for user oracle by (uid=0)
Feb 27 08:11:44 bj su(pam_unix)[7692]: session closed for user oracle
Feb 27 08:11:44 bj logger: Failure in CSS initialization opening OCR.
Feb 27 08:11:49 bj su(pam_unix)[7731]: session opened for user oracle by (uid=0)
Feb 27 08:11:49 bj su(pam_unix)[7731]: session closed for user oracle
Feb 27 08:11:49 bj logger: Failure in CSS initialization opening OCR.
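To confirm how often a given message is written, the matching lines can be counted per minute. This is a minimal sketch and not part of the original investigation; the function name count_per_minute is made up for illustration, and it assumes the standard syslog layout shown above (month, day, time in the first three fields).

```shell
# Count how many log lines containing a given message appear in each minute.
# Usage: count_per_minute MESSAGE [LOGFILE]   (LOGFILE defaults to /var/log/messages)
count_per_minute() {
    msg="$1"
    log="${2:-/var/log/messages}"
    awk -v msg="$msg" '
        index($0, msg) {
            key = $1 " " $2 " " substr($3, 1, 5)   # month, day, HH:MM
            if (!(key in n)) order[++k] = key
            n[key]++
        }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
    ' "$log"
}
```

For example, `count_per_minute 'Failure in CSS initialization'` prints one line per minute with the number of matching entries; a message written every 5 seconds should show about 12 per minute.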


I checked the same file on another 10g database server:
Feb 28 10:02:27 bj sshd(pam_unix)[5985]: session opened for user lisa by (uid=502)
Feb 28 10:06:40 bj sshd(pam_unix)[5985]: session closed for user lisa
Feb 28 15:31:17 bj sshd(pam_unix)[6115]: session opened for user lisa by (uid=502)
Feb 28 15:32:09 bj sshd(pam_unix)[6115]: session closed for user lisa
Mar  1 10:19:54 bj sshd(pam_unix)[15042]: session opened for user lisa by (uid=502)
Mar  1 10:54:29 bj su(pam_unix)[15086]: session opened for user root by lisa(uid=502)
Mar  1 10:54:33 bj su(pam_unix)[15119]: session opened for user oracle by lisa(uid=0)
Mar  1 12:12:30 bj su(pam_unix)[15189]: session opened for user root by lisa(uid=501)

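These are ordinary records of user logins and su sessions. The number in square brackets is the process ID, and the uid value in parentheses is the user ID.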
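As an illustration of the two numeric fields in these log lines, the sketch below pulls the process ID (the number in square brackets) and the user ID (the value after uid=) out of a single line. The function names log_pid and log_uid are hypothetical helpers, not part of any tool mentioned here.

```shell
# Print the process ID, i.e. the number inside the first [ ] pair of a log line.
log_pid() {
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[\([0-9][0-9]*\)].*$/\1/p'
}

# Print the user ID from the "(uid=N)" field of a log line, if present.
log_uid() {
    printf '%s\n' "$1" | sed -n 's/.*(uid=\([0-9][0-9]*\)).*/\1/p'
}
```

Both functions take the whole log line as a single quoted argument and print nothing if the field is absent.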

On the problem server I checked the bdump and udump directories and the alert.log file, and found nothing abnormal.
The system processes:
[oracle@db1 udump]$ ps -ef | grep css
root      5716     1  0 Jan11 ?        00:00:00 /bin/sh /etc/init.d/init.cssd run
root      5721  5716  0 Jan11 ?        00:13:17 /bin/sh /etc/init.d/init.cssd startcheck
oracle   17210  5844  0 14:02 pts/2    00:00:00 grep css

The system processes on the healthy database server:
[root@bj log]# ps -ef | grep css
root      4669     1  0  2004 ?        00:00:00 /bin/su oracle -c exec /home/oracle/product/10.1.0/db_1/bin/ocssd
oracle    4771  4669  0  2004 ?        00:25:53 /home/oracle/product/10.1.0/db_1/bin/ocssd.bin
root     15278 15225  0 14:05 pts/0    00:00:00 grep css
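The two process listings differ in a way that is easy to test for: the healthy server runs ocssd.bin, while the problem server only shows init.cssd waiting in startcheck. A minimal sketch of that comparison follows; css_state is a hypothetical helper name, and the three labels it prints are my own, not Oracle terminology.

```shell
# Classify the CSS daemon state from `ps -ef` output supplied on stdin.
css_state() {
    ps_out=$(cat)
    case "$ps_out" in
        *ocssd.bin*)                echo "running" ;;          # daemon binary is up
        *"init.cssd startcheck"*)   echo "startcheck-only" ;;  # init script still waiting
        *)                          echo "not-running" ;;      # no CSS process at all
    esac
}
```

It would be used as `ps -ef | css_state`.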

I then looked at /etc/init.d/init.cssd but learned nothing from it; the script is long and I did not read it carefully.

The Oracle documentation says this about CSS:
Oracle Cluster Synchronization Services (CSS) is a daemon process that is configured by the root.sh script when you install Oracle Database 10g for the first time. It is configured to start every time the system boots. This daemon process is required to enable synchronization between Oracle ASM and database instances. It must be running if an Oracle database is using ASM for database file storage.
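In short: CSS is a background daemon that is configured by default during installation and starts automatically at boot. It synchronizes ASM with database instances, and it must be running if ASM is used.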

That was half a relief: this database does not use ASM, so if all else failed the daemon could simply be stopped.

I then read the "Reconfiguring Oracle Cluster Synchronization Services" section of the Oracle documentation, excerpted below:
1. Identifying Oracle Database 10g Oracle Homes
To identify all of the Oracle Database 10g Oracle home directories, enter one of the following commands:
$ more /etc/oratab

This is the result on my server:
[root@bj log]# more /etc/oratab
#

# This file is used by ORACLE utilities.  It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.

# A colon, ':', is used as the field terminator.  A new line terminates
# the entry.  Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
#   $ORACLE_SID:$ORACLE_HOME::
#
# The first and second fields are the system identifier and home
# directory of the database respectively.  The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
# *:/home/oracle/product/10.1.0/db_1:N
$ORACLE_SID:/home/oracle/product/10.1.0/db_1:N
*:/home/oracle/product/10.1.0/db_1:N
$ORACLE_SID:/home/oracle/product/10.1.0/db_1:N

From the output, identify any Oracle home directories where Oracle Database 10g is installed. Oracle homes that contain Oracle Database 10g typically have paths similar to the following. However, they might use different paths.
/mount_point/app/oracle/product/10.1.0/db_n
If there is only one Oracle home directory that contains Oracle Database 10g, see the "Deleting the Oracle CSS Daemon Configuration" section for information about deleting the Oracle CSS daemon configuration.
If you identify more than one Oracle Database 10g Oracle home directory, see the following section for information about reconfiguring the Oracle CSS daemon.
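The decision in this step depends on how many distinct Oracle homes /etc/oratab lists. A minimal sketch of extracting them, assuming the SID:ORACLE_HOME:flag layout described in the file's own comments; list_oracle_homes is a hypothetical helper name.

```shell
# Print the unique ORACLE_HOME paths found in an oratab-style file.
# Comment lines (starting with '#') and blank lines are skipped.
# Usage: list_oracle_homes [FILE]   (FILE defaults to /etc/oratab)
list_oracle_homes() {
    oratab="${1:-/etc/oratab}"
    awk -F: '!/^[[:space:]]*#/ && NF >= 2 && $2 != "" { print $2 }' "$oratab" | sort -u
}
```

If this prints a single path, there is only one Oracle home to consider; more than one line means the reconfiguration steps apply.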

2. Reconfiguring the Oracle CSS Daemon
To reconfigure the Oracle CSS daemon so that it runs from an Oracle home that you are not removing, follow these steps:
In all Oracle home directories on the system, stop all Oracle ASM instances and any Oracle Database instances that use ASM for database file storage.
Switch user to root.
Depending on your operating system, enter one of the following commands to identify the Oracle home directory being used to run the CSS daemon:
# more /etc/oracle/ocr.loc
The output from this command is similar to the following:
ocrconfig_loc=/u01/app/oracle/product/10.1.0/db_1/cdata/localhost/local.ocr
local_only=TRUE

This is the result on my server:
[root@bj log]# more /etc/oracle/ocr.loc
ocrconfig_loc=/home/oracle/product/10.1.0/db_1/cdata/localhost/local.ocr
local_only=TRUE

The ocrconfig_loc parameter specifies the location of the Oracle Cluster Registry (OCR) used by the CSS daemon. The path up to the cdata directory is the Oracle home directory where the CSS daemon is running (/u01/app/oracle/product/10.1.0/db_1 in this example).
Note:
If the value for the local_only parameter is FALSE, Oracle CRS is installed on this system. See the Oracle Real Application Clusters Installation and Configuration Guide for information about removing RAC or CRS. 
If this Oracle home directory is not the Oracle home that you want to remove, you can continue to the "Removing Oracle Software" section.
Change directory to the Oracle home directory for an Oracle Database 10g installation that you are not removing.
Set the ORACLE_HOME environment variable to specify the path to this Oracle home directory:
Bourne, Bash, or Korn shell:
# ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_2;
# export ORACLE_HOME
C shell:
# setenv ORACLE_HOME /u01/app/oracle/product/10.1.0/db_2
Enter the following command to reconfigure the CSS daemon to run from this Oracle home:
# $ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
The script stops the Oracle CSS daemon, reconfigures it in the new Oracle home, and then restarts it. When the system boots, the CSS daemon starts automatically from the new Oracle home.
To remove the original Oracle home directory, see the "Removing Oracle Software" section.
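The rule quoted in this section, that the Oracle home running CSS is the part of ocrconfig_loc before the cdata directory, can be sketched as a small shell function. css_oracle_home is a hypothetical name; the function only reads the file and changes nothing.

```shell
# Print the Oracle home implied by the ocrconfig_loc entry of an ocr.loc file:
# everything before the "/cdata/" path component.
# Usage: css_oracle_home [FILE]   (FILE defaults to /etc/oracle/ocr.loc)
css_oracle_home() {
    ocrloc="${1:-/etc/oracle/ocr.loc}"
    loc=$(sed -n 's/^ocrconfig_loc=//p' "$ocrloc" | head -n 1)
    case "$loc" in
        */cdata/*) printf '%s\n' "${loc%%/cdata/*}" ;;
        *)         return 1 ;;   # no ocrconfig_loc entry, or no cdata component
    esac
}
```

Comparing its output with the ORACLE_HOME you plan to remove shows whether the CSS daemon needs to be moved first.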

3. Deleting the Oracle CSS Daemon Configuration
To delete the Oracle CSS daemon configuration, follow these steps:
Note:
Delete the CSS daemon configuration only if you are certain that no other Oracle Database 10g installation requires it. 
Remove any databases or ASM instances associated with this Oracle home. See the preceding sections for information about how to complete these tasks.
Switch user to root.
Change directory to the Oracle home directory that you are removing.
Set the ORACLE_HOME environment variable to specify the path to this Oracle home directory:
Bourne, Bash, or Korn shell:
# ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_1;
# export ORACLE_HOME
C shell:
# setenv ORACLE_HOME /u01/app/oracle/product/10.1.0/db_1
Enter the following command to delete the CSS daemon configuration from this Oracle home:
# $ORACLE_HOME/bin/localconfig delete
The script stops the Oracle CSS daemon, then deletes its configuration. When the system boots, the CSS daemon no longer starts.


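So I could try reconfiguring or deleting the CSS daemon configuration. Both operations must be run as root, however, and I did not have the root password for the failing server, nor was I confident about either operation.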

So I started checking my other two servers with 10g installed.
First server:
[lisa@localhost lisa]$ ps -ef | grep css
lisa      3336  3294  0 14:39 pts/0    00:00:00 grep css
No CSS process at all.

Someone online mentioned that removing the last line of this file gets rid of the ocssd.bin process, but that doing so is not recommended:
[lisa@localhost lisa]$ cat /etc/inittab
......
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
h1:35:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1

I checked /etc/oracle/ocr.loc and /etc/oratab; both looked fine and matched the configuration on the healthy server.

The log file shows an attempt to run crsstart every 5 minutes, which I take to be an attempt to start the ocssd process:
[root@localhost log]# tail messages
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost init: Id "h1" respawning too fast: disabled for 5 minutes
Mar  1 14:41:48 localhost su(pam_unix)[3482]: session opened for user root by lisa(uid=502)

The /etc/oracle/scls_scr/ directory does not contain a localhost.localdomain directory.
Checking the environment variables:
[root@localhost scls_scr]# env
HOSTNAME=localhost.localdomain
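The wrong HOSTNAME looked like the cause, so I set out to change it.
I ran into a small problem while changing the HOSTNAME and was ready to give up at that point, so I commented out the last line of /etc/inittab in an attempt to stop the process from being started.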

[lisa@localhost lisa]$ cat /etc/inittab
......
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
#h1:35:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1

Commenting out this line did not stop the log entries as I had hoped; the problem persisted exactly as before.
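The post does not say whether init was asked to re-read /etc/inittab after the edit. On SysV init, changes to that file normally take effect only after `telinit q` or a reboot, so that is one possible explanation, offered here as an assumption rather than a confirmed cause. The sketch below comments out the init.cssd entry in a copy of the file; disable_cssd_entry is a hypothetical helper name.

```shell
# Comment out the init.cssd respawn entry of an inittab-format file.
# The result is written to stdout; the input file is not modified.
disable_cssd_entry() {
    sed 's|^\([^#:]*:[^:]*:respawn:/etc/init\.d/init\.cssd .*\)$|#\1|' "$1"
}

# After reviewing the output and installing it as /etc/inittab (as root),
# init has to be told to re-read the file, for example with:
#   telinit q
```

Running it against a copy first, for example `disable_cssd_entry /etc/inittab > /tmp/inittab.new`, makes it possible to inspect the result before replacing the real file.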

Running that file directly fails with a permission error (as either root or oracle):
[root@localhost log]# /etc/oracle/scls_scr/*/root/crsstart
bash: /etc/oracle/scls_scr/****/root/crsstart: Permission denied

In the end, with guidance from our network administrator, I managed to change the HOSTNAME (to my embarrassment).

Checking the log file again:
[root@bj34 log]# tail messages
Mar  1 15:23:01 localhost su(pam_unix)[8583]: session opened for user oracle by (uid=0)
Mar  1 15:23:01 localhost su(pam_unix)[8583]: session closed for user oracle
Mar  1 15:23:01 localhost logger: Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=**))
Mar  1 15:23:05 localhost su(pam_unix)[8623]: session opened for user root by lisa(uid=502)
Mar  1 15:23:06 localhost su(pam_unix)[8655]: session opened for user oracle by (uid=0)
Mar  1 15:23:06 localhost su(pam_unix)[8655]: session closed for user oracle
Mar  1 15:23:06 localhost logger: Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=**))
Mar  1 15:23:11 localhost su(pam_unix)[8695]: session opened for user oracle by (uid=0)
Mar  1 15:23:11 localhost su(pam_unix)[8695]: session closed for user oracle
Mar  1 15:23:11 localhost logger: Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=**))
There was still an error, but a different one, now written every 5 seconds. It still looked like a machine name configuration problem.

The system processes:
[root@bj34 log]# ps -ef | grep css
root      4722     1  0 15:14 ?        00:00:00 /bin/su -l oracle -c exec /home/oracle/product/10.1.0/db_1/bin/ocssd
oracle    9834  4722  0 15:25 ?        00:00:00 /home/oracle/product/10.1.0/db_1/bin/ocssd.bin
root      9957  9921  0 15:27 pts/0    00:00:00 grep css

The processes, at least, were now correct.

This time I edited /etc/hosts, and that solved the problem: no more entries were written to the log file.
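The post does not show what was changed in /etc/hosts. One plausible check, given the "Failed 3 to bind listening endpoint" message, is whether the new host name appears in that file at all. The sketch below assumes that this is the relevant condition; hosts_has_name is a hypothetical helper name.

```shell
# Succeed if NAME appears as a host name or alias on a non-comment line
# of a hosts-format file.
# Usage: hosts_has_name NAME [FILE]   (FILE defaults to /etc/hosts)
hosts_has_name() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }                                   # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } # field 1 is the address
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}
```

For example, `hosts_has_name "$(hostname)" || echo "host name not in /etc/hosts"`.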


Second server:
System processes:
[root@bj72 root]# ps -ef | grep css
root      5498     1  0 15:50 ?        00:00:00 /bin/sh /etc/init.d/init.cssd run
root      5501  5498  0 15:50 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root      5515  5465  0 15:51 pts/0    00:00:00 grep css

[root@bj72 lisa]# tail /var/log/messages
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 init: Id "h1" respawning too fast: disabled for 5 minutes
Mar  1 15:47:55 bj72 su(pam_unix)[5464]: session opened for user root by lisa(uid=502)

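This server also reported errors, but the situation was different.
There is no **** directory under /etc/oracle/scls_scr/. The environment variables show that **** is the HOSTNAME. This server was moved to another data center after the database had been installed and running for some time, and its IP address and HOSTNAME were changed in the move, which is presumably the cause.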
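The check used on both servers, comparing the current host name with the directory names under /etc/oracle/scls_scr/, can be sketched as follows. check_scls_dir is a hypothetical helper name; the function only inspects the directory and changes nothing.

```shell
# Report whether a per-host directory exists under the scls_scr base directory.
# Usage: check_scls_dir [HOST] [BASE]
#   HOST defaults to the output of `hostname`, BASE to /etc/oracle/scls_scr.
check_scls_dir() {
    host="${1:-$(hostname)}"
    base="${2:-/etc/oracle/scls_scr}"
    if [ -d "$base/$host" ]; then
        echo "ok: $base/$host exists"
    else
        echo "missing: $base/$host"
        echo "directories present:"
        ls -1 "$base" 2>/dev/null
        return 1
    fi
}
```

A "missing" result together with a directory named after an old host name matches the situation described above.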

Changing the HOSTNAME was not an option this time, so I renamed the **** directory to the new HOSTNAME. Five minutes later:
[root@bj72 root]# tail /var/log/messages
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 init: Id "h1" respawning too fast: disabled for 5 minutes
Mar  1 15:47:55 bj72 su(pam_unix)[5464]: session opened for user root by lisa(uid=502)
Mar  1 15:50:51 bj72 su(pam_unix)[5505]: session opened for user oracle by (uid=0)
Mar  1 15:50:51 bj72 su(pam_unix)[5505]: session closed for user oracle
Mar  1 15:51:51 bj72 su(pam_unix)[5498]: session opened for user oracle by (uid=0)
Mar  1 15:51:52 bj72 su(pam_unix)[5498]: session closed for user oracle
Mar  1 15:51:52 bj72 su(pam_unix)[5532]: session opened for user oracle by (uid=0)
Mar  1 15:51:52 bj72 su(pam_unix)[5532]: session closed for user oracle

The symptom changed, but the problem remained. Nothing I tried worked, so I decided to stop the process.
First:
[root@bj72 etc]# cp inittab.no_cssd inittab
As far as I could tell, this did nothing more than remove the last line, and the log entries continued.

Then I ran:
[root@bj72 bin]# $ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Oracle Cluster Registry for cluster has been initialized

Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Adding to inittab
/home/oracle/product/10.1.0/db_1/bin/localconfig: line 1:  /bin/cp: No such file or directory
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started

Checking the processes again, they were gone:
[root@bj72 bin]# ps -ef | grep css
root      6279  5465  0 16:30 pts/0    00:00:00 grep css

The log file shows the reconfiguration taking place, and nothing was written afterwards:
[root@bj72 bin]# tail  /var/log/messages
Mar  1 16:17:19 bj72 su(pam_unix)[6125]: session opened for user oracle by (uid=0)
Mar  1 16:17:20 bj72 su(pam_unix)[6125]: session closed for user oracle
Mar  1 16:17:20 bj72 su(pam_unix)[6156]: session opened for user oracle by (uid=0)
Mar  1 16:17:20 bj72 su(pam_unix)[6156]: session closed for user oracle
Mar  1 16:18:20 bj72 su(pam_unix)[6149]: session opened for user oracle by (uid=0)
Mar  1 16:18:21 bj72 su(pam_unix)[6149]: session closed for user oracle
Mar  1 16:18:21 bj72 su(pam_unix)[6178]: session opened for user oracle by (uid=0)
Mar  1 16:18:21 bj72 su(pam_unix)[6178]: session closed for user oracle
Mar  1 16:19:09 bj72 lisa: (Oracle CSSD will be run out of init)
Mar  1 16:19:09 bj72 init: Re-reading inittab

Judging from these three servers, the user's problem was most likely caused by a HOSTNAME change, so I suggested that the user run the following as root:
 $ORACLE_HOME/bin/localconfig reset $ORACLE_HOME

The result:
[root@db1 oracle]# $ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
nThe following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /home1/oracle/product/10.1.0/db_1
Failure at scls_scr_create with code 1
Internal Error Information:
  Category: 1234
  Operation: scls_scr_create
  Location: mkdir
  Other: Unable to make user dir
  Dep: 2
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Oracle Cluster Registry for cluster has been initialized

Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Adding to inittab
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        db1
CSS is active on all nodes.
Oracle CSS service is installed and running under init(1M)

Checking the processes:
[oracle@db1 oracle]$ ps -ef | grep css
root      5716     1  0 Jan11 ?        00:00:00 /bin/su -l oracle -c exec /home1/oracle/product/10.1.0/db_1/bin/ocssd
oracle    9933  5716  0 17:15 ?        00:00:00 /home1/oracle/product/10.1.0/db_1/bin/ocssd.bin

Nothing more was written to the log file, and with that the problem was solved.

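To sum up: if the HOSTNAME is changed after the database server has been installed, the ocssd process fails to start, because the directory it starts from has the machine name hard-coded in its path. Running
$ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
as root to reconfigure it solves the problem.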

From the ITPUB blog, link: http://blog.itpub.net/51862/viewspace-180537/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
