9. Oracle Clusterware Administration

Oracle Clusterware: Overview

• Portable cluster infrastructure that provides HA to RAC databases and/or other applications:

– Monitors applications’ health

– Restarts applications on failure

– Can fail over applications on node failure

Oracle Clusterware Run-Time View

On Unix platforms, the clusterware stack is started by the respawn entries defined in the /etc/inittab file.

A basic description of each process:

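• Cluster Synchronization Services Daemon (OCSSD) - Monitors node health across the cluster through the network interconnect and the voting disk. When it detects a problem, it reboots the affected node to prevent data corruption or split brain. Runs as the oracle user and is used with both vendor and nonvendor clusterware.

• Process Monitor Daemon (OPROCD) - Spawned only in nonvendor clusterware environments and runs as root. It detects freezes at the hardware or driver level. On Linux and Windows, a kernel driver (such as hangcheck-timer) takes over the role of this process.

• Cluster Ready Services Daemon (CRSD) - The core of Oracle Clusterware high availability, responsible for the start, stop, monitor, and failover operations of registered applications. CRS spawns a dedicated RACGIMON process for each instance to monitor application health, while CRSD manages the configuration and state of each resource in the OCR. Runs as root and is restarted automatically on failure. In addition, CRSD can temporarily spawn the following processes to perform specific actions: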

- racgeut (Execute Under Timer), to kill actions that do not complete after a certain amount of time

- racgmdb (Manage Database), to start/stop/check instances

- racgchsn (Change Service Name), to add/delete/check service names for instances

- racgons, to add/remove ONS configuration to OCR

- racgvip, to start/stop/check instance virtual IP

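• Event Management Daemon (EVMD) - Forwards cluster events when they occur. It spawns a permanent child process, evmlogger, and spawns racgevtf processes on demand to run callouts. Runs as the oracle user and is restarted automatically on failure.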

Manually Control Oracle Clusterware Stack

# crsctl stop crs

# crsctl start crs

# crsctl disable crs

# crsctl enable crs

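Run these commands as root, typically for planned outages. To keep CRS from starting automatically when the node reboots, you must disable it first. The disable command does not stop a CRS stack that is currently running.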

CRS Resources

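Applications managed by CRS are called resources. A resource's profile and its attributes are stored in the OCR. Profile attributes include the check interval, action script, dependencies, failure policies, privileges, and so on. Every resource must have an action script, and the script must implement the start, stop, and check operations.

The life cycle of a resource typically consists of:

Create or modify the profile (crs_profile) -> register the resource in the OCR (crs_register) -> start (crs_start) -> check status (crs_stat) -> relocate to another node (crs_relocate) -> stop (crs_stop) -> unregister (crs_unregister)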

RAC Resources

Use crs_stat -t to view the state of each resource. By default, the resources managed by Oracle Clusterware include databases, database and ASM instances, VIP/ONS/GSD/Listener (collectively called nodeapps), services, and service members.

When the state of a resource is UNKNOWN, use crs_stop -f resourcename to force it to stop.

Use crs_stat -p resourcename to view the attributes of a resource.

Main Voting Disk Function

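CSS uses the private network and the voting disk to manage node eviction. Nodes check each other's health through heartbeats. When network connectivity between nodes is lost, the nodes exchange state information through the voting disk to decide which node must be evicted. The evicted node reboots itself after it reads the eviction message from the voting disk.

A similar mechanism exists between database instances. Sometimes an instance must be evicted even though the network between nodes is healthy; this is called Instance Membership Reconfiguration (IMR), and it is implemented through the control file.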

Important CSS Parameters

• MISSCOUNT:

– Represents network heartbeat timeouts

– Determines disk I/O timeouts during reconfiguration

– Defaults to 30 seconds (60 for Linux)

– Defaults to 600 when using vendor (non-Oracle) clusterware

– Should not be changed

• DISKTIMEOUT:

– Represents disk I/O timeouts outside reconfiguration

– Defaults to 200 seconds

– Can be temporarily changed when experiencing very long I/O latencies to voting disks:

1. Shut down Oracle Clusterware on all nodes but one.

2. As root on the available node, use: crsctl set css disktimeout M+1 (M being the I/O latency, in seconds, that you are experiencing)

3. Reboot available node.

4. Restart all other nodes.

Multiplexing Voting Disks

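To make the voting disk redundant, you need at least three copies.

Use the following formula to determine the level of redundancy: v = f*2 + 1, where v is the number of voting disks and f is the number of failed disks that can be tolerated.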
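The formula v = f*2 + 1 can be checked with a small shell calculation. This is a minimal sketch: votedisks_needed and failures_tolerated are hypothetical helper names, not Oracle commands.

```shell
#!/bin/sh
# Hypothetical helpers illustrating v = f*2 + 1 (not part of Oracle Clusterware).

# Number of voting disks needed to tolerate f failed disks.
votedisks_needed() {
    echo $(( $1 * 2 + 1 ))
}

# Number of failed disks that v voting disks can tolerate (integer division).
failures_tolerated() {
    echo $(( ($1 - 1) / 2 ))
}

votedisks_needed 1      # prints 3
failures_tolerated 5    # prints 2
```

Note that an even number of voting disks buys nothing extra: four disks tolerate one failure, the same as three.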

Change Voting Disk Configuration

# crsctl add css votedisk

# crsctl delete css votedisk

# crsctl add css votedisk -force

# crsctl delete css votedisk -force

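The OU course material says that voting disk copies can be added or removed dynamically, but this did not work in my tests on Linux. In those tests the change could only be made with the -force option, and Clusterware had to be stopped before running the command.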

Back Up and Recover Your Voting Disks

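Because the voting disk path is recorded in the OCR and is hard to change directly, using link files is recommended so that recovery is easier when the original path becomes unavailable.

• Back up one voting disk by using the dd command: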

– After Oracle Clusterware installation

– After node addition or deletion

– Can be done online

$ crsctl query css votedisk

$ dd if= of= bs=4k
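The dd backup can be wrapped in a small function. The following is a sketch under stated assumptions: backup_votedisk is a hypothetical helper, the timestamped file name is my own convention, and the source path would be one of those reported by crsctl query css votedisk.

```shell
#!/bin/sh
# Hypothetical helper: copy one voting disk into a backup directory with dd.
# Usage: backup_votedisk <voting_disk_path> <backup_directory>
# Prints the path of the backup file on success.
backup_votedisk() {
    src=$1
    dest_dir=$2
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "no such directory: $dest_dir" >&2; return 1; }
    # Timestamped name so that earlier backups are not overwritten.
    dst="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"
    # Same block size as the dd command in these notes.
    dd if="$src" of="$dst" bs=4k 2>/dev/null || return 1
    # Treat an empty backup file as a failure.
    [ -s "$dst" ] || return 1
    echo "$dst"
}
```

A call would look like backup_votedisk /dev/raw/raw2 /backup/votedisk, where both paths are placeholders. The function does not compare the copy with the source, because an online voting disk keeps changing while it is being read.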

OCR Architecture

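The OCR holds the cluster configuration. Every node in the cluster keeps a copy of the OCR in memory, along with a CRSD process that accesses this OCR cache. Only one CRSD process actually reads from and writes to the OCR file. That process is responsible for refreshing the OCR cache on its own node and on the other nodes. When a local client needs to read OCR information, it talks directly to the local CRS process. When a client needs to update the OCR, it goes through the local CRS process, which communicates with the CRS process that performs the OCR writes.

The main clients include OUI, srvctl, EM, DBCA, DBUA, NetCA, and VIPCA.

The OCR records and maintains the dependencies between resources and their status information. During Clusterware installation, OUI asks whether the OCR should be mirrored. On UNIX platforms, the /etc/oracle/ocr.loc file records the paths of the OCR and its mirror.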

OCR Contents and Organization

Information in the OCR is stored as key-value pairs organized in a tree structure.

The main branches composing the OCR structure:

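• The SYSTEM keys - Data related to the Oracle Clusterware processes (CSSD, CRSD, EVMD, and so on).

• The DATABASE keys - Data related to RAC databases (instances, nodeapps, services, and so on).

• The CRS keys - Data related to other registered applications.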

Managing OCR Files and Locations: Overview

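Use the ocrconfig command to maintain the OCR:

• Use -export/-import to export or import a logical backup

• Upgrade/downgrade OCR

• Use -showbackup to list the automatically generated backups, -backuploc to change where automatic backups are written, and -restore to restore a backup

• Use -replace ocr/ocrmirror to add, change, or remove an OCR file or OCR mirror

• With guidance from Oracle Support, use -overwrite to override the OCR protection mechanism when one or more nodes cannot start because of OCR corruption

• Use -repair to correct the OCR or OCR mirror configuration parameters

ocrcheck - Verifies the integrity of the OCR and its mirror

ocrdump - Dumps the OCR contents to a text or XML file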

Automatic OCR Backups

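Automatic OCR backup frequency (cannot be changed):

Every four hours (the last three backups are kept)

At the end of each day (the last two backups are kept)

At the end of each week (the last two backups are kept)

To change the default backup location: # ocrconfig -backuploc /shared/bak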

Back Up OCR Manually

• Daily backups of your automatic OCR backups to a different storage device:

– Use your favorite backup tool.

• Logical backups of your OCR before and after making significant changes:

# ocrconfig -export file_name

• Make sure that you restore OCR backups that match your current system configuration.

Recover OCR Using Physical Backups

1. Locate a physical backup:

$ ocrconfig -showbackup

2. Review its contents:

# ocrdump -backupfile file_name

3. Stop Oracle Clusterware on all nodes:

# crsctl stop crs

4. Restore the physical OCR backup:

# ocrconfig -restore /cdata/jfv_clus/day.ocr

5. Restart Oracle Clusterware on all nodes:

# crsctl start crs

6. Check OCR integrity:

$ cluvfy comp ocr -n all

Recover OCR Using Logical Backups

1. Locate a logical backup created using an OCR export.

2. Stop Oracle Clusterware on all nodes.

3. Restore the logical OCR backup:

# ocrconfig -import /shared/export/ocrback.dmp

4. Restart Oracle Clusterware on all nodes.

5. Check OCR integrity:

$ cluvfy comp ocr -n all

Replace an OCR Mirror: Example

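# ocrconfig -replace ocrmirror /oradata/OCR2

The /etc/oracle/ocr.loc file is updated accordingly. In my tests, the target file had to exist before an OCR file or mirror could be added. The same command can be used to add, change, or remove OCR and mirror files.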

Repair OCR Configuration: Example

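When nodes disagree about the OCR configuration (typically because the OCR/ocrmirror configuration was changed while a node was down), use the -repair option to fix it. The repair must be run while CRS is stopped. The command only changes the OCR configuration information (the ocr.loc file); it does not repair the OCR itself.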

1. Stop Oracle Clusterware on Node2:

# crsctl stop crs

2. Add OCR mirror from Node1:

# ocrconfig -replace ocrmirror /OCRMirror

3. Repair OCR mirror location on Node2:

# ocrconfig -repair ocrmirror /OCRMirror

4. Start Oracle Clusterware on Node2:

# crsctl start crs

OCR Considerations

• If using raw devices to store OCR files, make sure they exist before add or replace operations.

• You must be the root user to be able to add, replace, or remove an OCR file while using ocrconfig.

• While adding or replacing an OCR file, its mirror needs to be online.

• If you remove a primary OCR file, the mirror OCR file becomes primary.

• Never remove the last remaining OCR file.

Change VIP Addresses

1. Determine the interface used to support your VIP:

$ ifconfig -a

2. Stop all resources depending on the VIP:

$ srvctl stop instance -d DB -i DB1

$ srvctl stop asm -n node1

# srvctl stop nodeapps -n node1

3. Verify that the VIP is no longer running:

$ ifconfig -a

$ crs_stat

4. Change IP in /etc/hosts and DNS.

5. Modify your VIP address using srvctl:

# srvctl modify nodeapps -n node1 -A 192.168.2.125/255.255.255.0/eth0

6. Start nodeapps and all resources depending on it:

# srvctl start nodeapps -n node1

7. Repeat from step 1 for the next node.

If you changed the virtual host name, the corresponding listener and tnsnames configuration must be updated as well.

Change Public/Interconnect IP Subnet Configuration: Example

1. You get the current interfaces information by using the getif option.

$ /bin/oifcfg getif

eth0 139.2.156.0 global public

eth1 192.168.0.0 global cluster_interconnect

2. You delete the entry corresponding to the public interface first by using the delif option, and then enter the correct information by using the setif option.

$ oifcfg delif -global eth0

$ oifcfg setif -global eth0/139.2.166.0:public

3. You do the same for your private interconnect.

$ oifcfg delif -global eth1

$ oifcfg setif -global eth1/192.168.1.0:cluster_interconnect

4. You check that the new information is correct.

$ oifcfg getif

eth0 139.2.166.0 global public

eth1 192.168.1.0 global cluster_interconnect
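The final check can also be scripted by parsing the getif output. This is a minimal sketch that assumes the four-column layout shown in the listing above (interface, subnet, scope, role); subnet_for_role is a hypothetical helper, not part of oifcfg.

```shell
#!/bin/sh
# Hypothetical helper: print the subnet registered for a given role
# (public or cluster_interconnect) from "oifcfg getif" output on stdin.
# Assumed columns, as in the listing above: interface subnet scope role
subnet_for_role() {
    awk -v role="$1" '$4 == role { print $2 }'
}

# Sample input copied from the listing above. On a real cluster the
# input would come from: oifcfg getif | subnet_for_role public
getif_sample='eth0 139.2.166.0 global public
eth1 192.168.1.0 global cluster_interconnect'

echo "$getif_sample" | subnet_for_role public                 # prints 139.2.166.0
echo "$getif_sample" | subnet_for_role cluster_interconnect   # prints 192.168.1.0
```

Comparing the printed subnet with the intended one gives a quick pass/fail test after a setif change.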

Third-Party Application Protection: Overview

• High Availability framework: - brings applications under CRS control to make them highly available

– Command-line tools to register applications with CRS

– Calls control application agents to manage applications

– OCR used to describe CRS attributes for the applications

• High Availability C API:

– Modify CRS attributes directly in OCR

– Modify CRS attributes on the fly

• Application VIPs: - application virtual IPs that provide failover (the VIP fails over together with the application)

– Used for applications accessed by network means

– NIC redundancy

– NIC failover

• OCFS: - provides shared storage for applications

– Store application configuration files

– Share files between cluster nodes

Application VIP and RAC VIP Differences

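• RAC VIP is mainly used in case of node down events: - The resources do not need to fail over. Once the RAC VIP has been taken over by another node, it no longer accepts connections; it immediately returns an error so that clients can reconnect to a different address.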

– VIP is failed over to a surviving node.

– From there it returns NAK to clients forcing them to reconnect.

– There is no need to fail over resources associated to the VIP.

• Application VIP is mainly used in case of application down events:

– VIP is failed over to another node together with the application(s).

– From there, clients can still connect through the VIP.

– Although not recommended, one VIP can serve many applications.

Use CRS Framework: Overview

1. Create an application VIP, if necessary:

a) Create a profile: Network data + usrvip predefined script

b) Register the application VIP.

c) Set user permissions on the application VIP.

d) Start the application VIP by using crs_start.

2. Write an application action script that accepts three parameters:

• start: Script should start the application.

• check: Script should confirm that the application is up.

• stop: Script should stop the application.

3. Create an application profile:

• Action script location

• Check interval

• Failover policies

• Application VIP, if necessary

4. Set permissions on your application.

5. Register the profile with Oracle Clusterware.

6. Start your application by using crs_start.
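The action script from step 2 could be structured as below. This is a skeleton under stated assumptions, not Oracle-supplied code: APP_CMD and PID_FILE are hypothetical placeholders for a real application, and a sleep process stands in for the application itself. The script reports success with exit status 0 and failure with a nonzero status.

```shell
#!/bin/sh
# Skeleton action script (a sketch, not Oracle-supplied code).
# APP_CMD and PID_FILE are hypothetical placeholders for a real application.
APP_CMD=${APP_CMD:-"sleep 300"}
PID_FILE=${PID_FILE:-/tmp/myapp.pid}

# Succeeds only if the PID file names a live process.
app_running() {
    [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

app_action() {
    case "$1" in
    start)
        app_running && return 0          # already up: nothing to do
        $APP_CMD >/dev/null 2>&1 &
        echo $! > "$PID_FILE"
        ;;
    stop)
        app_running && kill "$(cat "$PID_FILE")"
        rm -f "$PID_FILE"
        ;;
    check)
        app_running                      # exit status 0 means "up"
        ;;
    *)
        echo "usage: $0 {start|stop|check}" >&2
        return 1
        ;;
    esac
}

# Dispatch on the first argument when one is supplied.
[ $# -eq 0 ] || app_action "$1"
```

Keeping the three operations in one function makes the check logic reusable by start and stop, so a second start of a running application is a no-op.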

Prevent Automatic Instance Restarts

Setting as to 2 disables automatic restart (1 = always restart automatically, 0 = restore the state the resource had before the restart).

# crs_register ora.….inst -update -o as=2,ra=1,ut=7d

10. Diagnosing Oracle Clusterware and RAC Components

The One Golden Rule in RAC Debugging

• Always make sure that your nodes have exactly the same system time to:

– Facilitate log information analysis

– Ensure accurate results when reading GV$ views

– Avoid untimely instance evictions

• The best recommendation is to synchronize nodes using Network Time Protocol.

This avoids problems caused by clocks that are out of sync between nodes.
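A rough way to spot clock drift is to compare epoch timestamps collected from each node. The sketch below only does the comparison; max_skew is a hypothetical helper, and how the timestamps are gathered (for example over ssh) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the largest difference, in seconds, among the
# epoch timestamps given as arguments (one per node).
max_skew() {
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min=$t; fi
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    echo $(( max - min ))
}

# On a real cluster the values could be collected with something like
#   max_skew "$(ssh node1 date +%s)" "$(ssh node2 date +%s)"
# (node1 and node2 are placeholder host names).
max_skew 1700000000 1700000003 1700000001   # prints 3
```

Because the ssh calls run one after another, their own latency is included in the result, so this only flags gross differences; NTP remains the actual fix.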

Oracle Clusterware Main Log Files

The main directories used by Oracle Clusterware to store its log files:

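• CRS logs are in $ORA_CRS_HOME/log//crsd/, archived every 10 MB

• CSS logs are in $ORA_CRS_HOME/log//cssd/, archived every 20 MB

• EVM logs are in $ORA_CRS_HOME/log//evmd

• Per-resource logs, named imon_.log and archived every 10 MB, are in $ORA_CRS_HOME/log//racg and $ORACLE_HOME/log//racg. Logs for each RACG executable are in a subdirectory of that path named after the command.

• Logs for client tools such as SRVM (srvctl) and OCR (ocrdump, ocrconfig, ocrcheck) are in $ORA_CRS_HOME/log//client/ and $ORACLE_HOME/log//client/

• Important Oracle Clusterware messages are in $ORA_CRS_HOME/log/, in a file named alert.log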

Diagnostics Collection Script

• Script to collect all important log files:

– Must be executed as root

– Is located in $ORA_CRS_HOME/bin/

– Is called diagcollection.pl

• Generates the following files in the local directory:

– basData_.tar.gz

– crsData_.tar.gz

– ocrData_.tar.gz

– oraData_.tar.gz

Cluster Verify: Overview

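cluvfy is a tool that provides a range of cluster verification checks; it is included as of release 10.2. cluvfy commands fall into two categories:

• Stage commands - check the system environment before and after steps such as software installation, database creation, and configuration changes

• Component commands - verify the status and integrity of an individual component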

Cluster Verify Stages

• -post hwos: Postcheck for hardware and operating system

• -pre cfs: Precheck for CFS setup

• -post cfs: Postcheck for CFS setup

• -pre crsinst: Precheck for CRS installation

• -post crsinst: Postcheck for CRS installation

• -pre dbinst: Precheck for database installation

• -pre dbcfg: Precheck for database configuration

Cluster Verify Components

• nodereach: Checks reachability between nodes

• nodecon: Checks node connectivity

• cfs: Checks Oracle Cluster File System integrity

• ssa: Checks shared storage accessibility

• space: Checks space availability

• sys: Checks minimum system requirements

• clu: Checks cluster integrity

• clumgr: Checks cluster manager integrity

• ocr: Checks OCR integrity

• crs: Checks CRS integrity

• nodeapp: Checks node applications existence

• admprv: Checks administrative privileges

• peer: Compares properties with peers


[RAC] ORACLE Database 10g RAC for Administrators Study Notes (5)
