[Performance Tuning] Wait Events (9): How Latches Work

Latch basics

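A latch is a serialization mechanism that protects shared data structures in the SGA. The implementation of latches is operating-system dependent, particularly in regard to whether a process will wait for a latch and for how long.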

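A latch is a lock that can be acquired and released extremely quickly; latches are typically used to protect the data structures that describe blocks in the buffer cache. Each latch also has an associated cleanup procedure, which is called if the process holding the latch dies. Latches have levels, which are used to prevent deadlock: once a process has acquired a latch at a given level, it cannot acquire another latch at an equal or lower level.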

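A latch is built on simple OS primitives. A process that misses a latch spins on the CPU instead of sleeping straight away; the spin costs a few cycles but avoids the heavier cost of a CPU context switch. (If the latch still has not been obtained after spin_count tries, the process sleeps. Note that spin_count is meaningless on a single-CPU system, because the holder cannot run and release the latch while the waiter is spinning.) Latches are not FIFO. The design rests on the assumption that the protected operation completes quickly, so a miss is rare.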
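The spin-then-sleep idea can be sketched in a few lines of Python. This is only an illustration, not Oracle's implementation: the class name SpinLatch, its methods and the sleep interval are hypothetical, and a threading.Lock acquired without blocking stands in for the hardware test-and-set.

```python
import threading
import time


class SpinLatch:
    """Toy latch: spin on a non-blocking test-and-set, then sleep and retry."""

    def __init__(self, spin_count=2000, sleep_seconds=0.01):
        self._flag = threading.Lock()       # stands in for the test-and-set word
        self.spin_count = spin_count        # tries per spin round
        self.sleep_seconds = sleep_seconds  # hypothetical fixed sleep interval

    def try_get(self):
        """No-wait request: a single attempt that never blocks."""
        return self._flag.acquire(blocking=False)

    def get(self):
        """Willing-to-wait request: returns how many sleeps were needed."""
        sleeps = 0
        while True:
            for _ in range(self.spin_count):
                if self.try_get():
                    return sleeps
            # The spin round is exhausted: give up the CPU, then spin again.
            time.sleep(self.sleep_seconds)
            sleeps += 1

    def release(self):
        self._flag.release()
```

A free latch is obtained by get() with zero sleeps, and try_get() on a held latch returns False immediately rather than waiting.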

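A latch has three attributes: the pid of the process, a memory address, and a length. Latches give exclusive access to shared data structures in the SGA, so that when several sessions modify the same structure at once the structure is not corrupted.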

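When a process is about to access a data structure in the SGA it must first get the latch; it holds the latch until it has finished with the structure, and only then releases it. The latch name identifies which data structure a latch protects.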

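Differences between latches and locks

Aspect	Latch	Lock
Purpose	Single purpose: protect exclusive access to memory structures (from 9i, the cache buffers chains latch can be shared for read-only access)	Two purposes: when the lock modes are compatible, allow multiple processes to share the same resource; when they are incompatible, enforce exclusive access to the resource
Usage	Applies only to data structures in the SGA; protects transient memory objects; controls a single operation on a memory structure; not transactional.	Protects database objects such as tables, data blocks and state objects; driven by the application to control access to data or metadata in the database; transactional.
Request modes	Two request modes: willing-to-wait and no-wait	Six modes: null, row share, row exclusive, share, share row exclusive, exclusive
Scope	Information is held in memory and is visible only within the local instance; latches operate at the instance level	Information is held in the database and is visible to all instances of the database; locks operate at the database level
Complexity	Implemented with simple instructions such as test-and-set or compare-and-swap, or other simple CPU instructions; lightweight	Implemented with a series of instructions involving context switches; heavyweight
Duration	Held briefly	Generally held for an extended period (the life of the transaction)
Queuing	When a latch request fails, the process sleeps for a while and retries; requests are neither queued nor ordered (exception: the latch wait list is a queue). Waiting processes compete for the latch on equal terms.	When a lock request fails, the request is queued and served in order, unless the NOWAIT option is used.
Deadlock	Latches do not deadlock	Locks support queuing and can deadlock; when a deadlock occurs, a deadlock trace file is written.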

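Latch-related views

There are three kinds of latch: parent, child and solitary.

Parent and solitary latches are hard-coded in the Oracle kernel, while child latches are created when the instance starts. The views V$LATCH_PARENT and V$LATCH_CHILDREN hold information about parent and child latches respectively. V$LATCH holds information about solitary latches, plus totals for each parent latch together with its children.

Here is the description of V$LATCH and its columns (from the official Oracle 10g R2 documentation, Reference):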

V$LATCH

V$LATCH shows aggregate latch statistics for both parent and child latches, grouped by latch name. Individual parent and child latch statistics are broken down in the views V$LATCH_PARENT and V$LATCH_CHILDREN.

Column	Datatype	Description
ADDR	RAW(4 | 8)	Address of the latch object
LATCH#	NUMBER	Latch number
LEVEL#	NUMBER	Latch level
NAME	VARCHAR2(50)	Latch name
HASH	NUMBER	Latch hash
GETS	NUMBER	Number of times the latch was requested in willing-to-wait mode
MISSES	NUMBER	Number of times the latch was requested in willing-to-wait mode and the requestor had to wait
SLEEPS	NUMBER	Number of times a willing-to-wait latch request resulted in a session sleeping while waiting for the latch
IMMEDIATE_GETS	NUMBER	Number of times a latch was requested in no-wait mode
IMMEDIATE_MISSES	NUMBER	Number of times a no-wait latch request did not succeed (that is, missed)
WAITERS_WOKEN	NUMBER	This column has been deprecated and is present only for compatibility with previous releases of Oracle. No data is accumulated for this column; it will always have a value of zero.
WAITS_HOLDING_LATCH	NUMBER	This column has been deprecated and is present only for compatibility with previous releases of Oracle. No data is accumulated for this column; it will always have a value of zero.
SPIN_GETS	NUMBER	Willing-to-wait latch requests which missed the first try but succeeded while spinning
SLEEP[1 | 2 | 3]	NUMBER	These columns have been deprecated and are present only for compatibility with previous releases of Oracle. No data is accumulated for these columns; they will always have a value of zero. As a substitute for this column you can query the appropriate rows of the V$EVENT_HISTOGRAM view where the EVENT column has a value of latch free or latch:%.
SLEEP4	NUMBER	This column has been deprecated and is present only for compatibility with previous releases of Oracle. No data is accumulated for this column; it will always have a value of zero. As a substitute for this column you can query the appropriate rows of the V$EVENT_HISTOGRAM view where the EVENT column has a value of latch free or latch:%.
SLEEP[5 | 6 | 7 | 8 | 9 | 10 | 11]	NUMBER	These columns have been deprecated and are present only for compatibility with previous releases of Oracle. No data is accumulated for these columns.
WAIT_TIME	NUMBER	Elapsed time spent waiting for the latch (in microseconds)
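The willing-to-wait counters in this view are usually read as ratios rather than raw numbers. The helper below is a small sketch of that arithmetic; the function name, the choice of ratios and the sample figures are illustrative assumptions, not something taken from the Oracle documentation.

```python
def latch_ratios(gets, misses, sleeps, spin_gets):
    """Derive simple contention ratios from V$LATCH willing-to-wait counters."""
    return {
        # Share of willing-to-wait requests that missed on the first try.
        "miss_pct": 100.0 * misses / gets if gets else 0.0,
        # Share of misses that were resolved by spinning, without a sleep.
        "spin_success_pct": 100.0 * spin_gets / misses if misses else 0.0,
        # Average number of sleeps behind each miss.
        "sleeps_per_miss": sleeps / misses if misses else 0.0,
    }


# Made-up sample values for one latch row.
ratios = latch_ratios(gets=1_000_000, misses=5_000, sleeps=500, spin_gets=4_600)
```

A low miss percentage combined with a high spin success percentage suggests that most contention on the latch is absorbed by spinning.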

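Statistics for no-wait requests are in the IMMEDIATE_GETS and IMMEDIATE_MISSES columns, which exist in all three latch views: V$LATCH, V$LATCH_PARENT and V$LATCH_CHILDREN. No-wait mode is generally tried first on latches that have multiple children, such as the redo copy latch. If a process fails its first attempt to get one child latch, it moves on and requests the next one, still in no-wait mode. Only when the no-wait requests against all the children have failed does it switch to willing-to-wait mode.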
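The no-wait sweep over child latches can be sketched as follows. The function and its callable arguments are hypothetical stand-ins for the real latch requests, and the choice of which child to wait on after every no-wait attempt has missed is an assumption, since the text does not specify it.

```python
def get_one_child(children, try_get, get):
    """Return (child, mode) for the child latch that was obtained.

    children: child latch identifiers, in the order they are tried.
    try_get:  callable(child) -> bool, a no-wait request.
    get:      callable(child) -> None, a willing-to-wait request (blocks).
    """
    for child in children:
        if try_get(child):
            return child, "no-wait"
    # Every no-wait request missed. Waiting on the first child is an
    # assumption made for this sketch only.
    get(children[0])
    return children[0], "willing-to-wait"
```

As long as at least one child is free, the caller never waits; the willing-to-wait request is only the fallback.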

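Statistics for willing-to-wait requests are in the GETS and MISSES columns. Each time a process requests a latch in willing-to-wait mode, GETS is incremented by one.

If the latch is available on the first request, the process gets it. Before modifying the protected data structure, the process writes recovery information to the latch recovery area, so that PMON knows what to clean up if the process holding the latch dies.

If the latch is not available, the process spins on the CPU for a short time and then tries again. The number of spin-and-retry iterations is set by the hidden parameter _SPIN_COUNT.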

SQL> col name on format a30
SQL> col value on format a30
SQL> select x.ksppinm name, y.ksppstvl value
  2  from sys.x$ksppi x, sys.x$ksppcv y
  3  where x.indx = y.indx
  4  and x.ksppinm = '_spin_count';

NAME                           VALUE
------------------------------ -------------------------
_spin_count                    2000

SQL>

The default value of _spin_count is 2000.

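Put simply: if the process gets the latch within those 2000 retries, it increments SPIN_GETS and MISSES by one each. Otherwise it posts a latch free wait event to V$SESSION_WAIT (this is the 9i behaviour; from 10g the individual latch waits are split out from latch free), yields the CPU, and sleeps. When the sleep ends, the process wakes up and tries again for another _spin_count iterations. This cycle of spin, retry, sleep and wake up continues until the latch is finally obtained. The sleep statistics (SLEEPS and SLEEP1, SLEEP2, SLEEP3) are updated only once the latch has been obtained, not on every attempt. (SLEEP4 through SLEEP11 are never updated.)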
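The bookkeeping described above can be traced with a small deterministic simulation. Everything here is a sketch under stated assumptions: the function name is hypothetical, the latch is replaced by a scripted sequence of booleans, and SPIN_GETS is assumed to count only requests that succeed during the first spin round, before any sleep.

```python
from collections import Counter


def willing_to_wait_get(attempts, spin_count, stats):
    """Simulate one willing-to-wait request and update the counters.

    attempts: iterator of booleans, True when a try finds the latch free.
    """
    stats["gets"] += 1
    if next(attempts):
        return                       # obtained on the first try
    stats["misses"] += 1
    sleeps = 0
    while True:
        for _ in range(spin_count):
            if next(attempts):
                if sleeps == 0:
                    stats["spin_gets"] += 1   # assumption: first round only
                # Sleep statistics are posted only once the latch is obtained.
                stats["sleeps"] += sleeps
                return
        sleeps += 1                  # spin round failed: post the wait, sleep


stats = Counter()
# The first try misses, the second spin attempt succeeds.
willing_to_wait_get(iter([False, False, True]), spin_count=3, stats=stats)
# The first try misses, a whole spin round fails, then one sleep and a retry.
willing_to_wait_get(iter([False, False, False, False, False, True]),
                    spin_count=3, stats=stats)
```

After the two requests the counters show two gets, two misses, one spin get and one sleep.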

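The only way out of a latch request is to get the latch. So what happens if the process holding the latch dies?

After a process has repeatedly failed to get a latch, it posts PMON to check on the latch holder. If the holder is dead, PMON cleans up the dead process and releases the latch.

Every latch has a level number (0 to 13; the numbering differs between versions). Solitary and parent latches, which are hard-coded in the Oracle kernel, each have their own level, and child latches created at instance startup inherit the level of their parent. Levels exist mainly to prevent latch deadlocks. Latches avoid deadlock through the following two rules: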

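When a process requests a latch in no-wait mode, the level restriction is relaxed: a latch of any level may be requested, even one at the same level as a latch the process already holds. A no-wait request never blocks, so it cannot take part in a deadlock.

When a process requests a latch in willing-to-wait mode, the level of that latch must be higher than the levels of all the latches the process currently holds.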
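The two rules amount to a simple check on the levels a process already holds. The sketch below reads the no-wait rule as "any level is allowed"; that reading and the function name are assumptions made for illustration.

```python
def may_request(held_levels, requested_level, no_wait):
    """Return True if a latch at requested_level may be requested.

    held_levels: levels of the latches the process currently holds.
    """
    if no_wait:
        # A no-wait request cannot block, so no level ordering is enforced.
        return True
    # Willing-to-wait: the new level must exceed every level already held.
    return all(requested_level > level for level in held_levels)
```

Because every willing-to-wait request climbs strictly upward in level, two processes can never each wait for a latch that the other acquired earlier, which is what rules out a deadlock cycle.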

Long-wait and short-wait latches

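Most latches are short-wait latches, so a process should not have to wait long to get one. For these latches an Oracle process sleeps with exponential backoff, meaning that the sleep time doubles on every other wait (1, 1, 2, 2, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64 centiseconds, and so on). The maximum sleep time is set by the hidden parameter _max_exponential_sleep, whose default is usually 2 seconds (0 in 10g R1; the 10.2.0.1 instance queried below also reports 0). If the sleeping process is itself holding one or more latches, the maximum is reduced to the value of _max_sleep_holding_latch, which defaults to 4 centiseconds. A process that holds a latch must not be allowed to sleep for long, because that raises the chance that other processes miss when they request that latch.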
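The backoff sequence is easy to reproduce. The sketch below assumes a cap of 200 centiseconds (the 2 second default mentioned above) and 4 centiseconds while holding a latch; the function name is hypothetical, and how Oracle interprets a cap value of 0 is not modelled.

```python
def sleep_schedule(n, max_sleep_cs=200, holding_latch=False, max_holding_cs=4):
    """Return the first n sleep times in centiseconds.

    The sleep time doubles on every other wait and is capped at
    max_sleep_cs, or at max_holding_cs if the sleeper holds a latch.
    """
    cap = max_holding_cs if holding_latch else max_sleep_cs
    return [min(2 ** (i // 2), cap) for i in range(n)]


print(sleep_schedule(14))
# [1, 1, 2, 2, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64]
print(sleep_schedule(8, holding_latch=True))
# [1, 1, 2, 2, 4, 4, 4, 4]
```

The second call shows the effect of the lower cap: a process that holds a latch never sleeps for more than 4 centiseconds at a time.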

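Some latches are long-wait latches: they are normally held for longer, and an Oracle process sleeping on one relies on another process to wake it up. This mechanism is known as latch wait posting. The related hidden parameter, _latch_wait_posting, has been obsolete since 9i; the WAITERS_WOKEN column of the latch views is used instead (although the 10g R2 Reference quoted above already lists that column as deprecated).

Here are the values of the parameters mentioned above: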

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> select x.ksppinm name, y.ksppstvl value
  2  from sys.x$ksppi x, sys.x$ksppcv y
  3  where x.indx = y.indx
  4  and x.ksppinm = '_max_exponential_sleep';

NAME                           VALUE
------------------------------ ------------------------------
_max_exponential_sleep         0

SQL> select x.ksppinm name, y.ksppstvl value
  2  from sys.x$ksppi x, sys.x$ksppcv y
  3  where x.indx = y.indx
  4  and x.ksppinm = '_max_sleep_holding_latch';

NAME                           VALUE
------------------------------ ------------------------------
_max_sleep_holding_latch       4

SQL>

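Latch classes

Latches have been divided into classes since 9i R2, and each class can have its own _spin_count. This was not possible in earlier versions, where changing _spin_count changed the spin count of every latch. Setting the value per class allows somewhat better use of CPU. For example, if the cache buffers chains latch shows a high number of sleeps while CPU is relatively plentiful, you can give it a higher _spin_count so that processes spin longer before they sleep, which reduces misses and sleeps. The fixed table x$ksllclass (kernel service lock latches class) lists all eight classes: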

SQL> select indx, spin, yield, waittime
  2  from x$ksllclass;

      INDX       SPIN      YIELD   WAITTIME
---------- ---------- ---------- ----------
         0      16000          0          1
         1      16000          0          1
         2      16000          0          1
         3      16000          0          1
         4      16000          0          1
         5      16000          0          1
         6      16000          0          1
         7      16000          0          1

8 rows selected.

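Each row above, that is each class, corresponds to a _latch_class_n initialization parameter, which can be used to change the spin count, yield and waittime values of that class. These parameters are: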

SQL> select x.ksppinm name, y.ksppstvl value
  2  from sys.x$ksppi x, sys.x$ksppcv y
  3  where x.indx = y.indx
  4  and x.ksppinm like '_latch_class__';

NAME                           VALUE
------------------------------ ------------------------------
_latch_class_0
_latch_class_1
_latch_class_2
_latch_class_3
_latch_class_4
_latch_class_5
_latch_class_6
_latch_class_7
_latch_classes

9 rows selected.

SQL>

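You can add the corresponding class setting to the initialization parameter file, together with the _latch_classes parameter, which assigns a specific latch to a class. First look up the latch number: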

SQL> select latch#, name
  2  from v$latchname
  3  where name = 'cache buffers chains';

    LATCH# NAME
---------- ------------------------------
       122 cache buffers chains

To raise the CPU spin count of latch 122 to 20000, add the following parameters to init.ora:

_latch_class_1 = "20000"
_latch_classes = "122:1"

If CPU is in short supply, increasing the spin count is not an option.
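How the two parameters combine can be sketched as a lookup. The helper below is purely illustrative: the function name is hypothetical, and it assumes that _latch_classes holds whitespace-separated latch#:class pairs and that a latch not listed there falls into class 0.

```python
def effective_spin(latch_no, latch_classes, class_spin, default_class=0):
    """Resolve the spin count that applies to one latch number.

    latch_classes: value of _latch_classes, for example "122:1".
    class_spin:    mapping of class number to spin count.
    """
    mapping = {}
    for pair in latch_classes.split():
        latch, cls = pair.split(":")
        mapping[int(latch)] = int(cls)
    # Assumption: latches that are not listed stay in the default class.
    return class_spin[mapping.get(latch_no, default_class)]


# Class 0 keeps the value shown in x$ksllclass; class 1 is raised to 20000.
class_spin = {0: 16000, 1: 20000}
cbc_spin = effective_spin(122, "122:1", class_spin)
other_spin = effective_spin(5, "122:1", class_spin)
```

With these settings only latch 122 spins 20000 times; every other latch keeps the spin count of its own class.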

Source: ITPUB Blog, link: http://blog.itpub.net/16179598/viewspace-677107/. If you reproduce this article, please credit the source; otherwise legal action may be taken.
