Conclusions

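1. To simulate a database hang, I tried oradebug suspend on CKPT, DBWR, SMON and LMD without success, so the background processes clearly need deeper study.
2. Using oradebug poke on the process allocation latch successfully simulated sessions being unable to log in.
3. From the tests so far, ordinary wait events still show up under Other chains; only latch or mutex waits appear under Open chains.
4. To diagnose latch free, use v$session.p1 (latch address) or p2 (latch number) to identify the specific latch,
then use v$latch_misses to find the underlying cause.
5. The name-service call wait event gives no clear indication of how to resolve it.
Neither a systemstate dump nor a processstate dump turned up anything useful.
strace did reveal something useful: poll() returned an error because the call was interrupted, and the process then retried the same action over and over.
6. Quickly understanding the functions behind the errors that strace reports is very important,
and so is connecting those failing calls back to Oracle.
7. Tracing the process behind the name-service call wait with strace -p revealed some of how the process works.
8. The investigation finally led to sockets; how to connect those sockets back to Oracle still needs more thought.
9. After poking the process allocation latch with oradebug, new sessions could at first still get connected, but later new sessions no longer created any entry in v$session.
I don't understand this part well enough yet and need to keep studying.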

Test

---Monitoring session
SQL> select sid,serial#,paddr from v$session where sid=(select sid from v$mystat where rownum=1);

SID SERIAL# PADDR
---------- ---------- ----------------
153 8 0000000083A5E580

SQL> select pid,spid from v$process where addr='0000000083A5E580';

PID SPID
---------- ------------
18 16457

----oradebug suspend on LGWR, CKPT, DBWR and LMON did not succeed, because my understanding of these background processes is still shallow

---Dump before any wait event is introduced
Open chains found:
Other chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/138/1/0x83a63c78/29290/Streams AQ: qmn slave idle wait>
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/143/10/0x83a624c0/29028/Streams AQ: qmn coordinator idle>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/146/1/0x83a60d08/28837/Streams AQ: waiting for messages>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/152/14/0x83a63490/29288/Streams AQ: waiting for time man>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/153/3/0x83a5e580/28897/No Wait>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/170/1/0x83a56ee8/28481/DIAG idle wait>
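Each chain entry follows the field layout printed in its own header, `<cnode/sid/sess_srno/proc_ptr/ospid/wait_event>`. Because this test works by comparing successive dumps, a small parser can make that comparison mechanical. Below is a minimal Python sketch that assumes the entry format shown in these 10.2.0.1 dumps. `parse_chain_entry` and `parse_dump` are hypothetical helpers, not Oracle tools.

```python
import re
from typing import Dict, NamedTuple, Optional


class ChainNode(NamedTuple):
    cnode: int       # cluster node number (0 in this test)
    sid: int         # session id
    sess_srno: int   # session serial number
    proc_ptr: str    # process state object address, e.g. 0x83a57eb8
    ospid: int       # operating system process id
    wait_event: str  # wait event name, possibly truncated by the dump


# Matches entries such as "<0/167/1/0x83a57eb8/28485/latch free>".
# The header "<cnode/sid/...>" has no digits in its first field, so it never matches.
ENTRY_RE = re.compile(r"<(\d+)/(\d+)/(\d+)/(0x[0-9a-fA-F]+)/(\d+)/([^>]*)>")


def parse_chain_entry(line: str) -> Optional[ChainNode]:
    """Return the chain entry found in a line, or None if the line has none."""
    m = ENTRY_RE.search(line)
    if m is None:
        return None
    cnode, sid, srno, ptr, ospid, event = m.groups()
    return ChainNode(int(cnode), int(sid), int(srno), ptr, int(ospid), event)


def parse_dump(text: str) -> Dict[int, ChainNode]:
    """Collect every chain entry in a hanganalyze excerpt, keyed by sid."""
    nodes: Dict[int, ChainNode] = {}
    for line in text.splitlines():
        node = parse_chain_entry(line)
        if node is not None:
            nodes[node.sid] = node
    return nodes
```

Computing `parse_dump(after).keys() - parse_dump(before).keys()` lists the sessions that a step added. For example, it would report sid 167 after the first failed login.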

---Dump after introducing an ordinary wait event

The session with the new wait event does not appear under Other chains; it shows up under State of nodes instead:
SQL> delete from t_lock where rownum=1;

1 row deleted.

Open chains found:
Other chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/138/1/0x83a63c78/29290/Streams AQ: qmn slave idle wait>
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/143/10/0x83a624c0/29028/Streams AQ: qmn coordinator idle>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/146/1/0x83a60d08/28837/Streams AQ: waiting for messages>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/152/14/0x83a63490/29288/Streams AQ: waiting for time man>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/153/8/0x83a5e580/16457/No Wait>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/170/1/0x83a56ee8/28481/DIAG idle wait>

[147]/0/148/928/0x83b62eb0/15740/IGN/13/14//none

---Simulate hung logins with oradebug by poking the process allocation latch
SQL> select name,addr,latch#,level# from v$latch where name='process allocation';

NAME ADDR LATCH# LEVEL#
------------------------------ ---------------- ---------- ----------
process allocation 0000000060007498 3 1

SQL> select 'oradebug poke 0x'||addr||' 4 0x00000001;' from v$latch where latch#=3;

'ORADEBUGPOKE0X'||ADDR||'40X00000001;'
----------------------------------------------
oradebug poke 0x0000000060007498 4 0x00000001;

SQL> oradebug setmypid
Statement processed.
SQL> oradebug poke 0x0000000060007498 4 0x00000001;
BEFORE: [060007498, 06000749C) = 00000000
AFTER: [060007498, 06000749C) = 00000001
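The generated command tells oradebug to write a 4-byte value of 1 at the latch address. The BEFORE/AFTER lines then report the half-open byte range `[start, start + 4)` that was changed. The Python sketch below reproduces both the command string and that range. The helper names are hypothetical, and the 4-byte length simply mirrors the command used in this test.

```python
from typing import Tuple


def poke_command(addr_hex: str, length: int = 4, value: int = 1) -> str:
    """Build the same oradebug poke line that the SQL above generates from v$latch.addr."""
    return f"oradebug poke 0x{addr_hex} {length} 0x{value:08x};"


def poked_range(addr_hex: str, length: int = 4) -> Tuple[int, int]:
    """Return the half-open [start, end) byte range that the poke overwrites."""
    start = int(addr_hex, 16)
    return start, start + length


start, end = poked_range("0000000060007498")
print(poke_command("0000000060007498"))  # oradebug poke 0x0000000060007498 4 0x00000001;
print(f"[{start:09X}, {end:09X})")       # [060007498, 06000749C)
```

Writing 1 into the latch word presumably makes the latch look permanently held. That would be consistent with the latch free waits on latch# 3 that follow. To undo the test, the natural approach would be to poke the BEFORE value (0) back the same way.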

--A new session cannot connect and cannot log in
[oracle@jingfa1 ~]$ sqlplus tbs_zxy/system

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Nov 13 02:58:46 2015

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Other chains gains one entry, session 167, waiting on latch free:
Open chains found:
Other chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/138/1/0x83a63c78/29290/Streams AQ: qmn slave idle wait>
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/143/10/0x83a624c0/29028/Streams AQ: qmn coordinator idle>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/146/1/0x83a60d08/28837/Streams AQ: waiting for messages>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/152/14/0x83a63490/29288/Streams AQ: waiting for time man>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/153/8/0x83a5e580/16457/No Wait>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/167/1/0x83a57eb8/28485/latch free> --new
Chain 7 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/170/1/0x83a56ee8/28481/DIAG idle wait>

--A second new session cannot log in
[root@jingfa1 ~]# su - oracle
[oracle@jingfa1 ~]$ sqlplus tbs_zxy/system

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Nov 13 06:03:34 2015

Copyright (c) 1982, 2005, Oracle. All rights reserved.
Other chains gains another entry:
Open chains found:
Other chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/138/1/0x83a63c78/29290/Streams AQ: qmn slave idle wait>
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/143/10/0x83a624c0/29028/Streams AQ: qmn coordinator idle>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/146/1/0x83a60d08/28837/Streams AQ: waiting for messages>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/152/14/0x83a63490/29288/Streams AQ: waiting for time man>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/153/8/0x83a5e580/16457/No Wait>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/156/1/0x83a5cdc8/28532/os thread startup> --new
Chain 7 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/167/1/0x83a57eb8/28485/latch free>
Chain 8 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/170/1/0x83a56ee8/28481/DIAG idle wait>

--A third new session cannot log in

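Open chains now has content: session 167 has moved from Other chains to Open chains, and session 148 is new.
The dump also lists levels 4 and 6 in addition to the original 5 and 10, for four levels in total.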
Open chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/167/1/0x83a57eb8/28485/latch free> --moved
-- <0/148/1074/0x83a5fd38/18767/name-service call wait> --new
Other chains found:
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/138/1/0x83a63c78/29290/Streams AQ: qmn slave idle wait>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/143/10/0x83a624c0/29028/Streams AQ: qmn coordinator idle>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/146/1/0x83a60d08/28837/Streams AQ: waiting for messages>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/152/14/0x83a63490/29288/Streams AQ: waiting for time man>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/153/8/0x83a5e580/16457/No Wait>
Chain 7 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/156/1/0x83a5cdc8/28532/DFS lock handle>
Chain 8 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/170/1/0x83a56ee8/28481/DIAG idle wait>

Extra information that will be dumped at higher levels:
[level 4] : 1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] : 7 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 6] : 1 node dumps -- [NLEAF]
[level 10] : 18 node dumps -- [IGN]

--A fourth new session cannot log in either

Open chains and Other chains are unchanged:
Open chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/167/1/0x83a57eb8/28485/latch free>
-- <0/148/1074/0x83a5fd38/18767/name-service call wait>
Other chains found:
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/138/1/0x83a63c78/29290/Streams AQ: qmn slave idle wait>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/143/10/0x83a624c0/29028/Streams AQ: qmn coordinator idle>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/146/1/0x83a60d08/28837/Streams AQ: waiting for messages>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/152/14/0x83a63490/29288/Streams AQ: waiting for time man>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/153/8/0x83a5e580/16457/No Wait>
Chain 7 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/156/1/0x83a5cdc8/28532/DFS lock handle>
Chain 8 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/170/1/0x83a56ee8/28481/DIAG idle wait>

--A fifth new session cannot log in either

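The following is based on the fourth session above that could not log in.
None of these failed sessions appear anywhere in Open chains, Other chains or State of nodes, so let's analyze why: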
SQL> select distinct type from v$session;

TYPE
----------
USER
BACKGROUND

SQL> select count(*) from v$session where type='USER';

COUNT(*)
----------
5

SQL> select sid,serial#,program,event from v$session where type='USER' order by 1;

SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------------------------ ----------------------------------------------------------------
144 1 racgimon@jingfa1 (TNS V1-V3) SQL*Net message from client
145 1 racgimon@jingfa1 (TNS V1-V3) SQL*Net message from client
146 1 racgimon@jingfa1 (TNS V1-V3) Streams AQ: waiting for messages in the queue
148 1074 name-service call wait
153 8 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client

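--Start one more session that cannot log in and check whether V$SESSION changes. It does not: a session that has not completed its login does not get a V$SESSION row.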
SQL> select sid,serial#,program,event from v$session where type='USER' order by 1;

SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------------------------ ----------------------------------------------------------------
144 1 racgimon@jingfa1 (TNS V1-V3) SQL*Net message from client
145 1 racgimon@jingfa1 (TNS V1-V3) SQL*Net message from client
146 1 racgimon@jingfa1 (TNS V1-V3) Streams AQ: waiting for messages in the queue
148 1074 name-service call wait
153 8 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client


Next, let's combine the v$latch-related views, starting with the latch free wait above.

Session 167 is waiting on latch free:
SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where sid=167;

SID SERIAL# PROGRAM EVENT BLOCKING_SESSION P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- ------------------------------ -------------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
167 1 oracle@jingfa1 (LMON) latch free 1610642584 address 3 number 135485 tries

p2 is the latch number. It shows that the wait is on the process allocation latch, which ties back to the oradebug poke earlier in the test.
SQL> select name,latch# from v$latch where latch#=3;

NAME LATCH#
-------------------------------------------------- ----------
process allocation 3

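Alternatively, use p1, the latch address. p1 is shown in decimal, and converted to hex it is the 8-digit value 60007498. v$latch.addr has 16 hex digits, so left-pad the value with 8 zeros: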
SQL> select name,latch# from v$latch where addr='0000000060007498';

NAME LATCH#
-------------------------------------------------- ----------
process allocation 3
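The same conversion can be scripted. Below is a minimal Python sketch that assumes a 64-bit instance, where v$latch.addr prints as 16 uppercase hex digits. `latch_addr_from_p1` is a hypothetical helper that turns the decimal p1 value 1610642584 into the string used in the query above.

```python
def latch_addr_from_p1(p1: int, width: int = 16) -> str:
    """Format the decimal latch address from v$session.p1 like v$latch.addr."""
    # "016X" means: uppercase hex, left-padded with zeros to 16 characters
    return format(p1, f"0{width}X")


def p1_from_latch_addr(addr: str) -> int:
    """Reverse direction: v$latch.addr string back to the decimal p1 value."""
    return int(addr, 16)


print(latch_addr_from_p1(1610642584))  # 0000000060007498
```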

Next, look at v$latch_misses:
SQL> select parent_name,nwfail_count,sleep_count,wtr_slp_count,longhold_count,location from v$latch_misses where parent_name='process allocation';

PARENT_NAME NWFAIL_COUNT SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT LOCATION
-------------------------------------------------- ------------ ----------- ------------- -------------- ----------------------------------------------------------------
process allocation 0 0 841049 0 ksuapc
process allocation 0 0 0 0 ksukia
process allocation 0 0 0 0 ksucrp
process allocation 0 1019502 178453 0 ksufap: active procs
process allocation 0 0 0 0 ksdxwcwpt
process allocation 0 0 0 0 ksdxwdwpt
process allocation 0 0 0 0 ksusigskip
process allocation 0 0 0 0 ksu_reserve
process allocation 0 0 0 0 ksu_unreserve
process allocation 0 0 0 0 ksu_unreserve_proc
process allocation 0 0 0 0 ksudlp

11 rows selected.

In production, you can use the SQL below to find which latch, and which code locations within it, have the worst contention:
SQL> set pause on
SQL> select parent_name,nwfail_count,sleep_count,wtr_slp_count,longhold_count,location from v$latch_misses order by 3 desc;

PARENT_NAME NWFAIL_COUNT SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT LOCATION
-------------------------------------------------- ------------ ----------- ------------- -------------- ----------------------------------------------------------------
process allocation 0 1058941 183540 0 ksufap: active procs
ges resource hash list 0 5 0 0 kjrmas1: lookup master node
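When the ORDER BY output is long, it can help to roll the location rows up per latch before drilling into individual locations. The Python sketch below performs that roll-up over rows copied by hand from the output above. `LatchMiss`, `sleeps_by_latch` and `top_locations` are hypothetical names. In practice, a GROUP BY parent_name query against v$latch_misses would do the same job inside the database.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class LatchMiss(NamedTuple):
    parent_name: str
    nwfail_count: int
    sleep_count: int
    wtr_slp_count: int
    longhold_count: int
    location: str


def sleeps_by_latch(rows: List[LatchMiss]) -> Dict[str, int]:
    """Sum sleep_count per parent latch, largest total first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row.parent_name] += row.sleep_count
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def top_locations(rows: List[LatchMiss], n: int = 5) -> List[LatchMiss]:
    """Return the n code locations with the highest sleep_count."""
    return sorted(rows, key=lambda r: r.sleep_count, reverse=True)[:n]


rows = [
    LatchMiss("process allocation", 0, 1058941, 183540, 0, "ksufap: active procs"),
    LatchMiss("ges resource hash list", 0, 5, 0, 0, "kjrmas1: lookup master node"),
]
print(sleeps_by_latch(rows))
print(top_locations(rows, 1)[0].location)
```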

Now look at session 148, which is waiting on name-service call wait:
SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where sid=148;

SID SERIAL# PROGRAM EVENT BLOCKING_SESSION P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- --------------- -------------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
148 1074 name-service call wait 50 waittime 0 0

SQL> select paddr from v$session where sid=148;

PADDR
----------------
0000000083A5FD38

SQL> select spid from v$process where addr='0000000083A5FD38';

SPID
------------
18767

A processstate dump of this process turned up nothing useful:
SQL> oradebug setospid 18767
Oracle pid: 21, Unix process pid: 18767, image: oracle@jingfa1 (PZ99)
SQL> oradebug dump processstate 10
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/jingfa/bdump/jingfa1_pz99_18767.trc

The wait class is Other:
SQL> select sid,wait_class from v$session_wait class where sid=148;

SID WAIT_CLASS
---------- ----------------------------------------------------------------
148 Other

Let's try tracing it with strace:
[oracle@jingfa1 ~]$ strace -p 18767
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [], NULL, 8) = 0
times(NULL) = 434946713
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
times(NULL) = 434946713
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={5, 0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={5, 0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [], NULL, 8) = 0
rt_sigreturn(0x1) = -1 EINTR (Interrupted system call)
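The trace reads as follows. A SIGALRM arrives while poll() is waiting, and poll() returns -1 with EINTR. The signal handler calls setitimer to schedule the next alarm in 5 seconds. rt_sigreturn then returns to the interrupted code, which polls again. The common way for a caller to handle EINTR is to retry with whatever time is left. The shrinking timeouts (250, then 172 ms) seen later in the trace are consistent with that pattern. However, this is an inference from the trace, not documented Oracle behavior. Below is a minimal Python sketch of such a retry loop. `poll_with_deadline` is hypothetical, and the poll function and clock are passed in so that the logic can be exercised without a real signal.

```python
import time
from typing import Callable, List, Tuple

Events = List[Tuple[int, int]]  # (fd, revents) pairs, as returned by select.poll().poll()


def poll_with_deadline(poll_once: Callable[[int], Events],
                       timeout_ms: int,
                       now_ms: Callable[[], float] = lambda: time.monotonic() * 1000.0) -> Events:
    """Wait up to timeout_ms in total, retrying with the remaining time after EINTR."""
    deadline = now_ms() + timeout_ms
    while True:
        remaining = max(0, int(deadline - now_ms()))
        try:
            return poll_once(remaining)  # [] means the wait timed out
        except InterruptedError:         # Python's exception for errno EINTR
            if now_ms() >= deadline:
                return []                # no time left: report a timeout
            # otherwise loop and poll again with the shorter remaining timeout
```

Since Python 3.5 (PEP 475), `select.poll().poll()` already retries on EINTR internally. With a real poller, this loop therefore only matters if a signal handler raises InterruptedError. In C, the equivalent check is `errno == EINTR` after poll() returns -1.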

Locate the file descriptors that poll() is waiting on:
[oracle@jingfa1 fd]$ pwd
/proc/18767/fd
[oracle@jingfa1 fd]$ ll
total 0
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 0 -> /dev/null
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 1 -> /dev/null
lrwx------ 1 oracle oinstall 64 Nov 13 07:09 10 -> /u01/app/oracle/product/10.2.0/db_1/dbs/lkinstjingfa1 (deleted)
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 11 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Nov 13 07:09 12 -> /u01/app/oracle/admin/jingfa/adump/ora_28419.aud
lrwx------ 1 oracle oinstall 64 Nov 13 07:09 13 -> socket:[1736890] --13 FD
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 14 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Nov 13 07:09 15 -> /u01/app/oracle/product/10.2.0/db_1/dbs/hc_jingfa1.dat
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 16 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 17 -> /u01/app/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Nov 13 07:09 18 -> socket:[1736894]
lrwx------ 1 oracle oinstall 64 Nov 13 07:09 19 -> socket:[1736897] ---19 FD
l-wx------ 1 oracle oinstall 64 Nov 13 07:09 2 -> /u01/app/oracle/admin/jingfa/bdump/jingfa1_pz99_18767.trc
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 3 -> /dev/null
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 4 -> /dev/null
l-wx------ 1 oracle oinstall 64 Nov 13 07:09 5 -> /u01/app/oracle/admin/jingfa/udump/jingfa1_ora_28419.trc
l-wx------ 1 oracle oinstall 64 Nov 13 07:09 6 -> /u01/app/oracle/admin/jingfa/bdump/alert_jingfa1.log
lrwx------ 1 oracle oinstall 64 Nov 13 07:09 7 -> /u01/app/oracle/product/10.2.0/db_1/dbs/hc_jingfa1.dat
l-wx------ 1 oracle oinstall 64 Nov 13 07:09 8 -> /u01/app/oracle/admin/jingfa/bdump/alert_jingfa1.log
lr-x------ 1 oracle oinstall 64 Nov 13 07:09 9 -> /dev/null
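FDs 13 and 19 are sockets, and the number in brackets is the socket's inode. On Linux, the usual next step is to look that inode up in the kernel's socket tables under /proc/net (tcp, tcp6, udp, udp6, unix) to see the endpoints. Tools such as lsof do this lookup for you. Below is a minimal Python sketch of the same lookup. The helper names are hypothetical. The column positions (the inode is field 9 in /proc/net/tcp and udp, and field 6 in /proc/net/unix) are assumptions based on the usual Linux layouts. Other socket families such as netlink are not covered.

```python
import os
import re
from typing import Dict, Optional, Tuple

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")

# Column holding the inode in each /proc/net table (assumed usual Linux layout).
INODE_COLUMN = {"tcp": 9, "tcp6": 9, "udp": 9, "udp6": 9, "unix": 6}


def socket_inode(link_target: str) -> Optional[int]:
    """Extract the inode from an fd link target such as 'socket:[1736890]'."""
    m = SOCKET_LINK.match(link_target)
    return int(m.group(1)) if m else None


def socket_fds(pid: int) -> Dict[int, int]:
    """Map fd number -> socket inode for every socket the process holds open."""
    fd_dir = f"/proc/{pid}/fd"
    result: Dict[int, int] = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        inode = socket_inode(target)
        if inode is not None:
            result[int(name)] = inode
    return result


def find_socket_owner(inode: int, proc_net: str = "/proc/net") -> Optional[Tuple[str, str]]:
    """Return (table, raw line) of the /proc/net entry that owns this socket inode."""
    for table, column in INODE_COLUMN.items():
        try:
            with open(os.path.join(proc_net, table)) as f:
                next(f, None)  # skip the header line
                for line in f:
                    fields = line.split()
                    if len(fields) > column and fields[column] == str(inode):
                        return table, line.rstrip("\n")
        except OSError:
            continue  # table missing, e.g. IPv6 disabled
    return None
```

For example, `{fd: find_socket_owner(ino) for fd, ino in socket_fds(18767).items()}` could be run as the oracle user (or root) while the process is still hung.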

Consult man poll for the system call's documentation:
[oracle@jingfa1 ~]$ man poll
POLL(2) Linux Programmer’s Manual POLL(2)

NAME
poll, ppoll - wait for some event on a file descriptor

SYNOPSIS
#include <poll.h>

int poll(struct pollfd *fds, nfds_t nfds, int timeout); --the function takes 3 arguments

#define _GNU_SOURCE
#include <poll.h>

int ppoll(struct pollfd *fds, nfds_t nfds,
const struct timespec *timeout, const sigset_t *sigmask);

DESCRIPTION
poll() performs a similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O.

The set of FDs is passed to poll() as input: the first argument is an array of structures, similar to an array in Oracle.
The set of file descriptors to be monitored is specified in the fds argument, which is an array of nfds structures of the following form:
Layout of the first argument: each structure has three fields.
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events */
short revents; /* returned events */
};

The field fd contains a file descriptor for an open file.

The field events is an input parameter, a bitmask specifying the events the application is interested in.

The field revents is an output parameter, filled by the kernel with the events that actually occurred. The bits returned in revents can include any of
those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL. (These three bits are meaningless in the events field, and will be set in
the revents field whenever the corresponding condition is true.)

If none of the events requested (and no error) has occurred for any of the file descriptors, then poll() blocks until one of the events occurs.

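The third argument is the timeout, the upper limit on how long poll() may block. When it expires, poll() returns 0 to the caller (a timeout, not an error). A negative value means wait indefinitely.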
The timeout argument specifies an upper limit on the time for which poll() will block, in milliseconds. Specifying a negative value in timeout means an
infinite timeout.
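Python's `select.poll` is a thin wrapper over poll(2). Calling `register(fd, eventmask)` plays the role of filling in `fd` and `events` in a `struct pollfd`. Calling `poll(timeout_ms)` returns `(fd, revents)` pairs for the descriptors that became ready, or an empty list when the timeout expires (the `0 (Timeout)` case in strace). The minimal sketch below mirrors the traced call by waiting on two descriptors for POLLIN|POLLPRI, with a 500 ms timeout by default. A local socket pair stands in for Oracle's sockets, and `wait_readable` is a hypothetical helper.

```python
import select
import socket
from typing import List, Tuple


def wait_readable(fds: List[int], timeout_ms: int = 500) -> List[Tuple[int, int]]:
    """Wait until any fd has data to read, like the poll() call in the strace output."""
    poller = select.poll()
    for fd in fds:
        # corresponds to pollfd.fd = fd and pollfd.events = POLLIN|POLLPRI
        poller.register(fd, select.POLLIN | select.POLLPRI)
    return poller.poll(timeout_ms)  # [] means timeout: nothing was ready


a, peer_a = socket.socketpair()
b, peer_b = socket.socketpair()
print(wait_readable([a.fileno(), b.fileno()], 50))  # [] after about 50 ms
peer_a.send(b"x")
print(wait_readable([a.fileno(), b.fileno()], 50))  # a's fd, with POLLIN set in revents
```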

The meaning of each event bit is defined in poll.h:
The bits that may be set/returned in events and revents are defined in <poll.h>:

POLLIN There is data to read.

POLLPRI
There is urgent data to read (e.g., out-of-band data on TCP socket; pseudo-terminal master in packet mode has seen state change in slave).

POLLOUT
Writing now will not block.

POLLRDHUP (since Linux 2.6.17)
Stream socket peer closed connection, or shut down writing half of connection. The _GNU_SOURCE feature test macro must be defined in order
to obtain this definition.

POLLERR
Error condition (output only).

POLLHUP
Hang up (output only).

POLLNVAL
Invalid request: fd not open (output only).

When compiling with _XOPEN_SOURCE defined, one also has the following, which convey no further information beyond the bits listed above:

POLLRDNORM
Equivalent to POLLIN.

POLLRDBAND
Priority band data can be read (generally unused on Linux).

POLLWRNORM
Equivalent to POLLOUT.

POLLWRBAND
Priority data may be written.

Linux also knows about, but does not use POLLMSG.

ppoll()
The relationship between poll() and ppoll() is analogous to the relationship between select() and pselect(): like pselect(), ppoll() allows an application
to safely wait until either a file descriptor becomes ready or until a signal is caught.

Other than the difference in the timeout argument, the following ppoll() call:

ready = ppoll(&fds, nfds, timeout, &sigmask);

is equivalent to atomically executing the following calls:

sigset_t origmask;

sigprocmask(SIG_SETMASK, &sigmask, &origmask);
ready = ppoll(&fds, nfds, timeout);
sigprocmask(SIG_SETMASK, &origmask, NULL);

See the description of pselect(2) for an explanation of why ppoll() is necessary.

The timeout argument specifies an upper limit on the amount of time that ppoll() will block. This argument is a pointer to a structure of the following
form:

struct timespec {
long tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};

If timeout is specified as NULL, then ppoll() can block indefinitely.

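Return value: on success, a positive count of descriptors with events or errors; 0 on timeout; -1 on error, with errno set.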
RETURN VALUE
On success, a positive number is returned; this is the number of structures which have non-zero revents fields (in other words, those descriptors with
events or errors reported). A value of 0 indicates that the call timed out and no file descriptors were ready. On error, -1 is returned, and errno is set
appropriately.

The error values:
ERRORS
EBADF An invalid file descriptor was given in one of the sets.

EFAULT The array given as argument was not contained in the calling program’s address space.

EINTR A signal occurred before any requested event. --the error returned in the strace output above

EINVAL The nfds value exceeds the RLIMIT_NOFILE value.

ENOMEM There was no space to allocate file descriptor tables.

Back to the poll() call:
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = -1 EINTR (Interrupted system call)

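EINTR means that a signal (SIGALRM in this trace) arrived before any of the requested events occurred. The requested events are: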
POLLIN There is data to read.

POLLPRI
There is urgent data to read (e.g., out-of-band data on TCP socket; pseudo-terminal master in packet mode has seen state change in slave).


When compiling with _XOPEN_SOURCE defined, one also has the following, which convey no further information beyond the bits listed above:

POLLRDNORM
Equivalent to POLLIN.

POLLRDBAND
Priority band data can be read (generally unused on Linux).
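strace prints the events mask symbolically, but the raw value is the bitwise OR of the flags. The Python sketch below encodes and decodes such masks. It assumes the Linux x86 values from asm-generic/poll.h, hard-coded in the table, and other architectures and operating systems may use different numbers. The sketch also makes a redundancy visible: POLLRDNORM is documented as equivalent to POLLIN, yet both bits are requested.

```python
from typing import Dict, List

# Linux (x86 / asm-generic) bit values; an assumption, not portable to every platform.
POLL_BITS: Dict[str, int] = {
    "POLLIN": 0x001,
    "POLLPRI": 0x002,
    "POLLOUT": 0x004,
    "POLLERR": 0x008,
    "POLLHUP": 0x010,
    "POLLNVAL": 0x020,
    "POLLRDNORM": 0x040,
    "POLLRDBAND": 0x080,
    "POLLWRNORM": 0x100,
    "POLLWRBAND": 0x200,
}


def encode_mask(flags: str) -> int:
    """Turn strace's "POLLIN|POLLPRI|..." notation into the numeric events mask."""
    mask = 0
    for name in flags.split("|"):
        mask |= POLL_BITS[name.strip()]
    return mask


def decode_mask(mask: int) -> List[str]:
    """List the flag names set in a numeric events/revents mask."""
    return [name for name, bit in POLL_BITS.items() if mask & bit]


print(hex(encode_mask("POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND")))  # 0xc3
```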

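Tracing repeatedly with strace shows the call being retried over and over: first many timeouts, then finally
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = -1
In other words, the process keeps polling FDs 13 and 19 and nothing becomes ready, so each poll times out. Eventually a poll is interrupted with an error, and the same cycle starts again.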

poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 250) = 0 (Timeout)
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139784
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 172) = 0 (Timeout)
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139801
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139801
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139851
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139851
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139901
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139902
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139952
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 439933}, ru_stime={0, 91986}, ...}) = 0
times({tms_utime=43, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435139952
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140002
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 91986}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 91986}, ...}) = 0
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140002
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140052
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 91986}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 91986}, ...}) = 0
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140052
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140102
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 91986}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 91986}, ...}) = 0
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140103
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140153
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 92985}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 92985}, ...}) = 0
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140153
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140203
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 92985}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 92985}, ...}) = 0
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140203
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = 0 (Timeout)
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140253
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 93985}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 440932}, ru_stime={0, 93985}, ...}) = 0
times({tms_utime=44, tms_stime=9, tms_cutime=0, tms_cstime=0}) = 435140254
poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=19, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 2, 500) = -1 EINTR (Interrupted system call)
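The times() return values in the trace are in clock ticks. This analysis assumes the usual Linux USER_HZ of 100 ticks per second, which you can check with `getconf CLK_TCK`. Under that assumption, the gap between successive times() calls matches the poll() timeouts. There are about 50 ticks (500 ms) around each 500 ms poll, and 17 ticks (about 170 ms) around the 172 ms poll. This is consistent with each poll() waiting out its full timeout with nothing arriving on FDs 13 and 19. The small sketch below shows the arithmetic, with values copied from the trace. `tick_deltas_ms` is a hypothetical helper.

```python
from typing import List


def tick_deltas_ms(ticks: List[int], clk_tck: int = 100) -> List[int]:
    """Convert successive times() return values (clock ticks) into elapsed milliseconds."""
    return [(later - earlier) * 1000 // clk_tck for earlier, later in zip(ticks, ticks[1:])]


# times() values printed right after consecutive poll() calls in the trace above
ticks = [435139784, 435139801, 435139851, 435139901, 435139952, 435140002]
print(tick_deltas_ms(ticks))  # [170, 500, 500, 510, 500]
```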

Using oradebug dump hanganalyze to analyze Oracle hangs on Oracle 10.2.0.1 RAC, part 6
