[20221018]本地執行與遠端執行.txt

--//連結http://blog.tanelpoder.com/2008/02/05/oracle-hidden-costs-revealed-part-1/.
--//裡面提到一個問題本地執行與遠端執行,oracle效能存在怎麼區別,理論講如果不考慮網路傳輸,兩組差別不大.
--//因為Oracle是一個客戶端伺服器資料庫系統。所有的執行都是在本地執行的，而不管客戶機的位置如何，因此效能是相同的。
--//作者給出一個例子,說明一些區別:

1.環境:
SCOTT@book> @ ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

2.分別從widnows客戶端以及linux服務端測試看看.
--//注:lotslios.sql 來自 tpt 裡面的測試指令碼.
--//測試在本地伺服器.
SCOTT@book> @spid
       SID    SERIAL# PROCESS                  SERVER    SPID       PID P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
        36      45801 14705                    DEDICATED 14706       26        206 alter system kill session '36,45801' immediate;

SCOTT@book> set timing on
SCOTT@book> @lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:08.11
--//注:@lotslios 1e5根本測試不出來.

--//測試在客戶端windows:
SCOTT@78> @spid
       SID    SERIAL# PROCESS                  SERVER    SPID                     PID P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
        53      44367 4476:8736                DEDICATED 14714                     27        144 alter system kill session '53,44367' immediate;

SCOTT@78> set timing on
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:10.40

--//你可以執行多次,都是windows下測試時間大於在本地伺服器的測試時間.
--//作者透過使用他自己寫Snapper包以及V$SESSTAT,看不出任何差異.
--//使用strace跟蹤(注:作者的伺服器solaris,使用truss).
--//測試在本地伺服器,使用strace跟蹤服務端程式.
$ strace -cp 14706
Process 14706 attached - interrupt to quit
^CProcess 14706 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
   nan    0.000000           0         2           read
   nan    0.000000           0         2           write
   nan    0.000000           0        27           getrusage
   nan    0.000000           0        10           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    41           total

--//測試在客戶端windows,使用strace跟蹤服務端程式.
$ strace -cp 14714
Process 14714 attached - interrupt to quit
^CProcess 14714 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
96.13    0.024820           0    949593           poll
3.87    0.001000         500         2           read
0.00    0.000000           0         2           write
0.00    0.000000           0        83           getrusage
0.00    0.000000           0        10           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.025820                949690           total
--//很慢!!我不得按ctrl+c停止strace,你可以發現大量呼叫poll.
--//可以發現測試在客戶端windows,多了一個poll 系統呼叫,作者測試平臺solaris,呼叫的是pollsys.

# man -a pool
POLL(2)                    Linux Programmer's Manual                   POLL(2)

NAME
       poll, ppoll - wait for some event on a file descriptor

SYNOPSIS
       #include <poll.h>

       int poll(struct pollfd *fds, nfds_t nfds, int timeout);

       #define _GNU_SOURCE
       #include <poll.h>

       int ppoll(struct pollfd *fds, nfds_t nfds,
               const struct timespec *timeout, const sigset_t *sigmask);

--//轉載:http://blog.tanelpoder.com/2008/02/05/oracle-hidden-costs-revealed-part-1/.
So, there is a big difference in number of pollsys() system calls, depending on which client was used for connecting.
The pollsys syscall is normally used for checking whether there is any data that can be read from a file descriptor (or
whether the file descriptor is ready for receiving more writes). As TCP sockets on Unix are also accessed through file
descriptors, Oracle could be polling the client TCP connection file descriptor… but (without prior knowledge) we can
not be sure.
因此，pollsys()系統呼叫的數量有很大的差異，這取決於用於連線的客戶端。pollsys系統通常用於檢查是否有可以從檔案描述符讀取的
資料(或者檔案描述符是否準備好接收更多的寫操作)。由於Unix上的TCP套接字也可以透過檔案描述符訪問，Oracle可以輪詢客戶端TCP連
接檔案描述符…但是(沒有事先知識)我們不能確定。

...

Oracle client server communication normally works in RPC fashion – for example a client sends a command to Oracle and
Oracle doesn't return anything until the command is completed.
Oracle客戶端伺服器通訊通常以RPC的方式工作——例如，客戶端向Oracle傳送一個命令，而Oracle在該命令完成之前不會返回任何東西
。

Now if a user tries to cancel their query (using CTRL+C in sqlplus or calling OCIBreak in non-blocking OCI), a cancel
packet is sent to server over TCP. The packet will be stored in the server side receive buffer of OS TCP stack and
becomes available for reading for the server process (via a TCP socket). However if the server process is in a
long-running loop executing a query, it needs to periodically check the TCP receive socket for any outstanding packets.
And this is exactly what the pollsys() system call does.
現在，如果使用者試圖取消他們的查詢(在sqlplus中使用CTRL+C或在非阻塞OCI中呼叫OCIBreak)，一個取消資料包將透過TCP傳送到伺服器
。該資料包將儲存在OS TCP堆疊的伺服器端接收緩衝區中，並可為伺服器程式讀取(透過TCP套接字)。但是，如果伺服器程式處於執行查詢
的長時間執行的迴圈中，那麼它需要定期檢查TCP接收套接字中是否有任何未完成的資料包。這正是pollsys()系統所做的。

This approach for cancelling an operation is called in-band break, as the break packet is sent in-band with all other
traffic. The server process has to be programmed to periodically check for any newly arrived packets, even if it is
already busy working on something else.
這種取消操作的方法稱為in-band break，因為中斷包與所有其他業務一起在帶內傳送。伺服器程式必須被程式設計，以定期檢查任何新到達
的資料包，即使它已經在忙於處理其他事情。

There are several functions in Oracle kernel where the developers have put the check for in-band breaks. This means that
in some highly repetitive operations (like nested loop join) the same functions are hit again and again – causing
frequent polling on the TCP socket. And too frequent polling is what causes the peformance degradation.
在Oracle核心中有幾個函式，開發人員可以檢查in-band breaks。這意味著在一些高度重複的操作(如巢狀迴圈連線)中，相同的函式會被
反覆命中，導致TCP套接字上頻繁輪詢。而過頻繁的輪詢是導致效能下降的原因。

However Oracle network layer has a sqlnet.ora parameter called break_poll_skip, which can help in such situations. This
parameters defines, how many times to just silently skip the TCP socket polling when the nsmore2recv() function is
called. The parameter defaults to 3 in recent versions, which means that only 1 of 3 polls are actually executed ( from
above test case it's seen that for 4 million consistent gets roughly 1/3 = 1.3 million pollsys() calls were executed ).
然而，Oracle網路層有一個名為break_poll_skip的sqlnet.ora引數，這可以在這種情況下提供幫助。此引數定義了當呼叫nsmore2recv()
函式時，只需無聲地跳過TCP套接字輪詢的次數。在最近的版本中，引數預設為3，這意味著實際3個輪詢中只有1個被執行(從上面的測試
用例可以看出，400萬一致得到大約1/3=130萬個民調系統()呼叫)。

--//換成執行lotslios 1e5,再使用strace跟蹤看看.
--//測試在本地伺服器,使用strace跟蹤服務端程式.
$ strace -cp 14706
Process 14706 attached - interrupt to quit
^CProcess 14706 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
   nan    0.000000           0         2           read
   nan    0.000000           0         2           write
   nan    0.000000           0        19           getrusage
   nan    0.000000           0        10           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    33           total

--//測試在客戶端windows,使用strace跟蹤服務端程式.
$ strace -cp 14714
Process 14714 attached - interrupt to quit
^CProcess 14714 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000018           0       897           poll
0.00    0.000000           0         2           read
0.00    0.000000           0         2           write
0.00    0.000000           0        19           getrusage
0.00    0.000000           0        10           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.000018                   930           total
--//poll=897次.

SCOTT@78> set autot traceonly
SCOTT@78> @tpt/lotslios 1e5
generate lots of LIOs by repeatedly full scanning through a small table...
Elapsed: 00:00:00.06
Execution Plan
---------------------------
Plan hash value: 3691747574
-------------------------------------------------------------------------------
| Id | Operation              | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    12 |    23P (1)|999:59:59 |
|   1 | SORT AGGREGATE        |      |     1 |    12 |            |          |
|* 2 |   COUNT STOPKEY        |      |       |       |            |          |
|   3 |    NESTED LOOPS        |      | 2401P|    15E|    23P (1)|999:59:59 |
|   4 |     NESTED LOOPS       |      |    79T|   650T|   769G (1)|999:59:59 |
|   5 |      NESTED LOOPS      |      | 2631M|    14G|    25M (1)| 84:57:18 |
|   6 |       TABLE ACCESS FULL| OBJ$ | 87098 |   255K|   295   (1)| 00:00:04 |
|* 7 |       TABLE ACCESS FULL| OBJ$ | 30210 | 90630 |   293   (1)| 00:00:04 |
|* 8 |      TABLE ACCESS FULL | OBJ$ | 30210 | 90630 |   293   (1)| 00:00:04 |
|* 9 |     TABLE ACCESS FULL | OBJ$ | 30210 | 90630 |   293   (1)| 00:00:04 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<=1e5)
   7 - filter("A"."OWNER#"="B"."OWNER#")
   8 - filter("B"."OWNER#"="C"."OWNER#")
   9 - filter("C"."OWNER#"="D"."OWNER#")
Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       2706 consistent gets
          0 physical reads
          0 redo size
        346 bytes sent via SQL*Net to client
        471 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

--//按照作者介紹break_poll_skip預設3, consistent gets/3 = 2706/3= 902,與跟蹤看到的897接近.
--//順便提一下,不知道作者如何測試的,@lotslios 10000,consistent gets達到了4089670.或許11.2.0.4執行計劃發生了變化.疑問??
--//因為返回是count(*),僅僅1行.即使設定arraysize=2 ,邏輯讀我的測試也是2706
--//可以透過改變break_poll_skip的sqlnet.ora引數,減少poll呼叫.

3.改變break_poll_skip引數在sqlnet.ora檔案中.
--//修改break_poll_skip=10,注意測試時要重新登入才生效!!

SCOTT@78> @ spid
       SID    SERIAL# PROCESS                  SERVER    SPID                     PID P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
        53      44369 5380:8500                DEDICATED 15041                     27        145 alter system kill session '53,44369' immediate;

SCOTT@78> set timing on
SCOTT@78> @tpt/lotslios 1e5
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
    100000

Elapsed: 00:00:00.04

$ strace -cp 15041
Process 15041 attached - interrupt to quit
^CProcess 15041 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
   nan    0.000000           0         2           read
   nan    0.000000           0         2           write
   nan    0.000000           0       269           poll
   nan    0.000000           0        19           getrusage
   nan    0.000000           0        10           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                   302           total
--//2706/10 = 270.6,poll呼叫269,已經非常接近.

SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:09.99
--//比前面10.40快了一點點.

--//修改break_poll_skip=1000
$ grep break sqlnet.ora
break_poll_skip=1000

SCOTT@78> @ spid
       SID    SERIAL# PROCESS                  SERVER    SPID                     PID P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
        53      44371 4544:8592                DEDICATED 15081                     27        146 alter system kill session '53,44371' immediate;

SCOTT@78> set timing on
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:09.65

SCOTT@78> @tpt/lotslios 1e5
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
    100000

Elapsed: 00:00:00.04
$ strace -cp 15081
Process 15081 attached - interrupt to quit
^CProcess 15081 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
   nan    0.000000           0         2           read
   nan    0.000000           0         2           write
   nan    0.000000           0         3           poll
   nan    0.000000           0        19           getrusage
   nan    0.000000           0        10           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    36           total

--//補充測試@tpt/lotslios 1e8:
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:09.98

$ strace -cp 15081
Process 15081 attached - interrupt to quit
^CProcess 15081 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000066           0      2849           poll
0.00    0.000000           0         2           read
0.00    0.000000           0         2           write
0.00    0.000000           0        29           getrusage
0.00    0.000000           0        10           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.000066                  2892           total

SCOTT@78> set autot traceonly
SCOTT@78> @tpt/lotslios 1e8
SCOTT@78> set autot traceonly
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...

Elapsed: 00:00:09.68

Execution Plan
----------------------------------------------------------
Plan hash value: 3691747574

-------------------------------------------------------------------------------
| Id | Operation              | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    12 |    23P (1)|999:59:59 |
|   1 | SORT AGGREGATE        |      |     1 |    12 |            |          |
|* 2 |   COUNT STOPKEY        |      |       |       |            |          |
|   3 |    NESTED LOOPS        |      | 2401P|    15E|    23P (1)|999:59:59 |
|   4 |     NESTED LOOPS       |      |    79T|   650T|   769G (1)|999:59:59 |
|   5 |      NESTED LOOPS      |      | 2631M|    14G|    25M (1)| 84:57:18 |
|   6 |       TABLE ACCESS FULL| OBJ$ | 87098 |   255K|   295   (1)| 00:00:04 |
|* 7 |       TABLE ACCESS FULL| OBJ$ | 30210 | 90630 |   293   (1)| 00:00:04 |
|* 8 |      TABLE ACCESS FULL | OBJ$ | 30210 | 90630 |   293   (1)| 00:00:04 |
|* 9 |     TABLE ACCESS FULL | OBJ$ | 30210 | 90630 |   293   (1)| 00:00:04 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM<=1e8)
   7 - filter("A"."OWNER#"="B"."OWNER#")
   8 - filter("B"."OWNER#"="C"."OWNER#")
   9 - filter("C"."OWNER#"="D"."OWNER#")

Statistics
----------------------------------------------------------
         12 recursive calls
          0 db block gets
    2859366 consistent gets
          0 physical reads
          0 redo size
        346 bytes sent via SQL*Net to client
        471 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          2 sorts (memory)
          0 sorts (disk)
          1 rows processed

--//2859366/1000= 2859.366

3.收尾:
--//取消break_poll_skip設定.
$ grep break sqlnet.ora
#break_poll_skip=1000

--//補充說明如果break_poll_skip設定10,100,使用strace跟蹤很慢.
--//break_poll_skip=100
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:16.32
--//2859358 consistent gets

$ strace -cp 15165
Process 15165 attached - interrupt to quit
^CProcess 15165 detached
% time     seconds usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000052           0     28487           poll
0.00    0.000000           0         5           read
0.00    0.000000           0         5           write
0.00    0.000000           0         1           lseek
0.00    0.000000           0        46           getrusage
0.00    0.000000           0        26           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.000052                 28570           total

[20221018]本地執行與遠端執行.txt

相關文章