[20190409]pre_page_sga=true與連線緩慢的問題.txt

--//曾經遇到11g下設定pre_page_sga=true啟動緩慢的問題(沒有使用hugepages).

--//連結:http://blog.itpub.net/267265/viewspace-2295412/

--//實際上這樣系統也會遇到連線緩慢的情況,透過測試說明問題.

1.建立pfile:

$ cat initxxxx.ora

db_name=xxxx

instance_name=xxxx

sga_target=20G

#sga_target=0

sga_max_size=20G

pre_page_sga=true

$ export ORACLE_SID=xxxx

SYS@xxxx> startup nomount

ORACLE instance started.

Total System Global Area 2.1379E+10 bytes

Fixed Size 2262656 bytes

Variable Size 2818574720 bytes

Database Buffers 1.8522E+10 bytes

Redo Buffers 36073472 bytes

$ grep -i page /proc/meminfo

AnonPages: 239336 kB

PageTables: 701008 kB

AnonHugePages: 0 kB

HugePages_Total: 104

HugePages_Free: 48

HugePages_Rsvd: 41

HugePages_Surp: 0

Hugepagesize: 2048 kB

--//才啟動沒有任何使用者連線,PageTables=701008 kB.

2.測試登陸問題:

$ time sqlplus -s -l / as sysdba <<<quit

real 0m4.944s

user 0m0.024s

sys 0m0.016s

--//單個連線需要5秒.如果多個使用者併發呢?

$ cat aa.sql

quit

$ time seq 50 | xargs -I{} -P 50 sqlplus -s -l / as sysdba @ aa.sql

real 0m13.978s

user 0m1.833s

sys 0m1.017s

--//多個使用者同時登入,全部完成需要13秒.

3.分析:

$ strace -f -c sqlplus -s -l / as sysdba <<<quit

Process 3120 attached

Process 3120 detached

% time seconds usecs/call calls errors syscall

------ ----------- ----------- --------- --------- ----------------

99.85 0.419648 69941 6 shmdt

0.10 0.000434 1 413 2 read

0.01 0.000046 0 287 127 open

0.01 0.000045 0 180 mmap

0.01 0.000037 0 178 close

0.01 0.000032 0 79 rt_sigaction

....

------ ----------- ----------- --------- --------- ----------------

100.00 0.420259 1970 227 total

$ man shmdt

...

shmdt() detaches the shared memory segment located at the address specified by shmaddr from the address space of the

calling process. The to-be-detached segment must be currently attached with shmaddr equal to the value returned by the

attaching shmat() call.

On a successful shmdt() call the system updates the members of the shmid_ds structure associated with the shared memory

segment as follows:

shm_dtime is set to the current time.

shm_lpid is set to the process-ID of the calling process.

shm_nattch is decremented by one. If it becomes 0 and the segment is marked for deletion, the segment is deleted.

--//給人的感覺好像消耗在呼叫shmdt,實際上消耗的時間0.419648.也就是這樣還是無法定位問題在那裡.

$ strace -fTtt -e shmdt sqlplus -s -l / as sysdba

Process 6046 attached

[pid 6046] 08:41:55.548586 shmdt(0x7f2c41600000) = 0 <0.000036>

quit

08:41:59.992835 shmdt(0x60000000) = 0 <0.000042>

08:41:59.992983 shmdt(0x68000000) = 0 <0.000042>

08:41:59.993097 shmdt(0x6c000000) = 0 <0.360175>

08:42:00.353446 shmdt(0x4ec000000) = 0 <0.037200>

08:42:00.390737 shmdt(0x560000000) = 0 <0.000026>

Process 6046 detached

--//我手工執行quit,可以發現shmdt呼叫發生在退出階段,也就是不是它導致登陸緩慢.

$ ipcs -smt

------ Shared Memory Attach/Detach/Change Times --------

shmid owner attached detached changed

352092161 oracle Apr 10 17:26:09 Apr 10 17:26:17 Apr 10 17:09:55

352124930 oracle Apr 10 17:26:09 Apr 10 17:26:17 Apr 10 17:09:55

352157699 oracle Apr 10 17:26:09 Apr 10 17:26:18 Apr 10 17:09:55

352190469 oracle Apr 10 17:26:09 Apr 10 17:26:18 Apr 10 17:09:55

352223238 oracle Apr 10 17:26:09 Apr 10 17:26:18 Apr 10 17:09:55

------ Semaphore Operation/Change Times --------

semid owner last-op last-changed

310738944 oracle Wed Apr 10 17:11:33 2019 Wed Apr 10 17:11:33 2019

--//從執行時間看4.5秒並不是消耗在這裡.

4.繼續分析,排除呼叫shmdt的情況:

$ strace -fTr -o /tmp/a2.txt sqlplus -s -l / as sysdba <<<quit

$ awk '{print $2}' /tmp/a2.txt | sort -nr | head

5.041521

0.287119

0.064745

0.008298

0.006386

0.004288

0.004272

0.001319

0.000982

0.000978

--//檢視a2.txt

3366 0.000067 stat("/u01/app/oracle/product/11.2.0.4/dbhome_1/lib", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0 <0.000025>

3366 0.000112 chdir("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs") = 0 <0.000027>

3366 5.041521 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c0b3000 <0.000048>

3366 0.000407 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c090000 <0.000015>

3366 0.000127 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c06d000 <0.000013>

3366 0.000218 getcwd("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs"..., 256) = 46 <0.000016>

3366 0.000087 access("/etc/intchg.ora", F_OK) = -1 ENOENT (No such file or directory) <0.000019>

...

--//也就是在chdir呼叫與mmap呼叫之間有5秒不知道在幹什麼?(chdir 僅僅需要0.000027秒).strace看不出來!!

--//如果你使用top -u oracle觀察,你可以發現RES欄位不斷在增加,接近SGA設定.我的感覺就是在這裡建立pagetables的過程.

$ cat aa.sql

host sleep 1

quit

$ cat a.sh

#! /bin/bash

export ORACLE_SID=xxxx

sqlplus -s -l / as sysdba @ aa.sql &

a=$(($!+2))

top -p $a -d 1 -b

$ . a.sh | tee /tmp/aa.txt

...

--//按ctrl+c中斷.

$ grep PID /tmp/aa.txt | head -1; egrep 'oracle' /tmp/aa.txt

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

8498 oracle 20 0 20.2g 1.5g 1.5g R 93.9 1.2 0:00.47 oracle

8498 oracle 20 0 20.2g 5.8g 5.8g R 99.7 4.6 0:01.47 oracle

8498 oracle 20 0 20.2g 10g 10g R 99.8 8.4 0:02.47 oracle

8498 oracle 20 0 20.2g 15g 15g R 99.8 12.2 0:03.47 oracle

--//sleep 1

8498 oracle 20 0 20.2g 19g 19g S 95.9 15.7 0:04.43 oracle

8498 oracle 20 0 20.2g 19g 19g S 0.0 15.7 0:04.43 oracle

8498 oracle 20 0 20.0g 19g 19g R 1.0 15.6 0:04.44 oracle

--//你可以發現RES,SHR都在增加,CPU使用率在9X%,感覺11g版本在設定pre_page_sga=true下有問題.

--//大致定位時間定位在建立pagetables的過程.

5.改用hugepages呢?

$ grep -i page /proc/meminfo

AnonPages: 244144 kB

PageTables: 11508 kB

AnonHugePages: 0 kB

HugePages_Total: 26000

HugePages_Free: 15800

HugePages_Rsvd: 41

HugePages_Surp: 0

Hugepagesize: 2048 kB

$ time sqlplus -s -l / as sysdba <<<quit

real 0m0.180s

user 0m0.029s

sys 0m0.011s

$ strace -fTr -o /tmp/a3.txt sqlplus -s -l / as sysdba <<<quit

$ awk '{print $2}' /tmp/a3.txt | sort -nr | head

0.121929

0.009182

0.009144

0.009047

0.005081

0.002050

0.001984

0.001932

0.001321

0.001243

--//檢視a3.txt

8716 0.000095 chdir("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs") = 0 <0.000019>

--//這裡等待時間很小.

8716 0.121929 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f552659d000 <0.000029>

8716 0.000143 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f552657a000 <0.000020>

8716 0.000166 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5526557000 <0.000020>

8716 0.000260 getcwd("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs"..., 256) = 46 <0.000023>

--//總之:

如果pre_page_sga=true最好開啟hugepages.

很奇怪的是12c 預設pre_page_sga=true,沒有開啟hugepages的情況下,沒有這個問題.

[20190409]pre_page_sga=true與連線緩慢的問題.txt

相關文章