定位 UNIX 上常見問題的經驗總結

發表於2012-06-19

簡介： 本文主要對 UNIX 平臺常見的問題進行了分類，介紹一些常見問題分析時使用的方法和命令，對以下三種常見問題的分析方法做了簡單介紹：UNIX 下 Crash 問題的分析方法、UNIX 下記憶體洩露問題的分析方法和 UNIX 下 performance 問題的分析方法。

同時通過對下面兩個例子的介紹，鞏固了上面問題分析的介紹：

● 一個多執行緒應用的效能問題的分析

● 一個 crash 問題的分析

UNIX 程式常見問題分類

UNIX 下執行程式，經常會遇到以下幾類問題 :

1. Crash

2. 記憶體洩露

3. 控制程式碼洩露

4. 程式不響應

5. 效能不滿足預期

6. 邏輯錯誤

UNIX 程式常見問題的分析方法

UNIX 下 Crash 問題的分析方法

crash 原理和 core 檔案生成原因 ( 訊號的介紹 )

Crash 是程式崩潰，是由於應用程式做了錯誤的操作 ( 例如，陣列拷貝越界導致對系統記憶體進行了寫操作，使用了錯誤的指標地址 ), 作業系統嚮應用程式傳送了訊號，如果應用程式沒有做特殊處理，應用程式將 core dump 在程式當前的工作目錄下生成一個 core 檔案，core 檔案複製了該程式的儲存影像，是一個記憶體映像。

不是所有的訊號預設行為都是 crash, 常見預設 crash 訊號主要有：

SIGABRT

SIGBUS

SIGSEGV

SIGILL

SIGPIPE

可以通過 kill –l （適用所有 UNIX 平臺）檢視訊號的資訊。

檢視針對某個程式的所有訊號的預設行為（例如：在 Solaris 平臺使用 psig pid 命令檢視，其他平臺的命令略有不同，請參考各自平臺使用者手冊）.

[root@svs4qa09 SunOS a]# psig  25040 
 25040:  /qatest/ModelerServer/5.0.0.0.64/modelersrv_15_0 -server 
 HUP     caught  0x10002958c     0 
 INT     caught  0x100029580     0 
 QUIT    default 
 ILL     default 
 TRAP    default 
 ABRT    default 
 EMT     default 
 FPE     default 
 KILL    default 
 BUS     default 
 SEGV    default 
 SYS     default 
 PIPE    ignored 
 ALRM    default 
 TERM    caught  0x100029580     0 
 USR1    default 
 USR2    default 
 CLD     caught  0x100067f44     NOCLDSTOP

[root@svs4qa09 SunOS a]# psig 25040

25040: /qatest/ModelerServer/5.0.0.0.64/modelersrv_15_0 -server

HUP caught 0x10002958c 0

INT caught 0x100029580 0

QUIT default

ILL default

TRAP default

ABRT default

EMT default

FPE default

KILL default

BUS default

SEGV default

SYS default

PIPE ignored

ALRM default

TERM caught 0x100029580 0

USR1 default

USR2 default

CLD caught 0x100067f44 NOCLDSTOP

下面列舉一些常見訊號的預設操作以及可能產生的原因：

例如：Solaris 平臺如下。下面的資訊參考 Solaris 核心結構第 2 版第二章（Solaris 程式模型）第 75 頁，其他平臺基本相同，請參考各自平臺使用者手冊：

訊號值處理動作發出訊號的原因

SIGHUP 預設的動作是終止程式終端掛起或者控制程式終止

SIGINT 預設的動作是終止程式鍵盤中斷（如 break 鍵被按下）

SIGQUIT 預設的動作是終止程式並進行核心映像轉儲（dump core）鍵盤的退出鍵被按下

SIGILL 預設的動作是終止程式並進行核心映像轉儲（dump core）非法指令

SIGABRT 預設的動作是終止程式並進行核心映像轉儲（dump core）由 abort(3) 發出的退出指令

SIGFPE 預設的動作是終止程式並進行核心映像轉儲（dump core）浮點異常

SIGKILL 9 AEF Kill 訊號終止訊號

SIGSEGV 預設的動作是終止程式並進行核心映像轉儲（dump core）無效的記憶體引用

SIGPIPE 預設的動作是終止程式管道破裂 : 寫一個沒有讀埠的管道

SIGALRM 預設的動作是終止程式由 alarm(2) 發出的訊號

SIGTERM 預設的動作是終止程式終止訊號

SIGUSR1 預設的動作是終止程式使用者自定義訊號 1

SIGUSR2 預設的動作是終止程式使用者自定義訊號 2

SIGCHLD 預設的動作是忽略此訊號子程式結束訊號

SIGSTOP DEF 終止程式

SIGBUS 預設的動作是終止程式並進行核心映像轉儲（dump core）匯流排錯誤 ( 錯誤的記憶體訪問 )

core 檔案分析一般思路

首先使用 file 命令（所有 UNIX 平臺適用）檢視 core 檔案生成的源程式

bash-3.00$ file core 
 core:           ELF 64-bit MSB core file SPARCV9 Version 1, from 'qatest'

1 2	bash-3.00$ file core core: ELF 64-bit MSB core file SPARCV9 Version 1, from 'qatest'

從以上結果可以看出，該 core 檔案是由 64 位程式 qatest 生成的。

然後使用 gdb( 或者 dbx) 對 core 檔案進行分析：

bash-2.05$ dbx ./qatest ./core

1	bash-2.05$ dbx ./qatest ./core

再使用 where 命令檢視 core 的位置：

t@1 (l@1) program terminated by signal BUS(invalid address alignment) 
 Current function is MCXML_700::MCSetting::MCSetting 
   87       fpValue = s.GetValue()->Clone(); 
 (dbx) where

t@1 (l@1) program terminated by signal BUS(invalid address alignment)

Current function is MCXML_700::MCSetting::MCSetting

87 fpValue = s.GetValue()->Clone();

(dbx) where

從這個 core 檔案可以看到，它收到了 BUS 訊號，crash 的位置在 = s.GetValue()->Clone() 函式。

更多有關 gdb，dbx 的使用請參考 gdb，dbx 使用者手冊。

core 檔案無法生成常見原因

當程式崩潰時，並不是總會生成 core 檔案。經常有下面的情況導致 core 檔案沒有產生：

1. 對 core 檔案大小做了限制，可以通過 ulimit（所有 UNIX 平臺適用）的命令進行檢視：

bash-3.00$ ulimit -a 
 core file size        (blocks, -c) unlimited 
 data seg size         (kbytes, -d) unlimited 
 file size             (blocks, -f) unlimited 
 open files                    (-n) 256 
 pipe size          (512 bytes, -p) 10 
 stack size            (kbytes, -s) unlimited 
 cpu time             (seconds, -t) unlimited 
 max user processes            (-u) 29995 
 virtual memory        (kbytes, -v) unlimited

bash-3.00$ ulimit -a

core file size (blocks, -c) unlimited

data seg size (kbytes, -d) unlimited

file size (blocks, -f) unlimited

open files (-n) 256

pipe size (512 bytes, -p) 10

stack size (kbytes, -s) unlimited

cpu time (seconds, -t) unlimited

max user processes (-u) 29995

virtual memory (kbytes, -v) unlimited

建議使用下面的命令將這個限制改為 unlimited

bash-3.00$ ulimit –c unlimited

1	bash-3.00$ ulimit –c unlimited

2. 磁碟空間是否充足，通過 df 命令（所有 UNIX 平臺適用）檢視 Available 的空間是否充足。

bash-3.00$ df -k 
Filesystem           1024-blocks        Used   Available  Capacity  Mounted on 
/                              0    40975117    99394509    30%       / 
/dev                   140369626    40975117    99394509    30%       /dev

bash-3.00$ df -k

Filesystem 1024-blocks Used Available Capacity Mounted on

/ 0 40975117 99394509 30% /

/dev 140369626 40975117 99394509 30% /dev

3. 檢視訊號是否被捕獲（例如：Solaris 平臺使用 psig 進行檢視，見上面的例子，其他平臺的命令略有不同，請參考各自平臺使用者手冊）。

如果上面的情況導致 core 檔案沒有生成，請修改它。

4.沒有 core 檔案產生，如何分析 crash

有時候經常發現程式 crash 了，但是 core dump 檔案沒有產生。這時可以使用 dbx，gdb 等除錯工具首先 attach 到執行的程式上，然後再執行業務，如果程式 crash，dbx 或者 gdb 將終止在 crash 的位置，我們便可以根據這個堆疊資訊對 crash 進行分析，與分析 core 檔案相同。

UNIX 下記憶體洩露問題分析方法

記憶體洩露簡單的說就是申請了一塊記憶體空間，使用完畢後沒有釋放掉。它的一般表現方式是程式執行時間越長，佔用記憶體越多，最終用盡全部記憶體，整個系統崩潰。

封裝 new 和 delete 對記憶體洩漏進行分析

通過對 new 和 delete 的封裝，將 new 和 delete 的過程通過日誌檔案的儲存記錄下來。然後對日誌檔案進行分析，是否 new 和 delete 是匹配的，有哪些記憶體申請，但是沒有釋放。

下面通過一個簡單的測試程式（此程式碼使用 C++ 語言實現，目前沒有考慮申請陣列的情況）進行演示：

這個測試程式申請了 pTemp1，pTemp2，pTemp3 的三塊記憶體，但是僅僅釋放了 pTemp3，存在 pTemp1 和 pTemp2 的記憶體洩露。

程式解釋：

在每次記憶體申請時，將記憶體申請的資訊註冊到 MAP 表中，在每次記憶體釋放時，將對應的記憶體資訊從登錄檔中刪除，這樣登錄檔中將儲存未釋放的記憶體資訊，按照一定的規則將登錄檔中的資訊輸出（定時或者程式退出等）。然後我們從輸出資訊中便可以分析出記憶體洩漏點。

通過自定義巨集 DEMONEW 和 DEMODELETE 申請記憶體和釋放記憶體，在這兩個巨集中，我們將記憶體的申請和釋放做了記錄，從而可以得到未釋放記憶體的資訊，請參考下面的程式實現流程圖：

圖 1. 記憶體申請釋放流程：

圖 2.DEMONEW 實現流程：

圖 3.DEMODELETE 實現流程：

測試程式程式碼：

#include <map> 
 #include <iostream> 
 #include <string> 
 #include <fstream> 

 // 申請記憶體時，儲存 new 位置的資料結構
 typedef struct { 
 std::string filename; 
 int line; 
 } MEMINFO; 

 // 輸出檔案
 std::ofstream loginfo("//tmp/memory.log"); 

 typedef std::map<long long, MEMINFO> MemMap; 

 // 儲存記憶體申請記錄（在每次記憶體申請時，將記憶體申請的地址作為鍵值，
 // 記憶體申請操作所在的檔名和行號作為內容，儲存到下面的資料結構 memmap 中）
 MemMap memmap; 

 // 註冊記憶體申請資訊到上面的 map 容器中，輸入的引數分別為記憶體地址，檔名，行號
 void RegMemInfo(long long addr, const char *fname, long long lnum) 
 { 
        MEMINFO info; 

        if (fname) 
        { 
            info.filename = fname; 
        } 
        info.line = lnum; 
        memmap.insert(MemMap::value_type(addr, info)); 
 }; 

 // 解除安裝記憶體申請資訊從上面的 map 容器中，輸入的引數為記憶體地址
 void UnRegMemInfo(long long addr) 
 { 
        if (memmap.end() != memmap.find(addr)) 
        { 
                memmap.erase(addr); 
        } 
 } 

 // 定義巨集 DEMONEW，封裝了記憶體申請的操作，在記憶體申請成功後，呼叫 RegMemInfo 功能，
 // 將記憶體資訊註冊到 map 容器中
 #define DEMONEW(p, ptype)\ 
 do \ 
 {\ 
        p = new ptype;\ 
        if (p)\ 
        {\ 
            RegMemInfo((long long)p, __FILE__, __LINE__);\ 
        }\ 
        else\ 
    {\ 
            std::cout<<"NEW failed"<<std::endl;\ 
    }\ 
 }\ 
 while(0) 

 // 定義巨集 DEMODELETE，封裝了記憶體釋放的操作，在記憶體釋放時，呼叫 UnRegMemInfo 
 // 功能，將記憶體資訊從 map 容器中刪除
 #define DEMODELETE(p) \ 
 do\ 
 {\ 
        if (p)\ 
        {\ 
                UnRegMemInfo((long long)p);\ 
                delete p;\ 
                p = 0;\ 
        }\ 
 }while(0) 

 // 寫資訊流內容到檔案
 void WriteString(std::string buf) 
 { 
        loginfo << buf <<std::endl; 
 } 

 // 將整數轉換為字串
 std::string Int2Str(int value) 
 { 
        char buf[16] = {0}; 
        sprintf(buf, "%d", value); 
        return buf; 
 } 

 // 輸出 map 容器中儲存的記憶體沒有釋放的資訊
 void Output() 
 { 
        loginfo.clear(); 

        if (memmap.empty()) 
        { 
            WriteString("No Memory leak."); 
            return; 
        } 

        MemMap::iterator iter; 
        WriteString("The Memory leak is below:"); 
        for (iter = memmap.begin(); iter != memmap.end(); ++iter) 
        { 
                std::string buf; 
                std::string sAddr = Int2Str(iter->first); 
                std::string sLine = Int2Str(iter->second.line); 
                buf += "memory Address "; 
                buf += sAddr; 
                buf += ": FILE "; 
                buf += iter->second.filename; 
                buf += ", LINE "; 
                buf += sLine; 
                buf += " no freed"; 
                WriteString(buf); 
        } 
 } 

 // 測試程式主入口函式
 int main(int argc,  char* argv[]) 
 { 
        char* pTemp1 = 0; 
        DEMONEW(pTemp1, char); 
        char* pTemp2 = 0; 
        DEMONEW(pTemp2, char); 
        char* pTemp3 = 0; 
        DEMONEW(pTemp3, char); 
        DEMODELETE(pTemp1); 

        Output(); 
        loginfo.close(); 
        return 0; 
 }

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

#include <map>

#include <iostream>

#include <string>

#include <fstream>

// 申請記憶體時，儲存 new 位置的資料結構

typedef struct {

std::string filename;

int line;

} MEMINFO;

// 輸出檔案

std::ofstream loginfo("//tmp/memory.log");

typedef std::map<long long, MEMINFO> MemMap;

// 儲存記憶體申請記錄（在每次記憶體申請時，將記憶體申請的地址作為鍵值，

// 記憶體申請操作所在的檔名和行號作為內容，儲存到下面的資料結構 memmap 中）

MemMap memmap;

// 註冊記憶體申請資訊到上面的 map 容器中，輸入的引數分別為記憶體地址，檔名，行號

void RegMemInfo(long long addr, const char *fname, long long lnum)

{

MEMINFO info;

if (fname)

{

info.filename = fname;

}

info.line = lnum;

memmap.insert(MemMap::value_type(addr, info));

};

// 解除安裝記憶體申請資訊從上面的 map 容器中，輸入的引數為記憶體地址

void UnRegMemInfo(long long addr)

{

if (memmap.end() != memmap.find(addr))

{

memmap.erase(addr);

}

// 定義巨集 DEMONEW，封裝了記憶體申請的操作，在記憶體申請成功後，呼叫 RegMemInfo 功能，

// 將記憶體資訊註冊到 map 容器中

#define DEMONEW(p, ptype)\

do \

p = new ptype;\

if (p)\

RegMemInfo((long long)p, __FILE__, __LINE__);\

else\

std::cout<<"NEW failed"<<std::endl;\

while(0)

// 定義巨集 DEMODELETE，封裝了記憶體釋放的操作，在記憶體釋放時，呼叫 UnRegMemInfo

// 功能，將記憶體資訊從 map 容器中刪除

#define DEMODELETE(p) \

do\

if (p)\

UnRegMemInfo((long long)p);\

delete p;\

p = 0;\

}while(0)

// 寫資訊流內容到檔案

void WriteString(std::string buf)

{

loginfo << buf <<std::endl;

}

// 將整數轉換為字串

std::string Int2Str(int value)

{

char buf[16] = {0};

sprintf(buf, "%d", value);

return buf;

}

// 輸出 map 容器中儲存的記憶體沒有釋放的資訊

void Output()

{

loginfo.clear();

if (memmap.empty())

{

WriteString("No Memory leak.");

return;

}

MemMap::iterator iter;

WriteString("The Memory leak is below:");

for (iter = memmap.begin(); iter != memmap.end(); ++iter)

{

std::string buf;

std::string sAddr = Int2Str(iter->first);

std::string sLine = Int2Str(iter->second.line);

buf += "memory Address ";

buf += sAddr;

buf += ": FILE ";

buf += iter->second.filename;

buf += ", LINE ";

buf += sLine;

buf += " no freed";

WriteString(buf);

}

// 測試程式主入口函式

int main(int argc, char* argv[])

{

char* pTemp1 = 0;

DEMONEW(pTemp1, char);

char* pTemp2 = 0;

DEMONEW(pTemp2, char);

char* pTemp3 = 0;

DEMONEW(pTemp3, char);

DEMODELETE(pTemp1);

Output();

loginfo.close();

return 0;

}

上面測試程式的輸出是：

[dyu@xilinuxbldsrv ~]$ vi /tmp/memory.log 

 The Memory leak is below: 
 memory Address 280929008: FILE test.cpp, LINE 109 no freed 
 memory Address 280929152: FILE test.cpp, LINE 111 no freed

[dyu@xilinuxbldsrv ~]$ vi /tmp/memory.log

The Memory leak is below:

memory Address 280929008: FILE test.cpp, LINE 109 no freed

memory Address 280929152: FILE test.cpp, LINE 111 no freed

輸出分析：

從輸出結果我們可以發現，此測試程式在 test.cpp 檔案的 109 和 111 行各有一處記憶體洩漏，檢視原始碼，它們分別是 pTemp1 和 pTemp2。

使用 Purify（適用所有 UNIX 平臺）或者 valgrind（適用 Linux 平臺）工具對記憶體洩漏進行分析

1. 使用 Purify 對記憶體洩漏進行分析

Purify 是 IBM Rational PurifyPlus 的工具之一，是一個面向 VC、VB 或者 Java 開發的測試 Visual C/C++ 和 Java 程式碼中與記憶體有關的錯誤的工具，它確保整個應用程式的質量和可靠性。在查詢典型的 C/C++ 程式中的傳統記憶體訪問錯誤， Rational Purify 可以大顯身手。在 UNIX 系統中，使用 Purify 需要重新編譯程式。通常的做法是修改 Makefile 中的編譯器變數。

例如定義 CC 變數為 purify gcc

CC=purify gcc

1	CC=purify gcc

首先執行 Purify 安裝目錄下的 purifyplus_setup.sh 來設定環境變數，然後執行 make 重新編譯程式。需要指出的是，程式必須編譯成除錯版本。在編譯器命令（例如 Solaris 的 CC 編譯器，Linux 的 gcc 編譯器等）後，也就是必須使用”-g”選項。在重新編譯的程式執行結束後，Purify 會列印出一個分析報告。

測試程式（此程式碼使用 C++ 語言實現）：

#include <stdlib.h> 
 void func1() 
 { 
    //char* pBuf = new char; 
 } 

 void func2() 
 { 
    char* pBuf = new char; 
 } 

 void func3() 
 { 
    char* pBuf = new char; 
 } 

 int main() 
 { 
    func1(); 
    func2(); 
    func3(); 
    return 0; 
 }

#include <stdlib.h>

void func1()

{

//char* pBuf = new char;

}

void func2()

{

char* pBuf = new char;

}

void func3()

{

char* pBuf = new char;

}

int main()

{

func1();

func2();

func3();

return 0;

}

編譯程式：

[dyu@xilinuxbldsrv purify]$ purify g++ -g tst.cpp -o tst1

1	[dyu@xilinuxbldsrv purify]$ purify g++ -g tst.cpp -o tst1

Purify 輸出：

[dyu@xilinuxbldsrv purify]$ ./tst1 
 16:50:59 (rational) OUT: "PurifyPlusUNIX" dyu@xilinuxbldsrv  
 ****  Purify instrumented ./tst1 (pid 530 at Fri Apr  6 16:50:59 2012) 
  * Purify 7.0.0.0-014 090319 Linux (64-bit) (C) Copyright IBM Corporation. 1992, 
  * 2009 All Rights Reserved.  
  * For contact information type: "purify -help"
  * For Purify Viewer output, set the DISPLAY environment variable. 
  * License successfully checked out. 
  * Command-line: ./tst1 
  * Options settings: -g++=yes -purify \ 
-purify-home=
/home/dyu/purify/PurifyPlus.7.0.0.0-014/Rational/releases/\
purify.i386_linux2.7.0.0.0-014
-process-large-objects=yes -gcc3_path=/usr/bin/g++ \ 
-cache-dir=
/home/dyu/purify/PurifyPlus.7.0.0.0-014/Rational/releases/\
purify.i386_linux2.7.0.0.0-014\
/cache

 ****  Purify instrumented ./tst1 (pid 530)  **** 
 Current file descriptors in use: 5 
 FIU: file descriptor 0: <stdin> 
 FIU: file descriptor 1: <stdout> 
 FIU: file descriptor 2: <stderr> 
 FIU: file descriptor 26: <reserved for Purify internal use> 
 FIU: file descriptor 27: <reserved for Purify internal use> 

 ****  Purify instrumented ./tst1 (pid 530)  **** 
 Purify: Searching for all memory leaks... 

 Memory leaked: 2 bytes (100%); potentially leaked: 0 bytes (0%) 

 MLK: 1 byte leaked at 0xa457098 
  * This memory was allocated from: 
        malloc         [rtlib.o] 
        operator new(unsigned long) [libstdc++.so.6] 
        operator new(unsigned long) [rtlib.o] 
        func2()        [tst.cpp:9] 
        main           [tst.cpp:20] 
        __libc_start_main [libc.so.6] 
        _start         [crt1.o] 

 MLK: 1 byte leaked at 0xa457138 
  * This memory was allocated from: 
        malloc         [rtlib.o] 
        operator new(unsigned long) [libstdc++.so.6] 
        operator new(unsigned long) [rtlib.o] 
        func3()        [tst.cpp:14] 
        main           [tst.cpp:21] 
        __libc_start_main [libc.so.6] 
        _start         [crt1.o] 

 Purify Heap Analysis (combining suppressed and unsuppressed blocks) 
                         Blocks        Bytes 
              Leaked          2            2 
  Potentially Leaked          0            0 
              In-Use          0            0 
  ---------------------------------------- 
     Total Allocated          2            2

[dyu@xilinuxbldsrv purify]$ ./tst1

16:50:59 (rational) OUT: "PurifyPlusUNIX" dyu@xilinuxbldsrv

**** Purify instrumented ./tst1 (pid 530 at Fri Apr 6 16:50:59 2012)

* Purify 7.0.0.0-014 090319 Linux (64-bit) (C) Copyright IBM Corporation. 1992,

* For contact information type: "purify -help"

* For Purify Viewer output, set the DISPLAY environment variable.

* License successfully checked out.

* Command-line: ./tst1

* Options settings: -g++=yes -purify \

-purify-home=

/home/dyu/purify/PurifyPlus.7.0.0.0-014/Rational/releases/\

purify.i386_linux2.7.0.0.0-014

-process-large-objects=yes -gcc3_path=/usr/bin/g++ \

-cache-dir=

/home/dyu/purify/PurifyPlus.7.0.0.0-014/Rational/releases/\

purify.i386_linux2.7.0.0.0-014\

/cache

**** Purify instrumented ./tst1 (pid 530) ****

Current file descriptors in use: 5

FIU: file descriptor 0: <stdin>

FIU: file descriptor 1: <stdout>

FIU: file descriptor 2: <stderr>

FIU: file descriptor 26: <reserved for Purify internal use>

FIU: file descriptor 27: <reserved for Purify internal use>

**** Purify instrumented ./tst1 (pid 530) ****

Purify: Searching for all memory leaks...

Memory leaked: 2 bytes (100%); potentially leaked: 0 bytes (0%)

MLK: 1 byte leaked at 0xa457098

* This memory was allocated from:

malloc [rtlib.o]

operator new(unsigned long) [libstdc++.so.6]

operator new(unsigned long) [rtlib.o]

func2() [tst.cpp:9]

main [tst.cpp:20]

__libc_start_main [libc.so.6]

_start [crt1.o]

MLK: 1 byte leaked at 0xa457138

* This memory was allocated from:

malloc [rtlib.o]

operator new(unsigned long) [libstdc++.so.6]

operator new(unsigned long) [rtlib.o]

func3() [tst.cpp:14]

main [tst.cpp:21]

__libc_start_main [libc.so.6]

_start [crt1.o]

Purify Heap Analysis (combining suppressed and unsuppressed blocks)

Blocks Bytes

Leaked 2 2

Potentially Leaked 0 0

In-Use 0 0

----------------------------------------

Total Allocated 2 2

Purify 圖形輸出：

[dyu@xilinuxbldsrv purify]$ export DISPLAY=9.119.131.33:0

1	[dyu@xilinuxbldsrv purify]$ export DISPLAY=9.119.131.33:0

安裝 Xmanager 等工具，設定 DISPLAY 為本機 IP，見下圖

輸出分析：

從 purify 的輸出可以看出，此測試程式存在兩處記憶體洩漏，它分別是 func2 和 func3，在 tst.cpp 檔案的第 9 和第 14 行。

2. 使用 valgrind（現在僅僅支援 Linux 平臺）對記憶體洩漏進行分析

Valgrind 是一套 Linux 下，開放原始碼（GPL V2）的模擬除錯工具的集合。Valgrind 由核心（core）以及基於核心的其他除錯工具組成。核心類似於一個框架，它模擬了一個 CPU 環境，並提供服務給其他工具；而其他工具則類似於外掛 (plug-in)，利用核心提供的服務完成各種特定的記憶體除錯任務。Valgrind 在對程式進行偵測的時候，不需要對程式進行重新編譯。

下面使用 valgrind 對一個簡單的測試程式進行。

測試程式：

同 Purify 的測試程式相同。

編譯程式：

[dyu@xilinuxbldsrv purify]$ g++ -g tst.cpp -o tst

1	[dyu@xilinuxbldsrv purify]$ g++ -g tst.cpp -o tst

valgrind 輸出：

[dyu@xilinuxbldsrv purify]$ valgrind --leak-check=full ./tst 
 ==25396== Memcheck, a memory error detector 
 ==25396== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al. 
 ==25396== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info 
 ==25396== Command: ./tst 
 ==25396== 
 ==25396== 
 ==25396== HEAP SUMMARY: 
 ==25396==     in use at exit: 2 bytes in 2 blocks 
 ==25396==   total heap usage: 2 allocs, 0 frees, 2 bytes allocated 
 ==25396== 
 ==25396== 1 bytes in 1 blocks are definitely lost in loss record 1 of 2 
 ==25396==    at 0x4A0666E: operator new(unsigned long) (vg_replace_malloc.c:220) 
 ==25396==    by 0x4005C7: func2() (tst.cpp:9) 
 ==25396==    by 0x4005DB: main (tst.cpp:20) 
 ==25396== 
 ==25396== 1 bytes in 1 blocks are definitely lost in loss record 2 of 2 
 ==25396==    at 0x4A0666E: operator new(unsigned long) (vg_replace_malloc.c:220) 
 ==25396==    by 0x4005AF: func3() (tst.cpp:14) 
 ==25396==    by 0x4005E0: main (tst.cpp:21) 
 ==25396== 
 ==25396== LEAK SUMMARY: 
 ==25396==    definitely lost: 2 bytes in 2 blocks 
 ==25396==    indirectly lost: 0 bytes in 0 blocks 
 ==25396==      possibly lost: 0 bytes in 0 blocks 
 ==25396==    still reachable: 0 bytes in 0 blocks 
 ==25396==         suppressed: 0 bytes in 0 blocks 
 ==25396== 
 ==25396== For counts of detected and suppressed errors, rerun with: -v 
 ==25396== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4) 
 [dyu@xilinuxbldsrv purify]$

[dyu@xilinuxbldsrv purify]$ valgrind --leak-check=full ./tst

==25396== Memcheck, a memory error detector

==25396== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info

==25396== Command: ./tst

==25396==

==25396== HEAP SUMMARY:

==25396== in use at exit: 2 bytes in 2 blocks

==25396== total heap usage: 2 allocs, 0 frees, 2 bytes allocated

==25396==

==25396== 1 bytes in 1 blocks are definitely lost in loss record 1 of 2

==25396== at 0x4A0666E: operator new(unsigned long) (vg_replace_malloc.c:220)

==25396== by 0x4005C7: func2() (tst.cpp:9)

==25396== by 0x4005DB: main (tst.cpp:20)

==25396==

==25396== 1 bytes in 1 blocks are definitely lost in loss record 2 of 2

==25396== at 0x4A0666E: operator new(unsigned long) (vg_replace_malloc.c:220)

==25396== by 0x4005AF: func3() (tst.cpp:14)

==25396== by 0x4005E0: main (tst.cpp:21)

==25396==

==25396== LEAK SUMMARY:

==25396== definitely lost: 2 bytes in 2 blocks

==25396== indirectly lost: 0 bytes in 0 blocks

==25396== possibly lost: 0 bytes in 0 blocks

==25396== still reachable: 0 bytes in 0 blocks

==25396== suppressed: 0 bytes in 0 blocks

==25396==

==25396== For counts of detected and suppressed errors, rerun with: -v

==25396== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)

[dyu@xilinuxbldsrv purify]$

輸出分析：

從 valgrind 的輸出可以看出，此測試程式存在兩處記憶體洩漏，它分別是 func2 和 func3，在 tst.cpp 檔案的第 9 和第 14 行，與 purify 的檢測結果相同。

UNIX 下程式效能問題分析方法

● 檢查 CPU 佔用情況（包含單個執行緒的 CPU 使用情況）

下面分別對 Solaris 和 Linux 平臺做了舉例，其他平臺的命令略有不同，請參考各自平臺使用者手冊。

例如：在 Solaris 平臺

使用 prtdiag 命令檢視系統 CPU 資訊：

 [/rnd/homes/builder1/purify >prtdiag

1	[/rnd/homes/builder1/purify >prtdiag

輸出資訊：

SystemConfiguration:SunMicrosystemssun4uSunFireV210
Systemclockfrequency:167MHZ
Memorysize:8GB
==================================== CPUs ====================================
               E$          CPU                    CPU     Temperature
CPU  Freq      Size        Implementation        Mask    Die   Amb.  Status  Location
---  --------  ----------  -------------------- -----   ----  ----  ------  --------
0    1002 MHz  1MB         SUNW,UltraSPARC-IIIi   2.3     -     -    online  MB/P0
1    1002 MHz  1MB         SUNW,UltraSPARC-IIIi   2.3     -     -    online  MB/P1

SystemConfiguration:SunMicrosystemssun4uSunFireV210

Systemclockfrequency:167MHZ

Memorysize:8GB

==================================== CPUs ====================================

E$ CPU CPU Temperature

CPU Freq Size Implementation Mask Die Amb. Status Location

--- -------- ---------- -------------------- ----- ---- ---- ------ --------

0 1002 MHz 1MB SUNW,UltraSPARC-IIIi 2.3 - - online MB/P0

1 1002 MHz 1MB SUNW,UltraSPARC-IIIi 2.3 - - online MB/P1

輸出分析:

從上面的輸出可以發現，此伺服器有兩個 CPU，以及各個 CPU 的資訊。

使用 prstat 命令程式中每個執行緒的 CPU 使用情況：

[/rnd/homes/builder1/purify >>prstat -Lu root

1	[/rnd/homes/builder1/purify >>prstat -Lu root

輸出資訊：

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID      
   230 root     3896K 3072K sleep   59    0   0:04:37 0.0% nscd/18 
 26229 root      201M  118M sleep   59    0   4:38:06 0.0% java/4

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID

230 root 3896K 3072K sleep 59 0 0:04:37 0.0% nscd/18

26229 root 201M 118M sleep 59 0 4:38:06 0.0% java/4

輸出分析：

LWPID 雖然不是執行緒 ID，但是在 Solaris10 版本與執行緒 ID 是一一對應關係，所以可以通過 LWPID 進行分析。

例如：在 Linux 平臺

檢視 CPU 個數 ( 使用 top 命令，然後按 1 鍵可顯示 CPU 的個數以及每個 CPU 的負載情況 )：

 [dyu@xilinuxbldsrv purify]$top

1	[dyu@xilinuxbldsrv purify]$top

輸出資訊：

Tasks: 205 total,   7 running, 196 sleeping,   1 stopped,   1 zombie 
 Cpu0  : 92.7%us,  7.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st 
 Cpu1  : 94.2%us,  5.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st 
 Mem:   2033948k total,  1867908k used,   166040k free,    20088k buffers 
 Swap:  4095992k total,   393420k used,  3702572k free,   389476k cached 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3088 root      19   0  136m  73m 6960 R 57.2  3.7   0:00.89 cc1plus
 3082 root      21   0 40580  33m 4732 R 52.7  1.7   0:00.75 cc1plus
 3068 root      25   0  232m 163m  10m R 45.2  8.3   0:03.69 cc1plus
 3085 root      19   0 98.8m  33m 5696 R 28.6  1.7   0:00.24 cc1plus
 2901 root      25   0 89732  83m 4508 R 15.1  4.2   0:09.40 cc1plus
 3069 dyu       15   0 10884 1120  768 R  1.5  0.1   0:00.04 top
    1 root      15   0 10372  380  348 S  0.0  0.0   0:03.19 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.15 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0

Tasks: 205 total, 7 running, 196 sleeping, 1 stopped, 1 zombie

Cpu0 : 92.7%us, 7.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu1 : 94.2%us, 5.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 2033948k total, 1867908k used, 166040k free, 20088k buffers

Swap: 4095992k total, 393420k used, 3702572k free, 389476k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

3088 root 19 0 136m 73m 6960 R 57.2 3.7 0:00.89 cc1plus

3082 root 21 0 40580 33m 4732 R 52.7 1.7 0:00.75 cc1plus

3068 root 25 0 232m 163m 10m R 45.2 8.3 0:03.69 cc1plus

3085 root 19 0 98.8m 33m 5696 R 28.6 1.7 0:00.24 cc1plus

2901 root 25 0 89732 83m 4508 R 15.1 4.2 0:09.40 cc1plus

3069 dyu 15 0 10884 1120 768 R 1.5 0.1 0:00.04 top

1 root 15 0 10372 380 348 S 0.0 0.0 0:03.19 init

2 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0

3 root 34 19 0 0 0 S 0.0 0.0 0:00.15 ksoftirqd/0

4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0

輸出分析：

從上面輸出可以得到，此伺服器有兩個 CPU，均為滿負荷工作，空閒的 CPU 使用均為 0%。

檢視程式中每個執行緒的 CPU 使用情況：

[dyu@xilinuxbldsrv purify]$ ps -e -o user,pid,ppid,tid,time,%cpu,cmd --sort=%cpu

1	[dyu@xilinuxbldsrv purify]$ ps -e -o user,pid,ppid,tid,time,%cpu,cmd --sort=%cpu

–sort=%cpu 命令輸出要求按 CPU 的使用多少進行排序輸出。

輸出資訊：

[dyu@xilinuxbldsrv ~]$ ps -e -o user,pid,ppid,tid,time,%cpu,cmd --sort=%cpu 
 USER       PID  PPID   TID     TIME %CPU CMD 
 root         2     1     2 00:00:00  0.0 [migration/0] 
 root         3     1     3 00:00:00  0.0 [ksoftirqd/0] 
 root         4     1     4 00:00:00  0.0 [watchdog/0] 
 root         7     1     7 00:00:00  0.0 [watchdog/1] 
 root        10     1    10 00:00:00  0.0 [khelper] 
 root        27     1    27 00:00:00  0.0 [kthread] 
 root        34    27    34 00:00:00  0.0 [kacpid]

[dyu@xilinuxbldsrv ~]$ ps -e -o user,pid,ppid,tid,time,%cpu,cmd --sort=%cpu

USER PID PPID TID TIME %CPU CMD

root 2 1 2 00:00:00 0.0 [migration/0]

root 3 1 3 00:00:00 0.0 [ksoftirqd/0]

root 4 1 4 00:00:00 0.0 [watchdog/0]

root 7 1 7 00:00:00 0.0 [watchdog/1]

root 10 1 10 00:00:00 0.0 [khelper]

root 27 1 27 00:00:00 0.0 [kthread]

root 34 27 34 00:00:00 0.0 [kacpid]

輸出分析：

從上面輸出可以根據每個執行緒的 CPU 使用情況分析，效能瓶頸在哪個執行緒。

檢查 IO

使用 iostat 命令可以檢視系統 IO 狀態等資訊。

例如：Solaris 平臺 iostat 輸出：

/rnd/homes/builder1/purify >>iostat 1 
   tty        sd0           sd1           nfs1          nfs2           cpu 
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id 
   0    6  12   2    9    0   0   27    0   0    0    4   0    8    2  1  0 97 
   0  234   0   0    0    0   0    0    0   0    0    0   0    0    0  1  0 99 
   0   80   1   1   11    0   0    0    0   0    0    0   0    0    0  5  0 95 
   0   80   0   0    0    0   0    0    0   0    0    0   0    0    0  1  0 99

/rnd/homes/builder1/purify >>iostat 1

tty sd0 sd1 nfs1 nfs2 cpu

tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id

0 6 12 2 9 0 0 27 0 0 0 4 0 8 2 1 0 97

0 234 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 99

0 80 1 1 11 0 0 0 0 0 0 0 0 0 0 5 0 95

0 80 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 99

上面是 iostat 的一個簡單輸出，可以檢視 iostat 的幫助（man iostat）瞭解更多資訊。

使用 Quantify 對程式效能進行分析

Rational Quantify 是 IBM Rational PurifyPlus 的工具之一，可以按多種級別（包括程式碼行級和函式級）測定效能，並提供分析效能改進所需的準確和詳細的資訊，使您可以核實程式碼效能確實有所提高。使用 Rational Quantify，您可以更好地控制資料記錄的速度和準確性。您可以按模組調整工具收集資訊的級別：對於應用程式中感興趣的那部分，可以收集詳細資訊；而對於不太重要的模組，您可以加快資料記錄的速度並簡化資料收集。使用“執行控制工具欄”，可以直接、實時地控制效能資料的記錄。既可以收集應用程式在整個執行過程中的效能資料，也可以只收集程式執行過程中您最感興趣的某些階段的效能資料。

下面是一個使用 Quantify 對程式效能進行分析的例子

測試程式（此程式碼使用 C++ 語言實現）

#include <stdlib.h> 
 #include <iostream> 
 void func1() 
 { 
    for (int i = 0; i < 10000; ++i) 
        { 
        char* pBuf = new char[1024]; 
              delete []pBuf; 
    } 
 } 

 void func2() 
 { 
    for (int i = 0; i < 10000; ++i) 
        { 
        char* pBuf = new char[1024]; 
              delete []pBuf; 
    } 
 } 

 void func3() 
 { 
    for (int i = 0; i < 100; ++i) 
        { 
            std::cout << "Hello World" << std::endl; 
        } 
 } 

 int main() 
 { 
    func1(); 
    func2(); 
    func3(); 
    return 0; 
 }

#include <stdlib.h>

#include <iostream>

void func1()

{

for (int i = 0; i < 10000; ++i)

{

char* pBuf = new char[1024];

delete []pBuf;

}

void func2()

{

for (int i = 0; i < 10000; ++i)

{

char* pBuf = new char[1024];

delete []pBuf;

}

void func3()

{

for (int i = 0; i < 100; ++i)

{

std::cout << "Hello World" << std::endl;

}

int main()

{

func1();

func2();

func3();

return 0;

}

編譯程式：

[dyu@xilinuxbldsrv purify]$ quantify g++ -g performance.c -o performancetst

1	[dyu@xilinuxbldsrv purify]$ quantify g++ -g performance.c -o performancetst

Quantify 輸出：

****  Quantify instrumented ./performancetst (pid 18503 at 20:12:14)
  Quantify 7.0.0.0-014 090319 Linux (64-bit) (C) Copyright IBM Corporation. 1992, 
 2009 All Rights Reserved.  
  * For contact information type: "quantify -help"
  * License successfully checked out. 
 Quantify: Sending data for 178 of 5713 functions 
 from ./performancetst (pid 18772).........done. 

 Quantify: Resource Statistics for ./performancetst (pid 18772) 
 *                                            cycles       secs 
 * Total counted time:                      56984465      0.031 (100.0%) 
 *     Time in your code:                    5599024      0.003 (  9.8%) 
 *     Time in system calls:                51252884      0.027 ( 89.9%) 
 *     Dynamic library loading:               132557      0.000 (  0.2%) 
 * 
 * Note: Data collected assuming a machine type of Intel-Core with 
 * clock rate of 1.867 GHz. 
 * Note: These times exclude Quantify overhead and possible memory effects. 
 * 
 * Elapsed data collection time:       0.383 secs 
 * 
 * Note: This measurement includes Quantify overhead. 
 * 
 To view your saved Quantify data, type: 
 quantify -view /home/dyu/purify/performancetst.18772.0.qv

**** Quantify instrumented ./performancetst (pid 18503 at 20:12:14)

* For contact information type: "quantify -help"

* License successfully checked out.

Quantify: Sending data for 178 of 5713 functions

from ./performancetst (pid 18772).........done.

Quantify: Resource Statistics for ./performancetst (pid 18772)

* cycles secs

* Total counted time: 56984465 0.031 (100.0%)

* Time in your code: 5599024 0.003 ( 9.8%)

* Time in system calls: 51252884 0.027 ( 89.9%)

* Dynamic library loading: 132557 0.000 ( 0.2%)

* Note: Data collected assuming a machine type of Intel-Core with

* clock rate of 1.867 GHz.

* Note: These times exclude Quantify overhead and possible memory effects.

* Elapsed data collection time: 0.383 secs

* Note: This measurement includes Quantify overhead.

To view your saved Quantify data, type:

quantify -view /home/dyu/purify/performancetst.18772.0.qv

Quantify 的圖形輸出：

安裝 Xmanager 等工具，設定 DISPLAY 為本機 IP，見下圖：

[dyu@xilinuxbldsrv purify]$ export DISPLAY=9.119.131.33:0

1	[dyu@xilinuxbldsrv purify]$ export DISPLAY=9.119.131.33:0

輸出分析：

從 Quantify 的輸出可以對程式的效能進行初步分析，func1 時間花費為 43.14%，func2 為 42.59%，func3 為 14.27%。

示例演示

通過兩個例項去演示多執行緒效能問題和產品不相容導致 crash 的問題。

一個多執行緒互斥導致效能瓶頸問題

1.問題描述：

對某個多執行緒程式，當單執行緒的情況下，執行任務 1 花費 70s，但是當配置為 16 個執行緒時，執行任務 1 仍然花費時間大約 70s。

2.問題分析：

首先檢視單個執行緒或多個執行緒的 CPU 使用情況

PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND 
 11248 czhou      7   0    0  556M  555M cpu/1    0:17 6.25% CFTestApp

1 2	PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND 11248 czhou 7 0 0 556M 555M cpu/1 0:17 6.25% CFTestApp

當多執行緒情況下，檢視每個執行緒的 CPU 佔用情況：
 PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID 
 11248 czhou    3606M 3605M cpu1     0    0   0:07:11  6.25% CFTestApp/12 
 11248 czhou    3606M 3605M cpu1     0    0   0:07:11  0% CFTestApp/1 
 11248 czhou    3606M 3605M cpu1     0    0   0:07:11  0% CFTestApp/5 
 11248 czhou    3606M 3605M cpu1     0    0   0:07:11  0% CFTestApp/7

當多執行緒情況下，檢視每個執行緒的 CPU 佔用情況：

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID

11248 czhou 3606M 3605M cpu1 0 0 0:07:11 6.25% CFTestApp/12

11248 czhou 3606M 3605M cpu1 0 0 0:07:11 0% CFTestApp/1

11248 czhou 3606M 3605M cpu1 0 0 0:07:11 0% CFTestApp/5

11248 czhou 3606M 3605M cpu1 0 0 0:07:11 0% CFTestApp/7

從上面可以發現。當多執行緒的情況下，在 16 個執行緒中僅僅一個執行緒的 CPU 佔用 6.25%，其他執行緒佔用均為 0%。可以斷定大多數的執行緒被 block 住。然後需要檢視這個程式的堆疊資訊，得到每個執行緒的函式呼叫關係。

Pstack 11248 
 -----------------  lwp# 1 / thread# 1  -------------------- 
 ff11af60 ???(ffbff8e8, ffbff8e0) 
 ff15e328 dynamic_cast(258, 7, 10f88, 0, 139c0, 21508) + 58 
 0001110c main     (1, ffbff9d4, ffbff9dc, 21400, 0, 0) + fc 
 00010b58 _start   (0, 0, 0, 0, 0, 0) + 108 
 -----------------  lwp# 7  -------------------------------- 
 ff11af60 ???(fef7ff30, fef7ff28) 
 ff15e328 dynamic_cast(1, ff010224, 0, 0, 0, 0) + 58 
 00010fd8 __1cLwork_thread6Fpv_0_ (0, 0, 0, 0, 0, 0) + 8 
 ff165e48 _lwp_start (0, 0, 0, 0, 0, 0) 
 Then I remove the dynamic_cast, the problem was resolved.

Pstack 11248

----------------- lwp# 1 / thread# 1 --------------------

ff11af60 ???(ffbff8e8, ffbff8e0)

ff15e328 dynamic_cast(258, 7, 10f88, 0, 139c0, 21508) + 58

0001110c main (1, ffbff9d4, ffbff9dc, 21400, 0, 0) + fc

00010b58 _start (0, 0, 0, 0, 0, 0) + 108

----------------- lwp# 7 --------------------------------

ff11af60 ???(fef7ff30, fef7ff28)

ff15e328 dynamic_cast(1, ff010224, 0, 0, 0, 0) + 58

00010fd8 __1cLwork_thread6Fpv_0_ (0, 0, 0, 0, 0, 0) + 8

ff165e48 _lwp_start (0, 0, 0, 0, 0, 0)

Then I remove the dynamic_cast, the problem was resolved.

從上面的執行緒堆疊資訊，我們可以看到，大部分的執行緒幾乎都 block 在 dynamic_cast 函式。

3.（3）問題解決：

針對上面的分析對這個效能瓶頸程式碼進行修正。

一個由於產品不相容導致 crash 的問題

1. 問題描述：

在 Linux 平臺，產品 A 使用編譯器 icpc 編譯，產品 B 使用編譯器 g++ 編譯。程式 C 會同時載入產品 A 與產品 B，當程式 C 執行時呼叫產品 A 中的函式 funcA 時 crash 發生。

2.問題分析：

從 core 檔案，我們可以得到下面的資訊：

(gdb) where 
 #0  std::_Rb_tree<int, std::pair<int const, int>, 
 std::_Select1st<std::pair<int const, 
 int> >, std::less<int>, std::allocator<std::pair<int const, int> > >
 ::_M_end (this=0x602a20) 
at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../include/c++/4.1.2/bits/stl_tree.h:477
#1  0x0000000000400d3c in std::_Rb_tree<int, std::pair<int const, int>, 
 std::_Select1st<std::pair<int const, int> >, std::less<int>, 
 std::allocator<std::pair<int const, int> > >
 ::lower_bound (this=0x602a20, __k=@0x7fffffffe76c) 
at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/c++/4.1.2/bits/stl_tree.h:1368
#2  0x0000000000400dbd in std::map<int, int, std::less<int>, std::allocator<
 std::pair<int const, int> > >::lower_bound ( 
    this=0x602a20, __x=@0x7fffffffe76c) 
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/c++/4.1.2/bits/stl_map.h:576 
 #3  0x0000000000401a0c in std::map<int, int, std::less<int>, 
 std::allocator<std::pair<int const, int> > >::operator[] ( 
    this=0x602a20, __k=@0x7fffffffe76c) 
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/c++/4.1.2/bits/stl_map.h:345 
 #4  0x0000000000400a1b in funcA () at map.cpp:7

(gdb) where

#0 std::_Rb_tree<int, std::pair<int const, int>,

std::_Select1st<std::pair<int const,

int> >, std::less<int>, std::allocator<std::pair<int const, int> > >

::_M_end (this=0x602a20)

at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../include/c++/4.1.2/bits/stl_tree.h:477

#1 0x0000000000400d3c in std::_Rb_tree<int, std::pair<int const, int>,

std::_Select1st<std::pair<int const, int> >, std::less<int>,

std::allocator<std::pair<int const, int> > >

::lower_bound (this=0x602a20, __k=@0x7fffffffe76c)

at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/c++/4.1.2/bits/stl_tree.h:1368

#2 0x0000000000400dbd in std::map<int, int, std::less<int>, std::allocator<

std::pair<int const, int> > >::lower_bound (

this=0x602a20, __x=@0x7fffffffe76c)

at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/c++/4.1.2/bits/stl_map.h:576

#3 0x0000000000401a0c in std::map<int, int, std::less<int>,

std::allocator<std::pair<int const, int> > >::operator[] (

this=0x602a20, __k=@0x7fffffffe76c)

at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/c++/4.1.2/bits/stl_map.h:345

#4 0x0000000000400a1b in funcA () at map.cpp:7

查詢產品 A 的依賴庫，我們可以得到下面的資訊

dyu@ xilinuxbldsrv> ldd libA.so 
        linux-vdso.so.1 =>  (0x00007fffa2eb9000) 
        libm.so.6 => /lib64/libm.so.6 (0x00002b6783a80000) 
        libstdc++.so.5 => /usr/lib64/libstdc++.so.5 (0x00002b6783cd6000) 
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b6783fb9000) 
        libc.so.6 => /lib64/libc.so.6 (0x00002b67841d1000) 
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b678452f000) 
        /lib64/ld-linux-x86-64.so.2 (0x00002b6783626000)

dyu@ xilinuxbldsrv> ldd libA.so

linux-vdso.so.1 => (0x00007fffa2eb9000)

libm.so.6 => /lib64/libm.so.6 (0x00002b6783a80000)

libstdc++.so.5 => /usr/lib64/libstdc++.so.5 (0x00002b6783cd6000)

libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b6783fb9000)

libc.so.6 => /lib64/libc.so.6 (0x00002b67841d1000)

libdl.so.2 => /lib64/libdl.so.2 (0x00002b678452f000)

/lib64/ld-linux-x86-64.so.2 (0x00002b6783626000)

查詢產品 B 的依賴庫，我們可以得到下面的資訊

[dyu@xilinuxbldsrv]$ ldd libB.so 
        linux-vdso.so.1 =&gt;  (0x00007fff02dfc000) 
        libstdc++.so.6 =&gt; /usr/lib64/libstdc++.so.6 (0x0000003d2c600000) 
        libm.so.6 =&gt; /lib64/libm.so.6 (0x0000003d1a400000) 
        libgcc_s.so.1 =&gt; /lib64/libgcc_s.so.1 (0x0000003d27e00000) 
        libc.so.6 =&gt; /lib64/libc.so.6 (0x0000003d19c00000) 
        /lib64/ld-linux-x86-64.so.2 (0x0000003d19800000)

[dyu@xilinuxbldsrv]$ ldd libB.so

linux-vdso.so.1 => (0x00007fff02dfc000)

libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003d2c600000)

libm.so.6 => /lib64/libm.so.6 (0x0000003d1a400000)

libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003d27e00000)

libc.so.6 => /lib64/libc.so.6 (0x0000003d19c00000)

/lib64/ld-linux-x86-64.so.2 (0x0000003d19800000)

這個 crash 的位置是在 stl 的 map 資料結構中，從上面的執行緒棧呼叫可以發現，4.1.2 為 libstdc++.so.6，但是從 A 的依賴庫看，產品 A 依賴 libstdc++.so.5 而不是 libstdc++.so.6，所以 funcA 應該呼叫 libstdc++.so.5。

從上面我們可以發現 crash 的原因是由於 libstdc++.so 的呼叫錯誤導致的。

3.問題解決：

在編譯選項中增加 -fvisibility=hidden ，在 API 宣告中增加

__attribute__ ((visibility (“default”)))。

UNIX 程式問題分析常用命令

1. ulimit — 設定和檢視使用者的使用的資源限制情況

2. nm — 顯示目標檔案的符號表資訊

3. ldd –顯示動態庫的依賴資訊

4. pstack（Solaris， Linux）， procstack（AIX）– 列印十六進位制地址和符號名稱

5. pmap(Solaris, Linux), procmap(AIX) –列印地址空間對映

6. pldd(Solaris), procldd(AIX) —列出程式載入的庫

7. pfiles(Solaris), procfiles(AIX)– 報告有關的所有檔案描述符

8. prstat(Solaris), ps -e -o user,pid,ppid,tid,time,%cpu,cmd –sort=%cpu(Linux)– 檢查每個執行緒的處理器

9. ps –報告程式的狀態

10. iostat –報告中央處理單元（中央處理器）統計和輸入 / 輸出裝置和分割槽統計

11. pwdx（Linux，Solaris）　 pid 顯示當前工作目錄

12. top（Linux，Solaris，HP），topas（AIX）

總結

本文簡單介紹了作者在 UNIX 平臺的一些經常遇到的問題以及一些基本命令的使用，希望對讀者能有幫助。良好的基礎知識（C/C++ 語言，作業系統，核心結構等）是分析問題解決問題的基礎，同時一些 debug 工具以及一些第三方工具的熟練使用對問題分析也能很好的幫助作用。另外良好的程式設計習慣（例如申請的變數要初始化等）是避免問題產生的有效手段，在軟體開發前期的缺陷控制應該是我們的目標。

常見問題總結
2017-08-06
mysql常見問題總結
2021-04-17
MySql
Vue 常見問題總結
2020-09-11
Vue
TCP常見問題總結
2020-10-12
TCP
GeoServer 常見問題總結
2012-12-02
Server
Flink 常見問題總結
2020-11-27
前端常見JS問題總結
2017-12-14
前端JS
Oracle EM 常見問題總結
2012-09-30
Oracle
我的支付總結（三）常見問題
2017-04-04
Python常見面試題總結——個人Python學習經驗
2019-07-16
Python面試題
Kubernetes 常見問題總結
2021-04-07
關於echarts使用的常見問題總結
2017-06-08
Echarts
滑鼠定位問題總結
2015-09-06
網路爬蟲常見問題（個人總結）
2019-01-24
爬蟲
redis快取常見問題場景總結
2023-04-09
Redis快取
VC常見入門問題總結(二) (轉)
2007-08-17
VC常見入門問題總結(一) (轉)
2007-08-17
MyBatis學習總結（24）——Mybatis常見問題彙總
2018-06-07
MyBatis
2015異常問題解決方案經驗總結（一）
2015-01-12
Webpack常見面試題總結
2021-12-12
Web面試題
iOS常見面試題總結
2016-06-05
iOS面試題
解析SQLite中的常見問題與總結詳解
2021-09-09
SQLite
RecyclerView的使用總結以及常見問題解決方案
2018-12-21
View
Redis常見問題彙總
2019-02-11
Redis
Java 常見問題彙總
2020-08-27
Java
Bootstrap常見問題彙總
2018-05-21
boot
做題經驗總結
2024-03-26
Spring Cloud中，Feign常見問題總結
2018-11-22
SpringCloud
【多執行緒】常見問題簡單總結
2023-12-29
執行緒
Cocos平臺整合AGC常見問題總結
2020-10-12
GC
資料庫建模——powerdesiger常見問題總結
2013-07-31
資料庫
嘔心蒐集總結的15個“swoole”常見問題（一）
2020-07-27
我的刷題經驗總結
2021-10-27
手把手教你定位常見Java效能問題
2020-12-19
Java
前端常見面試題少量總結
2019-03-14
前端面試題
域名解析常見問題（上）
2022-10-19
代理IP常見問題彙總
2021-09-11
Redis Manager 常見問題彙總
2018-12-15
Redis

定位 UNIX 上常見問題的經驗總結

相關文章