Python Performance Profiling

Published on 2016-08-31

Introduction to Performance Tuning

What Is Profiling

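Unoptimized programs usually spend most of their CPU cycles in a few subroutines. Profiling analyzes how code relates to the resources it uses. For example, profiling can tell you how much CPU time a single instruction takes, or how much memory the whole program consumes. It is done with a tool called a profiler, which instruments either the program's source code or, when that is possible, its binary executable.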

Profilers follow one of two methodologies: event-based profiling and statistical profiling.

The main programming languages that support event-based profiling are the following.

Java: JVMTI (JVM Tools Interface) gives profilers hooks for tracing events such as function calls, thread-related events, and class loading.
.NET: Like Java, the .NET runtime provides event tracing (https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Testing/Profiling#Methods_of_data_gathering).
Python: Developers can use the sys.setprofile function to trace Python-level events (call, return) and C-level events (c_call, c_return, c_exception).

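An event-based profiler (also called a tracing profiler) works by collecting specific events while the program runs. These profilers generate a large amount of data: the more events they listen for, the more data they produce. This makes them somewhat impractical, and they are not the first choice when you start profiling a program. They are, however, a good last resort when other profiling methods are insufficient or not precise enough.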

A simple example of an event-based profiler in Python (Python 2 syntax):

import sys

def profiler(frame, event, arg):
    print 'PROFILER: %r %r' % (event, arg)
    
sys.setprofile(profiler)

# Simple (and very inefficient) example of how to calculate the Fibonacci sequence for a number.
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
    
def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print fib_seq(2)

Output:

$ python test.py 
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'return' 0
PROFILER: 'c_call' <built-in method append of list object at 0x7f113d7f67a0>
PROFILER: 'c_return' <built-in method append of list object at 0x7f113d7f67a0>
PROFILER: 'return' [0]
PROFILER: 'c_call' <built-in method extend of list object at 0x7f113d7e0d40>
PROFILER: 'c_return' <built-in method extend of list object at 0x7f113d7e0d40>
PROFILER: 'call' None
PROFILER: 'return' 1
PROFILER: 'c_call' <built-in method append of list object at 0x7f113d7e0d40>
PROFILER: 'c_return' <built-in method append of list object at 0x7f113d7e0d40>
PROFILER: 'return' [0, 1]
PROFILER: 'c_call' <built-in method extend of list object at 0x7f113d7e0758>
PROFILER: 'c_return' <built-in method extend of list object at 0x7f113d7e0758>
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'return' 1
PROFILER: 'call' None
PROFILER: 'return' 0
PROFILER: 'return' 1
PROFILER: 'c_call' <built-in method append of list object at 0x7f113d7e0758>
PROFILER: 'c_return' <built-in method append of list object at 0x7f113d7e0758>
PROFILER: 'return' [0, 1, 1]
[0, 1, 1]
PROFILER: 'return' None
PROFILER: 'call' None
PROFILER: 'c_call' <built-in method discard of set object at 0x7f113d818960>
PROFILER: 'c_return' <built-in method discard of set object at 0x7f113d818960>
PROFILER: 'return' None
PROFILER: 'call' None
PROFILER: 'c_call' <built-in method discard of set object at 0x7f113d81d3f0>
PROFILER: 'c_return' <built-in method discard of set object at 0x7f113d81d3f0>
PROFILER: 'return' None
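
Printing every event quickly becomes unreadable, so a common next step is to aggregate events instead. The following is a minimal sketch in Python 3 syntax (the listing above is Python 2). The names event_counts and counting_profiler are illustrative and do not come from the original example.

```python
import collections
import sys

# Hypothetical aggregate: maps (event, function name) to how often it occurred.
event_counts = collections.Counter()


def counting_profiler(frame, event, arg):
    # For c_call/c_return/c_exception events, arg is the C function object;
    # for Python-level events the name comes from the frame's code object.
    if event.startswith("c_"):
        name = getattr(arg, "__name__", repr(arg))
    else:
        name = frame.f_code.co_name
    event_counts[(event, name)] += 1


def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)


def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq


sys.setprofile(counting_profiler)
try:
    result = fib_seq(2)
finally:
    sys.setprofile(None)  # always remove the hook again

for (event, name), count in sorted(event_counts.items()):
    print("%-10s %-10s %d" % (event, name, count))
```

The counts for the 'call' events on fib and fib_seq should match the number of 'call' lines before the result is printed in the trace above; the exact set of C-level events may vary between Python versions.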

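A statistical profiler samples the program counter at fixed intervals. This shows the developer how much time the target program spends in each function. Because it works from samples, the result is a statistical approximation of the true values. Even so, it is enough to reveal the performance characteristics of the profiled program and to locate its bottlenecks. Since it samples (using operating system interrupts), it gathers less data and perturbs the program less than an event-based profiler.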
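
To make the sampling idea concrete, here is a rough sketch in Python 3. It is not how OProfile or statprof work internally: instead of operating system interrupts it uses a background thread that periodically inspects the main thread's current frame through sys._current_frames(), which is specific to CPython. The SamplingProfiler class and the busy function are hypothetical names used only for illustration.

```python
import collections
import sys
import threading
import time


class SamplingProfiler:
    """Counts which function the creating thread is running at each sample."""

    def __init__(self, interval=0.001):
        self.interval = interval
        self.samples = collections.Counter()
        self._target_id = threading.get_ident()  # the thread to observe
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout, so this loop runs once per interval.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target_id)
            if frame is not None:
                self.samples[frame.f_code.co_name] += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def busy(seconds):
    """Spin for roughly the given number of seconds."""
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


profiler = SamplingProfiler()
profiler.start()
busy(0.3)
profiler.stop()

total = sum(profiler.samples.values())
for name, hits in profiler.samples.most_common():
    print("%-12s %5d %6.2f%%" % (name, hits, 100.0 * hits / total))
```

The output has the same shape as a statistical profiler report: a hit count and percentage per function, whose accuracy depends on how many samples were collected.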

Sample results from OProfile (http://oprofile.sourceforge.net/news/), a statistical profiler for Linux:

Function name,File name,Times Encountered,Percentage
"func80000","statistical_profiling.c",30760,48.96%
"func40000","statistical_profiling.c",17515,27.88%
"func20000","static_functions.c",7141,11.37%
"func10000","static_functions.c",3572,5.69%
"func5000","static_functions.c",1787,2.84%
"func2000","static_functions.c",768,1.22%
"func1500","statistical_profiling.c",701,1.12%
"func1000","static_functions.c",385,0.61%
"func500","statistical_profiling.c",194,0.31%

Next, we profile the same program with statprof:

import statprof
def profiler(frame, event, arg):
    print 'PROFILER: %r %r' % (event, arg)
    
# Simple (and very inefficient) example of how to calculate the Fibonacci sequence for a number.
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
  
def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

statprof.start()

try:
    print fib_seq(20)

finally:
    statprof.stop()
statprof.display()

Output:

$ python test.py 
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
  %   cumulative      self          
 time    seconds   seconds  name    
100.00      0.01      0.01  test.py:15:fib
  0.00      0.01      0.00  test.py:21:fib_seq
  0.00      0.01      0.00  test.py:20:fib_seq
  0.00      0.01      0.00  test.py:27:<module>
---
Sample count: 2
Total time: 0.010000 seconds

Note that the argument to fib_seq was changed from 2 to 20 in the code above: when a program finishes too quickly, statprof cannot collect any samples.

Why Profiling Matters

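Not every program needs profiling. For small programs in particular it is rarely worth it (unlike critical embedded software or programs built specifically to demonstrate profiling). Profiling takes time, and it only pays off once you have noticed that something is wrong. That said, you can still profile before then to catch latent bugs, which saves debugging time later.
We already have test-driven development, code review, pair programming, and other practices that make code more reliable and predictable, so why do we still need profiling?
As the languages we use become more high-level (in a few years we went from assembly language to JavaScript), we care less and less about low-level details such as CPU cycles, memory allocation, and CPU registers. New generations of programmers learn to program in high-level languages because they are easier to understand and work out of the box. But those languages are still abstractions over the hardware and our interaction with it. As this trend grows, new developers are less and less likely to treat profiling as a step in software development.
Today, almost any piece of software can attract thousands of users, and with a push on social networks that number can grow exponentially almost overnight. When usage spikes, programs often crash or become unbearably slow, and customers abandon them without mercy.
This can obviously be caused by poor software design and an architecture that does not scale. After all, the limited memory and CPU of a single server can become the bottleneck. But another possible cause, one that has been proven many times, is that the program was never stress tested. We never considered resource consumption; we only made sure the tests passed, and were happy with that.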

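Profiling helps keep a project from collapsing, because it shows fairly accurately how the program behaves regardless of load. If profiling under very light load shows the software spending 80% of its time on I/O, that is a warning sign. Some problems, such as a memory leak, may only surface once the production load becomes heavy. Profiling can give us enough evidence to spot such hazards before the load actually gets that high.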

What to Profile

Execution Time

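If you have some experience with the program you are running (say you are a web developer working with a web framework), you probably know whether it is taking too long. For example, a simple web server that queries a database, builds the response, and returns it to the client might take 100 milliseconds in total. If the program is slow and the same work takes 60 seconds, you should consider profiling it.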

import datetime

tstart = None
tend = None

def start_time():
    global tstart
    tstart = datetime.datetime.now()
    
def get_delta():
    global tstart
    tend = datetime.datetime.now()
    return tend - tstart

def fib(n):
    return n if n == 0 or n == 1 else fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

start_time()
print "About to calculate the fibonacci sequence for the number 30"
delta1 = get_delta()

start_time()
seq = fib_seq(30)
delta2 = get_delta()

print "Now we print the numbers: "
start_time()
for n in seq:
    print n
delta3 = get_delta()

print "====== Profiling results ======="
print "Time required to print a simple message: %(delta1)s" % locals()
print "Time required to calculate fibonacci: %(delta2)s" % locals()
print "Time required to iterate and print the numbers: %(delta3)s" %locals()
print "====== ======="

Output:

$ python test.py 
About to calculate the fibonacci sequence for the number 30
Now we print the numbers: 
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
====== Profiling results =======
Time required to print a simple message: 0:00:00.000064
Time required to calculate fibonacci: 0:00:01.430740
Time required to iterate and print the numbers: 0:00:00.000075
====== =======

The calculation is clearly the most time-consuming part.
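
The same kind of measurement can be written more compactly. The sketch below is one possible variant in Python 3, assuming time.perf_counter for timing and a context manager to mark the measured sections; the timed helper and the timings dictionary are hypothetical names, not part of the original script.

```python
import time
from contextlib import contextmanager

# Hypothetical store for the measured durations, keyed by section label.
timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_seq(n):
    return [fib(i) for i in range(n + 1)]


with timed("calculate"):
    seq = fib_seq(20)

with timed("print"):
    for value in seq:
        print(value)

for label, seconds in timings.items():
    print("%-10s %.6f s" % (label, seconds))
```

A context manager keeps the start and stop calls paired, so a section cannot be left unmeasured by accident, and time.perf_counter is intended for measuring short durations.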

Finding Bottlenecks

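Once you have measured the program's execution time, you can focus profiling on the slow parts. A bottleneck usually comes from one or more of the following causes:
* Heavy I/O operations, such as reading and parsing large files, long-running database queries, or calls to external services (for example, HTTP requests).
* A memory leak that consumes all available memory, leaving none for the rest of the program to run normally.
* Unoptimized code that is executed frequently.
* Intensive operations that could be cached but are not, and so consume a lot of resources.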

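I/O-bound code (file reads and writes, database queries, and so on) is hard to optimize, because doing so may mean changing the way the program performs I/O, which is usually handled by the language's core functions. CPU-bound code (for example, a program that uses a poor algorithm) is comparatively easier to improve, though not necessarily simple, because optimizing it comes down to rewriting the program itself.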
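
As an illustration of the caching cause listed above, the recursive Fibonacci function used throughout this article recomputes the same values many times. The sketch below, in Python 3, counts how often the function body runs with and without functools.lru_cache from the standard library. The calls dictionary and the function names are illustrative only.

```python
import functools

# Hypothetical counters showing how many times each function body executes.
calls = {"plain": 0, "cached": 0}


def fib_plain(n):
    calls["plain"] += 1
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)


@functools.lru_cache(maxsize=None)
def fib_cached(n):
    # The body only runs on a cache miss, that is once per distinct n.
    calls["cached"] += 1
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


plain_result = fib_plain(20)
cached_result = fib_cached(20)

print("results:", plain_result, cached_result)
print("function bodies executed:", calls)
```

Both versions return the same value, but the cached one executes its body only once per distinct argument, which is the kind of saving a profiler helps you find.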

Memory Consumption and Memory Leaks

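Memory consumption is not only about how much memory the program uses; you should also think about keeping that amount under control. Tracking a program's memory use is fairly easy. The most basic approach is the operating system's task manager, which shows, among other things, how much memory a program occupies or what percentage of total memory it uses. The task manager is also a good tool for checking CPU time. In top, for example, a simple Python program (the script above) is shown using almost all of the CPU (99.8%) but only 0.1% of memory.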

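After a process starts, its memory consumption keeps growing within some range. If the growth goes beyond that range and the consumption never drops back afterward, you can conclude that there is a memory leak.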

The Risks of Premature Optimization

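Optimization is generally considered good practice, but not when it is pursued to the point of violating the software's design principles. A common mistake when starting a new project is premature optimization. Code that is optimized too early can end up very different from what it was meant to be: it may be only part of the complete solution, and it may contain errors introduced by optimization-driven design decisions. A rule of thumb is that if you have not measured (profiled) your code, optimizing it is usually not a good idea. First focus on getting the code done, then profile to find the real bottlenecks, and only then optimize.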

Running Time Complexity

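Running time complexity (RTC) quantifies the execution time of an algorithm. It is a mathematical approximation of the algorithm's running time for a given amount of input. Because it is an approximation, we can use it to group algorithms into classes.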

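RTC is usually expressed in big O notation. In mathematics, big O notation describes the limiting behavior of a function as its argument tends to a value or to infinity (it appears, for example, in the remainder term of a Taylor expansion). Applied to computer science, it describes an algorithm's running time by its asymptotic behavior (order of magnitude).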

The main classes are:

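Constant time, O(1): for example, checking whether a number is odd or even, or printing a message to standard output. Some theoretically more complex operations, such as looking up the value of a key in a dictionary (hash table), also run in constant time when implemented properly. Strictly speaking, a hash table lookup is O(1) on average: the average time per operation (ignoring worst cases) is constant.
Logarithmic time, O(log n): the running time keeps increasing as the input grows, but more and more slowly. The logarithm rises quickly at first and then flattens; it never stops growing, but the growth eventually becomes almost negligible. Examples: binary search, and computing Fibonacci numbers (using matrix multiplication).
Linear time, O(n): for example, finding the smallest element in an unsorted list, comparing two strings, or deleting the last item of a singly linked list.
Linearithmic time, O(n log n): the combination of linear and logarithmic time. The running time grows quickly as n increases. Examples: merge sort, heap sort, and quicksort (at least in the average case).
Quadratic time, O(n^2): another fast-growing complexity. The more input, the longer it takes (true of most algorithms, but especially of these), and it is slower than linear time. Examples: bubble sort, traversing an n-by-n two-dimensional array, and insertion sort.
Factorial time, O(n!): among the worst complexities there are. The running time grows so fast that it is hard even to plot. Example: solving the traveling salesman problem by brute force (enumerating every possible route).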

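From fastest to slowest: constant > logarithmic > linear > linearithmic > quadratic > factorial. Always consider the best, average, and worst cases.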
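
To see the difference between two of these classes, the sketch below (Python 3) counts how many elements a linear search and a binary search have to inspect to find the last element of a sorted list. The two search functions are written out here only for illustration and return the number of probes alongside the index.

```python
def linear_search(items, target):
    """Return (index, probes), scanning from left to right: O(n)."""
    probes = 0
    for index, value in enumerate(items):
        probes += 1
        if value == target:
            return index, probes
    return -1, probes


def binary_search(items, target):
    """Return (index, probes) on a sorted list, halving the range: O(log n)."""
    low, high = 0, len(items) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, probes


for size in (16, 1024, 65536):
    data = list(range(size))
    _, linear_probes = linear_search(data, size - 1)
    _, binary_probes = binary_search(data, size - 1)
    print("n=%6d  linear=%6d  binary=%3d" % (size, linear_probes, binary_probes))
```

The linear probe count grows in step with n, while the binary probe count grows by only a few probes each time the input gets many times larger.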

Profiling Best Practices

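Build a regression test suite, think about the structure of your code, be patient, gather as much data as you can (from other sources such as web application system logs, custom logs, and system resource snapshots like the operating system's task manager), preprocess the data, and visualize it.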

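The best-known profiling libraries for Python are cProfile and line_profiler.
The former is part of the standard library: https://docs.python.org/2/library/profile.html#module-cProfile.
The latter is available at https://github.com/rkern/line_profiler.
Both focus on CPU time.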
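
As a brief illustration of cProfile, the sketch below profiles the Fibonacci example from this article in Python 3 and prints the entries with the highest cumulative time. It uses only the standard cProfile, pstats, and io modules; the input size of 15 and the limit of five rows are arbitrary choices for the example.

```python
import cProfile
import io
import pstats


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq


profiler = cProfile.Profile()
profiler.enable()
result = fib_seq(15)
profiler.disable()

# Write the report into a string so it can be printed or stored.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists, for each function, the number of calls together with its own and cumulative time, which makes the recursive fib calls stand out.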
