[Benchmark] String search speed in Python vs. C++

Posted by 小葉Little_Ye on 2022-03-07

Original URL: https://www.cnblogs.com/littleye233/p/15977014.html

Fully formatted version: https://blog.imakiseki.cf/2022/03/07/techdev/python-cpp-string-find-perf-test/

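Background

I have recently been preparing for an algorithm competition and mistakenly registered with Python as my language, so I had no choice but to start porting common scenarios to it. String search is one scenario that comes up from time to time in such competitions. This post compares the speed of string search in Python and in C++, the language most commonly used in competitions, and provides the relevant parameters and results for others to reference.

Parameters

Hardware and operating system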

                   -`                    root@<hostname>
                  .o+`                   ------------
                 `ooo/                   OS: Arch Linux ARM aarch64
                `+oooo:                  Host: Raspberry Pi 4 Model B
               `+oooooo:                 Kernel: 5.16.12-1-aarch64-ARCH
               -+oooooo+:                Uptime: 3 hours, 32 mins
             `/:-:++oooo+:               Packages: 378 (pacman)
            `/++++/+++++++:              Shell: zsh 5.8.1
           `/++++++++++++++:             Terminal: /dev/pts/0
          `/+++ooooooooooooo/`           CPU: (4) @ 1.500GHz
         ./ooosssso++osssssso+`          Memory: 102MiB / 7797MiB
        .oossssso-````/ossssss+`
       -osssssso.      :ssssssso.
      :osssssss/        osssso+++.
     /ossssssss/        +ssssooo/-
   `/ossssso+/:-        -:/+osssso+-
  `+sso+:-`                 `.-/+oso:
 `++:.                           `-/+/
 .`                                 `/

Compiler and interpreter environments

Python
- Interpreter: Python 3.10.2 (main, Jan 23 2022, 21:20:14) [GCC 10.2.0] on linux
- Interactive environment: IPython 8.0.1
C++
- Compiler: g++ (GCC) 11.2.0
- Compile command: g++ test.cpp -Wall -O2 -g -std=c++11 -o test

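Scenarios

Two scenarios are tested. In scenario 1, the characters of the source string are produced by a pseudorandom number generator, representing the average case for string search. In scenario 2, the source string can be split into 20,000 consecutive fragments of 50 characters each: the 15,001st fragment is the pattern string, of the form “ab…b” (1 “a” followed by 49 “b”), and every other fragment has the form “ab…c” (1 “a”, 48 “b”, 1 “c”).

Item	Scenario 1: average case	Scenario 2: worse case
Character set	Lowercase letters	`abc`
Character distribution	`random.choice`	Highly regular
Source string length	1,000,000	1,000,000
Pattern string length	1,000	50
Pattern position (0-based index)	250,000, 500,000, 750,000	750,000
Pattern occurrences	1	1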

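Test method

In Python, the test uses the .find() method of the built-in str type. In C++, it uses the .find() member function of the string class, the standard library function strstr, and a hand-written implementation of the KMP algorithm.

Test subject	Core code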
Python	`src.find(pat)`
C++ - `test.cpp`	`src.find(pat)`
C++ - `test_strstr.cpp`	`strstr(src, pat)`
C++ - `test_kmp.cpp`	`KMP(src, pat)`

Source code

Generating the source and pattern strings

import random

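# Run one scenario block at a time; running both leaves the scenario 2 values in s and p.

# Scenario 1:
# Source string
s = "".join(chr(random.choice(range(ord("a"), ord("z") + 1))) for _ in range(1000000))
# List of pattern strings; each of the three elements is one pattern
p = [s[250000:251000], s[500000:501000], s[750000:751000]]

# Scenario 2:
# Pattern string, kept in a one-element list so that p[0] works in both scenarios
p = ['a' + 'b' * 49]
# Other character fragments
_s = "a" + "b" * 48 + "c"
# Source string
s = _s * 15000 + p[0] + _s * 4999

# Save to files so the C++ programs can read them
with open('source.in', 'w') as f:
    f.write(s)
with open('pattern.in', 'w') as f:
    f.write(p[0])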
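
Before timing anything, it can help to confirm that the generated data matches the scenario table. The following is a minimal sketch for scenario 2 only; the helper names build_scenario_2 and check are hypothetical and not part of the original post.

```python
def build_scenario_2():
    """Rebuild the scenario 2 strings as described in the text."""
    pattern = "a" + "b" * 49        # 1 "a" followed by 49 "b"
    filler = "a" + "b" * 48 + "c"   # every other 50-character fragment
    source = filler * 15000 + pattern + filler * 4999
    return source, pattern


def check(source, pattern, expected_pos):
    """Return True if pattern occurs exactly once, at expected_pos."""
    return (source.find(pattern) == expected_pos
            and source.count(pattern) == 1)


source, pattern = build_scenario_2()
print(len(source), len(pattern), check(source, pattern, 750000))
```

The same check function could be applied to each scenario 1 pattern with its own expected position, although a random 1,000-character pattern is only overwhelmingly likely, not guaranteed, to be unique.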

Test code

Python

In []: %timeit s.find(p[0])
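
Outside IPython, a comparable measurement can be sketched with the standard timeit module. This is an illustration under assumptions rather than the method used for the results: the helper name mean_find_time_us is hypothetical, the data is a small stand-in, and the lambda wrapper adds a little call overhead that %timeit on a bare expression does not have.

```python
import random
import timeit


def mean_find_time_us(source, pattern, number=100):
    """Mean time of source.find(pattern) over `number` calls, in microseconds."""
    total_seconds = timeit.timeit(lambda: source.find(pattern), number=number)
    return total_seconds / number * 1e6


# Small stand-in data so the sketch runs quickly; the article uses 1,000,000 characters.
random.seed(0)
source = "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(100000))
pattern = source[50000:50100]
print(f"{mean_find_time_us(source, pattern):.1f} us per call")
```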

C++ - `test.cpp`

#include <chrono>
#include <iostream>
#include <string>
#include <fstream>
#define LOOP_COUNT (1000)
using namespace std;
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::duration;
using std::chrono::milliseconds;

double test(string s, string p, size_t* pos_ptr) {
    auto t1 = high_resolution_clock::now();
    *pos_ptr = s.find(p);
    auto t2 = high_resolution_clock::now();
    duration<double, milli> ms_double = t2 - t1;
    return ms_double.count();
}

int main() {
    string s, p;
    size_t pos;
    ifstream srcfile("source.in");
    ifstream patfile("pattern.in");
    srcfile >> s;
    patfile >> p;

    double tot_time = 0;
    for (int i = 0; i < LOOP_COUNT; ++i) {
        tot_time += test(s, p, &pos);
    }

    cout << "Loop count:            " << LOOP_COUNT << endl;
    cout << "Source string length:  " << s.length() << endl;
    cout << "Pattern string length: " << p.length() << endl;
    cout << "Search result:         " << pos << endl;
    cout << "Time:                  " << tot_time / LOOP_COUNT << " ms" << endl;

    return 0;
}

C++ - `test_strstr.cpp`

#include <chrono>
#include <iostream>
#include <cstring>
#include <fstream>
#define LOOP_COUNT (1000)
using namespace std;
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::duration;
using std::chrono::milliseconds;
char s[1000005], p[1005], *pos=NULL;

double test(char* s, char* p, char** pos_ptr) {
    auto t1 = high_resolution_clock::now();
    *pos_ptr = strstr(s, p);
    auto t2 = high_resolution_clock::now();
    duration<double, milli> ms_double = t2 - t1;
    return ms_double.count();
}

int main() {
    ifstream srcfile("source.in");
    ifstream patfile("pattern.in");
    srcfile >> s;
    patfile >> p;

    double tot_time = 0;
    for (int i = 0; i < LOOP_COUNT; ++i) {
        tot_time += test(s, p, &pos);
    }

    cout << "Loop count:            " << LOOP_COUNT << endl;
    cout << "Source string length:  " << strlen(s) << endl;
    cout << "Pattern string length: " << strlen(p) << endl;
    cout << "Search result:         " << pos - s << endl;
    cout << "Time:                  " << tot_time / LOOP_COUNT << " ms" << endl;

    return 0;
}

C++ - `test_kmp.cpp`

#include <chrono>
#include <iostream>
#include <cstring>
#include <fstream>
#include <string>
#define LOOP_COUNT (1000)
using namespace std;
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::duration;
using std::chrono::milliseconds;
int dp[1005];

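// Note: s and p are taken by value, so each call copies both strings inside the timed region.
int KMP(string s, string p) {
    int m = s.length(), n = p.length();
    if (n == 0) return 0;
    if (m < n) return -1;
    // dp[k] = length of the longest proper prefix of p[0..k-1] that is also its suffix
    memset(dp, 0, sizeof(int) * (n+1));
    for (int i = 1; i < n; ++i) {
        int j = dp[i];  // start from the border of p[0..i-1]; dp[i+1] is still 0 here
        while (j > 0 && p[j] != p[i]) j = dp[j];
        if (j > 0 || p[j] == p[i]) dp[i+1] = j + 1;
    }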
    for (int i = 0, j = 0; i < m; ++i)
        if (s[i] == p[j]) { if (++j == n) return i - j + 1; }
        else if (j > 0) {
            j = dp[j];
            --i;
        }
    return -1;
}

double test(string s, string p, int* pos_ptr) {
    auto t1 = high_resolution_clock::now();
    *pos_ptr = KMP(s, p);
    auto t2 = high_resolution_clock::now();
    duration<double, milli> ms_double = t2 - t1;
    return ms_double.count();
}

int main() {
    string s, p;
    int pos;
    ifstream srcfile("source.in");
    ifstream patfile("pattern.in");
    srcfile >> s;
    patfile >> p;

    double tot_time = 0;
    for (int i = 0; i < LOOP_COUNT; ++i) {
        tot_time += test(s, p, &pos);
    }

    cout << "Loop count:            " << LOOP_COUNT << endl;
    cout << "Source string length:  " << s.length() << endl;
    cout << "Pattern string length: " << p.length() << endl;
    cout << "Search result:         " << pos << endl;
    cout << "Time:                  " << tot_time / LOOP_COUNT << " ms" << endl;

    return 0;
}
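
The KMP approach used in test_kmp.cpp can be cross-checked against str.find with a short Python sketch. The names prefix_table and kmp_find are hypothetical; this is a textbook-style version written for readability, not a port tuned for speed.

```python
def prefix_table(pattern):
    """table[k] = length of the longest proper prefix of pattern[:k] that is also its suffix."""
    table = [0] * (len(pattern) + 1)
    for i in range(1, len(pattern)):
        j = table[i]
        # Fall back through shorter borders until one can be extended by pattern[i].
        while j > 0 and pattern[j] != pattern[i]:
            j = table[j]
        if pattern[j] == pattern[i]:
            table[i + 1] = j + 1
    return table


def kmp_find(source, pattern):
    """Return the index of the first occurrence of pattern in source, or -1."""
    if not pattern:
        return 0
    table = prefix_table(pattern)
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(source):
        while j > 0 and ch != pattern[j]:
            j = table[j]
        if ch == pattern[j]:
            j += 1
            if j == len(pattern):
                return i - j + 1
    return -1


print(kmp_find("abababc", "ababc"), "abababc".find("ababc"))
```

Patterns with repeated prefixes, such as "ababc", are the cases where a wrong prefix table causes missed matches, so they make better test inputs than random strings.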

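Results

IPython's %timeit magic command reports the mean and standard deviation over many runs of the code; the mean is used here. The C++ programs run each pattern string a fixed 1,000 times and take the mean.

Unless otherwise noted, all times below are in microseconds, rounded to the nearest integer.

Scenario	Pattern position	Python	C++ - `test.cpp`	C++ - `test_strstr.cpp`	C++ - `test_kmp.cpp`
Scenario 1	250,000	105	523	155	2564
Scenario 1	500,000	183	1053	274	3711
Scenario 1	750,000	291	1589	447	4900
Scenario 2	750,000	2630*	618	353	3565

* The original output was “2.63 ms”. %timeit reports the mean to 3 significant figures, and because this time exceeds 1 millisecond, the microsecond digit was dropped. Microseconds are still used as the unit here, and the value is recorded as “2630”.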
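
To make the comparison easier to read, the times in the results table can be expressed relative to Python. The sketch below only re-arranges the numbers from the table; the dictionary layout and the helper name relative_to_python are assumptions made for illustration.

```python
# Mean times in microseconds, copied from the results table.
results = {
    ("Scenario 1", 250000): {"Python": 105, "string::find": 523, "strstr": 155, "KMP": 2564},
    ("Scenario 1", 500000): {"Python": 183, "string::find": 1053, "strstr": 274, "KMP": 3711},
    ("Scenario 1", 750000): {"Python": 291, "string::find": 1589, "strstr": 447, "KMP": 4900},
    ("Scenario 2", 750000): {"Python": 2630, "string::find": 618, "strstr": 353, "KMP": 3565},
}


def relative_to_python(row):
    """Express every time in a row as a multiple of the Python time."""
    return {name: round(t / row["Python"], 2) for name, t in row.items()}


for key, row in results.items():
    print(key, relative_to_python(row))
```

A value above 1 means the method was slower than Python's str.find for that row, and a value below 1 means it was faster.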

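Limitations

The hardware used for this test is weaker than the standard machines used in algorithm competitions, so the absolute numbers in the results are of limited reference value.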

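Summary

According to the table above, under the given environment and parameters, Python's running time in scenario 1 is roughly one fifth of that of C++'s string::find and close to that of std::strstr. In scenario 2, Python's running time increases markedly, while the first two C++ methods take about as long as before or even less. In all four tests, the hand-written KMP implementation in C++ takes the longest, longer than Python under the same conditions.

The fast search (.find()) and count (.count()) algorithms of Python's built-in str type are based on a mix of the Boyer-Moore and Horspool algorithms; the latter is a simplification of the former, and the former is related to the Knuth-Morris-Pratt algorithm.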
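
To give a feel for the Horspool idea mentioned here, the following is a simplified sketch of plain Horspool search. It is not CPython's actual implementation, which is written in C and more elaborate; the function name horspool_find is hypothetical.

```python
def horspool_find(source, pattern):
    """Return the index of the first occurrence of pattern in source, or -1."""
    n, m = len(source), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Other characters shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if source[i:i + m] == pattern:
            return i
        # Shift by the source character aligned with the pattern's last position.
        i += shift.get(source[i + m - 1], m)
    return -1


print(horspool_find("hello world", "world"), "hello world".find("world"))
```

On random text the window often jumps by close to the full pattern length, which is one reason this family of algorithms tends to do well in average-case inputs like scenario 1.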

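For why C++'s string::find takes longer than std::strstr, see Bug 66414 - string::find ten times slower than strstr.

It is worth noting that the hand-written KMP algorithm in C++ runs far slower than the algorithms in the C++ standard library and even in Python. This resembles the common observation that hand-written assembly often runs slower than what the compiler generates. An answer to the Stack Overflow question “strstr faster than algorithms?” puts it this way: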

Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better. In which case you don't stand a chance of out-performing it in C for such small benchmarks.

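KMP is not the fastest of all linear-time algorithms. Across different environments (hardware, software, test data, and so on), there is no general answer as to whether KMP, its variants, or other linear-time algorithms will come out ahead. Compilers (and the standard libraries that ship with them) are designed with many such factors in mind, so that a reasonably good strategy is available across environments. Therefore, as long as the result is correct, it is better to use the functions provided by the standard library than to write your own from the algorithm's description.

From the perspective of running time, this test also confirms once again the view that Python is not well suited to achieving high scores in algorithm competitions.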
