Speeding Up Python: the "-O" Optimization Flag and Cython

Posted by RakanLiu on 2024-10-20

Original URL: https://www.cnblogs.com/RakanLiu/p/18487177

1. Run Python in `release` mode

python -O process_file.py

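With -O, Python sets __debug__ to False and skips assert statements. You can add the following check to your code to tell whether it is running in release mode: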

if __debug__:
    print("Debug mode")
else:
    print("Release mode")
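To see the effect of the flag without editing a script, here is a minimal sketch that launches a child interpreter with and without -O and returns what it prints. The helper name `run_snippet` is made up for illustration; it uses only the standard library and assumes the PYTHONOPTIMIZE environment variable is not set.

```python
import subprocess
import sys


def run_snippet(code: str, optimize: bool) -> str:
    """Run `code` in a child interpreter, optionally with -O, and return its stdout."""
    cmd = [sys.executable]
    if optimize:
        cmd.append("-O")
    cmd += ["-c", code]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # __debug__ as seen by a normal run and by a run with -O
    print(run_snippet("print(__debug__)", optimize=False))
    print(run_snippet("print(__debug__)", optimize=True))
    # Under -O the assert statement is removed, so execution reaches the print.
    print(run_snippet("assert False; print('assert skipped')", optimize=True))
```

Because -O removes assert statements entirely, any check that must always run should not be written as an assert.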

2. Use Cython

Install Cython:

pip install cython

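Write a .pyx file containing the Python code you want to compile:

To make it easier to call later, you can wrap the code you want to run in a single function; for example, I put it in main().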

# process_file.pyx

# python -O process_file.py
import pandas as pd
from tqdm import tqdm

def clean_str(text: str) -> str:
    # "\u3000": full-width (ideographic) space
    # "\xa0": non-breaking space (&nbsp;)
    # Stricter variant that deletes these characters instead of replacing them:
    # output = text.strip()\
    #                   .replace('"', '')\
    #                   .replace("\u3000", "")\
    #                   .replace("\xa0", "")\
    #                   .replace("【", "")\
    #                   .replace("】", "")\
    #                   .replace(" ", "")
    output = text.strip()\
                      .replace("\u3000", " ")\
                      .replace("\xa0", " ")\
                      .replace("【", "[")\
                      .replace("】", "]")
    return output

def main():

  # Raw strings keep the Windows backslashes from being read as escape sequences
  file_in = r"ownthink_v2\ownthink_v2.csv"
  file_out = r"ownthink_v2\ownthink_v2_cleaned.csv"
  file_out_2 = r"ownthink_v2\ownthink_v2_cleaned_rfiltered.csv"

  chunk_size = 10000


  # Read the CSV file chunk by chunk
  data_all = pd.read_csv(file_in, chunksize=chunk_size)  # 139951300 rows


  # Clean the data
  lc = 0  # number of rows kept
  head_flag = True

  for data_chunk in tqdm(data_all, total=13996):
    # Drop rows that contain NaN, as well as empty rows
    data_chunk = data_chunk.dropna()
    # column_names_list = data_chunk.columns.tolist()
    for index, row in data_chunk.iterrows():
      # entity, attribute, value
      entity = row["實體"]
      attribution = row["屬性"]
      value = row["值"]

      if entity == value:
        # Filter out rows whose entity equals its value (e.g. "英雄聯盟 中文名 英雄聯盟")
        data_chunk = data_chunk.drop(index=index, axis="rows")
        continue

      # line =  entity + attribution + value 
      # if "歧義關係" in line or "歧義權重" in line:
      #   data_chunk = data_chunk.drop(index=index, axis="rows")
      #   print(line)
      #   continue

      # Clean the fields and write them back to data_chunk.
      # Assigning to `row` is not guaranteed to update the DataFrame, so use .at
      data_chunk.at[index, "實體"] = clean_str(entity)
      data_chunk.at[index, "屬性"] = clean_str(attribution)
      data_chunk.at[index, "值"] = clean_str(value)

      lc += 1

    # Write to file
    # mode='a' appends; index controls whether the row index is written; header controls whether column names are written
    if head_flag:
      data_chunk.to_csv(file_out, mode='w', index=False, header=True, encoding="utf-8")
      head_flag = False
    else:
      data_chunk.to_csv(file_out, mode='a', index=False, header=False, encoding="utf-8")

    # if lc > 10000:
    #   break

  print(lc)
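For comparison, here is a hedged sketch of the same cleaning steps written with pandas column operations instead of a row-by-row loop. It is not the code from this post: `clean_chunk` and `process` are hypothetical names, and reading with `dtype=str` is an assumption so that every value supports string methods. The replacements mirror clean_str above.

```python
import io
import os
import tempfile

import pandas as pd

# Same character replacements as clean_str in the listing above.
REPLACEMENTS = {"\u3000": " ", "\xa0": " ", "【": "[", "】": "]"}
COLUMNS = ("實體", "屬性", "值")  # entity, attribute, value


def clean_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows and entity == value rows, then clean each column."""
    chunk = chunk.dropna()
    chunk = chunk[chunk["實體"] != chunk["值"]].copy()
    for col in COLUMNS:
        cleaned = chunk[col].str.strip()
        for old, new in REPLACEMENTS.items():
            cleaned = cleaned.str.replace(old, new, regex=False)
        chunk[col] = cleaned
    return chunk


def process(source, file_out: str, chunk_size: int = 10000) -> int:
    """Clean `source` chunk by chunk, append to `file_out`, return rows written."""
    rows_written = 0
    first_chunk = True
    # dtype=str (an assumption) keeps every column as text
    for chunk in pd.read_csv(source, chunksize=chunk_size, dtype=str):
        chunk = clean_chunk(chunk)
        # Write the header once, then append without it.
        chunk.to_csv(file_out, mode="w" if first_chunk else "a",
                     index=False, header=first_chunk, encoding="utf-8")
        first_chunk = False
        rows_written += len(chunk)
    return rows_written


if __name__ == "__main__":
    sample = "實體,屬性,值\n英雄聯盟,中文名,英雄聯盟\n【A】,別名,B\u3000C\n"
    fd, out_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    print(process(io.StringIO(sample), out_path, chunk_size=2))
    os.remove(out_path)
```

Whether this is faster than the compiled loop on the real file would need to be measured; the sketch only shows that the filtering and replacement logic can be expressed per column.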

Write a setup.py file so that Cython can compile our Python code into C code:

# setup.py
from setuptools import setup
from Cython.Build import cythonize


setup(
    ext_modules = cythonize('process_file.pyx')
)

Then run the command:

python setup.py build_ext --inplace

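This generates a build folder, a generated source file (.cpp in my case; Cython emits .c by default), and a .pyd file (.so on Linux and macOS). Of these, the build folder and the .pyd file are the ones you need.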
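The exact name of the compiled file depends on the interpreter and platform. As a small sketch using only the standard library, the suffix can be looked up like this (`extension_suffix` is a name chosen for illustration):

```python
import sysconfig


def extension_suffix() -> str:
    """Return the filename suffix this interpreter uses for compiled extension modules."""
    return sysconfig.get_config_var("EXT_SUFFIX")


if __name__ == "__main__":
    # The build output for process_file.pyx should be named like this.
    print("process_file" + extension_suffix())
```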

You can then call the compiled Cython module from Python code:

from process_file import main

main()
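If a process_file.py sits in the same folder as the compiled file, it is worth confirming which one the import actually picked up (to my understanding, the default import system tries extension modules before .py sources in the same directory). A minimal sketch, with `is_compiled_extension` as a hypothetical helper name:

```python
import importlib.machinery
import types


def is_compiled_extension(module: types.ModuleType) -> bool:
    """True if `module` was loaded from a compiled extension file (.pyd / .so)."""
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    import json  # a pure-Python standard-library package, used as a stand-in

    print(is_compiled_extension(json))
```

After building, `import process_file` followed by `is_compiled_extension(process_file)` should report whether the compiled version is in use.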
