Conversion Repository for the Lineless Table Recognition Model LORE: ConvertLOREToONNX

Posted by Danno on 2024-03-10

Introduction

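Readers often ask how Alibaba's lineless table recognition model was converted to ONNX format. I am a little embarrassed to admit that the existing ONNX model was converted a long time ago: the conversion environment has since been lost, and I took no notes at the time.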

Today I made up my mind to try the conversion again, and fortunately it succeeded. The result is this set of conversion notes: ConvertLOREToONNX.

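Having learned my lesson, this time I exported the environment file from Anaconda, so the conversion environment is recorded in more detail. Below is the README of the conversion repository. Interested readers can follow the "Read the original" link at the end of this post to go to the repository and try it.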

1. Clone the source code.

git clone https://github.com/SWHL/ConvertLOREToONNX.git

2. Install the environment.

conda install --yes --file requirements.txt

3. Run the demo. The converted models are written to the models directory.

python main.py
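
Before moving on, it can help to confirm that the conversion actually produced both files. The following is a minimal sketch, assuming the output file names match the ones used in step 5 (lore_detect.onnx and lore_process.onnx). The helper missing_models is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the usage example in step 5 (an assumption about
# what main.py writes); adjust them if the script uses different names.
EXPECTED_MODELS = ("lore_detect.onnx", "lore_process.onnx")


def missing_models(model_dir="models", expected=EXPECTED_MODELS):
    """Return the expected model files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in expected:
        path = root / name
        # Treat a zero-byte file as missing: it usually means a failed export.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_models()
    if absent:
        print("Missing or empty:", ", ".join(absent))
    else:
        print("Both ONNX models are present.")
```

This only checks that the files exist and are non-empty. It does not validate that they are well-formed ONNX graphs.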

4. Install lineless_table_rec

pip install lineless_table_rec

5. Usage

from pathlib import Path

from lineless_table_rec import LinelessTableRecognition

detect_path = "models/lore_detect.onnx"
process_path = "models/lore_process.onnx"
engine = LinelessTableRecognition(
    detect_model_path=detect_path, process_model_path=process_path
)

img_path = "images/lineless_table_recognition.jpg"
table_str, elapse = engine(img_path)

print(table_str)
print(elapse)

with open(f"{Path(img_path).stem}.html", "w", encoding="utf-8") as f:
    f.write(table_str)

print("ok")
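
Because table_str is written out as an HTML file, it can also be post-processed with the standard library. The following is a minimal sketch, assuming the output uses plain tr, td, and th tags without nested tables. TableRows and html_table_to_rows are hypothetical names, not part of lineless_table_rec, and merged cells (rowspan or colspan) are not expanded.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_rows(table_html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows
```

For example, rows = html_table_to_rows(table_str) yields one list per table row, which can then be passed to the csv module or inspected directly.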
