Introduction
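People keep asking how Alibaba's lineless table recognition model was converted to ONNX format. This is a little embarrassing: the existing ONNX models were converted a long time ago, the conversion environment has since been lost, and I took no notes at the time.
Today I made up my mind to try the conversion again, and fortunately it succeeded. The result is a set of conversion notes: ConvertLOREToONNX.
Having learned my lesson, this time I exported the environment file with Anaconda, so the conversion environment is recorded in much more detail. Below is the README of the conversion repository. If you are interested, click "Read the original" at the end of the article to go to the repository and try it yourself.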
1. Clone the source code.
git clone https://github.com/SWHL/ConvertLaTeXOCRToONNX.git
2. Install the dependencies.
conda install --yes --file requirements.txt
3. Run the demo. The converted models are written to the models directory.
python main.py
4. Install lineless_table_rec
pip install lineless_table_rec
5. Usage
from pathlib import Path

from lineless_table_rec import LinelessTableRecognition

# Paths to the two converted ONNX models from step 3
detect_path = "models/lore_detect.onnx"
process_path = "models/lore_process.onnx"
engine = LinelessTableRecognition(
    detect_model_path=detect_path, process_model_path=process_path
)

# Run recognition on one image; the call returns the table string and timing info
img_path = "images/lineless_table_recognition.jpg"
table_str, elapse = engine(img_path)
print(table_str)
print(elapse)

# Save the result in the current directory as <image name>.html
with open(f"{Path(img_path).stem}.html", "w", encoding="utf-8") as f:
    f.write(table_str)
print("ok")
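The usage example above handles a single image. As a minimal sketch, the same pattern can be wrapped to check that both converted models exist and then process a whole folder. The helper names `missing_models` and `tables_to_html` are hypothetical and not part of lineless_table_rec. The only assumption about the engine is the behaviour shown above: it is called with an image path and returns a `(table_str, elapse)` pair.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, List, Tuple

# Model file names taken from the usage example above.
MODEL_NAMES = ("lore_detect.onnx", "lore_process.onnx")


def missing_models(model_dir: str, names: Iterable[str] = MODEL_NAMES) -> List[str]:
    """Return the model file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]


def tables_to_html(
    engine: Callable[[str], Tuple[str, Any]],
    image_dir: str,
    out_dir: str,
    suffixes: Tuple[str, ...] = (".jpg", ".jpeg", ".png"),
) -> List[Path]:
    """Run engine on every image in image_dir and write one HTML file per image.

    engine is assumed to behave like the example above:
    engine(path) -> (table_str, elapse).
    """
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for img_path in sorted(Path(image_dir).iterdir()):
        # Skip anything that is not an image with a listed extension.
        if img_path.suffix.lower() not in suffixes:
            continue
        table_str, _elapse = engine(str(img_path))
        html_path = out_root / f"{img_path.stem}.html"
        html_path.write_text(table_str, encoding="utf-8")
        written.append(html_path)
    return written
```

To use it with the real model, build `engine` as in step 5, confirm that `missing_models("models")` returns an empty list, and then call `tables_to_html(engine, "images", "output")`. The `output` directory name is only an example.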