Training a Named Entity Recognition Model with DeepKE (Official Demo)

Posted by shizidushu on 2024-10-10

Original URL: https://www.cnblogs.com/shizidushu/p/18456727

Notes:

First published: 2024-10-10
DeepKE resources:
- Documentation: https://www.zjukg.org/DeepKE/
- Website: http://deepke.zjukg.cn/
- cnschema: http://cnschema.openkg.cn/

Set up a GitHub mirror (if needed)

git config --system url."https://githubfast.com/".insteadOf https://github.com/

To undo this, run:
git config --system --unset url.https://githubfast.com/.insteadof

Create the conda environment

conda create -n deepke python=3.8
conda activate deepke

# Install torch
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113

# Alternatively, install torch 1.11.0 from the Aliyun mirror
# pip install https://mirrors.aliyun.com/pytorch-wheels/cu113/torch-1.11.0+cu113-cp38-cp38-linux_x86_64.whl https://mirrors.aliyun.com/pytorch-wheels/cu113/torchvision-0.12.0+cu113-cp38-cp38-linux_x86_64.whl https://mirrors.aliyun.com/pytorch-wheels/cu113/torchaudio-0.11.0+cu113-cp38-cp38-linux_x86_64.whl -i https://mirrors.aliyun.com/pypi/simple/
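
Before moving on, it can help to confirm that the wheel that was installed is the CUDA build and that a GPU is visible. The following is a minimal sketch, not part of the DeepKE demo; the helper name `torch_report` is hypothetical, and it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def torch_report():
    """Return a small summary of the local torch installation (hypothetical helper)."""
    # find_spec lets the check run even when torch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,          # the command above should give 1.11.0+cu113
        "cuda_build": torch.version.cuda,      # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_report())
```

If `cuda_available` comes back False on a machine with a GPU, the driver or the wheel variant is the first thing to check.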

Install DeepKE:

git clone https://github.com/zjunlp/DeepKE.git
cd DeepKE

pip install pip==24.0

pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
python setup.py install
python setup.py develop

pip install prettytable==2.4.0
pip install ipython==8.12.0

Download the dataset

# apt-get install wget
cd example/ner/standard
wget 120.27.214.45/Data/ner/standard/data.tar.gz
tar -xzvf data.tar.gz

The data folder now contains:

train.txt: Training set
valid.txt: Validation set
test.txt: Test set
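
To get a feel for the data before training, the files can be loaded into sentences. This is a minimal sketch under an assumption the post does not state: that each non-empty line holds one character and its label separated by whitespace, and that a blank line ends a sentence (the usual CoNLL-style layout). The function name `read_conll` and the sample text are illustrative only.

```python
def read_conll(lines):
    """Group 'token label' lines into (tokens, labels) sentences.

    Assumes a CoNLL-style layout: one token per line, blank line between sentences.
    """
    sentences, tokens, labels = [], [], []
    for raw in lines:
        parts = raw.split()
        if not parts:
            # A blank line closes the current sentence.
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        tokens.append(parts[0])
        labels.append(parts[-1])
    if tokens:
        # Flush the last sentence when the file has no trailing blank line.
        sentences.append((tokens, labels))
    return sentences


# Illustrative sample, not taken from the real dataset.
sample = "湖 B-LOC\n北 I-LOC\n省 I-LOC\n\n龍 B-PER\n小 I-PER\n紅 I-PER\n"
sentences = read_conll(sample.splitlines())
print(len(sentences), sentences[0])
```

For the real files, pass an open file handle instead, for example `read_conll(open("data/train.txt", encoding="utf-8"))`, and check the first few sentences against the raw file to confirm the layout assumption holds.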

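Configure wandb

Register an account at https://wandb.ai/ and create a new project with a name of your choice, for example: deepke-ner-official-demo

Open https://wandb.ai/authorize to get your API key.

Run wandb init, then enter the API key you just obtained and select the project you created.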

Run training and prediction

Delete the checkpoints and logs folders saved by earlier training runs (if any):

rm -r checkpoints/
rm -r logs/

lstmcrf

Open example/ner/standard/run_lstmcrf.py and make sure the wandb and yaml libraries are imported:

import wandb
import yaml

Change the wandb project name:

if config['use_wandb']:
    wandb.init(project="deepke-ner-official-demo")

Set use_wandb to True in example/ner/standard/conf/config.yaml.

To train on multiple GPUs, set use_multi_gpu to True in example/ner/standard/conf/train.yaml.
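
These flags can be edited by hand, but when switching between runs it may be convenient to flip them from a script. The sketch below rewrites a single `key: value` line in the YAML text with a regular expression, which leaves comments and every other line untouched. The helper name `set_yaml_flag` and the sample config fragment are hypothetical; the real config files contain more keys than shown here.

```python
import re


def set_yaml_flag(yaml_text, key, value):
    """Replace the value of the first 'key: value' line, keeping any trailing comment."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*)\S+(.*)$", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", yaml_text, count=1
    )
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text


# Hypothetical fragment standing in for conf/config.yaml.
sample = "use_wandb: False  # log to wandb\nuse_multi_gpu: False\n"
updated = set_yaml_flag(sample, "use_wandb", "True")
print(updated)
```

To apply it to a real file, read the file's text, call the function, and write the result back; keeping a backup copy first is a sensible precaution.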

Start training:

python run_lstmcrf.py

>> total: 109870 loss: 27.181508426008552
              precision    recall  f1-score   support

       B-LOC     0.8920    0.8426    0.8666      1951
       B-ORG     0.8170    0.7439    0.7787       984
       B-PER     0.8783    0.8167    0.8464       884
       I-LOC     0.8650    0.8264    0.8453      2581
       I-ORG     0.8483    0.8365    0.8424      3945
       I-PER     0.8860    0.8436    0.8643      1714
           O     0.9861    0.9912    0.9886     97811

    accuracy                         0.9732    109870
   macro avg     0.8818    0.8430    0.8618    109870
weighted avg     0.9727    0.9732    0.9729    109870
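
The two average rows in this report are easy to misread, because the O tag accounts for 97811 of the 109870 tokens. As a small worked example, the sketch below recomputes the averages from the per-tag rows printed above: the macro average is the plain mean over tags, while the weighted average weights each tag by its support. The last line, which leaves out O, is an extra view added here and is not part of the report.

```python
# (precision, recall, f1, support) copied from the report above.
report = {
    "B-LOC": (0.8920, 0.8426, 0.8666, 1951),
    "B-ORG": (0.8170, 0.7439, 0.7787, 984),
    "B-PER": (0.8783, 0.8167, 0.8464, 884),
    "I-LOC": (0.8650, 0.8264, 0.8453, 2581),
    "I-ORG": (0.8483, 0.8365, 0.8424, 3945),
    "I-PER": (0.8860, 0.8436, 0.8643, 1714),
    "O": (0.9861, 0.9912, 0.9886, 97811),
}
F1, SUPPORT = 2, 3


def macro_avg(rows, col):
    """Unweighted mean of one column across tags."""
    return sum(row[col] for row in rows) / len(rows)


def weighted_avg(rows, col):
    """Mean of one column weighted by each tag's support."""
    total = sum(row[SUPPORT] for row in rows)
    return sum(row[col] * row[SUPPORT] for row in rows) / total


all_rows = list(report.values())
entity_rows = [row for tag, row in report.items() if tag != "O"]

print("macro F1:", round(macro_avg(all_rows, F1), 4))
print("weighted F1:", round(weighted_avg(all_rows, F1), 4))
print("macro F1 without O:", round(macro_avg(entity_rows, F1), 4))
```

The first two values reproduce the 0.8618 and 0.9729 in the report. Without the O tag, the macro F1 over the six entity tags comes out at roughly 0.84, which is a more realistic picture of entity-level quality than the 0.97 weighted figure.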

The text used for prediction is stored in example/ner/standard/conf/predict.yaml; change it as follows:

text: "“熱水器等以舊換新，節省了2000多元。”10月3日，在湖北省襄陽市的一家購物廣場，市民金煜輕觸手機，下單、付款、登記。湖北著力推動大規模裝置更新和消費品以舊換新。“力爭到今年底，全省汽車報廢更新、置換更新分別達到4.5萬輛、12.5萬輛，家電以舊換新170萬套。”湖北省商務廳廳長龍小紅介紹。"

Run prediction:

python predict.py

NER result:

[('湖', 'B-LOC'), ('北', 'I-LOC'), ('省', 'I-LOC'), ('襄', 'B-LOC'), ('陽', 'I-LOC'), ('市', 'I-LOC'), ('場', 'I-LOC'), ('煜', 'I-PER'), ('湖', 'B-ORG'), ('北', 'I-ORG'), ('省', 'I-ORG'), ('商', 'I-ORG'), ('務', 'I-ORG'), ('廳', 'I-ORG'), ('廳', 'I-ORG'), ('龍', 'B-PER'), ('小', 'I-PER'), ('紅', 'I-PER')]
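
The result is a list of per-character tags, so reading off whole entities takes one more step. The sketch below merges the pairs into (text, type) spans using a simple rule chosen for this illustration: a B- tag or a change of entity type starts a new span, and an I- tag of the same type extends the current one. The function name `group_entities` is hypothetical and this is not DeepKE's own post-processing.

```python
def group_entities(tagged):
    """Merge (char, BIO-tag) pairs into (text, entity_type) spans."""
    entities, chars, current_type = [], [], None

    def flush():
        if chars:
            entities.append(("".join(chars), current_type))

    for char, tag in tagged:
        if tag == "O":
            flush()
            chars, current_type = [], None
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or ent_type != current_type:
            # Start a new span on B- or when the entity type changes.
            flush()
            chars, current_type = [char], ent_type
        else:
            chars.append(char)
    flush()
    return entities


# The prediction output shown above.
ner_result = [('湖', 'B-LOC'), ('北', 'I-LOC'), ('省', 'I-LOC'), ('襄', 'B-LOC'),
              ('陽', 'I-LOC'), ('市', 'I-LOC'), ('場', 'I-LOC'), ('煜', 'I-PER'),
              ('湖', 'B-ORG'), ('北', 'I-ORG'), ('省', 'I-ORG'), ('商', 'I-ORG'),
              ('務', 'I-ORG'), ('廳', 'I-ORG'), ('廳', 'I-ORG'), ('龍', 'B-PER'),
              ('小', 'I-PER'), ('紅', 'I-PER')]
print(group_entities(ner_result))
```

Because the printed list contains no O entries, characters that were not adjacent in the input text sit next to each other here. The stray I-LOC on 場 and the second I-ORG on 廳 therefore get glued onto the preceding span (giving 襄陽市場 and 湖北省商務廳廳), and 煜 appears as a one-character PER. These are model errors made visible by the grouping, and a version that tracks character positions would separate them.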

bert

In example/ner/standard/conf/config.yaml, change hydra/model to bert.

The bert hyperparameters are set in example/ner/standard/conf/hydra/model/bert.yaml; adjust them if needed.

Set use_wandb to True in example/ner/standard/conf/config.yaml.

Change the wandb project name in example/ner/standard/run_bert.py:

    if cfg.use_wandb:
        wandb.init(project="deepke-ner-official-demo")

If needed, change train_batch_size in example/ner/standard/conf/train.yaml; for bert, a value of at least 64 is recommended.

Start training:

export HF_ENDPOINT=https://hf-mirror.com
python run_bert.py

w2ner

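w2ner is a more recent model that reported state-of-the-art results.

It is an entity recognition method that covers multiple scenarios, based on W2NER (AAAI'22). For details, see the paper "Unified Named Entity Recognition as Word-Word Relation Classification".

Named entity recognition (NER) involves three main types: flat, overlapped (also known as nested), and discontinuous NER, which have mostly been studied separately. Interest in unified NER has grown recently, and W2NER handles all three tasks with a single model.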
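
To make the "word-word relation" idea concrete, here is a simplified sketch of the labeling scheme as the paper describes it: consecutive words inside an entity are linked by a Next-Neighboring-Word (NNW) relation, and the last word links back to the first with a Tail-Head-Word relation carrying the entity type (THW-type). The function name `build_relations` and the dictionary layout are choices made for this illustration, and the code is not taken from DeepKE's w2ner implementation.

```python
def build_relations(entities):
    """Map each entity (word indices, type) to word-pair relations.

    NNW links each word to the next word of the same entity;
    THW-<type> links the entity's last word back to its first word.
    """
    relations = {}
    for indices, ent_type in entities:
        for left, right in zip(indices, indices[1:]):
            relations[(left, right)] = "NNW"
        relations[(indices[-1], indices[0])] = f"THW-{ent_type}"
    return relations


tokens = "I am having aching in legs and shoulders".split()
# Two discontinuous, overlapping entities: "aching in legs" and "aching in shoulders".
entities = [([3, 4, 5], "Symptom"), ([3, 4, 7], "Symptom")]

relations = build_relations(entities)
for (i, j), label in sorted(relations.items()):
    print(f"{tokens[i]} -> {tokens[j]}: {label}")
```

Flat, nested, and discontinuous entities all reduce to the same kind of word-pair labels, which is why a single classifier over the word-by-word grid can cover the three cases.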

Because only a single GPU is used here, set device to 0 in example/ner/standard/w2ner/conf/train.yaml.

In the same file, example/ner/standard/w2ner/conf/train.yaml, change data_dir and do_train:

data_dir: "../data"
do_train: True

This points training at the dataset downloaded earlier and enables the training step.

Run training:

python run.py
