[Paper Notes] A Survey on Deep Learning for Named Entity Recognition

Posted by yuexiaomao on 2020-10-09

These notes pull out the key points of the survey and list most of the cited papers, to make later lookup easier; see the paper itself for details.


 

Explanation of neural networks:

The forward pass computes a weighted sum of the inputs from the previous layer and passes the result through a non-linear function. The backward pass computes the gradient of an objective function with respect to the weights of a multilayer stack of modules via the chain rule of derivatives.

Very concise.
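As a sanity check on that description, here is a minimal numpy sketch of one forward/backward step for a single hidden layer with a squared-error objective (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector
W1 = rng.normal(size=(4, 3))    # first-layer weights
W2 = rng.normal(size=(1, 4))    # output-layer weights
target = np.array([1.0])

# Forward pass: weighted sum of inputs, then a non-linear function.
h_pre = W1 @ x                  # weighted sum
h = np.tanh(h_pre)              # non-linearity
y = W2 @ h                      # linear output
loss = 0.5 * np.sum((y - target) ** 2)

# Backward pass: gradients of the objective w.r.t. the weights via the chain rule.
dy = y - target                 # dL/dy
dW2 = np.outer(dy, h)           # dL/dW2
dh = W2.T @ dy                  # propagate to the hidden layer
dh_pre = dh * (1 - h ** 2)      # through tanh'
dW1 = np.outer(dh_pre, x)       # dL/dW1

# One gradient-descent step.
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```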

 

Advantages: representation learning, and the ability to capture semantics

The key advantage of deep learning is the capability of representation learning and the semantic composition empowered by both the vector representation and neural processing. This allows a machine to be fed with raw data and to automatically discover latent representations and processing needed for classification or detection.

 

Hand-crafted features

 

  1. D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
  2. M. L. Patawar and M. Potey, “Approaches to named entity recognition: a survey,” Int. J. Innov. Res. Comput. Commun. Eng., vol. 3, no. 12, pp. 12201–12208, 2015.

 

domain-specific gazetteers

  1. O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates, “Unsupervised named-entity extraction from the web: An experimental study,” Artif. Intell., vol. 165, no. 1, pp. 91–134, 2005.
  2. S. Sekine and C. Nobata, “Definition, dictionaries and tagger for extended named entity hierarchy.” in LREC, 2004, pp. 1977–1980.

syntactic-lexical patterns

  1. S. Zhang and N. Elhadad, “Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts,” J. Biomed. Inform., vol. 46, no. 6, pp. 1088–1098, 2013.

biomedical domain

  1. D. Hanisch, K. Fundel, H.-T. Mevissen, R. Zimmer, and J. Fluck, “Prominer: rule-based protein and gene entity recognition,” BMC Bioinform., vol. 6, no. 1, p. S14, 2005.
  2. A. P. Quimbaya, A. S. Múnera, R. A. G. Rivera, J. C. D. Rodríguez, O. M. M. Velandia, A. A. G. Peña, and C. Labbé, “Named entity recognition over electronic health records through a combined dictionary-based approach,” Procedia Comput. Sci., vol. 100, pp. 55–61, 2016.

 

 

 

character-level representation

A key advantage is that it naturally handles out-of-vocabulary words. Thus a character-based model is able to infer representations for unseen words and share information of morpheme-level regularities.
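A toy sketch of such a character-based word encoder (PyTorch; the vocabulary and all sizes are illustrative): it produces a vector for any word, seen or unseen.

```python
import torch
import torch.nn as nn

CHAR_VOCAB = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}  # 0 = padding

class CharBiLSTM(nn.Module):
    """Builds a word representation from its characters, so even an
    out-of-vocabulary word gets a meaningful vector."""
    def __init__(self, char_emb_dim=16, hidden_dim=32):
        super().__init__()
        self.char_emb = nn.Embedding(len(CHAR_VOCAB) + 1, char_emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(char_emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, word: str) -> torch.Tensor:
        ids = torch.tensor([[CHAR_VOCAB.get(c, 0) for c in word.lower()]])
        embedded = self.char_emb(ids)              # (1, len(word), char_emb_dim)
        _, (h_n, _) = self.bilstm(embedded)        # h_n: (2, 1, hidden_dim)
        return torch.cat([h_n[0, 0], h_n[1, 0]])   # concat final fwd/bwd states

encoder = CharBiLSTM()
print(encoder("unseenword").shape)  # torch.Size([64]) — works for any word
```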

 

Hybrid Representation

  1. words, POS tags, chunking, and word shape features
  2. spelling features, context features, word embeddings, and gazetteer features.
  3. additional word-level features (capitalization, lexicons) and character-level features (4-dimensional vector representing the type of a character: upper case, lower case, punctuation, other)
  4. 5-dimensional word shape vector (e.g., all capitalized, not capitalized, first-letter capitalized or contains a capital letter)
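A toy sketch of such hand-designed feature vectors (one plausible variant; the exact definitions differ per paper):

```python
import string

def char_type_vector(ch: str):
    """4-dim one-hot-style vector for character type: upper, lower,
    punctuation, other (a toy version of the feature listed above)."""
    return [int(ch.isupper()), int(ch.islower()),
            int(ch in string.punctuation),
            int(not (ch.isupper() or ch.islower() or ch in string.punctuation))]

def word_shape_vector(word: str):
    """5-dim word-shape indicators (illustrative variant of the 5-dim vector above)."""
    return [int(word.isupper()),                           # all capitalized
            int(word.islower()),                           # not capitalized
            int(word[:1].isupper() and word[1:].islower()),  # first-letter capitalized
            int(any(c.isupper() for c in word)),           # contains a capital letter
            int(any(c.isdigit() for c in word))]           # contains a digit

print(word_shape_vector("Obama"))   # [0, 0, 1, 1, 0]
```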

 

 

Word-level features

  1. G. Zhou and J. Su, “Named entity recognition using an hmm- based chunk tagger,” in ACL, 2002, pp. 473–480.
  2. W. Liao and S. Veeramachaneni, “A simple semi-supervised algorithm for named entity recognition,” in NAACL-HLT, 2009, pp. 58–65.
  3. A. Ghaddar and P. Langlais, “Robust lexical features for improved neural network named-entity recognition,” in COLING, 2018, pp. 1896–1907.

 

document and corpus features

  1. Y. Ravin and N. Wacholder, Extracting names from natural-language text. IBM Research Report RC 2033, 1997.
  2. V. Krishnan and C. D. Manning, “An effective two-stage model for exploiting non-local dependencies in named entity recognition,” in ACL, 2006, pp. 1121–1128.

 

More features

  1. D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
  2. R. Sharnagat, “Named entity recognition: A literature survey,” Center For Indian Language Technology, 2014.
  3. D. Campos, S. Matos, and J. L. Oliveira, “Biomedical named entity recognition: a survey of machine-learning tools,” in Theory Appl. Adv. Text Min., 2012.

 

 

Unsupervised methods

  1. D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
  2. O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates, “Unsupervised named-entity extraction from the web: An experimental study,” Artif. Intell., vol. 165, no. 1, pp. 91–134, 2005.
  3. S. Zhang and N. Elhadad, “Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts,” J. Biomed. Inform., vol. 46, no. 6, pp. 1088–1098, 2013.
  4. M. Collins and Y. Singer, “Unsupervised models for named entity classification,” in EMNLP, 1999, pp. 100–110.
  5. D. Nadeau, P. D. Turney, and S. Matwin, “Unsupervised named-entity recognition: Generating gazetteers and resolving ambiguity,” in CSCSI, 2006, pp. 266–277.

 

 

language-model-augmented

  1. M. E. Peters, W. Ammar, C. Bhagavatula, and R. Power, “Semi-supervised sequence tagging with bidirectional language models,”
  2. M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,”
  3. M. Rei, “Semi-supervised multitask learning for sequence labeling,”
  4. L. Liu, X. Ren, J. Shang, J. Peng, and J. Han, “Efficient contextualized representation: Language model pruning for sequence labeling,”
  5. L. Liu, J. Shang, F. Xu, X. Ren, H. Gui, J. Peng, and J. Han, “Empower sequence labeling with task-aware neural language model,”
  6. C. Jia, L. Xiao, and Y. Zhang, “Cross-domain NER using cross-domain language modeling,”

 

NER types:

  1. coarse-grained NER
  2. fine-grained NER tasks
    1. X. Ling and D. S. Weld, “Fine-grained entity recognition.” in AAAI, vol. 12, 2012, pp. 94–100.
    2. X. Ren, W. He, M. Qu, L. Huang, H. Ji, and J. Han, “Afet: Automatic fine-grained entity typing by hierarchical partial-label embedding,” in EMNLP, 2016, pp. 1369–1378.
    3. A. Abhishek, A. Anand, and A. Awekar, “Fine-grained entity type classification by jointly learning representations and label embeddings,” in EACL, 2017, pp. 797–807.
    4. A. Lal, A. Tomer, and C. R. Chowdary, “Sane: System for fine grained named entity typing on textual data,” in WWW, 2017, pp. 227–230.
    5. L. d. Corro, A. Abujabal, R. Gemulla, and G. Weikum, “Finet: Context-aware fine-grained named entity typing,” in EMNLP, 2015, pp. 868–878.

 

Datasets

  1. Some datasets carry hundreds of entity types, e.g., HYENA and Gillick.
  2. Commonly used benchmarks: OntoNotes, CoNLL03.

 

 

https://github.com/juand-r/entity-recognition-datasets

https://github.com/cambridgeltl/MTL-Bioinformatics-2016/tree/master/data

 

Tools

 

Evaluation metrics

NER involves two subtasks: boundary detection and type identification.

 

FP: entities returned by the model that do not appear in the ground truth

FN: entities in the ground truth that the model fails to return

TP: entities returned by the model that also appear in the ground truth

 

F-scores come in macro-averaged and micro-averaged variants. Macro-averaged F computes the F-score for each entity type independently and then averages over types; micro-averaged F sums the TP, FP, and FN counts over all entity types and computes a single F-score (so frequent types dominate).
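A small sketch of the two averaging schemes (toy counts):

```python
def f1(tp, fp, fn):
    """F1 from raw counts; guards against empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Per-type counts (toy numbers): {entity type: (TP, FP, FN)}
counts = {"PER": (50, 5, 10), "LOC": (30, 10, 5), "ORG": (5, 2, 8)}

# Macro-averaged F: compute F per type, then average over types.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro-averaged F: sum TP/FP/FN over all types, then compute one F.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f1 = f1(tp, fp, fn)

print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
```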

 

  1. MUC-6 reports one metric that scores entity type while ignoring boundaries, and another that scores boundaries while ignoring type
  2. ACE is too complex and is rarely used

 

Context Encoder Architectures

Widely used context encoder architectures: convolutional neural networks, recurrent neural networks, recursive neural networks, and deep transformers.

1、R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,”

CNNs are used to capture local features around each word.

 

2、E. Strubell, P. Verga, D. Belanger, and A. McCallum, “Fast and accurate entity recognition with iterated dilated convolutions,”

Traditional LSTMs need O(n) sequential steps for a length-n sequence and cannot be parallelized over time; ID-CNNs provide larger effective context and structured-prediction capability, and run 14-20x faster than the Bi-LSTM-CRF at test time.
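A minimal sketch of the dilated-convolution idea behind ID-CNNs (illustrative sizes, not the authors' exact configuration): stacking kernel-3 convolutions with dilations 1, 2, 4 grows the receptive field exponentially while every position is processed in parallel.

```python
import torch
import torch.nn as nn

dim = 64
layers = [
    nn.Conv1d(dim, dim, kernel_size=3, dilation=d, padding=d)  # padding keeps length
    for d in (1, 2, 4)
]

x = torch.randn(1, dim, 20)   # (batch, channels, sequence length)
h = x
for conv in layers:
    h = torch.relu(conv(h))
print(h.shape)  # torch.Size([1, 64, 20]) — same length, wider receptive field
```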

 

3、BiLSTM: in a unidirectional RNN the last words have a disproportionate influence on the sentence representation, so bidirectional LSTMs read the sequence in both directions. P. Zhou, S. Zheng, J. Xu, Z. Qi, H. Bao, and B. Xu, “Joint extraction of multiple relations and entities by using a hybrid neural network,”
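A minimal BiLSTM context-encoder sketch (illustrative sizes): each token's representation combines a forward pass over the left context and a backward pass over the right context, so no single end-of-sentence word dominates.

```python
import torch
import torch.nn as nn

emb_dim, hidden = 100, 128
bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

tokens = torch.randn(1, 7, emb_dim)     # (batch, sentence length, embedding dim)
contextualized, _ = bilstm(tokens)      # (1, 7, 2 * hidden)
print(contextualized.shape)             # each token now sees both directions
```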

 

4、GRU and LSTM

 

5、Recursive neural networks are non-linear adaptive models that can learn deep structured information over inputs with a topological order.

  • P.-H. Li, R.-P. Dong, Y.-S. Wang, J.-C. Chou, and W.-Y. Ma, “Leveraging linguistic structures for named entity recognition with bidirectional recursive neural networks,”
  • M. Rei, “Semi-supervised multitask learning for sequence labeling,” in ACL, 2017, pp. 2121–2130.

 

Neural language models

Forward and backward neural language models

 

In multi-task learning, the language model and the sequence labeling model share the same character-level layer. Vectors from character-level embeddings, pre-trained word embeddings, and language-model representations are concatenated and fed into word-level LSTMs. Experimental results show that multi-task learning is an effective way to guide the language model to learn task-specific knowledge.
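A sketch of that concatenation (a toy setup; all dimensions are illustrative, and the three random tensors stand in for real char-level, pre-trained, and LM representations):

```python
import torch
import torch.nn as nn

char_dim, word_dim, lm_dim, hidden = 50, 100, 512, 128
word_lstm = nn.LSTM(char_dim + word_dim + lm_dim, hidden,
                    bidirectional=True, batch_first=True)

seq_len = 7
char_repr = torch.randn(1, seq_len, char_dim)   # from a char-level model
word_emb  = torch.randn(1, seq_len, word_dim)   # pre-trained embeddings
lm_repr   = torch.randn(1, seq_len, lm_dim)     # language-model states

# Concatenate per token, then feed into the word-level LSTM.
inputs = torch.cat([char_repr, word_emb, lm_repr], dim=-1)
outputs, _ = word_lstm(inputs)                  # (1, seq_len, 2 * hidden)
```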

 

Deep Transformer

 

Joint use of traditional embeddings and language-model embeddings:

  1. A. Ghaddar and P. Langlais, “Robust lexical features for improved neural network named-entity recognition,” in COLING, 2018, pp. 1896–1907.
  2. Z. Jie and W. Lu, “Dependency-guided lstm-crf for named entity recognition,” in EMNLP, 2018, pp. 3860–3870.
  3. C. Xia, C. Zhang, T. Yang, Y. Li, N. Du, X. Wu, W. Fan, F. Ma, and P. S. Yu, “Multi-grained named entity recognition,” in ACL, 2019, pp. 1430–1440.
  4. Y. Luo, F. Xiao, and H. Zhao, “Hierarchical contextualized representation for named entity recognition,” CoRR, vol. abs/1911.02257, 2019.
  5. Y. Liu, F. Meng, J. Zhang, J. Xu, Y. Chen, and J. Zhou, “GCDT: A global context enhanced deep transition architecture for sequence labeling,” in ACL, 2019, pp. 2431–2441.
  6. Y. Jiang, C. Hu, T. Xiao, C. Zhang, and J. Zhu, “Improved differentiable architecture search for language modeling and named entity recognition,”

 

Turning NER into an MRC (machine reading comprehension) problem:

  1. X. Li, J. Feng, Y. Meng, Q. Han, F. Wu, and J. Li, “A unified MRC framework for named entity recognition,” CoRR, vol. abs/1910.11476, 2019.
  2. X. Li, X. Sun, Y. Meng, J. Liang, F. Wu, and J. Li, “Dice loss for data-imbalanced NLP tasks,”

 

 

Tag Decoder Architectures

Four architectures of tag decoders: MLP + softmax layer, conditional random fields (CRFs), recurrent neural networks, and pointer networks.
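For the CRF decoder, a minimal Viterbi decoding sketch with made-up emission and transition scores (toy values, not from any trained model):

```python
import numpy as np

tags = ["O", "B-PER", "I-PER"]
emissions = np.log(np.array([   # (sequence length, num tags), toy scores
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
    [0.9, 0.05, 0.05],
]))
transitions = np.log(np.array([  # transitions[i, j]: score of tag i -> tag j
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
    [0.3, 0.2, 0.5],
]))

n, k = emissions.shape
score = emissions[0].copy()              # best score ending in each tag
back = np.zeros((n, k), dtype=int)       # backpointers
for t in range(1, n):
    cand = score[:, None] + transitions + emissions[t]  # (prev tag, next tag)
    back[t] = cand.argmax(axis=0)
    score = cand.max(axis=0)

# Trace the best path backwards.
best = [int(score.argmax())]
for t in range(n - 1, 0, -1):
    best.append(int(back[t][best[-1]]))
print([tags[i] for i in reversed(best)])  # ['B-PER', 'I-PER', 'O']
```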

 

 

 

 

A. Akbik, D. Blythe, and R. Vollgraf, “Contextual string embeddings for sequence labeling,” in COLING, 2018, pp. 1638–1649.

 

The pointer network first identifies a chunk (or segment) and then labels it, repeating until all words in the input sequence are processed. In Fig. 12(d), given the start token “&lt;s&gt;”, the segment “Michael Jeffery Jordan” is first identified and then labeled “PERSON”. Segmentation and labeling can be done by two separate neural networks inside the pointer network. Next, “Michael Jeffery Jordan” is fed as input to the pointer network; as a result, the segment “was” is identified and labeled “O”.
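A schematic of this segment-then-label loop; `point_to_end` and `classify_segment` are hypothetical stand-ins for the two networks, not real APIs:

```python
def decode(tokens, point_to_end, classify_segment):
    segments, start = [], 0
    while start < len(tokens):                 # until all words are processed
        end = point_to_end(tokens, start)      # pointer net picks the segment end
        segment = tokens[start:end + 1]
        label = classify_segment(segment)      # e.g., "PERSON" or "O"
        segments.append((segment, label))
        start = end + 1                        # continue after this segment
    return segments

# decode(["Michael", "Jeffery", "Jordan", "was", ...], ...) would first yield
# (["Michael", "Jeffery", "Jordan"], "PERSON"), then (["was"], "O"), and so on.
```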

  1. F. Zhai, S. Potdar, B. Xiang, and B. Zhou, “Neural models for sequence chunking.” in AAAI, 2017, pp. 3365–3371.
  2. O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” in NIPS, 2015, pp. 2692–2700.
  3. J. Li, A. Sun, and S. Joty, “Segbot: A generic neural text segmentation model with pointer network,” in IJCAI, 2018, pp. 4166–4172.

 

 

Summary of DNN architectures

The following works show that external knowledge can boost NER performance.

  1. T. Liu, J. Yao, and C. Lin, “Towards improving neural named entity recognition with gazetteers,” in ACL, 2019, pp. 5301–5307.
  2. Z. Jie and W. Lu, “Dependency-guided lstm-crf for named entity recognition,” in EMNLP, 2018, pp. 3860–3870.
  3. C. Xia, C. Zhang, T. Yang, Y. Li, N. Du, X. Wu, W. Fan, F. Ma, and P. S. Yu, “Multi-grained named entity recognition,”
  4. J. Zhuo, Y. Cao, J. Zhu, B. Zhang, and Z. Nie, “Segment-level sequence modeling using gated recursive semi-markov conditional random fields,”

 

 

Drawbacks:

1) acquiring external knowledge is labor-intensive (e.g., gazetteers) or computationally expensive (e.g., dependency);

2) integrating external knowledge adversely affects end-to-end learning and hurts the generality of DL-based systems.

 

Pre-trained transformers are more effective than LSTMs; without pre-training and with limited data, transformers tend to perform poorly:

Q. Guo, X. Qiu, P. Liu, Y. Shao, X. Xue, and Z. Zhang, “Star-transformer,” in NAACL-HLT, 2019, pp. 1315–1325.

H. Yan, B. Deng, X. Li, and X. Qiu, “Tener: Adapting transformer encoder for named entity recognition,” arXiv preprint arXiv:1911.04474, 2019.

 

Transformers are faster when the sequence length n is smaller than the representation dimension d, given the per-layer complexities: self-attention O(n²·d) vs. recurrent O(n·d²) [A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NIPS, 2017, pp. 5998–6008.]
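A quick back-of-the-envelope check of that trade-off (constant factors ignored):

```python
# Per-layer operation counts: self-attention ~ n^2 * d, recurrent ~ n * d^2,
# so self-attention wins exactly when n < d.
d = 512
for n in (32, 512, 2048):
    attn, rnn = n * n * d, n * d * d
    print(f"n={n:5d}: self-attention ~{attn:.2e}, recurrent ~{rnn:.2e}")
# n=32 favors self-attention, n=512 ties, n=2048 favors the recurrent count.
```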

 

For end users, which architecture to choose depends on the data and the domain task. If data is abundant, training models from scratch with RNNs and fine-tuning contextualized language models are both worth considering. If data is scarce, adopting a transfer strategy may be a better choice. For the news domain, many pre-trained off-the-shelf models are available. For specific domains (e.g., medical and social media), fine-tuning a general-purpose contextualized language model with domain-specific data is often an effective approach.

 

low-resource and cross-domain NER

  1. C. Jia, L. Xiao, and Y. Zhang, “Cross-domain NER using cross-domain language modeling,”
  2. S. J. Pan, Z. Toh, and J. Su, “Transfer joint embedding for cross-domain named entity recognition,” ACM Trans. Inf. Syst., vol. 31, no. 2, p. 7, 2013.
  3. J. Y. Lee, F. Dernoncourt, and P. Szolovits, “Transfer learning for named-entity recognition with neural networks,” arXiv preprint arXiv:1705.06273, 2017.
  4. B. Y. Lin and W. Lu, “Neural adaptation layers for cross-domain named entity recognition,” in EMNLP, 2018, pp. 2012–2022.
  5. Y. Cao, Z. Hu, T. Chua, Z. Liu, and H. Ji, “Low-resource name tagging learned with weakly labeled data,” in EMNLP, 2019, pp. 261–270.
  6. X. Huang, L. Dong, E. Boschee, and N. Peng, “Learning A unified named entity tagger from multiple partially annotated corpora for efficient adaptation,” in CoNLL, 2019, pp. 515–527.

 

 

Traditional methods combined via bootstrapping

  1. J. Jiang and C. Zhai, “Instance weighting for domain adaptation in nlp,” in ACL, 2007, pp. 264–271.
  2. D. Wu, W. S. Lee, N. Ye, and H. L. Chieu, “Domain adaptive bootstrapping for named entity recognition,” in EMNLP, 2009, pp. 1523–1532.
  3. A. Chaudhary, J. Xie, Z. Sheikh, G. Neubig, and J. G. Carbonell, “A little annotation does a lot of good: A study in bootstrapping low-resource named entity recognizers,” pp. 5163–5173, 2019.

 

Transfer learning

 

Z. Yang, R. Salakhutdinov, and W. W. Cohen, “Transfer learning for sequence tagging with hierarchical recurrent networks,” in ICLR, 2017, propose three transfer scenarios. If the two tasks have label sets that can be mapped onto each other, a shared CRF layer is used; otherwise each task learns a separate CRF layer. Experimental results show significant improvements on various datasets under low-resource conditions.

 

Zhao et al. [H. Zhao, Y. Yang, Q. Zhang, and L. Si, “Improve neural entity recognition via multi-task data selection and constrained decoding,” in NAACL-HLT, vol. 2, 2018, pp. 346–351.] propose a multi-task model with domain adaptation, where the fully connected layers are adapted to different datasets and the CRF features are computed separately. A major advantage of Zhao's model is that instances with different distributions or incorrect annotation guidelines are filtered out during data selection.

 

Train on the source task, then fine-tune on target-task data: J. Y. Lee, F. Dernoncourt, and P. Szolovits, “Transfer learning for named-entity recognition with neural networks,” arXiv preprint arXiv:1705.06273, 2017.

 

B. Y. Lin and W. Lu, “Neural adaptation layers for cross-domain named entity recognition,” in EMNLP, 2018, pp. 2012–2022, propose a fine-tuning approach with three neural adaptation layers: a word adaptation layer, a sentence adaptation layer, and an output adaptation layer.

 

tag-hierarchy model

Proposes a tag-hierarchy model for NER settings with heterogeneous tag sets; during inference, the hierarchy is used to map fine-grained tags onto the target tag set.

G. Beryozkin, Y. Drori, O. Gilon, T. Hartman, and I. Szpektor, “A joint named-entity recognizer for heterogeneous tag-sets using a tag hierarchy,” in ACL, 2019, pp. 140–150.

 

 

Some transfer learning in the medical domain, used to reduce the amount of labeled data needed:

  1. X. Wang, Y. Zhang, X. Ren, Y. Zhang, M. Zitnik, J. Shang, C. Langlotz, and J. Han, “Cross-type biomedical named entity recognition with deep multi-task learning,” arXiv preprint arXiv:1801.09851, 2018.
  2. J. M. Giorgi and G. D. Bader, “Transfer learning for biomedical named entity recognition with neural networks,” Bioinformatics, 2018.
  3. Z. Wang, Y. Qu, L. Chen, J. Shen, W. Zhang, S. Zhang, Y. Gao, G. Gu, K. Chen, and Y. Yu, “Label-aware double transfer learning for cross-specialty medical named entity recognition,” in NAACL-HLT, 2018, pp. 1–15.

 

 

Deep active learning for NER

 

 

Y. Shen, H. Yun, Z. C. Lipton, Y. Kronrod, and A. Anandkumar, “Deep active learning for named entity recognition,” propose incremental training: newly annotated labels can be added with each new batch.

 

The active learning approach reaches 99% of the performance of the best deep model trained on the full data, using only 24.9% of the training data on the English dataset and 30.1% on the Chinese dataset. Moreover, just 12.0% and 16.9% of the training data suffice for a deep active learning model to outperform shallow models trained on the full data.

D. D. Lewis and W. A. Gale, “A sequential algorithm for training text classifiers,” in SIGIR, 1994, pp. 3–12.

S. Pradhan, A. Moschitti, N. Xue, H. T. Ng, A. Björkelund, O. Uryupina, Y. Zhang, and Z. Zhong, “Towards robust linguistic analysis using ontonotes,” in CoNLL, 2013, pp. 143–152.

 

Deep reinforcement learning for NER

Reinforcement learning papers:

L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, pp. 237–285, 1996.

 R. S. Sutton and A. G. Barto, Introduction to reinforcement learning. MIT press Cambridge, 1998, vol. 135.

S. C. Hoi, D. Sahoo, J. Lu, and P. Zhao, “Online learning: A comprehensive survey,” arXiv preprint arXiv:1802.02871, 2018.

 

Reinforcement learning models the environment as a stochastic finite state machine with inputs (actions from the agent) and outputs (observations/rewards to the agent), consisting of three components: (i) a state transition function, (ii) an observation (output) function, and (iii) a reward function.

The agent, in turn, is modeled as a stochastic finite state machine with inputs (observations/rewards from the environment) and outputs (actions on the environment), consisting of two components: (i) a state transition function and (ii) a policy/output function.
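A toy sketch of those two stochastic finite state machines (all dynamics are invented purely for illustration):

```python
import random

class Environment:
    def __init__(self):
        self.state = 0
    def step(self, action):
        self.state = (self.state + action) % 3        # (i) state transition
        observation = self.state                       # (ii) observation/output
        reward = 1.0 if self.state == 2 else 0.0       # (iii) reward function
        return observation, reward

class Agent:
    def __init__(self):
        self.state = 0
    def act(self, observation, reward):
        self.state = observation                       # (i) state transition
        return random.choice([0, 1, 2])                # (ii) policy/output

env, agent = Environment(), Agent()
obs, reward = 0, 0.0
for _ in range(5):                                     # interaction loop
    action = agent.act(obs, reward)
    obs, reward = env.step(action)
```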

 

Y. Yang, W. Chen, Z. Li, Z. He, and M. Zhang, “Distantly supervised NER with partial annotation learning and reinforcement learning,” in COLING, 2018, pp. 2159–2169, use data generated by distant supervision to recognize named entities of new types in new domains.

 

Deep adversarial learning for NER

 

Adversarial learning: D. Lowd and C. Meek, “Adversarial learning,” in SIGKDD, 2005, pp. 641–647.

The goal is to make the model more robust to attacks, or to reduce its test error on clean inputs. Adversarial learning involves a generative network and a discriminative network.

 

  1. L. Huang, H. Ji, and J. May, “Cross-lingual multi-level adversarial transfer to enhance low-resource name tagging,” in NAACL-HLT, 2019, pp. 3823–3833.
  2. J. Li, D. Ye, and S. Shang, “Adversarial transfer for named entity boundary detection with pointer networks,” in IJCAI, 2019, pp. 5053–5059.
  3. P. Cao, Y. Chen, K. Liu, J. Zhao, and S. Liu, “Adversarial transfer learning for chinese named entity recognition with self-attention mechanism,” in EMNLP, 2018, pp. 182–192.

 

 

Neural Attention for NER

The neural attention mechanism gives a neural network the ability to focus on a subset of its input. By applying attention, an NER model can capture the most informative elements in the input.
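A minimal scaled dot-product attention sketch in numpy (shapes are illustrative); the weights over the input tokens are what let the model focus on the most informative elements:

```python
import numpy as np

def attention(query, keys, values):
    scores = keys @ query / np.sqrt(query.shape[-1])   # one score per token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax over tokens
    return weights @ values                            # weighted combination

n, d = 6, 8                      # 6 tokens, dimension 8
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(n, d))
query = rng.normal(size=d)
context = attention(query, keys, values)   # (d,) summary focused on relevant tokens
```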

 

  1. M. Rei, G. K. Crichton, and S. Pyysalo, “Attending to characters in neural sequence labeling models,” in COLING, 2016, pp. 309–318.
  2. A. Zukov-Gregoric, Y. Bachrach, P. Minkovsky, S. Coope, and B. Maksak, “Neural named entity recognition using a self-attention mechanism,” in ICTAI, 2017, pp. 652–656.
  3. G. Xu, C. Wang, and X. He, “Improving clinical named entity recognition with global neural attention,” in APWeb-WAIM, 2018, pp. 264–279.
  4. Q. Zhang, J. Fu, X. Liu, and X. Huang, “Adaptive co-attention network for named entity recognition in tweets,” in AAAI, 2018.
