The Stanford Natural Language Inference (SNLI) and Multi-Genre NLI Corpus (MultiNLI) datasets

Posted by CopperDong on 2018-03-12


https://nlp.stanford.edu/projects/snli/
https://www.nyu.edu/projects/bowman/multinli/
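MultiNLI is an upgraded version of SNLI. It uses the same format and is of comparable size, but its text is more varied, and it also includes an auxiliary test set for evaluating cross-genre transfer.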

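SNLI 1.0 contains 570,000 human-written English sentence pairs, manually annotated for balanced classification with three labels: entailment, contradiction, and neutral.
It supports the NLI (natural language inference) task, also known as RTE (recognizing textual entailment).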
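To make the three-way classification concrete, here is a minimal sketch of how one labeled pair might be held in code. The class name `NLIExample`, the field names `premise` and `hypothesis`, and the integer ids are illustrative assumptions, not part of the corpus; the sentences and the label come from the sample record quoted later in this post.

```python
from dataclasses import dataclass

# The three classes named above. The integer ids are an arbitrary,
# hypothetical choice; the corpus itself stores the label as a string.
LABEL_TO_ID = {"entailment": 0, "contradiction": 1, "neutral": 2}


@dataclass(frozen=True)
class NLIExample:
    """One sentence pair plus its label (hypothetical container)."""
    premise: str      # corresponds to the sentence1 column
    hypothesis: str   # corresponds to the sentence2 column
    label: str        # one of the keys of LABEL_TO_ID

    def label_id(self) -> int:
        # Raises KeyError for any label outside the three classes.
        return LABEL_TO_ID[self.label]


example = NLIExample(
    premise="A person on a horse jumps over a broken down airplane.",
    hypothesis="A person is training his horse for a competition.",
    label="neutral",
)
print(example.label, example.label_id())
```

A classifier for this task takes the two sentences as input and predicts one of the three labels.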

For a detailed description, see:
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

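Besides the gold label, each record also carries the individual judgments of five annotators (columns label1 to label5, some of which may be empty, as in the sample), and each sentence is given in two parsed forms: a binary bracketing and a full constituency parse. The columns and one sample record: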

gold_label: neutral
sentence1_binary_parse: ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )
sentence2_binary_parse: ( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )
sentence1_parse: (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))
sentence2_parse: (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))
sentence1: A person on a horse jumps over a broken down airplane.
sentence2: A person is training his horse for a competition.
captionID: 3416050480.jpg#4
pairID: 3416050480.jpg#4r1n
label1: neutral
label2, label3, label4, label5: (empty)
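The record layout above can be read with Python's standard `csv` module. The following is a rough sketch, assuming the corpus file is tab-separated text whose first line holds the column names listed above. The helper names `read_rows` and `tokens_from_binary_parse` are hypothetical, and the demo keeps only five of the columns from the sample record to stay short.

```python
import csv
import io


def read_rows(handle):
    """Yield one dict per data row; keys come from the header line."""
    # QUOTE_NONE treats quote characters inside sentences as plain text
    # instead of as CSV field delimiters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    yield from reader


def tokens_from_binary_parse(parse):
    """Recover the token sequence of a binary parse by dropping brackets."""
    return [tok for tok in parse.split() if tok not in ("(", ")")]


# Demo input: a subset of the columns, with values copied from the sample record.
header = ["gold_label", "sentence2_binary_parse", "sentence1", "sentence2", "pairID"]
values = [
    "neutral",
    "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )",
    "A person on a horse jumps over a broken down airplane.",
    "A person is training his horse for a competition.",
    "3416050480.jpg#4r1n",
]
demo = "\t".join(header) + "\n" + "\t".join(values) + "\n"

rows = list(read_rows(io.StringIO(demo)))
row = rows[0]
hypothesis_tokens = tokens_from_binary_parse(row["sentence2_binary_parse"])
print(row["gold_label"], hypothesis_tokens)
```

For a real file, the `io.StringIO(demo)` argument would be replaced by an open file handle. Reading tokens from the binary parse is one convenient way to get a tokenization that matches the parser's.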

