6.8 Summary
- Modeling the linguistic data found in corpora can help us understand linguistic patterns, and the resulting models can be used to make predictions about new language data.
- Supervised classifiers use labeled training corpora to build models that predict the label of an input based on specific features of that input.
- Supervised classifiers can perform a wide variety of NLP tasks, including document classification, part-of-speech tagging, sentence segmentation, dialogue act type identification, and determining entailment relations, among many others.
- When training a supervised classifier, you should split your corpus into three datasets: a training set for building the classifier model; a dev-test set for helping select and tune the model's features; and a test set for evaluating the final model's performance.
- When evaluating a supervised classifier, it is important to use fresh data that was not included in the training or dev-test sets. Otherwise, your evaluation results may be unrealistically optimistic.
- Decision trees are automatically constructed tree-structured flowcharts that are used to assign labels to input values based on their features. Although they're easy to interpret, they are not very good at handling cases where feature values interact in determining the proper label.
- In naive Bayes classifiers, each feature independently contributes to the decision of which label should be used. This allows feature values to interact, but can be problematic when two or more features are highly correlated with one another.
- Maximum Entropy classifiers use a basic model that is similar to the model used by naive Bayes; however, they employ iterative optimization to find the set of feature weights that maximizes the probability of the training set.
- Most of the models that are automatically constructed from a corpus are descriptive: they let us know which features are relevant to given patterns or constructions, but they don't give any information about causal relationships between those features and patterns.
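
The points above about the three-way corpus split and about naive Bayes can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the chapter's own code: the toy corpus (every three-letter string over a made-up eight-letter alphabet), the feature extractor `word_features`, and the helpers `split_corpus`, `train_naive_bayes`, `classify`, and `accuracy` are all hypothetical names invented here, and add-one smoothing is one simple choice among several.

```python
import itertools
import math
import random
from collections import Counter, defaultdict

VOWELS = "aeio"       # made-up toy alphabet, not real corpus data
CONSONANTS = "knst"


def word_features(word):
    """Hypothetical feature extractor: feature name -> feature value."""
    return {"first_letter": word[0], "last_letter": word[-1]}


def split_corpus(labeled, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then slice into training, dev-test, and test sets."""
    items = list(labeled)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


def train_naive_bayes(train):
    """Count labels and, per (label, feature name), the feature values."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, name) -> Counter of values
    seen_values = defaultdict(set)       # name -> values seen in training
    for feats, label in train:
        label_counts[label] += 1
        for name, value in feats.items():
            value_counts[(label, name)][value] += 1
            seen_values[name].add(value)
    return label_counts, value_counts, seen_values


def classify(model, feats):
    """Pick the label maximizing log P(label) + sum of log P(value | label)."""
    label_counts, value_counts, seen_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for name, value in feats.items():
            c = value_counts[(label, name)][value]
            # Add-one smoothing; the extra +1 reserves mass for unseen values.
            score += math.log((c + 1) / (count + len(seen_values[name]) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, gold):
    """Fraction of (features, label) pairs that the model labels correctly."""
    correct = sum(classify(model, feats) == label for feats, label in gold)
    return correct / len(gold)


# Toy labeled corpus: the label depends only on the last letter.
words = ["".join(p) for p in itertools.product(VOWELS + CONSONANTS, repeat=3)]
labeled = [
    (word_features(w), "vowel-final" if w[-1] in VOWELS else "consonant-final")
    for w in words
]

train_set, dev_set, test_set = split_corpus(labeled)
model = train_naive_bayes(train_set)

# Tune features against the dev-test set; touch the test set only at the end.
dev_acc = accuracy(model, dev_set)
test_acc = accuracy(model, test_set)
print(f"dev-test accuracy: {dev_acc:.2f}, test accuracy: {test_acc:.2f}")
```

In this sketch each feature adds its own log-probability term to the score, which is the "independent contribution" the summary describes. The test set is scored only once, after any feature tuning against the dev-test set, so the final number is not inflated by data the model or its designer has already seen.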