XGBoost Feature Importance Calculation

Posted by 躲不過這哀傷 on 2018-11-13

XGBoost provides three methods for computing feature importance:

- 'weight' - the number of times a feature is used to split the data across all trees.
- 'gain' - the average gain of the feature when it is used in trees.
- 'cover' - the average coverage of the feature when it is used in trees.

In short:

- weight is the total count of tree nodes, across all trees, at which the feature is used to split;
- gain is the average gain over the splits that use the feature;
- cover is the most obscure of the three. [R-package/man/xgb.plot.tree.Rd](https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd) gives a fuller explanation: "the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be." In other words, a node's coverage is the sum of the second-order gradients of the training samples routed to that node, and a feature's score is its average coverage over the nodes where it splits.
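To make the cover definition concrete, here it is as a formula, using the $h_i$ notation from the XGBoost paper (the notation is borrowed from that paper, not from this post):

$$
\mathrm{cover}(\text{node}) = \sum_{i \in \text{node}} h_i, \qquad h_i = \frac{\partial^2\, l(y_i, \hat{y}_i)}{\partial \hat{y}_i^2},
$$

so for square loss, where $h_i = 1$, a node's cover is simply the number of training instances routed to it.

All three metrics can be read off a trained model through the `Booster.get_score` API. Below is a minimal sketch on synthetic data; the data, feature names, and training parameters are illustrative assumptions, not part of the original post:

```python
import numpy as np
import xgboost as xgb

# Toy regression data: y depends strongly on f0, weakly on f1, not on f2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

dtrain = xgb.DMatrix(X, label=y, feature_names=["f0", "f1", "f2"])
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=10)

# The three importance types discussed above; get_score returns a
# {feature_name: score} dict for each type
for imp_type in ("weight", "gain", "cover"):
    print(imp_type, booster.get_score(importance_type=imp_type))
```

With square loss the printed cover values should track the average number of samples reaching the feature's split nodes, matching the interpretation above.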

Returning to the example from Li Hang's book, we color each feature differently and draw the trees in the figure below.
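The figure itself is not reproduced here. As a rough stand-in, xgboost ships plotting helpers: `xgb.plot_tree` (requires graphviz) draws the trees, and `xgb.plot_importance` draws a per-feature bar chart. A minimal sketch, reusing the same assumed synthetic setup as above:

```python
import matplotlib.pyplot as plt
import numpy as np
import xgboost as xgb

# Same illustrative setup as the previous sketch
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)
booster = xgb.train({"objective": "reg:squarederror"},
                    xgb.DMatrix(X, label=y), num_boost_round=10)

# Bar chart of per-feature importance; swap importance_type for
# "weight" or "cover" to compare the three metrics visually
xgb.plot_importance(booster, importance_type="gain")
plt.show()
```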

Reposted from: https://www.cnblogs.com/cupleo/p/9951436.html
