xgboost feature selection: ranking features by importance

Posted by weixin_34255793 on 2018-04-17
import pandas as pd
import xgboost as xgb
import operator
from matplotlib import pylab as plt

def ceate_feature_map(features):
    # write a feature-map file in the format xgboost expects:
    # <index>\t<feature name>\t<type>  (q = quantitative)
    with open('xgb.fmap', 'w') as outfile:
        for i, feat in enumerate(features):
            outfile.write('{0}\t{1}\tq\n'.format(i, feat))

def get_data():
    train = pd.read_csv("../input/train.csv")

    features = list(train.columns[2:])

    y_train = train.Hazard

    # target-mean encode categorical columns: replace each category
    # with the mean Hazard of the rows in that category
    for feat in train.select_dtypes(include=['object']).columns:
        m = train.groupby(feat)['Hazard'].mean()
        train[feat] = train[feat].map(m)

    x_train = train[features]

    return features, x_train, y_train

def get_data2():
    from sklearn.datasets import load_iris
    # load the iris dataset as a quick stand-in for the Kaggle data
    iris = load_iris()
    x_train = pd.DataFrame(iris.data)
    features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    x_train.columns = features
    y_train = pd.DataFrame(iris.target)
    return features, x_train, y_train

#features, x_train, y_train = get_data()
features, x_train, y_train = get_data2()
ceate_feature_map(features)

xgb_params = {"objective": "reg:linear",  # renamed "reg:squarederror" in newer xgboost releases
              "eta": 0.01, "max_depth": 8, "seed": 42, "silent": 1}
num_rounds = 1000

dtrain = xgb.DMatrix(x_train, label=y_train)
gbdt = xgb.train(xgb_params, dtrain, num_rounds)

importance = gbdt.get_fscore(fmap='xgb.fmap')
importance = sorted(importance.items(), key=operator.itemgetter(1))

df = pd.DataFrame(importance, columns=['feature', 'fscore'])
df['fscore'] = df['fscore'] / df['fscore'].sum()

# horizontal bar chart of the normalized importance scores
df.plot(kind='barh', x='feature', y='fscore', legend=False, figsize=(16, 10))
plt.title('XGBoost Feature Importance')
plt.xlabel('relative importance')
plt.gcf().savefig('feature_importance_xgb.png')

Each split (which feature at which threshold) is chosen by the gain in the structure score; a feature's importance here is simply the total number of times it is used as a split point across all trees.

Reference: https://blog.csdn.net/q383700092/article/details/53698760
Aside: I ran into a problem while using xgboost.

A workaround circulating online:
       Create a new Python file and copy your code into it; renaming the file can also work. If that still fails, copy the code somewhere else (outside the original folder) so it gets recompiled, after which it runs normally.
       But I don't think this addresses the root cause, even though it works as a stopgap. Discussion welcome!
Root cause:
This is a mistake beginners, or anyone not very familiar with Python, tend to make, and avoiding it takes just one rule: never use a module name as a file name, for any file type! My error came from having a file named xgboost.* in the working folder; `import xgboost` searches the current directory first, which triggers the problem. So, once more: never name a file after a module!
Aside: if you see this warning:

D:\Program\Python3.5\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
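Note that this warning actually comes from scikit-learn, not xgboost: the old `sklearn.cross_validation` module was deprecated in 0.18 and removed in 0.20 in favor of `sklearn.model_selection`, so updating the import silences it regardless of which xgboost build is installed. A minimal migration sketch:

```python
# The deprecated import (removed in scikit-learn 0.20):
#   from sklearn.cross_validation import train_test_split
# migrates to:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```

Reinstalling xgboost, as described below, may still be worthwhile for other reasons, but it is not what the warning is about.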

First uninstall the existing xgboost: pip uninstall xgboost

Then download and install a newer xgboost wheel from: https://www.lfd.uci.edu/~gohlke/pythonlibs/#xgboost

Command: pip install xgboost-0.6-cp35-none-win_amd64.whl
