Feature Engineering: Understanding Interaction Features and Polynomial Features

Posted by python__reported on 2020-12-29

1. Understanding

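Interaction features and polynomial features are, like MinMaxScaler in data preprocessing, a transform applied to the data before a model is fitted. They are not the same kind of transform, though: MinMaxScaler rescales each existing feature to a fixed range (0 to 1 by default), while PolynomialFeatures adds new columns, namely the powers of each feature and the products of pairs of features.
Scaling puts the features on a comparable range, so the weights and bias respond to each of them more evenly. The added polynomial and interaction terms let a linear model capture nonlinear relationships between features, which can make its predictions more accurate.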

2. Comparing the test code

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this script needs a scikit-learn version older than 1.2.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

# Boston housing dataset
# Load the data and use MinMaxScaler to rescale it to the range 0-1.
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(
    boston.data, boston.target, random_state=0)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Extract polynomial features and interaction features
poly = PolynomialFeatures(degree=2).fit(X_train_scaled)
X_train_poly = poly.transform(X_train_scaled)
X_test_poly = poly.transform(X_test_scaled)
print("X_train.shape: {}".format(X_train.shape))
print("X_train_poly.shape: {}".format(X_train_poly.shape))

# Compare Ridge on the raw data, the scaled data, and the data with interaction features
ridge = Ridge().fit(X_train, y_train)
print("Origin: {:.3f}".format(ridge.score(X_test, y_test)))
ridge = Ridge().fit(X_train_scaled, y_train)
print("Score without interactions: {:.3f}".format(ridge.score(X_test_scaled, y_test)))
ridge = Ridge().fit(X_train_poly, y_train)
print("Score with interactions: {:.3f}".format(ridge.score(X_test_poly, y_test)))

# Random forest (no random_state is set, so these scores vary slightly between runs)
rf = RandomForestRegressor(n_estimators=400).fit(X_train, y_train)
print("Origin: {:.3f}".format(rf.score(X_test, y_test)))
rf = RandomForestRegressor(n_estimators=400).fit(X_train_scaled, y_train)
print("Score without interactions: {:.3f}".format(rf.score(X_test_scaled, y_test)))
rf = RandomForestRegressor(n_estimators=400).fit(X_train_poly, y_train)
print("Score with interactions: {:.3f}".format(rf.score(X_test_poly, y_test)))
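# Both models here are regressors. The polynomial and interaction features raise the
# score of the linear model (Ridge) but slightly lower that of the tree-based model
# (random forest).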

Output

X_train.shape: (379, 13)
X_train_poly.shape: (379, 105)
Origin: 0.627
Score without interactions: 0.621
Score with interactions: 0.753
Origin: 0.791
Score without interactions: 0.802
Score with interactions: 0.768
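
One plausible reading of these scores is that a linear model cannot represent a product of two features unless that product is supplied as a column, while a tree ensemble can approximate it from the raw features. The sketch below checks this on synthetic data, not on the Boston data: the target `y = x0 * x1` plus a little noise, the sample size, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): the target is a pure interaction.
rng = np.random.RandomState(0)
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.01, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree-2 expansion adds the x0*x1 column that the target depends on.
poly = PolynomialFeatures(degree=2).fit(X_train)
X_train_poly = poly.transform(X_train)
X_test_poly = poly.transform(X_test)

# Ridge on the raw features has no term that can express x0*x1.
ridge_raw = Ridge().fit(X_train, y_train).score(X_test, y_test)
# Ridge on the expanded features can fit the product directly.
ridge_poly = Ridge().fit(X_train_poly, y_train).score(X_test_poly, y_test)
# A random forest works from the raw features by splitting on both of them.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf_raw = rf.fit(X_train, y_train).score(X_test, y_test)

print("Ridge, raw features: {:.3f}".format(ridge_raw))
print("Ridge, polynomial features: {:.3f}".format(ridge_poly))
print("Random forest, raw features: {:.3f}".format(rf_raw))
```

On real data the effect is usually smaller than in this constructed case, and adding 92 extra columns to a 379-row training set can also add noise for a tree-based model, which would be consistent with the small drop seen for the random forest above.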
