Feature Engineering: Understanding Interaction Features and Polynomial Features

Posted by python__reported on 2020-12-29

Original URL: https://blog.csdn.net/python__reported/article/details/111938653


1. Understanding

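Interaction features and polynomial features are often discussed together with preprocessing steps such as MinMaxScaler, since all of them transform the raw data before a model is fitted. They do different things, however. MinMaxScaler only rescales each feature into [0, 1] and keeps the number of columns the same. PolynomialFeatures adds new columns: powers of each feature (e.g. x1^2) and products of feature pairs, the interaction terms (e.g. x1*x2).
Scaling puts features on comparable ranges, which matters for a model like Ridge whose L2 penalty depends on feature scale. Interaction and polynomial features instead let a linear model fit nonlinear relationships, which can make its predictions more accurate.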

2. Comparison test code

from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this script needs scikit-learn < 1.2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

"""Boston housing dataset"""
"""Load the data and use MinMaxScaler to scale it into the range [0, 1]."""
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=0)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

"""Extract polynomial and interaction features"""
poly = PolynomialFeatures(degree=2).fit(X_train_scaled)
X_train_poly = poly.transform(X_train_scaled)
X_test_poly = poly.transform(X_test_scaled)
print("X_train.shape: {}".format(X_train.shape))
print("X_train_poly.shape: {}".format(X_train_poly.shape))

"""Compare Ridge on data with and without interaction features"""
ridge = Ridge().fit(X_train, y_train)
print("Origin: {:.3f}".format(ridge.score(X_test, y_test)))
ridge = Ridge().fit(X_train_scaled, y_train)
print("Score without interactions: {:.3f}".format(ridge.score(X_test_scaled, y_test)))
ridge = Ridge().fit(X_train_poly, y_train)
print("Score with interactions: {:.3f}".format(ridge.score(X_test_poly, y_test)))


"""Random forest"""
rf = RandomForestRegressor(n_estimators=400).fit(X_train, y_train)
print("Origin: {:.3f}".format(rf.score(X_test, y_test)))
rf = RandomForestRegressor(n_estimators=400).fit(X_train_scaled, y_train)
print("Score without interactions: {:.3f}".format(rf.score(X_test_scaled, y_test)))
rf = RandomForestRegressor(n_estimators=400).fit(X_train_poly, y_train)
print("Score with interactions: {:.3f}".format(rf.score(X_test_poly, y_test)))
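# Interaction features help the linear Ridge model but not the tree-based random forest.
# Trees are insensitive to per-feature monotonic rescaling, so the RF gap between the unscaled and scaled runs comes only from randomness (no random_state is set).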

Output

X_train.shape: (379, 13)
X_train_poly.shape: (379, 105)
Origin: 0.627
Score without interactions: 0.621
Score with interactions: 0.753
Origin: 0.791
Score without interactions: 0.802
Score with interactions: 0.768
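The jump from 13 to 105 columns in the output follows from counting monomials: with n input features and degree d, PolynomialFeatures (default include_bias=True) produces C(n + d, d) columns, here C(15, 2) = 105, i.e. 1 bias + 13 linear terms + 13 squares + 78 pairwise products. Passing interaction_only=True drops the squares, leaving 1 + 13 + 78 = 92. The sketch below checks both counts on random data that only borrows the shape of X_train_scaled; it is a stand-in, not the Boston data.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_samples, n_features, degree = 379, 13, 2

# Closed form: number of monomials of degree <= d in n variables (bias included)
expected_full = comb(n_features + degree, degree)
# interaction_only=True keeps the bias, the linear terms and the pairwise products
expected_interaction = 1 + n_features + comb(n_features, 2)

# Random stand-in with the same shape as X_train_scaled (not the real data)
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_features))

full = PolynomialFeatures(degree=degree).fit_transform(X)
inter = PolynomialFeatures(degree=degree, interaction_only=True).fit_transform(X)

print(expected_full, full.shape)
print(expected_interaction, inter.shape)
```

The count grows roughly with n^2 / 2 at degree 2, so on wider datasets interaction_only=True or a lower degree keeps the expanded matrix manageable.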
