Machine Learning (10) - Decision Tree

Posted by Rachel on 2019-06-09

Original article: https://learnku.com/articles/29562


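Introduction

For a dataset like the one shown above, we can easily use Logistic Regression to draw the decision boundary.

However, for datasets whose points are distributed more irregularly, a single line is not enough. As in the figure below, several dividing lines are needed to classify the data reasonably accurately. This is where the Decision Tree algorithm comes in.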

[Figure: a dataset that needs several dividing lines]

Use case

Predict whether a salary exceeds 100k from three features: company, job, and degree. Here is the data:

[Figure: the salary dataset]

Given this requirement, the first thing most of us do is sketch a tree-shaped classification in our heads:

[Figure: tree that splits on company first]

The tree above uses company as the top-level split. We could also split on degree first, which produces a different tree:

[Figure: tree that splits on degree first]

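The current data is actually very simple. In a real application the tree could be 50 levels deep, which would be hard to draw by hand like this. That is the job DecisionTreeClassifier does for us.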
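To make "different top-level splits give different trees" concrete, here is a minimal sketch of one way a candidate split can be scored: Gini impurity, which is the default criterion of scikit-learn's DecisionTreeClassifier. The rows are made up for illustration and are not the article's dataset, and the helper names are hypothetical. The sketch groups rows by category value, whereas scikit-learn makes binary threshold splits on encoded numbers, so treat it as intuition rather than the library's exact procedure.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini_after_split(rows, feature, target):
    """Size-weighted average impurity of the groups created by one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

# Made-up rows (NOT the article's salaries.csv)
rows = [
    {"company": "A", "degree": "bachelors", "high_salary": 0},
    {"company": "A", "degree": "masters", "high_salary": 0},
    {"company": "B", "degree": "bachelors", "high_salary": 1},
    {"company": "B", "degree": "masters", "high_salary": 1},
]

by_company = weighted_gini_after_split(rows, "company", "high_salary")
by_degree = weighted_gini_after_split(rows, "degree", "high_salary")
print(by_company, by_degree)
```

In this toy data, splitting on company separates the classes perfectly while splitting on degree does not, so a lower weighted impurity marks the better first split.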

How to use it

Load the data file

import pandas as pd
df = pd.read_csv('/Users/rachel/Downloads/py-master/ML/9_decision_tree/salaries.csv')
df

[Output: the loaded DataFrame]

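Convert non-numeric columns

Machine learning models work on numbers only, so the company, job, and degree columns must all be converted to numeric values. LabelEncoder is used here. Columns such as company and job are nominal rather than ordinal, so the integer codes imply an order that does not really exist. Because we are using DecisionTreeClassifier, which can isolate individual codes through repeated threshold splits, LabelEncoder is usually acceptable in this case.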

from sklearn.preprocessing import LabelEncoder
le_company = LabelEncoder()
le_job = LabelEncoder()
le_degree = LabelEncoder()

dfle = df.copy()  # work on a copy so the original df keeps its string values
dfle.company = le_company.fit_transform(dfle.company)
dfle.job = le_job.fit_transform(dfle.job)
dfle.degree = le_degree.fit_transform(dfle.degree)
dfle.head()

[Output: first rows of the encoded DataFrame]
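After encoding, it is easy to lose track of which integer stands for which label. The sketch below shows one way to inspect the mapping through the encoder's `classes_` attribute and to reverse it with `inverse_transform`. The company names are made-up sample values, not necessarily those in salaries.csv.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up values for illustration only
companies = ["google", "abc pharma", "facebook", "google"]

le = LabelEncoder()
codes = le.fit_transform(companies)

# classes_ holds the original labels in sorted order;
# a label's position in classes_ is its integer code
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns the codes back into the original strings
restored = le.inverse_transform(codes)
print(list(restored))
```

Keeping one fitted encoder per column, as the article does with le_company, le_job, and le_degree, is what makes this lookup possible later.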

Prepare the training data

inputs = dfle.drop('salary_more_then_100k', axis = 'columns')
target = dfle['salary_more_then_100k']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size = 0.3)
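Note that train_test_split shuffles the rows, so each run of the call above can produce a different split. If reproducible results are wanted, one option is to pass `random_state`. The sketch below uses made-up data and an arbitrary seed value of 42.

```python
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with one feature each and a binary label
X = [[i] for i in range(10)]
y = [0, 1] * 5

# The same random_state (arbitrary value) gives the same shuffle every time
split_a = train_test_split(X, y, test_size=0.3, random_state=42)
split_b = train_test_split(X, y, test_size=0.3, random_state=42)

X_train, X_test, y_train, y_test = split_a
print(len(X_train), len(X_test))
print(split_a == split_b)
```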

Train the model

from sklearn import tree
model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)
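Once the model is fitted, a natural next step is to check its accuracy with `score` and to classify new samples with `predict`. The following is a self-contained sketch on made-up rows (not the contents of salaries.csv). It keeps the encoders in a dictionary, which is an illustrative choice, so that a new sample can be encoded the same way as the training data.

```python
import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration; NOT the contents of salaries.csv
df = pd.DataFrame({
    "company": ["google", "google", "abc pharma", "abc pharma", "facebook", "facebook"],
    "job": ["sales executive", "business manager", "sales executive",
            "business manager", "sales executive", "business manager"],
    "degree": ["bachelors", "masters", "bachelors", "masters", "bachelors", "masters"],
    "salary_more_then_100k": [0, 1, 0, 0, 1, 1],
})

# One encoder per column so each mapping can be reused at prediction time
encoders = {col: LabelEncoder() for col in ["company", "job", "degree"]}
inputs = pd.DataFrame({col: enc.fit_transform(df[col]) for col, enc in encoders.items()})
target = df["salary_more_then_100k"]

model = tree.DecisionTreeClassifier(random_state=0)
model.fit(inputs, target)

# Accuracy on the training rows; with real data, score the held-out test set
train_accuracy = model.score(inputs, target)

# Encode a new sample with the same encoders before predicting
sample = pd.DataFrame({
    "company": ["google"], "job": ["business manager"], "degree": ["masters"],
})
encoded = pd.DataFrame({col: encoders[col].transform(sample[col]) for col in encoders})
prediction = model.predict(encoded)
print(train_accuracy, prediction)
```

With the article's own variables, the equivalent calls would be `model.score(X_test, y_test)` and `model.predict(...)` on rows encoded with le_company, le_job, and le_degree.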
