from math import log
import operator


def calcShannonEnt(dataSet):
    """Compute the Shannon entropy of a data set.

    Each record's last element is treated as its class label.

    Args:
        dataSet: list of records (lists); the final item of each is the label.

    Returns:
        float: entropy in bits; 0.0 for a pure (single-class) data set.
    """
    numEntries = len(dataSet)
    # Tally how often each class label occurs.
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    shannonEnt = 0.0
    for count in labelCounts.values():
        prob = count / float(numEntries)
        shannonEnt -= prob * log(prob, 2)  # H = -sum(p * log2(p))
    return shannonEnt


def createDataSet():
    """Return the toy fish-classification data set and its feature labels."""
    dataSet = [[1, 1, 'yes'],
               [1, 1, 'yes'],
               [1, 0, 'no'],
               [0, 1, 'no'],
               [0, 1, 'no']]
    labels = ['no surfacing', 'flippers']
    return dataSet, labels


def splitDataSet(dataSet, axis, value):
    """Return the records whose feature `axis` equals `value`, with that
    feature column removed.

    The input data set is not modified; each selected record is copied.
    """
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            # New list: the record minus the splitting column.
            reducedFeatVec = featVec[:axis] + featVec[axis + 1:]
            retDataSet.append(reducedFeatVec)
    return retDataSet


def chooseBestFeatureToSplit(dataSet):
    """Return the index of the feature with the largest information gain.

    Returns -1 when no feature yields a positive gain.
    """
    numFeatures = len(dataSet[0]) - 1  # last column is the class label
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        # Weighted entropy of the partition induced by feature i.
        uniqueVals = set(example[i] for example in dataSet)
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy  # gain = H(S) - H(S|feature i)
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature


def majorityCnt(classList):
    """Return the most frequent class label in classList (majority vote)."""
    classCount = {}
    for vote in classList:
        classCount[vote] = classCount.get(vote, 0) + 1
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


def createTree(dataSet, labels):
    """Recursively build an ID3 decision tree.

    Args:
        dataSet: list of records; the last element of each is the class label.
        labels: feature names matching the feature columns of dataSet.
            Bug fix: the caller's list is no longer mutated (the original
            code deleted the chosen entry from it in place).

    Returns:
        A nested dict {featureLabel: {featureValue: subtree-or-label}},
        or a bare class label string for a leaf.
    """
    classList = [example[-1] for example in dataSet]
    # Stop condition 1: all records share one class -> pure leaf.
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # Stop condition 2: only the label column remains -> majority-vote leaf.
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    # Reduced label list for the recursion; the caller's `labels` is intact.
    subLabels = labels[:bestFeat] + labels[bestFeat + 1:]
    featValues = set(example[bestFeat] for example in dataSet)
    for value in featValues:
        myTree[bestFeatLabel][value] = createTree(
            splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree


if __name__ == "__main__":
    # Smoke tests reproducing the book's expected output.
    myDat, labels = createDataSet()
    print(myDat)
    print(splitDataSet(myDat, 0, 1))
    print(chooseBestFeatureToSplit(myDat))
    myDat, labels = createDataSet()
    myTree = createTree(myDat, labels)
    print(myTree)
Output:
[[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
[[1, 'yes'], [1, 'yes'], [0, 'no']]
0
{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
Reference:
《機器學習實戰》 (Peter Harrington, "Machine Learning in Action", Chapter 3: decision trees)