Artificial Intelligence - Machine Learning - Logistic Regression

Posted by 最爱喝开水 on 2024-05-15

Dataset: https://www.123pan.com/s/RbfGjv-vOem3.html
Extraction code: rgzn

# I. Logistic regression: predicting whether a student passes an exam

## 1. Import the modules

```python
# Import the modules
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```

## 2. Read the data

```python
# Load the data, then show the first few rows to confirm it loaded correctly.
data = pd.read_csv('../data/examdata.csv')
data.head()     # show the first few rows
```

## 3. Visualize the data

```python
# Visualize the data
# Scatter plot with Exam1 on the x-axis and Exam2 on the y-axis
fig1 = plt.figure()
plt.scatter(data.loc[:, 'Exam1'], data.loc[:, 'Exam2'])
plt.title('Exam1-Exam2')
plt.xlabel('Exam1')
plt.ylabel('Exam2')
plt.show()
```

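## 4. Add a label mask

The mask marks records that passed the exam as True and records that failed as False, which makes it easy to separate the two groups.

```python
# Build the label mask
mask = data.loc[:, 'Pass'] == 1
# The comparison returns a Boolean Series: True where 'Pass' equals 1, False otherwise.
# It is stored in mask and used below to filter the rows.
print(~mask)
# ~ negates the Boolean Series element-wise
```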
## 5. Visualize the labeled data

```python
# Visualize the labeled data
fig2 = plt.figure()
passed = plt.scatter(data.loc[:, 'Exam1'][mask], data.loc[:, 'Exam2'][mask])
# Scatter plot on the new figure: x is the 'Exam1' column filtered by mask, y is the matching 'Exam2' values.
# These points are the rows where 'Pass' equals 1, i.e. the records that passed.
# The scatter object is kept in passed so it can be shown in the legend.
failed = plt.scatter(data.loc[:, 'Exam1'][~mask], data.loc[:, 'Exam2'][~mask])
# Likewise, failed is the scatter plot of the records that did not pass.
plt.title('Exam1-Exam2')
plt.xlabel('Exam1')
plt.ylabel('Exam2')
# Add a legend: passed and failed are the scatter objects above, ('passed', 'failed') are their text labels.
plt.legend((passed, failed), ('passed', 'failed'))
plt.show()
```

## 6. Define X and y

```python
# Define X and y
X = data.drop(['Pass'], axis=1)
# drop removes the given labels. ['Pass'] is the column to remove; axis=1 means columns (the default, axis=0, removes rows).
# So X is every column of data except 'Pass'.
y = data.loc[:, 'Pass']  # y is the 'Pass' column
X1 = data.loc[:, 'Exam1']  # X1 is the 'Exam1' column
X2 = data.loc[:, 'Exam2']  # X2 is the 'Exam2' column
print(X1.head())    # print the first few rows of X1
print(X.shape, y.shape)  # print the shapes of X and y
# Shapes of X and y: (100, 2) (100,)
```

## 7. Create and train the model

```python
# Create and train the model
LR = LogisticRegression()  # create the logistic regression model
LR.fit(X, y)  # train the model
```

## 8. Predict and evaluate the model

Predict the results

```python
# Predict on the same data the model was trained on
y_predict = LR.predict(X)
print(y_predict)
```

The predictions are:

```
[0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 1 0 0 1 1 0 1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 1
 1 0 0 1 0 1 1 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1
 1 1 1 1 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 1 0 1]
```

Compute the accuracy

```python
# Compute the accuracy (fraction of correct predictions)
accuracy = accuracy_score(y, y_predict)
print(accuracy)		# 0.89
```
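As a small illustration of what `accuracy_score` measures, the sketch below compares it with the plain fraction of matching labels. The label arrays are made up for the example; they are not the article's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0, 1])   # made-up true labels
y_pred = np.array([0, 1, 0, 0, 1])   # made-up predictions (one mistake)

manual = np.mean(y_true == y_pred)   # fraction of positions where the labels agree
library = accuracy_score(y_true, y_pred)
print(manual, library)
```

Both numbers agree because accuracy is simply the share of correct predictions. Note that the score in the article is computed on the same 100 rows used for training, so it describes fit rather than performance on unseen data.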

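Predict whether a student with Exam1=70 and Exam2=65 passes

```python
# Predict whether the given scores pass: Exam1=70, Exam2=65
y_test = LR.predict(pd.DataFrame([[70, 65]], columns=X.columns))  # reuse the training column names
print('passed' if y_test[0] == 1 else 'failed')	# passed
```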
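Behind `predict` there is a probability. The following sketch, which uses synthetic data rather than the article's dataset (the generated scores and the pass rule `sum > 125` are assumptions), shows that `predict_proba` for a binary model matches the sigmoid of the linear score built from `intercept_` and `coef_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the exam data (made up, not the author's dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(30, 100, size=(200, 2))
y_demo = (X_demo.sum(axis=1) > 125).astype(int)

model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

sample = np.array([[70.0, 65.0]])
z = model.intercept_[0] + sample @ model.coef_[0]   # linear score theta0 + theta1*x1 + theta2*x2
p_manual = 1.0 / (1.0 + np.exp(-z))                 # sigmoid of the score
p_sklearn = model.predict_proba(sample)[:, 1]       # probability of class 1
print(p_manual, p_sklearn, model.predict(sample))
```

The predicted class is 1 exactly when this probability is above 0.5, which is the same as the linear score being positive.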

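## 9. Compute the decision boundary

The linear part of the model is a function, for example f(x) = theta0 + theta1*x1 + theta2*x2.

In this function theta0 is the intercept, and theta1 and theta2 are the coefficients (weights).

Logistic regression passes f(x) through the sigmoid function and predicts class 1 when the resulting probability exceeds 0.5, which happens exactly when f(x) > 0. The boundary function is therefore theta0 + theta1*x1 + theta2*x2 = 0.

Solving for x2 gives x2 = -(theta0 + theta1*x1)/theta2.

Once the model has been trained, theta0, theta1 and theta2 are fixed.
The boundary is then a relation between x1 and x2: each x1 determines one x2. Treating X1 as the independent variable, we can compute X2_new.
Each value in X1 and the matching value in X2_new define a point; connecting these points draws the boundary.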
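A quick way to check the rearranged formula is to confirm that points computed from it have a linear score of zero. This is a sketch on synthetic data (the generated scores and the labeling rule are assumptions, not the article's dataset).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data (made up for illustration).
rng = np.random.default_rng(1)
X_demo = rng.uniform(30, 100, size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 125).astype(int)
model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

theta0 = model.intercept_[0]
theta1, theta2 = model.coef_[0]

x1_line = np.linspace(30, 100, 5)                  # a few x1 values
x2_line = -(theta0 + theta1 * x1_line) / theta2    # x2 on the boundary for each x1
scores = model.decision_function(np.column_stack([x1_line, x2_line]))
print(np.round(scores, 10))
```

`decision_function` returns theta0 + theta1*x1 + theta2*x2, so values that are zero up to floating-point error confirm that the computed points lie on the boundary.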

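Get theta and compute the decision boundary

The coefficients theta1 and theta2 are stored in coef_, and the intercept theta0 in intercept_.

```python
# LR.coef_ holds the model coefficients (weights).
# print(LR.coef_)     # [[0.20535491 0.2005838 ]]

# Get theta0, theta1, theta2
theta0 = LR.intercept_  # intercept (a one-element array)
theta1, theta2 = LR.coef_[0][0], LR.coef_[0][1]  # weights of 'Exam1' and 'Exam2'
print(theta0, theta1, theta2)	# [-25.05219314] 0.205354912177904 0.2005838039546907

# Using X1 as the independent variable, compute X2_new.
X2_new = -(theta0 + theta1 * X1) / theta2
print(X2_new)
```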

The resulting X2_new:

```
0     89.449169
1     93.889277
2     88.196312
3     63.282281
4     43.983773
        ...    
95    39.421346
96    81.629448
97    23.219064
98    68.240049
99    48.341870
Name: Exam1, Length: 100, dtype: float64
```

## 10. Plot the decision boundary

```python
# Visualize the boundary function
fig3 = plt.figure()
passed = plt.scatter(data.loc[:, 'Exam1'][mask], data.loc[:, 'Exam2'][mask])  # passed
failed = plt.scatter(data.loc[:, 'Exam1'][~mask], data.loc[:, 'Exam2'][~mask])  # failed
plt.plot(X1, X2_new)  # draw the boundary line
plt.title('Exam1-Exam2')
plt.xlabel('Exam1')
plt.ylabel('Exam2')
plt.legend((passed, failed), ('passed', 'failed'))  # add the legend
plt.show()
```

# II. Logistic regression: chip quality testing

## 1. Import the modules

```python
# Import the modules
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np
```

## 2. Load the dataset

```python
# Load the data
data = pd.read_csv('../data/chip_test.csv')
print(data)
```

## 3. Add the mask

```python
# Build the label mask: True where the chip passed
mask = data.loc[:,'pass']==1
print(mask)
```

## 4. Visualize the labeled data

```python
# Visualize the labeled data
fig1 = plt.figure()
passed = plt.scatter(data.loc[:,'test1'][mask],data.loc[:,'test2'][mask])
failed = plt.scatter(data.loc[:,'test1'][~mask],data.loc[:,'test2'][~mask])
plt.title('test1-test2')
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()
```

## 5. Define X and y

```python
# Define X and y
X = data.drop(['pass'],axis=1)  # X holds the test1 and test2 columns
y = data.loc[:,'pass']      # y is the pass column
X1 = data.loc[:,'test1']
X2 = data.loc[:,'test2']
X1.head()
# Create new data: add the second-order (quadratic) features
X1_2 = X1*X1
X2_2 = X2*X2
X1_X2 = X1*X2
X_new = {'X1':X1,'X2':X2,'X1_2':X1_2,'X2_2':X2_2,'X1_X2':X1_X2}
X_new = pd.DataFrame(X_new)
print(X_new)
```

## 6. Create and train the model

```python
# Create and train the model on the expanded features
LR = LogisticRegression()
LR.fit(X_new,y)
```

## 7. Evaluate the model

```python
# Evaluate the model
y_predict = LR.predict(X_new)   # predict with the trained model

# Compute the accuracy
accuracy =  accuracy_score(y,y_predict)
print(accuracy)		# 0.8135593220338984
```

## 8. Draw a preliminary decision boundary

```python
# Preliminary decision boundary

# Sort X1 so the curve is drawn from left to right
X1_new = X1.sort_values()

# Get theta (same order as the columns of X_new)
theta0 = LR.intercept_[0]
theta1,theta2,theta3,theta4,theta5 = LR.coef_[0]
# The boundary is a quadratic in X2: a*X2**2 + b*X2 + c = 0
a = theta4
b = theta5*X1_new+theta2
c = theta0+theta1*X1_new+theta3*X1_new*X1_new
# Only the '+' root is computed here, so only one branch of the boundary is drawn;
# the result is NaN wherever the discriminant is negative.
X2_new_boundary = (-b+np.sqrt(b*b-4*a*c))/(2*a)

# Plot the decision boundary
fig2 = plt.figure()
passed = plt.scatter(data.loc[:,'test1'][mask],data.loc[:,'test2'][mask])
failed = plt.scatter(data.loc[:,'test1'][~mask],data.loc[:,'test2'][~mask])
plt.plot(X1_new,X2_new_boundary)
plt.title('test1-test2')
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()
```
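The values a, b and c in the code follow from treating the boundary equation as a quadratic in x2. A short derivation, assuming the feature order X1, X2, X1_2, X2_2, X1_X2 used to build X_new (so theta3 multiplies x1^2, theta4 multiplies x2^2 and theta5 multiplies x1*x2):

```latex
\begin{aligned}
\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2 &= 0 \\
\underbrace{\theta_4}_{a}\, x_2^2
  + \underbrace{(\theta_5 x_1 + \theta_2)}_{b}\, x_2
  + \underbrace{(\theta_0 + \theta_1 x_1 + \theta_3 x_1^2)}_{c} &= 0 \\
x_2 &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{aligned}
```

For a given x1 there are two real solutions when b^2 - 4ac > 0, one when it equals zero, and none when it is negative, which is why the square root produces NaN for some x1 values.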

## 9. Draw the complete decision boundary

```python
# Draw the complete decision boundary
d = np.array(b*b-4*a*c)     # discriminant (delta); negative where no real X2 exists
# d = (-b+np.sqrt(b*b-4*a*c))/(2*a)
X1_new
# print(np.array(d))
```

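Define f(x) to compute the upper and lower boundaries

```python
# Define f(x)
def f(x):
    """
    Compute both branches of the boundary.
    Returns the two possible X2 boundary values for the input x.
    Which of the two is the upper one depends on the sign of a.
    """
    a = theta4
    b = theta5*x+theta2
    c = theta0+theta1*x+theta3*x*x
    # The two roots of a*X2**2 + b*X2 + c = 0
    X2_new_boundary1 = (-b+np.sqrt(b*b-4*a*c))/(2*a)    # '+' root
    X2_new_boundary2 = (-b-np.sqrt(b*b-4*a*c))/(2*a)    # '-' root
    return X2_new_boundary1,X2_new_boundary2
```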

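Compute the upper and lower boundaries

```python
# Compute the two boundary branches
# Lists that collect the boundary values
X2_new_boundary1 = []
X2_new_boundary2 = []
# For each x in X1_new, compute the two corresponding boundary values
for x in X1_new:
    X2_new_boundary1.append(f(x)[0])    # X2_new_boundary1
    X2_new_boundary2.append(f(x)[1])    # X2_new_boundary2
print(X2_new_boundary1,X2_new_boundary2)
```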
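The loop can also be replaced by one vectorized call. The sketch below is an assumption-laden variant rather than the author's code: `boundary_roots` is a hypothetical helper that takes the six thetas as an argument, and `demo_thetas` are made-up coefficients describing the unit circle 1 - x1^2 - x2^2 = 0, chosen so the roots are easy to check by hand.

```python
import numpy as np

def boundary_roots(x1, thetas):
    """Return the '+' and '-' roots of the quadratic boundary for each x1.

    Entries are NaN where the discriminant is negative (no real x2).
    """
    theta0, theta1, theta2, theta3, theta4, theta5 = thetas
    x1 = np.asarray(x1, dtype=float)
    a = theta4
    b = theta5 * x1 + theta2
    c = theta0 + theta1 * x1 + theta3 * x1 * x1
    disc = b * b - 4 * a * c
    root = np.sqrt(np.where(disc >= 0, disc, np.nan))   # NaN instead of a warning
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

# Hypothetical coefficients: theta0=1, theta3=theta4=-1, everything else 0.
demo_thetas = (1.0, 0.0, 0.0, -1.0, -1.0, 0.0)
root_plus, root_minus = boundary_roots([0.0, 0.6, 2.0], demo_thetas)
print(root_plus, root_minus)
```

Because a is negative in this example, the '+' root is the lower branch, which illustrates that the upper and lower labels depend on the sign of theta4.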

Visualize the decision boundary

```python
# Visualize the decision boundary
fig3 = plt.figure()
passed=plt.scatter(data.loc[:,'test1'][mask],data.loc[:,'test2'][mask])
failed=plt.scatter(data.loc[:,'test1'][~mask],data.loc[:,'test2'][~mask])
plt.plot(X1_new,X2_new_boundary1)   # branch from the '+' root
plt.plot(X1_new,X2_new_boundary2)   # branch from the '-' root
plt.title('test1-test2')
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()
```

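## 10. Draw the decision boundary on a denser grid

```python
# Use a denser set of x coordinates: 19000 points from -0.9 in steps of 0.0001
X1_range = [-0.9 + x/10000 for x in range(0,19000)]
X1_range = np.array(X1_range)   # convert to a NumPy array for faster computation
X2_new_boundary1 = []
X2_new_boundary2 = []
for x in X1_range:
    X2_new_boundary1.append(f(x)[0])
    X2_new_boundary2.append(f(x)[1])
```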

Visualize

```python
import matplotlib as mpl

mpl.rcParams['font.family'] = 'SimHei'  # use the SimHei font so the Chinese labels render
mpl.rcParams['axes.unicode_minus'] = False  # draw minus signs as ASCII hyphens, which the font can render
fig4 = plt.figure()
passed=plt.scatter(data.loc[:,'test1'][mask],data.loc[:,'test2'][mask])
failed=plt.scatter(data.loc[:,'test1'][~mask],data.loc[:,'test2'][~mask])
# Both branches of the decision boundary, in red
plt.plot(X1_range,X2_new_boundary1,'r')
plt.plot(X1_range,X2_new_boundary2,'r')
plt.xlabel('測試1')  # "Test 1"
plt.ylabel('測試2')  # "Test 2"
plt.title('晶片質量預測')  # "Chip quality prediction"
plt.legend((passed,failed),('passed','failed'))
plt.show()
```
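An alternative to solving the quadratic by hand is to evaluate the model's score on a 2-D grid and let matplotlib trace the zero level. The sketch below only builds the grid of scores; the synthetic points, the circle-shaped labeling rule, the `C=100.0` setting and the helper `quad_features` are all assumptions for illustration, not part of the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the chip data: points inside a circle "pass".
rng = np.random.default_rng(2)
pts = rng.uniform(-1, 1, size=(300, 2))
labels = (pts[:, 0] ** 2 + pts[:, 1] ** 2 < 0.5).astype(int)

def quad_features(x1, x2):
    """Hypothetical helper: columns in the article's order X1, X2, X1^2, X2^2, X1*X2."""
    return np.column_stack([x1, x2, x1 * x1, x2 * x2, x1 * x2])

model = LogisticRegression(C=100.0, max_iter=5000)
model.fit(quad_features(pts[:, 0], pts[:, 1]), labels)

# Score every grid point; the boundary is where the score crosses zero.
xx, yy = np.meshgrid(np.linspace(-1, 1, 201), np.linspace(-1, 1, 201))
zz = model.decision_function(quad_features(xx.ravel(), yy.ravel())).reshape(xx.shape)
print(zz.shape)
```

With this grid, `plt.contour(xx, yy, zz, levels=[0])` would draw the boundary as a single closed curve, without handling the two roots or the NaN gaps separately.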

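# III. Logistic regression: iris

## 1. Load the iris dataset

The iris dataset ships with the sklearn package.

```python
# Load the iris dataset
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[iris.target < 2][:, :2]  # keep only the first two features of the first two classes
y = iris.target[iris.target < 2]  # keep only labels 0 and 1 (the first two classes)
print(X, y)
X1 = X[:, 0]
X2 = X[:, 1]
print(X1)
print(X2)
```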

## 2. Visualize the dataset

```python
# Visualize the data, colored by class
from matplotlib import pyplot as plt

fig1 = plt.figure()
plt.scatter(X1, X2, c=y)
plt.show()
```

## 3. Create and train the model

```python
# Create the logistic regression model and train it
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(X,y)
```

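## 4. Predict and evaluate the model

```python
# Predict and evaluate
y_predict = LR.predict(X)
print(y_predict)

from sklearn.metrics import accuracy_score
accuracy =  accuracy_score(y,y_predict)     # compute the accuracy
print(accuracy)		# accuracy = 1 on the training data, so the fit is very good
```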
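The accuracy above is measured on the data the model was trained on. As a hedged sketch of how a held-out check might look, the block below splits the same two-class, two-feature subset with `train_test_split`; the 70/30 split, `random_state=0` and the `*_iris` variable names are choices made for this example, not something the article does.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris_demo = load_iris()
X_iris = iris_demo.data[iris_demo.target < 2][:, :2]   # first two features, classes 0 and 1
y_iris = iris_demo.target[iris_demo.target < 2]

# Hold out 30% of the rows; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris)

clf = LogisticRegression().fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(len(X_train), len(X_test), test_accuracy)
```

A test score close to the training score suggests the model generalizes; a large gap would point to overfitting.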

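## 5. Compute the decision boundary

```python
# Compute the decision boundary: theta0 + theta1*X1 + theta2*X2 = 0, solved for X2
theta0 = LR.intercept_
theta1, theta2 = LR.coef_[0][0], LR.coef_[0][1]
X2_new = (-theta0 - theta1 * X1) / theta2
print(X2_new)
```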

## 6. Visualize the decision boundary

```python
# Draw the decision boundary over the data
fig2 = plt.figure()
plt.scatter(X1, X2, c=y)
plt.plot(X1,X2_new)
plt.show()
```

# IV. Logistic regression key points

Boundary function
