Machine Learning with Sklearn

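Posted by 天羽東臻 on 2020-12-11

Original URL: https://blog.csdn.net/lvxingvir_sigma/article/details/111027597

Python's sklearn is a genuinely great machine learning package.

Without further ado, here is a workflow I worked out myself for preliminary data analysis with sklearn (assuming the data is in an Excel file; see the code for details):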

1. Read in and clean the data

import pandas as pd

raw_data = pd.read_excel('**.xlsx')
clean_data = raw_data.dropna(axis=0, how='any')  # axis=0 drops every row that contains a NaN
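The axis argument of dropna is easy to mix up. A minimal sketch on a small hypothetical DataFrame (standing in for the Excel file, whose name is elided in the post) shows that axis=0 removes rows containing a NaN, while axis=1 removes columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Excel sheet.
raw_data = pd.DataFrame({
    "Path": [0, 1, 1, 0],
    "age": [61, np.nan, 55, 70],
    "size": [1.2, 3.4, np.nan, 0.8],
})

# axis=0: drop every ROW that has at least one NaN.
rows_dropped = raw_data.dropna(axis=0, how="any")
# axis=1: drop every COLUMN that has at least one NaN.
cols_dropped = raw_data.dropna(axis=1, how="any")

print(rows_dropped.shape)  # (2, 3): rows 1 and 2 are gone
print(cols_dropped.shape)  # (4, 1): only "Path" survives
```

For a feature table, axis=0 is usually what you want: it keeps all features and discards incomplete samples.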

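2. Feature correlation analysis

import matplotlib.pyplot as plt
import seaborn as sns

d = clean_data

# Pairwise Pearson correlation between numeric columns
# (numeric_only needs pandas >= 1.5; pandas 2.x raises on non-numeric columns without it)
corr = d.corr(numeric_only=True)

f, ax = plt.subplots(figsize=(15, 10))

# Draw the heatmap using seaborn
sns.heatmap(corr, vmax=1, square=True, cmap='Blues', ax=ax)
plt.show()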
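With many features the heatmap gets crowded. A small sketch of a hypothetical helper, top_correlated_pairs (not part of pandas or the original post), lists the strongly correlated pairs from the same correlation matrix; the threshold of 0.8 and the synthetic demo data are assumptions:

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Hypothetical helper: return feature pairs whose |Pearson r| >= threshold."""
    corr = df.corr(numeric_only=True)
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair appears once and the diagonal is skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    # Strongest correlations first.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Synthetic demo data: x_noisy is almost a copy of x, "unrelated" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "x": x,
    "x_noisy": x + 0.1 * rng.normal(size=200),
    "unrelated": rng.normal(size=200),
})
print(top_correlated_pairs(demo, threshold=0.8))
```

Pairs flagged this way are candidates for dropping one member before model training, since near-duplicate features add little information.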

3. Feature clustering

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Skip the first column and transpose, so that each row of m is one feature
m = d.iloc[:, 1:].values
m = m.T

# Ward linkage on the feature vectors
Z = linkage(m, 'ward')

plt.figure(figsize=(15, 10))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('relative index')
plt.ylabel('distance')
plt.yscale('log')
dendrogram(
    Z,
    leaf_rotation=45.,  # rotates the x axis labels
    leaf_font_size=16.,  # font size for the x axis labels
    labels=list(d.columns[1:]),  # one label per clustered feature (columns 1 onward)
)

plt.show()
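The dendrogram is read by eye; to turn it into explicit feature groups, SciPy's fcluster can cut the same Ward linkage into a fixed number of flat clusters. A sketch on hypothetical synthetic data (two families of near-duplicate columns, with no leading non-feature column to skip, unlike the post), assuming a cut into at most 2 clusters is what you want:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = rng.normal(size=100)
# Hypothetical feature table: two "families" of near-duplicate columns.
d = pd.DataFrame({
    "a1": a, "a2": a + 0.05 * rng.normal(size=100),
    "b1": b, "b2": b + 0.05 * rng.normal(size=100),
})

m = d.values.T                    # one row per feature, as in the dendrogram code
Z = linkage(m, "ward")
# criterion="maxclust": choose the cut height that yields at most t clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
groups = dict(zip(d.columns, labels))
print(groups)
```

Because Ward linkage works on Euclidean distances, features measured on very different scales may need standardizing first so that one large-valued column does not dominate the clustering.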

4. Model training with ROC plot

import matplotlib.pyplot as plt
from itertools import cycle
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def roc_plot(fpr, tpr, color, desc, roc_auc, title):
    """Add one ROC curve to the current figure, labelled with its AUC."""
    lw = 2
    plt.plot(fpr, tpr, color,
             lw=lw, label='AUC of ' + desc + ' = {:.2f}'.format(roc_auc))
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')  # chance diagonal
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC for Lung Cancer with ' + title)
    plt.legend(loc="lower right")


# study_data is assumed to be a dict: description -> DataFrame with a binary 'Path' label column
plt.figure(figsize=(8, 6))
color_lst = cycle(['b', 'g', 'r', 'c', 'm', 'y'])
for key, color in zip(study_data.keys(), color_lst):

    train, test = train_test_split(study_data[key], test_size=0.2, random_state=0)
    survival_classes_train = train['Path']
    features_train = train.drop('Path', axis=1)
    survival_classes_test = test['Path']
    features_test = test.drop('Path', axis=1)

    # Fresh scaler + classifier for each dataset; the scaler is fit on training data only
    clf_pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
    clf_pipe.fit(features_train, survival_classes_train)

    pred_y_prob = clf_pipe.predict_proba(features_test)
    fpr, tpr, _ = roc_curve(survival_classes_test, pred_y_prob[:, 1])  # column 1 = positive class
    roc_auc = auc(fpr, tpr)
    roc_plot(fpr, tpr, color, desc=key, roc_auc=roc_auc, title='train_cohort')

plt.show()
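Since study_data is not shown in the post, here is a self-contained sketch of one loop iteration on synthetic data from make_classification (an assumption standing in for one cohort; no plotting). It also checks that auc applied to the roc_curve output gives the same number as roc_auc_score, so either can supply the value shown in the legend:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for one DataFrame of study_data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scaler and classifier in one pipeline, so scaling statistics come from training data only.
clf_pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
clf_pipe.fit(X_train, y_train)

scores = clf_pipe.predict_proba(X_test)[:, 1]  # probability of the positive class
fpr, tpr, _ = roc_curve(y_test, scores)
auc_from_curve = auc(fpr, tpr)                 # trapezoidal area under the ROC points
auc_direct = roc_auc_score(y_test, scores)     # same quantity computed in one call
print(round(auc_from_curve, 4), round(auc_direct, 4))
```

Keeping StandardScaler inside the pipeline matters: scaling the full dataset before the split would leak test-set statistics into training.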
