100 Days of Machine Learning: A PyYAML Basics Tutorial

Posted by 機器學習演算法與Python (Machine Learning Algorithms and Python) on 2021-04-18

Original article: https://www.cnblogs.com/jpld/p/14675055.html

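Writing configuration files is an unavoidable part of programming. Today we continue the Python network programming series by learning YAML, a language that is more concise and more powerful than JSON. In this post, Lao Hu (the author) briefly introduces YAML's syntax and usage, and then shows how YAML can be used in a machine learning project.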

Previous post: I've Started Learning Python Network Programming

YAML

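YAML is a recursive acronym for "YAML Ain't Markup Language". Its syntax resembles that of other high-level languages, and it can concisely express data forms such as lists, hash tables (mappings), and scalars. It uses whitespace indentation and relies heavily on layout, which makes it especially suitable for expressing or editing data structures, configuration files, debug dumps, and document outlines. YAML files usually use the .yaml extension (.yml is also common).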

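The basic syntax rules of YAML are:

It is case-sensitive.
Indentation expresses hierarchy.
Tabs are not allowed for indentation; only spaces may be used.
The number of spaces does not matter, as long as elements at the same level are left-aligned.
A # at the start of a line, or preceded by a space, starts a comment that runs to the end of the line.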
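To see these rules in action, here is a minimal sketch that parses a small YAML string with yaml.safe_load from PyYAML (installed in the next section). The keys Name and name stay separate because YAML is case-sensitive, indentation creates the nested server mapping, comments disappear, and a tab used for indentation makes the parser raise an error. The sample keys and values are made up for illustration.

```python
import yaml

doc = """
# This whole line is a comment
Name: upper-case key
name: lower-case key   # a trailing comment
server:
    host: localhost    # indented under server
    port: 8080
"""

data = yaml.safe_load(doc)
print(data)

# A tab character used for indentation is rejected by PyYAML's scanner
try:
    yaml.safe_load("server:\n\thost: localhost\n")
    tab_rejected = False
except yaml.YAMLError as exc:
    tab_rejected = True
    print("Tab indentation error:", type(exc).__name__)
```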

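YAML supports three kinds of data structures:

Mappings (objects): collections of key-value pairs, written as key: value, with a space required after the colon.
Arrays: ordered lists of values, also called sequences or lists; each item is introduced by "- " (a dash followed by a space).
Scalars: single, indivisible values.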
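A short sketch, assuming PyYAML's safe_load, of how the three structures map onto Python types: mappings become dict, arrays become list, and scalars become str, int, float, bool or None depending on how they are written. The last lines show why the space after the colon matters: without it, key:value is read as a single string rather than a key-value pair. All keys here are illustrative.

```python
import yaml

doc = """
mapping:            # a mapping (object)
  key: value
sequence:           # an array (sequence)
  - first
  - second
scalars:            # assorted scalars
  integer: 42
  float: 3.14
  boolean: true
  nothing: null
  text: hello
"""

data = yaml.safe_load(doc)

print(type(data["mapping"]).__name__)   # dict
print(type(data["sequence"]).__name__)  # list
for name, value in data["scalars"].items():
    print(name, "->", repr(value), type(value).__name__)

# Without the space after the colon, the line is read as one plain string
no_space = yaml.safe_load("key:value")
print(repr(no_space))
```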

Using YAML

Installation

pip install pyyaml
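After installation the package is imported as yaml, not pyyaml. As a quick sanity check (a sketch, not a required step), the snippet below prints the installed version and whether the optional LibYAML C bindings are available. The CSafeLoader class only exists when they are, so the code falls back to the pure-Python SafeLoader otherwise.

```python
import yaml

print("PyYAML version:", yaml.__version__)
print("LibYAML bindings available:", yaml.__with_libyaml__)

# Use the faster C-based safe loader when present, otherwise the pure-Python one
SafeLoader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
config = yaml.load("answer: 42", Loader=SafeLoader)
print(config)
```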

The YAML file format is simple. For example:

# categories.yaml file

sports: # note: a space is required after the colon
  - soccer # each "- " item is one element of an array (sequence)
  - football
  - basketball
  - cricket
  - hockey
  - table tennis

countries:
  - Pakistan
  - USA
  - India
  - China
  - Germany
  - France
  - Spain

Reading the YAML file in Python

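# read_categories.py file

import yaml

# safe_load builds only basic Python objects (dict, list, str, int, ...),
# which makes it the safe choice for configuration files
with open("categories.yaml", encoding="utf-8") as file:
    documents = yaml.safe_load(file)

# Each top-level key maps to one list of categories
for item, doc in documents.items():
    print(item, ":", doc)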

Output:

sports : ['soccer', 'football', 'basketball', 'cricket', 'hockey', 'table tennis']
countries : ['Pakistan', 'USA', 'India', 'China', 'Germany', 'France', 'Spain']
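Going the other way is just as short. As a sketch, yaml.safe_dump turns a Python dict back into YAML text. sort_keys=False keeps the keys in insertion order (otherwise they are sorted alphabetically), and allow_unicode=True writes non-ASCII text as-is instead of escaping it. The temporary directory and the file name categories_copy.yaml are only there so the example cleans up after itself.

```python
import os
import tempfile

import yaml

categories = {
    "sports": ["soccer", "football", "basketball"],
    "countries": ["Pakistan", "USA", "India"],
}

# Serialize to a YAML string (block style is the default in recent PyYAML)
text = yaml.safe_dump(categories, sort_keys=False, allow_unicode=True)
print(text)

# Write the same data to a file and read it back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "categories_copy.yaml")
    with open(path, "w", encoding="utf-8") as file:
        yaml.safe_dump(categories, file, sort_keys=False, allow_unicode=True)
    with open(path, encoding="utf-8") as file:
        loaded = yaml.safe_load(file)

print(loaded == categories)
```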

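That is the most basic use of YAML. If it still feels a bit abstract, let's go a step further and see how to write a YAML configuration file for a machine learning project.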

YAML & Machine Learning

Here we rewrite the code from 100 Days of Machine Learning | Day 62: Random Forest Hyperparameter Tuning in Practice.

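Project structure

Judging from the paths used in the code below, the project root contains config/rf_config.yaml (the settings), data/creditcard.csv (the data set), a models/ directory for the trained model, and the script rf_with_yaml_file.py.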

Write the configuration file rf_config.yaml:

#INITIAL SETTINGS
data_directory: ./data/
data_name: creditcard.csv
target_name: Class
test_size: 0.3
model_directory: ./models/
model_name: RF_classifier.pkl


#RF parameters
n_estimators: 50
max_depth: 6
min_samples_split: 5
oob_score: True
random_state: 666
n_jobs: 2
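A possible variation, not what the original project does: group the random forest hyperparameters under one nested key (rf_params is a name chosen here for illustration). The script could then pass them all at once with RandomForestClassifier(**config["rf_params"]) instead of listing each one. The sketch below only parses the YAML, so it needs nothing beyond PyYAML; it also shows that values such as 0.3 and true already arrive as Python float and bool.

```python
import yaml

# Hypothetical nested layout of rf_config.yaml
CONFIG_TEXT = """
# INITIAL SETTINGS
data_directory: ./data/
data_name: creditcard.csv
target_name: Class
test_size: 0.3
model_directory: ./models/
model_name: RF_classifier.pkl

# RF parameters, grouped so they can be unpacked with **
rf_params:
  n_estimators: 50
  max_depth: 6
  min_samples_split: 5
  oob_score: true
  random_state: 666
  n_jobs: 2
"""

config = yaml.safe_load(CONFIG_TEXT)
rf_params = config["rf_params"]

# YAML resolves the types: test_size is a float, oob_score is a bool
print(type(config["test_size"]).__name__, type(rf_params["oob_score"]).__name__)
print(rf_params)
```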

The complete code is below; compare it with the original Day 62 code to see what changed:

# rf_with_yaml_file.py
import os

import joblib
import numpy as np
import pandas as pd
import yaml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

CONFIG_PATH = "./config/"


def load_config(config_name):
    """Load a YAML config file from CONFIG_PATH and return it as a dict."""
    with open(os.path.join(CONFIG_PATH, config_name), encoding="utf-8") as file:
        config = yaml.safe_load(file)

    return config


config = load_config("rf_config.yaml")
target = config["target_name"]

# Read the data set and keep columns 1 to 30 (the first column is dropped)
df = pd.read_csv(os.path.join(config["data_directory"], config["data_name"]))
data = df.iloc[:, 1:31]

# Features and target; y is a 1-D Series, as scikit-learn expects
X = data.loc[:, data.columns != target]
y = data[target]

# Random undersampling: keep all fraud rows plus an equal number of normal rows
fraud_indices = data[data[target] == 1].index
normal_indices = data[data[target] == 0].index
rng = np.random.default_rng(config["random_state"])
random_normal_indices = rng.choice(
    normal_indices, len(fraud_indices), replace=False)
under_sample_indices = np.concatenate([fraud_indices, random_normal_indices])
under_sample_data = data.loc[under_sample_indices]
X_undersample = under_sample_data.loc[:, under_sample_data.columns != target]
y_undersample = under_sample_data[target]
# Note: X_undersample / y_undersample are not used below;
# the model is trained and evaluated on the full data set.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=config["test_size"], random_state=42
)

# Every hyperparameter comes from the YAML file instead of being hard-coded
rf1 = RandomForestClassifier(
    n_estimators=config["n_estimators"],
    max_depth=config["max_depth"],
    min_samples_split=config["min_samples_split"],
    oob_score=config["oob_score"],
    random_state=config["random_state"],
    n_jobs=config["n_jobs"]
)

rf1.fit(X_train, y_train)
print("OOB score: %f" % rf1.oob_score_)
y_predprob1 = rf1.predict_proba(X_test)[:, 1]
print("AUC Score (Test): %f" % roc_auc_score(y_test, y_predprob1))

# Save the trained model, creating the model directory if it does not exist
os.makedirs(config["model_directory"], exist_ok=True)
joblib.dump(rf1, os.path.join(config["model_directory"], config["model_name"]))
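As written, the script stops with a bare KeyError if a setting is missing from rf_config.yaml, and a value quoted by mistake (for example "50") silently becomes a string. One way to fail earlier and more clearly, sketched here with a hypothetical REQUIRED_KEYS table and load_and_check_config helper (neither is part of the original code), is to check keys and types right after loading. The temporary files only exist so the sketch can run on its own.

```python
import os
import tempfile

import yaml

# Hypothetical schema: setting name -> expected Python type
REQUIRED_KEYS = {
    "data_directory": str, "data_name": str, "target_name": str,
    "test_size": float, "model_directory": str, "model_name": str,
    "n_estimators": int, "max_depth": int, "min_samples_split": int,
    "oob_score": bool, "random_state": int, "n_jobs": int,
}


def load_and_check_config(path):
    """Load a YAML config and verify that every required key has the expected type."""
    with open(path, encoding="utf-8") as file:
        config = yaml.safe_load(file) or {}

    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")

    for key, expected in REQUIRED_KEYS.items():
        value = config[key]
        # bool is a subclass of int, so reject True/False where an int is expected
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{key} should be {expected.__name__}, got {type(value).__name__}")
    return config


GOOD_CONFIG = """\
data_directory: ./data/
data_name: creditcard.csv
target_name: Class
test_size: 0.3
model_directory: ./models/
model_name: RF_classifier.pkl
n_estimators: 50
max_depth: 6
min_samples_split: 5
oob_score: True
random_state: 666
n_jobs: 2
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "rf_config.yaml")
    with open(good_path, "w", encoding="utf-8") as file:
        file.write(GOOD_CONFIG)
    config = load_and_check_config(good_path)
    print("config OK:", config["n_estimators"], config["oob_score"])

    # Quoting a number turns it into a string, which the check catches
    bad_path = os.path.join(tmp_dir, "bad_config.yaml")
    with open(bad_path, "w", encoding="utf-8") as file:
        file.write(GOOD_CONFIG.replace("n_estimators: 50", 'n_estimators: "50"'))
    try:
        load_and_check_config(bad_path)
        bad_rejected = False
    except TypeError as exc:
        bad_rejected = True
        print("rejected:", exc)
```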

