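ML.NET Sample: Price Prediction with Regression

Posted by feiyun0112 on 2018-12-08

Original URL: https://www.cnblogs.com/feiyun0112/p/10089175.html

Preface

I plan to translate Microsoft's machinelearning-samples into Chinese in the near future. My skills are limited, so if you find errors or omissions, please point them out.
If you are interested in this effort, you can join me: https://github.com/feiyun0112/machinelearning-samples.zh-cn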

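Taxi Fare Prediction

ML.NET version	API type	Status	App type	Data type	Scenario	ML task	Algorithms
v0.7	Dynamic API	Up to date	Console app	.csv files	Price prediction	Regression	SDCA regression

In this introductory sample, you'll see how to use ML.NET to predict taxi fares. In machine learning, this type of prediction is known as regression.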

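Problem

This problem centers on predicting the fare of a taxi trip in New York City. At first glance, the fare may seem to depend only on the distance traveled. However, taxi vendors in New York charge different amounts depending on other factors, such as additional passengers or paying by credit card instead of cash. Taxi providers can use this prediction to give riders and drivers an estimate of the fare.

To solve this problem, we will build an ML model that takes the following inputs:

vendor ID
rate code
passenger count
trip time
trip distance
payment type

and predicts the fare of the ride.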

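ML task - Regression

The generalized problem of regression is to predict some continuous value from given parameters, for example:

predict the price of a house based on the number of rooms, location, year built, and so on.
predict a car's fuel consumption based on fuel type and car parameters.
predict the estimated time to fix an issue based on the issue's attributes.

What all these examples have in common is that the value we want to predict can be any number within a certain range. In other words, the value is represented by an integer or float/double, not by an enum or boolean type.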

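Solution

To solve this problem, we first build an ML model. We then train the model on existing data, evaluate how good it is, and finally consume the model to predict taxi fares.

Build -> Train -> Evaluate -> Consume

1. Build model

Building the model includes loading the data (taxi-fare-train.csv, read with TextLoader) and transforming it so that the ML algorithm ("StochasticDualCoordinateAscent" in this case) can use it effectively: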

//Create ML Context with seed for repeatable/deterministic results
MLContext mlContext = new MLContext(seed: 0);

// STEP 1: Common data loading configuration
TextLoader textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
                                {
                                    Separator = ",",
                                    HasHeader = true,
                                    Column = new[]
                                                {
                                                    new TextLoader.Column("VendorId", DataKind.Text, 0),
                                                    new TextLoader.Column("RateCode", DataKind.Text, 1),
                                                    new TextLoader.Column("PassengerCount", DataKind.R4, 2),
                                                    new TextLoader.Column("TripTime", DataKind.R4, 3),
                                                    new TextLoader.Column("TripDistance", DataKind.R4, 4),
                                                    new TextLoader.Column("PaymentType", DataKind.Text, 5),
                                                    new TextLoader.Column("FareAmount", DataKind.R4, 6)
                                                }
                                });

IDataView baseTrainingDataView = textLoader.Read(TrainDataPath);
IDataView testDataView = textLoader.Read(TestDataPath);

//Sample code that removes extreme data ("outliers"): FareAmount values above $150 or below $1, which are likely erroneous
var cnt = baseTrainingDataView.GetColumn<float>(mlContext, "FareAmount").Count();
IDataView trainingDataView = mlContext.Data.FilterByColumn(baseTrainingDataView, "FareAmount", lowerBound: 1, upperBound: 150);
var cnt2 = trainingDataView.GetColumn<float>(mlContext, "FareAmount").Count();

// STEP 2: Common data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.CopyColumns("FareAmount", "Label")
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("VendorId", "VendorIdEncoded"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("RateCode", "RateCodeEncoded"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("PaymentType", "PaymentTypeEncoded"))
                .Append(mlContext.Transforms.Normalize(inputName: "PassengerCount", mode: NormalizerMode.MeanVariance))
                .Append(mlContext.Transforms.Normalize(inputName: "TripTime", mode: NormalizerMode.MeanVariance))
                .Append(mlContext.Transforms.Normalize(inputName: "TripDistance", mode: NormalizerMode.MeanVariance))
                .Append(mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded", "PaymentTypeEncoded", "PassengerCount", "TripTime", "TripDistance"));

// STEP 3: Set the training algorithm, then create and config the modelBuilder - Selected Trainer (SDCA regression algorithm)
var trainer = mlContext.Regression.Trainers.StochasticDualCoordinateAscent(labelColumn: "Label", featureColumn: "Features");
var trainingPipeline = dataProcessPipeline.Append(trainer);
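
To make the transformations in STEP 2 more concrete, here is a minimal, library-free C# sketch of three ideas the pipeline relies on: dropping out-of-range fares, one-hot encoding a categorical column, and mean-variance normalization. The class PreprocessingSketch and its method names are hypothetical and are not part of ML.NET. The inclusive lower bound and exclusive upper bound are an assumption, and the library's normalizer may differ in detail from the textbook formula used here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET: plain C# versions of the preprocessing ideas.
public static class PreprocessingSketch
{
    // Keep only values v with lowerBound <= v < upperBound (bound semantics are an assumption).
    public static float[] FilterByRange(IEnumerable<float> values, float lowerBound, float upperBound) =>
        values.Where(v => v >= lowerBound && v < upperBound).ToArray();

    // One-hot encoding: each distinct category gets its own slot, in order of first appearance.
    public static float[][] OneHotEncode(IReadOnlyList<string> categories)
    {
        var index = new Dictionary<string, int>();
        foreach (var c in categories)
        {
            if (!index.ContainsKey(c))
                index.Add(c, index.Count);
        }

        return categories
            .Select(c =>
            {
                var row = new float[index.Count];
                row[index[c]] = 1f;
                return row;
            })
            .ToArray();
    }

    // Textbook standardization: (x - mean) / population standard deviation.
    public static float[] NormalizeMeanVariance(IReadOnlyList<float> values)
    {
        double mean = values.Average(v => (double)v);
        double variance = values.Average(v => (v - mean) * (v - mean));
        double std = Math.Sqrt(variance);
        return values.Select(v => std == 0 ? 0f : (float)((v - mean) / std)).ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up values for illustration only.
        var fares = PreprocessingSketch.FilterByRange(new[] { 0.5f, 1f, 15.5f, 150f, 200f }, 1f, 150f);
        Console.WriteLine("Kept fares: " + string.Join(", ", fares));

        var encoded = PreprocessingSketch.OneHotEncode(new[] { "CRD", "CSH", "CRD" });
        foreach (var row in encoded)
            Console.WriteLine("One-hot row: " + string.Join(" ", row));

        var scaled = PreprocessingSketch.NormalizeMeanVariance(new[] { 1f, 2f, 3f });
        Console.WriteLine("Normalized: " + string.Join(", ", scaled));
    }
}
```

Encoding text columns as 0/1 vectors and putting numeric columns on a comparable scale is what lets a linear trainer treat all inputs as one numeric "Features" vector.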

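2. Train model

Training the model is the process of running the chosen algorithm on training data (with known fare values) to tune the model's parameters. It is implemented in the Fit() API. To perform training, we just call the method and pass it the DataView.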

var trainedModel = trainingPipeline.Fit(trainingDataView);
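
As an illustration of what "tuning the model's parameters" means, the sketch below fits a one-feature linear model (fare as a function of distance) with closed-form ordinary least squares. This is not the SDCA algorithm used by the sample, and it is not ML.NET code; LeastSquaresSketch and the trip data are made up for this example.

```csharp
using System;
using System.Linq;

// Hypothetical helper: ordinary least squares for y ≈ slope * x + intercept.
// This is a stand-in for illustration, not the SDCA trainer used by the sample.
public static class LeastSquaresSketch
{
    public static (double Slope, double Intercept) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("x and y must have the same length (at least 2).");

        double meanX = x.Average();
        double meanY = y.Average();
        double covariance = 0, varianceX = 0;
        for (int i = 0; i < x.Length; i++)
        {
            covariance += (x[i] - meanX) * (y[i] - meanY);
            varianceX += (x[i] - meanX) * (x[i] - meanX);
        }

        if (varianceX == 0)
            throw new ArgumentException("x values must not all be equal.");

        double slope = covariance / varianceX;
        return (slope, meanY - slope * meanX);
    }

    public static double Predict((double Slope, double Intercept) model, double x) =>
        model.Slope * x + model.Intercept;
}

public static class Program
{
    public static void Main()
    {
        // Made-up trips: distance in miles and fare in dollars (illustrative only).
        double[] distance = { 1.0, 2.0, 3.0, 4.0 };
        double[] fare = { 5.0, 7.5, 10.0, 12.5 };

        var model = LeastSquaresSketch.Fit(distance, fare);
        Console.WriteLine($"slope={model.Slope}, intercept={model.Intercept}");
        Console.WriteLine($"predicted fare for 3.75 miles: {LeastSquaresSketch.Predict(model, 3.75)}");
    }
}
```

The "parameters" here are just the slope and intercept. A trainer such as SDCA learns one weight per feature plus a bias in a similar spirit, but by iterative optimization rather than a closed-form formula.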

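3. Evaluate model

We need this step to determine how accurate our model is on new data. To do so, the model from the previous step is run against another dataset that was not used in training (taxi-fare-test.csv). This dataset also contains known fares. Regression.Evaluate() calculates various metrics on the difference between the known fares and the fares predicted by the model.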

IDataView predictions = trainedModel.Transform(testDataView);
var metrics = mlContext.Regression.Evaluate(predictions, label: "Label", score: "Score");

Common.ConsoleHelper.PrintRegressionMetrics(trainer.ToString(), metrics);
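
For readers unfamiliar with regression metrics, here is a minimal plain-C# sketch of four common ones: mean absolute error, mean squared error, root mean squared error, and R squared. It only illustrates the formulas; RegressionMetricsSketch is a hypothetical helper, and the exact set of metrics and property names returned by Regression.Evaluate() may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not the ML.NET evaluator: textbook regression metrics.
public static class RegressionMetricsSketch
{
    public static (double MeanAbsoluteError, double MeanSquaredError, double RootMeanSquaredError, double RSquared)
        Evaluate(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length || actual.Length == 0)
            throw new ArgumentException("actual and predicted must be non-empty and of equal length.");

        int n = actual.Length;
        double absoluteErrorSum = 0, squaredErrorSum = 0;
        for (int i = 0; i < n; i++)
        {
            double error = actual[i] - predicted[i];
            absoluteErrorSum += Math.Abs(error);
            squaredErrorSum += error * error;
        }

        // R squared compares the model's squared error with that of always predicting the mean.
        double mean = actual.Average();
        double totalSumOfSquares = actual.Sum(a => (a - mean) * (a - mean));
        double rSquared = totalSumOfSquares == 0 ? double.NaN : 1 - squaredErrorSum / totalSumOfSquares;

        double mse = squaredErrorSum / n;
        return (absoluteErrorSum / n, mse, Math.Sqrt(mse), rSquared);
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up known fares and predictions (illustrative only).
        double[] actual = { 10.0, 20.0, 30.0 };
        double[] predicted = { 12.0, 18.0, 30.0 };

        var metrics = RegressionMetricsSketch.Evaluate(actual, predicted);
        Console.WriteLine($"MAE={metrics.MeanAbsoluteError}, MSE={metrics.MeanSquaredError}");
        Console.WriteLine($"RMSE={metrics.RootMeanSquaredError}, R^2={metrics.RSquared}");
    }
}
```

Lower error values are better, while R squared closer to 1 means the predictions explain more of the variation in the known fares.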

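To learn more about how to interpret the metrics, see the machine learning glossary in the ML.NET Guide, or consult any available material on data science and machine learning.

If you are not satisfied with the quality of the model, there are several ways to improve it, which are covered in the examples category.

Keep in mind that the quality in this sample is lower than what is achievable, because the dataset was reduced in size for performance reasons. You can use the original dataset to improve quality significantly (the original dataset is referenced in the dataset README).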

4. Consume model

After the model is trained, we can use the Predict() API to predict the fare for a specified trip.

//Sample: 
//vendor_id,rate_code,passenger_count,trip_time_in_secs,trip_distance,payment_type,fare_amount
//VTS,1,1,1140,3.75,CRD,15.5

var taxiTripSample = new TaxiTrip()
{
    VendorId = "VTS",
    RateCode = "1",
    PassengerCount = 1,
    TripTime = 1140,
    TripDistance = 3.75f,
    PaymentType = "CRD",
    FareAmount = 0 // To predict. Actual/Observed = 15.5
};

ITransformer trainedModel;
using (var stream = new FileStream(ModelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    trainedModel = mlContext.Model.Load(stream);
}

// Create prediction engine related to the loaded trained model
var predFunction = trainedModel.MakePredictionFunction<TaxiTrip, TaxiTripFarePrediction>(mlContext);

//Score
var resultprediction = predFunction.Predict(taxiTripSample);

Console.WriteLine($"**********************************************************************");
Console.WriteLine($"Predicted fare: {resultprediction.FareAmount:0.####}, actual fare: 15.5");
Console.WriteLine($"**********************************************************************");

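Finally, you can use the PlotRegressionChart() method to plot how the test predictions are distributed and how the regression performs, as shown in the following screenshot:

[Screenshot: regression plot chart]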
