Time Series Cross-Validation

Posted by 選西瓜專業戶 on 2021-01-02

Source

https://www.mdpi.com/1099-4300/21/10/1015/htm#FD3-entropy-21-01015

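For cross-validation, we follow the time-series machine-learning literature and propose rolling-origin evaluation [24], also known as rolling-origin-recalibration evaluation [25]. These are forms of nested cross-validation, which should give an almost unbiased estimate of the error [23]. Once we have selected the institutions (forecasters) that can be used to properly define the training, validation and test sets, we can start solving the optimization problem.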
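
The rolling-origin scheme can be made concrete with a small sketch. This is not the authors' code: the moving-average forecaster, the candidate window lengths and all function names are hypothetical, chosen only to show the structure. At each origin t the model sees only observations before t and is scored on the next `horizon` points. The nested version additionally re-selects the window length at every origin, using a rolling-origin evaluation restricted to the last `val_size` points of the training data, so the test block never influences model selection.

```python
from statistics import mean


def rolling_origin_splits(n, initial, horizon=1, step=1):
    """Yield (train_end, test_end) index pairs for rolling-origin evaluation.

    At origin t the model may use observations [0, t) and is scored on
    [t, t + horizon); the origin then moves forward by `step`.
    """
    origin = initial
    while origin + horizon <= n:
        yield origin, origin + horizon
        origin += step


def moving_average_forecast(history, window):
    # Hypothetical toy model: flat forecast equal to the mean of the last
    # `window` observations (or of all of them if fewer are available).
    return mean(history[-window:])


def mae_over_origins(series, initial, window, horizon=1):
    """Rolling-origin mean absolute error of the toy model on `series`.

    The forecast is recomputed at every origin from the data seen so far,
    which is the "recalibration" part of rolling-origin-recalibration.
    """
    errors = []
    for train_end, test_end in rolling_origin_splits(len(series), initial, horizon):
        forecast = moving_average_forecast(series[:train_end], window)
        errors.extend(abs(y - forecast) for y in series[train_end:test_end])
    return mean(errors)


def nested_rolling_origin(series, initial, candidate_windows, val_size, horizon=1):
    """Nested rolling-origin evaluation (sketch).

    Outer loop: rolling origins over the test region, starting at `initial`.
    Inner loop: at each outer origin, pick `window` by rolling-origin MAE on
    the last `val_size` points of the training data only.
    """
    if not horizon <= val_size < initial:
        raise ValueError("need horizon <= val_size < initial")
    outer_errors, chosen = [], []
    for train_end, test_end in rolling_origin_splits(len(series), initial, horizon):
        train = series[:train_end]
        inner_initial = len(train) - val_size
        best = min(candidate_windows,
                   key=lambda w: mae_over_origins(train, inner_initial, w, horizon))
        forecast = moving_average_forecast(train, best)
        outer_errors.extend(abs(y - forecast) for y in series[train_end:test_end])
        chosen.append(best)
    return mean(outer_errors), chosen
```

For example, on the trend series `list(range(1, 11))` with `initial=8`, `val_size=3` and candidate windows `[1, 2, 3]`, the inner loop picks `window=1` at both outer origins, since on a steady trend the most recent value is the best flat forecast.
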
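As the reader may already have noticed, the institutions must be the same in the training, validation and test sets; if this condition is not met, the problem is not well defined. To address this, in our application (see Section 4) the initial data bank was reduced from 21 forecasters to around 10 that satisfy the condition of having data for all three phases. This gives three data samples, one per phase, each containing around 10 institutions.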
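
The filtering step, keeping only forecasters that have data in all three phases, amounts to a set intersection. A minimal sketch, assuming the forecasts are stored as one dictionary per phase keyed by institution name; the data layout, the phase names, the example data and the function names are hypothetical, not taken from the paper. Sorting the surviving names fixes one institution order, so position i refers to the same forecaster in every phase.

```python
def common_institutions(phases):
    """Institutions with data in every phase, sorted by name.

    `phases` maps a phase name (e.g. "train") to a dict from institution
    name to its forecasts; an empty list or None counts as missing data.
    """
    available = [
        {inst for inst, values in data.items() if values}
        for data in phases.values()
    ]
    return sorted(set.intersection(*available)) if available else []


def restrict_to_common(phases):
    """Keep only the common institutions, in the same order in every phase.

    Returns the institution list and, per phase, a list of forecast series
    whose i-th entry always belongs to the i-th institution.
    """
    keep = common_institutions(phases)
    return keep, {phase: [data[inst] for inst in keep] for phase, data in phases.items()}


# Hypothetical example: B has no validation data, D appears only in the test set.
phases = {
    "train": {"A": [1.0, 2.0], "B": [1.5, 2.5], "C": [0.9, 1.1]},
    "validation": {"A": [2.1], "B": [], "C": [1.2]},
    "test": {"A": [2.3], "B": [2.4], "C": [1.3], "D": [1.0]},
}
keep, filtered = restrict_to_common(phases)
print(keep)  # ['A', 'C']
```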
