A First Look at tsfresh in Python

Posted by Quant_Learner on 2020-11-29
  • tsfresh

    A tool that modularizes the process of extracting features from time series data.

    tsfresh is a Python package. It automatically calculates a large number of time series characteristics, the so-called features. Further, the package contains methods to evaluate the explanatory power and importance of such characteristics for regression or classification tasks.

  • Coding paradigms

    1. Keep it simple

      We believe that “Programs should be written for people to read, and only incidentally for machines to execute.”

    2. Keep it documented

      Include at least a docstring for each method and class. Describe why you are doing something, not what you are doing.

    3. Keep it tested

      We aim for a high test coverage.

  • Feature Calculator Naming

    tsfresh enforces a strict naming of the created features, which you have to follow whenever you create new feature calculators.

    This is because the tsfresh.feature_extraction.settings.from_columns() method needs to deduce the following information from the feature name:

    • the time series that was used to calculate the feature
    • the feature calculator method that was used to derive the feature
    • all parameters that have been used to calculate the feature (optional)

    The features will be named in the following format:

    {time_series_name}__{feature_name}__{parameter_name_1}_{parameter_value_1}__[..]__{parameter_name_k}_{parameter_value_k}
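
    To make the naming format concrete, here is a minimal sketch that splits such a feature name back into its three pieces. The helper parse_feature_name is hypothetical and written only for illustration; it is not tsfresh's own parsing code. It assumes that the time series name contains no double underscore and that parameter values contain no underscore.

```python
import ast


def parse_feature_name(column):
    """Split a tsfresh-style feature name into (series, calculator, params).

    Hypothetical helper for illustration only, not part of tsfresh.
    """
    parts = column.split("__")
    if len(parts) < 2:
        raise ValueError(f"not a feature name: {column!r}")
    series_name, calculator = parts[0], parts[1]

    params = {}
    for part in parts[2:]:
        # Parameter names may contain underscores themselves (e.g. chunk_len),
        # so split on the LAST underscore only.
        name, _, raw_value = part.rpartition("_")
        try:
            params[name] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Keep values that are not Python literals as plain strings.
            params[name] = raw_value
    return series_name, calculator, params


parsed = parse_feature_name('F_x__agg_linear_trend__attr_"slope"__chunk_len_5')
print(parsed)
```

    Double underscores separate the series name, the calculator name, and each parameter, while a single underscore separates a parameter name from its value. This is why the naming has to be strict: without it, the three pieces cannot be recovered from the column name.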

  • Quick Start

    Running the tutorial from the article may produce the following error:

    (Resolved 2020-11-25) tsfresh fails to download the example data: [Errno 111] Connection refused

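    To summarize what the example is trying to show: you are given a dataset containing time series recorded on six dimensions for a number of robots. You can plot all six time series for each robot, and by eye you can tell which robots failed, because the plots of the failed and non-failed robots differ across the dimensions.

    Beyond visual inspection, though, we need some data that indicates whether a robot has failed. That is exactly what tsfresh is for: it can automatically extract more than 1,200 features from these six-dimensional series.

    Those 1,200-plus features can then be fed into a model for training.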
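
    The idea of turning each robot's time series into one row of features can be sketched in plain Python. This is a hand-written stand-in, not tsfresh itself: the data values are made up, only two series (named F_x and T_x here for illustration) are used instead of six, and the four "calculators" are a tiny assumed subset chosen for the example. In tsfresh the same step is done by extract_features, with a far larger set of calculators.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy long-format data: (robot id, time step, F_x, T_x). Values are made up.
rows = [
    (1, 0, -1.0, 2.0),
    (1, 1, 0.0, 4.0),
    (1, 2, 1.0, 6.0),
    (2, 0, 5.0, 0.0),
    (2, 1, 5.0, 0.0),
    (2, 2, 5.0, 3.0),
]
series_names = ["F_x", "T_x"]

# A tiny, illustrative set of feature calculators (assumed for this sketch).
calculators = {
    "mean": mean,
    "maximum": max,
    "minimum": min,
    "standard_deviation": pstdev,  # population standard deviation
}

# Group the values by robot id and series name, in time order.
grouped = defaultdict(lambda: defaultdict(list))
for robot_id, _time, *values in sorted(rows, key=lambda r: (r[0], r[1])):
    for name, value in zip(series_names, values):
        grouped[robot_id][name].append(value)

# One row of features per robot, with columns named {series}__{calculator}.
feature_matrix = {
    robot_id: {
        f"{name}__{calc_name}": calc(values)
        for name, values in series.items()
        for calc_name, calc in calculators.items()
    }
    for robot_id, series in grouped.items()
}

for robot_id, features in feature_matrix.items():
    print(robot_id, features)
```

    Each robot ends up with one row and one column per (series, calculator) pair, which is the shape a classifier expects. With six series and many calculators and parameter settings, the column count quickly grows to the 1,200-plus mentioned above.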
