Importing MovieLens test data with Logstash

Posted by niewj on 2022-04-21

1. MovieLens data

https://grouplens.org/dataset...
For learning and practice, the smallest dataset is enough:
[ml-latest-small](https://files.grouplens.org/d...)

2. Logstash configuration file:

In the Logstash config directory, copy logstash-sample.conf to a new file named logstash-movies.conf, and give it the following contents:

# Logstash pipeline for importing MovieLens movies:
# movies.csv file -> csv/mutate filters -> Elasticsearch.

input {
  file {
    path => "/export/_backup/elk_bak/ml-latest-small/movies.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

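filter {
  # Parse each CSV line into three named fields
  csv {
    separator => ","
    columns => ["id", "content", "genre"]
  }

  # Turn "Comedy|Drama" into an array and drop fields we do not need
  mutate {
    split => { "genre" => "|"}
    remove_field => ["path", "host", "@timestamp", "message"]
  }

  # Split "Title (1991)" on "(": piece 0 is the title, piece 1 begins with the year
  mutate {
    split => { "content" => "(" }
    add_field => { "title" => "%{[content][0]}"}
    add_field => { "year" => "%{[content][1]}"}
  }

  # Convert year to an integer, trim the title, and drop the leftover content field
  mutate {
    convert => {
      "year" => "integer"
    }
    strip => ["title"]
    remove_field => ["path", "host", "@timestamp", "content"]
  }
}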

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "movies"
    document_id => "%{id}"
    #user => "user"
    #password => "password"
  }
  stdout {}
}
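
To make the filter chain easier to follow, here is a rough Python sketch of what the csv and mutate filters do to a single line of movies.csv. It is only an approximation for illustration, not how Logstash itself is implemented. The function name transform is hypothetical, the sample line is reconstructed from the console output shown in step 3, and returning None when no year can be found is this sketch's own choice.

```python
import csv
import re


def transform(line):
    """Approximate the csv + mutate filters for one line of movies.csv."""
    # csv filter: three columns; quoted fields may contain commas
    movie_id, content, genre = next(csv.reader([line]))
    # mutate split: "Comedy|Drama" -> ["Comedy", "Drama"]
    genres = genre.split("|")
    # mutate split on "(": piece 0 is the title, piece 1 begins with the year
    pieces = content.split("(")
    title = pieces[0].strip()
    # mutate convert: keep the leading digits of piece 1, e.g. "1991)" -> 1991
    digits = re.match(r"\d+", pieces[1]) if len(pieces) > 1 else None
    year = int(digits.group()) if digits else None  # None is this sketch's choice
    return {"id": movie_id, "title": title, "year": year, "genre": genres}


if __name__ == "__main__":
    # Sample line reconstructed from the console output in step 3
    print(transform("193609,Andrew Dice Clay: Dice Rules (1991),Comedy"))
```

Note that a title which itself contains "(" puts something other than the year into piece 1; in that case this sketch returns None for the year.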

3. Run the import

bin/logstash -f config/logstash-movies.conf
Startup takes a little while.
The console then prints the imported events, for example:
......
{
          "id" => "193609",
       "genre" => [
        [0] "Comedy"
    ],
       "title" => "Andrew Dice Clay: Dice Rules",
    "@version" => "1",
        "year" => 1991
}

Once the console stops printing, press Ctrl+C to stop Logstash.

4. Check in Kibana that the data reached the index

If the imported index appears under Index Management, the import succeeded.
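
As an alternative to looking in Kibana, the document count can also be read from Elasticsearch's _count API. The sketch below is a minimal example under some assumptions: Elasticsearch is reachable at http://localhost:9200 without authentication (as in the output section of the config above, where user and password are commented out), and the helper names count_url, parse_count and fetch_count are made up for illustration.

```python
import json
from urllib.request import urlopen


def count_url(host, index):
    """Build the URL of the _count API for an index."""
    return f"{host.rstrip('/')}/{index}/_count"


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def fetch_count(host="http://localhost:9200", index="movies"):
    """Ask Elasticsearch how many documents the index holds.

    Assumes no authentication; needs a running Elasticsearch.
    """
    with urlopen(count_url(host, index)) as resp:
        return parse_count(resp.read())
```

Calling fetch_count() after the import should return a number greater than zero if the movies index was filled.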