Hive Multi-Character Delimiter Support: An Example

Posted by weixin_34007886 on 2018-05-08

Problem description

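How do you load a data file whose fields are separated by a multi-character delimiter into a Hive table? In the sample data below, the field delimiter is the three-character string "@#$":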

test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
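To see why an ordinary single-character delimiter does not fit this data, the short Python sketch below (a plain string illustration, not Hive code) splits one sample line on "@" alone, as would happen if only the first character of the delimiter were honored, and then on the full "@#$" string. Splitting on "@" leaves "#$" residue at the start of the later fields.

```python
# Illustration only: plain Python string splitting, not Hive's SerDe code.
line = "test1@#$test1name@#$test2value"

# Splitting on just the first character of the delimiter leaves "#$" behind.
single_char = line.split("@")

# Splitting on the whole three-character delimiter gives the intended columns.
multi_char = line.split("@#$")

print(single_char)  # ['test1', '#$test1name', '#$test2value']
print(multi_char)   # ['test1', 'test1name', 'test2value']
```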

Multi-delimiter support in Hive

Starting with version 0.14, Hive supports multi-character field delimiters through MultiDelimitSerDe. See https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
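One detail worth keeping in mind with this particular delimiter: "$" is a regular-expression metacharacter (an end-of-string anchor). The Python sketch below contrasts a literal split with an unescaped regex split. It makes no claim about how MultiDelimitSerDe is implemented internally; it only illustrates why "@#$" has to be matched literally, or escaped if it is ever fed to a regex-based tool.

```python
import re

line = "test2@#$test2name@#$test2value"

# Literal split: "@#$" is treated as three ordinary characters.
literal = line.split("@#$")

# Unescaped regex: "$" anchors to the end of the string, so the pattern only
# matches an "@#" at the very end, and nothing is split here.
naive_regex = re.split("@#$", line)

# Escaping the delimiter restores its literal meaning.
escaped_regex = re.split(re.escape("@#$"), line)

print(literal)        # ['test2', 'test2name', 'test2value']
print(naive_regex)    # ['test2@#$test2name@#$test2value']
print(escaped_regex)  # ['test2', 'test2name', 'test2value']
```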

Steps

1. Prepare the multi-delimiter data file and upload it to the corresponding HDFS directory

[root@server03 data]# more multi_delimiter_test.dat 
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
2. Create a table over the multi-delimiter file
CREATE EXTERNAL TABLE multi_delimiter_test (
  s1 STRING,
  s2 STRING,
  s3 STRING
)
-- MultiDelimitSerDe accepts a multi-character string in field.delim
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="@#$")
STORED AS TEXTFILE
LOCATION '/fayson/multi_delimiter_test';

3. Test

2: jdbc:hive2://localhost:10000/default>  select * from multi_delimiter_test;
+--------------------------+--------------------------+--------------------------+--+
|  multi_delimiter_test.s1  |  multi_delimiter_test.s2  |  multi_delimiter_test.s3  |
+--------------------------+--------------------------+--------------------------+--+
| test1                    | test1name                | test2value               |
| test2                    | test2name                | test2value               |
| test3                    | test3name                | test4value               |
+--------------------------+--------------------------+--------------------------+--+
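As an offline sanity check of the expected result, the Python sketch below parses the three sample lines into rows keyed by the table's column names (s1, s2, s3), splitting on the literal string "@#$" as the table definition declares. The in-memory SAMPLE text is a stand-in for multi_delimiter_test.dat; reading the real file from HDFS is outside this sketch, and the helper name parse is hypothetical.

```python
# Offline check: split each line on the literal delimiter "@#$"
# and map the pieces onto the table's three columns.
SAMPLE = """test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
"""
COLUMNS = ["s1", "s2", "s3"]
DELIM = "@#$"


def parse(text):
    """Hypothetical helper: turn delimited text into a list of column dicts."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        rows.append(dict(zip(COLUMNS, line.split(DELIM))))
    return rows


rows = parse(SAMPLE)
for row in rows:
    print(row["s1"], row["s2"], row["s3"])
```

The printed rows should match the three rows returned by the Hive query above.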
