Learning Hadoop from Scratch (16): Hive Data Import and Export, Cluster Data Migration (Part 1)

Posted by sinodzh on 2016-01-08

Original URL: https://www.cnblogs.com/mephisto/p/5081004.html

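Copyright of this article is shared by mephisto and cnblogs. Reposting is welcome, but this notice must be retained and a link to the original article provided. Thank you.

This article was written by mephisto, SourceLink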

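Preface

In the previous article we briefly described Hive table operations and tried them out. In real use, data often has to be imported and exported. Tools such as Sqoop can move data to and from relational databases, but sometimes a much simpler way of importing and exporting is all that is needed.

Below we describe Hive data import, data export, and data migration between clusters.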

Importing Files into Hive

1: Syntax
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
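2: Importing from the Local File System

Add the LOCAL keyword to import a file from the local file system.

3: Importing from the Cluster

Omit LOCAL from the syntax. The path then refers to a file that is already on the cluster.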

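4: OVERWRITE

With this keyword, the existing contents of the target table or partition are deleted and replaced by the imported files. Without it, the imported files are added to what is already there.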
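The difference between loading with and without OVERWRITE can be pictured with a small Python model. This is only a sketch: the dictionary standing in for a partition directory and the function name load_data are illustrative assumptions, not Hive APIs, and the handling of file-name collisions is left out.

```python
# Toy model of LOAD DATA semantics: a partition is a dict of file name -> rows.
# load_data and the dict layout are illustrative assumptions, not Hive APIs.

def load_data(partition, filename, rows, overwrite=False):
    """Return the partition contents after loading one file into it."""
    if overwrite:
        # OVERWRITE: existing contents are deleted, then the new file is placed.
        return {filename: list(rows)}
    # No OVERWRITE: the new file is added next to the existing files.
    result = dict(partition)
    result[filename] = list(rows)
    return result


existing = {"score_7.txt": [(1, 1, 80.0)]}
appended = load_data(existing, "score_7b.txt", [(2, 2, 65.0)])
replaced = load_data(existing, "score_7b.txt", [(2, 2, 65.0)], overwrite=True)

print(sorted(appended))  # both files are present
print(sorted(replaced))  # only the newly loaded file remains
```

In the model, as in the description above, only the overwrite flag decides whether the earlier file survives.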

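5: Hands-on

Continuing with the partitioned score table created in the previous article, we first prepare two text files, score_7 and score_8, holding the scores for July and August. The files can be downloaded from the link at the end of this article.

Because no delimiter was specified when the table was created, the two text files use Hive's default field delimiter, Ctrl-A (\001).

First place the files on the Linux host under /data/tmp.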
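A file in this format can be produced with a few lines of Python. The sketch below assumes the table uses Hive's default field delimiter, Ctrl-A (\x01), with one row per line. The rows and the output file name are made-up placeholders, not the contents of the downloadable sample files.

```python
import os
import tempfile

FIELD_DELIMITER = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables


def format_row(fields):
    """Join one row's fields with the delimiter, without a trailing newline."""
    return FIELD_DELIMITER.join(str(field) for field in fields)


def write_score_file(path, rows):
    """Write rows (id, studentid, score) to path, one row per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            handle.write(format_row(row) + "\n")


# Placeholder rows; the real sample data is in the download linked in the article.
sample_rows = [("001", 1, 90), ("002", 2, 85)]
sample_path = os.path.join(tempfile.gettempdir(), "score_sample.txt")
write_score_file(sample_path, sample_rows)

with open(sample_path, encoding="utf-8") as handle:
    print(repr(handle.readline()))  # repr makes the invisible delimiter visible
```

Printing with repr is deliberate: Ctrl-A is a control character, so it does not show up when the file is viewed as plain text.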

Import the local data
load data local inpath '/data/tmp/score_7.txt' overwrite into table score PARTITION (openingtime=201507);
We find that 001 has become 1. This is because that column of the table is of type int, so the value was converted to an int.
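The conversion can be imitated in Python. This sketch assumes the reader behaves like Hive's text deserializer in one respect, namely that a field which cannot be parsed as the column type becomes NULL (None here). parse_int_field is a hypothetical helper, not part of Hive.

```python
def parse_int_field(text):
    """Parse a text field as an int column value; unparsable text becomes None."""
    try:
        return int(text)
    except ValueError:
        return None


print(parse_int_field("001"))  # leading zeros are lost once the value is an int
print(parse_int_field("abc"))  # not a number, so the model yields None (NULL)
```

If the leading zeros matter, declaring the column as string keeps them.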

Put score_8.txt on the cluster
su hdfs
hadoop fs -put score_8.txt /tmp/input
Import the cluster data
load data inpath '/tmp/input/score_8.txt' overwrite into table score partition(openingtime=201508);

Importing Query Results from Other Tables into a Table

1: Syntax

Standard syntax:

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

 

Hive extension (multiple inserts):

FROM from_statement

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1

[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] 

[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;

FROM from_statement

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1

[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] 

[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

 

Hive extension (dynamic partition inserts):

INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

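2: OVERWRITE

With this keyword, any existing data in the target table or partition is replaced by the result of the select statement, unless IF NOT EXISTS is given for a partition that already exists.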

3: INSERT INTO

This syntax is supported starting with Hive 0.8. It appends to the target table or partition and leaves the existing data intact.
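The three behaviours in the syntax above (INSERT INTO, INSERT OVERWRITE, and INSERT OVERWRITE with IF NOT EXISTS) can be compared in a small Python model. It is a sketch only: the table is a plain dict from partition value to a list of rows, and insert_rows is a hypothetical name, not a Hive API.

```python
# Toy model: a table is a dict mapping a partition value to its list of rows.
# insert_rows is a hypothetical helper, not a Hive API.

def insert_rows(table, partition, rows, overwrite=False, if_not_exists=False):
    """Apply one insert to the table in place and return the table."""
    if overwrite and if_not_exists and partition in table:
        # IF NOT EXISTS: an existing partition is left untouched.
        return table
    if overwrite:
        # INSERT OVERWRITE: existing rows of the partition are replaced.
        table[partition] = list(rows)
    else:
        # INSERT INTO: new rows are appended, existing rows stay intact.
        table.setdefault(partition, []).extend(rows)
    return table


score = {"201509": [(21, 1, 76.0)]}

insert_rows(score, "201509", [(22, 2, 45.0)])  # INSERT INTO
print(score["201509"])

insert_rows(score, "201509", [(23, 3, 60.0)], overwrite=True)  # INSERT OVERWRITE
print(score["201509"])

# INSERT OVERWRITE ... IF NOT EXISTS on a partition that already exists
insert_rows(score, "201509", [], overwrite=True, if_not_exists=True)
print(score["201509"])
```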

4: Hands-on

We create a table score1 with the same structure as the score table.

create table score1 (
  id         int,
  studentid  int,
  score      double
)
partitioned by (openingtime string);

Insert data

insert into table score1 partition (openingtime=201509) values (21,1,'76'),(22,2,'45');

We import the query result of table score1 into score, specifying the 201509 partition here.

insert overwrite table score partition (openingtime=201509) select id,studentid,score from score1;

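Dynamic Partition Inserts

1: Overview

Dynamic partition insert is really part of inserting results from other tables, but the feature is so useful that it is described separately here. It is supported starting with Hive 0.6.

2: Parameters

Dynamic partition parameters changed with set stay in effect only for the current session, so the set commands are normally run just before the import.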

Property	Default	Note
hive.error.on.empty.partition	false	Whether to throw an exception if dynamic partition insert generates empty results
hive.exec.dynamic.partition	false	Needs to be set to true to enable dynamic partition inserts
hive.exec.dynamic.partition.mode	strict	In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions; in nonstrict mode all partitions are allowed to be dynamic
hive.exec.max.created.files	100000	Maximum number of HDFS files created by all mappers/reducers in a MapReduce job
hive.exec.max.dynamic.partitions	1000	Maximum number of dynamic partitions allowed to be created in total
hive.exec.max.dynamic.partitions.pernode	100	Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
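How hive.exec.dynamic.partition and hive.exec.dynamic.partition.mode constrain a partition specification can be shown with a short Python check. The sketch assumes a partition spec is given as (column, value) pairs in which a value of None marks a dynamic partition. check_partition_spec is a hypothetical name; it only models the rules listed above and does not call Hive.

```python
# Sketch of the checks described by the dynamic partition properties.
# A partition spec is a list of (column, value) pairs; value None means dynamic.
# check_partition_spec and the settings dicts are illustrative, not Hive APIs.

DEFAULTS = {
    "hive.exec.dynamic.partition": False,
    "hive.exec.dynamic.partition.mode": "strict",
}


def check_partition_spec(spec, settings=DEFAULTS):
    """Return None if the spec is allowed, else a message explaining why not."""
    dynamic = [column for column, value in spec if value is None]
    static = [column for column, value in spec if value is not None]
    if not dynamic:
        return None  # fully static inserts need no dynamic partition settings
    if not settings["hive.exec.dynamic.partition"]:
        return "dynamic partition inserts are disabled"
    if settings["hive.exec.dynamic.partition.mode"] == "strict" and not static:
        return "strict mode requires at least one static partition"
    return None


nonstrict = {
    "hive.exec.dynamic.partition": True,
    "hive.exec.dynamic.partition.mode": "nonstrict",
}
strict = dict(nonstrict, **{"hive.exec.dynamic.partition.mode": "strict"})

print(check_partition_spec([("openingtime", None)], strict))
print(check_partition_spec([("openingtime", None)], nonstrict))
print(check_partition_spec([("dt", "2008-06-08"), ("country", None)], strict))
```

The first call is rejected because the only partition column is dynamic. The other two pass: one because the mode is nonstrict, the other because dt is static.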

3: Example from the Official Site

Let's look at the example from the Hive website.
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
       SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt
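Here the country partition is created dynamically from the value of the last column in the SELECT clause, pvs.cnt. Note that the column name is not used; the match is by position. In nonstrict mode the dt partition could also be created dynamically.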
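The way rows are routed to partitions can be sketched in Python. In this model the static partition value is the same for every row, and the dynamic partition value is taken from the last selected column by position. route_rows and the sample rows are made up for illustration.

```python
from collections import defaultdict


def route_rows(rows, static_partition):
    """Group rows by (static value, last column); the last column is removed
    from the stored row because it becomes the partition value."""
    partitions = defaultdict(list)
    for row in rows:
        *data, dynamic_value = row
        partitions[(static_partition, dynamic_value)].append(tuple(data))
    return dict(partitions)


# Hypothetical rows: (userid, page_url, last column used for the country partition).
selected = [
    (1, "/home", "US"),
    (2, "/cart", "CA"),
    (3, "/home", "US"),
]

for key, stored in sorted(route_rows(selected, "2008-06-08").items()):
    print(key, stored)
```

Each distinct value in the last column yields one partition under the fixed dt value, which is why the order of the selected columns matters more than their names.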

4: Hands-on

First we clear the data in the score table (3 partitions).
insert overwrite table score partition(openingtime=201507,openingtime=201508,openingtime=201509) select id,studentid,score from score where 1==0;
Load the July and August data into score1
load data local inpath '/data/tmp/score_7.txt' overwrite into table score1 partition(openingtime=201507);
load data local inpath '/data/tmp/score_8.txt' overwrite into table score1 partition(openingtime=201508);
　　

Set the dynamic partition parameters
set  hive.exec.dynamic.partition=true;   
set  hive.exec.dynamic.partition.mode=nonstrict;   
set  hive.exec.max.dynamic.partitions.pernode=10000; 
Import the data of score1 into score with dynamic partitioning
insert overwrite table score partition(openingtime) select id,studentid,score,openingtime from score1;

Inserting Values from SQL Statements into a Table

1: Overview

This statement inserts values directly into a table.

2: Syntax

Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
 
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal

3: Example from the Official Site

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
 
INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
 
 
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
 
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
 
INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');

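4: Hands-on

In the example of importing data from other tables, we created the table score1 and inserted data into it with a SQL statement. Here we simply repeat that step.

Insert data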

insert into table score1 partition (openingtime=201509) values (21,1,'76'),(22,2,'45');

--------------------------------------------------------------------

This concludes the chapter.

Sample Data File Download

Github https://github.com/sinodzh/HadoopExample/tree/master/2016/hive%20test%20file

Series Index

[Source] Learning Hadoop from Scratch: Series Index
