Methods for Importing Data into a Hive Data Warehouse

Posted by z597011036 on 2015-01-26
1. Importing data from a local file
[hadoop@tong1 ~]$ cat 1.txt       --all data is separated by tab characters
2 3
1 1
[hadoop@tong1 ~]$ cat 2.txt
1 2
3 4
5 6
7 8
1 2
3 4
0 0
[hadoop@tong1 ~]$ cat 3.txt
5 6
7 8
hive> create table q(a int,b int) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.093 seconds
hive> desc q;
OK
a                    int                                     
b                    int                                     
Time taken: 0.117 seconds, Fetched: 2 row(s)
hive> load data local inpath '/home/hadoop/1.txt' into table q;    --INTO appends data; OVERWRITE replaces the data already in the table
Loading data to table tong.q
Table tong.q stats: [numFiles=1, totalSize=8]
OK
Time taken: 0.307 seconds
hive> select * from q;
OK
2 3
1 1
Time taken: 0.121 seconds, Fetched: 2 row(s)
hive> load data local inpath '/home/hadoop/2.txt' overwrite into table q;       --OVERWRITE replaces the table's existing data
Loading data to table tong.q
Table tong.q stats: [numFiles=1, numRows=0, totalSize=28, rawDataSize=0]
OK
Time taken: 0.315 seconds
hive> select * from q;                                                  
OK
1 2
3 4
5 6
7 8
1 2
3 4
0 0
Time taken: 0.051 seconds, Fetched: 7 row(s)
hive>
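The same LOAD DATA LOCAL syntax extends to partitioned tables; the only change is a PARTITION clause naming the target partition. A minimal sketch, assuming a hypothetical table q_part that is not part of the session above:

hive> create table q_part(a int,b int) partitioned by (dt string) row format delimited fields terminated by '\t' stored as textfile;
hive> load data local inpath '/home/hadoop/1.txt' into table q_part partition (dt='2015-01-26');    --each partition gets its own subdirectory under the table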

2. Importing data from HDFS
[hadoop@tong1 ~]$ hadoop fs -put /home/hadoop/3.txt  /user/hive/warehouse/    --upload 3.txt into HDFS
[hadoop@tong1 ~]$ hadoop fs -ls  /user/hive/warehouse/
Found 6 items
-rw-r--r--   2 hadoop supergroup          8 2015-01-26 14:41 /user/hive/warehouse/3.txt

[hadoop@tong1 ~]$ hive

Logging initialized using configuration in jar:file:/usr/local/hive-0.14.0/lib/hive-common-0.14.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.14.0/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

hive> load data inpath '/user/hive/warehouse/3.txt' into table q;
Loading data to table tong.q
Table tong.q stats: [numFiles=2, numRows=0, totalSize=36, rawDataSize=0]
OK
Time taken: 0.295 seconds
hive> select * from q;
OK
1 2
3 4
5 6
7 8
1 2
3 4
0 0
5 6
7 8
Time taken: 0.063 seconds, Fetched: 9 row(s)
hive>
[hadoop@tong1 ~]$ hadoop fs -ls  /user/hive/warehouse/           --once its data is loaded into the table, the file no longer appears here: LOAD DATA INPATH moves it into the table's directory
Found 5 items
drwxr-xr-x   - hadoop supergroup          0 2015-01-12 13:31 /user/hive/warehouse/hwz
drwxr-xr-x   - hadoop supergroup          0 2015-01-13 15:21 /user/hive/warehouse/hwz1
drwxr-xr-x   - hadoop supergroup          0 2015-01-12 15:11 /user/hive/warehouse/t
drwxr-xr-x   - hadoop supergroup          0 2015-01-12 17:42 /user/hive/warehouse/t1
drwxr-xr-x   - hadoop supergroup          0 2015-01-26 14:32 /user/hive/warehouse/tong.db
[hadoop@tong1 ~]$
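Note that LOAD DATA INPATH moves the HDFS file rather than deleting it: the file is relocated into the table's directory under the warehouse. A quick way to confirm this, assuming table q lives in the tong.db database directory shown above:

[hadoop@tong1 ~]$ hadoop fs -ls /user/hive/warehouse/tong.db/q    --3.txt should now appear here, alongside the earlier loads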

3. Importing data with CREATE TABLE AS SELECT
hive> create table q1 as select * from q;
Query ID = hadoop_20150126144747_dbc07ce3-40b0-441c-8ec3-08a48092593d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1422249676009_0007, Tracking URL =
Kill Command = /usr/local/hadoop-2.6.0/bin/hadoop job  -kill job_1422249676009_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-01-26 14:47:41,941 Stage-1 map = 0%,  reduce = 0%
2015-01-26 14:47:49,247 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.63 sec
MapReduce Total cumulative CPU time: 1 seconds 630 msec
Ended Job = job_1422249676009_0007
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://tong1:9000/tmp/hive/hadoop/c8ed6f95-d55d-4d1d-ba74-10170523f138/hive_2015-01-26_14-47-32_330_8683539045786824558-1/-ext-10001
Moving data to: hdfs://tong1:9000/user/hive/warehouse/tong.db/q1
Table tong.q1 stats: [numFiles=1, numRows=9, totalSize=36, rawDataSize=27]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.63 sec   HDFS Read: 313 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 630 msec
OK
Time taken: 18.243 seconds
hive> select * from q1;                 
OK
1 2
3 4
5 6
7 8
1 2
3 4
0 0
5 6
7 8
Time taken: 0.046 seconds, Fetched: 9 row(s)
hive>
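CTAS can also filter or transform while it copies, since the new table's schema is derived from the SELECT list. A minimal sketch, using a hypothetical table q2 that is not part of the session above:

hive> create table q2 as select a,b from q where a > 2;    --q2 inherits column names and types from the select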

4. Importing data with an INSERT statement
hive> select * from q1;            --data before the insert
OK
1 2
3 4
5 6
7 8
1 2
3 4
0 0
5 6
7 8
Time taken: 0.07 seconds, Fetched: 9 row(s)
hive> insert into table q1 select * from q where a=1;          --insert rows into the table
Query ID = hadoop_20150126144949_2f36c732-219d-463c-847a-fe03534892d2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1422249676009_0008, Tracking URL =
Kill Command = /usr/local/hadoop-2.6.0/bin/hadoop job  -kill job_1422249676009_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-01-26 14:49:49,122 Stage-1 map = 0%,  reduce = 0%
2015-01-26 14:49:57,461 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.19 sec
MapReduce Total cumulative CPU time: 3 seconds 190 msec
Ended Job = job_1422249676009_0008
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://tong1:9000/tmp/hive/hadoop/c8ed6f95-d55d-4d1d-ba74-10170523f138/hive_2015-01-26_14-49-39_626_4862096867748585368-1/-ext-10000
Loading data to table tong.q1
Table tong.q1 stats: [numFiles=2, numRows=11, totalSize=44, rawDataSize=33]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 3.19 sec   HDFS Read: 313 HDFS Write: 71 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 190 msec
OK
Time taken: 19.199 seconds
hive> select * from q1;                --data after the insert
OK
1 2
3 4
5 6
7 8
1 2
3 4
0 0
5 6
7 8
1 2
1 2
Time taken: 0.033 seconds, Fetched: 11 row(s)
hive>
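Two variants of INSERT are worth knowing alongside the append shown above: INSERT OVERWRITE replaces the target table's contents, and the multi-insert form scans the source table once while feeding several targets. A hedged sketch, reusing q and q1 from this session plus the hypothetical q2 from the earlier sketch:

hive> insert overwrite table q1 select * from q;    --replaces all rows in q1
hive> from q
    > insert into table q1 select * where a=1
    > insert into table q2 select * where a=3;      --one scan of q, two target tables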

From the “ITPUB blog”, link: http://blog.itpub.net/25854343/viewspace-1415605/. Please cite the source when reprinting; unauthorized reproduction may incur legal liability.
