Hive: Installing Sqoop 1.4.6 and Transferring Tables Between Hive and Oracle
1. Sqoop Data Migration
1.1 Overview
Sqoop is an Apache tool for "transferring data between Hadoop and relational database servers".
Importing data: pulling data from MySQL or Oracle into Hadoop storage systems such as HDFS, Hive, and HBase;
Exporting data: pushing data from the Hadoop file system out to a relational database.
1.2 How It Works
Import and export commands are implemented by translating them into MapReduce programs.
In the generated MapReduce job, the customization lies mainly in the InputFormat and OutputFormat.
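For instance, the -m (--num-mappers) option controls how many map tasks the generated job runs, and --split-by names the column Sqoop uses to partition the source table across those mappers. A minimal sketch; the host, database, table, and column names here are placeholders, not from this article:

sqoop import \
  --connect jdbc:mysql://dbhost:3306/demo \
  --username demo --password demo \
  --table EMPLOYEES \
  --split-by EMP_ID \
  -m 4   # submitted as a map-only MapReduce job with 4 parallel map tasks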
1.3 sqoop實戰及原理
1.3.1 sqoop安裝
安裝sqoop的前提是已經具備java和hadoop的環境
1. Download and extract
Download the latest release from the Apache Sqoop download page.
For example: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
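To fetch the tarball from the command line, the Apache archive keeps old releases; a sketch, assuming the standard archive layout for Sqoop 1.4.6:

wget http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz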
[root@hadoop1 sqoop]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[root@hadoop1 sqoop]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop
[root@hadoop1 sqoop]# ls
apache-hive-1.2.1-bin hadoop-2.7.4 hdfs pig-0.17.0 pig_1517170893185.log sqoop tmp
2. Edit the configuration files
Configure SQOOP_HOME in /etc/profile:
vim /etc/profile
export SQOOP_HOME=/usr/local/hadoop/sqoop
Append it to PATH:
export PATH=$PATH:$SQOOP_HOME/bin
[root@hadoop1 sqoop]# source /etc/profile
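A quick sanity check that the new environment variables took effect (the paths shown assume the layout used in this article):

[root@hadoop1 sqoop]# echo $SQOOP_HOME
/usr/local/hadoop/sqoop
[root@hadoop1 sqoop]# which sqoop
/usr/local/hadoop/sqoop/bin/sqoop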
$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh
Open sqoop-env.sh and edit the lines below (remove the leading ## comment markers):
export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.7.4/
export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.7.4/
export HIVE_HOME=/usr/local/hadoop/apache-hive-1.2.1-bin/
[Screenshot: sqoop-env.sh after configuration]
3.1 Add the Oracle driver jar
Place ojdbc6.jar under $SQOOP_HOME/lib/.
3.2 Add the MySQL JDBC driver jar
Place mysql-connector-java-5.1.38.jar under $SQOOP_HOME/lib/.
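Before attempting a full import, you can check that the driver and connection string work with sqoop list-tables; this uses the same Oracle connection details as the import later in this article:

sqoop list-tables \
  --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 \
  --username andy --password andy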
4. Verify the installation
$ cd $SQOOP_HOME/bin
$ sqoop-version
Expected output:
[root@hadoop1 sqoop]# sqoop-version
Warning: /usr/local/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/01/29 19:09:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
[root@hadoop1 sqoop]#
At this point, the Sqoop installation is complete.
Data migration > Oracle to Hive ## note: the Hive table name must be uppercase
sqoop# sqoop import --hive-import --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 --username andy --password andy \
--table ANDY --hive-database oracletohive --hive-table ANDY -m 1
Note: if the target table already exists in Hive, the imported rows are appended to it by default; if it does not exist, the table is created automatically (see the --hive-overwrite sketch after the log below for replacing instead of appending).
Log output:
18/01/29 19:35:46 INFO hive.HiveImport: Loading uploaded data into Hive
18/01/29 19:35:51 INFO hive.HiveImport:
18/01/29 19:35:51 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/hadoop/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
18/01/29 19:36:02 INFO hive.HiveImport: OK
18/01/29 19:36:02 INFO hive.HiveImport: Time taken: 2.42 seconds
18/01/29 19:36:03 INFO hive.HiveImport: Loading data to table oracletohive.andy
18/01/29 19:36:04 INFO hive.HiveImport: Table oracletohive.andy stats: [numFiles=1, totalSize=1996]
18/01/29 19:36:04 INFO hive.HiveImport: OK
18/01/29 19:36:04 INFO hive.HiveImport: Time taken: 1.579 seconds
18/01/29 19:36:04 INFO hive.HiveImport: Hive import complete.
18/01/29 19:36:04 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
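Because the import appends by default, re-running the same command would duplicate the rows. To replace the table contents instead, Sqoop offers the --hive-overwrite flag; a sketch based on the import command above:

sqoop import --hive-import --hive-overwrite \
  --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 --username andy --password andy \
  --table ANDY --hive-database oracletohive --hive-table ANDY -m 1

Now verify the import in the Hive shell: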
hive> show databases;
OK
default
oracletohive
Time taken: 0.027 seconds, Fetched: 2 row(s)
hive>
> use oracletohive;
OK
Time taken: 0.034 seconds
hive>
> show tables;
OK
andy
Time taken: 0.037 seconds, Fetched: 1 row(s)
hive> select count(*) from andy;
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-29 19:43:46,040 Stage-1 map = 0%, reduce = 0%
2018-01-29 19:43:54,738 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.49 sec
2018-01-29 19:44:03,323 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.39 sec
MapReduce Total cumulative CPU time: 3 seconds 390 msec
Ended Job = job_1517222580457_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.39 sec HDFS Read: 16343 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 390 msec
OK
7
Time taken: 34.543 seconds, Fetched: 1 row(s)
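The imported data now lives under the Hive warehouse directory on HDFS; this is exactly the path the export in the next section reads from, and you can list it to see the generated files:

hdfs dfs -ls /user/hive/warehouse/oracletohive.db/andy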
Data migration > Hive to Oracle
When migrating from Hive to Oracle, the target table must be created in Oracle beforehand; otherwise the export fails with java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist.
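One way to create it without leaving the Sqoop machine is sqoop eval, which runs an arbitrary SQL statement over the JDBC connection. The column list below is a hypothetical placeholder (the article never shows the schema of ANDY); in practice it must match the Hive table's columns:

sqoop eval \
  --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 \
  --username andy --password andy \
  --query "CREATE TABLE ANDY (ID NUMBER, NAME VARCHAR2(100))"   # hypothetical schema; replace with the real column list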
sqoop# sqoop export --table ANDY --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 --username andy --password andy \
--export-dir /user/hive/warehouse/oracletohive.db/andy --input-fields-terminated-by '\001' \
--input-lines-terminated-by '\n'
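About the delimiters: '\001' (Ctrl-A) is Hive's default field separator for text-format tables and '\n' its default line terminator, so these options tell Sqoop how to parse the warehouse files written by the earlier import. To eyeball the raw file first (the part-file name below is typical for a single-mapper import but may differ):

hdfs dfs -cat /user/hive/warehouse/oracletohive.db/andy/part-m-00000 | head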
Log output:
18/01/29 20:21:34 INFO mapreduce.Job: Job job_1517222580457_0005 completed successfully
18/01/29 20:21:34 INFO mapreduce.Job: Counters: 30
..... (output omitted)
18/01/29 20:21:34 INFO mapreduce.ExportJobBase: Transferred 5.502 KB in 116.7414 seconds (48.2605 bytes/sec)
18/01/29 20:21:34 INFO mapreduce.ExportJobBase: Exported 7 records.
-- check on the Oracle side
SQL> select count(*) from andy;
COUNT(*)
----------
        14
The count went from 7 rows to 14, confirming that the Hive-to-Oracle export succeeded.
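Note that sqoop export always appends; running the same export again would bring the count to 21. For an exact copy rather than an append, empty the Oracle table first, for example again via sqoop eval (a sketch; any SQL client works just as well):

sqoop eval \
  --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 \
  --username andy --password andy \
  --query "TRUNCATE TABLE ANDY"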
From the ITPUB blog: http://blog.itpub.net/31383567/viewspace-2150616/. Please credit the source when reposting.