2020.9.29 Class Notes (Sqoop Introduction and Data Migration)

Published by 超可愛慕之 on 2020-09-29

I. Sqoop Overview

Sqoop is a tool for transferring data between Hadoop and relational databases (see the official site).

  • Import data from an RDBMS into HDFS, Hive, or HBase
  • Export data from HDFS to an RDBMS
  • Use MapReduce for imports and exports, with parallel operation and fault tolerance

    Target users
  • System administrators and database administrators
  • Big data analysts, big data development engineers, etc.

II. Sqoop Operations (related documentation)

1. Importing Data from an RDBMS into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table customers \
--username root \
--password ok \
--target-dir /data/retail_db/customers \
-m 3

sqoop-import is an alias for sqoop import.
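After the import finishes, it is worth sanity-checking the output. A minimal check, assuming the command above ran against the default HDFS and produced the usual part-m-* files:

# One part file per mapper (-m 3 gives three)
hdfs dfs -ls /data/retail_db/customers
# Preview a few rows; plain HDFS imports are comma-delimited by default
hdfs dfs -cat /data/retail_db/customers/part-m-00000 | head -5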

  • Filter the imported rows with a WHERE clause
sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --driver com.mysql.jdbc.Driver \
 --table orders --where "order_id < 500" \
 --username root \
 --password ok \
 --delete-target-dir \
 --target-dir /data/retail_db/orders \
 -m 3

  • Select specific columns with --columns
sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --driver com.mysql.jdbc.Driver \
 --table customers \
 --columns "customer_id,customer_fname,customer_lname" \
 --username root \
 --password ok \
 --delete-target-dir \
 --target-dir /data/retail_db/customers \
 -m 3

  • Import data with a free-form query. The literal \$CONDITIONS placeholder is mandatory: Sqoop replaces it with each mapper's split predicate, computed over the column named in --split-by.
sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --driver com.mysql.jdbc.Driver \
 --query "select * from orders where order_status!='CLOSED' and \$CONDITIONS"  \
 --username root \
 --password ok \
 --split-by order_id \
 --delete-target-dir \
 --target-dir /data/retail_db/orders \
 -m 3
  • Incremental imports with Sqoop. --incremental selects the import mode:
    1) append: appends new records only
    2) lastmodified: also picks up updated records (a sketch follows the append example below)
 sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --table orders \
 --username root \
 --password ok \
 --incremental append \
 --check-column order_date \
 --last-value '2013-07-24 00:00:00' \
 --target-dir /data/retail_db/orders \
 -m 3 
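For tables whose existing rows are updated in place, lastmodified mode applies instead. A hedged sketch, assuming order_date also reflects modification time (purely for illustration) and using --merge-key to collapse old and new versions of a row:

 sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --table orders \
 --username root \
 --password ok \
 --incremental lastmodified \
 --check-column order_date \
 --last-value '2013-07-24 00:00:00' \
 --merge-key order_id \
 --target-dir /data/retail_db/orders \
 -m 3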

2. Importing Data into Hive


Argument                        Description
--hive-home <dir>               Override $HIVE_HOME
--hive-import                   Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite                Overwrite existing data in the Hive table
--create-hive-table             If set, the job fails if the target Hive table already exists; false by default
--hive-table <table-name>       Sets the table name to use when importing into Hive
--hive-drop-import-delims       Drops \n, \r, and \01 from string fields when importing into Hive
--hive-delims-replacement       Replaces \n, \r, and \01 in string fields with a user-defined string when importing into Hive
--hive-partition-key            Name of the Hive field to partition on
--hive-partition-value <v>      String value that serves as the partition key for this job's Hive import
--map-column-hive <map>         Override the default SQL-to-Hive type mapping for configured columns
  • 1. Copy the required jars
 # Copy Hive's jars into Sqoop's lib directory
 cp /opt/install/hive/lib/hive-common-1.1.0-cdh5.14.2.jar /opt/install/sqoop/lib/
 cp /opt/install/hive/lib/hive-shims* /opt/install/sqoop/lib/

  • 2. Import into Hive
 sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --table orders \
 --username root \
 --password ok \
 --hive-import \
 --create-hive-table \
 --hive-database retail_db \
 --hive-table orders \
 -m 3
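If the import succeeded, the table should be queryable from Hive. A minimal check, assuming the hive CLI is on the PATH:

# Confirm the table exists and the row count looks plausible
hive -e "use retail_db; show tables; select count(*) from orders;"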

  • 3. Import into a Hive partitioned table; note the partition column must not also be imported as an ordinary column.
 sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db  \
 --driver com.mysql.jdbc.Driver \
 --query "select order_id,order_status from orders where order_date&gt;='2013-11-03' and order_date&lt;'2013-11-04' and \$CONDITIONS" \
 --username root  \
 --password ok \
 --delete-target-dir \
 --target-dir /data/retail_db/orders \
 --split-by order_status \
 --hive-import  \
 --create-hive-table \
 --hive-database retail_db \
 --hive-table orders \
 --hive-partition-key "order_date" \
 --hive-partition-value "2013-11-03" \
 -m 3
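To confirm the partition was created, a quick check (again assuming the hive CLI):

hive -e "show partitions retail_db.orders;"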

3. Importing Data into HBase


Argument                        Description
--column-family <family>       Sets the target column family for the import
--hbase-create-table           If specified, creates missing HBase tables
--hbase-row-key <col>          Specifies which input column to use as the row key; for a composite key, <col> must be a comma-separated list of the composite key attributes
--hbase-table <table-name>     Specifies an HBase table to use as the target instead of HDFS
--hbase-bulkload               Enables bulk loading
# Create the HBase table first (table name, then column family)
create 'emp_hbase_import','details'
# Import into HBase
 sqoop import  \
 --connect jdbc:mysql://localhost:3306/sqoop \
 --username root  \
 --password ok \
 --table emp \
 --columns "EMPNO,ENAME,JOB,SAL,COMM"  \
 --hbase-table emp_hbase_import \
 --column-family details  \
 --hbase-row-key "EMPNO" \
 -m 1
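A few rows can be sampled from the HBase shell to confirm the load:

# In the hbase shell
scan 'emp_hbase_import', {LIMIT => 5}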

4. Exporting Data from HDFS to MySQL

# First create an empty table in MySQL (same schema, no rows)
create table customers_demo as select * from customers where 1=2;
# Create an HDFS directory and upload the data
hdfs dfs -mkdir /customerinput
hdfs dfs -put customers.csv /customerinput/
# Export to MySQL
sqoop export \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password ok \
--table customers_demo \
--export-dir /customerinput/ \
--fields-terminated-by ',' \
-m 1 
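Once the export completes, the row count in MySQL should match the input file. A quick check, assuming the same credentials as above:

mysql -uroot -pok retail_db -e "select count(*) from customers_demo;"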

5. Sqoop Scripts (Options Files)

# Sqoop options file
# 1. Write the options file job_RDBMS2HDFS.opt with the following content
#############################
import
--connect
jdbc:mysql://hadoop01:3306/retail_db
--driver
com.mysql.jdbc.Driver
--table
customers
--username
root
--password
root
--target-dir
/data/retail_db/customers
--delete-target-dir
--m
3
##############################
# 2. Run the script
sqoop --options-file job_RDBMS2HDFS.opt
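One caveat: the options file above stores the database password in plain text. Sqoop's --password-file argument reads it from a file instead; a sketch, with the HDFS path chosen arbitrarily here:

# Store the password (no trailing newline) with restrictive permissions
echo -n "root" | hdfs dfs -put - /user/root/mysql.pwd
hdfs dfs -chmod 400 /user/root/mysql.pwd
# Then replace the --password/root lines in the options file with:
# --password-file
# /user/root/mysql.pwd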

6. Sqoop Jobs

 # Create a job; note the space between -- and import is required
 sqoop job \
 --create mysqlToHdfs \
 -- import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --table orders \
 --username root \
 --password ok \
 --incremental append \
 --check-column order_date \
 --last-value '0' \
 --target-dir /data/retail_db/orders \
 -m 3
 # List saved jobs
 sqoop job --list
 # Execute the job; in practice this is usually scheduled with crontab (see below)
 sqoop job --exec mysqlToHdfs
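A minimal crontab entry for the job above (the 02:00 schedule and log path are just examples):

# m h dom mon dow  command
0 2 * * * sqoop job --exec mysqlToHdfs >> /var/log/sqoop_mysqlToHdfs.log 2>&1

Note that unless the Sqoop metastore is configured to record passwords (sqoop.metastore.client.record.password in sqoop-site.xml), --exec prompts for the database password interactively, which would hang under cron.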
