[Hadoop] 9. The Sqoop Component

Published by Wang Zhigang on 2022-04-30

Installing and Configuring the Sqoop Component

Sqoop is an Apache tool for transferring data between Hadoop and relational database servers. It is mainly used to move data between Hadoop (Hive) and traditional databases (MySQL, Oracle, Postgres, etc.): it can import data from a relational database into HDFS, and export data from HDFS back into a relational database.

1. Upload the package to /opt/software with Xftp

# Extract
[root@master ~]# tar xf /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/

# Rename
[root@master src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
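The tar -C flag extracts into a chosen target directory rather than the current one, and the mv simply shortens the long versioned directory name. A throwaway re-creation of the same pattern (temp paths and a stand-in archive, not the real Sqoop package):

```shell
# Build a tiny stand-in archive with a long versioned directory name.
rm -rf /tmp/sqoop-demo && mkdir -p /tmp/sqoop-demo/src/pkg-1.4.7.bin__hadoop-2.6.0
tar cf /tmp/sqoop-demo/pkg.tar -C /tmp/sqoop-demo/src pkg-1.4.7.bin__hadoop-2.6.0
# Extract into a destination with -C, then shorten the name with mv.
mkdir -p /tmp/sqoop-demo/dest
tar xf /tmp/sqoop-demo/pkg.tar -C /tmp/sqoop-demo/dest
mv /tmp/sqoop-demo/dest/pkg-1.4.7.bin__hadoop-2.6.0 /tmp/sqoop-demo/dest/pkg
ls /tmp/sqoop-demo/dest   # → pkg
```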

2. Deploy Sqoop (run on master)

# Copy the sqoop-env-template.sh template and rename the copy sqoop-env.sh.
[root@master src]# cd /usr/local/src/sqoop/conf/
[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh

# Edit sqoop-env.sh and add the install paths of the Hadoop, HBase, and Hive components.
[root@master conf]# vi sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
export HBASE_HOME=/usr/local/src/hbase
export HIVE_HOME=/usr/local/src/hive

# Configure the Linux environment variables, adding the Sqoop component's paths.
[root@master conf]# vi /etc/profile.d/sqoop.sh
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib

# Load and check the environment variables
[root@master conf]# source /etc/profile.d/sqoop.sh
[root@master conf]# echo $PATH
/usr/local/src/sqoop/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/src/hive/bin:/root/bin

# Copy the MySQL JDBC driver into Sqoop's lib directory so it can connect to the database
[root@master conf]# cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
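The PATH edit above is what makes the bare `sqoop` command resolvable from any directory: the shell searches PATH entries left to right, so the prepended $SQOOP_HOME/bin is consulted first. A quick sketch using the same paths as this guide:

```shell
# Prepend Sqoop's bin directory, as /etc/profile.d/sqoop.sh does above.
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
# The first PATH entry is now Sqoop's bin directory, searched before the rest.
echo "$PATH" | cut -d: -f1   # → /usr/local/src/sqoop/bin
```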

3. Start the Hadoop cluster and test Sqoop (run on master)

[root@master ~]# su - hadoop

# The Hadoop cluster must be running before Sqoop is used.
[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
hadoop@master's password: 
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
192.168.100.30: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.example.com.out
192.168.100.20: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.example.com.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password: 
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.example.com.out
192.168.100.30: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.example.com.out
192.168.100.20: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.example.com.out

# Check that the Hadoop cluster is running.
[hadoop@master ~]$ jps
50448 NameNode
50836 ResourceManager
51096 Jps
47502 QuorumPeerMain
50670 SecondaryNameNode

# Test whether Sqoop can connect to the MySQL database.
[hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306 --username root -P
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 15:50:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
22/04/29 15:50:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Apr 29 15:50:45 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys
# Seeing the information_schema, hive, mysql, performance_schema, and sys databases listed confirms that Sqoop can connect to MySQL.
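The test above supplies the password interactively with -P. Sqoop also accepts --password on the command line (used later in this guide), but then the password is visible in shell history and process listings, which is why Sqoop's own logs warn "Consider using -P instead". A tiny local sketch of why an interactive read is safer, using a made-up password fed from stdin just for the demo:

```shell
# 'secret' is a stand-in password; read consumes it from stdin,
# so it never appears as a command-line argument.
printf 'secret\n' | { read -r PW; echo "read ${#PW} characters, nothing on the command line"; }
```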

# Return to the root user
[hadoop@master ~]$ exit
logout

4. Configure the Hive connection (run on master)

[root@master ~]# cp /usr/local/src/hive/lib/hive-common-2.0.0.jar  /usr/local/src/sqoop/lib/

# Log in to the MySQL database
[root@master ~]# mysql -uroot -pWangzhigang.1
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 28
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# Create the sample database
mysql> create database sample;
Query OK, 1 row affected (0.00 sec)

# Switch to the sample database and create the student table
mysql> use sample;
Database changed
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.01 sec)

# Insert a few rows into the student table
mysql> insert into student values('01','zhangsan'),('02','lisi'),('03','wangwu');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

# Query the student table
mysql> select * from student;
+--------+----------+
| number | name     |
+--------+----------+
| 01     | zhangsan |
| 02     | lisi     |
| 03     | wangwu   |
+--------+----------+
3 rows in set (0.00 sec)
# Seeing these three rows confirms the table was created and populated

# Quit
mysql> quit
Bye

5. Create the sample database and student table in Hive

[root@master ~]# su - hadoop

# Start the Hive CLI
[hadoop@master ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

# Create the sample database
hive> create database sample;
OK
Time taken: 0.659 seconds

# Switch to the sample database
hive> use sample;
OK
Time taken: 0.012 seconds

# Create the student table
hive> create table student(number STRING,name STRING);
OK
Time taken: 0.206 seconds

# Exit the Hive CLI
hive> exit;

6. Export data from MySQL and import it into Hive

The command uses the following options:

1) --connect: the MySQL JDBC connection URL.

2) --username and --password: the MySQL username and password.

3) --table: the source table to import.

4) --fields-terminated-by: the field separator to use for the data in Hive.

5) --delete-target-dir: delete the import target directory if it already exists.

6) --num-mappers: the number of map tasks Hadoop launches for the Sqoop job.

7) --hive-import --hive-database: import into Hive, into the named database.

8) --hive-table: the destination Hive table.

[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Wangzhigang.1 --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student

Open another terminal window to check the result

[root@master ~]# su - hadoop

# Start Hive
[hadoop@master ~]$ hive

# Check the imported rows
hive> select * from sample.student;
OK
01|zhangsan     NULL
02|lisi NULL
03|wangwu       NULL
Time taken: 1.32 seconds, Fetched: 3 row(s)

# Exit
hive> exit;
# Seeing the rows above means the data was imported from MySQL into Hive.
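The NULL column above is a delimiter effect, not a failure of the transfer: the Hive table was created earlier without a ROW FORMAT clause, so it splits rows on Hive's default \001 separator, while Sqoop wrote '|' between fields; the whole '01|zhangsan' string therefore lands in the first column. A local sketch with the sample rows (temp file; ':' stands in for a separator that never occurs in the data):

```shell
# The '|'-separated lines Sqoop wrote into the Hive warehouse.
printf '01|zhangsan\n02|lisi\n03|wangwu\n' > /tmp/student_demo.txt
# Splitting on the matching delimiter recovers both columns:
awk -F'|' '{print $1 "," $2}' /tmp/student_demo.txt
# Splitting on a delimiter absent from the data leaves the whole line in
# column 1 and an empty column 2 -- the NULL seen in Hive:
awk -F':' '{print $1 "," $2}' /tmp/student_demo.txt
```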

7. Export data from Hive back to MySQL (run on master)

First clear the student table in MySQL's sample database: number is the table's primary key, so exporting the same rows again would fail with duplicate-key errors.

# Back in the earlier window
[hadoop@master ~]$ mysql -uroot -pWangzhigang.1
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 66
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# Delete the rows
mysql> delete from sample.student;
Query OK, 3 rows affected (0.00 sec)

# Verify
mysql> select * from sample.student;
Empty set (0.00 sec)
# An Empty set here confirms the table was cleared

# Quit
mysql> quit
Bye

8. Export data from Hive and import it into MySQL

The command uses the following options:

1) --connect: the MySQL JDBC connection URL.

2) --username and --password: the MySQL username and password.

3) --table: the destination MySQL table.

4) --input-fields-terminated-by: the field separator of the input data in HDFS.

5) --export-dir: the HDFS path where the Hive table's data is stored.

[hadoop@master ~]$ sqoop export --connect "jdbc:mysql://master:3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password Wangzhigang.1 --table student --input-fields-terminated-by '|' --export-dir /user/hive/warehouse/sample.db/student/*
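Note the double quotes around the connect string: the URL contains '&', which an unquoted shell would treat as "run the command so far in the background", silently truncating the URL at characterEncoding. A quick check of what quoting preserves:

```shell
# Quoted, the '&' stays part of the string instead of backgrounding the command.
URL="jdbc:mysql://master:3306/sample?useUnicode=true&characterEncoding=utf-8"
echo "$URL"
```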

Enter MySQL to check

[hadoop@master ~]$ mysql -uroot -pWangzhigang.1
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 72
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# Check
mysql> select * from sample.student;
+--------+----------+
| number | name     |
+--------+----------+
| 01     | zhangsan |
| 02     | lisi     |
| 03     | wangwu   |
+--------+----------+
3 rows in set (0.00 sec)
# Seeing these rows confirms the export from Hive on the Hadoop cluster back into MySQL succeeded

# Quit
mysql> quit
Bye

9. Common Sqoop commands

# List all databases
[hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Wangzhigang.1
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 16:11:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/04/29 16:11:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/29 16:11:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Apr 29 16:11:30 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sample
sys

# Connect to MySQL and list the tables in the sample database
[hadoop@master ~]$ sqoop list-tables --connect "jdbc:mysql://master:3306/sample?useSSL=false" --username root --password Wangzhigang.1
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 16:11:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/04/29 16:11:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/29 16:11:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
student

# Copy a relational table's structure into Hive (schema only; no rows are copied)
[hadoop@master ~]$ sqoop create-hive-table --connect jdbc:mysql://master:3306/sample --table student --username root --password Wangzhigang.1 --hive-table test
# The line hive.HiveImport: Hive import complete. indicates success

# Import a table from the relational database into Hive.
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Wangzhigang.1 --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table test
# A _SUCCESS marker in the output indicates success

# Start Hive to verify
[hadoop@master ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

# List tables
hive> show tables;
OK
test
Time taken: 0.641 seconds, Fetched: 1 row(s)
# Seeing the test table confirms the import succeeded

# Exit
hive> exit;

# Export a MySQL table's contents to a file in HDFS (no --fields-terminated-by is given, so Sqoop writes its default ',' separator)
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Wangzhigang.1 --table student --num-mappers 1 --target-dir /user/test

10. Check in the browser

Open master:50070 in a browser.

Then click Browse the file system under Utilities; seeing the user directory indicates success.

View the imported data

[hadoop@master ~]$ hdfs dfs -ls /user/test
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2022-04-29 16:49 /user/test/_SUCCESS
-rw-r--r--   2 hadoop supergroup         30 2022-04-29 16:49 /user/test/part-m-00000
[hadoop@master ~]$ hdfs dfs -cat /user/test/part-m-00000
01,zhangsan
02,lisi
03,wangwu
# Seeing the table's rows after these commands confirms the import to HDFS succeeded

Notice: do not reproduce without permission
