Sqoop1 From PostgreSQL to Hdfs
環境: Sqoop1 Install And Test MySQL/PostgreSQL
引數解析:
--connect: JDBC連線URL
--username:連線資料庫使用者名稱
--password:連線資料庫密碼
--table: 要讀取的表
-m:map並行讀取的數量
含義:讀取user_info表資料到HDFS叢集,並叧透過一個map任務
注意:此Sqoop命令沒有指定HDFS目錄,預設資料會放在/user/{user.name}/{--table引數指定表名}目錄下。
問題1: org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist(表的relation不存在,也就是說該使用者沒有查到在預設schema下的這表)
[root@sht-sgmhadoopnn-01 bin]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali --table place_openhoursv2_temp -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 10:51:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 10:51:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 10:51:57 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 10:51:57 INFO tool.CodeGenTool: Beginning code generation
16/07/30 10:51:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM place_openhoursv2_temp AS t LIMIT 1
16/07/30 10:51:58 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist
Position: 17
org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2 does not exist
Position: 17
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 10:51:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 bin]#
解決方法: 修改使用者的預設的schema
[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.
denali=# show search_path;
search_path
-----------------
"$user", public
(1 row)
denali=# alter role denaliadmin set search_path to factual_search_na_16q2_20160722_epl,public;
ALTER ROLE
denali=# show search_path;
search_path
-----------------
"$user", public
(1 row)
denali=# \q
[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.
denali=# show search_path;
search_path
---------------------------------------------
factual_search_na_16q2_20160722_epl, public
(1 row)
denali=#
錯誤2: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
### --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp 錯誤的,--table 只能填寫table名稱,不能帶schema的名稱(其實無需帶schema名稱,因為rearch_path已經指定該使用者的預設的schema的是factual_search_na_16q2_20160722_epl了)
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:50 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:50 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:50 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
Position: 17
org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
Position: 17
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 12:39:50 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
[root@sht-sgmhadoopnn-01 hadoop]#
解決方法: --table不帶 schema
錯誤3: orm.ClassWriter: No Java type for SQL type 1111 for column new_value(查詢語句或者表的欄位 new_value型別 無法轉換)
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali --table place_openhoursv2_temp -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:04 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:04 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.orm.ClassWriter.parseNullVal(ClassWriter.java:1377)
at org.apache.sqoop.orm.ClassWriter.parseColumn(ClassWriter.java:1402)
at org.apache.sqoop.orm.ClassWriter.myGenerateParser(ClassWriter.java:1528)
at org.apache.sqoop.orm.ClassWriter.generateParser(ClassWriter.java:1491)
at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1920)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1736)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 hadoop]#
解決方法: 在sql語句中將該列 new_value的欄位型別由json改為 text
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali \
> --query 'select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where $CONDITIONS ' \
> --split-by place_id \
> --target-dir /sqoop1/test7 \
> -m 3 \
> --null-string '' --null-non-string ''
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:36:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:36:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:36:40 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:36:40 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where (1 = 0)
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where (1 = 0)
16/07/30 12:36:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop/hadoop/share/hadoop/mapreduce
Note: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/07/30 12:36:44 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.jar
16/07/30 12:36:44 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/07/30 12:36:45 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/07/30 12:36:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/07/30 12:36:56 INFO db.DBInputFormat: Using read commited transaction isolation
16/07/30 12:36:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(place_id), MAX(place_id) FROM (select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where (1 = 1) ) AS t1
16/07/30 12:36:56 INFO mapreduce.JobSubmitter: number of splits:3
16/07/30 12:36:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469795998430_0014
16/07/30 12:36:57 INFO impl.YarnClientImpl: Submitted application application_1469795998430_0014
16/07/30 12:36:58 INFO mapreduce.Job: The url to track the job:
16/07/30 12:36:58 INFO mapreduce.Job: Running job: job_1469795998430_0014
16/07/30 12:37:09 INFO mapreduce.Job: Job job_1469795998430_0014 running in uber mode : false
16/07/30 12:37:09 INFO mapreduce.Job: map 0% reduce 0%
16/07/30 12:37:17 INFO mapreduce.Job: map 33% reduce 0%
16/07/30 12:37:18 INFO mapreduce.Job: map 67% reduce 0%
16/07/30 12:37:19 INFO mapreduce.Job: map 100% reduce 0%
16/07/30 12:37:20 INFO mapreduce.Job: Job job_1469795998430_0014 completed successfully
16/07/30 12:37:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=426603
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=373
HDFS: Number of bytes written=184494
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=20412
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=20412
Total vcore-seconds taken by all map tasks=20412
Total megabyte-seconds taken by all map tasks=20901888
Map-Reduce Framework
Map input records=1000
Map output records=1000
Input split bytes=373
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=153
CPU time spent (ms)=3990
Physical memory (bytes) snapshot=525692928
Virtual memory (bytes) snapshot=2674200576
Total committed heap usage (bytes)=316145664
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=184494
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Transferred 180.1699 KB in 34.3762 seconds (5.2411 KB/sec)
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Retrieved 1000 records.
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 hadoop]# hadoop fs -ls /sqoop1/test7
Found 4 items
-rw-r--r-- 3 root root 0 2016-07-30 12:37 /sqoop1/test7/_SUCCESS
-rw-r--r-- 3 root root 184208 2016-07-30 12:37 /sqoop1/test7/part-m-00000
-rw-r--r-- 3 root root 143 2016-07-30 12:37 /sqoop1/test7/part-m-00001
-rw-r--r-- 3 root root 143 2016-07-30 12:37 /sqoop1/test7/part-m-00002
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 hadoop]#
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30089851/viewspace-2122751/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- Sqoop1 Install And Test MySQL/PostgreSQLOOPMySql
- Sqoop1.99.6 Install And From MySQL To HdfsOOPMySql
- HDFS 05 - HDFS 常用的 Java API 操作JavaAPI
- hdfs命令
- HDFS原理
- HDFS 09 - HDFS NameNode 的高可用機制
- Hadoop Shell命令 |HDFS Shell命令| HDFS 命令Hadoop
- HDFS Federation(HDFS 聯邦)(Hadoop2.3)Hadoop
- HDFS shell命令
- Hadoop–HDFSHadoop
- HDFS常用操作
- HDFS Shell操作
- HDFS Architecture
- HDFS原始碼解析系列一——HDFS通訊協議原始碼協議
- HDFS 命令:用於管理HDFS的Hadoop Shell命令大全Hadoop
- HDFS 07 - HDFS 效能調優之 合併小檔案
- hdfs學習(二)
- hdfs學習(三)
- HDFS入門概述
- 4、hdfs api使用API
- HDFS2.0原理
- Hdfs儲存策略
- HDFS資料平衡
- Hadoop HDFS(二)Hadoop
- Hadoop HDFS(一)Hadoop
- HDFS read and write
- hadoop命令——hdfsHadoop
- HDFS的架構架構
- HDFS 05 - HDFS 的後設資料管理(FSImage、EditLog、Checkpoint)
- Mysql增量寫入Hdfs(二) --Storm+hdfs的流式處理MySqlORM
- HDFS原始碼解析:教你用HDFS客戶端寫資料原始碼客戶端
- HDFS 常用命令
- HDFS寫過程分析
- HDFS架構及原理架構
- hdfs dfsadmin -fetchImage
- HDFS短路讀詳解
- HDFS balance策略詳解
- 詳解HDFS入門