Sqoop1 From PostgreSQL to Hdfs

hackeruncle發表於2016-07-30

環境:   Sqoop1 Install And Test MySQL/PostgreSQL
引數解析:
--connect: JDBC連線URL
--username:連線資料庫使用者名稱
--password:連線資料庫密碼
--table:   要讀取的表
-m:map並行讀取的數量
含義:讀取user_info表資料到HDFS叢集,並叧透過一個map任務
注意:此Sqoop命令沒有指定HDFS目錄,預設資料會放在/user/{user.name}/{--table引數指定表名}目錄下。


問題1:  org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist(表的relation不存在,也就是說該使用者沒有查到在預設schema下的這表)
[root@sht-sgmhadoopnn-01 bin]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  --table place_openhoursv2_temp  -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 10:51:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 10:51:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 10:51:57 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 10:51:57 INFO tool.CodeGenTool: Beginning code generation
16/07/30 10:51:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM place_openhoursv2_temp AS t LIMIT 1
16/07/30 10:51:58 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist
  Position: 17
org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2 does not exist
  Position: 17
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 10:51:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 bin]#


解決方法: 修改使用者的預設的schema
[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.

denali=# show search_path;
   search_path  
-----------------
 "$user", public
(1 row)

denali=# alter role denaliadmin set search_path to factual_search_na_16q2_20160722_epl,public;
ALTER ROLE
denali=# show search_path;                                                                   
   search_path  
-----------------
 "$user", public
(1 row)

denali=# \q

[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.

denali=# show search_path;
                 search_path                
---------------------------------------------
 factual_search_na_16q2_20160722_epl, public
(1 row)

denali=#

 

錯誤2: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
###  --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp 錯誤的,--table 只能填寫table名稱,不能帶schema的名稱(其實無需帶schema名稱,因為rearch_path已經指定該使用者的預設的schema的是factual_search_na_16q2_20160722_epl了)
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp  -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:50 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:50 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:50 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
  Position: 17
org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
  Position: 17
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 12:39:50 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

[root@sht-sgmhadoopnn-01 hadoop]#

解決方法: --table不帶 schema

 


錯誤3:  orm.ClassWriter: No Java type for SQL type 1111 for column new_value(查詢語句或者表的欄位 new_value型別 無法轉換)
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  --table place_openhoursv2_temp  -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:04 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:04 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.sqoop.orm.ClassWriter.parseNullVal(ClassWriter.java:1377)
        at org.apache.sqoop.orm.ClassWriter.parseColumn(ClassWriter.java:1402)
        at org.apache.sqoop.orm.ClassWriter.myGenerateParser(ClassWriter.java:1528)
        at org.apache.sqoop.orm.ClassWriter.generateParser(ClassWriter.java:1491)
        at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1920)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1736)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 hadoop]#


解決方法: 在sql語句中將該列 new_value的欄位型別由json改為 text
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import  --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  \
> --query 'select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where  $CONDITIONS ' \
> --split-by place_id \
> --target-dir /sqoop1/test7  \
> -m 3 \
> --null-string '' --null-non-string ''

  Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:36:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:36:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:36:40 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:36:40 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where   (1 = 0) 
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where   (1 = 0) 
16/07/30 12:36:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop/hadoop/share/hadoop/mapreduce
Note: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/07/30 12:36:44 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.jar
16/07/30 12:36:44 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/07/30 12:36:45 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/07/30 12:36:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/07/30 12:36:56 INFO db.DBInputFormat: Using read commited transaction isolation
16/07/30 12:36:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(place_id), MAX(place_id) FROM (select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where   (1 = 1)  ) AS t1
16/07/30 12:36:56 INFO mapreduce.JobSubmitter: number of splits:3
16/07/30 12:36:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469795998430_0014
16/07/30 12:36:57 INFO impl.YarnClientImpl: Submitted application application_1469795998430_0014
16/07/30 12:36:58 INFO mapreduce.Job: The url to track the job:
16/07/30 12:36:58 INFO mapreduce.Job: Running job: job_1469795998430_0014
16/07/30 12:37:09 INFO mapreduce.Job: Job job_1469795998430_0014 running in uber mode : false
16/07/30 12:37:09 INFO mapreduce.Job:  map 0% reduce 0%
16/07/30 12:37:17 INFO mapreduce.Job:  map 33% reduce 0%
16/07/30 12:37:18 INFO mapreduce.Job:  map 67% reduce 0%
16/07/30 12:37:19 INFO mapreduce.Job:  map 100% reduce 0%
16/07/30 12:37:20 INFO mapreduce.Job: Job job_1469795998430_0014 completed successfully
16/07/30 12:37:20 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=426603
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=373
                HDFS: Number of bytes written=184494
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=6
        Job Counters
                Launched map tasks=3
                Other local map tasks=3
                Total time spent by all maps in occupied slots (ms)=20412
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=20412
                Total vcore-seconds taken by all map tasks=20412
                Total megabyte-seconds taken by all map tasks=20901888
        Map-Reduce Framework
                Map input records=1000
                Map output records=1000
                Input split bytes=373
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=153
                CPU time spent (ms)=3990
                Physical memory (bytes) snapshot=525692928
                Virtual memory (bytes) snapshot=2674200576
                Total committed heap usage (bytes)=316145664
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=184494
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Transferred 180.1699 KB in 34.3762 seconds (5.2411 KB/sec)
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Retrieved 1000 records.
You have mail in /var/spool/mail/root


[root@sht-sgmhadoopnn-01 hadoop]# hadoop fs -ls  /sqoop1/test7
Found 4 items
-rw-r--r--   3 root root          0 2016-07-30 12:37 /sqoop1/test7/_SUCCESS
-rw-r--r--   3 root root     184208 2016-07-30 12:37 /sqoop1/test7/part-m-00000
-rw-r--r--   3 root root        143 2016-07-30 12:37 /sqoop1/test7/part-m-00001
-rw-r--r--   3 root root        143 2016-07-30 12:37 /sqoop1/test7/part-m-00002
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 hadoop]#

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30089851/viewspace-2122751/,如需轉載,請註明出處,否則將追究法律責任。

相關文章