Hive on Spark Configuration
Author: 劉權
Date: 2016-08-18
Background
1. Software versions
- hive : hive-0.13.1
- spark : spark-1.2.0-bin-hadoop2.3
- hadoop : hadoop-2.2.0
- phoenix : phoenix-4.4.0-HBase-0.98
- hbase : hbase-0.98.0-hadoop2
2. Prerequisites
- Make sure HDFS is available
- Make sure Hive is available
3. Configure Spark
3.1 Set SPARK_CLASSPATH
vi $SPARK_HOME/conf/spark-env.sh
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/yarn/spark/lib/mysql-connector-java-5.1.17.jar:/home/yarn/hadoop-2.2.0/lib/*:/home/yarn/hbase-0.98.0-hadoop2/lib/hbase-protocol-0.98.0-hadoop2.jar
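One detail of the export line above: if SPARK_CLASSPATH starts out empty, the result begins with a ':', and Java generally treats an empty classpath entry as the current directory. A minimal sketch of one way to avoid that follows. join_classpath is a hypothetical helper name, not part of Spark, and the paths are the ones used above.

```shell
# Hypothetical helper (not part of Spark): join the arguments into a
# colon-separated classpath, skipping empty arguments so that no stray ':'
# ends up at the start or in the middle of the result.
join_classpath() {
  local cp="" entry
  for entry in "$@"; do
    [ -n "$entry" ] || continue
    cp="${cp:+$cp:}$entry"
  done
  printf '%s\n' "$cp"
}

# Paths copied from the export line above; adjust them to your installation.
# The quoted '*' is passed through unexpanded so that Java expands it itself.
SPARK_CLASSPATH=$(join_classpath \
  "${SPARK_CLASSPATH:-}" \
  "/home/yarn/spark/lib/mysql-connector-java-5.1.17.jar" \
  "/home/yarn/hadoop-2.2.0/lib/*" \
  "/home/yarn/hbase-0.98.0-hadoop2/lib/hbase-protocol-0.98.0-hadoop2.jar")
export SPARK_CLASSPATH
```

The block can be placed in spark-env.sh in place of the single export line.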
3.2 Copy hdfs-site.xml and hive-site.xml
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $SPARK_HOME/conf/
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
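The location of hdfs-site.xml depends on the Hadoop layout: Hadoop 2.x tarballs normally keep it under etc/hadoop, while older releases use conf. The following is a sketch of a copy step that tries the usual locations and fails loudly when a file is missing. first_existing and copy_cluster_conf are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helper: print the first argument that is an existing file.
first_existing() {
  local f
  for f in "$@"; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: copy hdfs-site.xml and hive-site.xml into Spark's
# conf directory. Arguments: HADOOP_HOME, HIVE_HOME, SPARK_HOME.
copy_cluster_conf() {
  local hadoop_home="$1" hive_home="$2" spark_home="$3" hdfs_site
  hdfs_site=$(first_existing \
    "$hadoop_home/etc/hadoop/hdfs-site.xml" \
    "$hadoop_home/conf/hdfs-site.xml" \
    "$hadoop_home/hdfs-site.xml") || {
    echo "hdfs-site.xml not found under $hadoop_home" >&2
    return 1
  }
  if [ ! -f "$hive_home/conf/hive-site.xml" ]; then
    echo "hive-site.xml not found under $hive_home/conf" >&2
    return 1
  fi
  cp "$hdfs_site" "$hive_home/conf/hive-site.xml" "$spark_home/conf/"
}

# Usage:
# copy_cluster_conf "$HADOOP_HOME" "$HIVE_HOME" "$SPARK_HOME"
```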
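4. Common problems
4.1. Hive reports an error when connecting to the metastore
Check the character set of Hive's metastore database in MySQL. Hive requires the metastore database to use the latin1 character set.
If it does not, change the character set with the following commands: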
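ALTER DATABASE hive CHARACTER SET latin1;
-- MySQL's ALTER TABLE does not accept a wildcard such as hive.*; run the statement once per metastore table, replacing <table_name>:
ALTER TABLE hive.<table_name> DEFAULT CHARACTER SET latin1;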
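MySQL's ALTER TABLE takes a single table name rather than a wildcard, so one way to apply the change across the whole database is to generate one statement per table. A minimal sketch follows, assuming a hypothetical helper named gen_latin1_alters that reads table names on stdin. The commented mysql invocation is also an assumption about how the client is set up (credentials, default host).

```shell
# Hypothetical helper: read table names (one per line) on stdin and print
# one ALTER TABLE statement per table for the database given as $1.
gen_latin1_alters() {
  local db="$1" table
  while IFS= read -r table; do
    [ -n "$table" ] || continue   # skip blank lines
    printf 'ALTER TABLE `%s`.`%s` DEFAULT CHARACTER SET latin1;\n' "$db" "$table"
  done
}

# Usage sketch (assumes the mysql client is installed and can log in;
# -N suppresses column headers, -e runs a single statement):
# mysql -N -e "SELECT table_name FROM information_schema.tables WHERE table_schema='hive'" \
#   | gen_latin1_alters hive > alter_hive_tables.sql
```

Writing the statements to a file first makes it possible to review them before feeding the file to mysql.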
4.2. Note: spark.driver.extraClassPath and SPARK_CLASSPATH are mutually exclusive; set only one of the two, never both.
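Because the two settings cannot be combined, it can help to scan the Spark conf directory for both before launching. The sketch below is a hypothetical check (check_classpath_conflict is not a Spark command). It only inspects spark-env.sh and spark-defaults.conf, so it will not see values passed on the command line, for example through --driver-class-path.

```shell
# Hypothetical check: return 1 if the conf directory given as $1 sets both
# SPARK_CLASSPATH (in spark-env.sh) and spark.driver.extraClassPath
# (in spark-defaults.conf). Commented-out lines are ignored.
check_classpath_conflict() {
  local conf_dir="$1" has_env=0 has_prop=0
  # grep -s stays quiet when a file does not exist.
  if grep -Eqs '^[[:space:]]*(export[[:space:]]+)?SPARK_CLASSPATH=' \
      "$conf_dir/spark-env.sh"; then
    has_env=1
  fi
  if grep -Eqs '^[[:space:]]*spark\.driver\.extraClassPath([[:space:]]|=)' \
      "$conf_dir/spark-defaults.conf"; then
    has_prop=1
  fi
  if [ "$has_env" -eq 1 ] && [ "$has_prop" -eq 1 ]; then
    echo "both SPARK_CLASSPATH and spark.driver.extraClassPath are set in $conf_dir" >&2
    return 1
  fi
  return 0
}

# Usage:
# check_classpath_conflict "$SPARK_HOME/conf" && echo "classpath settings look consistent"
```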
4.3. IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
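Cause:
The problem is caused by an optimization introduced in HBASE-9867, which inadvertently added a classloader dependency.
It affects jobs submitted with the -libjars option as well as jobs packaged as a fat jar.
Fat jar mode relies on a special Hadoop feature:
Hadoop reads every library JAR found in the lib directory inside the job jar,
so the jars a job depends on are packaged in that nested lib directory.
Solution:
Add hbase-protocol-0.98.0-hadoop2.jar to SPARK_CLASSPATH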
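To confirm that the jar actually made it onto the classpath, the entries can be split and matched by file name. A minimal sketch, assuming a hypothetical helper named classpath_has whose second argument is an extended regular expression for the file name:

```shell
# Hypothetical helper: succeed if some entry of the colon-separated
# classpath in $1 has a file name matching the extended regex in $2.
# Wildcard entries such as /some/lib/* are not expanded, so a jar that is
# only reachable through a wildcard is not detected.
classpath_has() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -Eq "(^|/)$2\$"
}

if classpath_has "${SPARK_CLASSPATH:-}" 'hbase-protocol-[^/]*\.jar'; then
  echo "hbase-protocol jar is listed in SPARK_CLASSPATH"
else
  echo "hbase-protocol jar is NOT listed in SPARK_CLASSPATH" >&2
fi
```

A truncated entry such as .../lib/hbase-protocol-0.98.0 (without the .jar suffix) fails this check, which makes copy-and-paste mistakes in spark-env.sh easier to spot.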
4.4. MetaException(message:java.lang.ClassNotFoundException Class org.apache.phoenix.hive.PhoenixSerde
Attempts:
- Add phoenix-hive-4.2.2-jar-with-dependencies.jar to SPARK_CLASSPATH. Result: failed.
- Reference phoenix-hive-4.2.2-jar-with-dependencies.jar in $SPARK_HOME/conf/hive-site.xml. Result: failed.
Final solution:
Pass the following option when running spark-sql or spark-shell:
--jars /home/yarn/hive/lib/phoenix-hive-4.2.2-jar-with-dependencies.jar
For example:
spark-sql --master spark://big147:7077 --executor-memory 20G --total-executor-cores 2 --jars /home/yarn/hive/lib/phoenix-hive-4.2.2-jar-with-dependencies.jar
4.5 MySQL driver problem
Problem details:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:101)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 14 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:497)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:475)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
... 19 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
... 48 more
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
... 66 more
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
at org.datanucleus.store.rdbms.connectionpool.DBCPBuiltinConnectionPoolFactory.createConnectionPool(DBCPBuiltinConnectionPoolFactory.java:49)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
... 68 more
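In a trace this long, the last "Caused by:" entry is usually the one to read first; here it names the missing com.mysql.jdbc.Driver class. A small sketch for pulling that line out of a saved log follows. root_cause is a hypothetical helper name and spark-sql.log is a hypothetical file name.

```shell
# Hypothetical helper: read a Java stack trace on stdin and print the last
# "Caused by:" line, which is the innermost reported cause.
root_cause() {
  grep '^Caused by:' | tail -n 1
}

# Usage (assumes the output of spark-sql was saved to spark-sql.log):
# root_cause < spark-sql.log
```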
Solutions:
Solution 1.
When running spark-sql or spark-shell, pass the --driver-class-path option with the location of the driver jar:
spark-shell --master spark://big147:7077 --executor-memory 20G --total-executor-cores 2 --driver-class-path /home/yarn/hive/lib/mysql-connector-java-5.1.26.jar
Solution 2.
Add the driver jar to SPARK_CLASSPATH, as follows:
vi $SPARK_HOME/conf/spark-env.sh
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/yarn/hadoop-2.2.0/lib/*:/home/yarn/hbase-0.98.0-hadoop2/lib/hbase-protocol-0.98.0-hadoop2.jar:/home/yarn/spark/lib/mysql-connector-java-5.1.17.jar