Connecting Spark to Hive with HiveSupport (spark-shell and IDEA)
This article covers two ways to connect Spark to Hive: from the spark-shell, and remotely from IDEA.
1. spark-shell
1.1. Copy the configuration files
- Copy hive/conf/hive-site.xml to spark/conf/
- Copy the MySQL driver jar from hive/lib/ to spark/jars/
Alternatively, the driver jar can be supplied at launch time with the --driver-class-path option:
./bin/spark-shell --driver-class-path path/mysql-connector-java-5.1.13-bin.jar
1.2. Start spark-shell
Launch the shell and run the following statements:
spark.sql("show databases").show()
spark.sql("use test")
spark.sql("select * from student").show()
Output:
[hadoop@hadoop1 spark-2.3.0-bin-hadoop2.7]$ ./bin/spark-shell
2018-09-04 11:43:10 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop1:4040
Spark context available as 'sc' (master = local[*], app id = local-1536032600945).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show()
2018-09-04 11:43:54 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
+------------+
|databaseName|
+------------+
| default|
| test|
+------------+
scala> spark.sql("use test")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("select * from student").show()
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 張三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
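The session is not read-only: a query result can also be written back to Hive as a new table through the DataFrame writer. A minimal sketch to try in the same shell (the target table name test.result_student is illustrative, not part of the original setup):

// Save a query result back to Hive; the table name is an assumption
spark.sql("select sno, sname from student")
  .write.mode("overwrite")
  .saveAsTable("test.result_student")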
2. Connecting to Hive from IDEA
This section connects to a remote Hive. If Hive is not deployed yet, refer to "Hive之環境安裝" (Hive environment setup); note that HDFS must be running first.
2.1. Add the dependencies
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.0</version>
<!--<scope>provided</scope>-->
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.11 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.0</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.3.0</version>
</dependency>
<dependency><!-- MySQL JDBC driver -->
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.40</version>
</dependency>
2.2. Copy the configuration file
Copy hive-site.xml into the project's resources directory:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
</configuration>
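As an alternative to shipping hive-site.xml, the metastore location can be passed programmatically when building the SparkSession. A minimal sketch, assuming a Hive metastore service is running at thrift://hadoop1:9083 (the host and port are assumptions; this requires the metastore service to be started, which the original setup does not use):

import org.apache.spark.sql.SparkSession

// Point Spark at a running Hive metastore service instead of hive-site.xml.
// The thrift URI below is an assumption, not part of the original setup.
val spark = SparkSession.builder()
  .appName("HiveSupport")
  .master("local[2]")
  .config("hive.metastore.uris", "thrift://hadoop1:9083")
  .enableHiveSupport()
  .getOrCreate()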
2.3. Write the code
import org.apache.spark.sql.SparkSession

object HiveSupport {
  def main(args: Array[String]): Unit = {
    //val warehouseLocation = "D:\\workspaces\\idea\\hadoop"
    val spark = SparkSession.builder()
      .appName("HiveSupport")
      .master("local[2]")
      // Not needed once hive-site.xml is on the classpath; for a local Hive
      // this can be used to set where the warehouse (metastore_db) is created
      //.config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport() // enable Hive support
      .getOrCreate()
    //spark.sparkContext.setLogLevel("WARN") // set the log output level

    import spark.implicits._
    import spark.sql
    sql("show databases")
    sql("use test")
    sql("select * from student").show()

    Thread.sleep(150 * 1000) // keep the app alive, e.g. to inspect the web UI
    spark.stop()
  }
}
Output:
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 張三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
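With Hive support enabled, the same query can also be expressed with the DataFrame API instead of raw SQL. A minimal sketch against the same session (the column names sno, sname, sage come from the student table above; the filter threshold is arbitrary):

// DataFrame API equivalent of the SQL query; assumes the `spark` session above
import spark.implicits._
val students = spark.table("test.student")
students.filter($"sage" > 23).select($"sno", $"sname").show()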