Spark HiveSupport Connection (spark-shell and IDEA)
This article covers two ways to connect Spark to Hive: interactively through spark-shell, and remotely from IDEA.
1. spark-shell
1.1. Copy the configuration files
- Copy hive/conf/hive-site.xml to spark/conf/
- Copy the MySQL connector jar (e.g. mysql-connector-java-5.1.13-bin.jar) from hive/lib/ to spark/jars/
Alternatively, instead of copying the jar, you can pass its path to spark-shell:
./bin/spark-shell --driver-class-path path/mysql-connector-java-5.1.13-bin.jar
1.2. Start spark-shell
Once the shell is up, run the following queries:
spark.sql("show databases").show()
spark.sql("use test")
spark.sql("select * from student").show()
Output:
[hadoop@hadoop1 spark-2.3.0-bin-hadoop2.7]$ ./bin/spark-shell
2018-09-04 11:43:10 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop1:4040
Spark context available as 'sc' (master = local[*], app id = local-1536032600945).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show()
2018-09-04 11:43:54 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
+------------+
|databaseName|
+------------+
| default|
| test|
+------------+
scala> spark.sql("use test")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("select * from student").show()
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 張三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
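The same table can also be read through the DataFrame API instead of SQL strings; a minimal sketch, run in the same spark-shell session (column names follow the student table above):

// DataFrame API equivalent of the SQL queries above (a sketch)
val students = spark.table("test.student")
students.filter($"sage" > 21).select("sno", "sname").show()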
2. Connecting to Hive from IDEA
This section connects to a remote Hive. If Hive is not yet deployed, refer to "Hive之環境安裝" (Hive environment setup); note that HDFS must be started first.
2.1. Add the Maven dependencies
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
    <!--<scope>provided</scope>-->
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.11 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency><!-- MySQL database driver -->
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.40</version>
</dependency>
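If the project uses sbt instead of Maven, the same dependencies can be declared in build.sbt; a minimal sketch mirroring the coordinates above:

// build.sbt — sbt equivalent of the Maven dependencies above (a sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0",
  "org.apache.spark" %% "spark-sql"  % "2.3.0",
  "org.apache.spark" %% "spark-hive" % "2.3.0",
  "mysql" % "mysql-connector-java" % "5.1.40"  // MySQL driver for the metastore
)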
2.2. Copy the configuration file
Copy hive-site.xml into the project's resources directory:
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
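If a standalone Hive metastore service is already running, Spark can also be pointed at it from code instead of reading the JDBC settings above; a minimal sketch, assuming the metastore listens on hadoop1 at the default port 9083:

// Sketch: connect through a running metastore service instead of hive-site.xml.
// "thrift://hadoop1:9083" is an assumption (default metastore port on host hadoop1).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveSupport")
  .master("local[2]")
  .config("hive.metastore.uris", "thrift://hadoop1:9083")
  .enableHiveSupport()
  .getOrCreate()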
2.3. Write the code

import org.apache.spark.sql.SparkSession

object HiveSupport {
  def main(args: Array[String]): Unit = {
    //val warehouseLocation = "D:\\workspaces\\idea\\hadoop"
    val spark =
      SparkSession.builder()
        .appName("HiveSupport")
        .master("local[2]")
        //Not needed once hive-site.xml has been copied; for a local Hive, this parameter sets the metastore_db location
        //.config("spark.sql.warehouse.dir", warehouseLocation)
        .enableHiveSupport() //enable Hive support
        .getOrCreate()
    //spark.sparkContext.setLogLevel("WARN") //set the log output level
    import spark.implicits._
    import spark.sql

    sql("show databases")
    sql("use test")
    sql("select * from student").show()

    Thread.sleep(150 * 1000) //keep the application alive for a while (e.g. to inspect the web UI)
    spark.stop()
  }
}
Output:
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 張三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
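With Hive support enabled, the same session can also write results back to Hive; a minimal sketch, assuming write access to the warehouse directory (the table name test.top_students is hypothetical):

// Sketch: persist a query result as a Hive table ("test.top_students" is hypothetical)
sql("select * from student where sage > 21")
  .write
  .mode("overwrite")
  .saveAsTable("test.top_students")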