Atlas 2.1.0 in Practice (3): Integrating Atlas with Hive

Posted by 獨孤風 on 2021-01-25

Original article: https://www.cnblogs.com/tree1123/p/14326279.html


Integrating Atlas with Hive

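Once Atlas is installed, it only becomes useful after it is connected to other components, and the most common of these is Hive.

In Atlas's architecture, once the Hive hook is configured, every Hive operation the hook captures is written to Kafka, where Atlas consumes it and displays the resulting metadata as a graph.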

Hive Model

Which Hive operations and information get recorded? Atlas defines them in its Hive model, which contains the following:

1. Entity types:

hive_db

Supertype: Asset

Attributes: qualifiedName, name, description, owner, clusterName, location, parameters, ownerName

hive_table

Supertype: DataSet

Attributes: qualifiedName, name, description, owner, db, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary

hive_column

Supertype: DataSet

Attributes: qualifiedName, name, description, owner, type, comment, table

hive_storagedesc

Supertype: Referenceable

Attributes: qualifiedName, table, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories

hive_process

Supertype: Process

Attributes: qualifiedName, name, description, owner, inputs, outputs, startTime, endTime, userName, operationType, queryText, queryPlan, queryId, clusterName

hive_column_lineage

Supertype: Process

Attributes: qualifiedName, name, description, owner, inputs, outputs, query, depenendencyType, expression

2. Enum types:

hive_principal_type, values: USER, ROLE, GROUP

3. Struct types:

hive_order, attributes: col, order

hive_serde, attributes: name, serializationLib, parameters

Naming structure of Hive entities:

hive_db.qualifiedName:     <dbName>@<clusterName>
hive_table.qualifiedName:  <dbName>.<tableName>@<clusterName>
hive_column.qualifiedName: <dbName>.<tableName>.<columnName>@<clusterName>
hive_process.queryString:  trimmed query string in lower case
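As an illustration of these formats, here is a minimal Python sketch that assembles the strings. The helper names are hypothetical, "default", "orders" and "id" are made-up identifiers, and "primary" stands in for the cluster name (the atlas.cluster.name value configured below). The sketch does no case normalization of its own, which the real hook may handle differently.

```python
# Hypothetical helpers that build qualifiedName strings in the formats listed above.
# No case normalization is applied to names; only the query string is lower-cased.

def db_qualified_name(db: str, cluster: str) -> str:
    """hive_db: <dbName>@<clusterName>"""
    return f"{db}@{cluster}"


def table_qualified_name(db: str, table: str, cluster: str) -> str:
    """hive_table: <dbName>.<tableName>@<clusterName>"""
    return f"{db}.{table}@{cluster}"


def column_qualified_name(db: str, table: str, column: str, cluster: str) -> str:
    """hive_column: <dbName>.<tableName>.<columnName>@<clusterName>"""
    return f"{db}.{table}.{column}@{cluster}"


def process_query_string(query: str) -> str:
    """hive_process: the query string, trimmed and lower-cased."""
    return query.strip().lower()


print(table_qualified_name("default", "orders", "primary"))  # default.orders@primary
print(process_query_string("  INSERT INTO t2 SELECT * FROM t1  "))  # insert into t2 select * from t1
```

Knowing the exact qualifiedName is handy later, when looking an entity up in Atlas by its unique attribute.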

Configuring the Hive hook

The Hive hook listens for Hive create/update/delete operations. The configuration steps are as follows:

1. Edit hive-env.sh (set the path to the hook jars)

export HIVE_AUX_JARS_PATH=/opt/apps/apache-atlas-2.1.0/hook/hive

2. Edit hive-site.xml (restart Hive once it is configured)

<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>

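Note that this registers a post-execution hook, which fires after a statement has run. Hive also supports hooks at other stages, for example before execution (hive.exec.pre.hooks).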
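A missing or misspelled hook class is easy to overlook, so before restarting Hive it may help to confirm what hive-site.xml actually registers. The following is a minimal Python sketch using only the standard library; it assumes the usual <configuration>/<property>/<name>/<value> layout and treats the value as a comma-separated list of hook classes. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATLAS_HOOK = "org.apache.atlas.hive.hook.HiveHook"


def post_hooks(hive_site_xml: str) -> list:
    """Return the classes listed in hive.exec.post.hooks (comma-separated)."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "hive.exec.post.hooks":
            value = prop.findtext("value") or ""
            return [cls.strip() for cls in value.split(",") if cls.strip()]
    return []  # property not set at all


def has_atlas_hook(hive_site_xml: str) -> bool:
    """True if the Atlas HiveHook is among the registered post-execution hooks."""
    return ATLAS_HOOK in post_hooks(hive_site_xml)


SAMPLE = """<configuration>
  <property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
  </property>
</configuration>"""

print(has_atlas_hook(SAMPLE))  # True
```

To check a real file, pass it the text of the file, e.g. Path("/opt/module/hive/conf/hive-site.xml").read_text() from pathlib; that directory is the one shown in the import log in this article and may differ on your cluster.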

3. Sync the configuration
Copy the Atlas configuration file atlas-application.properties into the Hive configuration directory, then add the following settings:

atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
atlas.rest.address=http://doit33:21000
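Since this step depends on these keys being present in the copy of atlas-application.properties inside the Hive configuration directory, a quick presence check can catch a missed line. This is a simplified Python sketch: the parser handles only key=value lines and #/! comments (no line continuations, escapes or key:value separators), and the helper names are hypothetical.

```python
REQUIRED_KEYS = (
    "atlas.hook.hive.synchronous",
    "atlas.hook.hive.numRetries",
    "atlas.hook.hive.queueSize",
    "atlas.cluster.name",
    "atlas.rest.address",
)


def parse_properties(text: str) -> dict:
    """Minimal .properties reader: key=value lines, '#' or '!' comments.
    Does NOT handle line continuations, escapes, or ':' separators."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        props[key.strip()] = value.strip()
    return props


def missing_keys(props: dict, required=REQUIRED_KEYS) -> list:
    """List the required keys that are absent, in the order given."""
    return [key for key in required if key not in props]


SAMPLE = """
# added for the Hive hook
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
"""

print(missing_keys(parse_properties(SAMPLE)))  # ['atlas.rest.address']
```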

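Importing Hive metadata into Atlas

The hook only captures operations that happen after it is enabled; metadata that already exists in Hive is imported with the import script: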

bin/import-hive.sh

Using Hive configuration directory [/opt/module/hive/conf]

Log file for import is /opt/module/atlas/logs/import-hive.log

log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.

log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.

When prompted, enter the username admin and the password admin:

Enter username for atlas :- admin

Enter password for atlas :-

Hive Meta Data import was successful!!!
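The import script only reports overall success. To spot-check that a specific table arrived, one option is Atlas's v2 REST endpoint for looking up an entity by type and unique attribute (/api/atlas/v2/entity/uniqueAttribute/type/{typeName}). The minimal standard-library sketch below assumes HTTP basic authentication with the same admin/admin credentials used above; the function names and the table default.orders are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def entity_url(base_url: str, type_name: str, qualified_name: str) -> str:
    """Build the v2 'get entity by unique attribute' URL."""
    query = urllib.parse.urlencode({"attr:qualifiedName": qualified_name})
    return (f"{base_url.rstrip('/')}/api/atlas/v2/entity/uniqueAttribute/type/"
            f"{urllib.parse.quote(type_name)}?{query}")


def fetch_entity(base_url, type_name, qualified_name, user, password, timeout=10):
    """GET the entity using HTTP basic auth and return the parsed JSON.
    urllib raises urllib.error.HTTPError on an error status."""
    request = urllib.request.Request(entity_url(base_url, type_name, qualified_name))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_entity("http://doit33:21000", "hive_table", "default.orders@primary", "admin", "admin") returns the parsed JSON response, or raises urllib.error.HTTPError if the server answers with an error status, for instance when no such entity was imported.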

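Troubleshooting notes

1. Class not found: org.apache.atlas.hive.hook.HiveHook

Cause: the third-party jar was not added to Hive.

Tip: in the Hive shell, run set to check whether the jar has been added (look for hive.aux.jars.path in the output); it prints the list of configuration variables overridden by the user or by Hive.

Using the elasticsearch-hadoop 2.1.2 jar as an example, here are several ways to add a third-party jar to Hive.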

1. Add it from the Hive shell

hive> add jar /home/hadoop/elasticsearch-hadoop-hive-2.1.2.jar;

Connection	Takes effect?
Hive Shell	Yes, without restarting Hive
Hive Server	No

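2. Put the jar in ${HIVE_HOME}/auxlib

Create a folder named auxlib under ${HIVE_HOME} and put the custom jar in it.
For the Hive shell this needs no Hive restart, which makes it convenient.

Connection	Takes effect?
Hive Shell	Yes, without restarting Hive
Hive Server	Only after restarting Hive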

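3. HIVE_AUX_JARS_PATH and hive.aux.jars.path

The HIVE_AUX_JARS_PATH setting in hive-env.sh and hive.aux.jars.path in hive-site.xml do not apply to the server; they apply only to the current Hive shell. Hive shells do not affect one another, so every Hive shell has to be configured, and the value can point at a folder.
HIVE_AUX_JARS_PATH and hive.aux.jars.path support local files only. They can be set to a file or to a folder.

Connection	Takes effect?
Hive Shell	Only after restarting Hive
Hive Server	Only after restarting Hive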
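For the class-not-found error above, it can help to check directly whether any jar on the configured path contains the hook class. The Python sketch below is a diagnostic aid, not part of Hive or Atlas: it assumes HIVE_AUX_JARS_PATH holds a comma-separated list of jar files and/or directories, looks only at jars directly inside each directory (not subdirectories), and uses hypothetical function names.

```python
import os
import zipfile

HOOK_CLASS = "org.apache.atlas.hive.hook.HiveHook"


def class_entry(class_name: str) -> str:
    """org.a.B -> org/a/B.class, the path of the class file inside a jar."""
    return class_name.replace(".", "/") + ".class"


def expand_aux_jars(aux_path: str) -> list:
    """Expand comma-separated jar files and/or directories into jar paths.
    Directories contribute the *.jar files directly inside them; missing entries are skipped."""
    jars = []
    for entry in (p.strip() for p in aux_path.split(",")):
        if os.path.isdir(entry):
            jars.extend(sorted(os.path.join(entry, name)
                               for name in os.listdir(entry) if name.endswith(".jar")))
        elif os.path.isfile(entry):
            jars.append(entry)
    return jars


def jars_providing(class_name: str, jar_paths: list) -> list:
    """Return the jars whose entries include the given class file."""
    wanted = class_entry(class_name)
    found = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as zf:
                if wanted in zf.namelist():
                    found.append(jar)
        except zipfile.BadZipFile:
            continue  # not a valid jar/zip archive; ignore it
    return found


aux = os.environ.get("HIVE_AUX_JARS_PATH", "")
print(jars_providing(HOOK_CLASS, expand_aux_jars(aux)) or "HiveHook not found on HIVE_AUX_JARS_PATH")
```

An empty result is consistent with the class-not-found symptom; if the path points at Atlas's hook/hive directory as in step 1 of the hook setup, at least one jar would be expected to show up.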

2. Hive error: Failing because I am unlikely to write too

Cause: HIVE_AUX_JARS_PATH is configured incorrectly.

The hive-env.sh script contains this section:

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
  export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
  export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog
fi

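If HIVE_AUX_JARS_PATH already has a value, /usr/hdp/current/hive-webhcat/share/hcatalog is ignored.

In other words, Hive reads only one HIVE_AUX_JARS_PATH value, so a custom value replaces the HCatalog default instead of adding to it.

The fix is to keep shared jars in one central location and create matching symlinks for them under /usr/hdp/current/hive-webhcat/share/hcatalog: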

sudo -u hive ln -s /usr/lib/share-lib/elasticsearch-hadoop-2.1.0.Beta4.jar /usr/hdp/current/hive-webhcat/share/hcatalog/elasticsearch-hadoop-2.1.0.Beta4.jar
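The symlink workaround can be scripted when several shared jars are involved. The following Python sketch is hypothetical: it pairs each jar with a same-named link in the HCatalog directory and, by default, only prints the equivalent ln -s commands (dry run). Creating the links for real requires write access to that directory, for example by running as the hive user as in the sudo command above.

```python
import os

HCATALOG_DIR = "/usr/hdp/current/hive-webhcat/share/hcatalog"


def plan_links(shared_jars: list, target_dir: str = HCATALOG_DIR) -> list:
    """Pair each shared .jar with a same-named link path inside target_dir."""
    return [(jar, os.path.join(target_dir, os.path.basename(jar)))
            for jar in shared_jars if jar.endswith(".jar")]


def apply_links(plan: list, dry_run: bool = True) -> None:
    """Create symlinks like `ln -s SRC DST`; paths that already exist are left alone."""
    for src, dst in plan:
        if os.path.lexists(dst):
            print(f"skip (exists): {dst}")
        elif dry_run:
            print(f"ln -s {src} {dst}")
        else:
            os.symlink(src, dst)


apply_links(plan_links(["/usr/lib/share-lib/elasticsearch-hadoop-2.1.0.Beta4.jar"]))
```

Pass dry_run=False once the printed commands look right.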
