Atlas 2.2.0: Building, Installing, and Using It (with Elasticsearch Integration and Hive Metadata Import)

Posted by 榆天紫夏 on 2022-05-11

Original URL: https://www.cnblogs.com/yutianzixia/p/16257916.html

1. Build stage

Component versions:

Component	Version
Atlas	2.2.0
HBase	2.2.6
Hive	3.1.2
Hadoop	3.1.1
Kafka	2.4.1 (Scala 2.11 build: kafka_2.11-2.4.1)
ZooKeeper	3.6.2
Elasticsearch	7.12.1

Architecture: x86. On ARM, the build is known to fail with a missing node-sass error, because GitHub has no node-sass package for ARM.
Operating system: CentOS 7.6

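Notes:
1. The build covers only Atlas itself, not its embedded HBase and Solr.
2. Some of the steps below do not fix build errors. They fix, in advance, errors that otherwise appear at runtime or when Hive metadata is imported.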

Steps:
Step 1: Download the Atlas 2.2.0 source code from the official website and extract it.

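Step 2: Configure a Maven mirror located in China to speed up downloads. You can set it in settings.xml under Maven's conf directory or in the project's pom.xml. The Aliyun mirror is used here for reference.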
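If you would rather script the mirror setup than hand-edit settings.xml, here is a minimal Python sketch. The mirror id `aliyunmaven` and the URL `https://maven.aliyun.com/repository/public` come from Aliyun's public repository documentation, not from this post, and `add_mirror` is a hypothetical helper. By default it mirrors only `central`, so any other repositories declared in the Atlas pom are still contacted directly. Aliyun's own examples use `*` instead.

```python
import xml.etree.ElementTree as ET

# Assumed values from Aliyun's public Maven repository docs (not from the original post)
ALIYUN_ID = "aliyunmaven"
ALIYUN_URL = "https://maven.aliyun.com/repository/public"


def add_mirror(settings_xml: str, mirror_id: str = ALIYUN_ID,
               url: str = ALIYUN_URL, mirror_of: str = "central") -> str:
    """Return settings_xml with a <mirror> added under <mirrors> (no-op if the id exists)."""
    root = ET.fromstring(settings_xml)
    # settings.xml usually declares a default namespace; reuse whatever the root uses
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        ET.register_namespace("", ns[1:-1])  # serialize as xmlns="..." instead of ns0:
    mirrors = root.find(ns + "mirrors")
    if mirrors is None:
        mirrors = ET.SubElement(root, ns + "mirrors")
    # Leave the document unchanged if a mirror with this id is already configured
    if any(m.findtext(ns + "id") == mirror_id for m in mirrors.findall(ns + "mirror")):
        return ET.tostring(root, encoding="unicode")
    mirror = ET.SubElement(mirrors, ns + "mirror")
    for tag, text in (("id", mirror_id), ("mirrorOf", mirror_of), ("url", url)):
        ET.SubElement(mirror, ns + tag).text = text
    return ET.tostring(root, encoding="unicode")
```

For example, read `~/.m2/settings.xml`, pass its text through `add_mirror`, and write the result back. ElementTree drops XML comments, so keep a copy of the original file.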

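Step 3: Download the Node.js package in advance and put it in the local Maven repository at this path:
$MAVEN_REPOSITORY/com/github/eirslett/node/12.16.0/node-12.16.0-linux-x64.tar.gz

Note that the downloaded file is named node-v12.16.0-linux-x64.tar.gz. Drop the "v" from the name when you place it in the Maven repository. If you skip this step, the build downloads the file itself, which is very slow.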
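Here is a small Python sketch of step 3. It assumes `$MAVEN_REPOSITORY` is your local repository (by default `~/.m2/repository`). The helper names are hypothetical, and the target layout matches the path above. The sketch copies the archive under the name without the "v" instead of renaming it, so the original download is kept.

```python
import shutil
from pathlib import Path


def node_repo_path(repo: Path, version: str = "12.16.0", platform: str = "linux-x64") -> Path:
    """Location in the local Maven repository where the Node.js archive is expected."""
    return repo / "com" / "github" / "eirslett" / "node" / version / f"node-{version}-{platform}.tar.gz"


def stage_node_tarball(downloaded: Path, repo: Path,
                       version: str = "12.16.0", platform: str = "linux-x64") -> Path:
    """Copy node-v<version>-<platform>.tar.gz into the repo under the name without the 'v'."""
    expected = f"node-v{version}-{platform}.tar.gz"
    if downloaded.name != expected:
        raise ValueError(f"expected {expected}, got {downloaded.name}")
    target = node_repo_path(repo, version, platform)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(downloaded, target)
    return target
```

Usage: `stage_node_tarball(Path("node-v12.16.0-linux-x64.tar.gz"), Path.home() / ".m2" / "repository")`.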

Step 4: Add the following two dependencies to the root pom.xml:

    <dependency>
        <groupId>org.restlet.jee</groupId>
        <artifactId>org.restlet</artifactId>
        <version>2.4.0</version>
    </dependency>

    <dependency>
        <groupId>org.restlet.jee</groupId>
        <artifactId>org.restlet.ext.servlet</artifactId>
        <version>2.4.0</version>
    </dependency>

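Step 5: Edit ./intg/src/main/java/org/apache/atlas/ApplicationProperties.java and comment out line 365:

    LOG.info("Setting " + SOLR_WAIT_SEARCHER_CONF + " = " + getBoolean(SOLR_WAIT_SEARCHER_CONF));

This is needed because we use Elasticsearch as the index backend and comment out every Solr setting. This line reads a Solr key with getBoolean() and no default value, so once that key is gone the call fails when Hive metadata is imported.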
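Line numbers can shift between source checkouts, so finding the statement by its text may be safer than editing line 365 by hand. Below is a hypothetical Python helper (not part of Atlas) that does this. It refuses to write anything unless exactly one matching line is found, which also prevents it from commenting the line out twice.

```python
from pathlib import Path

# Start of the statement to disable, as quoted in step 5
MARKER = 'LOG.info("Setting " + SOLR_WAIT_SEARCHER_CONF'


def comment_out_solr_log(java_source: str) -> str:
    """Prefix '// ' to the line that logs SOLR_WAIT_SEARCHER_CONF; keep all other lines."""
    out = []
    hits = 0
    for line in java_source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(MARKER):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}// {stripped}"
            hits += 1
        out.append(line)
    if hits != 1:
        raise ValueError(f"expected exactly one matching line, found {hits}")
    return "".join(out)


def patch_file(path: Path) -> None:
    """Rewrite the Java file in place with the logging line commented out."""
    path.write_text(comment_out_solr_log(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Run it from the source root with `patch_file(Path("intg/src/main/java/org/apache/atlas/ApplicationProperties.java"))`.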

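Step 6: Replace jsr311-api with javax.ws.rs-api throughout the project. There are 6 occurrences, which you can find with grep -rn in the project directory. Also set jsr.version in the root pom.xml to 2.0.1.

This step mainly affects metadata import, and everything that follows it, for the six supported components: HBase, Hive, Sqoop, Impala, Falcon, and Storm.
Root cause: the javax.ws.rs.core package in jsr311-api has no Link class. Atlas stores its metadata in HBase, and HBase itself uses the javax.ws.rs.core package from javax.ws.rs-api, which does contain Link. As a result, the import scripts fail with an error caused by the missing Link class.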
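The replacement in step 6 can also be scripted. The sketch below makes two assumptions. First, every occurrence is an exact `<artifactId>jsr311-api</artifactId>` element in a pom.xml file; the groupId `javax.ws.rs` is the same for both artifacts. Second, `jsr.version` is defined only in the root pom. Compare the returned count with your `grep -rn jsr311-api` output, which the post puts at 6.

```python
import re
from pathlib import Path
from typing import Tuple

OLD = "<artifactId>jsr311-api</artifactId>"
NEW = "<artifactId>javax.ws.rs-api</artifactId>"
JSR_VERSION = re.compile(r"<jsr\.version>[^<]*</jsr\.version>")


def patch_pom_text(text: str, is_root: bool) -> Tuple[str, int]:
    """Swap the artifactId; in the root pom also pin <jsr.version> to 2.0.1."""
    count = text.count(OLD)
    text = text.replace(OLD, NEW)
    if is_root:
        text = JSR_VERSION.sub("<jsr.version>2.0.1</jsr.version>", text)
    return text, count


def patch_project(root: Path) -> int:
    """Patch every pom.xml under root and return how many artifactIds were replaced."""
    total = 0
    for pom in sorted(root.rglob("pom.xml")):
        text = pom.read_text(encoding="utf-8")
        new_text, count = patch_pom_text(text, is_root=(pom.parent == root))
        if new_text != text:
            pom.write_text(new_text, encoding="utf-8")
        total += count
    return total
```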

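Step 7: Run the build:

mvn clean package -DskipTests -Drat.skip=true -Pdist

The built packages are in ./distro/target. The server package is the Atlas deployment package. The bin package is a deployment package that also bundles the common hooks (such as the HBase hook).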
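Here is a minimal Python sketch that runs the build command above and sorts the archives in distro/target by role. The name patterns (the `-server`, `-bin` and `-hook` suffixes) are inferred from the package names used in this post, such as apache-atlas-2.2.0-bin.tar.gz and apache-atlas-2.2.0-hive-hook.tar.gz. Anything that matches none of them goes into `other`.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

BUILD_CMD = ["mvn", "clean", "package", "-DskipTests", "-Drat.skip=true", "-Pdist"]


def build(source_dir: Path) -> None:
    """Run the step 7 build in the Atlas source tree; raises if Maven fails."""
    subprocess.run(BUILD_CMD, cwd=source_dir, check=True)


def classify_artifacts(target_dir: Path) -> Dict[str, List[str]]:
    """Group the *.tar.gz files in distro/target by role, based only on their names."""
    groups: Dict[str, List[str]] = {"server": [], "bin": [], "hooks": [], "other": []}
    for p in sorted(target_dir.glob("*.tar.gz")):
        name = p.name
        if name.endswith("-server.tar.gz"):
            groups["server"].append(name)
        elif name.endswith("-bin.tar.gz"):
            groups["bin"].append(name)
        elif name.endswith("-hook.tar.gz"):
            groups["hooks"].append(name)
        else:
            groups["other"].append(name)
    return groups
```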

2. Deployment stage

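Prerequisites:
Hadoop, Hive, HBase, Kafka, ZooKeeper, and Elasticsearch are running and usable in the cluster, and the node that runs Atlas has the HBase and Hive configuration directories.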

Steps:
Step 1: Extract the atlas-bin package (here into /data/apps) and rename the directory:

tar -zxvf apache-atlas-2.2.0-bin.tar.gz -C /data/apps
mv /data/apps/apache-atlas-2.2.0 /data/apps/atlas-2.2.0
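The same extract-and-rename step can be sketched in Python with the standard tarfile module. `extract_and_rename` is a hypothetical helper. It checks member paths before extracting, and it uses the `data` extraction filter when the running Python provides it.

```python
import tarfile
from pathlib import Path


def extract_and_rename(archive: Path, dest: Path, extracted_name: str, new_name: str) -> Path:
    """Extract a .tar.gz into dest, then rename dest/extracted_name to dest/new_name."""
    dest.mkdir(parents=True, exist_ok=True)
    base = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        # Refuse members that would land outside dest (absolute paths, '..')
        for member in tar.getmembers():
            target = (base / member.name).resolve()
            if target != base and base not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):  # Python 3.12+ and recent security releases
            tar.extractall(base, filter="data")
        else:
            tar.extractall(base)
    final = dest / new_name
    if final.exists():
        raise FileExistsError(final)
    (dest / extracted_name).rename(final)
    return final
```

Usage: `extract_and_rename(Path("apache-atlas-2.2.0-bin.tar.gz"), Path("/data/apps"), "apache-atlas-2.2.0", "atlas-2.2.0")`.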

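Step 2: Extract the hook package (the Hive hook in this example) and copy its contents into the Atlas installation directory. The package contains directories, so the copy must be recursive (-r). Calling /usr/bin/cp directly bypasses a "cp -i" alias.

tar -zxvf apache-atlas-2.2.0-hive-hook.tar.gz -C /data/apps/
/usr/bin/cp -r /data/apps/apache-atlas-hive-hook-2.2.0/* /data/apps/atlas-2.2.0/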
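The hook package contains directories (such as hook and hook-bin), so the copy has to be recursive and has to merge into any directories that already exist in the Atlas home. The Python sketch below does that merge and assumes Python 3.8+ for `dirs_exist_ok`. `merge_hook_package` is a hypothetical helper. Unlike the shell glob `*`, it also copies hidden files.

```python
import shutil
from pathlib import Path
from typing import List


def merge_hook_package(hook_dir: Path, atlas_home: Path) -> List[str]:
    """Copy each top-level entry of hook_dir into atlas_home, merging existing directories."""
    copied = []
    for entry in sorted(hook_dir.iterdir()):
        target = atlas_home / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)  # merge, like cp -r
        else:
            shutil.copy2(entry, target)  # overwrite single files, keeping metadata
        copied.append(entry.name)
    return copied
```

Usage: `merge_hook_package(Path("/data/apps/apache-atlas-hive-hook-2.2.0"), Path("/data/apps/atlas-2.2.0"))`.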

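Step 3: Edit the Atlas configuration files. Some of these properties already exist and only need new values; others are missing and must be added.
atlas-application.properties: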

    #atlas server config
    atlas.rest.address=http://atlas-ip:21000
    atlas.server.run.setup.on.start=false

    #hbase config
    atlas.audit.hbase.tablename=apache_atlas_entity_audit
    atlas.audit.zookeeper.session.timeout.ms=1000
    atlas.audit.hbase.zookeeper.quorum=<zk-address>
    atlas.graph.storage.hostname=<zk-address>

    #solr config
    #comment out every Solr-related property

    #es config
    atlas.graph.index.search.backend=elasticsearch
    atlas.graph.index.search.hostname=es-ip:9200
    atlas.graph.index.search.elasticsearch.client-only=true
    atlas.graph.index.search.elasticsearch.http.auth.type=basic
    atlas.graph.index.search.elasticsearch.http.auth.basic.username=elastic
    atlas.graph.index.search.elasticsearch.http.auth.basic.password=<elastic-password>

    #kafka config
    atlas.notification.embedded=false
    atlas.kafka.data=/data/log/kafka
    atlas.kafka.zookeeper.connect=<zk-address>/kafkaCluster
    atlas.kafka.bootstrap.servers=<kafka-address>

    #hive config
    atlas.hook.hive.numRetries=3
    atlas.hook.hive.queueSize=10000
    atlas.cluster.name=primary
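Typos in property keys are easy to miss, because an unrecognized key is typically just ignored and the default value applies. A rough Python check for the file above might look like the sketch below. The list of required keys is an assumption drawn from this post's configuration, and the prefix check is a heuristic. The parser handles only simple `key=value` lines (no `:` separators or line continuations).

```python
from typing import Dict, List

# Assumed from the configuration in this post; extend as needed
REQUIRED = [
    "atlas.rest.address",
    "atlas.graph.storage.hostname",
    "atlas.graph.index.search.backend",
    "atlas.graph.index.search.hostname",
    "atlas.kafka.bootstrap.servers",
    "atlas.notification.embedded",
]


def parse_properties(text: str) -> Dict[str, str]:
    """Tiny .properties reader: key=value lines, '#' or '!' comments, no continuations."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_config(text: str) -> List[str]:
    """Return a list of problems found in atlas-application.properties text."""
    props = parse_properties(text)
    problems = [f"missing {k}" for k in REQUIRED if k not in props]
    if props.get("atlas.graph.index.search.backend") != "elasticsearch":
        problems.append("index backend is not elasticsearch")
    problems += [f"active Solr setting: {k}" for k in props if ".solr." in k]
    # Heuristic: flags misspelled prefixes such as "atuls."
    problems += [f"unexpected key prefix: {k}" for k in props if not k.startswith("atlas.")]
    return problems
```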

atlas-env.sh:

        export HBASE_CONF_DIR=/data/apps/hbase-2.2.6/conf

atlas-log4j.xml:

        # Uncomment the org.apache.log4j.DailyRollingFileAppender block to expose performance metrics

Step 4: Copy atlas-application.properties to the hive/conf directory on every node that runs Hive.

Step 5: Copy the hive-hook directory to each Hive node, then edit the Hive configuration files:

ssh hive-node "mkdir -p /data/apps/atlas-2.2.0/hook"
scp -r /data/apps/atlas-2.2.0/hook/hive hive-node:/data/apps/atlas-2.2.0/hook/
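Steps 4 and 5 repeat for every Hive node, so a sketch that builds the ssh/scp commands per host can help. The host names and the `HIVE_CONF` path are hypothetical. The sketch writes destination paths explicitly rather than using `$PWD`, which resolves on the local machine and is only correct if scp happens to be run from that same directory. By default `run_all` only prints the commands.

```python
import shlex
import subprocess
from typing import List

ATLAS_HOME = "/data/apps/atlas-2.2.0"   # install path used in this post
HIVE_CONF = "/data/apps/hive/conf"      # hypothetical: replace with your Hive conf directory


def distribution_commands(hosts: List[str]) -> List[List[str]]:
    """Build (without running) the ssh/scp commands for steps 4 and 5, per Hive node."""
    cmds: List[List[str]] = []
    for host in hosts:
        # Step 5: create the hook directory, then copy hook/hive into it
        cmds.append(["ssh", host, "mkdir -p " + shlex.quote(f"{ATLAS_HOME}/hook")])
        cmds.append(["scp", "-r", f"{ATLAS_HOME}/hook/hive", f"{host}:{ATLAS_HOME}/hook/"])
        # Step 4: ship the Atlas client config next to Hive's own config files
        cmds.append(["scp", f"{ATLAS_HOME}/conf/atlas-application.properties", f"{host}:{HIVE_CONF}/"])
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run_all(distribution_commands(["hive-node1", "hive-node2"]))` prints the plan. Pass `dry_run=False` once it looks right.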

hive-site.xml:

    <property>
        <name>hive.exec.post.hooks</name>
        <value>org.apache.atlas.hive.hook.HiveHook</value>
    </property>
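`hive.exec.post.hooks` takes a comma-separated list of hook classes. If the property already exists, HiveHook should therefore be appended instead of replacing the other hooks. Below is a Python sketch using ElementTree, with `add_post_hook` as a hypothetical helper. ElementTree drops comments and the XML declaration, so back up hive-site.xml before writing the result over it.

```python
import xml.etree.ElementTree as ET

HOOK_PROP = "hive.exec.post.hooks"
HOOK_CLASS = "org.apache.atlas.hive.hook.HiveHook"


def add_post_hook(hive_site_xml: str) -> str:
    """Ensure HiveHook is listed in hive.exec.post.hooks, keeping any existing hooks."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == HOOK_PROP:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            hooks = [h.strip() for h in (value.text or "").split(",") if h.strip()]
            if HOOK_CLASS not in hooks:
                hooks.append(HOOK_CLASS)
            value.text = ",".join(hooks)
            break
    else:
        # Property not present yet: add a new <property> block
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = HOOK_PROP
        ET.SubElement(prop, "value").text = HOOK_CLASS
    return ET.tostring(root, encoding="unicode")
```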

hive-env.sh:

    export HIVE_AUX_JARS_PATH=/data/apps/atlas-2.2.0/hook/hive

Step 6: Restart Hive.

Step 7: Start the service with the Atlas start script:

$ATLAS_HOME/bin/atlas_start.py

The startup process is shown in the figure below.

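Startup takes a long time, because it includes creating indexes and initializing data.
Meanwhile, follow the Atlas startup log until it stops updating. Then check with lsof or netstat whether port 21000 is listening. If it is, open ip:21000 in a browser to log in to Atlas.

Do not trust the "Apache Atlas Server started!!!" message or the Atlas process shown by jps. After a certain amount of time, the start script always reports success, even if port 21000 is not yet listening and the service is unusable. Atlas is only really available once port 21000 is listening and the login page loads.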
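Because the listening port, not the "started" message, is the real readiness signal, a script can poll it instead of checking by hand. Here is a minimal Python sketch. The host name, the 600-second timeout, and the 5-second interval are arbitrary assumptions. A successful TCP connection only shows that something is listening, so it is still worth loading the login page to confirm.

```python
import socket
import time


def wait_for_port(host: str, port: int = 21000, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, or False if timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out: not ready yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Usage: `wait_for_port("atlas-ip")` blocks until port 21000 accepts connections or ten minutes pass.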

3. Usage stage

Notes:
Hive metadata import and usage serve as the example here. Other data sources work similarly.

Steps:
Step 1: Go to the Atlas installation directory and run the import-hive.sh script in hook-bin:

$ATLAS_HOME/hook-bin/import-hive.sh

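The output after running it is shown below.

During the import, the script prompts for the Atlas username and password. Enter admin for both.
When the import succeeds, the script prints a success message.

How long this takes depends on how much data Hive already holds.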

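Step 2: Log in to the Atlas web UI
Open ip:21000 in a browser to reach the Atlas login page.

After logging in, you see the page shown below.

Click the small icon in the top-right corner to view overall data statistics.

View all Hive tables.

Click any table to see its details.

The table's properties, columns, lineage graph, and more are all clearly displayed.

You can also use the search panel on the left to filter for the items you want.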

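That is how I deployed Atlas 2.2.0 with Elasticsearch and Hive integration in production. You can work with it through the web UI, or integrate it into your own systems through its REST API.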
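For the REST API route, here is a minimal Python sketch against Atlas's v2 basic-search endpoint (`GET /api/atlas/v2/search/basic` with the `typeName` and `limit` parameters). It uses HTTP basic authentication with the admin/admin credentials from this post. The host name and the helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_basic_search(base_url: str, type_name: str, user: str, password: str,
                       limit: int = 25) -> urllib.request.Request:
    """Build a GET request for Atlas basic search, filtered by entity type."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    req = urllib.request.Request(f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{query}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


def basic_search(req: urllib.request.Request) -> list:
    """Send the request and return the 'entities' list of the JSON response (empty if absent)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("entities", [])
```

For example, `basic_search(build_basic_search("http://atlas-ip:21000", "hive_table", "admin", "admin"))` should return entity headers whose `typeName` and `guid` fields identify each Hive table.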

This article was first published on cnblogs by 榆天紫夏; I hope it helps. Original URL: https://www.cnblogs.com/yutianzixia/p/16257916.html. Corrections and additions are welcome.
