Hue: A Big Data Collaboration Framework
I. Overview
1. Reference documents
http://gethue.com/ — official site
http://github.com/cloudera/hue — source code
http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6/ — Hue installation guide
2. Features:
- Free & open source
- Be productive
- 100% compatible
- Dynamic search dashboards with Solr (dynamic Solr integration)
- Spark and Hadoop notebooks
3. Architecture diagram
II. Installing and Deploying Hue
1. Download the source tarball (CDH 5.3.6 release):
http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6.tar.gz
2. Make sure the virtual machine can reach the Internet.
3. Install the system packages Hue depends on (the package list varies by Unix distribution; root privileges are required):
yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel sqlite-devel openssl-devel mysql-devel gmp-devel
4. Extract the Hue source tarball into the target directory:
[root@xingyunfei001 app]# tar zxf hue-3.7.0-cdh5.3.6.tar.gz
5. Build from source:
[root@xingyunfei001 app]# cd hue-3.7.0-cdh5.3.6/
[root@xingyunfei001 hue-3.7.0-cdh5.3.6]# make apps
6. Edit the hue.ini configuration file:
vi /opt/app/hue-3.7.0-cdh5.3.6/desktop/conf/hue.ini
# Set this to a random string, the longer the better.
# This is used for secure hashing in the session store.
secret_key=qpbdxoewsqlkhztybvfidtvwekftusgdlofbcfghaswuicmqp
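The secret_key above can be any long random string. One quick way to generate a fresh one, sketched here with Python's standard library:

```python
import secrets
import string

# Generate a random alphanumeric secret for hue.ini's secret_key.
# 50+ characters is reasonable; longer is better.
alphabet = string.ascii_letters + string.digits
secret_key = "".join(secrets.choice(alphabet) for _ in range(60))
print(secret_key)
```

Paste the printed value into hue.ini; keep it stable afterwards, since it is used for session hashing.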
# Webserver listens on this address and port
http_host=xingyunfei001.com.cn
http_port=8888
# Time zone name
time_zone=Asia/Shanghai
7. Start Hue:
[hadoop001@xingyunfei001 hue-3.7.0-cdh5.3.6]$ build/env/bin/supervisor
8. Verify in a browser:
http://xingyunfei001.com.cn:8888/accounts/login/?next=/
III. Integrating Hue with Hadoop 2.x
1. Edit Hadoop's hdfs-site.xml to enable WebHDFS:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
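With dfs.webhdfs.enabled set to true, the NameNode exposes a REST API under /webhdfs/v1 (port 50070 in this Hadoop version), which is exactly what Hue's webhdfs_url will point at below. A minimal sketch of how such a request URL is assembled (the host name and path here are illustrative placeholders):

```python
from urllib.parse import urlencode

def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL, e.g. for the LISTSTATUS or OPEN operation."""
    query = urlencode({"op": op, **params})
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, path, query)

# List /tmp as the hue user (run e.g. via curl against a live cluster).
url = webhdfs_url("xingyunfei001.com.cn", "/tmp", "LISTSTATUS", **{"user.name": "hue"})
print(url)
```

Fetching that URL with curl on a running cluster returns a JSON directory listing, which is a quick way to confirm WebHDFS is on.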
2. Edit Hadoop's core-site.xml so the hue user may impersonate other users:
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
3. Edit Hue's hue.ini:
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://xingyunfei001.com.cn:8020
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://xingyunfei001.com.cn:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# Default umask for file and directory creation, specified in an octal value.
## umask=022
# Directory of the Hadoop configuration
hadoop_hdfs_home=/opt/app/hadoop_2.5.0_cdh
hadoop_bin=/opt/app/hadoop_2.5.0_cdh/bin
hadoop_conf_dir=/opt/app/hadoop_2.5.0_cdh/etc/hadoop
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=xingyunfei001.com.cn
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://xingyunfei001.com.cn:8088
# URL of the ProxyServer API
proxy_api_url=http://xingyunfei001.com.cn:8088
# URL of the HistoryServer API
history_server_api_url=http://xingyunfei001.com.cn:19888
# In secure mode (HTTPS), if SSL certificates from Resource Manager's
# Rest Server have to be verified against certificate authority
## ssl_cert_ca_verify=False
# HA support by specifying multiple clusters
# e.g.
# [[[ha]]]
# Resource Manager logical name (required for HA)
## logical_name=my-rm-name
4. Restart HDFS and the MapReduce JobHistory server:
[hadoop001@xingyunfei001 hadoop_2.5.0_cdh]$ sbin/start-all.sh
[hadoop001@xingyunfei001 hadoop_2.5.0_cdh]$ sbin/mr-jobhistory-daemon.sh start historyserver
5. Restart the Hue server:
[hadoop001@xingyunfei001 hue-3.7.0-cdh5.3.6]$ build/env/bin/supervisor
6. Verify the result.
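Before blaming hue.ini, it is worth confirming that the endpoints it references (NameNode 8020/50070, ResourceManager 8032/8088, JobHistory 19888) are actually listening. A plain TCP probe suffices; the sketch below demos against a local stand-in listener rather than a real cluster:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: a local listener standing in for e.g. the ResourceManager on 8032.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))          # let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
result = port_open("127.0.0.1", demo_port)
listener.close()
print(result)
```

In practice, call `port_open("xingyunfei001.com.cn", 8020)` and friends from the Hue host to rule out firewall or daemon-startup problems.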
IV. Integrating Hue with Hive
1. Configure hue.ini:
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=xingyunfei001.com.cn
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/app/hive_0.13.1_cdh/conf
# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120
# Choose whether Hue uses the GetLog() thrift call to retrieve Hive logs.
# If false, Hue will use the FetchResults() thrift call instead.
## use_get_log_api=true
# Set a LIMIT clause when browsing a partitioned table.
# A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
## browse_partitioned_table_limit=250
# A limit to the number of rows that can be downloaded from a query.
# A value of -1 means there will be no limit.
# A maximum of 65,000 is applied to XLS downloads.
## download_row_limit=1000000
# Hue will try to close the Hive query when the user leaves the editor page.
# This will free all the query resources in HiveServer2, but also make its results inaccessible.
## close_queries=false
# Thrift version to use when communicating with HiveServer2
## thrift_version=5
2. Edit Hive's hive-site.xml to configure the metastore server:
<property>
<name>hive.metastore.uris</name>
<value>thrift://xingyunfei001.com.cn:9083</value>
</property>
3. Start the metastore server first, then HiveServer2:
nohup bin/hive --service metastore &
[hadoop001@xingyunfei001 hive_0.13.1_cdh]$ bin/hiveserver2
4. Open up permissions on /tmp in HDFS:
[hadoop001@xingyunfei001 hadoop_2.5.0_cdh]$ bin/hdfs dfs -chmod -R o+x /tmp
5. Verify the configuration with a test query:
select id,url,referer from track_log limit 10;
V. Integrating Hue with an RDBMS
1. Edit hue.ini:
[[databases]]
# sqlite configuration.
[[[sqlite]]]
# Name to show in the UI.
nice_name=SQLite
# For SQLite, name defines the path to the database.
name=/opt/app/hue-3.7.0-cdh5.3.6/desktop/desktop.db
# Database backend to use.
engine=sqlite
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
## options={}
# mysql, oracle, or postgresql configuration.
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
## name=mysqldb
# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql
# IP or hostname of the database to connect to.
host=xingyunfei001.com.cn
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
# Password matching the username to authenticate with when
# connecting to the database.
password=root
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
## options={}
2. Restart Hue:
build/env/bin/supervisor
3. Verify the configuration.
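Since the [[[sqlite]]] entry points Hue at desktop/desktop.db, one sanity check is to open that file with Python's sqlite3 module and list its tables. The sketch below runs against an in-memory stand-in database with one dummy table instead of the real desktop.db:

```python
import sqlite3

# In practice: sqlite3.connect("/opt/app/hue-3.7.0-cdh5.3.6/desktop/desktop.db")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT)")  # stand-in table
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
conn.close()
print(tables)
```

Against the real desktop.db you should see Hue's Django tables; the configured databases also appear in the Hue UI under the DB Query application.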