Greenplum 6.14 and GPCC 6.4 Installation Guide

Posted by 一寸HUI on 2021-03-31

I have recently been working on a Greenplum upgrade and cleanup, so I am writing up what was done. This article is based on installing GP 6.14 and covers the installation and initialization of GP, GPCC, and PXF. Written by the cnblogs (部落格園) author 一寸HUI; personal blog: https://www.cnblogs.com/zsql/

1. Pre-installation Preparation

1.1 Hardware Selection

Reference: https://cn.greenplum.org/wp-content/uploads/2020/08/%E7%AC%AC%E4%B8%80%E8%8A%82%E8%AF%BE-%E8%85%BE%E8%AE%AF%E4%BA%91.pptx.pdf

1.2 Network Parameter Tuning

echo "10000 65535" > /proc/sys/net/ipv4/ip_local_port_range   # net.ipv4.ip_local_port_range defines the local TCP/UDP port range, i.e. the ports local processes pick from when connecting out to a destination port.
echo 1024 > /proc/sys/net/core/somaxconn   # net.core.somaxconn is the maximum number of completed connections the server can accept (listen backlog limit). Default is 128; 1024 is recommended.
echo 16777216 > /proc/sys/net/core/rmem_max   # net.core.rmem_max is the maximum receive socket buffer size. Default is 229376; 16777216 is recommended.
echo 16777216 > /proc/sys/net/core/wmem_max   # net.core.wmem_max is the maximum send socket buffer size in bytes. Default is 229376; 16777216 is recommended.
echo "4096 87380 16777216" > /proc/sys/net/ipv4/tcp_rmem   # net.ipv4.tcp_rmem sets the TCP read buffer sizes: minimum, default, maximum. Default is "4096 87380 6291456"; "4096 87380 16777216" is recommended.
echo "4096 65536 16777216" > /proc/sys/net/ipv4/tcp_wmem   # net.ipv4.tcp_wmem sets the TCP write buffer sizes: minimum, default, maximum. Default is "4096 16384 4194304"; "4096 65536 16777216" is recommended.
echo 360000 > /proc/sys/net/ipv4/tcp_max_tw_buckets   # net.ipv4.tcp_max_tw_buckets is the maximum number of sockets the system keeps in TIME_WAIT at once. Default is 2048; 360000 is recommended.
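These writes to /proc take effect immediately but are lost on reboot; section 1.4 below makes the equivalent settings persistent in /etc/sysctl.conf. The runtime values can be spot-checked like this:

sysctl net.ipv4.ip_local_port_range net.core.somaxconn net.core.rmem_max net.core.wmem_max
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_max_tw_buckets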

Reference: https://support.huaweicloud.com/tngg-kunpengdbs/kunpenggreenplum_05_0011.html

1.3 Disk and I/O Tuning

mount -o rw,nodev,noatime,nobarrier,inode64 /dev/dfa /data   # mount the data disk
/sbin/blockdev --setra 16384 /dev/dfa   # set readahead to reduce disk seeks and application I/O wait time, improving read performance
echo deadline > /sys/block/dfa/queue/scheduler   # set the I/O scheduler; deadline suits Greenplum workloads better
grubby --update-kernel=ALL --args="elevator=deadline"
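The readahead and scheduler settings above do not persist across reboots on their own (the grubby line only covers the scheduler at boot). One common approach, assuming the same /dev/dfa device as above, is to re-apply them from rc.local:

cat >> /etc/rc.d/rc.local <<'EOF'
/sbin/blockdev --setra 16384 /dev/dfa
echo deadline > /sys/block/dfa/queue/scheduler
EOF
chmod +x /etc/rc.d/rc.local   # on RHEL/CentOS 7, rc.local only runs at boot if it is executable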

vim /etc/security/limits.conf   # configure file descriptor and process limits
# add the following lines

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072

Edit /etc/security/limits.d/20-nproc.conf:

# add the following

*          soft    nproc     131072
root       soft    nproc     unlimited
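These limits only apply to new login sessions; after logging in again as gpadmin they can be spot-checked with:

ulimit -n   # open files, expect 65536
ulimit -u   # max user processes, expect 131072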

Reference: http://docs-cn.greenplum.org/v6/best_practices/sysconfig.html

1.4 Kernel Parameter Tuning

Edit /etc/sysctl.conf and, once done, reload the parameters with sysctl -p. Note: adjust the values below to the actual amount of memory on the host.

# kernel.shmall = _PHYS_PAGES / 2 # See Shared Memory Pages
kernel.shmall = 4000000000
# kernel.shmmax = kernel.shmall * PAGE_SIZE
kernel.shmmax = 500000000
kernel.shmmni = 4096
vm.overcommit_memory = 2 # See Segment Host Memory
vm.overcommit_ratio = 95 # See Segment Host Memory

net.ipv4.ip_local_port_range = 10000 65535 # See Port Settings
kernel.sem = 500 2048000 200 40960
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0 # See System Memory
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296

Explanations and calculations for some of these parameters (note: compute them from your actual configuration):

kernel.shmall = _PHYS_PAGES / 2   #echo $(expr $(getconf _PHYS_PAGES) / 2)
kernel.shmmax = kernel.shmall * PAGE_SIZE  #echo $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
vm.overcommit_memory tells the OS how to decide how much memory can be allocated to processes; for Greenplum it should be set to 2.
vm.overcommit_ratio is the percentage of RAM that may be used for process memory, with the rest reserved for the operating system. The default on Red Hat is 50; 95 is recommended.   # vm.overcommit_ratio = (RAM - 0.026 * gp_vmem) / RAM
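As a worked example of that formula (a sketch only: it uses the gp_vmem estimate from the Greenplum best-practices docs, gp_vmem = ((SWAP + RAM) - (7.5GB + 0.05 * RAM)) / 1.7 for hosts with up to 256GB of RAM, and assumes a 64GB host with 8GB of swap):

RAM=64; SWAP=8   # GB, example values
gp_vmem=$(echo "(($SWAP + $RAM) - (7.5 + 0.05 * $RAM)) / 1.7" | bc -l)   # about 36 GB
echo "vm.overcommit_ratio = $(echo "($RAM - 0.026 * $gp_vmem) * 100 / $RAM" | bc -l)"   # about 98 here; this post uses the recommended 95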

To avoid port conflicts between Greenplum and other applications during initialization, note the port range specified by net.ipv4.ip_local_port_range.
When initializing Greenplum with gpinitsystem, do not specify Greenplum database ports inside that range.
For example, with net.ipv4.ip_local_port_range = 10000 65535, set the Greenplum base port numbers below the range:
PORT_BASE = 6000
MIRROR_PORT_BASE = 7000

For systems with more than 64GB of memory, the following settings are recommended:
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736 # 1.5GB
vm.dirty_bytes = 4294967296 # 4GB

1.5 Java Installation

mkdir /usr/local/java
tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local/java
cd /usr/local/java
ln -s jdk1.8.0_181/ jdk

Configure the environment variables:

vim /etc/profile   # add the following
export JAVA_HOME=/usr/local/java/jdk
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

Then run source /etc/profile and verify the installation with the java and javac commands.

1.6 Firewall and SELinux (Security-Enhanced Linux) Checks

Stop and disable the firewall:

systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl status firewalld

Disable SELinux:

Edit /etc/selinux/config   # change the following setting
SELINUX=disabled

1.7 Other Settings

Disable THP: grubby --update-kernel=ALL --args="transparent_hugepage=never"   # reboot after adding the parameter
Set RemoveIPC:
vim /etc/systemd/logind.conf
RemoveIPC=no
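For the RemoveIPC change to take effect without a full reboot, restarting the login service should be enough (an extra step that is not in the original post):

systemctl restart systemd-logind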

1.8 Configure the hosts File

vim /etc/hosts   # add the IP and hostname of every node

1.9 Create the gpadmin User and Group and Set Up Passwordless SSH

Create the user and group (on every node):

groupdel gpadmin   # remove any existing gpadmin group/user first
userdel gpadmin
groupadd gpadmin
useradd -g gpadmin -d /apps/gpadmin gpadmin

passwd gpadmin
# or, non-interactively: echo 'ow@R99d7' | passwd gpadmin --stdin

Set up passwordless SSH:

ssh-keygen -t rsa

ssh-copy-id lgh1
ssh-copy-id lgh2
ssh-copy-id lgh3

 

2. Greenplum Installation

2.1 Install the RPM

Download: https://network.pivotal.io/products/pivotal-gpdb#/releases/837931   # version 6.14.1

Install: yum install -y greenplum-db-6.14.1-rhel7-x86_64.rpm   # installs under /usr/local by default; Greenplum needs many dependencies, and installing through yum pulls them in automatically.

After installation the directory /usr/local/greenplum-db is created.

chown -R gpadmin:gpadmin /usr/local/greenplum*

Contents of the installation directory:

[root@lgh1 local]# cd /usr/local/greenplum-db
[root@lgh1 greenplum-db]# ll
total 5548
drwxr-xr-x 7 root root    4096 Mar 30 16:44 bin
drwxr-xr-x 3 root root      21 Mar 30 16:44 docs
drwxr-xr-x 2 root root      58 Mar 30 16:44 etc
drwxr-xr-x 3 root root      19 Mar 30 16:44 ext
-rwxr-xr-x 1 root root     724 Mar 30 16:44 greenplum_path.sh
drwxr-xr-x 5 root root    4096 Mar 30 16:44 include
drwxr-xr-x 6 root root    4096 Mar 30 16:44 lib
drwxr-xr-x 2 root root      20 Mar 30 16:44 libexec
-rw-r--r-- 1 root root 5133247 Feb 23 03:17 open_source_licenses.txt
-rw-r--r-- 1 root root  519852 Feb 23 03:17 open_source_license_VMware_Tanzu_Greenplum_Streaming_Server_1.5.0_GA.txt
drwxr-xr-x 7 root root     135 Mar 30 16:44 pxf
drwxr-xr-x 2 root root    4096 Mar 30 16:44 sbin
drwxr-xr-x 5 root root      49 Mar 30 16:44 share

2.2 Create the Cluster Host Files

Create two host files (all_hosts and seg_hosts) to be passed as the host-file argument to utilities such as gpssh and gpscp; later commands in this post read them from /apps/gpadmin/conf/ (and ~/conf/), so keep the path consistent with wherever you put them.
all_hosts: every hostname or IP in the cluster, including the master, segments, standby, etc.
seg_hosts: the segment hostnames or IPs only. Example contents are shown below.
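For the three-node layout used in the rest of this post (lgh1 as master, lgh2 and lgh3 as segment hosts, with lgh2 later also hosting the master standby), the files would look like this:

cat /apps/gpadmin/conf/all_hosts
lgh1
lgh2
lgh3

cat /apps/gpadmin/conf/seg_hosts
lgh2
lgh3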

2.3 Configure Environment Variables

vim  ~/.bashrc
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/apps/data1/master/gpseg-1
export PGPORT=5432
# then apply it
source ~/.bashrc

2.4 Use gpssh-exkeys to Set Up n-to-n Passwordless Login

 #gpssh-exkeys -f all_hosts

2.5 Verify gpssh

#gpssh -f all_hosts

2.6 Sync the Master Installation to the Other Hosts

This step syncs the /usr/local/greenplum-db installation to the other hosts. Whatever method you use, the final result of ll /usr/local on every machine just needs to look the same as on the master.

 

You can use scp, or the Greenplum utilities (which requires the root account). Note that permissions are an issue here, so it is simplest to copy as root and chown everything to gpadmin at the end; the gpscp approach below will not work as gpadmin unless /usr/local on every host is writable by gpadmin.

Package:    cd /usr/local && tar -cf /usr/local/gp6.tar greenplum-db-6.14.1
Distribute: gpscp -f /apps/gpadmin/conf/all_hosts /usr/local/gp6.tar =:/usr/local
Then unpack and create the symlink on each host:
gpssh -f /apps/gpadmin/conf/all_hosts
cd /usr/local
tar -xf gp6.tar
ln -s greenplum-db-6.14.1 greenplum-db
chown -R gpadmin:gpadmin /usr/local/greenplum*

2.7 Create the Master and Segment Data Directories

# create the master data directory (on the master host)
mkdir -p /apps/data1/master
chown -R gpadmin:gpadmin /apps/data1/master

# create the segment data directories
gpssh -f /apps/gpadmin/conf/seg_hosts -e 'mkdir -p /apps/data1/primary'
gpssh -f /apps/gpadmin/conf/seg_hosts -e 'mkdir -p /apps/data1/mirror'
gpssh -f /apps/gpadmin/conf/seg_hosts -e 'chown -R gpadmin:gpadmin /apps/data1/'

Whatever method you use (they can also be created in bulk with SecureCRT), just make sure the directories are identical on every host, match the paths in gpinitsystem_config below, and are owned by gpadmin:gpadmin.

2.8 Initialize the Cluster

Create the config directory and copy the template configuration file:

mkdir -p ~/gpconfigs
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config ~/gpconfigs/gpinitsystem_config

Then edit ~/gpconfigs/gpinitsystem_config:

[gpadmin@lgh1 gpconfigs]$ cat gpinitsystem_config  | egrep -v '^$|#'
ARRAY_NAME="Greenplum Data Platform"
SEG_PREFIX=gpseg
PORT_BASE=6000
declare -a DATA_DIRECTORY=(/apps/data1/primary)
MASTER_HOSTNAME=master_hostname
MASTER_DIRECTORY=/apps/data1/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
MIRROR_PORT_BASE=7000
declare -a MIRROR_DATA_DIRECTORY=(/apps/data1/mirror)
DATABASE_NAME=gp_test

Initialize with the following command:

gpinitsystem -c ~/gpconfigs/gpinitsystem_config -h  ~/conf/seg_hosts

The output looks like this:

[gpadmin@lgh1 ~]$ gpinitsystem -c ~/gpconfigs/gpinitsystem_config -h  ~/conf/seg_hosts
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Checking configuration parameters, please wait...
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Reading Greenplum configuration file /apps/gpadmin/gpconfigs/gpinitsystem_config
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Locale has not been set in /apps/gpadmin/gpconfigs/gpinitsystem_config, will set to default value
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Locale set to en_US.utf8
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-MASTER_MAX_CONNECT not set, will set to default value 250
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Checking configuration parameters, Completed
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Commencing multi-home checks, please wait...
..
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Configuring build for standard array
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Commencing multi-home checks, Completed
20210330:17:37:56:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Building primary segment instance array, please wait...
..
20210330:17:37:57:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Building group mirror array type , please wait...
..
20210330:17:37:58:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Checking Master host
20210330:17:37:58:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Checking new segment hosts, please wait...
....
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Checking new segment hosts, Completed
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Greenplum Database Creation Parameters
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:---------------------------------------
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master Configuration
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:---------------------------------------
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master instance name       = Greenplum Data Platform
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master hostname            = lgh1
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master port                = 5432
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master instance dir        = /apps/data1/master/gpseg-1
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master LOCALE              = en_US.utf8
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Greenplum segment prefix   = gpseg
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master Database            = gpebusiness
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master connections         = 250
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master buffers             = 128000kB
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Segment connections        = 750
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Segment buffers            = 128000kB
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Checkpoint segments        = 8
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Encoding                   = UNICODE
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Postgres param file        = Off
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Initdb to be used          = /usr/local/greenplum-db-6.14.1/bin/initdb
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-GP_LIBRARY_PATH is         = /usr/local/greenplum-db-6.14.1/lib
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-HEAP_CHECKSUM is           = on
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-HBA_HOSTNAMES is           = 0
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Ulimit check               = Passed
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Array host connect type    = Single hostname per node
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Master IP address [1]      = 10.18.43.86
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Standby Master             = Not Configured
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Number of primary segments = 1
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Total Database segments    = 2
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Trusted shell              = ssh
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Number segment hosts       = 2
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Mirror port base           = 7000
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Number of mirror segments  = 1
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Mirroring config           = ON
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Mirroring type             = Group
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:----------------------------------------
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Greenplum Primary Segment Configuration
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:----------------------------------------
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-lgh2       6000    lgh2       /apps/data1/primary/gpseg0      2
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-lgh3       6000    lgh3       /apps/data1/primary/gpseg1      3
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:---------------------------------------
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Greenplum Mirror Segment Configuration
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:---------------------------------------
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-lgh3       7000    lgh3       /apps/data1/mirror/gpseg0       4
20210330:17:38:02:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-lgh2       7000    lgh2       /apps/data1/mirror/gpseg1       5

Continue with Greenplum creation Yy|Nn (default=N):
> y
20210330:17:38:05:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Building the Master instance database, please wait...
20210330:17:38:09:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Starting the Master in admin mode
20210330:17:38:10:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20210330:17:38:10:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
..
20210330:17:38:10:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
.........
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:------------------------------------------------
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Parallel process exit status
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:------------------------------------------------
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Total processes marked as completed           = 2
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Total processes marked as killed              = 0
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Total processes marked as failed              = 0
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:------------------------------------------------
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Deleting distributed backout files
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Removing back out file
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-No errors generated from parallel processes
20210330:17:38:19:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Restarting the Greenplum instance in production mode
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Starting gpstop with args: -a -l /apps/gpadmin/gpAdminLogs -m -d /apps/data1/master/gpseg-1
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Gathering information and validating the environment...
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Obtaining Segment details from master...
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd'
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Master segment instance directory=/apps/data1/master/gpseg-1
20210330:17:38:20:013281 gpstop:lgh1:gpadmin-[INFO]:-Stopping master segment and waiting for user connections to finish ...
server shutting down
20210330:17:38:21:013281 gpstop:lgh1:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20210330:17:38:21:013281 gpstop:lgh1:gpadmin-[INFO]:-Terminating processes for segment /apps/data1/master/gpseg-1
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Starting gpstart with args: -a -l /apps/gpadmin/gpAdminLogs -d /apps/data1/master/gpseg-1
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Gathering information and validating the environment...
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd'
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Starting Master instance in admin mode
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Obtaining Segment details from master...
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Setting new master era
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Master Started...
20210330:17:38:21:013307 gpstart:lgh1:gpadmin-[INFO]:-Shutting down master
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-Process results...
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-   Successful segment starts                                            = 2
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:38:22:013307 gpstart:lgh1:gpadmin-[INFO]:-Starting Master instance lgh1 directory /apps/data1/master/gpseg-1
20210330:17:38:23:013307 gpstart:lgh1:gpadmin-[INFO]:-Command pg_ctl reports Master lgh1 instance active
20210330:17:38:23:013307 gpstart:lgh1:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
20210330:17:38:23:013307 gpstart:lgh1:gpadmin-[INFO]:-No standby master configured.  skipping...
20210330:17:38:23:013307 gpstart:lgh1:gpadmin-[INFO]:-Database successfully started
20210330:17:38:23:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
20210330:17:38:23:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Commencing parallel build of mirror segment instances
20210330:17:38:23:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
..
20210330:17:38:23:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
..
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:------------------------------------------------
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Parallel process exit status
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:------------------------------------------------
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Total processes marked as completed           = 2
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Total processes marked as killed              = 0
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Total processes marked as failed              = 0
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:------------------------------------------------
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Scanning utility log file for any warning messages
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[WARN]:-*******************************************************
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[WARN]:-Scan of log file indicates that some warnings or errors
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[WARN]:-were generated during the array creation
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Please review contents of log file
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-/apps/gpadmin/gpAdminLogs/gpinitsystem_20210330.log
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-To determine level of criticality
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-These messages could be from a previous run of the utility
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-that was called today!
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[WARN]:-*******************************************************
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Greenplum Database instance successfully created
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-------------------------------------------------------
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-To complete the environment configuration, please
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-2. Add "export MASTER_DATA_DIRECTORY=/apps/data1/master/gpseg-1"
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-   or, use -d /apps/data1/master/gpseg-1 option for the Greenplum scripts
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-   Example gpstate -d /apps/data1/master/gpseg-1
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Script log file = /apps/gpadmin/gpAdminLogs/gpinitsystem_20210330.log
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-To initialize a Standby Master Segment for this Greenplum instance
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Review options for gpinitstandby
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-------------------------------------------------------
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-The Master /apps/data1/master/gpseg-1/pg_hba.conf post gpinitsystem
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-new array must be explicitly added to this file
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-located in the /usr/local/greenplum-db-6.14.1/docs directory
20210330:17:38:26:011255 gpinitsystem:lgh1:gpadmin-[INFO]:-------------------------------------------------------

At this point the installation is complete, but a word about mirror placement: the default layout, Group Mirroring, was used here. Other layouts can be specified at initialization time; they have somewhat higher requirements and are more complex. See: http://docs-cn.greenplum.org/v6/best_practices/ha.html

For how to specify the mirror layout at initialization time, and how to set up the master standby during initialization, see the gpinitsystem command reference: http://docs-cn.greenplum.org/v6/utility_guide/admin_utilities/gpinitsystem.html#topic1 ; a sketch of such an invocation is shown below.
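As a sketch only (this is not the command run in this post): the gpinitsystem reference above documents a -s option for creating the master standby during initialization and a --mirror-mode option for choosing spread rather than group mirroring. Assuming lgh2 as the standby host, an invocation could look like:

# -s <host>              also initialize a standby master on that host
# --mirror-mode=spread   place mirrors in spread mode instead of the default group mode
gpinitsystem -c ~/gpconfigs/gpinitsystem_config -h ~/conf/seg_hosts -s lgh2 --mirror-mode=spread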

2.9 Check the Cluster Status

# gpstate   # command reference: http://docs-cn.greenplum.org/v6/utility_guide/admin_utilities/gpstate.html

[gpadmin@lgh1 ~]$ gpstate
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-Starting gpstate with args:
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd'
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb 22 2021 18:27:08'
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-Obtaining Segment details from master...
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-Gathering data from segments...
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-Greenplum instance status summary
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Master instance                                           = Active
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Master standby                                            = No master standby configured
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total segment instance count from metadata                = 4
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Primary Segment Status
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total primary segments                                    = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total primary segment valid (at master)                   = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total primary segment failures (at master)                = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number postmaster processes found                   = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Mirror Segment Status
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-----------------------------------------------------
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total mirror segments                                     = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total mirror segment valid (at master)                    = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total mirror segment failures (at master)                 = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number postmaster processes found                   = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number mirror segments acting as primary segments   = 0
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-   Total number mirror segments acting as mirror segments    = 2
20210330:17:41:32:013986 gpstate:lgh1:gpadmin-[INFO]:-----------------------------------------------------
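Beyond the default summary above, a few gpstate options (see the reference linked in 2.9) are useful for routine checks:

gpstate -s   # detailed status of every segment instance
gpstate -m   # mirror segment status
gpstate -f   # standby master status (relevant after section 2.10)
gpstate -e   # show segments with mirroring status problems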

2.10 Install the Master Standby

Segment mirrors and the master standby can both be set up at initialization time, or added manually afterwards. For adding mirrors manually, see: http://docs-cn.greenplum.org/v6/utility_guide/admin_utilities/gpaddmirrors.html

Here is how to add the master standby. Reference: http://docs-cn.greenplum.org/v6/utility_guide/admin_utilities/gpinitstandby.html

Pick a host and create the directory:

mkdir /apps/data1/master
chown -R gpadmin:gpadmin /apps/data1/master

Install the master standby: gpinitstandby -s lgh2

[gpadmin@lgh1 ~]$ gpinitstandby -s lgh2
20210330:17:58:39:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20210330:17:58:39:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Checking for data directory /apps/data1/master/gpseg-1 on lgh2
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:------------------------------------------------------
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:------------------------------------------------------
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum master hostname               = lgh1
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum master data directory         = /apps/data1/master/gpseg-1
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum master port                   = 5432
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum standby master hostname       = lgh2
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum standby master port           = 5432
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum standby master data directory = /apps/data1/master/gpseg-1
20210330:17:58:40:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Greenplum update system catalog         = On
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
20210330:17:58:41:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20210330:17:58:42:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-The packages on lgh2 are consistent.
20210330:17:58:42:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Adding standby master to catalog...
20210330:17:58:42:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Database catalog updated successfully.
20210330:17:58:42:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Updating pg_hba.conf file...
20210330:17:58:42:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20210330:17:58:43:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Starting standby master
20210330:17:58:43:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Checking if standby master is running on host: lgh2  in directory: /apps/data1/master/gpseg-1
20210330:17:58:44:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20210330:17:58:44:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20210330:17:58:44:014332 gpinitstandby:lgh1:gpadmin-[INFO]:-Successfully created standby master on lgh2
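Not part of the installation itself, but worth noting: if the master host ever fails, the standby is promoted manually with gpactivatestandby on the standby host (a sketch; see the utility reference for the full procedure):

# run on lgh2, pointing at the standby master's data directory
gpactivatestandby -d /apps/data1/master/gpseg-1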

 

3. GPCC Installation

Download: https://network.pivotal.io/products/gpdb-command-center/#/releases/829440

Reference docs: https://gpcc.docs.pivotal.io/640/welcome.html

 

3.1 Verify the Installation Environment

Database status: confirm the Greenplum database is installed and running: gpstate
Environment variable: confirm MASTER_DATA_DIRECTORY is set and in effect: echo $MASTER_DATA_DIRECTORY
/usr/local/ permissions: confirm gpadmin has write permission on /usr/local/: ls -ltr /usr
Port 28080: 28080 is the default web client port; confirm it is not in use: netstat -aptn | grep 28080
Port 8899: 8899 is the RPC port; confirm it is not in use: netstat -aptn | grep 8899
apr-util dependency: confirm the apr-util package is installed: rpm -q apr-util

GPCC installs under /usr/local by default, but gpadmin normally has no write permission there. You could create a separate directory, grant it to gpadmin, and point the installer at it; I took the lazier route below. Note: every machine needs the same treatment, and if other users own directories under /usr/local their ownership has to be restored afterwards.

chown -R gpadmin:gpadmin /usr/local
# after the installation finishes, revert the ownership:
chown -R root:root /usr/local
chown -R gpadmin:gpadmin /usr/local/greenplum*

3.2 Install GPCC

Installation docs: https://gpcc.docs.pivotal.io/640/topics/install.html

There are four installation methods in total; the interactive method is used here:

 

Unpack:

unzip greenplum-cc-web-6.4.0-gp6-rhel7-x86_64.zip
ln -s greenplum-cc-web-6.4.0-gp6-rhel7-x86_64 greenplum-cc
chown -R gpadmin:gpadmin greenplum-cc*

Switch to the gpadmin user, then run the installer with -W (prompts for a password):

su - gpadmin
cd greenplum-cc
./gpccinstall-6.4.0 -W   # the gpmon password is set here

 

After entering the password, keep pressing the space bar until you reach and accept the license agreement.

 

3.3 Configure Environment Variables

vim ~/.bashrc   # add the following, then source ~/.bashrc

source /usr/local/greenplum-cc/gpcc_path.sh

3.4 Verify the Installation

Then test it with: gpcc status

 

Then check for the configured password file: ll -a ~   # the hidden .pgpass file was not found, so create one manually

vim ~/.pgpass
# add the following line; the last field is the password entered when installing GPCC
*:5432:gpperfmon:gpmon:1qaz@WSX

Then set the ownership and permissions:

chown gpadmin:gpadmin ~/.pgpass
chmod 600 ~/.pgpass

The gpmon password can also be changed later; see: https://gpcc.docs.pivotal.io/640/topics/gpmon.html

Note: the .pgpass file also needs to be copied to gpadmin's home directory on the master standby host.

Then check the status and start GPCC:
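The original screenshots are not reproduced here; the commands behind them are roughly the following (gpcc subcommands as documented for GPCC 6):

gpcc status   # check whether the GPCC web server and agents are running
gpcc start    # start GPCC; the output ends with the web URL (default port 28080, per section 3.1)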

 

Then open the URL printed at the end and log in with the gpmon user and password.

That completes the GPCC installation.

4. PXF Initialization and Basic Configuration

4.1 Verify the Environment

Use gpstate to confirm Greenplum is up
Confirm Java is installed
Confirm the gpadmin user can create files under /usr/local (another directory can also be used)

The same trick as before is used to give gpadmin write permission on /usr/local:

chown -R gpadmin:gpadmin /usr/local
# after the installation finishes, revert the ownership:
chown -R root:root /usr/local
chown -R gpadmin:gpadmin /usr/local/greenplum*

4.2 Initialize PXF

Reference: http://docs-cn.greenplum.org/v6/pxf/init_pxf.html

PXF_CONF=/usr/local/greenplum-pxf $GPHOME/pxf/bin/pxf cluster init
[gpadmin@lgh1~]$ PXF_CONF=/usr/local/greenplum-pxf $GPHOME/pxf/bin/pxf cluster init
Initializing PXF on master host and 2 segment hosts...
PXF initialized successfully on 3 out of 3 hosts

4.3 Configure Environment Variables

vim ~/.bashrc   # add the following, then source it

export PXF_CONF=/usr/local/greenplum-pxf
export PXF_HOME=/usr/local/greenplum-db/pxf/
export PATH=$PXF_HOME/bin:$PATH

4.4 Verify the Installation

PXF is only installed on the segment hosts, so only the two segment hosts appear below.

[gpadmin@lgh1 ~]$ pxf cluster status
Checking status of PXF servers on 2 segment hosts...
ERROR: PXF is not running on 2 out of 2 hosts
lgh2 ==> Checking if tomcat is up and running...
ERROR: PXF is down - tomcat is not running...
lgh3 ==> Checking if tomcat is up and running...
ERROR: PXF is down - tomcat is not running...
[gpadmin@lgh1 ~]$ pxf cluster start
Starting PXF on 2 segment hosts...
PXF started successfully on 2 out of 2 hosts
[gpadmin@lgh1 ~]$ pxf cluster status
Checking status of PXF servers on 2 segment hosts...
PXF is running on 2 out of 2 hosts

Full pxf command reference: http://docs-cn.greenplum.org/v6/pxf/ref/pxf-ref.html

4.5 Configure the Hadoop and Hive Connectors

Reference: http://docs-cn.greenplum.org/v6/pxf/client_instcfg.html

The official docs are a bit long-winded; in short, you just copy the relevant configuration files over and check the result. I use Hadoop and Hive through the default connector, so all the configuration files go under the default server directory:

[gpadmin@lgh1~]$ cd /usr/local/greenplum-pxf/servers/default/
[gpadmin@lgh1 default]$ ll
total 40
-rw-r--r-- 1 gpadmin gpadmin 4421 Mar 19 17:03 core-site.xml
-rw-r--r-- 1 gpadmin gpadmin 2848 Mar 19 17:03 hdfs-site.xml
-rw-r--r-- 1 gpadmin gpadmin 5524 Mar 23 14:58 hive-site.xml
-rw-r--r-- 1 gpadmin gpadmin 5404 Mar 19 17:03 mapred-site.xml
-rw-r--r-- 1 gpadmin gpadmin 1460 Mar 19 17:03 pxf-site.xml
-rw-r--r-- 1 gpadmin gpadmin 5412 Mar 19 17:03 yarn-site.xml

The core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml and yarn-site.xml files above are ordinary Hadoop and Hive configuration files and must not be modified. Since this is a CDH cluster, they can all be found under /etc/hadoop/conf and /etc/hive/conf.

pxf-site.xml is mainly used for access control; here it simply makes PXF access Hadoop as the gpadmin user, which is the easiest option (there are others). Its contents are:

[gpadmin@lgh1 default]$ cat pxf-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>pxf.service.kerberos.principal</name>
        <value>gpadmin/_HOST@EXAMPLE.COM</value>
        <description>Kerberos principal pxf service should use. _HOST is replaced automatically with hostnames FQDN</description>
    </property>
    <property>
        <name>pxf.service.kerberos.keytab</name>
        <value>${pxf.conf}/keytabs/pxf.service.keytab</value>
        <description>Kerberos path to keytab file owned by pxf service with permissions 0400</description>
    </property>
    <property>
        <name>pxf.service.user.impersonation</name>
        <value>false</value>
        <description>End-user identity impersonation, set to true to enable, false to disable</description>
    </property>
    <property>
        <name>pxf.service.user.name</name>
        <value>gpadmin</value>
        <description>
            The pxf.service.user.name property is used to specify the login
            identity when connecting to a remote system, typically an unsecured
            Hadoop cluster. By default, it is set to the user that started the
            PXF process. If PXF impersonation feature is used, this is the
            identity that needs to be granted Hadoop proxy user privileges.
            Change it ONLY if you would like to use another identity to login to
            an unsecured Hadoop cluster
        </description>
    </property>
</configuration>

Steps to produce the file above:

1. Copy the template /usr/local/greenplum-pxf/templates/pxf-site.xml into the target server directory /usr/local/greenplum-pxf/servers/default
2. Edit the pxf.service.user.name and pxf.service.user.impersonation properties

Note: you may need to copy the entire Hadoop cluster's hosts entries into the hosts file on the Greenplum hosts.

There is also a user impersonation/proxy approach, which requires restarting the Hadoop cluster; see: http://docs-cn.greenplum.org/v6/pxf/pxfuserimpers.html

4.6 Sync and Restart PXF

[gpadmin@lgh1 ~]$ pxf cluster stop
Stopping PXF on 2 segment hosts...
PXF stopped successfully on 2 out of 2 hosts
[gpadmin@lgh1 ~]$ pxf cluster sync
Syncing PXF configuration files from master host to 2 segment hosts...
PXF configs synced successfully on 2 out of 2 hosts
[gpadmin@lgh1 ~]$ pxf cluster start
Starting PXF on 2 segment hosts...
PXF started successfully on 2 out of 2 hosts

I will not walk through a query test here, since it is fairly tedious and has already been covered elsewhere; for testing see: http://docs-cn.greenplum.org/v6/pxf/access_hdfs.html . A minimal sketch of such a test follows.
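As a minimal verification sketch only (the database name gp_test comes from the gpinitsystem_config above; the HDFS path and table name are made up for illustration):

# PXF must be enabled once per database before external tables can use it
psql -d gp_test -c "CREATE EXTENSION IF NOT EXISTS pxf;"
# readable external table over a plain-text HDFS file, served by the default PXF server configured above
psql -d gp_test -c "CREATE EXTERNAL TABLE pxf_hdfs_demo(line text) LOCATION ('pxf://tmp/pxf_demo/sample.txt?PROFILE=hdfs:text') FORMAT 'TEXT';"
psql -d gp_test -c "SELECT count(*) FROM pxf_hdfs_demo;"

If the count comes back, the Hadoop connector and the default server configuration are working.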

 
