0039 - How to Connect to Hive and Impala with the Python Impyla Client

Published by Hadoop實操 on 2018-11-22

1. Purpose of This Document


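The previous chapter covered installing Anaconda on a CDH cluster and setting up a private Python package repository. This chapter explains how to use the Python Impyla client to connect to HiveServer2 and the Impala Daemon on a CDH cluster and run SQL statements.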

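  • Overview

1. Installing the dependency packages

2. Writing the code

3. Testing the code

  • Test environment

1. CM and CDH version 5.11.2

2. RedHat 7.2

  • Prerequisites

1. The CDH cluster is running normally

2. Anaconda is installed and its environment variables are configured

3. pip can install Python packages normally

4. Python version 2.6+ or 3.3+

5. The cluster is non-secure (security is not enabled)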
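
Two of the prerequisites above can be checked from a script before installing anything: the interpreter version and whether the cluster services accept TCP connections. The following is a minimal sketch, not part of the original article. The function names python_supported and port_open are hypothetical, and the ports in the usage note are the ones the article uses later (10000 for HiveServer2, 21050 for the Impala Daemon).

```python
import socket
import sys


def python_supported(version_info=sys.version_info):
    """True for Python 2.6+ on the 2.x line or 3.3+ on the 3.x line."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 6
    if major == 3:
        return minor >= 3
    return False


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


print("Python version supported:", python_supported())
```

For example, port_open("ip-172-31-21-45.ap-southeast-1.compute.internal", 10000) would probe HiveServer2. An open port only shows that something is listening; it does not prove that the service will accept the connection settings used later.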

2. Installing the Impyla Dependencies


Impyla depends on the following Python packages:

  • six
  • bitarray
  • thrift (on Python 2.x) or thriftpy (on Python 3.x)
  • thrift_sasl
  • sasl

1. First, install the Python packages that Impyla depends on

[root@ip-172-31-22-86 ~]# pip install bitarray
[root@ip-172-31-22-86 ~]# pip install thrift==0.9.3
[root@ip-172-31-22-86 ~]# pip install six
[root@ip-172-31-22-86 ~]# pip install thrift_sasl
[root@ip-172-31-22-86 ~]# pip install sasl

Note: thrift must be version 0.9.3. pip installs 0.10.0 by default; if that version is already present, remove it with pip uninstall thrift and then install 0.9.3.

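2. Install the Impyla package

By default pip installs impyla 0.14.0. If that version is already installed, uninstall it and install 0.13.8 instead. Note that the captured output below reports impyla-0.14.0, which matches a default install rather than the pinned command.

[root@ip-172-31-22-86 ec2-user]# pip install impyla==0.13.8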
Collecting impyla
  Downloading impyla-0.14.0.tar.gz (151kB)
    100% |████████████████████████████████| 153kB 1.0MB/s 
Requirement already satisfied: six in /opt/cloudera/parcels/Anaconda-4.2.0/lib/python2.7/site-packages (from impyla)
Requirement already satisfied: bitarray in /opt/cloudera/parcels/Anaconda-4.2.0/lib/python2.7/site-packages (from impyla)
Requirement already satisfied: thrift in /opt/cloudera/parcels/Anaconda-4.2.0/lib/python2.7/site-packages (from impyla)
Building wheels for collected packages: impyla
  Running setup.py bdist_wheel for impyla ... done
  Stored in directory: /root/.cache/pip/wheels/96/fa/d8/40e676f3cead7ec45f20ac43eb373edc471348ac5cb485d6f5
Successfully built impyla
Installing collected packages: impyla
Successfully installed impyla-0.14.0
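
Because the article pins thrift to 0.9.3 and impyla to 0.13.8, it can help to confirm what actually ended up installed. The following is a minimal sketch under that assumption. The helper names installed_version, find_mismatches and check_environment are hypothetical; the lookup uses importlib.metadata where available and falls back to pkg_resources on older interpreters such as Python 2.7.

```python
# Version pins taken from the article.
REQUIRED = {"thrift": "0.9.3", "impyla": "0.13.8"}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        # Older interpreters, e.g. the Python 2.7 shipped with Anaconda 4.2.0.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(installed, required=REQUIRED):
    """Return {name: (found, wanted)} for every pin that is not satisfied."""
    return dict(
        (name, (installed.get(name), wanted))
        for name, wanted in required.items()
        if installed.get(name) != wanted
    )


def check_environment():
    """Compare the live environment against the pins."""
    installed = dict((name, installed_version(name)) for name in REQUIRED)
    return find_mismatches(installed)
```

Calling check_environment() returns an empty dict when both pins are met; otherwise each entry shows the version found (or None) next to the version the article expects.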

3. Writing the Python Code


Connecting to Hive from Python (HiveTest.py)

from impala.dbapi import connect

# HiveServer2 listens on port 10000; a non-secure cluster uses PLAIN authentication
conn = connect(host='ip-172-31-21-45.ap-southeast-1.compute.internal', port=10000, database='default', auth_mechanism='PLAIN')
print(conn)

cursor = conn.cursor()

# List the databases
cursor.execute('show databases')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
print(results)

# Read up to 10 rows from the test table
cursor.execute('SELECT * FROM test limit 10')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
print(results)

Connecting to Impala from Python (ImpalaTest.py)

from impala.dbapi import connect

# The Impala Daemon accepts HiveServer2-protocol connections on port 21050
conn = connect(host='ip-172-31-26-80.ap-southeast-1.compute.internal', port=21050)
print(conn)

cursor = conn.cursor()

# List the databases
cursor.execute('show databases')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
print(results)

# Read up to 10 rows from the test table
cursor.execute('SELECT * FROM test limit 10')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
print(results)
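
Both scripts repeat the same cursor pattern: execute, read the description, fetch all rows. The following is a minimal sketch of how that pattern could be factored into one helper, assuming only the Python DB-API 2.0 interface that impyla implements. The name run_queries is hypothetical, and sqlite3 is used purely as a stand-in driver so the sketch runs without a cluster.

```python
import sqlite3  # stand-in DB-API driver; not part of the article's setup


def run_queries(conn, statements):
    """Execute each statement and return a list of (description, rows) pairs."""
    results = []
    cursor = conn.cursor()
    try:
        for sql in statements:
            cursor.execute(sql)
            results.append((cursor.description, cursor.fetchall()))
    finally:
        cursor.close()
    return results


# Stand-in data mirroring the two-column `test` table from the article.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE test (s1 TEXT, s2 TEXT)")
demo.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("name1", "age1"), ("name2", "age2")],
)

for description, rows in run_queries(demo, ["SELECT * FROM test LIMIT 10"]):
    print([column[0] for column in description])  # column names
    print(rows)
```

Against the cluster, the same helper would presumably be called with a connection returned by impala.dbapi.connect and the statements 'show databases' and 'SELECT * FROM test limit 10'.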

4. Testing the Code


Run the Python scripts from the shell command line.

1. Test the Hive connection

[root@ip-172-31-22-86 ec2-user]# python HiveTest.py
<impala.hiveserver2.HiveServer2Connection object at 0x...>
[('database_name', 'STRING', None, None, None, None, None)]
[('default',)]
[('test.s1', 'STRING', None, None, None, None, None), ('test.s2', 'STRING', None, None, None, None, None)]
[('name1', 'age1'), ('name2', 'age2'), ('name3', 'age3'), ('name4', 'age4'), ('name5', 'age5'), ('name6', 'age6'), ('name7', 'age7'), ('name8', 'age8'), ('name9', 'age9'), ('name10', 'age10')]
[root@ip-172-31-22-86 ec2-user]#

2. Test the Impala connection

[root@ip-172-31-22-86 ec2-user]# python ImpalaTest.py
<impala.hiveserver2.HiveServer2Connection object at 0x...>
[('name', 'STRING', None, None, None, None, None), ('comment', 'STRING', None, None, None, None, None)]
[('_impala_builtins', 'System database for Impala builtin functions'), ('default', 'Default Hive database')]
[('s1', 'STRING', None, None, None, None, None), ('s2', 'STRING', None, None, None, None, None)]
[('name1', 'age1'), ('name2', 'age2'), ('name3', 'age3'), ('name4', 'age4'), ('name5', 'age5'), ('name6', 'age6'), ('name7', 'age7'), ('name8', 'age8'), ('name9', 'age9'), ('name10', 'age10')]
[root@ip-172-31-22-86 ec2-user]#
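
The two runs return the same rows but different column names: Hive reports test.s1 and test.s2, while Impala reports s1 and s2. The following is a minimal sketch of pairing rows with column names in a way that hides that difference. The helper name rows_to_dicts and its strip_table_prefix flag are hypothetical, and the sample data is copied from the Hive output shown earlier.

```python
def rows_to_dicts(description, rows, strip_table_prefix=True):
    """Pair each row with the column names from a DB-API cursor.description."""
    names = []
    for column in description:
        name = column[0]  # the first field of each description entry is the name
        if strip_table_prefix and "." in name:
            name = name.split(".", 1)[1]  # 'test.s1' -> 's1'
        names.append(name)
    return [dict(zip(names, row)) for row in rows]


# Sample values copied from the Hive test output.
hive_description = [
    ("test.s1", "STRING", None, None, None, None, None),
    ("test.s2", "STRING", None, None, None, None, None),
]
rows = [("name1", "age1"), ("name2", "age2")]

print(rows_to_dicts(hive_description, rows))
```

With the prefix stripped, code that consumes the dictionaries can stay the same whether the rows came from HiveServer2 or from the Impala Daemon.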

5. Common Problems


1. Error 1

building 'sasl.saslwrapper' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/sasl
    gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Isasl -I/opt/cloudera/parcels/Anaconda/include/python2.7 -c sasl/saslwrapper.cpp -o build/temp.linux-x86_64-2.7/sasl/saslwrapper.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/opt/cloudera/parcels/Anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-kD6tvP/sasl/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-WJFNeG-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-kD6tvP/sasl/

Solution: install the C and C++ compilers.

[root@ip-172-31-22-86 ec2-user]# yum -y install gcc
[root@ip-172-31-22-86 ec2-user]# yum install gcc-c++

2. Error 2

gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Isasl -I/opt/cloudera/parcels/Anaconda/include/python2.7 -c sasl/saslwrapper.cpp -o build/temp.linux-x86_64-2.7/sasl/saslwrapper.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
In file included from sasl/saslwrapper.cpp:254:0:
sasl/saslwrapper.h:22:23: fatal error: sasl/sasl.h: No such file or directory
#include <sasl/sasl.h>
                   ^
compilation terminated.
error: command 'gcc' failed with exit status 1

Solution: install the Python and Cyrus SASL development headers.

[root@ip-172-31-22-86 ec2-user]# yum -y install python-devel.x86_64 cyrus-sasl-devel.x86_64
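
Both errors come from missing build prerequisites for the sasl package, so they can be detected before running pip. The following is a minimal sketch, not something from the original article. The function name missing_build_packages is hypothetical, and the header path /usr/include/sasl/sasl.h is an assumption about where cyrus-sasl-devel places the file on RHEL-style systems. The Python headers from python-devel are not checked here.

```python
import os

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.x

# Assumed location of the header that Error 2 reports as missing.
SASL_HEADER = "/usr/include/sasl/sasl.h"


def missing_build_packages(find_tool=which, file_exists=os.path.exists):
    """Return the yum packages from the two fixes that still appear to be missing."""
    missing = []
    if find_tool("gcc") is None:
        missing.append("gcc")  # Error 1: unable to execute 'gcc'
    if find_tool("g++") is None:
        missing.append("gcc-c++")  # saslwrapper.cpp is a C++ source file
    if not file_exists(SASL_HEADER):
        missing.append("cyrus-sasl-devel.x86_64")  # Error 2: sasl/sasl.h not found
    return missing


print(missing_build_packages())
```

An empty list suggests that pip install sasl should at least get past these two failures. The lookups are passed in as parameters so the logic can be exercised without changing the machine.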

Original article. Reposting is welcome; please credit the WeChat public account Hadoop實操 as the source.

