Installing Impala support for Python (impyla)

Posted by CoderSunYu on 2018-06-07

Dependencies

Required:

Python 2.6+ or 3.3+

six, bit_array

thrift (on Python 2.x) or thriftpy (on Python 3.x)
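The version rule in the list above can be captured in a small helper that picks the required packages for a given interpreter. This is a sketch that mirrors the list as written in this post (package names taken verbatim from it). Newer impyla releases may require different packages, so treat the output as illustrative.

```python
import sys


def required_packages(version_info=sys.version_info):
    """Return the required pip packages for a Python version.

    Encodes the list above (as of this 2018 post): Python 2.6+ or 3.3+,
    six and bit_array always, thrift on 2.x and thriftpy on 3.x.
    """
    major, minor = version_info[0], version_info[1]
    supported = (major == 2 and minor >= 6) or (major, minor) >= (3, 3)
    if not supported:
        raise RuntimeError(
            "impyla requires Python 2.6+ or 3.3+, got %d.%d" % (major, minor))
    thrift_pkg = "thrift" if major == 2 else "thriftpy"
    return ["six", "bit_array", thrift_pkg]


if __name__ == "__main__":
    # Print a space-separated list suitable for passing to pip install.
    print(" ".join(required_packages()))
```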

For Hive and/or Kerberos support:

pip install thrift_sasl==0.2.1
pip install sasl
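The SASL packages above come into play when impala.dbapi.connect() is given an auth_mechanism other than the default. The helper below only assembles the keyword arguments, so it runs without a cluster. It assumes HiveServer2 is on its default port 10000 and configured for either SASL PLAIN or Kerberos (GSSAPI) with the "hive" service principal; the helper name hive_connect_kwargs is hypothetical, and your cluster's settings may differ.

```python
def hive_connect_kwargs(host, port=10000, kerberos=False, user=None, password=None):
    """Build keyword arguments for impala.dbapi.connect() against HiveServer2.

    Hypothetical helper. Assumes the default HiveServer2 port (10000).
    kerberos=True selects GSSAPI with the 'hive' service name; otherwise
    SASL PLAIN is used with an optional user/password.
    """
    kwargs = {"host": host, "port": port}
    if kerberos:
        kwargs["auth_mechanism"] = "GSSAPI"
        kwargs["kerberos_service_name"] = "hive"
    else:
        kwargs["auth_mechanism"] = "PLAIN"
        if user is not None:
            kwargs["user"] = user
        if password is not None:
            kwargs["password"] = password
    return kwargs
```

With impyla, thrift_sasl and sasl installed, the result would be unpacked into the call, e.g. connect(**hive_connect_kwargs('my.host.com', kerberos=True)).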

Optional:

pandas, for converting results to DataFrame objects (for DataFrame-style analysis, consider the Ibis project instead)

sqlalchemy for the SQLAlchemy engine

pytest for running tests; unittest2 for testing on Python 2.6
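To see which of the optional extras are already present, you can ask the import system whether each module can be found without actually importing it. This sketch assumes Python 3.4+ (for importlib.util.find_spec) and that the import names match the package names listed above, which holds for these three.

```python
import importlib.util

# Optional extras from the list above, keyed by their import names.
OPTIONAL_MODULES = {
    "pandas": "DataFrame conversion (impala.util.as_pandas)",
    "sqlalchemy": "SQLAlchemy engine",
    "pytest": "running the test suite",
}


def available_modules(names):
    """Map each top-level module name to True if it can be found (not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_modules(OPTIONAL_MODULES).items():
        status = "installed" if ok else "missing"
        print("%-10s %-9s %s" % (name, status, OPTIONAL_MODULES[name]))
```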

Installation

Install the latest release (0.13.1) with pip:

pip install impyla
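After installing, you can confirm which release pip actually put in place. Note that the pip distribution is named impyla while the package you import is impala. This sketch assumes Python 3.8+ for importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "impyla" is the distribution name on PyPI; the import package is "impala".
    print("impyla:", installed_version("impyla") or "not installed")
```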

For the latest (dev) version, install directly from the repo:

pip install git+https://github.com/cloudera/impyla.git


or clone the repo:

git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
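When several Python interpreters are installed, running pip through the interpreter you will later use avoids installing impyla into the wrong environment. The sketch below only builds the command; pip_install_command and IMPYLA_REPO are hypothetical names chosen for illustration.

```python
import sys

# The development source from the instructions above.
IMPYLA_REPO = "git+https://github.com/cloudera/impyla.git"


def pip_install_command(*requirements):
    """Build a 'python -m pip install ...' command for the running interpreter.

    Using sys.executable ensures the packages land in the same environment
    that will later run 'import impala'.
    """
    return [sys.executable, "-m", "pip", "install", *requirements]
```

Passing the result to subprocess.check_call(pip_install_command(IMPYLA_REPO)) installs the dev version; use pip_install_command("impyla") for the release instead.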

Usage

Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
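Because the interface is PEP 249, the same cursor calls behave the same way with any DB API driver. The sketch below uses the standard-library sqlite3 module purely as a stand-in so it runs without an Impala cluster; the table and rows are made up for the example. Placeholder syntax differs between drivers (sqlite3 uses ?), since PEP 249 lets each driver choose its paramstyle.

```python
import sqlite3

# Stand-in PEP 249 connection; with Impyla this would come from
# impala.dbapi.connect(host=..., port=21050) instead.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cursor.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

cursor.execute("SELECT * FROM mytable LIMIT 100")
# PEP 249: description holds one 7-item sequence per column; item 0 is the name.
column_names = [col[0] for col in cursor.description]
results = cursor.fetchall()  # list of row tuples
print(column_names)  # ['id', 'name']
print(results)       # [(1, 'a'), (2, 'b'), (3, 'c')]

cursor.close()
conn.close()
```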

The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)  # process() stands in for your own row handler
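Buffered iteration means rows are pulled from the driver in batches of cursor.arraysize while the caller still sees one row at a time. The generator below sketches that idea with only standard PEP 249 calls (fetchmany() fetches arraysize rows by default). It is not Impyla's own implementation, and it again uses sqlite3 as a stand-in driver; iter_rows is a hypothetical name.

```python
import sqlite3


def iter_rows(cursor, batch_size=None):
    """Yield rows one at a time while fetching them in batches.

    fetchmany() with no argument returns up to cursor.arraysize rows,
    so setting arraysize controls the batch (buffer) size.
    """
    if batch_size is not None:
        cursor.arraysize = batch_size
    while True:
        batch = cursor.fetchmany()
        if not batch:
            break
        for row in batch:
            yield row


# Stand-in data in an in-memory SQLite database (not Impala).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (n INTEGER)")
cursor.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(10)])
cursor.execute("SELECT n FROM mytable LIMIT 100")
total = sum(row[0] for row in iter_rows(cursor, batch_size=4))
print(total)  # 45
conn.close()
```

A larger arraysize means fewer round trips to the server at the cost of holding more rows in memory at once.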

You can also get the results back as a pandas DataFrame object:

from impala.util import as_pandas
df = as_pandas(cursor)  # call after cursor.execute(...)
# carry df through scikit-learn, for example
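as_pandas requires pandas. To see roughly what such a conversion needs, namely column names from cursor.description plus the fetched rows, here is a dependency-free sketch using sqlite3 as a stand-in driver. It is an illustration of the idea, not impyla's implementation; fetch_columns is a hypothetical name.

```python
import sqlite3


def fetch_columns(cursor):
    """Return (column_names, rows) for the current result set.

    Column names come from cursor.description (PEP 249), which is the
    information a DataFrame constructor needs alongside the rows.
    """
    names = [col[0] for col in cursor.description]
    return names, cursor.fetchall()


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (id INTEGER, score REAL)")
cursor.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 0.5), (2, 1.5)])
cursor.execute("SELECT id, score FROM mytable ORDER BY id")
names, rows = fetch_columns(cursor)
records = [dict(zip(names, row)) for row in rows]
print(records)  # [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 1.5}]
conn.close()
```

With pandas installed, pandas.DataFrame.from_records(rows, columns=names) builds a DataFrame from the same two values.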
