Dependencies
Required:
Python 2.6+ or 3.3+
six, bitarray
thrift (on Python 2.x) or thriftpy (on Python 3.x)
For Hive and/or Kerberos support:
pip install thrift_sasl==0.2.1
pip install sasl
Optional:
pandas for conversion to DataFrame objects; but see the Ibis project instead
sqlalchemy for the SQLAlchemy engine
pytest for running tests; unittest2 for testing on Python 2.6
Installation
Install the latest release (0.13.1) with pip:
pip install impyla
For the latest (dev) version, install directly from the repo:
pip install git+https://github.com/cloudera/impyla.git
or clone the repo:
git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
Usage
Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):
from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
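Per PEP 249, cursor.description is a sequence of 7-item tuples whose first item is the column name. A minimal sketch of turning that into row dictionaries follows. Both helpers are hypothetical (not part of impyla), and the sample description and rows are made up for illustration:

```python
def column_names(description):
    """Return column names from a PEP 249 cursor.description sequence."""
    # Each entry is (name, type_code, display_size, internal_size,
    # precision, scale, null_ok); only the name is used here.
    return [col[0] for col in description]


def rows_as_dicts(description, rows):
    """Pair each row tuple with the column names (hypothetical helper)."""
    names = column_names(description)
    return [dict(zip(names, row)) for row in rows]


# Made-up stand-ins for cursor.description and cursor.fetchall()
sample_description = [
    ('id', 'INT', None, None, None, None, None),
    ('name', 'STRING', None, None, None, None, None),
]
sample_rows = [(1, 'alice'), (2, 'bob')]

records = rows_as_dicts(sample_description, sample_rows)
print(records[0]['name'])
```

With a live connection, the same helper could be called as rows_as_dicts(cursor.description, cursor.fetchall()).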
The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
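Buffered iteration of this kind can be approximated with the standard PEP 249 fetchmany call. The sketch below shows the general pattern only and is not impyla's actual implementation; iter_rows is a hypothetical helper, and the standard-library sqlite3 module stands in for an Impala connection so the example runs without a cluster:

```python
import sqlite3


def iter_rows(cursor, batch_size=None):
    """Yield rows one at a time, fetching them in batches (hypothetical helper)."""
    # Fall back to the cursor's own arraysize when no batch size is given.
    size = batch_size or cursor.arraysize
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        for row in batch:
            yield row


# sqlite3 is only a stand-in DB API 2.0 driver for this sketch
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (n INTEGER)')
cursor.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(5)])

cursor.execute('SELECT n FROM mytable ORDER BY n')
values = [row[0] for row in iter_rows(cursor, batch_size=2)]
print(values)
conn.close()
```

A larger batch size means fewer round trips to the server at the cost of holding more rows in memory at once.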
You can also get back a pandas DataFrame object:
from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example