
Apache Impala Integration


Apache Impala is an open-source, native analytic database for Apache Hadoop, designed for massively distributed, massively parallel SQL queries. Apache Impala integrates easily with Row64 by connecting to Row64 RamDb through Python.

Integration Overview

The basic connection process uses the row64tools Python package to push updates from Apache Impala to Row64. An overview of row64tools is available here:
https://pypi.org/project/row64tools/
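The pull side of that process can be sketched as follows, assuming the integration reads from Impala through impyla, which implements the Python DB-API 2.0 interface (PEP 249). Because the helper only uses standard DB-API calls (cursor(), execute(), description, fetchall()), it works with any DB-API driver. The function name query_to_columns is hypothetical and is not part of row64tools or impyla.

```python
from typing import Any


def query_to_columns(conn: Any, sql: str) -> dict[str, list[Any]]:
    """Run `sql` on a DB-API 2.0 connection and return {column_name: values}.

    Hypothetical helper: the column-oriented dict can be passed straight to
    pandas.DataFrame(...) before handing the frame to row64tools.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        # cursor.description holds one 7-item sequence per result column;
        # item 0 is the column name. Alias duplicate column names in SQL,
        # otherwise their values would land in the same list.
        names = [desc[0] for desc in cursor.description]
        columns: dict[str, list[Any]] = {name: [] for name in names}
        for row in cursor.fetchall():
            for name, value in zip(names, row):
                columns[name].append(value)
        return columns
    finally:
        cursor.close()
```

With impyla, the connection would typically come from impala.dbapi.connect(host=..., port=21050), where 21050 is the port impyla's own examples use; adjust it for your cluster. impyla also provides impala.util.as_pandas(cursor) if you prefer to build the pandas DataFrame directly. How the frame is then written to RamDb depends on the row64tools API described on its PyPI page; check that page rather than relying on this sketch.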

If you need sub-second update speeds, consider connecting through real-time streaming instead, which is covered in the streaming help:
Row64 Stream Overview V3.5

Continuous Update

Cron jobs are a simple, production-proven Linux tool for continuous updates.
Here's a simple guide to setting them up:
https://www.geeksforgeeks.org/linux-unix/how-to-setup-cron-jobs-in-ubuntu/

All you need to do is schedule the integration .py file as a cron job that runs at your data refresh rate, anywhere from once a day to every 20 seconds.
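Standard cron fires a job at most once per minute, so intervals shorter than a minute, such as every 20 seconds, are usually emulated with several every-minute entries staggered by sleep. The hypothetical helper below generates crontab lines for a given interval under that approach; the interpreter and script paths in it are placeholders, not the actual file names from the Row64 repository.

```python
def crontab_lines(interval_seconds: int, command: str) -> list[str]:
    """Return crontab entries that run `command` roughly every `interval_seconds`.

    Hypothetical helper. cron fires at most once per minute, so sub-minute
    intervals are emulated with several every-minute entries offset by `sleep`.
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    if interval_seconds < 60:
        if 60 % interval_seconds:
            raise ValueError("sub-minute intervals must divide 60 evenly")
        return [
            f"* * * * * sleep {offset} && {command}" if offset else f"* * * * * {command}"
            for offset in range(0, 60, interval_seconds)
        ]
    if interval_seconds % 60:
        raise ValueError("intervals of a minute or more must be whole minutes")
    minutes = interval_seconds // 60
    if minutes == 24 * 60:
        return [f"0 0 * * * {command}"]  # once a day at midnight
    if minutes < 60 and 60 % minutes == 0:
        return [f"*/{minutes} * * * * {command}"]
    if minutes % 60 == 0 and 24 % (minutes // 60) == 0:
        return [f"0 */{minutes // 60} * * * {command}"]
    raise ValueError("interval does not map onto a simple cron schedule")


if __name__ == "__main__":
    # Placeholder paths: use absolute paths, since cron runs with a minimal environment.
    for line in crontab_lines(20, "/usr/bin/python3 /opt/row64_integrations/impala_sync.py"):
        print(line)
```

For a 20-second refresh this prints three entries, offset by 0, 20 and 40 seconds, which can be pasted into crontab -e. If a single sync can take longer than the interval, runs may overlap; wrapping the command in flock -n is one common guard.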



If you update more often than every 60 seconds, be sure to update your Row64 config in:
/opt/row64server/conf/config.json

so that "RAMDB_UPDATE" matches your update interval.
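If you prefer to script that change, a minimal sketch follows. It assumes "RAMDB_UPDATE" is a top-level key in config.json; the value's units and type are not documented here, so match the form of the existing entry in your file. set_config_value is a hypothetical helper. It writes to a temporary file in the same directory, copies the original file's permissions, and swaps it in with os.replace, so an interrupted write cannot leave a half-written config.

```python
import json
import os
import shutil
import tempfile
from typing import Any


def set_config_value(path: str, key: str, value: Any) -> None:
    """Set a top-level key in a JSON config file, replacing the file atomically.

    Hypothetical helper; assumes the setting lives at the top level of the JSON.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    config[key] = value

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(config, f, indent=4)
            f.write("\n")
        # mkstemp creates the file with mode 0600; keep the original mode instead.
        # (File ownership is not preserved.)
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Usage would look like set_config_value("/opt/row64server/conf/config.json", "RAMDB_UPDATE", new_value), typically run with permissions that allow writing to /opt/row64server/conf.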

Install Pip Libraries

To set up Apache Impala integration, install the following pip libraries:

pip install row64tools
pip install pandas
pip install python-dotenv
pip install impyla
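Two of these packages import under a different name than they install under: python-dotenv is imported as dotenv, and impyla is imported as impala. A small check like the one below reports which packages are still missing from the Python interpreter that cron will run. The REQUIRED mapping and the missing_packages name are hypothetical, chosen for this example.

```python
import importlib.util

# pip package name -> top-level module it provides
REQUIRED = {
    "row64tools": "row64tools",
    "pandas": "pandas",
    "python-dotenv": "dotenv",
    "impyla": "impala",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names whose modules cannot be found on this interpreter."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All Apache Impala integration dependencies are installed.")
```

Run it with the same interpreter your cron job uses, since a package installed for one Python may be invisible to another.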

Setup For Security

Any security process that works in Python and Linux can be used to secure your data credentials. Our integration code is written to be easily modified to fit your exact needs. The default example uses a .env file to load credentials into environment variables. An overview of that approach is here:
https://upsun.com/blog/what-is-env-file/

The most popular Python library for reading .env files is python-dotenv:
https://pypi.org/project/python-dotenv/

The Apache Impala integration code assumes you create a .env file in the same directory as your .py file. It sets the variables: DBHost. .
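A sketch of reading those settings with python-dotenv follows, assuming a .env line such as DBHost=impala.example.internal (the hostname is a placeholder). Only DBHost is named above, so the example requires just that variable; any other variables your copy of the integration expects would be added to required. load_settings is a hypothetical helper. If python-dotenv is not installed it falls back to variables already in the environment, which is convenient when credentials are exported by the shell or a service manager instead.

```python
import os
from pathlib import Path
from typing import Optional, Sequence


def load_settings(
    required: Sequence[str] = ("DBHost",),
    env_file: Optional[Path] = None,
) -> dict[str, str]:
    """Load a .env file (if python-dotenv is installed) and return the required variables."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        load_dotenv = None  # fall back to variables already in the environment

    if load_dotenv is not None:
        # Default: the .env file sitting next to this .py file.
        path = env_file if env_file is not None else Path(__file__).with_name(".env")
        # override=False (the default) keeps values already exported in the shell.
        load_dotenv(path)

    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in required}
```

Failing early with the list of missing names makes a misconfigured cron job show up clearly in its log instead of failing later inside the database connection.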

Download Apache Impala Integration

Row64 integrations can be downloaded from the GitHub project:
https://github.com/Row64/Row64_Integrations

The Apache Impala integration is in the sub-folder:
https://github.com/Row64/Row64_Integrations/tree/master/ApacheImpala

Note

The integration .py files are intended to be modified or used as a starting point to fit your specific needs.


More help and background information on Apache Impala's SQL dialect can be found in the Impala SQL language reference:
https://impala.apache.org/docs/build/html/topics/impala_langref.html

If you have any problems or requests, please log them at:
https://github.com/Row64/row64tools/issues