
Databricks Integration


Databricks is a unified, cloud-based platform for data processing and analysis, designed for enterprise-grade data intelligence. It integrates with Row64 by connecting to Row64 RamDb through Python.

Integration Overview

The basic connection process is to use row64tools to push updates from Databricks to Row64. An overview of row64tools is available here:
https://pypi.org/project/row64tools/
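
As a rough sketch of that flow, the script below queries Databricks with databricks-sql-connector, loads the result into a pandas DataFrame, and writes it out with row64tools. The function names are hypothetical, and the `ramdb.save_from_df` call is assumed from the example on the row64tools PyPI page, so check it against the version you install. The integration files in the Row64 repository remain the reference.

```python
import os


def fetch_rows(connection, query):
    """Run a query on a DB-API style connection; return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows


def push_to_row64(columns, rows, ramdb_path):
    """Write rows to a .ramdb file.

    Assumption: row64tools exposes ramdb.save_from_df(df, path).
    """
    import pandas as pd
    from row64tools import ramdb

    df = pd.DataFrame(rows, columns=columns)
    ramdb.save_from_df(df, ramdb_path)


def run_update(query, ramdb_path):
    """Pull one query result from Databricks and push it to Row64."""
    from databricks import sql  # installed by databricks-sql-connector

    connection = sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    )
    try:
        columns, rows = fetch_rows(connection, query)
    finally:
        connection.close()
    push_to_row64(columns, rows, ramdb_path)
```

A scheduled script would call `run_update(query, ramdb_path)` once per run, with your own SQL query and the .ramdb output path your Row64 server expects. Third-party imports are kept inside the functions so the query helper can be exercised without a Databricks connection.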

If you need sub-second update speeds, consider connecting through real-time streaming instead, which is covered in the streaming help:
Row64 Stream Overview V3.5

Continuous Update

Cron jobs are the simple, production-proven Linux tool for continuous updates.
Here's a simple example of how to set them up:
https://www.geeksforgeeks.org/linux-unix/how-to-setup-cron-jobs-in-ubuntu/

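All you need to do is set up a cron job that runs the integration .py file at your data refresh rate, anywhere from once a day to every 20 seconds. Cron itself schedules no finer than one minute, so sub-minute rates need a workaround, such as staggered entries that sleep before running the script.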
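
One common way to get a sub-minute rate out of cron is several entries with staggered `sleep` offsets. The helper below is a small sketch that generates such crontab lines for a given interval; the function name and the script path in the usage note are hypothetical, and it only covers intervals under an hour that divide evenly.

```python
def cron_lines(command, interval_seconds):
    """Return crontab entries that run `command` every `interval_seconds`."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")

    if interval_seconds < 60:
        # Cron fires at most once a minute, so add one entry per offset
        # and let each entry sleep until its slot within the minute.
        if 60 % interval_seconds != 0:
            raise ValueError("sub-minute intervals must divide 60 evenly")
        lines = []
        for offset in range(0, 60, interval_seconds):
            prefix = f"sleep {offset}; " if offset else ""
            lines.append(f"* * * * * {prefix}{command}")
        return lines

    if interval_seconds % 60 != 0:
        raise ValueError("intervals of a minute or more must be whole minutes")
    minutes = interval_seconds // 60
    if minutes >= 60 or 60 % minutes != 0:
        raise ValueError("use the hour or day fields for this interval")
    minute_field = "*" if minutes == 1 else f"*/{minutes}"
    return [f"{minute_field} * * * * {command}"]
```

For example, `cron_lines("python3 /path/to/integration.py", 20)` returns three lines: one that runs immediately, one prefixed with `sleep 20;`, and one prefixed with `sleep 40;`. Paste the output into `crontab -e`.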



If your data refreshes more often than once every 60 seconds, be sure to update your Row64 config in:
/opt/row64server/conf/config.json

so that "RAMDB_UPDATE" is set to match the update speed.
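
If you prefer to script that change, a minimal sketch follows. It assumes "RAMDB_UPDATE" is a top-level key in config.json and that its value is a number of seconds; neither is confirmed here, so inspect your own config file first. The function name is hypothetical.

```python
import json
from pathlib import Path


def set_ramdb_update(config_path, seconds):
    """Set "RAMDB_UPDATE" in a JSON config file and return the old value.

    Assumptions: the key sits at the top level of the file and holds
    the update interval in seconds.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("RAMDB_UPDATE")
    config["RAMDB_UPDATE"] = seconds
    # Rewrite the whole file; all other keys are preserved as loaded.
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return previous
```

Calling `set_ramdb_update("/opt/row64server/conf/config.json", 20)` would match a 20-second cron schedule. Writing under /opt will usually need elevated permissions, and it is worth keeping a backup copy of the file before editing it.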

Install Pip Libraries

To set up Databricks integration, install the following pip libraries:

pip install row64tools
pip install pandas
pip install python-dotenv
pip install databricks-sql-connector
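
To confirm the installs landed in the Python environment that cron will use, a quick check like the sketch below can help. Note that two import names differ from their pip names: python-dotenv is imported as `dotenv` and databricks-sql-connector as `databricks.sql`. The helper name is hypothetical.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "databricks") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names for the four pip packages listed above.
REQUIRED = ["row64tools", "pandas", "dotenv", "databricks.sql"]
```

Running `print(missing_modules(REQUIRED))` with the same interpreter as your cron job should print an empty list once everything is installed.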

Setup For Security

Any security process that works in Python and Linux can be used to secure your data credentials. Our integration code is written to be easily modified to fit your exact needs. The default example uses .env files to set Linux environment variables. An overview of that approach is here:
https://upsun.com/blog/what-is-env-file/

The most popular Python library for .env is here:
https://pypi.org/project/python-dotenv/

The Databricks integration code assumes you create a .env file in the same directory as your .py file. The .env file should define three variables: DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.
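
A minimal sketch of loading and validating those variables with python-dotenv is shown below. The helper names are hypothetical and the .env values in the comment are placeholders; the shipped integration file may organize this differently.

```python
import os

# Example .env contents (placeholder values, one variable per line):
#   DATABRICKS_SERVER_HOSTNAME=<your-workspace-hostname>
#   DATABRICKS_HTTP_PATH=<your-http-path>
#   DATABRICKS_TOKEN=<your-access-token>

REQUIRED_VARS = (
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_TOKEN",
)


def load_env_file(script_path):
    """Load the .env file that sits next to the given script."""
    from pathlib import Path
    from dotenv import load_dotenv  # installed by python-dotenv

    return load_dotenv(Path(script_path).resolve().with_name(".env"))


def read_databricks_settings(env=None):
    """Return the three Databricks settings, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing in .env or environment: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

In a script, call `load_env_file(__file__)` first and then `read_databricks_settings()`. Failing early with a clear message is useful under cron, where a missing variable would otherwise surface as a less obvious connection error. Keep the .env file out of version control and restrict its file permissions.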

Download Databricks Integration

Row64 integrations can be downloaded from the GitHub project:
https://github.com/Row64/Row64_Integrations

The Databricks integration is in the sub-folder:
https://github.com/Row64/Row64_Integrations/tree/master/Databricks

Note

The integration .py files are intended to be modified or used as a starting point to fit your specific needs.


More help and background information on the Databricks SQL Connector for Python can be found at:
https://docs.databricks.com/aws/en/dev-tools/python-sql-connector

If you have any problems or requests, please log them at:
https://github.com/Row64/row64tools/issues