Categories
data analytics novice

Solr + Superset

Apache Superset is a business intelligence SQL inclined platform equipped with a wide array of BI features and visualizations that satisfies data exploration and visualization requirements. It is battle tested in large environments with hundreds of concurrent users in production environments.

Out of the box, Superset supports the SQL databases, including the main SQL database players: Oracle, MySQL, Microsoft SQL Server, IBM DB2 and other RDBMS vendors. It also supports data lake in the cloud solutions such as Teradata, Vertical and Snowflake. Moreover, some SQL-friendly systems, such as Amazon Redshift, Apache Impala and Spark, are supported.

Under the hood, Superset leverages Python SQLAlchemy to support the variations of SQL dialects that are defined by different vendors and systems. This post explores Superset and Solr integration through SQLAlchemy and Parallel SQL Interface to run SQL queries and create visualizations for data stored in Solr.

Solr Dialect

To get Superset working with Solr, a dialect must be installed. A dialect that uses an HTTP driver can be found here. Just clone the repository run the following command on the Superset server:

$ python3 setup.py install

Note: Superset requires Python 3.6 or higher.

After installing the dialect, a database corresponding to Solr default worker collection for handling queries can be added to Superset. The following connection string can be used:

solr://<username>:<password>@<host>:<port>/<server_path>/<collection>

where username and password are basic authentication credentials. server_path is the server path that handles Solr queries, usually solr, and collection is the default worker collection. If everything is setup correctly, clicking “Test Connection” should pup-up a fancy dialog that the connection “Seems OK!”. Otherwise, Solr server could be inaccessible, the connection string might be incorrect or the dialect not installed properly.

Database Connection Seems OK!

Note: in the above connection string, solr:// can be replaced with solr+http:// since the default driver is http.

Note: username and password are optional

Note: to use https instead of http, pass use_ssl=true parameter at the end of the connection string.

SQL Lab

Once the connection is working, it’s time to Superset. SQL Lab is one of the best ways to start. Open SQL Editor and you’ll see that the worker collection, as specified in the connection string, appears under Databases dropdown list. Selecting the worker collection populates the Schema dropdown. Since Solr does not have a schema concept, this list is always displaying only “default”. Similarly, selecting the default schema populates the Tables dropdown list. Solr server collections are mapped to tables. Finally, selecting a table displays all the corresponding fields and their data types.

Superset SQL Lab Editor

Charts and Dashboards

There are no special considerations when creating charts and dashboards in Superset when a Solr dialect selected, though some features might not work as expected. To create a chart for a specific table in Superset, that table has to be defined under Sources menu since tables are charts datasources. A dashboard in Superset consists of a set of charts that are optionally organized in tabs. Solr SQL uses intervals syntax for time filtration which is taken care of by the dialect. Below is an example of a dashboard that displays charts of data stored in three different Solr collections.

Superset Dashboard

Leave a Reply

Your email address will not be published. Required fields are marked *