data analytics devops novice

Realtime Log Analytics with Solr, Logstash, Banana and Beats

Logs are everywhere, and they are usually generated in large volumes and at high velocity. They can yield useful information and insights about the domain or process that produces them, such as platforms, transactions, and system users. In this post, a realtime web (Apache2) log analytics pipeline is built using Apache Solr, Banana, Logstash and Beats containers.

To get the pipeline running, however, several integration issues related to streaming data must be addressed through settings and patches supplied via mounted volumes. The volumes can be structured as follows:

data analytics novice

A Sales Dashboard

This post presents a typical sales analysis that can serve as a starting point for analyzing a firm’s sales data. A sample dataset from Kaggle and the latest versions of Solr and Banana are used for that purpose.

As is often the case, the dataset needs some pre-processing, such as feature transformation or column renaming, before it can be indexed.
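As a rough illustration of that kind of pre-processing, the sketch below renames columns to lowercase snake_case and converts a date column to the ISO 8601 UTC form that Solr date fields accept. The column names (`Order Date`, `Sales Amount`), the input date format and the helper names are assumptions made for the example; they are not taken from the actual Kaggle dataset.

```python
import csv
import io
import re
from datetime import datetime, timezone


def normalize_name(name):
    """Turn a header such as 'Order Date' into 'order_date'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def to_solr_date(value, fmt="%m/%d/%Y"):
    """Convert a date string to ISO 8601 UTC, e.g. 2019-01-31T00:00:00Z."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def preprocess(rows, date_columns=("order_date",)):
    """Yield rows with normalized column names and converted date columns."""
    for row in rows:
        clean = {normalize_name(k): v.strip() for k, v in row.items()}
        for col in date_columns:
            if clean.get(col):
                clean[col] = to_solr_date(clean[col])
        yield clean


# Hypothetical two-column sample standing in for the real CSV file.
sample = io.StringIO("Order Date,Sales Amount\n01/31/2019, 1200.50\n")
records = list(preprocess(csv.DictReader(sample)))
print(records)
```

The cleaned rows can then be written back to CSV or JSON and posted to Solr. Numeric values are left as strings here, on the assumption that the Solr schema handles the type conversion.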