The purpose of this post is to present a typical sales analysis that might serve as a starting point for the task of analyzing a firm’s sales data. A sample dataset from Kaggle and the latest versions of Solr and Banana1 will be used for that purpose.
As often required, the dataset needs a bit of pre-processing, such as feature transformation or column name changes, before it can be indexed.
First, an ISO-8601 date field needs to be added. Since the dataset size is small, a spreadsheet processor can be used to insert that column using a simple formula like:
LEFT(F2;10) & "T00:00:00Z".
In order to take advantage of the country field when analyzing the dataset on a map, it needs to be converted to standard 2-letter code.2 Grab a country code mapping and insert it as a new spreadsheet in the same workbook. Then, using a formula like
VLOOKUP(V2;'Sheet2:A:B';2), corresponding country codes can be obtained.3
The third and last step is to update column names to work with dynamic fields feature of Solr as follows:4
Now, the dataset can be indexed into Solr using the
post command in a collection named
sales. If you have Docker installed,
docker may be used to quickly run Solr and Banana, check this post for more details. Open Banana and load the pre-configured dashboard by clicking Load on the top right and pasting the dashboard Gist URL.
Most of the panels are self-explanatory and interactive: clicking an element reflects on the whole dashboard. On the fifth row however, the quantity distribution panel displays quantity frequencies. It’s noted that quantity is almost equally distributed between 20 and 50 per order. Many panels can be added to get more insights though.
1. The latest versions of Solr and Banana are used in this post, 8.1.1 and 1.6.26-dev, respectively.
2. ISO 3166-1
3. In the original dataset, replace “USA” and “UK” with “United States of America” and “United Kingdom” respectively.
4. Transformed fields are marked in italics font