Asking Solr Questions in Natural Language

July 31, 2024

With recent advances in AI/ML, many tasks that were once out of reach have become approachable. One of these is asking a computer questions in natural language and getting accurate, reasonable answers. This is made possible by large language models, which are notable for their ability to perform general-purpose language generation and other natural language processing tasks.

Live Twitter reach

March 14, 2020

What is the trend of a specific topic, such as a new brand or a current issue happening somewhere in the world? This post shows how to answer this question, and several similar ones, using a streaming pipeline and an analytics dashboard powered by the Twitter Streaming API, Solr, Logstash and Banana.

Coronavirus (COVID-19) Live Demo

Lightweight Text Clustering with Solr

December 26, 2019

Clustering is one of the most common unsupervised Machine Learning tasks. Solr ships with a clustering module based on Carrot2's built-in algorithms. Carrot2 comes with four algorithms: Lingo, STC, kMeans and Lingo3G, each one mapped to a clustering engine. The first three are open-source whereas the last one is commercial. When this approach is used, clustering takes place in memory. Other frameworks, such as Mahout, can be used to do the clustering “off-line.”

Solr + Superset

November 28, 2019

Apache Superset is a SQL-oriented business intelligence platform equipped with a wide array of BI features and visualizations that cover data exploration and visualization requirements. It is battle-tested in large production environments with hundreds of concurrent users.

Document Classification With Solr Streaming Expressions

November 6, 2019

Classification is one of the most popular tasks in Natural Language Processing and Machine Learning. Solr ships with a set of features, a subset of its Streaming Expressions, that allow building and deploying statistical classification models out of the box. With adequate preprocessing and indexing tweaks, these features can be used to classify documents quickly and with high accuracy. This post illustrates how Solr streaming expressions and Zeppelin notebooks can be used to build a document classifier.

Zeppelin Notebooks and Solr

October 22, 2019

The concept of data science notebooks has been around for a while. Notebooks are web interfaces that allow creating and sharing live code, equations, visualizations and narrative text. They appear throughout data science workflows, serving data cleaning, transformation, numerical simulation, statistical modeling, data visualization and even machine learning. In a Python environment, Jupyter is prominent. In a Java or Scala environment, Apache Zeppelin fits seamlessly. Though Jupyter can be used with a Java kernel and Zeppelin can be used with a Python interpreter, each one natively belongs to its own stack.

Realtime Log Analytics with Solr, Logstash, Banana and Beats

September 24, 2019

Logs are everywhere and are usually generated in large volumes and at high velocity. They can be used to obtain useful information and insights about the domain or process that produced them, such as platforms, transactions, system users, etc. In this post, a realtime web (Apache2) log analytics pipeline will be built using Apache Solr, Banana, Logstash and Beats containers.

However, in order to get the pipeline running, several integration aspects related to streaming data need to be addressed through settings and patches supplied through mounted volumes. The structure of these volumes can be as below:

A Sales Dashboard

September 14, 2019

The purpose of this post is to present a typical sales analysis that might serve as a starting point for the task of analyzing a firm’s sales data. A sample dataset from Kaggle and the latest versions of Solr and Banana(1) will be used for that purpose.

As often required, the dataset needs a bit of pre-processing, such as feature transformation or column name changes, before it can be indexed.

Solr and Banana on Docker

August 30, 2019

A container is an abstraction layer for running a software application in a lightweight environment. Containerization provides a standard, secure way to build, ship and run applications anywhere. Docker images of Solr and Banana are available for quick installation and startup.

Introducing Graph Visualization in Banana v1.7

August 14, 2019

Graph traversal features were introduced in the Solr 6 releases. These powerful features enable Solr users to run expressions that traverse graph structures in order to introduce or extract useful information. They are particularly useful when data is already indexed into Solr and light graph operations are required, especially on top of text search. Before proceeding, a basic knowledge of Solr and graph structures is required.

Solr's implementation performs graph traversal using Breadth First Search (BFS), which is more suitable for solving search problems than its counterpart, Depth First Search (DFS). It is also possible to combine graph traversal with other search or streaming operations.
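
As a rough illustration of the breadth-first idea (not Solr's actual implementation), the sketch below visits a small in-memory graph level by level and records how many hops each node is from the starting node. The function name `bfs_levels` and the toy adjacency dictionary are hypothetical and chosen only for this example.

```python
from collections import deque


def bfs_levels(graph, root):
    """Return {node: hops from root} for every node reachable from root.

    `graph` is a plain adjacency dict (an assumption for this sketch);
    nodes are visited one level at a time, nearest first.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in depth:  # skip nodes already reached
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Hypothetical toy graph: "a" links to "b" and "c", which both link to "d".
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
levels = bfs_levels(toy_graph, "a")
```

Because nodes are dequeued in order of distance, the first time a node is reached is also the shortest path to it, which is why a breadth-first walk fits "find everything within N hops" style queries.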

Building a Dynamic Analytics Dashboard with Apache Solr and Banana in 10 Minutes

July 7, 2019

TL;DR: if you have a raw dataset or data already indexed into Apache Solr, a meaningful analytics dashboard that gives insights and useful graphical and tabular information can be built in minutes.

Hello, world!

June 22, 2019