Categories
machine learning natural language processing novice search

Asking Solr Questions in Natural Language

With the recent advancements of AI/ML, many tasks that were once unapproachable have become not. One of these tasks is asking questions to computers in a natural language and getting accurate and reasonable answers. Indeed, doing this task today is enabled by large language models that are notable for their ability to achieve general-purpose language generation and other natural language processing tasks.1

One approach to asking Solr questions in natural language is leveraging the SQL query interface and integrating a text-to-SQL model. Two models are experimented with here: t5-small-awesome-text-to-sql2 based on Google t5-small and Natural SQL based on Llama. Though it’s easier to run the t5-small models, the accuracy of Llama in this experiment exceeds its counterpart by far, given the size and release date of each.

Introducing Quest

Quest is a containerized application built to capture user questions and display respective answers based on the connected database. It is composed of:

  1. A front-end, generated by Gemini, that captures user’s questions and presents answers, progress, or errors.
  2. an API that connects the front-end and the model, and
  3. a web proxy to handle requests and serve responses.

By installing Quest, the models can be run against data indexed in Solr. For more details on Quest, please check here.

Setup

Books Sales and Ratings dataset was used after renaming columns to match the Solr dynamic field naming convention.

The model used is Natural SQL 7B and the schema is set as follows:

  CREATE TABLE books (
publishing_year_i INTEGER,
book_name_t: TEXT,
author_s: VARCHAR(100),
language_code_s:VARCHAR(10),
author_rating_s:VARCHAR(100),
book_average_rating_f: FLOAT,
book_ratings_count_i":INTEGER,
genre_s: TEXT,
gross_sales_f: FLOAT,
publisher_revenue_f:FLOAT,
sale_price_f:FLOAT,
sales_rank_i:INTEGER,
publisher_s:VARCHAR(100),
units_sold_i:INTEGER
);

Examples

Below is a sample of 2 questions and their respective answers given by Natural SQL model:

Q1: Show me the book with the highest sales price

A1:

book_name_tsales_price_f
The Wind in the Willows33.86
SELECT book_name_t, sale_price_f
FROM books
ORDER BY sale_price_f DESC
LIMIT 1;

Q2: Which publisher have total revenue higher than $100k published fiction book?

A2:

publisher_s
Amazon Digital Services, Inc.
Hachette Book Group
HarperCollins Publishers
Penguin Group (USA) LLC
Random House LLC
SELECT publisher_s
FROM books
WHERE genre_s LIKE '%fiction%'
GROUP BY publisher_s
HAVING SUM(publisher_revenue_f) > 100000
LIMIT 1000

Demo

  1. https://en.wikipedia.org/wiki/Large_language_model ↩︎

Leave a Reply

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.