With the recent advancements of AI/ML, many tasks that were once unapproachable have become not. One of these tasks is asking questions to computers in a natural language and getting accurate and reasonable answers. Indeed, doing this task today is enabled by large language models that are notable for their ability to achieve general-purpose language generation and other natural language processing tasks.1
One approach to asking Solr questions in natural language is leveraging the SQL query interface and integrating a text-to-SQL model. Two models are experimented with here: t5-small-awesome-text-to-sql2 based on Google t5-small and Natural SQL based on Llama. Though it’s easier to run the t5-small models, the accuracy of Llama in this experiment exceeds its counterpart by far, given the size and release date of each.
Introducing Quest
Quest is a containerized application built to capture user questions and display respective answers based on the connected database. It is composed of:
- A front-end, generated by Gemini, that captures user’s questions and presents answers, progress, or errors.
- an API that connects the front-end and the model, and
- a web proxy to handle requests and serve responses.
By installing Quest, the models can be run against data indexed in Solr. For more details on Quest, please check here.
Setup
Books Sales and Ratings dataset was used after renaming columns to match the Solr dynamic field naming convention.
The model used is Natural SQL 7B and the schema is set as follows:
CREATE TABLE books (
publishing_year_i INTEGER,
book_name_t: TEXT,
author_s: VARCHAR(100),
language_code_s:VARCHAR(10),
author_rating_s:VARCHAR(100),
book_average_rating_f: FLOAT,
book_ratings_count_i":INTEGER,
genre_s: TEXT,
gross_sales_f: FLOAT,
publisher_revenue_f:FLOAT,
sale_price_f:FLOAT,
sales_rank_i:INTEGER,
publisher_s:VARCHAR(100),
units_sold_i:INTEGER
);
Examples
Below is a sample of 2 questions and their respective answers given by Natural SQL model:
Q1: Show me the book with the highest sales price
A1:
book_name_t | sales_price_f |
The Wind in the Willows | 33.86 |
SELECT book_name_t, sale_price_f
FROM books
ORDER BY sale_price_f DESC
LIMIT 1;
Q2: Which publisher have total revenue higher than $100k published fiction book?
A2:
publisher_s |
Amazon Digital Services, Inc. |
Hachette Book Group |
HarperCollins Publishers |
Penguin Group (USA) LLC |
Random House LLC |
SELECT publisher_s
FROM books
WHERE genre_s LIKE '%fiction%'
GROUP BY publisher_s
HAVING SUM(publisher_revenue_f) > 100000
LIMIT 1000
Demo
- https://en.wikipedia.org/wiki/Large_language_model ↩︎