Text-to-SQL: Bridging the Gap Between Natural Language and Database Insights

How can we query any database by simply asking a question, as if we were talking to a friend? Text-to-SQL provides an intuitive and accessible way to interact with databases using natural language.

Democratizing data access and empowering users to effortlessly extract insights from databases is a pivotal step in today's data-driven landscape. Introducing the "Text-to-SQL" project, a powerful application that bridges the gap between natural language and complex SQL queries. This project builds upon the foundation of the NA2SQL app, leveraging the capabilities of OpenAI's GPT-3.5 and LlamaIndex to provide an intuitive and accessible way to interact with databases using natural language.

Features

  • Natural Language Understanding: Translates user-provided natural language queries into accurate SQL queries, eliminating the need for SQL expertise.
  • Data Retrieval and Analysis: Retrieves results from the database and interprets them using an LLM (GPT 3.5) to offer meaningful insights, empowering users to make data-driven decisions.
  • User-Friendly Interface: With Streamlit, the application offers a clean and intuitive user interface that includes "Streamlit pills" for executing sample queries. This feature significantly enhances user experience by providing a straightforward mechanism for data interaction.
Overview of the Text-to-SQL demonstration app

Tools and Technologies

  • LLM: OpenAI's GPT-3.5 is known for its advanced language understanding and generation capabilities.
  • LLM Orchestration: LlamaIndex (version 0.10.26), a powerful framework for managing and interacting with LLMs.
  • Data Management: SQL Database with SQLite, providing a reliable and efficient data storage solution.
  • UI Framework: Streamlit, enabling the creation of interactive and user-friendly web applications.

Why LlamaIndex over LangChain?

While both LlamaIndex and LangChain offer valuable tools for LLM-based applications, we opted for LlamaIndex for its unique strengths in this Text-to-SQL project. LlamaIndex provides a more streamlined and integrated approach to building and managing LLM-powered applications. Its modular design allows for a flexible composition of components like LLMs, vector stores, and data connectors. Additionally, LlamaIndex's focus on data indexing and retrieval aligns perfectly with the core functionality of our Text-to-SQL application.

Deep Dive into the Code

The app.py script is a crucial component of the Text-to-SQL application, serving as the primary gateway through which users interact with the tool. Let's explore some of the critical functions and how they contribute to the application's functionality.

1. Database Connection and Table Display

The application utilizes SQLAlchemy to manage connections to an SQLite database, allowing it to perform operations more securely and efficiently. It uses this connection to fetch and display data based on user queries.

def get_table_data(table_name, conn):
    query = f"SELECT * FROM {table_name}"
    df = pd.read_sql_query(query, conn)
    return df

This function retrieves all data from a specified table, which is then displayed in the Streamlit sidebar. It demonstrates how the application interacts with the database at a fundamental level.

Overview of the Database Schema Viewer

2. Loading the Database and Language Model

The function load_db_llm() initializes the database connection and sets up the OpenAI language model for generating SQL queries. It exemplifies how various components like the database and language model are integrated.

def load_db_llm():
    engine = create_engine("sqlite:///ecommerce_platform1.db?mode=ro", connect_args={"uri": True})
    sql_database = SQLDatabase(engine)  # include all tables
    Settings.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-1106")
    return sql_database, Settings, engine

This setup ensures that the application can handle SQL queries securely and performantly while leveraging an advanced AI model to translate natural language into SQL.

3. Handling User Queries and Responses

The core interaction mechanism of the app is handled by the chat functionality, which captures user input and processes it through OpenAI’s GPT 3.5 via Llamaindex’s NLSQLTableQueryEngine. The query engine first converts the natural language user query to SQL. It displays the SQL query and then also executes it against the database. The SQL results are then passed along to the LLM along with the user query, generate the final response.

if prompt := st.chat_input("Enter your natural language query about the database"):
    with st.chat_message("user"):
        st.write(prompt)
    add_to_message_history("user", prompt)

    # If last message is not from assistant, generate a new response
    if st.session_state["messages"][-1]["role"] != "assistant":
        with st.spinner():
            with st.chat_message("assistant"):
                response = st.session_state["query_engine"].query("User Question:"+prompt+". ")
                sql_query = f"```sql\n{response.metadata['sql_query']}\n```\n**Response:**\n{response.response}\n"
                response_container = st.empty()
                response_container.write(sql_query)
                add_to_message_history("assistant", sql_query)

Enhancing the User Experience with Streamlit Pills

A notable UI enhancement in the application is the introduction of Streamlit pills, which simplify the interaction by allowing users to select example queries quickly.

query_options = ["None", "Show data for '4K LED Smart TV'", "Show all books and their reviews", "Analyse reviews for Electronics"]
selected_query = pills("Select example queries or enter your own query in the chat input below", query_options, key="query_pills")

This functionality not only makes the application more user-friendly but also demonstrates how complex functionalities can be encapsulated in simple, intuitive UI elements.

Streamlit Pills offer users suggested example queries, enhancing user experience

Check out our Github page for more code snippets and a complete app implementation. Share comments and thoughts with us at hello@cohorte.co.

Conclusion

The app.py script effectively integrates multiple advanced technologies to create a seamless and powerful user experience. By leveraging SQL databases, large language models, and an intuitive Streamlit interface, the Text-to-SQL application stands as a robust example of modern software engineering and AI capabilities. This deep dive into the codebase highlights how each component plays a vital role in the application's functionality, showcasing the intricate balance between backend logic and frontend design.