Text-to-SQL: Bridging the Gap Between Natural Language and Database Insights
Democratizing data access and empowering users to effortlessly extract insights from databases is a pivotal step in today's data-driven landscape. Introducing the "Text-to-SQL" project, a powerful application that bridges the gap between natural language and complex SQL queries. This project builds upon the foundation of the NA2SQL app, leveraging the capabilities of OpenAI's GPT-3.5 and LlamaIndex to provide an intuitive and accessible way to interact with databases using natural language.
Features
- Natural Language Understanding: Translates user-provided natural language queries into accurate SQL queries, eliminating the need for SQL expertise.
- Data Retrieval and Analysis: Retrieves results from the database and interprets them using an LLM (GPT 3.5) to offer meaningful insights, empowering users to make data-driven decisions.
- User-Friendly Interface: With Streamlit, the application offers a clean and intuitive user interface that includes "Streamlit pills" for executing sample queries. This feature significantly enhances user experience by providing a straightforward mechanism for data interaction.
Tools and Technologies
- LLM: OpenAI's GPT-3.5 is known for its advanced language understanding and generation capabilities.
- LLM Orchestration: LlamaIndex (version 0.10.26), a powerful framework for managing and interacting with LLMs.
- Data Management: SQL Database with SQLite, providing a reliable and efficient data storage solution.
- UI Framework: Streamlit, enabling the creation of interactive and user-friendly web applications.
Why LlamaIndex over LangChain?
While both LlamaIndex and LangChain offer valuable tools for LLM-based applications, we opted for LlamaIndex for its unique strengths in this Text-to-SQL project. LlamaIndex provides a more streamlined and integrated approach to building and managing LLM-powered applications. Its modular design allows for a flexible composition of components like LLMs, vector stores, and data connectors. Additionally, LlamaIndex's focus on data indexing and retrieval aligns perfectly with the core functionality of our Text-to-SQL application.
Deep Dive into the Code
The app.py
script is a crucial component of the Text-to-SQL application, serving as the primary gateway through which users interact with the tool. Let's explore some of the critical functions and how they contribute to the application's functionality.
1. Database Connection and Table Display
The application utilizes SQLAlchemy to manage connections to an SQLite database, allowing it to perform operations more securely and efficiently. It uses this connection to fetch and display data based on user queries.
def get_table_data(table_name, conn):
query = f"SELECT * FROM {table_name}"
df = pd.read_sql_query(query, conn)
return df
This function retrieves all data from a specified table, which is then displayed in the Streamlit sidebar. It demonstrates how the application interacts with the database at a fundamental level.
2. Loading the Database and Language Model
The function load_db_llm()
initializes the database connection and sets up the OpenAI language model for generating SQL queries. It exemplifies how various components like the database and language model are integrated.
def load_db_llm():
engine = create_engine("sqlite:///ecommerce_platform1.db?mode=ro", connect_args={"uri": True})
sql_database = SQLDatabase(engine) # include all tables
Settings.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-1106")
return sql_database, Settings, engine
This setup ensures that the application can handle SQL queries securely and performantly while leveraging an advanced AI model to translate natural language into SQL.
3. Handling User Queries and Responses
The core interaction mechanism of the app is handled by the chat functionality, which captures user input and processes it through OpenAI’s GPT 3.5 via Llamaindex’s NLSQLTableQueryEngine
. The query engine first converts the natural language user query to SQL. It displays the SQL query and then also executes it against the database. The SQL results are then passed along to the LLM along with the user query, generate the final response.
if prompt := st.chat_input("Enter your natural language query about the database"):
with st.chat_message("user"):
st.write(prompt)
add_to_message_history("user", prompt)
# If last message is not from assistant, generate a new response
if st.session_state["messages"][-1]["role"] != "assistant":
with st.spinner():
with st.chat_message("assistant"):
response = st.session_state["query_engine"].query("User Question:"+prompt+". ")
sql_query = f"```sql\n{response.metadata['sql_query']}\n```\n**Response:**\n{response.response}\n"
response_container = st.empty()
response_container.write(sql_query)
add_to_message_history("assistant", sql_query)
Enhancing the User Experience with Streamlit Pills
A notable UI enhancement in the application is the introduction of Streamlit pills, which simplify the interaction by allowing users to select example queries quickly.
query_options = ["None", "Show data for '4K LED Smart TV'", "Show all books and their reviews", "Analyse reviews for Electronics"]
selected_query = pills("Select example queries or enter your own query in the chat input below", query_options, key="query_pills")
This functionality not only makes the application more user-friendly but also demonstrates how complex functionalities can be encapsulated in simple, intuitive UI elements.
Check out our Github page for more code snippets and a complete app implementation. Share comments and thoughts with us at hello@cohorte.co.
Conclusion
The app.py
script effectively integrates multiple advanced technologies to create a seamless and powerful user experience. By leveraging SQL databases, large language models, and an intuitive Streamlit interface, the Text-to-SQL application stands as a robust example of modern software engineering and AI capabilities. This deep dive into the codebase highlights how each component plays a vital role in the application's functionality, showcasing the intricate balance between backend logic and frontend design.