ved-kashyap-samsung opened a new issue, #32408:
URL: https://github.com/apache/superset/issues/32408
## Motivation
The goal of this proposal is to introduce a new feature into Apache Superset
that uses Large Language Models (LLMs) to provide advanced dashboard and chart
summarization. The feature aims to improve the user experience by enabling
natural language queries, automated summarization, and intelligent selection of
charts based on those queries. It would also reduce reliance on NL-to-SQL
conversion models, which can be inaccurate, by having the LLM work with the SQL
that already backs existing charts instead of generating new SQL from scratch.
## Proposed Change
### Overview
We propose integrating an LLM-based agentic architecture into Superset to
enable the following capabilities:
1. Natural language query support for dashboards and charts.
2. Intelligent selection of appropriate charts based on natural language
queries.
3. Automated summarization of chart data and dashboards.
4. Automated text-based reporting based on predefined KPIs and schedules.
5. Graceful handling of scenarios where relevant charts are not found for a
given query.
### Implementation Details
1. **LLM Integration**:
   - Integrate an LLM capable of understanding and processing natural
     language queries.
   - Develop an agent-based system in which the LLM carries out a sequence of
     actions: selecting the charts relevant to the user's query, fetching their
     SQL queries from existing APIs, applying filters, running the final SQL
     query to retrieve results, and summarizing those results into insights.
2. **Natural Language Query Support**:
   - Add input fields at both the dashboard and chart levels for natural
     language queries.
   - Implement backend services that process these queries using the LLM.
3. **Chart and Dashboard Summarization**:
   - Provide summarization options in the chart menu, based on the chart's
     loaded data.
   - Implement automated text-based reporting for dashboards, using scheduled
     (cron) jobs for predefined KPIs.
4. **Intelligent Chart Selection**:
   - Develop mechanisms for the LLM to pick the correct chart based on chart
     names or associated metadata.
   - Handle gracefully the case where no relevant chart is found for a query.
5. **Feature Flags**:
   - Gate the feature behind feature flags so that users can opt in on a
     per-user basis.
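The agent flow in items 1 and 4 above can be sketched as a small pipeline. This is a minimal, hypothetical sketch rather than a proposed implementation: `ChartMeta`, `select_chart`, and `answer_query` are illustrative names, not existing Superset APIs; a simple keyword-overlap score stands in for LLM-based chart selection; and both SQL execution and the LLM call are passed in as stub callables so the example runs on its own.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChartMeta:
    """Hypothetical subset of chart metadata the agent can inspect."""

    chart_id: int
    name: str
    description: str = ""
    sql: str = ""  # SQL backing the chart (would be fetched via existing APIs)


def _tokens(text: str) -> set[str]:
    # Lowercased words of 4+ characters: a crude stop-word filter.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def select_chart(query: str, charts: list[ChartMeta]) -> Optional[ChartMeta]:
    """Return the chart whose name/description shares the most words with the
    query, or None if no chart shares any word (stand-in for LLM selection)."""
    query_words = _tokens(query)
    best, best_score = None, 0
    for chart in charts:
        score = len(query_words & _tokens(f"{chart.name} {chart.description}"))
        if score > best_score:
            best, best_score = chart, score
    return best


def answer_query(
    query: str,
    charts: list[ChartMeta],
    run_sql: Callable[[str], list[dict]],
    llm: Callable[[str], str],
) -> str:
    chart = select_chart(query, charts)
    if chart is None:
        # Graceful fallback: say so instead of summarizing an unrelated chart.
        return "No chart on this dashboard appears relevant to your question."
    rows = run_sql(chart.sql)  # user-requested filters would be applied before this
    prompt = (
        f"Question: {query}\n"
        f"Chart: {chart.name}\n"
        f"Data: {rows}\n"
        "Summarize the data to answer the question."
    )
    return llm(prompt)


# Example data with stubbed SQL execution and a stubbed "LLM".
charts = [
    ChartMeta(1, "Monthly revenue", "revenue by month", "SELECT month, revenue FROM sales"),
    ChartMeta(2, "Active users", "daily active users", "SELECT day, dau FROM users"),
]


def fake_sql(sql: str) -> list[dict]:
    return [{"month": "2024-01", "revenue": 100}]


def fake_llm(prompt: str) -> str:
    return prompt.splitlines()[1]  # echoes the "Chart: ..." line of the prompt


print(answer_query("How did revenue change?", charts, fake_sql, fake_llm))
# → Chart: Monthly revenue
print(answer_query("What is the weather?", charts, fake_sql, fake_llm))
# → No chart on this dashboard appears relevant to your question.
```

Keeping `run_sql` and `llm` as injected callables is one way to leave the choice of model and query engine open; inside Superset, the SQL would presumably go through the existing query APIs the proposal mentions.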
### Mockups and Screenshots
*Mockups and screenshots will be added here once the design phase is
complete.*
## New or Changed Public Interfaces
1. **REST Endpoints**:
   - New endpoints for processing natural language queries and returning
     summarized results.
2. **React Components**:
   - New input fields for natural language queries at the dashboard and
     chart levels.
   - Updated chart menu with summarization options.
3. **Configuration**:
   - Configuration options for enabling/disabling the feature using feature
     flags.
4. **CLI Changes**:
   - New CLI commands for managing LLM-related configurations and feature
     flags.
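On the configuration side, Superset already reads feature flags from the `FEATURE_FLAGS` dictionary in `superset_config.py`, and its `GET_FEATURE_FLAGS_FUNC` hook lets a deployment rewrite that dictionary at request time, which is one possible route to per-user opt-in. The sketch below assumes a hypothetical flag name (`LLM_ASSISTANT`), a hypothetical user allowlist, and a stubbed lookup of the current username; it illustrates one option, not a settled design.

```python
from __future__ import annotations

# Hypothetical flag name; off by default for everyone.
FEATURE_FLAGS = {"LLM_ASSISTANT": False}

# Hypothetical allowlist of users who have opted in.
LLM_ASSISTANT_USERS = {"alice", "bob"}


def resolve_llm_flag(flags: dict[str, bool], username: str | None) -> dict[str, bool]:
    """Return a copy of `flags` with LLM_ASSISTANT enabled for opted-in users."""
    resolved = dict(flags)
    if username in LLM_ASSISTANT_USERS:
        resolved["LLM_ASSISTANT"] = True
    return resolved


def _current_username() -> str | None:
    # Stub: inside Superset this would read the logged-in user from the Flask
    # request context. Returning None keeps the sketch runnable standalone.
    return None


def get_feature_flags(flags: dict[str, bool]) -> dict[str, bool]:
    return resolve_llm_flag(flags, _current_username())


# Superset calls this hook with the merged feature-flag dictionary.
GET_FEATURE_FLAGS_FUNC = get_feature_flags
```

With this in place, backend and frontend code could check `LLM_ASSISTANT` the same way they check other Superset feature flags.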
## New dependencies
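1. **LLM Libraries and Models**:
   - We will integrate with existing open-weight LLMs available through
     Hugging Face, such as Meta-Llama-3-8B.
   - Ensure that every library and model used is license-compatible with the
     Apache License v2.0 (for example, Meta-Llama-3-8B is distributed under the
     Meta Llama 3 Community License rather than Apache-2.0).
2. **Other Dependencies**:
   - Additional Python packages for natural language processing and machine
     learning (e.g., NLTK, spaCy).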
## Migration Plan and Compatibility
1. **Database Migrations**:
   - No database migrations are required for this feature.
2. **Compatibility**:
   - Ensure that existing dashboards and charts continue to function without
     any changes.
   - Provide a seamless upgrade path with clear documentation on enabling
     and using the new feature.
3. **Deprecation Strategy**:
   - Allow the new feature to coexist with existing NL-to-SQL conversion
     models during a deprecation period.
   - Provide clear documentation and migration guides for users
     transitioning to the new system.
## Rejected Alternatives
1. **Enhancing Existing NL-to-SQL Models**:
   - While enhancing existing NL-to-SQL models could improve accuracy, it
     would require significant effort in model training and fine-tuning. The
     LLM-based approach offers a more flexible and scalable solution.
2. **Rule-Based Systems**:
   - Rule-based systems lack the flexibility to handle the wide variety of
     natural language queries effectively. LLMs provide a more robust solution
     by understanding context and intent.
By integrating an LLM-based agentic architecture into Superset, we can
significantly enhance the user experience with advanced natural language
processing capabilities, making it easier for users to interact with their data
and generate insights.
---
This SIP is now open for discussion. Please subscribe and provide your
feedback here.