Hello everyone,

Since I first opened the thread discussing the MCP server, I've been thinking 
for a long while about how we can practically bring AI-assisted debugging and 
operational insights directly into Airflow. As orchestration environments grow 
more complex, the cost of troubleshooting (navigating across Dag code, task 
instances, scheduler logs, and configurations) translates directly into lost 
on-call time and delayed pipelines.

Today, organizations that want AI-assisted debugging are forced to build 
custom, ad-hoc integrations or rely on external paid solutions. This leads to 
fragmented user experiences, duplicated effort, and most critically, 
inconsistent security controls that risk exposing sensitive metadata or 
bypassing Airflow's native Role-Based Access Control (RBAC).

I think we can do better, so I am proposing AIP-101: Airflow AI Assistant - 
Phase 1 (Read-only assistance). This AIP introduces an official, opt-in plugin 
that provides a conversational UI directly within Airflow to answer users' 
questions about their instances, explain errors, and help troubleshoot failures.

To ensure this is done safely and securely, Phase 1 is strictly read-only. The 
assistant does not modify Airflow state, nor does it operate autonomously. 
Instead, it relies on the newly proposed Airflow MCP Server (AIP-91) as its 
data-retrieval engine. By leveraging the MCP standard, the assistant guarantees 
that its answers are grounded in live system state while strictly enforcing the 
authenticated user's RBAC permissions so the AI never accesses data the user 
cannot see.

tl;dr of the proposed implementation:

Packaging: Delivered as an opt-in, standalone plugin package within the 
apache/airflow monorepo (with an independent release cycle).
Frontend: A conversational UI embedded directly in the Airflow web interface.
Backend: A FastAPI-based plugin backend utilizing pydantic-ai to safely 
orchestrate external LLM calls.
Data Access: Relies entirely on the Airflow MCP Server (AIP-91) to fetch 
read-only state.
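
To make the data-access boundary concrete, here is a minimal sketch of the 
idea in plain Python. It is not code from AIP-101 or AIP-91: the class names, 
tool names, and permission strings are all hypothetical, and a simple in-memory 
gateway stands in for the MCP server. It only illustrates that every tool is 
read-only and that each call is checked against the authenticated user's 
permissions before any data is returned.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical model: one read-only tool guarded by one permission string.
# Airflow's real RBAC model and the real MCP tool schema are richer than this.
@dataclass(frozen=True)
class ReadOnlyTool:
    name: str
    required_permission: str
    fetch: Callable[[], str]  # returns data, never mutates state


class PermissionDenied(Exception):
    """Raised when the user lacks the permission a tool requires."""


class ReadOnlyGateway:
    """Illustrative stand-in for the MCP-backed data access layer."""

    def __init__(self, tools: list[ReadOnlyTool]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def visible_tools(self, user_permissions: set[str]) -> list[str]:
        # The assistant is only ever offered tools the user may call.
        return sorted(
            name
            for name, tool in self._tools.items()
            if tool.required_permission in user_permissions
        )

    def call(self, name: str, user_permissions: set[str]) -> str:
        # Re-check at call time so a hallucinated tool name cannot bypass RBAC.
        tool = self._tools.get(name)
        if tool is None or tool.required_permission not in user_permissions:
            raise PermissionDenied(name)
        return tool.fetch()


gateway = ReadOnlyGateway(
    [
        ReadOnlyTool("get_task_log", "can_read_task_logs", lambda: "log (placeholder)"),
        ReadOnlyTool("get_config", "can_read_config", lambda: "config (placeholder)"),
    ]
)

viewer_permissions = {"can_read_task_logs"}  # hypothetical permission set
print(gateway.visible_tools(viewer_permissions))
```

In this sketch the permission check happens twice, once when listing tools and 
once when calling them, so the model never sees or reaches data outside the 
user's own view.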

---

The assistant is tightly coupled to the secure tool-calling execution provided 
by the MCP server, which is covered in a separate AIP (AIP-91). Please also 
follow the ongoing discussion thread and AIP-91 itself:
https://lists.apache.org/thread/xgd66v6s7zf0xkvy3c7ysqvn4csgmw06
https://cwiki.apache.org/confluence/x/G4q3FQ

---

AIP-101 is available here:
https://cwiki.apache.org/confluence/x/8Ic8G

A quick warning before you read: the AIP is quite long! (sorry Jarek)
Because integrating AI into an orchestrator opens up a lot of potential 
pitfalls, [ChatGPT and] I tried to be extremely thorough in covering all the 
possible stuff that could go wrong :)
If you find a specific section to be overly detailed or repetitive, please 
comment in the AIP and I'll try to handle it.

I've managed to build a very initial POC; screenshots are available in this 
section:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620144#AIP101AirflowAIAssistantPhase1(Readonlyassistance)-BehavioralModel

I would love to hear your thoughts. Please comment on the AIP and/or reply to 
this thread.

Thank you,

Shahar
