GitHub user ertancelik created a discussion: [Idea] LLM-powered smart retry operator for Airflow 3.x
## Problem

Airflow's current retry mechanism is static: it waits the same fixed interval and retries blindly regardless of the error type. This causes:

- Rate-limit errors being retried too fast (hitting the same error again)
- Auth errors being retried pointlessly (they will never succeed)
- Network errors not being retried fast enough

## Proposed Solution

An operator that uses a local LLM (via Ollama) to analyze the error log and make an intelligent retry decision:

- **Should we retry at all?** (auth errors → no, fail fast)
- **How long should we wait?** (rate limits → 60s, network → 0s)
- **What type of error is this?** (rate_limit / network / auth / data_schema)

## Implementation

I built a working prototype as a standalone provider package:

👉 https://github.com/ertancelik/airflow-provider-smart-retry

It works with Airflow 3.x and Ollama (a local LLM, so no API key is needed).

## Example

```python
from smart_retry.operator import LLMSmartRetryOperator

smart_task = LLMSmartRetryOperator(
    task_id="my_task",
    task_callable=my_function,
    ollama_base_url="http://localhost:11434",
    model="llama3.1:8b",
    max_retries=3,
)
```

Would love to hear community feedback: is this something worth contributing to core, or as an official provider?

GitHub link: https://github.com/apache/airflow/discussions/64334

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
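To make the proposed decision flow concrete, here is a minimal sketch of how the three questions (retry? how long? what type?) could map onto an LLM classification step. All names here (`POLICY`, `build_prompt`, `decide`, `RetryDecision`) are hypothetical illustrations, not the actual `smart_retry` API; the LLM response is assumed to arrive as a small JSON verdict.

```python
import json
from dataclasses import dataclass

# Static policy per error class: (should_retry, wait_seconds).
# Mirrors the examples in the post: auth -> fail fast, rate limits -> back off.
POLICY = {
    "rate_limit": (True, 60),
    "network": (True, 0),
    "auth": (False, 0),
    "data_schema": (False, 0),
}


@dataclass
class RetryDecision:
    error_type: str
    should_retry: bool
    wait_seconds: int


def build_prompt(error_log: str) -> str:
    """Prompt asking the LLM to classify the error and answer in JSON."""
    return (
        "Classify this Airflow task error as one of: "
        "rate_limit, network, auth, data_schema.\n"
        'Reply with JSON like {"error_type": "..."}.\n\n'
        f"Error log:\n{error_log}"
    )


def decide(llm_response: str) -> RetryDecision:
    """Map the LLM's JSON verdict to a retry decision.

    Falls back to a cautious single retry with a short wait if the
    response cannot be parsed or names an unknown error class.
    """
    try:
        error_type = json.loads(llm_response)["error_type"]
        should_retry, wait = POLICY[error_type]
    except (json.JSONDecodeError, KeyError):
        error_type, should_retry, wait = "unknown", True, 30
    return RetryDecision(error_type, should_retry, wait)
```

In the prototype this verdict would presumably come from the local Ollama server; keeping the policy table outside the LLM means the model only classifies, while the retry/wait behavior stays deterministic and auditable.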
