GitHub user ertancelik created a discussion: [Idea] LLM-powered smart retry operator for Airflow 3.x
## Problem

Airflow's current retry mechanism is static: it waits the same fixed interval and retries blindly regardless of the error type. This causes:

- Rate-limit errors being retried too fast (hitting the same error again)
- Auth errors being retried pointlessly (they will never succeed)
- Network errors not being retried fast enough

## Proposed Solution

An operator that uses a local LLM (via Ollama) to analyze the error log and make an intelligent retry decision:

- **Should we retry at all?** (auth errors → no, fail fast)
- **How long should we wait?** (rate limits → 60s, network → 0s)
- **What type of error is this?** (rate_limit / network / auth / data_schema)

## Implementation

I built a working prototype as a standalone provider package:

👉 https://github.com/ertancelik/airflow-provider-smart-retry

It works with Airflow 3.x and Ollama (a local LLM, so no API key is needed).

## Example

```python
from smart_retry.operator import LLMSmartRetryOperator

smart_task = LLMSmartRetryOperator(
    task_id="my_task",
    task_callable=my_function,
    ollama_base_url="http://localhost:11434",
    model="llama3.1:8b",
    max_retries=3,
)
```

Would love to hear community feedback: is this something worth contributing to core, or as an official provider?

GitHub link: https://github.com/apache/airflow/discussions/64334

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
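To make the proposed decision flow concrete, here is a minimal sketch of how the three questions (retry? how long? what type?) could map onto an LLM classification step. All names here (`POLICY`, `build_prompt`, `decide`, `RetryDecision`) are hypothetical illustrations, not the actual `smart_retry` API; the LLM response is assumed to arrive as a small JSON verdict.

```python
import json
from dataclasses import dataclass

# Static policy per error class: (should_retry, wait_seconds).
# Mirrors the examples in the post: auth -> fail fast, rate limits -> back off.
POLICY = {
    "rate_limit": (True, 60),
    "network": (True, 0),
    "auth": (False, 0),
    "data_schema": (False, 0),
}


@dataclass
class RetryDecision:
    error_type: str
    should_retry: bool
    wait_seconds: int


def build_prompt(error_log: str) -> str:
    """Prompt asking the LLM to classify the error and answer in JSON."""
    return (
        "Classify this Airflow task error as one of: "
        "rate_limit, network, auth, data_schema.\n"
        'Reply with JSON like {"error_type": "..."}.\n\n'
        f"Error log:\n{error_log}"
    )


def decide(llm_response: str) -> RetryDecision:
    """Map the LLM's JSON verdict to a retry decision.

    Falls back to a cautious single retry with a short wait if the
    response cannot be parsed or names an unknown error class.
    """
    try:
        error_type = json.loads(llm_response)["error_type"]
        should_retry, wait = POLICY[error_type]
    except (json.JSONDecodeError, KeyError):
        error_type, should_retry, wait = "unknown", True, 30
    return RetryDecision(error_type, should_retry, wait)
```

In the prototype this verdict would presumably come from the local Ollama server; keeping the policy table outside the LLM means the model only classifies, while the retry/wait behavior stays deterministic and auditable.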
