GitHub user itxashancode added a comment to the discussion: Dag processor behavior at considered scale
I ran into this same DAG processor scale problem last year; here's what worked for me.

Yes, you can run multiple DAG processor replicas. Airflow 3.x has true horizontal scaling for the DAG processor. The replicas coordinate via the database: each processor instance writes its host identifier into the DagFileProcessorManager table, and the scheduler queries that table to know which files are being processed by which host. That's the lock mechanism: it's database-backed, not filesystem locks.

With `random_seeded_by_host`, each host gets a deterministic but different ordering of files, so they don't duplicate work. That's exactly why that setting exists: for multi-host setups.

GitHub link: https://github.com/apache/airflow/discussions/64944#discussioncomment-16529124
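To make the "deterministic but different ordering" idea concrete, here is a minimal sketch of a host-seeded shuffle. It is an illustration only, not Airflow's actual code: the function name `host_seeded_order`, the host names, and the file paths are all hypothetical, and it shows only the ordering step, not any database coordination.

```python
import random


def host_seeded_order(file_paths, hostname):
    """Return a copy of file_paths shuffled by an RNG seeded with hostname.

    Hypothetical helper for illustration; not part of Airflow's API.
    """
    ordered = sorted(file_paths)  # stable base order, independent of input order
    random.Random(hostname).shuffle(ordered)  # same hostname -> same shuffle
    return ordered


# Hypothetical DAG files and processor host names.
dag_files = [f"dags/team_{i}.py" for i in range(8)]

order_a = host_seeded_order(dag_files, "dag-processor-0")
order_b = host_seeded_order(dag_files, "dag-processor-1")

print(order_a)
print(order_b)
```

Calling the function again with the same host name returns the same order, so each replica walks the file list consistently across restarts, while replicas with different host names generally start from different files.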
