GitHub user itxashancode added a comment to the discussion: Dag processor 
behavior at considered scale

I ran into this same DAG processor scale problem last year - here's what worked 
for me.

Yes, you can run multiple DAG processor replicas. Airflow 3.x has true 
horizontal scaling for the DAG processor. They coordinate via the database: 
each processor instance writes its host identifier into the 
DagFileProcessorManager table, and the scheduler queries that to know which 
files are being processed by which host. That's the lock mechanism - it's 
database-backed, not filesystem locks.

With `random_seeded_by_host`, each host gets a deterministic but different 
ordering of files, so at any given moment the hosts are usually working on 
different files instead of all starting on the same ones. That's exactly why 
that setting exists - for multi-host setups.
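
To make the "deterministic but different" part concrete, here's a minimal 
sketch of the idea: shuffle the file list with a random generator seeded by 
the hostname. This is an illustration, not Airflow's actual code; the 
`parse_order` helper, the hostnames, and the file names are all made up for 
the example.

```python
import random


def parse_order(file_paths, hostname):
    """Return file_paths shuffled by an RNG seeded with the hostname.

    Hypothetical helper for illustration only: the same hostname always
    yields the same order, and different hostnames get independent shuffles.
    """
    ordered = sorted(file_paths)  # start from a stable base order
    random.Random(hostname).shuffle(ordered)  # seed depends only on the host
    return ordered


# Hypothetical DAG files and processor hostnames.
dag_files = [f"dags/team_{i}.py" for i in range(8)]
order_a = parse_order(dag_files, "dag-processor-0")
order_b = parse_order(dag_files, "dag-processor-1")

print(order_a)
print(order_b)
```

Every host still walks the full list, so this spreads the hosts across 
different files at any given time but doesn't partition the files between 
them. The mode itself is selected with the `file_parsing_sort_mode` option; 
check the configuration reference for your Airflow version to see which 
section it lives in.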


GitHub link: 
https://github.com/apache/airflow/discussions/64944#discussioncomment-16529124

