zhuqi-lucas opened a new issue, #21915:
URL: https://github.com/apache/datafusion/issues/21915

   ## Background
   
   #21828 implements OFFSET pushdown for single-file parquet queries. 
Multi-file queries still use `GlobalLimitExec` for offset handling.
   
   ## Problem
   
   For multi-file queries like `SELECT * FROM directory/ LIMIT 5 OFFSET 
1000000`, the offset is handled by `GlobalLimitExec` which reads all rows then 
discards the first 1M. With multiple files, we could skip entire files whose 
cumulative row count falls within the offset.
   
   ## Challenge
   
   File read order is non-deterministic with `target_partitions > 1` and 
dynamic scheduling (#21351). A shared counter (`Arc<AtomicUsize>`) across file 
openers could work for single-partition sequential reads, but multi-partition 
ordering is undefined.
   
   ## Proposed approach
   
   1. Single partition (`preserve_order=true`): files read in deterministic 
order → shared counter tracks consumed offset across files → skip entire files 
+ RGs
   2. Multi-partition: keep `GlobalLimitExec` (order undefined without ORDER BY)
   3. Use file-level statistics (`PartitionedFile.statistics.num_rows`) to skip 
entire files before opening
   
   ## Related
   
   - #21828 — Single-file OFFSET pushdown (parent PR)
   - #19654 — Original issue for OFFSET performance


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to