zhuqi-lucas opened a new issue, #21915: URL: https://github.com/apache/datafusion/issues/21915
## Background

#21828 implements OFFSET pushdown for single-file parquet queries. Multi-file queries still use `GlobalLimitExec` for offset handling.

## Problem

For multi-file queries like `SELECT * FROM directory/ LIMIT 5 OFFSET 1000000`, the offset is handled by `GlobalLimitExec`, which reads all rows and then discards the first 1M. With multiple files, we could skip entire files whose cumulative row count falls within the offset.

## Challenge

File read order is non-deterministic with `target_partitions > 1` and dynamic scheduling (#21351). A shared counter (`Arc<AtomicUsize>`) across file openers could work for single-partition sequential reads, but multi-partition ordering is undefined.

## Proposed approach

1. Single partition (`preserve_order=true`): files are read in deterministic order → a shared counter tracks the consumed offset across files → skip entire files and row groups (RGs)
2. Multi-partition: keep `GlobalLimitExec` (order is undefined without ORDER BY)
3. Use file-level statistics (`PartitionedFile.statistics.num_rows`) to skip entire files before opening them

## Related

- #21828: Single-file OFFSET pushdown (parent PR)
- #19654: Original issue for OFFSET performance
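
To make the single-partition idea concrete, here is a minimal sketch of the file-level skipping step. It is not DataFusion code: `FileEntry`, `OffsetPlan`, and `plan_offset_skip` are hypothetical names, and `num_rows: Option<usize>` stands in for the row count carried in `PartitionedFile` statistics. The sketch assumes files are read in a fixed order, that a `Some` count is exact, and that no filter changes how many rows a file yields. A `None` count models a missing or inexact statistic.

```rust
/// Hypothetical stand-in for a file in the scan list.
/// `num_rows` is `Some` only when an exact row count is known.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct FileEntry {
    path: String,
    num_rows: Option<usize>,
}

/// Hypothetical planning result: where reading must start, and how many
/// rows of the offset are still left to skip from that point on.
#[derive(Debug, PartialEq)]
struct OffsetPlan {
    /// Index of the first file that must be opened (`files.len()` if none).
    first_file: usize,
    /// Offset rows not covered by the skipped files.
    remaining_offset: usize,
}

/// Walk the files in read order and skip each one whose whole row count
/// still fits inside the remaining offset.
fn plan_offset_skip(files: &[FileEntry], offset: usize) -> OffsetPlan {
    let mut remaining = offset;
    for (idx, file) in files.iter().enumerate() {
        match file.num_rows {
            // The whole file lies inside the offset: skip it without opening.
            Some(n) if n <= remaining => remaining -= n,
            // The file straddles the offset boundary, or its row count is
            // unknown: it has to be opened, so stop skipping here.
            _ => {
                return OffsetPlan {
                    first_file: idx,
                    remaining_offset: remaining,
                }
            }
        }
    }
    // Every file was consumed by the offset; nothing needs to be read.
    OffsetPlan {
        first_file: files.len(),
        remaining_offset: remaining,
    }
}
```

With three files of 400,000 rows each and `OFFSET 1000000`, this plan skips the first two files and leaves 200,000 rows to skip inside the third, which is where row-group-level skipping or a runtime counter would take over. Stopping at the first unknown count is a conservative choice: once a file has to be opened to learn its size, later files cannot be skipped from statistics alone.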
