zhuqi-lucas commented on PR #21828: URL: https://github.com/apache/datafusion/pull/21828#issuecomment-4349287217
Thanks @alamb for testing! You found the key issue: the optimization doesn't trigger because the single `hits.parquet` file is split into 16 file groups (byte-range partitioning driven by `target_partitions`). My `total_files > 1` guard treats this as "multi-file" and falls back to `GlobalLimitExec`.

The fix is to distinguish between "multiple physical files" (non-deterministic order) and "a single file split into byte-range partitions" (deterministic order within the file). For the latter, offset pushdown is safe because row group (RG) order within a single file is deterministic regardless of how the file is partitioned.

Will fix and push shortly. The guard needs to count unique file paths across the file groups rather than the total partition count.
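For illustration, here is a rough sketch of the kind of check I have in mind. It is not the actual patch: `FileSlice`, `distinct_file_count`, and `offset_pushdown_is_safe` are hypothetical stand-ins rather than DataFusion types or functions, and treating exactly one distinct path as the "safe" case is my assumption about how the guard will end up.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one scan unit: a file path plus an optional
/// byte range. This is NOT a DataFusion type; it only models the idea.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct FileSlice {
    path: String,
    range: Option<(u64, u64)>,
}

/// Counts distinct physical files across all file groups, so that one file
/// split into many byte-range partitions still counts as a single file.
fn distinct_file_count(file_groups: &[Vec<FileSlice>]) -> usize {
    file_groups
        .iter()
        .flatten()
        .map(|slice| slice.path.as_str())
        .collect::<HashSet<_>>()
        .len()
}

/// Assumed guard: offset pushdown is considered safe only when every
/// partition reads from the same physical file.
fn offset_pushdown_is_safe(file_groups: &[Vec<FileSlice>]) -> bool {
    distinct_file_count(file_groups) == 1
}
```

With this shape, 16 file groups that all point at `hits.parquet` count as one file and pass the guard, while two groups reading `a.parquet` and `b.parquet` still fall back.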
