zhuqi-lucas commented on PR #21828:
URL: https://github.com/apache/datafusion/pull/21828#issuecomment-4349287217

   Thanks @alamb for testing! You found the key issue: the optimization 
doesn't trigger because the single `hits.parquet` file is split into 16 file 
groups (byte-range partitioning driven by `target_partitions`). My 
`total_files > 1` guard counts each of those groups as a separate file, treats 
the scan as "multi-file", and falls back to `GlobalLimitExec`.
   
   The fix: distinguish between "multiple physical files" (non-deterministic 
order across files) and "a single file split into byte-range partitions" 
(deterministic order within the file). For the latter, offset pushdown is safe 
because row group (RG) order within a single file is deterministic regardless 
of how the file is partitioned.
   
   Will fix and push shortly. The guard needs to count unique file paths 
across the file groups rather than the total partition count.
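
   A minimal sketch of the unique-path check described above. It is not the 
actual patch: `FileSlice`, `unique_file_count`, `offset_pushdown_is_safe`, and 
`split_by_byte_range` are hypothetical stand-ins for illustration and are not 
DataFusion types or functions.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one scan unit: a file path plus an optional
/// byte range (not a DataFusion type).
#[derive(Debug, Clone)]
struct FileSlice {
    path: String,
    range: Option<(u64, u64)>,
}

/// Counts distinct physical files across all file groups, ignoring how many
/// byte-range slices each file was split into.
fn unique_file_count(file_groups: &[Vec<FileSlice>]) -> usize {
    file_groups
        .iter()
        .flatten()
        .map(|slice| slice.path.as_str())
        .collect::<HashSet<_>>()
        .len()
}

/// Assumed guard: allow offset pushdown only when every slice comes from the
/// same physical file. An empty scan is conservatively rejected.
fn offset_pushdown_is_safe(file_groups: &[Vec<FileSlice>]) -> bool {
    unique_file_count(file_groups) == 1
}

/// Illustrative helper: splits one file into `n` byte-range slices, one slice
/// per file group, mimicking partitioning by a target partition count.
fn split_by_byte_range(path: &str, file_size: u64, n: u64) -> Vec<Vec<FileSlice>> {
    let chunk = (file_size + n - 1) / n; // ceiling division
    (0..n)
        .map(|i| {
            let start = (i * chunk).min(file_size);
            let end = ((i + 1) * chunk).min(file_size);
            vec![FileSlice {
                path: path.to_string(),
                range: Some((start, end)),
            }]
        })
        .collect()
}
```

   With this shape, a single `hits.parquet` split into 16 groups still counts 
as one file, while adding a slice from a second path makes the guard fall back.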


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]