zhuqi-lucas commented on PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#issuecomment-4181485812

   Strange: I tested locally (release build, with `--partitions 12` and `--partitions 16`) and found:
   
   1. **Plans are identical** between main and PR for all 4 queries (`SortPreservingMergeExec` → `DataSourceExec`, no `SortExec` in either case)
   2. **Performance is identical** after multiple warm iterations:
   
   ```
   --partitions 12, release, 5 iterations:
   Q1: main 112ms vs PR 108ms (~same)
   Q2: main 2.6ms vs PR 2.4ms (~same)
   Q3: main 300ms vs PR 299ms (~same)
   Q4: main 6.3ms vs PR 6.0ms (~same)
   ```
   
   The GKE benchmark runs main and PR on different instances (different machine names in the bot output), which could explain the consistent per-run difference. The optimization in this PR doesn't trigger in this scenario because `EnforceSorting` already eliminates `SortExec` once byte-range splitting has created single-file groups.
   
   The optimization triggers when a partition has **multiple files in the wrong order** (e.g., `--partitions 1` or `split_file_groups_by_statistics=true`).
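
A minimal sketch of that trigger condition, assuming each file carries min/max statistics for the sort column. `FileRange`, `group_needs_reordering`, and `reorder_by_min` are hypothetical names used for illustration only; they are not DataFusion APIs, and the actual check in the PR may differ.

```rust
/// Hypothetical per-file min/max statistics for the sort column.
/// This is an illustrative type, not one of DataFusion's real structs.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FileRange {
    min: i64,
    max: i64,
}

/// Returns true when a file group holds several files and some adjacent
/// pair is out of order or overlapping for an ascending sort.
fn group_needs_reordering(group: &[FileRange]) -> bool {
    // A single-file (or empty) group has nothing to reorder, which mirrors
    // the single-file groups produced by byte-range splitting.
    if group.len() < 2 {
        return false;
    }
    // The listed order is fine when every file ends before the next begins.
    group.windows(2).any(|pair| pair[0].max > pair[1].min)
}

/// Sorts the files by their minimum value and reports whether the result is
/// ascending and non-overlapping, i.e. whether reordering alone is enough
/// to make the group's output sorted.
fn reorder_by_min(group: &mut [FileRange]) -> bool {
    group.sort_by_key(|f| f.min);
    !group_needs_reordering(group)
}
```

Under these assumptions, a group listed as `[20..29, 0..9, 10..19]` needs reordering and becomes sorted after `reorder_by_min`, while a single-file group never does.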


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

