Shekharrajak opened a new issue, #16430:
URL: https://github.com/apache/iceberg/issues/16430

   ### Feature Request / Improvement
   
   Iceberg records a sort_order_id for each data file in its manifests, but 
SparkBatchQueryScan never implements SupportsReportOrdering, so 
BatchScanExec.outputOrdering always returns Nil.
   
   We can implement SupportsReportOrdering in SparkBatchQueryScan and return the 
table's current SortOrder (converted via SortOrderToSpark) when all planned 
FileScanTasks share the same non-zero sort_order_id.
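
The "all planned files share the same non-zero sort_order_id" check could look roughly like the sketch below. It is a self-contained illustration, not Iceberg code: the class and method names are hypothetical, the per-file IDs are modeled as a plain list of nullable integers, and converting the resulting order to Spark's ordering type is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch: decides whether a scan could report an ordering,
// given the sort order IDs recorded for the planned data files.
public class ReportedOrderingSketch {

    // Iceberg reserves sort order ID 0 for the unsorted order.
    static final int UNSORTED_ORDER_ID = 0;

    // Returns the shared sort order ID, or empty when the files disagree,
    // any file is unsorted or has no recorded ID, or there are no files.
    static OptionalInt commonSortOrderId(List<Integer> fileSortOrderIds) {
        if (fileSortOrderIds.isEmpty()) {
            return OptionalInt.empty();
        }
        Integer first = fileSortOrderIds.get(0);
        if (first == null || first == UNSORTED_ORDER_ID) {
            return OptionalInt.empty();
        }
        for (Integer id : fileSortOrderIds) {
            if (id == null || !id.equals(first)) {
                return OptionalInt.empty();
            }
        }
        return OptionalInt.of(first);
    }

    public static void main(String[] args) {
        // All files written with sort order 1: an ordering could be reported.
        System.out.println(commonSortOrderId(Arrays.asList(1, 1, 1)).isPresent());
        // One unsorted file (ID 0) disables reporting for the whole scan.
        System.out.println(commonSortOrderId(Arrays.asList(1, 0, 1)).isPresent());
    }
}
```

An empty result would map to reporting no ordering, which is the behavior today. A real implementation would presumably also need to confirm that the shared ID matches the sort order it reports; that check is an assumption here and is not part of the sketch.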
   
   This would eliminate the pre-sort in sort-merge joins, ordered 
aggregations, and MOR compaction reads when the table has a defined sort order 
and all files are sorted consistently.
   
   ```
   CREATE TABLE db.events (user_id BIGINT, event_time TIMESTAMP)
   USING iceberg;
   
   -- WRITE ORDERED BY is an ALTER TABLE clause from the Iceberg Spark SQL extensions
   ALTER TABLE db.events WRITE ORDERED BY event_time;
   
   INSERT INTO db.events SELECT * FROM source;
   
   EXPLAIN SELECT * FROM db.events ORDER BY event_time;
   -- Today:  Sort[event_time] → BatchScanExec (outputOrdering=Nil)
   -- After:  BatchScanExec (outputOrdering=[event_time ASC]), Sort eliminated
   ```
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [x] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

