Shekharrajak opened a new pull request, #16454:
URL: https://github.com/apache/iceberg/pull/16454
Ref #16430
Iceberg stores `sort_order_id` per data file in the manifest, but the Spark
scan never advertises this to the query planner. As a result, Spark inserts a
redundant `Sort` node above an Iceberg scan whenever a downstream operator
requires that ordering, even when every file in the snapshot was written sorted.
The **write side** has long produced sorted files via
`RequiresDistributionAndOrdering` (#2165, #3720, #7637) and tags each file with
its `sort_order_id` (#15150, #15832, #16308). The **read side** never closes
the loop; this PR fixes that.
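To make the per-file tagging concrete, here is a minimal sketch of how "every file in the snapshot was written sorted" could be checked from the manifest's `sort_order_id` values. This is an illustration only, not something this PR is stated to do: `SortOrderIdSketch` and `allFilesWrittenWith` are hypothetical names, and the file ids are modelled as a plain list of nullable integers rather than Iceberg's real manifest classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of per-file sort_order_id values; not Iceberg's manifest API.
final class SortOrderIdSketch {

  /**
   * Returns true only if at least one file exists and every file carries the
   * expected sort order id. A missing id (null) counts as "not sorted", and an
   * empty file list is conservatively treated as "cannot claim an ordering".
   */
  static boolean allFilesWrittenWith(int expectedOrderId, List<Integer> fileSortOrderIds) {
    if (fileSortOrderIds.isEmpty()) {
      return false;
    }
    for (Integer id : fileSortOrderIds) {
      if (id == null || id != expectedOrderId) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Every file was written with sort order 1.
    System.out.println(allFilesWrittenWith(1, Arrays.asList(1, 1, 1)));    // prints "true"
    // One file was written unsorted (id 0), so the snapshot is not uniformly sorted.
    System.out.println(allFilesWrittenWith(1, Arrays.asList(1, 0, 1)));    // prints "false"
    // One file has no recorded id at all.
    System.out.println(allFilesWrittenWith(1, Arrays.asList(1, null, 1))); // prints "false"
  }
}
```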
`SparkPartitioningAwareScan` now implements `SupportsReportOrdering` (Spark
3.4+ API). `outputOrdering()` returns the table's current `SortOrder`
(converted via `SortOrderToSpark`).
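The effect on planning can be sketched with a small self-contained model. It assumes a simplified rule, that a `Sort` is unnecessary when the required sort keys are a prefix of the ordering the scan reports. `OrderingSketch`, `SortKey`, `ReportsOrdering`, `scanWithOrdering`, and `needsSort` are hypothetical stand-ins for illustration, not the Spark or Iceberg classes, and null ordering is left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; NOT the real Spark or Iceberg classes.
final class OrderingSketch {

  /** One sort key: a column name plus a direction. */
  static final class SortKey {
    final String column;
    final boolean ascending;

    SortKey(String column, boolean ascending) {
      this.column = column;
      this.ascending = ascending;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SortKey)) {
        return false;
      }
      SortKey other = (SortKey) o;
      return column.equals(other.column) && ascending == other.ascending;
    }

    @Override
    public int hashCode() {
      return column.hashCode() * 31 + Boolean.hashCode(ascending);
    }
  }

  /** Stand-in for the role an ordering-reporting scan plays toward the planner. */
  interface ReportsOrdering {
    List<SortKey> outputOrdering();
  }

  /** Builds a scan that reports the given keys; no keys means "unordered". */
  static ReportsOrdering scanWithOrdering(SortKey... keys) {
    List<SortKey> reported = Arrays.asList(keys);
    return () -> reported;
  }

  /** True when the required keys are a prefix of the reported keys. */
  static boolean satisfies(List<SortKey> reported, List<SortKey> required) {
    if (required.size() > reported.size()) {
      return false;
    }
    return reported.subList(0, required.size()).equals(required);
  }

  /** A Sort node is needed only when the reported ordering falls short. */
  static boolean needsSort(ReportsOrdering scan, List<SortKey> required) {
    return !satisfies(scan.outputOrdering(), required);
  }

  public static void main(String[] args) {
    SortKey byEventTime = new SortKey("event_time", true);
    List<SortKey> required = Arrays.asList(byEventTime);

    ReportsOrdering silentScan = scanWithOrdering();              // reports nothing
    ReportsOrdering reportingScan = scanWithOrdering(byEventTime); // reports its order

    System.out.println("silent scan needs Sort: " + needsSort(silentScan, required));       // true
    System.out.println("reporting scan needs Sort: " + needsSort(reportingScan, required)); // false
  }
}
```

In this model the scan that reports nothing always forces a sort, which mirrors the situation described above where the file-level ordering exists but the planner never hears about it.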
Example:
```
CREATE TABLE events (user_id BIGINT, event_time TIMESTAMP) USING iceberg;
ALTER TABLE events WRITE ORDERED BY event_time;
INSERT INTO events SELECT * FROM source;
EXPLAIN SELECT user_id, event_time,
ROW_NUMBER() OVER (ORDER BY event_time) AS rn
FROM events;
```
Before:
```
Window [row_number() OVER (ORDER BY event_time ASC)]
+- Sort [event_time ASC NULLS FIRST], false, 0   ← redundant
   +- Exchange SinglePartition
      +- BatchScan events                        ← sort_order_id=1, ignored
```
After this change:
```
Window [row_number() OVER (ORDER BY event_time ASC)]
+- Exchange SinglePartition
   +- BatchScan events   ← outputOrdering=[event_time ASC] (Sort eliminated)
```