Jeadie commented on PR #23199:
URL: https://github.com/apache/datafusion/pull/23199#issuecomment-4805319367

   ### Worked example
   #### Before `EnsureRequirements` (still valid)
   ```
     SortExec: TopK(fetch=4), expr=[score@1 DESC, a@0 ASC]        out: [a, 
score, value]
       CoalescePartitionsExec                                     out: [a, 
score, value]
         ProjectionExec: [a@0 as a, c@2 as score, b@1 as value]   out: [a, 
score, value]
           SortExec: TopK(fetch=1000), expr=[c@2 DESC]            out: [a, b, 
c, d, e]
             RepartitionExec: RoundRobinBatch                     out: [a, b, 
c, d, e]
               DataSourceExec                                     out: [a, b, 
c, d, e]
   ```
   
     Resolving the sort expressions:
     - Outer SortExec: its input is CoalescePartitionsExec → [a, score, value]. 
So score@1 = index 1 = score ✓.
     - ProjectionExec: its input is the inner sort → [a, b, c, d, e]. So c@2 = 
c ✓; output renames it to score at position 1.
     - Inner SortExec: input [a, b, c, d, e], c@2 = c ✓.
   
   #### After EnsureRequirements, without the fix (invalid)
   
     `parallelize_sorts` replaces `CoalescePartitions` + the global sort with a 
`SortPreservingMergeExec`, and pushes a per-partition `SortExec` below the 
projection; copying `expr=[score@1, a@0]` down unchanged:
   
   ```
     SortPreservingMergeExec: [score@1 DESC, a@0 ASC], fetch=4    out: [a, 
score, value]
       ProjectionExec: [a@0 as a, c@2 as score, b@1 as value]     out: [a, 
score, value]
         SortExec: TopK(fetch=4), expr=[score@1 DESC, a@0 ASC]    out: [a, b, 
c, d, e]   ◀── now BELOW the projection
           SortExec: TopK(fetch=1000), expr=[c@2 DESC]            out: [a, b, 
c, d, e]
             RepartitionExec: RoundRobinBatch                     out: [a, b, 
c, d, e]
               DataSourceExec                                     out: [a, b, 
c, d, e]
   ```
     The relocated `SortExec`'s input is now [a, b, c, d, e] (not [a, score, 
value]). So its `score@1` resolves against the wrong schema:
     - @0 = a ✓
     - @1 = b ✗  there is no score at this level; index 1 is b. The sort 
silently orders by b, not score.
   
   `SanityCheckPlan` rejects:
     - The `SortPreservingMergeExec` (top) requires its input ordered by 
`[score@1, a@0]` over `[a, score, value]`.
     - Its child projection derives its output ordering from the relocated 
sort, which orders by `[b@1, a@0]`. The projection maps `b@1 → value@2`, so it 
can only offer `[value@2, a@0]` upward.
     - Nothing satisfies `score@1`, and therefore raises `does not satisfy 
order requirements: [score@1 DESC NULLS LAST, a@0 ASC]. Child-0 order: []`.
   
   ####  After EnsureRequirements, with the fix (valid)
   The requirement is remapped as it crosses the projection (score@1 → c@2), so 
the relocated sort indexes correctly into [a, b, c, d, e]:
   ```
     SortPreservingMergeExec: [score@1 DESC, a@0 ASC], fetch=4    out: [a, 
score, value]
       ProjectionExec: [a@0 as a, c@2 as score, b@1 as value]     out: [a, 
score, value]
         SortExec: TopK(fetch=4), expr=[c@2 DESC, a@0 ASC]        out: [a, b, 
c, d, e]   ◀── remapped @1 -> @2
           SortExec: TopK(fetch=1000), expr=[c@2 DESC]            out: [a, b, 
c, d, e]
             RepartitionExec: RoundRobinBatch                     out: [a, b, 
c, d, e]
               DataSourceExec                                     out: [a, b, 
c, d, e]
   ```
     Now the relocated sort orders by `[c@2, a@0] = [c, a]`; the projection 
maps `c@2 → score@1`, so its output is ordered by `[score@1, a@0]` (what the 
`SortPreservingMergeExec` needs). Valid.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to