Jeadie opened a new pull request, #23199:
URL: https://github.com/apache/datafusion/pull/23199

   ## Which issue does this PR close?
   
   None — filing the fix directly. Happy to open a tracking issue if 
maintainers prefer.
   
   ## Rationale for this change
   
   `EnforceSorting`/`EnsureRequirements` can produce a physically invalid plan 
that `SanityCheckPlan` rejects:
   
   ```
   ... does not satisfy order requirements: [score@1 DESC NULLS LAST, id@0 
ASC]. Child-0 order: []
   ```
   
   **Root cause.** In `sort_pushdown.rs::pushdown_requirement_to_children`, the 
fetch-forwarding branch
   
   ```rust
   } else if plan.fetch().is_some()
       && plan.supports_limit_pushdown()
       && plan.maintains_input_order().into_iter().all(|m| m)
   {
       ... Ok(Some(vec![Some(parent_required)]))   // forwards the requirement 
UNCHANGED
   }
   ```
   
   forwards the parent ordering requirement to the child **unchanged**. A 
`ProjectionExec` reports a `fetch()` (forwarded from its input) and can 
renumber/reorder columns, so the requirement — expressed in the projection's 
**output** schema — is pushed into the **child** schema verbatim. For a 
projection that reorders the sort key (e.g. output `[id, score, value]` over 
input `[id, value, score]`), `score@1` (valid above) is pushed down as 
`score@1` (a different column below). The relocated `SortExec` then advertises 
an ordering its child cannot provide, and `SanityCheckPlan` fails.
   
   This reproduces whenever a fetch-bearing global sort sits above a 
`CoalescePartitionsExec` over a column-reordering, order-preserving 
`ProjectionExec` whose input is already ordered — `parallelize_sorts` sinks the 
per-partition `SortExec` below the projection without remapping the key index.
   
   ## What changes are included in this PR?
   
   Remap the ordering requirement through the projection's column mapping 
before pushing it down (new `remap_requirement_through_projection` helper):
   
   - Each required column at output index `i` is rewritten to the column the 
projection produces at that index (`projection.expr()[i]`).
   - If any required column maps to a **computed** (non-`Column`) expression, 
the ordering cannot be expressed below the projection, so pushdown is declined 
and the sort is kept **above** the projection.
   - Hard/soft-ness and all alternatives of the `OrderingRequirements` are 
preserved; unsatisfiable alternatives are dropped.
   
   For non-reordering projections the remap is the identity, so existing plans 
are unchanged.
   
   ## Are these changes tested?
   
   Yes.
   
   - Unit tests for the remap helper: reordering remap, softness preservation, 
multiple hard alternatives, dropping an unsatisfiable alternative while keeping 
others, and declining when a required column is computed.
   - An end-to-end regression test 
(`test_parallelize_sorts_remaps_index_through_reordering_projection`) builds 
the multi-partition reordering-projection plan, runs `EnsureRequirements` with 
`repartition_sorts` enabled, and asserts the result passes `SanityCheckPlan`. 
It fails without this fix and passes with it.
   
   ## Are there any user-facing changes?
   
   No API changes. Plans that previously failed `SanityCheckPlan` (or produced 
incorrect orderings) for this shape are now valid; all other plans are 
unchanged.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to