kosiew opened a new pull request, #21289:
URL: https://github.com/apache/datafusion/pull/21289

   
   ## Which issue does this PR close?
   
   * Part of #20002
   
   ---
   
   ## Rationale for this change
   
   This PR extracts and formalizes the null-restriction evaluation logic into a dedicated utility module to improve clarity, performance, and reviewability.
   
   Previously, null-restriction determination relied solely on evaluating physical expressions, which is comparatively expensive and hard to reason about in isolation. The logic was also embedded in a broader set of changes, which made it difficult to review independently.
   
   This change introduces a conservative syntactic fast path that determines null-restriction behavior for common predicate shapes without evaluating any expressions. When the fast path cannot reach a definite answer, the check falls back to the authoritative evaluation path, so common predicates skip physical evaluation while correctness is preserved.
   
   The PR also explicitly guards against mixed-reference predicates (predicates that reference columns outside the provided join-column set). Such predicates are conservatively treated as non-restricting to avoid incorrect pruning.
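   The guard can be sketched as a subset check on the columns a predicate references. The toy `Expr` type and the functions `collect_columns`, `is_mixed_reference`, and `is_restrict_null_with_guard` below are hypothetical illustrations, not DataFusion's types or the PR's code. For a predicate such as `t2.b IS NULL OR t1.a = 1` with join columns `{t1.a}`, nulling out `t1.a` leaves a result that still depends on `t2.b` (TRUE if `t2.b` is NULL, NULL otherwise), so the sketch answers "not restricting" without analysing it further.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified predicate tree (not DataFusion's `Expr`).
enum Expr {
    Column(&'static str),
    Literal(i64),
    IsNull(Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Collects every column name referenced anywhere in `expr`.
fn collect_columns(expr: &Expr, out: &mut HashSet<&'static str>) {
    match expr {
        Expr::Column(name) => {
            out.insert(*name);
        }
        Expr::Literal(_) => {}
        Expr::IsNull(inner) => collect_columns(inner, out),
        Expr::Eq(l, r) | Expr::Or(l, r) => {
            collect_columns(l, out);
            collect_columns(r, out);
        }
    }
}

/// Returns `true` if the predicate references any column outside `join_cols`.
fn is_mixed_reference(expr: &Expr, join_cols: &HashSet<&'static str>) -> bool {
    let mut cols = HashSet::new();
    collect_columns(expr, &mut cols);
    !cols.is_subset(join_cols)
}

/// Guard placed in front of any null-restriction analysis (sketch).
fn is_restrict_null_with_guard(
    expr: &Expr,
    join_cols: &HashSet<&'static str>,
    analyze: impl Fn(&Expr) -> bool,
) -> bool {
    if is_mixed_reference(expr, join_cols) {
        // The outcome could depend on columns that are not being set to NULL,
        // so conservatively report "not null-restricting".
        return false;
    }
    analyze(expr)
}
```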
   
   ---
   
   ## What changes are included in this PR?
   
   * Introduced new module `utils/null_restriction.rs` containing:
   
     * A syntactic null-restriction evaluator
     * Conservative pattern matching for common predicate shapes (column references, `IS NULL`, `IS NOT NULL`, comparisons, and `AND`/`OR`/`NOT`)
     * Explicit result semantics via the `SyntacticNullRestriction` enum
   
   * Updated `is_restrict_null_predicate` in `utils.rs` to:
   
     * Return early for mixed-reference predicates (those referencing columns outside the join set)
     * Use the syntactic evaluator as a fast path when possible
     * Fall back to the authoritative physical-expression evaluation when needed
   
   * Improved internal documentation explaining:
   
     * Two-phase evaluation strategy (syntactic + authoritative)
     * Safety guarantees and conservative behavior
   
   * Added focused unit tests:
   
     * Mixed-reference predicate handling (ensuring non-restricting behavior)
     * Parity between syntactic and authoritative evaluators for supported cases
     * Coverage of boolean logic (AND/OR/NOT) and comparison operators
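   The syntactic rules listed above can be illustrated with a small evaluator over SQL three-valued logic in which every join column is bound to NULL. This is a sketch under assumptions: the `Expr`, `Truth`, and `Verdict` types and the function names are hypothetical, and `Verdict` only stands in for the PR's `SyntacticNullRestriction` enum, whose actual variants may differ. The evaluator distinguishes FALSE from NULL (instead of tracking only "restricting or not") because `NOT` turns FALSE into TRUE but leaves NULL as NULL.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified predicate tree (not DataFusion's `Expr`).
enum Expr {
    Column(&'static str),
    Literal(i64),
    IsNull(Box<Expr>),
    IsNotNull(Box<Expr>),
    /// Stands in for any of `=`, `<>`, `<`, `<=`, `>`, `>=`.
    Compare(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

/// SQL three-valued result of a predicate once every join column is NULL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Truth { True, False, Null }

/// Hypothetical verdict type; `Unknown` means "fall back to full evaluation".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict { Restricting, NotRestricting, Unknown }

/// `Some(true)`: certainly NULL; `Some(false)`: certainly non-NULL;
/// `None`: not decidable from syntax alone.
fn operand_is_null(expr: &Expr, null_cols: &HashSet<&str>) -> Option<bool> {
    match expr {
        Expr::Column(name) if null_cols.contains(name) => Some(true),
        Expr::Literal(_) => Some(false),
        _ => None,
    }
}

/// Evaluates `expr` with every column in `null_cols` bound to NULL,
/// returning `None` whenever the result is not syntactically decidable.
fn eval(expr: &Expr, null_cols: &HashSet<&str>) -> Option<Truth> {
    match expr {
        // A boolean join column that is NULL makes the predicate NULL.
        Expr::Column(name) if null_cols.contains(name) => Some(Truth::Null),
        Expr::IsNull(inner) => operand_is_null(inner, null_cols)
            .map(|n| if n { Truth::True } else { Truth::False }),
        Expr::IsNotNull(inner) => operand_is_null(inner, null_cols)
            .map(|n| if n { Truth::False } else { Truth::True }),
        // Any comparison with a NULL operand yields NULL.
        Expr::Compare(l, r) => {
            let null_operand = operand_is_null(l, null_cols) == Some(true)
                || operand_is_null(r, null_cols) == Some(true);
            if null_operand { Some(Truth::Null) } else { None }
        }
        Expr::Not(inner) => eval(inner, null_cols).map(|t| match t {
            Truth::True => Truth::False,
            Truth::False => Truth::True,
            Truth::Null => Truth::Null,
        }),
        Expr::And(l, r) => match (eval(l, null_cols), eval(r, null_cols)) {
            (Some(Truth::False), _) | (_, Some(Truth::False)) => Some(Truth::False),
            (Some(Truth::True), other) | (other, Some(Truth::True)) => other,
            (Some(Truth::Null), Some(Truth::Null)) => Some(Truth::Null),
            _ => None,
        },
        Expr::Or(l, r) => match (eval(l, null_cols), eval(r, null_cols)) {
            (Some(Truth::True), _) | (_, Some(Truth::True)) => Some(Truth::True),
            (Some(Truth::False), other) | (other, Some(Truth::False)) => other,
            (Some(Truth::Null), Some(Truth::Null)) => Some(Truth::Null),
            _ => None,
        },
        // Non-join columns, bare literals, etc. are not decidable here.
        _ => None,
    }
}

/// FALSE or NULL filters the row out, so both count as null-restricting.
fn syntactic_null_restriction(expr: &Expr, null_cols: &HashSet<&str>) -> Verdict {
    match eval(expr, null_cols) {
        Some(Truth::False | Truth::Null) => Verdict::Restricting,
        Some(Truth::True) => Verdict::NotRestricting,
        None => Verdict::Unknown,
    }
}
```

   A definite FALSE on one side of `AND` (or TRUE on one side of `OR`) decides the result even when the other side is undecidable, which lets the sketch answer more predicates without giving up soundness. Every other undecided combination is reported as `Unknown`, so the caller falls back to full evaluation.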
   
   ---
   
   ## Are these changes tested?
   
   Yes.
   
   This PR adds unit tests that:
   
   1. Validate that the syntactic fast path agrees with the authoritative evaluator for supported predicate shapes
   2. Ensure mixed-reference predicates are conservatively treated as non-restricting
   3. Cover key SQL boolean semantics (`AND`, `OR`, `NOT`) and comparison operators
   4. Verify fallback behavior when the syntactic evaluator cannot determine the result
   
   These tests help prevent regressions and document expected behavior.
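   For parity-style tests, a reference evaluator that binds the join columns to NULL and evaluates the whole predicate under SQL three-valued logic can serve as an oracle for a table of expected fast-path verdicts. The sketch below is hypothetical (a toy `Expr` with integer columns, plus `restricts_nulls` and `assert_parity`) and is not the PR's test code; it treats a predicate as null-restricting when it evaluates to FALSE or NULL.

```rust
use std::collections::HashMap;

/// Hypothetical toy predicate tree over integer columns.
enum Expr {
    Col(&'static str),
    Lit(i64),
    IsNull(Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

/// Evaluates an integer operand; `None` represents SQL NULL.
fn eval_int(expr: &Expr, env: &HashMap<&str, Option<i64>>) -> Option<i64> {
    match expr {
        Expr::Col(name) => *env.get(name).expect("column must be bound"),
        Expr::Lit(v) => Some(*v),
        _ => panic!("boolean expression used as an integer operand"),
    }
}

/// Evaluates a predicate under SQL three-valued logic (`None` is NULL).
fn eval_bool(expr: &Expr, env: &HashMap<&str, Option<i64>>) -> Option<bool> {
    match expr {
        Expr::IsNull(inner) => Some(eval_int(inner, env).is_none()),
        Expr::Lt(l, r) => match (eval_int(l, env), eval_int(r, env)) {
            (Some(a), Some(b)) => Some(a < b),
            _ => None, // comparing with NULL yields NULL
        },
        Expr::Not(inner) => eval_bool(inner, env).map(|b| !b),
        Expr::And(l, r) => match (eval_bool(l, env), eval_bool(r, env)) {
            (Some(false), _) | (_, Some(false)) => Some(false),
            (Some(true), Some(true)) => Some(true),
            _ => None,
        },
        Expr::Or(l, r) => match (eval_bool(l, env), eval_bool(r, env)) {
            (Some(true), _) | (_, Some(true)) => Some(true),
            (Some(false), Some(false)) => Some(false),
            _ => None,
        },
        Expr::Col(_) | Expr::Lit(_) => panic!("integer operand used as a predicate"),
    }
}

/// Oracle: bind every join column to NULL and evaluate the full predicate.
/// Unbound columns make `eval_int` panic, so this toy only handles
/// predicates that reference join columns exclusively.
fn restricts_nulls(expr: &Expr, join_cols: &[&'static str]) -> bool {
    let env: HashMap<&str, Option<i64>> = join_cols.iter().map(|c| (*c, None)).collect();
    eval_bool(expr, &env) != Some(true)
}

/// Table-driven parity check: each case pairs a predicate with the verdict a
/// fast path is expected to produce; the oracle must agree with every entry.
fn assert_parity(cases: &[(Expr, bool)], join_cols: &[&'static str]) {
    for (i, (expr, expected)) in cases.iter().enumerate() {
        assert_eq!(restricts_nulls(expr, join_cols), *expected, "case {} disagrees with the oracle", i);
    }
}
```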
   
   ---
   
   ## Are there any user-facing changes?
   
   No.
   
   This change is internal to the optimizer and does not modify public APIs or user-visible behavior. It is intended to improve optimizer performance and maintainability without altering query results.
   
   ---
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
   

