kosiew opened a new pull request, #21289:
URL: https://github.com/apache/datafusion/pull/21289
## Which issue does this PR close?
* Part of #20002
---
## Rationale for this change
This PR extracts and formalizes the null-restriction evaluation logic into a
dedicated utility module to improve clarity, performance, and reviewability.
Previously, null-restriction was determined solely by evaluating physical
expressions, which costs more than inspecting the predicate's structure and
is harder to reason about in isolation. The logic was also embedded in a
broader set of changes, making it difficult to review independently.
This change introduces a conservative syntactic fast path that can determine
null-restriction behavior for common predicate shapes without executing
expressions. This improves optimizer efficiency while maintaining correctness
by falling back to the authoritative evaluation path when needed.
The PR also explicitly guards against mixed-reference predicates (those
referencing columns outside the provided join-column set), ensuring they are
treated conservatively as non-restricting to avoid incorrect pruning.
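
The mixed-reference guard can be pictured with a small self-contained sketch. It is not the code in this PR: the `Expr` enum below is a toy stand-in for DataFusion's expression type, and `collect_columns` and `is_mixed_reference` are hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;

/// Toy expression tree; a hypothetical stand-in for DataFusion's `Expr`.
enum Expr {
    Column(&'static str),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Collect every column name referenced anywhere inside `expr`.
fn collect_columns(expr: &Expr, out: &mut HashSet<&'static str>) {
    match expr {
        Expr::Column(name) => {
            out.insert(*name);
        }
        Expr::Literal(_) => {}
        Expr::Gt(l, r) | Expr::And(l, r) => {
            collect_columns(l, out);
            collect_columns(r, out);
        }
    }
}

/// True when the predicate mentions at least one column outside `join_cols`.
/// Under the behaviour described above, such a predicate would be reported
/// as non-restricting before any further analysis runs.
fn is_mixed_reference(expr: &Expr, join_cols: &HashSet<&'static str>) -> bool {
    let mut used = HashSet::new();
    collect_columns(expr, &mut used);
    !used.is_subset(join_cols)
}
```

With join columns `{a, b}`, a predicate such as `a > 1` passes the guard, while `a > 1 AND c > 2` is flagged as mixed because `c` is not a join column.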
---
## What changes are included in this PR?
* Introduced new module `utils/null_restriction.rs` containing:
  * A syntactic null-restriction evaluator
  * Conservative pattern matching for common predicate shapes (column refs,
    IS NULL, IS NOT NULL, comparisons, AND/OR/NOT)
  * Clear semantics via `SyntacticNullRestriction` enum
* Updated `is_restrict_null_predicate` in `utils.rs` to:
  * Add early return for mixed-reference predicates (columns outside join
    set)
  * Use the syntactic evaluator as a fast path when possible
  * Fall back to the authoritative physical-expression evaluation when needed
* Improved internal documentation explaining:
  * Two-phase evaluation strategy (syntactic + authoritative)
  * Safety guarantees and conservative behavior
* Added focused unit tests:
  * Mixed-reference predicate handling (ensuring non-restricting behavior)
  * Parity between syntactic and authoritative evaluators for supported cases
  * Coverage of boolean logic (AND/OR/NOT) and comparison operators
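
As a rough illustration of the two-phase strategy listed above, the sketch below folds a predicate under SQL three-valued logic with every join column treated as NULL, and defers to a caller-supplied fallback when the shape is not recognised. It is a simplified model rather than the module's actual code: the `Pred` tree, the `Abs` lattice, the variant names of `SyntacticNullRestriction`, and the function names are all assumptions made for this example, and the `authoritative` closure stands in for the physical-expression evaluation.

```rust
/// Toy predicate tree; a hypothetical stand-in for DataFusion's `Expr`.
/// Every `Column` is assumed to be a join column (mixed-reference
/// predicates are taken to have been rejected by the early return).
enum Pred {
    Column(&'static str),
    IsNull(Box<Pred>),
    IsNotNull(Box<Pred>),
    GtLiteral(Box<Pred>, i64),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
    Not(Box<Pred>),
    Opaque, // any shape the fast path does not understand, e.g. a UDF call
}

/// Value of a sub-expression once all join columns are NULL.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Abs { Null, True, False, Unknown }

/// Variant names are assumptions; the PR text only names the enum.
#[derive(Clone, Copy, PartialEq, Debug)]
enum SyntacticNullRestriction { Restricting, NonRestricting, Unknown }

fn abstract_eval(p: &Pred) -> Abs {
    use Abs::*;
    match p {
        Pred::Column(_) => Null,
        Pred::Opaque => Unknown,
        Pred::IsNull(x) => match abstract_eval(x) {
            Null => True,
            Unknown => Unknown,
            True | False => False,
        },
        Pred::IsNotNull(x) => match abstract_eval(x) {
            Null => False,
            Unknown => Unknown,
            True | False => True,
        },
        // A comparison with a NULL operand is NULL; otherwise give up.
        Pred::GtLiteral(x, _) => match abstract_eval(x) {
            Null => Null,
            _ => Unknown,
        },
        Pred::And(l, r) => match (abstract_eval(l), abstract_eval(r)) {
            (False, _) | (_, False) => False,
            (Unknown, _) | (_, Unknown) => Unknown, // conservative
            (True, True) => True,
            _ => Null,
        },
        Pred::Or(l, r) => match (abstract_eval(l), abstract_eval(r)) {
            (True, _) | (_, True) => True,
            (Unknown, _) | (_, Unknown) => Unknown, // conservative
            (False, False) => False,
            _ => Null,
        },
        Pred::Not(x) => match abstract_eval(x) {
            True => False,
            False => True,
            other => other, // NOT NULL is NULL; Unknown stays Unknown
        },
    }
}

/// Phase 1: the syntactic fast path.
fn syntactic_null_restriction(p: &Pred) -> SyntacticNullRestriction {
    match abstract_eval(p) {
        // A NULL or FALSE filter result discards the row.
        Abs::Null | Abs::False => SyntacticNullRestriction::Restricting,
        Abs::True => SyntacticNullRestriction::NonRestricting,
        Abs::Unknown => SyntacticNullRestriction::Unknown,
    }
}

/// Phase 2: consult the (caller-supplied) authoritative evaluator only
/// when the fast path cannot decide.
fn is_restrict_null_sketch(p: &Pred, authoritative: impl Fn(&Pred) -> bool) -> bool {
    match syntactic_null_restriction(p) {
        SyntacticNullRestriction::Restricting => true,
        SyntacticNullRestriction::NonRestricting => false,
        SyntacticNullRestriction::Unknown => authoritative(p),
    }
}
```

In this model `a > 8` is classified as restricting, `a > 8 OR a IS NULL` as non-restricting, and any predicate containing an unrecognised shape yields `Unknown`, so the answer comes from the fallback. Returning `Unknown` whenever either side of an AND/OR is unknown is deliberately conservative: it gives up some precision in exchange for never disagreeing with the authoritative path.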
---
## Are these changes tested?
Yes.
This PR includes comprehensive unit tests that:
1. Validate that the syntactic fast path agrees with the authoritative
evaluator for supported predicate shapes
2. Ensure mixed-reference predicates are conservatively treated as
non-restricting
3. Cover key SQL boolean semantics (AND, OR, NOT) and comparison operators
4. Verify fallback behavior when the syntactic evaluator cannot determine
the result
These tests help prevent regressions and document expected behavior.
---
## Are there any user-facing changes?
No.
This change is internal to the optimizer and does not modify public APIs or
user-visible behavior. It improves performance and maintainability without
altering query results.
---
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.