SubhamSinghal opened a new pull request, #21851:
URL: https://github.com/apache/datafusion/pull/21851
## Which issue does this PR close?
- Closes [#16973](https://github.com/apache/datafusion/issues/16973)
(partial: adds dynamic filter support for NestedLoopJoinExec)
## Rationale for this change
NestedLoopJoinExec handles non-equi joins (range, temporal, inequality),
but it currently reads all probe-side data even when the build-side values
span only a narrow range. For example:
```sql
SELECT * FROM events e JOIN windows w ON e.ts BETWEEN w.start AND w.end
```
If the build side (windows) has start values in [100, 300] and end values
in [150, 400], the probe scan still reads all events, even though only
events with ts in [100, 400] (that is, between min(start) and max(end))
can possibly match. With dynamic filters, the probe scan can skip row
groups that fall entirely outside this range.
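The arithmetic behind this example can be sketched in a few lines of plain Rust. This is only an illustration under stated assumptions: `ProbeBounds`, `RowGroupStats`, `derive_bounds`, and `can_skip` are hypothetical names that do not come from the PR, and the real code works on Arrow batches and file statistics rather than `i64` slices.

```rust
/// Closed interval of `ts` values that can satisfy
/// `ts BETWEEN w.start AND w.end` for at least one build row.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ProbeBounds {
    lower: i64, // min(w.start) over the build side
    upper: i64, // max(w.end) over the build side
}

/// Min/max statistics for one probe-side row group (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min_ts: i64,
    max_ts: i64,
}

/// Derive probe-side bounds from build-side (start, end) pairs.
/// Returns None when the build side is empty.
fn derive_bounds(windows: &[(i64, i64)]) -> Option<ProbeBounds> {
    let lower = windows.iter().map(|(start, _)| *start).min()?;
    let upper = windows.iter().map(|(_, end)| *end).max()?;
    Some(ProbeBounds { lower, upper })
}

/// A row group can be skipped when its [min_ts, max_ts] range
/// does not overlap the derived bounds at all.
fn can_skip(stats: RowGroupStats, bounds: ProbeBounds) -> bool {
    stats.max_ts < bounds.lower || stats.min_ts > bounds.upper
}
```

With windows `(100, 150)`, `(200, 250)`, and `(300, 400)`, the derived bounds are [100, 400]; a row group covering ts in [401, 900] is skipped, while one covering [390, 500] must still be read because it overlaps the range.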
HashJoinExec already supports dynamic filter pushdown for equi-joins. This
PR extends the same mechanism to NestedLoopJoinExec (NLJ) for non-equi
joins by analyzing the JoinFilter expression to derive probe-side bounds
from build-side data.
## What changes are included in this PR?
New: nlj_filter_analysis.rs, an expression analysis module that walks the
JoinFilter expression tree to extract (probe_col, operator, build_col)
pairs and derive probe-side bounds:
- extract_bound_pairs(): finds BinaryExpr comparisons between build and
probe columns
- compute_build_bounds(): computes min/max from the merged build batch
- build_probe_predicate(): converts the bounds into a probe-side filter
(e.g., ts >= 100 AND ts <= 400)
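
The three steps above can be illustrated with a toy model. The sketch below reuses the function names from the list, but the types (`Expr`, `Side`, `CmpOp`, `BoundPair`, `ProbeBound`) and all signatures are assumptions made for illustration. The actual module operates on DataFusion physical expressions and Arrow batches, and this sketch ignores nulls, ORs, and non-integer types.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Side { Build, Probe }

#[derive(Debug, Clone, Copy, PartialEq)]
enum CmpOp { Lt, LtEq, Gt, GtEq }

/// Toy join-filter expression: columns, comparisons, and AND only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { side: Side, index: usize },
    Compare { left: Box<Expr>, op: CmpOp, right: Box<Expr> },
    And(Box<Expr>, Box<Expr>),
}

/// `probe_col <op> build_col`, normalized so the probe column is on the left.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BoundPair { probe_col: usize, op: CmpOp, build_col: usize }

/// `probe_col <op> literal`, one conjunct of the probe-side filter.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ProbeBound { probe_col: usize, op: CmpOp, literal: i64 }

/// Mirror an operator when the operands are swapped (a < b  <=>  b > a).
fn flip(op: CmpOp) -> CmpOp {
    match op {
        CmpOp::Lt => CmpOp::Gt,
        CmpOp::LtEq => CmpOp::GtEq,
        CmpOp::Gt => CmpOp::Lt,
        CmpOp::GtEq => CmpOp::LtEq,
    }
}

/// Walk AND-ed conjuncts and collect comparisons between a probe column
/// and a build column. Anything else is ignored, which is always safe
/// because it only means fewer bounds are derived.
fn extract_bound_pairs(expr: &Expr, out: &mut Vec<BoundPair>) {
    match expr {
        Expr::And(left, right) => {
            extract_bound_pairs(left, out);
            extract_bound_pairs(right, out);
        }
        Expr::Compare { left, op, right } => match (left.as_ref(), right.as_ref()) {
            (
                Expr::Column { side: Side::Probe, index: p },
                Expr::Column { side: Side::Build, index: b },
            ) => out.push(BoundPair { probe_col: *p, op: *op, build_col: *b }),
            (
                Expr::Column { side: Side::Build, index: b },
                Expr::Column { side: Side::Probe, index: p },
            ) => out.push(BoundPair { probe_col: *p, op: flip(*op), build_col: *b }),
            _ => {}
        },
        Expr::Column { .. } => {}
    }
}

/// Min and max of one build column; None if the column is empty.
fn compute_build_bounds(values: &[i64]) -> Option<(i64, i64)> {
    let min = values.iter().copied().min()?;
    let max = values.iter().copied().max()?;
    Some((min, max))
}

/// `probe >= build` can only hold if `probe >= min(build)`;
/// `probe <= build` can only hold if `probe <= max(build)`.
/// Pairs whose build column is missing or empty yield no bound.
fn build_probe_predicate(pairs: &[BoundPair], build_columns: &[Vec<i64>]) -> Vec<ProbeBound> {
    pairs
        .iter()
        .filter_map(|pair| {
            let (min, max) = compute_build_bounds(build_columns.get(pair.build_col)?)?;
            let literal = match pair.op {
                CmpOp::Gt | CmpOp::GtEq => min,
                CmpOp::Lt | CmpOp::LtEq => max,
            };
            Some(ProbeBound { probe_col: pair.probe_col, op: pair.op, literal })
        })
        .collect()
}
```

For the filter `w.start <= e.ts AND e.ts <= w.end` with start values {100, 300, 200} and end values {150, 400, 250}, this sketch produces the two conjuncts `ts >= 100` and `ts <= 400`, matching the example predicate in the list. Only AND is traversed here; descending into an OR would not be sound without extra handling.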
Modified: nested_loop_join.rs:
- NLJDynamicFilter struct holding the DynamicFilterPhysicalExpr and the
extracted bound pairs
- gather_filters_for_pushdown(): creates the dynamic filter in the Post
phase and routes parent filters
- handle_child_pushdown_result(): captures the filter Arc reference
- handle_buffering_left(): after the build data is ready, computes the
bounds and pushes the filter before the probe starts
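
The ordering described in the last bullet (build first, publish bounds, then probe) can be sketched with a shared filter slot. Everything here is an assumption for illustration: `DynamicRangeFilter`, `finish_build`, and `scan` are hypothetical names, and the real code uses DynamicFilterPhysicalExpr and DataFusion's pushdown API rather than a plain `RwLock`.

```rust
use std::sync::{Arc, RwLock};

/// Toy stand-in for a dynamic filter: it accepts every value until
/// bounds are published, then accepts only values inside them.
#[derive(Debug, Default)]
struct DynamicRangeFilter {
    bounds: RwLock<Option<(i64, i64)>>,
}

impl DynamicRangeFilter {
    /// Publish bounds; later reads through any clone of the Arc see them.
    fn update(&self, lower: i64, upper: i64) {
        *self.bounds.write().unwrap() = Some((lower, upper));
    }

    /// True if `value` may still match, given what is known so far.
    fn accepts(&self, value: i64) -> bool {
        match *self.bounds.read().unwrap() {
            Some((lower, upper)) => lower <= value && value <= upper,
            None => true, // no bounds yet: be conservative, keep the row
        }
    }
}

/// Join side: once the build input is fully buffered, derive bounds
/// from (start, end) pairs and publish them before probing begins.
/// An empty build side leaves the filter untouched.
fn finish_build(filter: &Arc<DynamicRangeFilter>, windows: &[(i64, i64)]) {
    let lower = windows.iter().map(|(start, _)| *start).min();
    let upper = windows.iter().map(|(_, end)| *end).max();
    if let (Some(lower), Some(upper)) = (lower, upper) {
        filter.update(lower, upper);
    }
}

/// Scan side: holds its own Arc to the same filter and consults it per value.
fn scan(filter: &Arc<DynamicRangeFilter>, events: &[i64]) -> Vec<i64> {
    events.iter().copied().filter(|ts| filter.accepts(*ts)).collect()
}
```

In this sketch, a scan that runs before `finish_build` returns every event, while a scan that runs afterwards returns only events within the published range. That is why the bounds need to be pushed before the probe side starts reading.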
## Are these changes tested?
Yes, with unit tests.
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]