2010YOUY01 commented on PR #21859:
URL: https://github.com/apache/datafusion/pull/21859#issuecomment-4323659303
> * **`IntervalJoinExec`** (`physical-plan/src/joins/interval_join.rs`):
>   sort-merge range join operator for inner joins with point-in-interval
>   pattern. Collects + sorts build side by low bound, sorts each probe batch
>   internally, uses a monotonic boundary pointer for amortized O(1)
>   per-probe-row matching.
> * **Pattern detection** (`core/src/physical_planner.rs`): detects
>   `probe_col {>= | >} build_low AND probe_col {< | <=} build_high` in the
>   non-equi join path. Handles flipped operands and reordered conditions.
>   Falls through to NLJ if pattern doesn't match.

This is a cool idea. I have a question: for the predicate `e.time >= w.start
AND e.time < w.end`, suppose `e.time >= w.start` is handled efficiently with
the monotonic pointer (amortized O(1) per probe row). Is there a mechanism to
handle the residual filter `e.time < w.end` efficiently as well? Or is only
the first-stage filter efficient, with the residual filter still evaluated
row by row?

The existing Piecewise Merge join already implements the above idea, though
the residual filter hasn't been implemented yet.
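
To make the question concrete, here is a minimal sketch of the two stages I
have in mind. It is not the PR's code: `Window` and `range_join` are
hypothetical names, and it works on plain slices rather than Arrow batches.
It assumes the build side is sorted by `start` and the probe values are
sorted ascending. The boundary pointer only moves forward, so stage 1 is
amortized O(1) per probe row, but stage 2 still checks `t < w.end` for every
candidate in the prefix.

```rust
/// Build-side interval [start, end). Hypothetical type for illustration.
#[derive(Debug, Clone, Copy)]
struct Window {
    start: i64,
    end: i64,
}

/// Returns (probe_index, build_index) pairs where
/// `w.start <= t AND t < w.end`.
/// Assumes `windows` is sorted by `start` and `times` is sorted ascending.
fn range_join(windows: &[Window], times: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    // Invariant: every window in windows[..boundary] has start <= t.
    let mut boundary = 0usize;
    for (pi, &t) in times.iter().enumerate() {
        // Stage 1: the monotonic pointer never moves backwards, so the
        // total work across all probe rows is O(windows.len()).
        while boundary < windows.len() && windows[boundary].start <= t {
            boundary += 1;
        }
        // Stage 2: residual filter, evaluated per candidate. This loop is
        // the part that can degrade to O(n * m) in the worst case.
        for (bi, w) in windows[..boundary].iter().enumerate() {
            if t < w.end {
                out.push((pi, bi));
            }
        }
    }
    out
}
```

With windows `[0, 10)`, `[5, 8)`, `[20, 30)` and probe times `3, 6, 9, 25`,
the last probe row still scans the two expired windows before reaching its
only match, which is the cost I am asking about.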
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]