2010YOUY01 commented on PR #21859:
URL: https://github.com/apache/datafusion/pull/21859#issuecomment-4323659303

   > * **`IntervalJoinExec`** (`physical-plan/src/joins/interval_join.rs`):
   >   sort-merge range join operator for inner joins with point-in-interval
   >   pattern. Collects + sorts build side by low bound, sorts each probe
   >   batch internally, uses a monotonic boundary pointer for amortized O(1)
   >   per-probe-row matching.
   > * **Pattern detection** (`core/src/physical_planner.rs`): detects
   >   `probe_col {>= | >} build_low AND probe_col {< | <=} build_high` in the
   >   non-equi join path. Handles flipped operands and reordered conditions.
   >   Falls through to NLJ if pattern doesn't match.
   
   This is a cool idea. I have a question: for the predicate
   `e.time >= w.start AND e.time < w.end`, suppose `e.time >= w.start` can be
   handled efficiently with the monotonic pointer (amortized O(1)). Is there a
   mechanism to handle the residual filter `e.time < w.end` efficiently as
   well? Or can only the first-stage filter be done efficiently, with the
   residual filter still evaluated row by row?
   
   The existing Piecewise Merge Join already implements the above idea,
   though the residual filter hasn't been implemented yet.
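
To make the question concrete, here is a minimal sketch of the two-stage evaluation being asked about. It is not the code from this PR or from the Piecewise Merge Join: the `Window` struct, the `interval_join` function, and the use of plain `i64` timestamps are all assumptions made for illustration. Stage 1 advances a monotonic pointer for `e.time >= w.start`; stage 2 checks the residual `e.time < w.end` row by row over the candidates.

```rust
/// Hypothetical build-side row: a half-open window [start, end).
#[derive(Debug, Clone, Copy)]
struct Window {
    start: i64,
    end: i64,
}

/// Returns (probe_index, window_index) pairs where start <= t < end.
/// Assumes `windows` is sorted by `start` and `times` is sorted ascending.
fn interval_join(windows: &[Window], times: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    // Number of windows whose `start` is <= the current probe value.
    let mut boundary = 0;
    for (pi, &t) in times.iter().enumerate() {
        // Stage 1: the pointer only moves forward, so the total work for
        // this loop across all probe rows is O(windows.len()).
        while boundary < windows.len() && windows[boundary].start <= t {
            boundary += 1;
        }
        // Stage 2: residual filter, evaluated row by row on every
        // candidate that passed stage 1.
        for (wi, w) in windows[..boundary].iter().enumerate() {
            if t < w.end {
                out.push((pi, wi));
            }
        }
    }
    out
}
```

In this sketch, stage 2 scans every window whose `start` has already been passed, including windows that have long since ended, so its cost is proportional to the number of candidates rather than the number of matches.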


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

