2010YOUY01 commented on PR #21859:
URL: https://github.com/apache/datafusion/pull/21859#issuecomment-4324917344

   > **`e.time >= w.start` (lower bound):** The build side is sorted by 
`start_time` ascending, and a monotonic boundary pointer advances forward as we 
process sorted probe rows. For probe value `v`, all build rows `[0..=boundary]` 
satisfy `start_time <= v`. This is amortized O(1) per probe row since the 
pointer only moves forward.
   > 
   > **`e.time < w.end` (upper bound):** The boundary pointer already 
eliminates all intervals that haven't started yet, so the `end_time` check only 
runs on the remaining smaller window. Expired intervals are still visited but 
not emitted.
   
   
   This idea has already been implemented in 
https://github.com/apache/datafusion/blob/7fa6e2118b5567ccadf75623f11a65aa5ecfa57e/datafusion/physical-plan/src/joins/piecewise_merge_join/exec.rs#L68,
though it isn't working for this case right now: it works when the predicate 
only includes `e.time >= w.start`, but handling the remaining filters (such as 
`e.time < w.end`) needs some follow-up work.
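   For reference, here is a minimal Rust sketch of the boundary-pointer scheme from the quoted description. It assumes a plain in-memory `Interval { start, end }` type, a build side already sorted by `start`, and probe values already sorted ascending. `Interval` and `boundary_pointer_join` are hypothetical names for illustration only; this is not the `PiecewiseMergeJoin` code or any DataFusion API. Here `boundary` counts the build rows with `start <= v`, so the quote's inclusive `[0..=boundary]` becomes `build[..boundary]`. The `end` check plays the role of the residual filter applied per candidate pair.

```rust
/// Hypothetical build-side row: a half-open window `[start, end)`.
#[derive(Debug, Clone, Copy)]
struct Interval {
    start: i64,
    end: i64,
}

/// Joins sorted probe values `v` against intervals on `start <= v AND v < end`.
/// `build` must be sorted by `start`, `probe` sorted ascending.
/// Returns `(probe_index, build_index)` pairs.
fn boundary_pointer_join(build: &[Interval], probe: &[i64]) -> Vec<(usize, usize)> {
    debug_assert!(build.windows(2).all(|w| w[0].start <= w[1].start));
    debug_assert!(probe.windows(2).all(|w| w[0] <= w[1]));

    let mut out = Vec::new();
    // Number of build rows with `start <= v`. It only moves forward because
    // probe values are non-decreasing, giving amortized O(1) advancement.
    let mut boundary = 0;
    for (pi, &v) in probe.iter().enumerate() {
        while boundary < build.len() && build[boundary].start <= v {
            boundary += 1;
        }
        // Every row in `build[..boundary]` satisfies the lower bound.
        // The upper bound is checked per row: expired intervals (`end <= v`)
        // are still visited, just not emitted.
        for (bi, iv) in build[..boundary].iter().enumerate() {
            if v < iv.end {
                out.push((pi, bi));
            }
        }
    }
    out
}
```

   With build rows `[0, 10)`, `[5, 15)`, `[20, 30)` and probe values `3, 7, 12, 25`, this yields `(0, 0), (1, 0), (1, 1), (2, 1), (3, 2)`. Advancing the pointer is cheap, but the inner loop still rescans every interval that has already started, expired or not. That rescanning is the cost the `end_time` follow-up is aimed at.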
    
   > If needed, we can also sort by `end_time` as a secondary key to enable 
skipping expired intervals entirely — happy to explore this as a follow-up. I 
didn't include sorting by `end_time` thinking this wouldn't add much value 
since window is already getting smaller by start_time pointer.
   
   This is a good idea to explore in the future! I think `PiecewiseMergeJoin` 
is a more general solution when there is only one inequality join condition; 
if there are more inequality conditions, other indexing schemes might 
accelerate those specific workloads.
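   To make "skipping expired intervals entirely" concrete, one option is a sweep-line active set. This is a swapped-in technique, not the secondary `end_time` sort suggested in the quote. The idea is to keep started intervals in a min-heap keyed by `end` and evict them once `end <= v`. Because probe values only increase, an evicted interval can never match again. The minimal Rust sketch below assumes a hypothetical `Interval { start, end }` type, a build side sorted by `start`, and an ascending probe side. `active_set_join` is an illustrative name, not DataFusion code.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical build-side row: a half-open window `[start, end)`.
#[derive(Debug, Clone, Copy)]
struct Interval {
    start: i64,
    end: i64,
}

/// Same predicate as before (`start <= v AND v < end`), but expired intervals
/// are evicted from an active set instead of being rescanned.
/// `build` must be sorted by `start`, `probe` sorted ascending.
fn active_set_join(build: &[Interval], probe: &[i64]) -> Vec<(usize, usize)> {
    debug_assert!(build.windows(2).all(|w| w[0].start <= w[1].start));
    debug_assert!(probe.windows(2).all(|w| w[0] <= w[1]));

    let mut out = Vec::new();
    // Min-heap of (end, build_index) for intervals that have started.
    let mut active: BinaryHeap<Reverse<(i64, usize)>> = BinaryHeap::new();
    let mut boundary = 0;
    for (pi, &v) in probe.iter().enumerate() {
        // Admit intervals whose start has been reached (lower bound).
        while boundary < build.len() && build[boundary].start <= v {
            active.push(Reverse((build[boundary].end, boundary)));
            boundary += 1;
        }
        // Evict intervals with end <= v; later probe values are >= v,
        // so these can never satisfy `v < end` again.
        while let Some(&Reverse((end, _))) = active.peek() {
            if end <= v {
                active.pop();
            } else {
                break;
            }
        }
        // Every remaining interval satisfies start <= v < end.
        for &Reverse((_, bi)) in active.iter() {
            out.push((pi, bi));
        }
    }
    out
}
```

   Each build row is pushed and popped at most once, so the work is O((n + m) log n) plus the output size, for n build rows and m probe rows. The trade-off is that pairs for a single probe row come out in heap order rather than build order.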
   
   For coordination, I’m hoping to refactor PWMJ to build on top of the 
`JoinAccelerator` API: 
https://github.com/apache/datafusion/pull/21851#issuecomment-4321635395. I’ll 
share a more detailed write-up soon.

