2010YOUY01 commented on PR #21859: URL: https://github.com/apache/datafusion/pull/21859#issuecomment-4324917344
> **`e.time >= w.start` (lower bound):** The build side is sorted by `start_time` ascending, and a monotonic boundary pointer advances forward as we process sorted probe rows. For probe value `v`, all build rows `[0..=boundary]` satisfy `start_time <= v`. This is amortized O(1) per probe row since the pointer only moves forward.
>
> **`e.time < w.end` (upper bound):** The boundary pointer already eliminates all intervals that haven't started yet, so the `end_time` check only runs on the remaining smaller window. Expired intervals are still visited but not emitted.

This idea has already been implemented in https://github.com/apache/datafusion/blob/7fa6e2118b5567ccadf75623f11a65aa5ecfa57e/datafusion/physical-plan/src/joins/piecewise_merge_join/exec.rs#L68, though it isn't working for this case right now: it will work if the predicate only includes `e.time >= w.start`, but it requires some follow-up work to handle the remaining filters (like `e.time < w.end`).

> If needed, we can also sort by `end_time` as a secondary key to enable skipping expired intervals entirely; happy to explore this as a follow-up. I didn't include sorting by `end_time`, thinking this wouldn't add much value since the window is already being narrowed by the `start_time` pointer.

This is a good idea to explore in the future! I think `PiecewiseMergeJoin` is the more general solution when there is only one inequality join condition. If there are more inequality conditions, there might be other indexing schemes to accelerate those specific workloads.

For coordination, I'm hoping to refactor PWMJ to build on top of the `JoinAccelerator` API: https://github.com/apache/datafusion/pull/21851#issuecomment-4321635395. I'll share a more detailed write-up soon.
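For readers following the thread, the boundary-pointer scheme described in the quoted text can be sketched roughly as below. This is a minimal standalone illustration, not the code from this PR or from `PiecewiseMergeJoin`: the `Window` type and the `range_join` function are hypothetical names, plain `i64` timestamps stand in for Arrow arrays, and both inputs are assumed to be pre-sorted.

```rust
/// Hypothetical build-side row: a half-open interval [start, end).
#[derive(Debug, Clone, Copy)]
struct Window {
    start: i64,
    end: i64,
}

/// Emits (probe_index, build_index) pairs where `start <= t && t < end`.
///
/// Assumptions (not checked here): `windows` is sorted by `start` ascending
/// and `times` is sorted ascending.
fn range_join(windows: &[Window], times: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    // Exclusive boundary: windows[..boundary] all satisfy `start <= t`.
    // (The quoted text uses an inclusive index; the idea is the same.)
    let mut boundary = 0usize;

    for (p, &t) in times.iter().enumerate() {
        // Lower bound: the pointer only moves forward across probe rows,
        // so the total advance work is bounded by windows.len().
        while boundary < windows.len() && windows[boundary].start <= t {
            boundary += 1;
        }
        // Upper bound: checked only on windows that have already started.
        // Expired windows are still visited here, just not emitted.
        for (b, w) in windows[..boundary].iter().enumerate() {
            if t < w.end {
                out.push((p, b));
            }
        }
    }
    out
}
```

With windows `[0, 5)`, `[2, 4)`, `[6, 9)` and probe times `1, 3, 4, 7`, this returns `[(0, 0), (1, 0), (1, 1), (2, 0), (3, 2)]`. Note that only the pointer advance is amortized O(1) per probe row; the inner scan still walks every started window, which is the cost a secondary ordering on `end_time` or another index would aim to cut.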
