2010YOUY01 commented on PR #21851: URL: https://github.com/apache/datafusion/pull/21851#issuecomment-4321635395
Thank you, this is an exciting optimization!

I am working on general infrastructure for NLJ dynamic filters and custom build indices that could help simplify this implementation. Would you (and other reviewers) be open to waiting until I submit that PR in the next 1-2 weeks, so we can coordinate and collaborate on this? I’d appreciate any thoughts on this direction!

Here is the preview and WIP draft:

The core idea is that most specialized joins (e.g., Piecewise Merge Join, IEJoin, Spatial Join, Array Set Joins) follow a standard pattern:

1. Buffer: Collect all build-side data.
2. Probe: Iterate row-by-row.

Specialization typically only requires:

- Custom Dynamic Filters: To reduce probe-side size (as seen in this PR).
- Custom Indices: To accelerate the probing process.

Taking this PR as an example: beyond the dynamic filter it implements, if we know a window range has a fixed maximum span, we could sort the build side and use a custom index to accelerate the probe further.

So I'm hoping to add a common trait that supports both custom dynamic filters and custom runtime indices. A common extension point would make similar optimizations easier to add: each specialization only needs to implement a small trait that specifies how to build and probe the index and how to build dynamic filters, and we won't need to touch the join core state machine each time.

I have a WIP draft of this infrastructure here (only the refactor and the rough API shape are done; I am still working on an example implementation for both the custom index and the dynamic filter): https://github.com/2010YOUY01/arrow-datafusion/tree/join-accelerator
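
To make the buffer/probe pattern and the proposed extension point more concrete, here is a minimal, std-only sketch. Everything in it is an assumption for illustration: the trait name `JoinAccelerator`, its methods, and the `run_join` driver are hypothetical and are not the API in the linked branch, and the band join (`|probe - build| <= span`) is a stand-in specialization, not the join from this PR.

```rust
/// Hypothetical extension point (illustrative names, not the WIP branch API).
trait JoinAccelerator {
    type Index;

    /// Called once, after the build side has been fully buffered.
    fn build_index(&self, build_keys: &[i64]) -> Self::Index;

    /// Optional inclusive bounds on probe keys that can possibly match.
    /// `None` means "no filter available".
    fn dynamic_filter(&self, index: &Self::Index) -> Option<(i64, i64)>;

    /// Returns the build-side row ids that match one probe key.
    fn probe(&self, index: &Self::Index, probe_key: i64) -> Vec<usize>;
}

/// Example specialization: band join `|probe - build| <= span`.
struct BandJoin {
    span: i64,
}

impl JoinAccelerator for BandJoin {
    /// (key, original build row id), sorted by key.
    type Index = Vec<(i64, usize)>;

    fn build_index(&self, build_keys: &[i64]) -> Self::Index {
        let mut idx: Vec<(i64, usize)> = build_keys
            .iter()
            .enumerate()
            .map(|(row, &key)| (key, row))
            .collect();
        idx.sort_unstable();
        idx
    }

    fn dynamic_filter(&self, index: &Self::Index) -> Option<(i64, i64)> {
        let (min, _) = *index.first()?;
        let (max, _) = *index.last()?;
        Some((min.saturating_sub(self.span), max.saturating_add(self.span)))
    }

    fn probe(&self, index: &Self::Index, probe_key: i64) -> Vec<usize> {
        let lo = probe_key.saturating_sub(self.span);
        let hi = probe_key.saturating_add(self.span);
        // Binary search for the contiguous run of keys in [lo, hi].
        let start = index.partition_point(|&(k, _)| k < lo);
        let end = index.partition_point(|&(k, _)| k <= hi);
        index[start..end].iter().map(|&(_, row)| row).collect()
    }
}

/// Generic driver: the buffer/probe loop stays the same for every
/// specialization. Returns (build row id, probe row id) pairs.
fn run_join<A: JoinAccelerator>(
    acc: &A,
    build_keys: &[i64],
    probe_keys: &[i64],
) -> Vec<(usize, usize)> {
    let index = acc.build_index(build_keys); // 1. Buffer
    let filter = acc.dynamic_filter(&index);
    let mut out = Vec::new();
    for (p_row, &p) in probe_keys.iter().enumerate() {
        // 2. Probe, skipping rows the dynamic filter rules out.
        if let Some((lo, hi)) = filter {
            if p < lo || p > hi {
                continue;
            }
        }
        for b_row in acc.probe(&index, p) {
            out.push((b_row, p_row));
        }
    }
    out
}
```

In this sketch the driver never needs to know which specialization it is running, which is the property the proposal is after; a real implementation would operate on Arrow batches and push the filter into the probe-side scan rather than checking it row by row.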
