westonpace opened a new issue, #34135: URL: https://github.com/apache/arrow/issues/34135
### Describe the enhancement requested

Now that we are starting to introduce formal ordering, we can create an AsofJoinNode variant that works even if `use_threads` is true.

A rough overview of the algorithm:

* Require all inputs to be sequenced on the `on` field in ascending order.
* As each batch arrives, put it into a sequenced queue. In the `process` step, accumulate the batch in an accumulation vector for each input. Then, still in the `process` step, record the latest `on` time and store it with the batch. Then determine whether we have enough data in all inputs to process. If so, create a task to asof join all those batches. This may require a binary search into each input to make sure we cut at a clean point, but that should be pretty quick.

For example:

```
// Accumulate queues just after insert into R0
LEFT             | R0      | R1
B t=300          |         |
B t=150          |         | B t=500
B t=100          | B t=200 | B t=20

// Task to process
LEFT             | R0      | R1
B t=200          |         |
B t=150          |         | B t=200
B t=100          | B t=200 | B t=20

// Remaining accumulation queue state
LEFT             | R0      | R1
B t=300 (sliced) |         | B t=500 (sliced)
```

### Component(s)

C++
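The "cut at a clean point" step described in the issue could look roughly like the sketch below. It is not Arrow code. `OnTimes`, `SafeCutTime`, `CutIndex`, and `TakeReadyRows` are hypothetical names, and each input's accumulation vector is modelled as a plain sorted vector of `on` values rather than a list of record batches. The sketch assumes the cut time is the smallest of the per-input latest `on` times, and that the cut is inclusive, as the `t=200` example suggests. A real implementation would also have to decide how to handle rows that arrive later with an `on` value equal to the cut.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for one input's accumulation vector: the `on` values
// of every accumulated row, in ascending order.
using OnTimes = std::vector<int64_t>;

// Smallest of the per-input latest `on` times, i.e. the latest time that
// every input has reached. Returns nullopt if any input has no data yet.
std::optional<int64_t> SafeCutTime(const std::vector<OnTimes>& inputs) {
  std::optional<int64_t> cut;
  for (const OnTimes& times : inputs) {
    if (times.empty()) return std::nullopt;
    cut = cut ? std::min(*cut, times.back()) : times.back();
  }
  return cut;
}

// Binary search for the number of leading rows with on <= cut.
std::size_t CutIndex(const OnTimes& times, int64_t cut) {
  assert(std::is_sorted(times.begin(), times.end()));
  auto it = std::upper_bound(times.begin(), times.end(), cut);
  return static_cast<std::size_t>(it - times.begin());
}

// Moves the rows that are ready to be joined out of each accumulation vector
// and returns them as the task input. Rows past the cut stay accumulated.
std::vector<OnTimes> TakeReadyRows(std::vector<OnTimes>& inputs) {
  std::vector<OnTimes> task(inputs.size());
  std::optional<int64_t> cut = SafeCutTime(inputs);
  if (!cut) return task;  // Not enough data in every input yet.
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    auto split = inputs[i].begin() +
                 static_cast<std::ptrdiff_t>(CutIndex(inputs[i], *cut));
    task[i].assign(inputs[i].begin(), split);
    inputs[i].erase(inputs[i].begin(), split);
  }
  return task;
}
```

With inputs whose latest `on` times are 300 (LEFT), 200 (R0), and 500 (R1), the cut time is 200. LEFT and R1 are sliced at that point and R0 is consumed whole, which mirrors the three queue states in the example.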