westonpace opened a new issue, #34135:
URL: https://github.com/apache/arrow/issues/34135

   ### Describe the enhancement requested
   
   Now that we are starting to introduce formal ordering, we can create an 
`AsofJoinNode` variant that works even if `use_threads` is true.  A rough 
overview of the algorithm:
   
    * Require all inputs to be sequenced on the `on` field in ascending order.
    * As each batch arrives, put it into a sequenced queue.  In the `process` 
step, append the batch to that input's accumulation vector and, still in the 
`process` step, record the batch's latest `on` time and store it with the 
batch.  Then determine whether every input has accumulated enough data to 
process.  If so, create a task to asof join those batches.  This may require a 
binary search into each input to make sure we cut at a clean point, but that 
should be pretty quick.
   
   For example, where each `B t=N` is a batch whose latest `on` time is `N`:
   
   ```
   // Accumulation queues just after insert into R0
   LEFT    | R0      | R1
   B t=300 |         |
   B t=150 |         | B t=500
   B t=100 | B t=200 | B t=20
   
   // Task to process
   LEFT    | R0      | R1
   B t=200 |         |
   B t=150 |         | B t=200
   B t=100 | B t=200 | B t=20
   
   // Remaining accumulation queue state
   LEFT             | R0      | R1
   B t=300 (sliced) |         | B t=500 (sliced)
   
   ```
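
   The cut in the example above can be sketched in a few lines.  This is only 
an illustration under assumptions, not the proposed implementation: `OnTimes`, 
`FindCutTime` and `RowsToProcess` are hypothetical names, each input is modeled 
as a flat sorted vector of `on` values rather than Arrow batches, and the cut 
is taken to be inclusive at the smallest "latest `on` time" across inputs, 
which is how the example reads (t=200).

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <limits>
   #include <vector>

   // Hypothetical stand-in for one input's accumulation queue: the `on` values
   // of every accumulated row, flattened, in ascending order.
   using OnTimes = std::vector<int64_t>;

   // The cut time is the smallest "latest on time" across all inputs, i.e. the
   // latest time that every input has reached.  Returns false if there are no
   // inputs or if any input has no data yet (nothing can be processed).
   bool FindCutTime(const std::vector<OnTimes>& inputs, int64_t* cut) {
     if (inputs.empty()) return false;
     int64_t result = std::numeric_limits<int64_t>::max();
     for (const OnTimes& input : inputs) {
       if (input.empty()) return false;
       result = std::min(result, input.back());
     }
     *cut = result;
     return true;
   }

   // Binary search for the clean cut point: the number of leading rows whose
   // `on` value is <= cut.  Those rows go to the join task, the rest stay in
   // the accumulation queue as a sliced batch.
   std::size_t RowsToProcess(const OnTimes& input, int64_t cut) {
     assert(std::is_sorted(input.begin(), input.end()));
     auto it = std::upper_bound(input.begin(), input.end(), cut);
     return static_cast<std::size_t>(it - input.begin());
   }
   ```

   With made-up row values that match the batch boundaries above, e.g. LEFT = 
{50, 100, 120, 150, 180, 200, 250, 300}, R0 = {150, 200} and R1 = {10, 20, 100, 
200, 350, 500}, the cut time is 200 and the task takes 6, 2 and 4 rows 
respectively.  Whether the cut should really be inclusive (more rows with 
t=200 could still arrive on R0) is a design question this sketch does not 
settle.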
   
   ### Component(s)
   
   C++

