Dandandan commented on issue #21543:
URL: https://github.com/apache/datafusion/issues/21543#issuecomment-4229100418

   Using https://docs.rs/arrow/latest/arrow/compute/struct.BatchCoalescer.html 
   might also help a bit (instead of using `concat_batches`).
   
   It still uses `take`, but it avoids the double memory usage caused by 
`concat_batches` (which holds the buffered inputs and the concatenated output 
at the same time).
   Additionally, it allows the copying to happen earlier, which reduces the 
"final" allocation/CPU spike. It might also be a bit more efficient, since a 
batch is likely to still be in the CPU cache when it is processed (which is not 
the case when running `concat_batches` over a large number of batches), and it 
spreads out deallocations, etc.
   
   (There is potential to make it even faster by fusing `take` + insert, but 
that is yet to be implemented).
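
   To illustrate the idea of copying selected rows into fixed-size output 
batches as each input arrives, rather than buffering everything and 
concatenating at the end, here is a minimal toy sketch. It is not Arrow's 
`BatchCoalescer` and does not use its API: a "batch" is modelled as a plain 
`Vec<i32>`, and the names `ToyCoalescer`, `push_selected`, `finish`, 
`next_completed`, and `coalesce_selected` are hypothetical, chosen only for 
this example.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a record batch: a single column of i32 values.
type Batch = Vec<i32>;

/// Hypothetical, simplified coalescer (an assumption for illustration,
/// not Arrow's implementation).
struct ToyCoalescer {
    target: usize,
    in_progress: Batch,
    completed: VecDeque<Batch>,
}

impl ToyCoalescer {
    fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be non-zero");
        Self {
            target,
            in_progress: Vec::with_capacity(target),
            completed: VecDeque::new(),
        }
    }

    /// "take" the rows at `indices` from `batch` and copy them straight
    /// into the in-progress output. The input can be dropped by the caller
    /// right after this call, so inputs never pile up.
    fn push_selected(&mut self, batch: &[i32], indices: &[usize]) {
        for &i in indices {
            self.in_progress.push(batch[i]);
            if self.in_progress.len() == self.target {
                // Output batch is full: hand it off and start a new one.
                let full = std::mem::replace(
                    &mut self.in_progress,
                    Vec::with_capacity(self.target),
                );
                self.completed.push_back(full);
            }
        }
    }

    /// Flush a partially filled output batch, if any.
    fn finish(&mut self) {
        if !self.in_progress.is_empty() {
            let rest = std::mem::take(&mut self.in_progress);
            self.completed.push_back(rest);
        }
    }

    fn next_completed(&mut self) -> Option<Batch> {
        self.completed.pop_front()
    }
}

/// Drive the coalescer over (batch, selected indices) pairs, draining
/// completed output batches as soon as they become available.
fn coalesce_selected(inputs: &[(Batch, Vec<usize>)], target: usize) -> Vec<Batch> {
    let mut coalescer = ToyCoalescer::new(target);
    let mut out = Vec::new();
    for (batch, indices) in inputs {
        coalescer.push_selected(batch, indices);
        while let Some(done) = coalescer.next_completed() {
            out.push(done);
        }
    }
    coalescer.finish();
    while let Some(done) = coalescer.next_completed() {
        out.push(done);
    }
    out
}
```

   In this model the copy happens per input batch, so the work and the 
allocations are spread over the run instead of landing in one final 
concatenation step.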


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

