Dandandan commented on issue #21543: URL: https://github.com/apache/datafusion/issues/21543#issuecomment-4229100418
Using https://docs.rs/arrow/latest/arrow/compute/struct.BatchCoalescer.html instead of `concat_batches` might also help a bit. It still uses `take`, but it avoids the double memory usage caused by `concat_batches`. It also allows the copying to happen earlier, which reduces the "final" allocation/CPU spike, spreads out deallocations, and might be a bit more efficient because each batch is likely to still be in the CPU cache when it is processed (which is not the case when running `concat_batches` over a large number of batches). There is potential to make it even faster by fusing `take` + insert, but that is yet to be implemented.
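
To make the difference concrete, here is a rough toy model in plain Rust. It does not use the arrow API: `Batch`, `ToyCoalescer`, `push_taken`, `finish`, `next_completed` and `demo` are hypothetical names invented for this sketch, and a "batch" is just one `Vec<i64>` column. It only illustrates the idea of copying the taken rows into a fixed-size output buffer as each input arrives, as opposed to keeping every input alive and copying everything once at the end.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a record batch: a single column of i64 values.
type Batch = Vec<i64>;

/// The `concat_batches` pattern: every input stays alive until the end,
/// then one large output is allocated and filled in a single pass, so
/// inputs and output coexist in memory.
fn concat_at_end(batches: &[Batch]) -> Batch {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut out = Vec::with_capacity(total);
    for b in batches {
        out.extend_from_slice(b);
    }
    out
}

/// Hypothetical incremental coalescer (not the arrow type): rows are copied
/// as soon as a batch arrives, and the input is dropped right afterwards.
struct ToyCoalescer {
    target: usize,
    in_progress: Vec<i64>,
    completed: VecDeque<Batch>,
}

impl ToyCoalescer {
    fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be non-zero");
        Self {
            target,
            in_progress: Vec::with_capacity(target),
            completed: VecDeque::new(),
        }
    }

    /// Copy the rows selected by `indices` (a `take`) into the in-progress
    /// buffer, emitting a completed batch whenever `target` rows are reached.
    /// `batch` is consumed, so its memory is freed when this call returns.
    fn push_taken(&mut self, batch: Batch, indices: &[usize]) {
        for &i in indices {
            self.in_progress.push(batch[i]);
            if self.in_progress.len() == self.target {
                let fresh = Vec::with_capacity(self.target);
                let full = std::mem::replace(&mut self.in_progress, fresh);
                self.completed.push_back(full);
            }
        }
    }

    /// Flush any remaining rows as a final, possibly smaller, batch.
    fn finish(&mut self) {
        if !self.in_progress.is_empty() {
            let rest = std::mem::take(&mut self.in_progress);
            self.completed.push_back(rest);
        }
    }

    /// Pop the next completed batch, if any.
    fn next_completed(&mut self) -> Option<Batch> {
        self.completed.pop_front()
    }
}

/// Usage: two input batches, a few rows taken from each, target size 4.
fn demo() -> Vec<Batch> {
    let mut c = ToyCoalescer::new(4);
    c.push_taken(vec![10, 11, 12], &[0, 2]);
    c.push_taken(vec![20, 21, 22, 23], &[1, 2, 3]);
    c.finish();
    std::iter::from_fn(|| c.next_completed()).collect()
}
```

In this model the peak extra memory in `ToyCoalescer` is one in-progress buffer plus whatever completed batches have not been consumed yet, and each input is released immediately after its rows are copied. The real `BatchCoalescer` works on `RecordBatch` values and has its own API, so the link above is the place to check the actual method names and behaviour.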
