maxburke opened a new pull request, #23032: URL: https://github.com/apache/datafusion/pull/23032
## Which issue does this PR close?

Closes issue #23031

## Rationale for this change

We run into two problems when operating on datasets with approximately 60 million rows:

1. First, we get OOM-killed on machines with 64 GB or less of memory.
2. Second, on machines with more than 64 GB, we overflow string array offsets during the record batch concatenation in the core of the join.

## What changes are included in this PR?

This removes record batch concatenation from several joins (hash join, nested loop join, piecewise merge join).

## Are these changes tested?

Yes

## Are there any user-facing changes?

I sure hope not! (no)
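To illustrate the offset overflow described in the rationale: Arrow's `Utf8` string arrays index their value bytes with signed 32-bit offsets, so a single array can address at most `i32::MAX` (about 2 GiB) of string data. The sketch below is not code from this PR. It uses only the standard library, and the helper names, the 40-byte average string length, and the 8192-row batch size are all assumptions chosen for illustration. It models the arithmetic only: the offsets fit when batches stay separate but overflow once everything is concatenated into one array.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: accumulate per-batch string byte counts into one
/// signed 32-bit end offset, as a single concatenated `Utf8` array would
/// require. Returns `None` if the running offset exceeds `i32::MAX`.
fn concatenated_end_offset(batch_value_bytes: &[usize]) -> Option<i32> {
    let mut offset: i32 = 0;
    for &bytes in batch_value_bytes {
        // A single batch that is already too large cannot be represented either.
        let bytes = i32::try_from(bytes).ok()?;
        // checked_add reports overflow instead of wrapping or panicking.
        offset = offset.checked_add(bytes)?;
    }
    Some(offset)
}

/// Hypothetical helper: when batches are kept separate, each one only
/// needs its own byte count to fit in an i32 offset.
fn each_batch_fits(batch_value_bytes: &[usize]) -> bool {
    batch_value_bytes.iter().all(|&b| i32::try_from(b).is_ok())
}
```

Under these assumptions, roughly 60 million rows split into 7,324 batches of 8192 rows carry 327,680 string bytes per batch (8192 × 40). Calling `concatenated_end_offset(&vec![327_680usize; 7_324])` returns `None`, because the total of about 2.4 billion bytes exceeds `i32::MAX`, while `each_batch_fits` on the same input returns `true`.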
