ag1805x opened a new issue, #45645: URL: https://github.com/apache/arrow/issues/45645
### Describe the usage question you have. Please include as many useful details as possible.

I'm working with 50 Parquet files (~800 MB each) and need to perform a grouped summarization in R (`group_by(colA, colB, colC)`). When using arrow, I run into memory errors (core dumped, `bad_alloc`). What is the best way to handle data of this size without running out of memory?

The experimental batch processing seemed like an option, but I cannot form the batches by random subsetting. Ideally, I would subset by the `group_by` columns instead. Is this possible? A rough sketch of what I have in mind is below.

### Component(s)

R
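Here is a minimal sketch of the per-group subsetting I am describing, assuming all 50 files live in one directory and share a schema. The path, the grouping columns (`colA`, `colB`, `colC`), and the `value` column are placeholders for my real data.

```r
library(arrow)
library(dplyr)

# Placeholder path and column names; adjust to the real schema.
ds <- open_dataset("path/to/parquet_files")

# The straightforward pipeline that runs out of memory for me:
# ds |> group_by(colA, colB, colC) |> summarise(total = sum(value)) |> collect()

# What I would like instead: process one slice of colA at a time,
# so only a subset of the data is materialised per iteration.
keys <- ds |>
  distinct(colA) |>
  collect() |>
  pull(colA)

results <- lapply(keys, function(k) {
  ds |>
    filter(colA == k) |>                 # push the subset down to the Parquet scan
    group_by(colA, colB, colC) |>
    summarise(total = sum(value)) |>     # aggregate only this slice
    collect()
})

combined <- bind_rows(results)
```

Is looping over the grouping keys like this a reasonable pattern with arrow, or is there a better-supported way to get the same effect?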