nbc opened a new issue, #41057: URL: https://github.com/apache/arrow/issues/41057
### Describe the enhancement requested

I'm not sure whether this is a feature request or a bug report, but when using `write_dataset()`, the resulting dataset can end up with very small row groups (`rows_per_group`), causing very poor performance for most queries: at least 20 times slower and 20 times more memory on a large (~10 GB) dataset. Setting `min_rows_per_group` to something around `100000L` fixes the problem. Not all users are aware of the `min_rows_per_group` parameter, so setting a "good" default (if one exists) could help them a great deal. I'm not qualified enough to know whether there are drawbacks.

### Component(s)

R
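For context, a minimal sketch of the workaround described above, assuming the `arrow` R package (the `min_rows_per_group` and `max_rows_per_group` arguments to `write_dataset()` are real; the path and data here are illustrative only):

```r
library(arrow)

# Without min_rows_per_group, incremental or streaming writes can emit
# many tiny row groups, which degrades scan performance on later queries.
# Setting a floor on row-group size avoids this.
write_dataset(
  mtcars,                         # placeholder data frame
  path = "dataset_dir",           # hypothetical output directory
  format = "parquet",
  min_rows_per_group = 100000L,   # the workaround suggested in this issue
  max_rows_per_group = 1000000L   # optional upper bound on group size
)
```

The issue's suggestion is essentially that a default on the order of `100000L` for `min_rows_per_group` would spare users from having to discover this parameter themselves.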