pitrou opened a new issue, #47030: URL: https://github.com/apache/arrow/issues/47030
### Describe the enhancement requested

Both the Rust and Java implementations limit the number of rows written per page:

* Rust: https://github.com/apache/arrow-rs/blob/3126dad0348035bc5fadc8ec61b7150b9ce6aad5/parquet/src/file/properties.rs#L42
* Java: https://github.com/apache/parquet-java/blob/4aa2ea91863274aebb1eded243ce275912c16010/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L61

They do this in addition to trying to keep the page size under 1 MB, which keeps the actual page size much smaller in practice.

Parquet C++, however, only enforces the 1 MB page size limit and does not limit the number of rows written per page. This can result in much larger pages than other implementations produce. Large pages can cause several problems:

1) reduced CPU cache efficiency when reading, decompressing, etc.
2) less fine-grained page pruning using predicate pushdown
3) larger intermediate buffers, leading to a significant [increase in memory consumption](https://github.com/apache/arrow/issues/46971) when there are many columns to read

### Component(s)

C++
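To make the difference between a size-only limit and a combined size-plus-row-count limit concrete, here is a minimal sketch of a page-flush rule that closes a page as soon as either limit is reached. It is not Parquet C++'s writer code: `SplitIntoPages` is a hypothetical helper, every row is assumed to encode to a fixed number of bytes, and real writers check limits per write batch using actual encoded sizes. Passing `0` as the row limit models the current size-only behavior; the 20,000-row limit in the usage note is an assumption chosen to resemble the Rust/Java defaults linked above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: simulates writing `num_rows` rows that each encode to
// `bytes_per_row` bytes, and returns the number of rows in each finished page.
// A page is closed when its buffered bytes reach `max_page_bytes`, or when its
// row count reaches `max_page_rows` (a value <= 0 disables the row limit).
std::vector<int64_t> SplitIntoPages(int64_t num_rows, int64_t bytes_per_row,
                                    int64_t max_page_bytes,
                                    int64_t max_page_rows) {
  assert(bytes_per_row > 0 && max_page_bytes > 0);
  std::vector<int64_t> pages;
  int64_t page_rows = 0;
  int64_t page_bytes = 0;
  for (int64_t i = 0; i < num_rows; ++i) {
    page_rows += 1;
    page_bytes += bytes_per_row;
    const bool size_limit_hit = page_bytes >= max_page_bytes;
    const bool row_limit_hit = max_page_rows > 0 && page_rows >= max_page_rows;
    if (size_limit_hit || row_limit_hit) {
      // Flush the current page and start a new one.
      pages.push_back(page_rows);
      page_rows = 0;
      page_bytes = 0;
    }
  }
  // Flush any partially filled last page.
  if (page_rows > 0) {
    pages.push_back(page_rows);
  }
  return pages;
}
```

With 1,000,000 rows at 1 byte each, `SplitIntoPages(1000000, 1, 1024 * 1024, 0)` yields a single 1,000,000-row page because the 1 MiB byte limit is never reached, whereas a row limit of 20,000 yields 50 pages of 20,000 rows. For wide values (say 100 bytes per row) the byte limit triggers first, after 10,486 rows, so in this model a row limit mainly changes page sizes for narrow or compactly encoded columns.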