adamreeve opened a new pull request, #47032: URL: https://github.com/apache/arrow/pull/47032
### Rationale for this change

When page indexes are enabled and a column is repeated, this change ensures that Parquet pages are written once the buffered data reaches the configured page size, while still splitting pages only on record boundaries. Without this fix, page sizes can grow without bound until the row group is closed.

### What changes are included in this PR?

Fixes an off-by-one error in the logic that controls when pages can be written.

### Are these changes tested?

Yes, added a new unit test.

### Are there any user-facing changes?

**This PR contains a "Critical Fix".** This bug could cause a crash when writing a large number of rows of a repeated column, once the page size exceeded the maximum int32 value.
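
The record-boundary rule described in the rationale can be illustrated with a small, self-contained sketch. This is not the Arrow C++ implementation and does not reproduce the actual off-by-one error; `split_into_pages`, `page_size_limit`, and `value_size` are hypothetical names, and the buffered size is approximated as a fixed byte count per value. It assumes only the Parquet convention that a repetition level of 0 marks the first value of a new record.

```python
from typing import Iterable, List, Tuple

Level = Tuple[int, int]  # (repetition_level, value)


def split_into_pages(
    levels_and_values: Iterable[Level],
    page_size_limit: int,
    value_size: int = 4,  # assumed fixed size per value, in bytes
) -> List[List[Level]]:
    """Group (rep_level, value) pairs into pages that end on record boundaries."""
    pages: List[List[Level]] = []
    current: List[Level] = []
    for rep_level, value in levels_and_values:
        # A repetition level of 0 starts a new record, so everything buffered
        # so far consists of complete records and is safe to flush.
        starts_new_record = rep_level == 0
        if starts_new_record and len(current) * value_size >= page_size_limit:
            pages.append(current)
            current = []
        current.append((rep_level, value))
    if current:
        # Closing the row group flushes whatever is still buffered.
        pages.append(current)
    return pages


# Three records of three values each; each record is 12 bytes in this model.
records = [
    (0, 1), (1, 2), (1, 3),
    (0, 4), (1, 5), (1, 6),
    (0, 7), (1, 8), (1, 9),
]
pages = split_into_pages(records, page_size_limit=8)
single_page = split_into_pages(records, page_size_limit=10**9)
```

With an 8-byte limit, every record exceeds the limit on its own, so each record ends up in a separate page and every page begins with repetition level 0. If the flush check were never satisfied, as in the `single_page` case with a very large limit, all data would accumulate in one page until the row group closed, which is the unbounded growth the PR describes.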
