adamreeve opened a new pull request, #47032:
URL: https://github.com/apache/arrow/pull/47032

   ### Rationale for this change
   
   When page indexes are enabled and a column is repeated, this change ensures 
that a Parquet page is written once the buffered data reaches the configured 
page size, while still splitting pages only on record boundaries.
   
   Without this fix, a page can grow without bound until the row group is 
closed.
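
The intended behaviour can be sketched with a small simulation. This is only an illustration, not the Arrow C++ writer: `split_pages`, `value_size`, and `page_size` are hypothetical names, and every value is assumed to occupy a fixed number of bytes. A repetition level of 0 marks the first value of a new record, so a page may only be closed immediately before such a value.

```python
def split_pages(rep_levels, value_size, page_size):
    """Group level indices into pages that never split a record.

    rep_levels: repetition level per value; 0 starts a new record.
    value_size: assumed fixed size in bytes of each buffered value.
    page_size:  configured threshold in bytes for closing a page.
    """
    pages = []
    current = []   # indices buffered for the page being built
    buffered = 0   # bytes buffered for the page being built
    for i, rep in enumerate(rep_levels):
        # Close the page only at a record boundary, and only once
        # the buffered data has reached the configured page size.
        if rep == 0 and current and buffered >= page_size:
            pages.append(current)
            current = []
            buffered = 0
        current.append(i)
        buffered += value_size
    if current:
        pages.append(current)  # final page at row group close
    return pages


rep_levels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
pages = split_pages(rep_levels, value_size=4, page_size=8)
print(pages)
```

A page can exceed the configured size, because the writer has to wait for the next record boundary, but it is closed at the first boundary after the threshold is reached instead of growing until the row group ends.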
   
   ### What changes are included in this PR?
   
   Fixes an off-by-one error in the logic that controls when pages can be written.
   
   ### Are these changes tested?
   
   Yes, added a new unit test.
   
   ### Are there any user-facing changes?
   
   **This PR contains a "Critical Fix".**
   
   This bug could cause a crash when writing a large number of rows of a 
repeated column if a page grew larger than the maximum int32 value.
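
To give a sense of scale, the arithmetic below estimates how many values an unsplit page would need before its size no longer fits in a signed 32-bit integer, which is the type the Parquet page header uses for page sizes. It is a rough sketch that assumes fixed-width values and ignores encoding, compression, and level data; the helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit size field can hold


def fits_in_int32(page_bytes):
    """Return True if a page of this many bytes fits in an int32 size field."""
    return 0 <= page_bytes <= INT32_MAX


def values_until_overflow(value_size):
    """Smallest number of fixed-width values whose total size exceeds INT32_MAX."""
    return INT32_MAX // value_size + 1


n = values_until_overflow(8)  # e.g. 8-byte values such as int64 or double
print(n, fits_in_int32(n * 8), fits_in_int32((n - 1) * 8))
```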

