sqd opened a new pull request, #16317:
URL: https://github.com/apache/iceberg/pull/16317

   parquet-java (formerly parquet-mr) supports a per-row-group row-count limit
(config key parquet.block.row.count.limit), but Iceberg does not yet provide a
way to set it. This commit wires it up.
   
   Parquet.WriteBuilder has two terminal paths. When callers supply a
createWriterFunc (Iceberg's ParquetValueWriter, the path every production
engine integration uses, e.g. Spark's rewrite_data_files), the builder returns
Iceberg's own ParquetWriter, which manages the row-group lifecycle itself and
ignores parquet-mr's automatic row-group rolling. When callers supply a
WriteSupport instead, the builder delegates to parquet-mr's ParquetWriter,
which enforces row-group limits internally.
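   The two terminal paths can be pictured with the hypothetical sketch below.
WriteBuilderSketch, WriterPath and BuiltWriter are illustrative names, not
Iceberg's API; the Object-typed fields are placeholders for ParquetValueWriter
and WriteSupport, and the Integer.MAX_VALUE "no limit" default is an
assumption. The point it shows is that the same row-count limit must reach
whichever branch build() takes.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Which terminal path the hypothetical builder takes.
enum WriterPath { ICEBERG_PARQUET_WRITER, PARQUET_MR_WRITER }

// Result of build(): the chosen path plus the row-count limit handed to it.
final class BuiltWriter {
  final WriterPath path;
  final int rowGroupRowCountLimit;

  BuiltWriter(WriterPath path, int rowGroupRowCountLimit) {
    this.path = path;
    this.rowGroupRowCountLimit = rowGroupRowCountLimit;
  }
}

// Hypothetical stand-in for Parquet.WriteBuilder (not the real signatures).
final class WriteBuilderSketch {
  private Supplier<Object> createWriterFunc;              // placeholder for the ParquetValueWriter factory
  private Object writeSupport;                            // placeholder for parquet-mr's WriteSupport
  private int rowGroupRowCountLimit = Integer.MAX_VALUE;  // assumed "no limit" default

  WriteBuilderSketch createWriterFunc(Supplier<Object> func) {
    this.createWriterFunc = func;
    return this;
  }

  WriteBuilderSketch writeSupport(Object support) {
    this.writeSupport = support;
    return this;
  }

  WriteBuilderSketch rowGroupRowCountLimit(int limit) {
    if (limit <= 0) {
      throw new IllegalArgumentException("row-group row-count limit must be positive");
    }
    this.rowGroupRowCountLimit = limit;
    return this;
  }

  BuiltWriter build() {
    if (createWriterFunc != null) {
      // Iceberg path: Iceberg's own writer must enforce the limit itself.
      return new BuiltWriter(WriterPath.ICEBERG_PARQUET_WRITER, rowGroupRowCountLimit);
    }
    Objects.requireNonNull(writeSupport, "either createWriterFunc or writeSupport is required");
    // parquet-mr path: the limit is forwarded and parquet-mr rolls row groups.
    return new BuiltWriter(WriterPath.PARQUET_MR_WRITER, rowGroupRowCountLimit);
  }
}
```

If the limit were forwarded on only one branch, writers built on the other
branch would silently ignore it.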
   
   The new property has to be wired into both paths: on the Iceberg path it is
carried via ParquetProperties and enforced by an explicit recordCount check in
ParquetWriter.checkSize(); on the parquet-mr path it is passed through
ParquetWriteBuilder.withRowGroupRowCountLimit() and enforced by parquet-mr
itself.
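   The effect of a row-count check next to the size check can be sketched
roughly as follows. This is a simplified, hypothetical model, not Iceberg's
ParquetWriter: RowGroupRollingWriter and its fields are illustrative names,
row sizes are passed in directly, recordCount is assumed to count rows in the
current row group, and the check runs after every record (the real writer
estimates buffered size and may check less often).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: closes the current row group once either the buffered
// byte size or the buffered row count reaches its limit.
final class RowGroupRollingWriter {
  private final long targetRowGroupSizeBytes;
  private final int rowGroupRowCountLimit;
  private long bufferedBytes = 0;
  private long recordCount = 0; // rows buffered in the current (open) row group
  private final List<Long> rowGroupRowCounts = new ArrayList<>();

  RowGroupRollingWriter(long targetRowGroupSizeBytes, int rowGroupRowCountLimit) {
    if (targetRowGroupSizeBytes <= 0 || rowGroupRowCountLimit <= 0) {
      throw new IllegalArgumentException("limits must be positive");
    }
    this.targetRowGroupSizeBytes = targetRowGroupSizeBytes;
    this.rowGroupRowCountLimit = rowGroupRowCountLimit;
  }

  // Buffers one row of the given (pre-computed) size, then checks the limits.
  void add(long rowSizeBytes) {
    bufferedBytes += rowSizeBytes;
    recordCount += 1;
    checkSize();
  }

  // Explicit row-count check alongside the byte-size check.
  private void checkSize() {
    if (recordCount >= rowGroupRowCountLimit || bufferedBytes >= targetRowGroupSizeBytes) {
      flushRowGroup();
    }
  }

  // Records the finished row group and resets the per-row-group counters.
  private void flushRowGroup() {
    if (recordCount > 0) {
      rowGroupRowCounts.add(recordCount);
      recordCount = 0;
      bufferedBytes = 0;
    }
  }

  // Flushes any remaining rows and returns the row count of each row group.
  List<Long> close() {
    flushRowGroup();
    return new ArrayList<>(rowGroupRowCounts);
  }
}
```

For example, with a row-count limit of 3 and a byte target that is never
reached, writing 7 rows yields row groups of 3, 3 and 1 rows.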
