edgarRd commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792071351
########## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ########## @@ -185,9 +190,17 @@ public List<Long> splitOffsets() { return null; } + private long estimateBufferSize() { + if (totalRowGroupSize == 0 || totalBufferSize == 0) { Review Comment: Should this also fallback to `writeStore.getBufferedSize()` when `currentBufferSize == 0`? Likely it'd be `0` since I don't see other case, but I feel that'd make it consistent on the fallbacks. ########## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ########## @@ -185,9 +190,17 @@ public List<Long> splitOffsets() { return null; } + private long estimateBufferSize() { Review Comment: Rename `estimateBufferSize` to `estimateBufferedSize` for naming consistency. ########## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ########## @@ -185,9 +190,17 @@ public List<Long> splitOffsets() { return null; } + private long estimateBufferSize() { Review Comment: Also, can we add a few comments about the estimation logic? i.e. what the ratio `totalRowGroupSize / totalBufferSize` means as estimating the ratio of size reduction to match closely what will be written out encoded/compressed. ########## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ########## @@ -66,6 +66,9 @@ class ParquetWriter<T> implements FileAppender<T>, Closeable { private boolean closed; private ParquetFileWriter writer; private int rowGroupOrdinal; + private long currentBufferSize = 0; Review Comment: Since this tracks mostly what is being added in memory directly on each `add`, I'd suggest renaming to `currentRawBufferedSize` or something alike. ########## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ########## @@ -66,6 +66,9 @@ class ParquetWriter<T> implements FileAppender<T>, Closeable { private boolean closed; private ParquetFileWriter writer; private int rowGroupOrdinal; + private long currentBufferSize = 0; + private long totalBufferSize = 0; Review Comment: Same here, given this is the sum of the raw count of bytes on each add, perhaps something like `totalRawBufferedSize`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org