Re: [PR] Add ParquetFileMerger for efficient row-group level file merging [iceberg]

via GitHub Mon, 03 Nov 2025 07:33:35 -0800


pvary commented on PR #14435:
URL: https://github.com/apache/iceberg/pull/14435#issuecomment-3481147661


   I have a few concerns here:
   - I would prefer that the decision to use row-group level merging be made
     at the action level, and not leak into the table properties
   - I would prefer to check the requirements as early as possible and then
     either fail, or log and fall back to the normal rewrite if they are not met
   - During planning we can create groups with the expected sizes. In that
     case the runner can rewrite each group as a whole and does not need to
     split the planned groups into files of the expected size
   - Always using HadoopFileIO could be problematic. The catalog might define
     a different FileIO implementation. We should handle this case correctly
     and use the FileIO provided by the Catalog/Table
   - We don't reuse the `ParquetFileMerger` object. In this case, I usually 
prefer to use static methods.
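
To make the second and last points concrete, here is a minimal sketch of an early requirement check with a logged fallback, written as a static-only utility. Everything in it is hypothetical and for illustration only: `RowGroupMergeCheck`, `FileInfo`, `Strategy`, the `rowGroupMergeRequested` flag (standing in for an action-level option) and the two example requirements (matching schema, no encryption) are assumptions, not names or rules taken from the PR or from the Iceberg API.

```java
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch: none of these names come from the PR or from Iceberg.
final class RowGroupMergeCheck {
  private static final Logger LOG = Logger.getLogger(RowGroupMergeCheck.class.getName());

  // Stand-in for the per-file facts a real check would read from file metadata.
  static final class FileInfo {
    final String path;
    final String schemaFingerprint;
    final boolean encrypted;

    FileInfo(String path, String schemaFingerprint, boolean encrypted) {
      this.path = path;
      this.schemaFingerprint = schemaFingerprint;
      this.encrypted = encrypted;
    }
  }

  enum Strategy { ROW_GROUP_MERGE, STANDARD_REWRITE }

  // Static-only utility: no instance state, so nothing to construct or reuse.
  private RowGroupMergeCheck() {}

  // Returns the first unmet requirement, or empty if the group qualifies.
  // The two requirements checked here are assumed examples.
  static Optional<String> unmetRequirement(List<FileInfo> group) {
    if (group.isEmpty()) {
      return Optional.of("group is empty");
    }
    String expected = group.get(0).schemaFingerprint;
    for (FileInfo file : group) {
      if (file.encrypted) {
        return Optional.of("encrypted file: " + file.path);
      }
      if (!expected.equals(file.schemaFingerprint)) {
        return Optional.of("schema mismatch: " + file.path);
      }
    }
    return Optional.empty();
  }

  // Decides once, before any data is touched, and logs why it fell back.
  static Strategy choose(boolean rowGroupMergeRequested, List<FileInfo> group) {
    if (!rowGroupMergeRequested) {
      return Strategy.STANDARD_REWRITE;
    }
    Optional<String> reason = unmetRequirement(group);
    if (reason.isPresent()) {
      LOG.warning("Falling back to standard rewrite: " + reason.get());
      return Strategy.STANDARD_REWRITE;
    }
    return Strategy.ROW_GROUP_MERGE;
  }
}
```

A caller would invoke `RowGroupMergeCheck.choose(...)` once per planned group and branch on the result, so the fallback decision is made before any file is opened for writing.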



