Re: [PR] Add ParquetFileMerger for efficient row-group level file merging [iceberg]

via GitHub Sun, 02 Nov 2025 22:10:38 -0800


huaxingao commented on PR #14435:
URL: https://github.com/apache/iceberg/pull/14435#issuecomment-3479034901


   Thanks @shangxinli for the PR! At a high level, leveraging Parquet’s 
appendFile for row‑group merging is the right approach and a performance win. 
Making it opt‑in via an action option and a table property is appropriate. 
   
   A couple of areas I’d like to discuss:
   
   - IO integration: Would it make sense to route IO through 
table.io()/OutputFileFactory rather than Hadoop IO?
   - Executor/driver split: Should executors only write files and return 
locations/sizes, with DataFiles (and metrics) constructed on the driver?
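
   To make the second question concrete, here is a minimal sketch of that split, not the PR's actual code. `WrittenFile`, `DataFileStub`, `mergeOnExecutor`, and `buildOnDriver` are hypothetical names invented for illustration. The sketch uses plain Java stand-ins rather than Iceberg's `DataFile` or Spark, and the size calculation is a placeholder, not how a merged Parquet file's size is determined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DriverExecutorSplitSketch {

    // Hypothetical lightweight result that an executor task returns to the driver.
    static final class WrittenFile {
        final String location;
        final long sizeInBytes;

        WrittenFile(String location, long sizeInBytes) {
            this.location = location;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Hypothetical stand-in for the table-level file metadata built on the driver.
    static final class DataFileStub {
        final String path;
        final long fileSizeInBytes;
        final String format;

        DataFileStub(String path, long fileSizeInBytes, String format) {
            this.path = path;
            this.fileSizeInBytes = fileSizeInBytes;
            this.format = format;
        }
    }

    // Executor side: write the merged file and report only where it is and how big.
    // Placeholder: the real size would come from the writer, not from summing inputs.
    static WrittenFile mergeOnExecutor(String outputLocation, List<Long> inputSizes) {
        long total = 0L;
        for (long size : inputSizes) {
            total += size;
        }
        return new WrittenFile(outputLocation, total);
    }

    // Driver side: turn the lightweight results into file metadata objects.
    static List<DataFileStub> buildOnDriver(List<WrittenFile> written) {
        List<DataFileStub> result = new ArrayList<>();
        for (WrittenFile file : written) {
            result.add(new DataFileStub(file.location, file.sizeInBytes, "PARQUET"));
        }
        return result;
    }

    public static void main(String[] args) {
        WrittenFile written =
            mergeOnExecutor("s3://bucket/data/merged-0.parquet", Arrays.asList(10L, 20L));
        for (DataFileStub file : buildOnDriver(Arrays.asList(written))) {
            System.out.println(file.path + " " + file.fileSizeInBytes + " " + file.format);
        }
    }
}
```

   In this shape, executor tasks need to serialize only a location and a size, and all metadata construction sits in one place on the driver.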
   
   I’d also like to get others’ opinions. @pvary @amogh-jahagirdar @nastra 
@singhpk234 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

