huaxingao commented on PR #14435:
URL: https://github.com/apache/iceberg/pull/14435#issuecomment-3479034901
Thanks @shangxinli for the PR! At a high level, leveraging Parquet's appendFile for row-group merging is the right approach and a performance win. Making it opt-in via an action option and a table property is appropriate.

A couple of areas I'd like to discuss:

- IO integration: Would it make sense to route IO through table.io()/OutputFileFactory rather than Hadoop IO?
- Executor/driver split: Should executors only write files and return locations/sizes, with DataFiles (and metrics) constructed on the driver?

I'd also like to get others' opinions. @pvary @amogh-jahagirdar @nastra @singhpk234
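To make the two questions concrete, here is a rough sketch of the split I have in mind. Everything in it is hypothetical and illustrative only: `FileIOLike`, `InMemoryFileIO`, `mergeOnExecutor`, `buildOnDriver` and the `mem://` paths are not Iceberg or Parquet APIs. The interface merely stands in for `table.io()`, and plain byte concatenation stands in for Parquet's appendFile row-group copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Iceberg or Parquet APIs.
public class MergeSplitSketch {

  // Stand-in for table.io(): the only IO surface executors touch (no Hadoop FileSystem).
  interface FileIOLike {
    void write(String location, byte[] contents);

    long length(String location);
  }

  // In-memory implementation used here instead of a real object store.
  static final class InMemoryFileIO implements FileIOLike {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void write(String location, byte[] contents) {
      files.put(location, contents.clone());
    }

    @Override
    public long length(String location) {
      return files.get(location).length;
    }
  }

  // What an executor sends back: only where the file is and how big it is.
  record WrittenFile(String location, long sizeInBytes) {}

  // What the driver builds afterwards (stand-in for a DataFile plus its metrics).
  record FileMetadata(String location, long sizeInBytes, String format) {}

  // Executor side: merge the inputs (byte concatenation stands in for row-group appends)
  // and write the result through the IO abstraction.
  static WrittenFile mergeOnExecutor(FileIOLike io, String location, List<byte[]> inputs) {
    int total = 0;
    for (byte[] in : inputs) {
      total += in.length;
    }
    byte[] merged = new byte[total];
    int offset = 0;
    for (byte[] in : inputs) {
      System.arraycopy(in, 0, merged, offset, in.length);
      offset += in.length;
    }
    io.write(location, merged);
    return new WrittenFile(location, io.length(location));
  }

  // Driver side: turn the lightweight results into table metadata in one place.
  // In the real change this is presumably where DataFiles.builder(spec) and
  // footer-derived metrics would be assembled.
  static List<FileMetadata> buildOnDriver(List<WrittenFile> written) {
    List<FileMetadata> out = new ArrayList<>();
    for (WrittenFile f : written) {
      out.add(new FileMetadata(f.location(), f.sizeInBytes(), "PARQUET"));
    }
    return out;
  }

  public static void main(String[] args) {
    FileIOLike io = new InMemoryFileIO();
    // Two "tasks", each merging its own group of input files.
    List<WrittenFile> written = List.of(
        mergeOnExecutor(io, "mem://table/data/merged-0.parquet", List.of(new byte[] {1, 2}, new byte[] {3})),
        mergeOnExecutor(io, "mem://table/data/merged-1.parquet", List.of(new byte[] {4, 5, 6, 7})));
    for (FileMetadata m : buildOnDriver(written)) {
      System.out.println(m);
    }
  }
}
```

The point of the shape is that executors only ship small (location, size) pairs back, so DataFile and metrics construction happens in a single place on the driver, and executors never need Hadoop-specific IO.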
