badalprasadsingh opened a new pull request, #524: URL: https://github.com/apache/iceberg-go/pull/524
# Partitioned Fanout Writer with Rolling Data File Support (Append Mode)

This PR completes the implementation of partitioned writing with support for rolling data files in append mode. It enables efficient, parallelized ingestion into partitioned tables while maintaining manifest and snapshot correctness.

**Slack Thread Discussion**: [Link](https://apache-iceberg.slack.com/archives/C05J3MJ42BD/p1751002533414969)

**Proposal Document**: [Google Drive](https://drive.google.com/file/d/18CwR9nhwkThs-Q-JZZvisBEaDICvp5Z7/view?usp=drive_link)

### Details

* Introduced parallel processing of `arrow.Record` using a user-defined number of goroutines.
* Each goroutine maintains its own hash table to map partition keys to row indices.
* After partitioning, `compute.Take()` is used to extract per-partition data slices.
* Integrated dedicated rolling writers per partition to manage data file size thresholds and output constraints.

### Tests Performed

* [x] Compatible with all partition transforms
* [x] Handled null values in partition columns
* [x] Validated compatibility with partition spec evolution
* [x] Verified correctness for non-linear transformation cases
* [x] Confirmed schema evolution compatibility
* [x] Partition pruning verified

---

@zeroshade, would appreciate your review when you get a chance!
