fqaiser94 opened a new pull request, #9860: URL: https://github.com/apache/iceberg/pull/9860
# What is the problem?

Currently the `table.newAppend()` API expects every DataFile passed to `.appendFile()` to share the same PartitionSpec. Otherwise it [raises](https://github.com/apache/iceberg/blob/1a4f23bc0e6cda520ca815f2a245f5f21bfbc24f/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L258) a `ValidationException("Invalid data file, expected spec id: %d", dataSpec.specId())`.

CMIIW, but the Iceberg spec doesn't seem to impose any such restriction. The only related restriction I could find is in the [manifests section](https://iceberg.apache.org/spec/#manifests), which says:

> A manifest stores files for a single partition spec.

We can easily work around this by writing multiple manifests, one for each spec for which files are being appended.

# Why is this change needed/valuable?

In the iceberg-kafka-connect project, we've seen that when users evolve the PartitionSpec of a table, DataFiles with different PartitionSpecs are often in flight at the same time, and committing them together in the same snapshot is impossible because of the aforementioned `ValidationException`.

While we could work around this by committing DataFiles with different PartitionSpecs as separate snapshots, that makes it complex for us to correctly associate valuable (watermarking) metadata with each snapshot in the snapshot properties. It also makes the table's snapshot history unnecessarily long. It would be better if we could avoid these issues.
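To make the "one manifest per spec" idea concrete, here is a minimal sketch of the grouping step only. It does not use Iceberg's real classes: `SpecGrouping`, `PendingFile`, and `groupBySpecId` are hypothetical stand-ins made up for illustration, and the actual change in `MergingSnapshotProducer` may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in types only; none of these names are Iceberg classes.
final class SpecGrouping {

  // Hypothetical placeholder for a data file: a path plus its partition spec id.
  static final class PendingFile {
    final String path;
    final int specId;

    PendingFile(String path, int specId) {
      this.path = path;
      this.specId = specId;
    }
  }

  private SpecGrouping() {}

  // Buckets files by spec id so that each bucket could back its own manifest.
  // A TreeMap keeps the spec ids in ascending order for deterministic iteration.
  static Map<Integer, List<PendingFile>> groupBySpecId(List<PendingFile> files) {
    Map<Integer, List<PendingFile>> bySpec = new TreeMap<>();
    for (PendingFile file : files) {
      bySpec.computeIfAbsent(file.specId, id -> new ArrayList<>()).add(file);
    }
    return bySpec;
  }
}
```

Each entry of the returned map would then correspond to one manifest in the same snapshot, which keeps the spec's "single partition spec per manifest" rule intact while still producing a single commit.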
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org