fqaiser94 opened a new pull request, #9860:
URL: https://github.com/apache/iceberg/pull/9860

   # What is the problem?
   
   Currently, the `table.newAppend()` API expects every DataFile passed to `.appendFile()` to use the same PartitionSpec. 
   Failing to do so [raises](https://github.com/apache/iceberg/blob/1a4f23bc0e6cda520ca815f2a245f5f21bfbc24f/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L258) a `ValidationException("Invalid data file, expected spec id: %d", dataSpec.specId())`. 
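
To make the failure mode concrete, here is a toy model of the check described above. It is only a sketch, not the Iceberg implementation: the class `SingleSpecAppendModel`, its methods, and the use of `IllegalStateException` in place of `ValidationException` are all hypothetical, and it assumes that the first file added determines the expected spec id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a single-spec append; not Iceberg code.
final class SingleSpecAppendModel {
  // Assumption: the first appended file fixes the expected spec id.
  private Integer expectedSpecId = null;
  private final List<String> paths = new ArrayList<>();

  SingleSpecAppendModel appendFile(String path, int specId) {
    if (expectedSpecId == null) {
      expectedSpecId = specId;
    } else if (expectedSpecId != specId) {
      // Mirrors the message quoted above; the exception type is a stand-in.
      throw new IllegalStateException(
          String.format("Invalid data file, expected spec id: %d", expectedSpecId));
    }
    paths.add(path);
    return this;
  }

  int fileCount() {
    return paths.size();
  }
}
```

In this model, appending two files with spec id 0 succeeds, while a third file with spec id 1 is rejected before it is recorded.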
   
   Correct me if I'm wrong, but the Iceberg spec doesn't seem to impose any such restriction. 
   The only related restriction I could find is in the [manifests section](https://iceberg.apache.org/spec/#manifests), which says: 
   
   > A manifest stores files for a single partition spec. 
   
   We can easily satisfy this restriction by writing multiple manifests, one for each partition spec among the files being appended. 
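
The grouping step behind "one manifest per spec" could be sketched roughly as follows. This is a self-contained illustration rather than the code in this PR: `SpecGrouping`, `PendingFile`, and `groupBySpecId` are hypothetical names, and a real implementation would work with Iceberg's `DataFile` objects and write an actual manifest for each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: bucket pending files by partition spec id so that
// each bucket could be written to its own manifest.
final class SpecGrouping {
  // Stand-in for a data file; only the fields needed for grouping.
  static final class PendingFile {
    final String path;
    final int specId;

    PendingFile(String path, int specId) {
      this.path = path;
      this.specId = specId;
    }
  }

  // Returns spec id -> files with that spec, ordered by spec id.
  static Map<Integer, List<PendingFile>> groupBySpecId(List<PendingFile> files) {
    Map<Integer, List<PendingFile>> bySpec = new TreeMap<>();
    for (PendingFile file : files) {
      bySpec.computeIfAbsent(file.specId, id -> new ArrayList<>()).add(file);
    }
    return bySpec;
  }
}
```

Each entry of the returned map then corresponds to one manifest, so every manifest still stores files for a single partition spec.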
   
   # Why is this change needed/valuable? 
   In the iceberg-kafka-connect project, we've seen that when users evolve a table's PartitionSpec, DataFiles written with different PartitionSpecs are often in flight at the same time. Committing these DataFiles together as part of the same snapshot is then impossible because of the aforementioned `ValidationException`. 
   
   While we could work around this by committing DataFiles with different PartitionSpecs as separate snapshots, doing so makes it complex for us to correctly associate valuable (watermarking) metadata with each snapshot in the snapshot properties. In addition, it makes the table's snapshot history unnecessarily long. It would be better if we could avoid these issues. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

