Fokko opened a new pull request, #12161:
URL: https://github.com/apache/iceberg/pull/12161

   I was looking into adding support for `source-ids` to PyIceberg, but noticed 
that Java support is still missing as well.
   
   I also noticed that `source-ids` were backported to V1 and V2 tables, 
which surprised me, since this might break existing V2 implementations that 
are unaware of `source-ids`.
   
   This PR reconsiders https://github.com/apache/iceberg/pull/9661, and more 
specifically this mailing list thread: 
https://lists.apache.org/thread/9opgkrpqhzp3nl8hdohgnk1m1zxnxmq0
   
   It would be better to allow multi-argument transforms only from V3 onward, 
instead of having some implementations support them in V1/V2 by setting a 
flag. Other implementations might not be aware of this feature and would drop 
everything from the second argument onward. For example, take this partition 
field:
   
   ```json
   {
     "source-id": 19,
     "source-ids": [19, 25],
     "field-id": 1000,
     "name": "ts_bucket",
     "transform": "bucket"
   }
   ```
   
   A V2 implementation that is unaware of `source-ids` (PyIceberg, 
Iceberg-Rust, and others) would write it back as:
   
   ```json
   {
     "source-id": 19,
     "field-id": 1000,
     "name": "ts_bucket",
     "transform": "bucket"
   }
   ```
   
   This silently breaks the partitioning: the field now looks like a 
single-argument bucket on source field 19 only 😱 
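   To make the failure mode concrete, here is a minimal Python sketch, assuming a V2-only implementation that models just the keys it knows (`source-id`, `field-id`, `name`, `transform`) and drops everything else when it rewrites a partition field. `v2_round_trip` and `validate_partition_field` are hypothetical helpers for illustration, not PyIceberg or Java APIs; the second one sketches the proposed rule of rejecting multi-argument fields below format version 3.

```python
import json

# Assumption: a V2-only implementation that only models these partition-field
# keys and silently drops any other key when it re-serializes the field.
V2_KNOWN_KEYS = ("source-id", "field-id", "name", "transform")


def v2_round_trip(field: dict) -> dict:
    """Hypothetical: re-serialize a partition field keeping only V2 keys."""
    return {key: field[key] for key in V2_KNOWN_KEYS if key in field}


def validate_partition_field(field: dict, format_version: int) -> None:
    """Hypothetical check for the proposed rule: a partition field with more
    than one entry in `source-ids` is only valid from format version 3 on."""
    source_ids = field.get("source-ids", [])
    if len(source_ids) > 1 and format_version < 3:
        raise ValueError(
            f"multi-argument transform on {source_ids} requires format "
            f"version 3, got {format_version}"
        )


if __name__ == "__main__":
    # Same partition field as in the first JSON example.
    field = {
        "source-id": 19,
        "source-ids": [19, 25],
        "field-id": 1000,
        "name": "ts_bucket",
        "transform": "bucket",
    }

    # The second argument (source field 25) disappears on rewrite.
    print(json.dumps(v2_round_trip(field)))
    # {"source-id": 19, "field-id": 1000, "name": "ts_bucket", "transform": "bucket"}

    validate_partition_field(field, format_version=3)  # accepted
    try:
        validate_partition_field(field, format_version=2)
    except ValueError as err:
        print(f"rejected: {err}")
```

   Gating on the format version rather than on a flag means a V1/V2 table can never contain a multi-argument field that an unaware implementation could truncate this way.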
   
   cc @rdblue @szehon-ho @advancedxy @jbonofre 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org