rdblue commented on code in PR #6323:
URL: https://github.com/apache/iceberg/pull/6323#discussion_r1217003884


##########
python/pyiceberg/table/__init__.py:
##########
@@ -69,21 +72,288 @@
     import ray
     from duckdb import DuckDBPyConnection
 
+    from pyiceberg.catalog import Catalog
 
 ALWAYS_TRUE = AlwaysTrue()
 
 
+class TableUpdates:

Review Comment:
   > Would it be possible to add more tests later once we get the schema, 
partitioning, and sorting figured out?
   
   Yes! I think we should limit this to setting and removing properties. Then 
we can handle the schema, spec, and write order changes later.
   
   > However, this gets awkward for lists, maps, and structs.
   
   We already handle these cases in the Java API, using the same approach that 
you outlined above with methods to make individual changes to the schema that 
correspond to SQL DDL operations (e.g. ADD COLUMN). There are [thorough 
tests](https://github.com/apache/iceberg/blob/master/core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java#L669-L737)
 as well.
   
   > we should do checks on the server side that forbid this kind of operation
   
   I agree that we should do as much as we can, but the client is responsible 
for this logic for non-REST catalogs. Plus, even in the REST catalog we can 
only validate using `last-column-id`. There are plenty of changes that are 
valid, but not what the user intended to do because they don't understand IDs.
   
   For example, if I modify my example above slightly:
   
   ```python
   catalog = load_catalog(...)
   
   schema1 = Schema(
       NestedField(field_id=0, name='id', field_type=IntegerType()),
       NestedField(field_id=1, name='data', field_type=LongType()),
       schema_id=1
   )
   schema2 = Schema(
       NestedField(field_id=1, name='data', field_type=LongType()),
       schema_id=1
   )
   
   table = catalog.create_table(identifier='test', schema=schema1)
   table.set_schema(schema2).commit()
   ```
   
   It's clear that the user's intent was to drop the `id` field. But what they 
actually requested was to drop `data` (id 2 assigned during table create), 
rename `id` to `data` (id 1), and promote `id` from int to long. All of those 
changes are perfectly valid, but they're not what the user was trying to do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to