rdblue commented on code in PR #6323:
URL: https://github.com/apache/iceberg/pull/6323#discussion_r1209504044


##########
python/pyiceberg/table/__init__.py:
##########
@@ -69,21 +72,288 @@
     import ray
     from duckdb import DuckDBPyConnection
 
+    from pyiceberg.catalog import Catalog
 
 ALWAYS_TRUE = AlwaysTrue()
 
 
+class TableUpdates:

Review Comment:
   I think the rest of this PR is about ready, my remaining concern is this 
API. I don't think that we want to expose methods that directly set schema, 
partitioning, or other table metadata. Schema is a great example because people 
are going to really mess up the field IDs. We don't want to encourage that 
through the API.
   
   For example, this is going to break:
   
   ```python
   catalog = load_catalog(...)
   
   schema1 = Schema(
       NestedField(field_id=0, name='id', field_type=LongType()),
       schema_id=1
   )
   schema2 = Schema(
       NestedField(field_id=0, name='id', field_type=LongType()),
       NestedField(field_id=1, name='data', field_type=StringType()),
       schema_id=1
   )
   
   table = catalog.create_table(identifier='test', schema=schema1)
   table.set_schema(schema2).commit()
   ```
   
   The table schema is incompatible with the second schema because the field 
IDs are reassigned.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to