rdblue commented on code in PR #6323:
URL: https://github.com/apache/iceberg/pull/6323#discussion_r1217003884
##########
python/pyiceberg/table/__init__.py:
##########
@@ -69,21 +72,288 @@
import ray
from duckdb import DuckDBPyConnection
+ from pyiceberg.catalog import Catalog
ALWAYS_TRUE = AlwaysTrue()
+class TableUpdates:
Review Comment:
> Would it be possible to add more tests later once we get the schema,
partitioning, and sorting figured out?
Yes! I think we should limit this to setting and removing properties. Then
we can handle the schema, spec, and write order changes later.
> However, this gets awkward for lists, maps, and structs.
We already handle these cases in the Java API, using the same approach that
you outlined above with methods to make individual changes to the schema that
correspond to SQL DDL operations (e.g. ADD COLUMN). There are [thorough
tests](https://github.com/apache/iceberg/blob/master/core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java#L669-L737)
as well.
> we should do checks on the server side that forbid this kind of operation
I agree that we should do as much as we can, but the client is responsible
for this logic for non-REST catalogs. Plus, even in the REST catalog we can
only validate using `last-column-id`. There are plenty of changes that are
valid, but not what the user intended to do because they don't understand IDs.
For example, if I modify my example above slightly:
```python
catalog = load_catalog(...)
schema1 = Schema(
NestedField(field_id=0, name='id', field_type=IntegerType()),
NestedField(field_id=1, name='data', field_type=LongType()),
schema_id=1
)
schema2 = Schema(
NestedField(field_id=1, name='data', field_type=LongType()),
schema_id=1
)
table = catalog.create_table(identifier='test', schema=schema1)
table.set_schema(schema2).commit()
```
It's clear that the user's intent was to drop the `id` field. But what they
actually requested was to drop `data` (id 2 assigned during table create),
rename `id` to `data` (id 1), and promote `id` from int to long. All of those
changes are perfectly valid, but they're not what the user was trying to do.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]