Re: [PR] Support `replace_table` and `replace_table_transaction` [iceberg-python]

via GitHub Mon, 18 May 2026 12:28:21 -0700


smaheshwar-pltr commented on code in PR #3220:
URL: https://github.com/apache/iceberg-python/pull/3220#discussion_r3261177472



##########
pyiceberg/catalog/__init__.py:
##########
@@ -323,6 +328,20 @@ def delete_data_files(io: FileIO, manifests_to_delete: 
list[ManifestFile]) -> No
                 deleted_files[path] = True
 
 
+def _raise_if_view_exists(catalog: Catalog, identifier: str | Identifier) -> 
None:

Review Comment:
   Review: Do we need to be introducing this now? The Java implementation does 
this in other methods too right so maybe it's worth not doing this here but 
separating into a follow-up where we port this over everywhere?



##########
pyiceberg/catalog/__init__.py:
##########
@@ -444,6 +463,136 @@ def create_table_if_not_exists(
         except TableAlreadyExistsError:
             return self.load_table(identifier)
 
+    def replace_table(
+        self,
+        identifier: str | Identifier,
+        schema: Schema | pa.Schema,
+        location: str | None = None,
+        partition_spec: PartitionSpec = UNPARTITIONED_PARTITION_SPEC,
+        sort_order: SortOrder = UNSORTED_SORT_ORDER,
+        properties: Properties = EMPTY_DICT,
+    ) -> Table:
+        """Atomically replace a table's schema, spec, sort order, location, 
and properties.
+
+        The table UUID and history (snapshots, schemas, specs, sort orders) 
are preserved.
+        The current snapshot is cleared (main branch ref is removed).
+
+        Args:
+            identifier (str | Identifier): Table identifier.
+            schema (Schema): New table schema.
+            location (str | None): New table location. Defaults to the 
existing location.
+            partition_spec (PartitionSpec): New partition spec.
+            sort_order (SortOrder): New sort order.
+            properties (Properties): Properties to apply. Merged on top of the 
existing
+                table properties: keys present here override existing values; 
existing keys
+                not present here are preserved. To remove a property, follow 
up with a
+                transaction that removes it explicitly.
+
+        Returns:
+            Table: the replaced table instance.
+
+        Raises:
+            NoSuchTableError: If the table does not exist.
+            TableAlreadyExistsError: If a view exists at the same identifier.
+        """
+        return self.replace_table_transaction(
+            identifier, schema, location, partition_spec, sort_order, 
properties
+        ).commit_transaction()
+
+    def replace_table_transaction(

Review Comment:
   Review: I don't understand this, can you elaborate or ditch the comment if 
not helpful?



##########
tests/test_schema.py:
##########
@@ -1815,3 +1816,77 @@ def 
test_check_schema_compatible_optional_map_field_present() -> None:
     )
     # Should not raise - schemas match
     _check_schema_compatible(requested_schema, provided_schema)
+
+
[email protected](
+    "new_fields, expected_ids, expected_last_col_id",
+    [
+        # All columns reused by name: IDs come from base, last_column_id 
unchanged.
+        ([("id", IntegerType()), ("data", StringType())], [1, 2], 2),
+        # Mix of reused and new: new column gets ID above last_column_id.
+        ([("id", IntegerType()), ("data", StringType()), ("new_col", 
BooleanType())], [1, 2, 3], 3),
+        # No column names match: all fresh IDs starting from last_column_id + 
1.
+        ([("x", IntegerType()), ("y", IntegerType())], [3, 4], 4),
+    ],
+    ids=[
+        "all-reused-keeps-last-col-id",
+        "new-field-bumps-last-col-id",
+        "no-name-overlap-bumps-from-base",
+    ],
+)
+def test_assign_fresh_schema_ids_for_replace_primitive_fields(
+    new_fields: list[tuple[str, IcebergType]], expected_ids: list[int], 
expected_last_col_id: int
+) -> None:
+    """Replace schema field IDs are reused from the base schema by name; new 
fields get IDs above last_column_id."""
+    base_schema = Schema(
+        NestedField(field_id=1, name="id", field_type=IntegerType(), 
required=False),
+        NestedField(field_id=2, name="data", field_type=StringType(), 
required=False),
+    )
+    new_schema = Schema(
+        *(
+            NestedField(field_id=10 * (i + 1), name=name, 
field_type=field_type, required=False)
+            for i, (name, field_type) in enumerate(new_fields)
+        )
+    )
+    fresh, last_col_id = assign_fresh_schema_ids_for_replace(new_schema, 
base_schema, 2)
+    assert [f.field_id for f in fresh.fields] == expected_ids
+    assert last_col_id == expected_last_col_id
+
+
+def test_assign_fresh_schema_ids_for_replace_with_nested_struct() -> None:
+    """Test that nested struct field IDs are reused by full path name."""
+    base_schema = Schema(
+        NestedField(field_id=1, name="id", field_type=IntegerType(), 
required=False),
+        NestedField(
+            field_id=2,
+            name="location",
+            field_type=StructType(
+                NestedField(field_id=3, name="lat", field_type=FloatType(), 
required=False),
+                NestedField(field_id=4, name="lon", field_type=FloatType(), 
required=False),
+            ),
+            required=False,
+        ),
+    )
+    new_schema = Schema(
+        NestedField(field_id=10, name="id", field_type=IntegerType(), 
required=False),
+        NestedField(
+            field_id=20,
+            name="location",
+            field_type=StructType(
+                NestedField(field_id=30, name="lat", field_type=FloatType(), 
required=False),
+                NestedField(field_id=40, name="lon", field_type=FloatType(), 
required=False),
+                NestedField(field_id=50, name="alt", field_type=FloatType(), 
required=False),
+            ),
+            required=False,
+        ),
+    )
+    fresh, last_col_id = assign_fresh_schema_ids_for_replace(new_schema, 
base_schema, 4)
+    assert fresh.fields[0].field_id == 1  # id reused
+    assert fresh.fields[1].field_id == 2  # location reused
+    loc_fields = fresh.fields[1].field_type.fields
+    assert loc_fields[0].field_id == 3  # location.lat reused
+    assert loc_fields[1].field_id == 4  # location.lon reused
+    assert loc_fields[2].field_id == 5  # location.alt is new
+    assert last_col_id == 5
+
+

Review Comment:
   Review: this looks like it'll break formatting. Please format the code and 
check format / lint passes locally



##########
mkdocs/docs/api.md:
##########
@@ -185,6 +185,45 @@ with 
catalog.create_table_transaction(identifier="docs_example.bids", schema=sch
     txn.set_properties(test_a="test_aa", test_b="test_b", test_c="test_c")
 ```
 
+## Replace a table
+
+Atomically replace an existing table's schema, partition spec, sort order, 
location, and properties. The table UUID and history (snapshots, schemas, 
specs, sort orders, metadata log) are preserved; the current snapshot is 
cleared (the `main` branch ref is removed). Use this when you want to redefine 
the table's metadata; pair it with `replace_table_transaction` to atomically 
write new data alongside the metadata change (RTAS-style).

Review Comment:
   Review:
   
   ```suggestion
   Atomically replace an existing table's schema, partition spec, sort order, 
location, and properties. The table UUID and history (snapshots, schemas, 
specs, sort orders, metadata log) are preserved; the current snapshot is 
cleared (the `main` branch ref is removed). `replace_table` redefines the table 
in this way; `replace_table_transaction` allows for new data to be written 
alongside this change to permit RTAS (replace-table-as-select) workflows.
   ```
   
   or something, you'll need to imrpove my wording 



##########
mkdocs/docs/api.md:
##########
@@ -185,6 +185,45 @@ with 
catalog.create_table_transaction(identifier="docs_example.bids", schema=sch
     txn.set_properties(test_a="test_aa", test_b="test_b", test_c="test_c")
 ```
 
+## Replace a table
+
+Atomically replace an existing table's schema, partition spec, sort order, 
location, and properties. The table UUID and history (snapshots, schemas, 
specs, sort orders, metadata log) are preserved; the current snapshot is 
cleared (the `main` branch ref is removed). Use this when you want to redefine 
the table's metadata; pair it with `replace_table_transaction` to atomically 
write new data alongside the metadata change (RTAS-style).
+
+```python
+from pyiceberg.schema import Schema
+from pyiceberg.types import NestedField, LongType, StringType, BooleanType
+
+new_schema = Schema(
+    NestedField(field_id=1, name="datetime", field_type=LongType(), 
required=False),
+    NestedField(field_id=2, name="symbol", field_type=StringType(), 
required=False),
+    NestedField(field_id=3, name="active", field_type=BooleanType(), 
required=False),
+)
+catalog.replace_table(
+    identifier="docs_example.bids",
+    schema=new_schema,
+)
+```
+
+Field IDs from columns whose names appear in the previous schema are reused, 
so existing data files remain readable when the new schema is a compatible 
superset. New columns get fresh IDs above `last-column-id`.
+
+Properties passed to `replace_table` are **merged** with the existing table 
properties (your values override; existing keys you don't pass are preserved). 
To remove a property as part of the replace, use `replace_table_transaction` 
and remove it explicitly within the transaction.
+
+Use `replace_table_transaction` to stage additional changes (writes, property 
updates, schema evolution) before committing — for example, swap the schema and 
write new data atomically:
+
+```python
+with catalog.replace_table_transaction(identifier="docs_example.bids", 
schema=new_schema) as txn:
+    with txn.update_snapshot().fast_append() as snap:
+        for data_file in 
_dataframe_to_data_files(table_metadata=txn.table_metadata, df=df, 
io=txn._table.io):
+            snap.append_data_file(data_file)
+    txn.set_properties(write_replaced_at="2026-04-19T00:00:00Z")

Review Comment:
   Review: this + tests feels too verbose for RTAS, no? Surely `txn.append` is 
just what we want here? Can we not make those changes, why are we being this 
verbose?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Support `replace_table` and `replace_table_transaction` [iceberg-python]

Reply via email to