Re: [PR] InMemory Catalog Implementation [iceberg-python]

via GitHub Sun, 21 Jan 2024 22:09:59 -0800


HonahX commented on code in PR #289:
URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1461380205



##########
pyiceberg/table/__init__.py:
##########
@@ -504,6 +504,12 @@ def _(update: AddSchemaUpdate, base_metadata: TableMetadata, context: _TableMeta
     if update.last_column_id < base_metadata.last_column_id:
         raise ValueError(f"Invalid last column id {update.last_column_id}, must be >= {base_metadata.last_column_id}")
 
+    # `update.schema_.schema_id` should be the last_schema_id + 1
+    last_schema_id = max(schema.schema_id for schema in base_metadata.schemas)
+    next_schema_id = last_schema_id + 1
+    new_schema = update.schema_.model_copy(update={"schema_id": next_schema_id})
+    update = update.model_copy(update={"schema_": new_schema})

Review Comment:
   The `AddSchemaUpdate` should already contain the new schema, with the changes applied and the schema-id incremented. In practice, we trust the `update_schema` API to give us the correct one, as I mentioned in this [comment](https://github.com/apache/iceberg-python/issues/290#issuecomment-1903304501).
   
   Since this PR already updates `_commit_table` for the InMemory Catalog, I think we do not need to increment the schema-id again here.
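A minimal sketch of the point above, using hypothetical stand-in types rather than pyiceberg's `Schema` or `AddSchemaUpdate`. The hypothetical `build_add_schema` plays the role of the `update_schema` API and already assigns `max(schema_id) + 1`; the hypothetical `apply_add_schema` either trusts that id or, with `reassign=True`, repeats the increment the diff proposes. It assumes both steps see the same base metadata.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SchemaStub:
    """Hypothetical stand-in for a table schema; only the id matters here."""

    schema_id: int


def next_schema_id(schemas: List[SchemaStub]) -> int:
    # Same rule as the diff: one past the highest schema id in the base metadata.
    return max(s.schema_id for s in schemas) + 1


def build_add_schema(schemas: List[SchemaStub]) -> SchemaStub:
    # Hypothetical producer in the role of the `update_schema` API:
    # the schema it emits already carries the incremented id.
    return SchemaStub(schema_id=next_schema_id(schemas))


def apply_add_schema(
    schemas: List[SchemaStub], new: SchemaStub, reassign: bool = False
) -> List[SchemaStub]:
    # reassign=True mimics the extra increment proposed in the diff;
    # the default trusts the producer and appends the schema as-is.
    if reassign:
        new = replace(new, schema_id=next_schema_id(schemas))
    return schemas + [new]


base = [SchemaStub(schema_id=0)]
update = build_add_schema(base)
trusted = apply_add_schema(base, update)
reassigned = apply_add_schema(base, update, reassign=True)
print([s.schema_id for s in trusted], [s.schema_id for s in reassigned])
```

Both paths print `[0, 1]`: re-deriving the id in the handler only repeats what the producer already did, which is why the extra step is redundant once the producer is trusted.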



##########
tests/catalog/test_base.py:
##########
@@ -572,6 +379,11 @@ def test_commit_table(catalog: InMemoryCatalog) -> None:
         NestedField(4, "add", LongType()),
     )

Review Comment:
   ```suggestion
       schema_id=1
       )
   ```
   



##########
pyiceberg/catalog/in_memory.py:
##########
@@ -0,0 +1,222 @@
+import uuid
+from typing import (
+    Dict,
+    List,
+    Optional,
+    Set,
+    Union,
+)
+
+from pyiceberg.catalog import (
+    Catalog,
+    Identifier,
+    Properties,
+    PropertiesUpdateSummary,
+)
+from pyiceberg.exceptions import (
+    NamespaceAlreadyExistsError,
+    NamespaceNotEmptyError,
+    NoSuchNamespaceError,
+    NoSuchTableError,
+    TableAlreadyExistsError,
+)
+from pyiceberg.io import WAREHOUSE
+from pyiceberg.partitioning import UNPARTITIONED_PARTITION_SPEC, PartitionSpec
+from pyiceberg.schema import Schema
+from pyiceberg.table import (
+    CommitTableRequest,
+    CommitTableResponse,
+    Table,
+    update_table_metadata,
+)
+from pyiceberg.table.metadata import new_table_metadata
+from pyiceberg.table.sorting import UNSORTED_SORT_ORDER, SortOrder
+from pyiceberg.typedef import EMPTY_DICT
+
+DEFAULT_WAREHOUSE_LOCATION = "file:///tmp/warehouse"
+
+
+class InMemoryCatalog(Catalog):
+    """An in-memory catalog implementation."""

Review Comment:
   Should we also note in the docstring that this catalog is meant for tests, demos, and playgrounds, not for production? Something like this [comment](https://github.com/apache/iceberg/blob/43fce1b56bc8364908941eee8a4b5a9ccca6c7fe/core/src/main/java/org/apache/iceberg/inmemory/InMemoryCatalog.java#L55-L59) in the Java implementation.
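One way to act on this suggestion, sketched with a hypothetical dict-backed `TinyInMemoryCatalog` rather than the PR's `InMemoryCatalog`: the docstring states the intended scope (tests, demos, playgrounds) and why it is unsafe for production. The class, its methods, the tuple-based `Identifier`, and the use of built-in `KeyError`/`ValueError` in place of pyiceberg's `NoSuchTableError` and related exceptions are all assumptions for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical identifier type: a namespace path followed by the table name.
Identifier = Tuple[str, ...]


class TinyInMemoryCatalog:
    """A dict-backed catalog that keeps all state in process memory.

    Intended for tests, demos, and playgrounds only: nothing is persisted
    and there is no concurrency control, so do not use it in production.
    """

    def __init__(self) -> None:
        self._namespaces: Set[Identifier] = set()
        # Maps a full table identifier to a metadata location string.
        self._tables: Dict[Identifier, str] = {}

    def create_namespace(self, namespace: Identifier) -> None:
        if namespace in self._namespaces:
            raise ValueError(f"Namespace already exists: {namespace}")
        self._namespaces.add(namespace)

    def create_table(self, identifier: Identifier, metadata_location: str) -> None:
        namespace = identifier[:-1]
        if namespace not in self._namespaces:
            raise KeyError(f"Namespace does not exist: {namespace}")
        if identifier in self._tables:
            raise ValueError(f"Table already exists: {identifier}")
        self._tables[identifier] = metadata_location

    def load_table(self, identifier: Identifier) -> str:
        if identifier not in self._tables:
            raise KeyError(f"Table does not exist: {identifier}")
        return self._tables[identifier]


catalog = TinyInMemoryCatalog()
catalog.create_namespace(("default",))
catalog.create_table(("default", "events"), "file:///tmp/warehouse/default/events")
print(catalog.load_table(("default", "events")))
```

Because all state lives in two Python containers, a fresh instance gives each test an isolated catalog with no cleanup, which is exactly the property that makes it convenient for tests and unsuitable for anything that must survive a process restart.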



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

