HonahX commented on code in PR #471:
URL: https://github.com/apache/iceberg-python/pull/471#discussion_r1503532245


##########
pyiceberg/table/__init__.py:
##########
@@ -219,68 +220,41 @@ def property_as_int(properties: Dict[str, str], 
property_name: str, default: Opt
 
 class Transaction:
     _table: Table
+    _table_metadata: TableMetadata
+    _autocommit: bool
     _updates: Tuple[TableUpdate, ...]
     _requirements: Tuple[TableRequirement, ...]
 
-    def __init__(
-        self,
-        table: Table,
-        actions: Optional[Tuple[TableUpdate, ...]] = None,
-        requirements: Optional[Tuple[TableRequirement, ...]] = None,
-    ):
+    def __init__(self, table: Table, autocommit: bool = False):
         self._table = table
-        self._updates = actions or ()
-        self._requirements = requirements or ()
+        self._table_metadata = table.metadata
+        self._autocommit = autocommit

Review Comment:
   [Minor] May be `auto_commit` is more consistent with other names in 
pyiceberg?
   
   Also, shall we add doc to explain the usage of this new option. We could 
probably update the doc when we finishing all modifications of Transaction 
related code.



##########
pyiceberg/table/__init__.py:
##########
@@ -1512,8 +1432,33 @@ class Move:
     other_field_id: Optional[int] = None
 
 
-class UpdateSchema:
-    _table: Optional[Table]
+U = TypeVar('U')
+
+
+class TableMetadataUpdate(ABC, Generic[U]):
+    _transaction: Transaction
+    _table_metadata: TableMetadata
+
+    def __init__(self, transaction: Transaction, table_metadata: 
TableMetadata) -> None:
+        self._transaction = transaction
+        self._table_metadata = table_metadata
+
+    @abstractmethod
+    def _commit(self) -> Tuple[Tuple[TableUpdate, ...], 
Tuple[TableRequirement, ...]]: ...

Review Comment:
   Shall we create a new type for the return value? Like
   ```python
   UpdatesAndRequirements = Tuple[Tuple[TableUpdate, ...], 
Tuple[TableRequirement, ...]]
   ```



##########
pyiceberg/io/pyarrow.py:
##########
@@ -1720,7 +1721,7 @@ def fill_parquet_file_metadata(
     data_file.split_offsets = split_offsets
 
 
-def write_file(table: Table, tasks: Iterator[WriteTask], file_schema: 
Optional[Schema] = None) -> Iterator[DataFile]:
+def write_file(io: FileIO, table_metadata: TableMetadata, tasks: 
Iterator[WriteTask]) -> Iterator[DataFile]:

Review Comment:
   Just to confirm my understanding: the `file_schema` parameter was previously 
added in https://github.com/apache/iceberg-python/pull/446 for test purpose and 
it is no longer required?



##########
pyiceberg/table/__init__.py:
##########
@@ -219,68 +220,41 @@ def property_as_int(properties: Dict[str, str], 
property_name: str, default: Opt
 
 class Transaction:
     _table: Table
+    _table_metadata: TableMetadata
+    _autocommit: bool
     _updates: Tuple[TableUpdate, ...]
     _requirements: Tuple[TableRequirement, ...]
 
-    def __init__(
-        self,
-        table: Table,
-        actions: Optional[Tuple[TableUpdate, ...]] = None,
-        requirements: Optional[Tuple[TableRequirement, ...]] = None,
-    ):
+    def __init__(self, table: Table, autocommit: bool = False):
         self._table = table
-        self._updates = actions or ()
-        self._requirements = requirements or ()
+        self._table_metadata = table.metadata
+        self._autocommit = autocommit
+        self._updates = ()
+        self._requirements = ()
 
     def __enter__(self) -> Transaction:
         """Start a transaction to update the table."""
         return self
 
     def __exit__(self, _: Any, value: Any, traceback: Any) -> None:
         """Close and commit the transaction."""
-        fresh_table = self.commit_transaction()
-        # Update the new data in place
-        self._table.metadata = fresh_table.metadata
-        self._table.metadata_location = fresh_table.metadata_location
-
-    def _append_updates(self, *new_updates: TableUpdate) -> Transaction:
-        """Append updates to the set of staged updates.
-
-        Args:
-            *new_updates: Any new updates.
+        self.commit_transaction()
 
-        Raises:
-            ValueError: When the type of update is not unique.
+    def _apply(self, updates: Tuple[TableUpdate, ...], requirements: 
Tuple[TableRequirement, ...] = ()) -> Transaction:
+        """Check if the requirements are met, and applies the updates to the 
metadata."""
+        for requirement in requirements:
+            requirement.validate(self._table_metadata)
 
-        Returns:
-            Transaction object with the new updates appended.
-        """
-        for new_update in new_updates:
-            # explicitly get type of new_update as new_update is an 
instantiated class
-            type_new_update = type(new_update)
-            if any(isinstance(update, type_new_update) for update in 
self._updates):
-                raise ValueError(f"Updates in a single commit need to be 
unique, duplicate: {type_new_update}")
-        self._updates = self._updates + new_updates
-        return self
+        self._updates += updates
+        self._requirements += requirements
 
-    def _append_requirements(self, *new_requirements: TableRequirement) -> 
Transaction:
-        """Append requirements to the set of staged requirements.
+        self._table_metadata = update_table_metadata(self._table_metadata, 
updates)

Review Comment:
   Now there will be 2 updated metadata after the transaction commits: 
`_table_metadata` in the transaction and the new metadata of the table. I think 
it may be good to add a test to ensure that the 2 metadata are the same after 
the transaction. This could help verify the correctness of the internal 
`_table_metadata` WDYT?



##########
pyiceberg/table/__init__.py:
##########
@@ -1512,8 +1432,33 @@ class Move:
     other_field_id: Optional[int] = None
 
 
-class UpdateSchema:
-    _table: Optional[Table]
+U = TypeVar('U')
+
+
+class TableMetadataUpdate(ABC, Generic[U]):

Review Comment:
   [Minor] I feel like `UpdateTableMetadata` may be more aligned with its child 
classes like `UpdateSchema`, `UpdateSnapshot`, `UpdateSpec`, etc. But this is 
totally my personal preference.



##########
pyiceberg/table/__init__.py:
##########
@@ -1512,8 +1432,33 @@ class Move:
     other_field_id: Optional[int] = None
 
 
-class UpdateSchema:
-    _table: Optional[Table]
+U = TypeVar('U')
+
+
+class TableMetadataUpdate(ABC, Generic[U]):
+    _transaction: Transaction
+    _table_metadata: TableMetadata
+
+    def __init__(self, transaction: Transaction, table_metadata: 
TableMetadata) -> None:
+        self._transaction = transaction
+        self._table_metadata = table_metadata

Review Comment:
   [Question] I am a little bit confused here. Why do we have both 
`transaction` and `table_metadata` instead of `transaction` only since table 
metadata is also stored in the transaction?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to