syun64 commented on code in PR #569:
URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1598626280
##########
pyiceberg/table/__init__.py:
##########
@@ -443,6 +471,74 @@ def overwrite(
for data_file in data_files:
update_snapshot.append_data_file(data_file)
+    def delete(self, delete_filter: Union[str, BooleanExpression], snapshot_properties: Dict[str, str] = EMPTY_DICT) -> None:
+        if (
+            self.table_metadata.properties.get(TableProperties.DELETE_MODE, TableProperties.DELETE_MODE_COPY_ON_WRITE)
+            == TableProperties.DELETE_MODE_MERGE_ON_READ
+        ):
+            warnings.warn("Merge on read is not yet supported, falling back to copy-on-write")
+
+        if isinstance(delete_filter, str):
+            delete_filter = _parse_row_filter(delete_filter)
+
+        with self.update_snapshot(snapshot_properties=snapshot_properties).delete() as delete_snapshot:
+            delete_snapshot.delete_by_predicate(delete_filter)
+
+        # Check if there are any files that require an actual rewrite of a data file
+        if delete_snapshot.rewrites_needed is True:
+            bound_delete_filter = bind(self._table.schema(), delete_filter, case_sensitive=True)
+            preserve_row_filter = expression_to_pyarrow(Not(bound_delete_filter))
+            commit_uuid = uuid.uuid4()
Review Comment:
Is the `commit_uuid` intended to be unique per snapshot, or per transaction?
Here, it looks like we are introducing a commit UUID for the delete operation,
which is executed as two separate snapshots, but that UUID isn't shared with
the append snapshot that follows it. Could we just generate the `commit_uuid`
when the Transaction class is instantiated?
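
To make the suggestion concrete, here is a rough sketch of the idea, not the actual pyiceberg code. `SketchTransaction` and `SketchSnapshot` are hypothetical stand-ins I made up for illustration, and the operation names are only labels. The point is that the UUID is created once per transaction and every snapshot produced within it reuses that value:

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchSnapshot:
    """Hypothetical stand-in for a snapshot producer; not a pyiceberg class."""

    operation: str
    commit_uuid: uuid.UUID


@dataclass
class SketchTransaction:
    """Hypothetical stand-in for Transaction; one UUID is generated per instance."""

    # Generated once, when the transaction object is created.
    commit_uuid: uuid.UUID = field(default_factory=uuid.uuid4)
    snapshots: List[SketchSnapshot] = field(default_factory=list)

    def _new_snapshot(self, operation: str) -> SketchSnapshot:
        # Every snapshot reuses the transaction-level UUID instead of creating its own.
        snapshot = SketchSnapshot(operation=operation, commit_uuid=self.commit_uuid)
        self.snapshots.append(snapshot)
        return snapshot

    def delete(self) -> None:
        # Assumption for illustration: a delete that needs rewrites yields two snapshots.
        self._new_snapshot("delete")
        self._new_snapshot("overwrite")

    def append(self) -> None:
        self._new_snapshot("append")


txn = SketchTransaction()
txn.delete()
txn.append()

# All snapshots in the transaction carry the same commit UUID.
shared_uuids = {snapshot.commit_uuid for snapshot in txn.snapshots}
other_txn = SketchTransaction()
```

With this shape, the delete and the following append would share one identifier, while a separate transaction would get its own.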