syun64 commented on code in PR #810:
URL: https://github.com/apache/iceberg-python/pull/810#discussion_r1634878236
########## pyiceberg/table/__init__.py: ##########
@@ -474,6 +474,26 @@ def add_files(self, file_paths: List[str], snapshot_properties: Dict[str, str] =
             for data_file in data_files:
                 update_snapshot.append_data_file(data_file)
 
+    def add_files_overwrite(self, file_paths: List[str], snapshot_properties: Dict[str, str] = EMPTY_DICT) -> None:
+        """
+        Shorthand API for adding files as data files and overwriting the table.
+
+        Args:
+            file_paths: The list of full file paths to be added as data files to the table
+            snapshot_properties: Custom properties to be added to the snapshot summary
+
+        Raises:
+            FileNotFoundError: If the file does not exist.
+        """
+        if self._table.name_mapping() is None:
+            self.set_properties(**{TableProperties.DEFAULT_NAME_MAPPING: self._table.schema().name_mapping.model_dump_json()})
+        with self.update_snapshot(snapshot_properties=snapshot_properties).overwrite() as update_snapshot:

Review Comment:
There's an open [PR to implement partial deletes](https://github.com/apache/iceberg-python/pull/569/files#diff-23e8153e0fd497a9212215bd2067068f3b56fa071770c7ef326db3d3d03cee9bR474-R476) that I think we should leverage for this API. Similar to the implementation of `overwrite` proposed there, instead of calling `overwrite()` here I think we'd want to invoke `self.delete()` with the `overwrite_filter`, which deletes all or part of the existing data (rewriting any data files that only partially match the filter), and then add the new data files.
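To make the suggested delete-then-add flow concrete, here is a toy, in-memory sketch. It does not use pyiceberg: `ToyTable`, `DataFile`, `always_true`, and the `-rewritten` file naming are all hypothetical, and the filter is a plain Python predicate rather than an Iceberg expression. It only models the semantics the comment describes: files whose rows all match the filter are dropped, files that partly match are rewritten with their surviving rows, and the new files are appended afterwards. As an assumption of the model, the single list swap at the end stands in for committing both steps in one transaction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file: a path plus the rows it holds."""
    path: str
    rows: Tuple[Row, ...]


def always_true(row: Row) -> bool:
    """Filter matching every row, i.e. a full-table overwrite."""
    return True


@dataclass
class ToyTable:
    """Hypothetical in-memory model of a table's live data files (not the pyiceberg API)."""
    files: List[DataFile] = field(default_factory=list)

    @staticmethod
    def _delete(files: List[DataFile], delete_filter: Predicate) -> List[DataFile]:
        """Copy-on-write delete: drop fully matching files, rewrite partially matching ones."""
        kept: List[DataFile] = []
        for f in files:
            survivors = tuple(r for r in f.rows if not delete_filter(r))
            if len(survivors) == len(f.rows):
                kept.append(f)  # no row matched: keep the file untouched
            elif survivors:
                # some rows matched: write a new file holding only the surviving rows
                kept.append(DataFile(f"{f.path}-rewritten", survivors))
            # otherwise every row matched and the file is dropped entirely
        return kept

    def add_files_overwrite(self, new_files: List[DataFile], overwrite_filter: Predicate = always_true) -> None:
        """Delete rows matching overwrite_filter, then append new_files."""
        staged = self._delete(self.files, overwrite_filter)
        staged.extend(new_files)
        self.files = staged  # one swap models committing delete + append together


table = ToyTable([
    DataFile("a.parquet", ({"day": 1, "v": 10}, {"day": 2, "v": 20})),
    DataFile("b.parquet", ({"day": 2, "v": 30},)),
])
# Replace only the day == 2 slice with a freshly written file.
table.add_files_overwrite([DataFile("c.parquet", ({"day": 2, "v": 99},))], lambda r: r["day"] == 2)
print([f.path for f in table.files])  # ['a.parquet-rewritten', 'c.parquet']
```

With the default `always_true` filter this reduces to a full-table overwrite; a narrower predicate replaces only the matching slice, which is the partial case that reusing the delete path would make possible.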