rdblue commented on code in PR #41:
URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1451832048
##########
pyiceberg/table/__init__.py:
##########
@@ -831,6 +887,46 @@ def history(self) -> List[SnapshotLogEntry]:
     def update_schema(self, allow_incompatible_changes: bool = False, case_sensitive: bool = True) -> UpdateSchema:
         return UpdateSchema(self, allow_incompatible_changes=allow_incompatible_changes, case_sensitive=case_sensitive)

+    def append(self, df: pa.Table) -> None:
+        if len(self.spec().fields) > 0:
+            raise ValueError("Cannot write to partitioned tables")
+
+        snapshot_id = self.new_snapshot_id()
+
+        data_files = _dataframe_to_data_files(self, df=df)
+        merge = _MergeAppend(operation=Operation.APPEND, table=self, snapshot_id=snapshot_id)
+        for data_file in data_files:
+            merge.append_datafile(data_file)
+
+        if current_snapshot := self.current_snapshot():
+            for manifest in current_snapshot.manifests(io=self.io):
+                for entry in manifest.fetch_manifest_entry(io=self.io):
+                    merge.append_datafile(entry.data_file, added=False)

Review Comment:
I think that the `_MergeAppend` should be responsible for handling the existing data. It doesn't make sense to me that an append operation would require the caller to re-add the data files that were in the table already. That puts too much on the caller, which should just add files and not worry about existing data or state.
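A minimal sketch of the ownership split the review comment asks for, using hypothetical toy classes (`DataFile`, `ManifestEntry`, `Snapshot`, `ToyTable`, `MergeAppendSketch`) rather than pyiceberg's real types. The caller only registers new files; `commit()` reads the current snapshot itself and carries its files forward as existing (`added=False`) entries. A real implementation would more likely reuse whole manifest files than copy individual entries, so this only illustrates where the responsibility sits.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy types for illustration only; these are NOT pyiceberg's API.


@dataclass(frozen=True)
class DataFile:
    file_path: str


@dataclass(frozen=True)
class ManifestEntry:
    data_file: DataFile
    added: bool  # True if this snapshot added the file, False if carried over


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    entries: List[ManifestEntry]


@dataclass
class ToyTable:
    current_snapshot: Optional[Snapshot] = None


class MergeAppendSketch:
    """Hypothetical append operation that owns the carry-over of existing files."""

    def __init__(self, table: ToyTable, snapshot_id: int) -> None:
        self._table = table
        self._snapshot_id = snapshot_id
        self._added: List[DataFile] = []

    def append_data_file(self, data_file: DataFile) -> MergeAppendSketch:
        # The caller's only job: register the new files.
        self._added.append(data_file)
        return self

    def _existing_entries(self) -> List[ManifestEntry]:
        # The operation, not the caller, reads the current snapshot and
        # re-emits its files as existing (added=False) entries.
        current = self._table.current_snapshot
        if current is None:
            return []
        return [ManifestEntry(e.data_file, added=False) for e in current.entries]

    def commit(self) -> Snapshot:
        new_entries = [ManifestEntry(f, added=True) for f in self._added]
        snapshot = Snapshot(self._snapshot_id, self._existing_entries() + new_entries)
        self._table.current_snapshot = snapshot
        return snapshot


table = ToyTable()
MergeAppendSketch(table, snapshot_id=1).append_data_file(DataFile("a.parquet")).commit()
snap = MergeAppendSketch(table, snapshot_id=2).append_data_file(DataFile("b.parquet")).commit()
print([(e.data_file.file_path, e.added) for e in snap.entries])
# [('a.parquet', False), ('b.parquet', True)]
```

With this split, `Table.append` would only need to build the new data files and register them; the loop over `current_snapshot.manifests(...)` would move inside the operation.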