Fokko commented on code in PR #363: URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1538708807
##########
pyiceberg/table/__init__.py:
##########
@@ -1091,7 +1111,7 @@ def append(self, df: pa.Table) -> None:
         _check_schema(self.schema(), other_schema=df.schema)
 
         with self.transaction() as txn:
-            with txn.update_snapshot().fast_append() as update_snapshot:
+            with txn.update_snapshot().merge_append() as update_snapshot:

Review Comment:
   @syun64 Could you elaborate on the motivation for picking merge-append over fast-append? In Java, merge-append is the default for historical reasons, since fast-append was added later. Fast-append creates more metadata, but it also has some advantages:

   - It takes less time to commit, since it doesn't rewrite any existing manifests. This reduces the chance of a conflict.
   - The time it takes to commit is more predictable and roughly constant relative to the number of data files written.
   - When you static-overwrite partitions, as in a typical ETL job, deletes are faster because a whole manifest produced by the previous fast-append can simply be dropped.

   The main downside is that full-table scans need to evaluate more metadata.
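
The trade-off described above can be illustrated with a toy model. This is only a sketch and does not use the PyIceberg API: `ToyTable`, `simulate`, and the `min_count_to_merge` threshold are hypothetical names invented for illustration, and real merge-append decides when to merge using more than a simple count. It models a table as a list of manifests and counts how many existing manifests each strategy rewrites.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical model: a table is a list of manifests, each a list of data file names."""

    manifests: list[list[str]] = field(default_factory=list)
    manifests_rewritten: int = 0  # existing manifests rewritten across all commits

    def fast_append(self, files: list[str]) -> None:
        # Write one new manifest and leave every existing manifest untouched.
        self.manifests.append(list(files))

    def merge_append(self, files: list[str], min_count_to_merge: int = 3) -> None:
        # Add the new manifest, then compact all manifests into one
        # once the (assumed, simplified) count threshold is reached.
        self.manifests.append(list(files))
        if len(self.manifests) >= min_count_to_merge:
            self.manifests_rewritten += len(self.manifests) - 1
            merged = [f for manifest in self.manifests for f in manifest]
            self.manifests = [merged]


def simulate(commits: int) -> tuple[ToyTable, ToyTable]:
    """Apply the same sequence of single-file commits with both strategies."""
    fast, merge = ToyTable(), ToyTable()
    for i in range(commits):
        files = [f"data-{i}.parquet"]
        fast.fast_append(files)
        merge.merge_append(files)
    return fast, merge


if __name__ == "__main__":
    fast, merge = simulate(6)
    print("fast-append :", len(fast.manifests), "manifests,", fast.manifests_rewritten, "rewritten")
    print("merge-append:", len(merge.manifests), "manifests,", merge.manifests_rewritten, "rewritten")
```

In this model, fast-append never touches existing manifests, so its commit cost depends only on the new files, while the manifest count a scan has to read keeps growing. Merge-append keeps the manifest count low at the price of rewriting existing manifests during some commits.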