Re: [PR] Support merge manifests on writes (MergeAppend) [iceberg-python]

via GitHub Tue, 26 Mar 2024 00:35:47 -0700


Fokko commented on code in PR #363:
URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1538708807



##########
pyiceberg/table/__init__.py:
##########
@@ -1091,7 +1111,7 @@ def append(self, df: pa.Table) -> None:
         _check_schema(self.schema(), other_schema=df.schema)
 
         with self.transaction() as txn:
-            with txn.update_snapshot().fast_append() as update_snapshot:
+            with txn.update_snapshot().merge_append() as update_snapshot:

Review Comment:
   @syun64 Could you elaborate on the motivation for picking merge-append over
fast-append? In Java, merge-append is the default append for historical
reasons, since fast-append was added later. Fast-append creates more metadata,
but it also:
   
   - Takes less time to commit, since it doesn't rewrite any existing
manifests. This reduces the chance of a commit conflict.
   - Has a more predictable commit time, which depends mostly on the number of
data files being written.
   - Speeds up deletes when you statically overwrite partitions, as in a
typical ETL job, since it can simply drop a whole manifest produced by the
previous fast-append.
   
   The main downside is that full-table scans need to evaluate more metadata.
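   To make the tradeoff above concrete, here is a toy Python model. It is not
the pyiceberg API: `ToyTable`, `Manifest`, `simulate`, and the
`min_count_to_merge` threshold are all hypothetical. It counts how many
manifests a full-table scan would have to read after a series of appends, and
how many manifests were rewritten at commit time. It is a simplified sketch;
real merge-append is governed by table properties such as
`commit.manifest.min-count-to-merge` and also bins manifests by target size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifest:
    """Toy manifest (hypothetical): only tracks how many data files it references."""
    data_files: int


@dataclass
class ToyTable:
    """Hypothetical model of a table's manifest list; not the pyiceberg API."""
    manifests: List[Manifest] = field(default_factory=list)
    manifests_rewritten: int = 0  # cumulative commit-time rewrite work

    def fast_append(self, n_files: int) -> None:
        # Fast-append: write one new manifest, leave existing ones untouched.
        self.manifests.append(Manifest(n_files))

    def merge_append(self, n_files: int, min_count_to_merge: int = 4) -> None:
        # Simplified merge-append: add the new manifest, and once the number of
        # manifests reaches the (hypothetical) threshold, rewrite them into one.
        self.manifests.append(Manifest(n_files))
        if len(self.manifests) >= min_count_to_merge:
            self.manifests_rewritten += len(self.manifests)
            total = sum(m.data_files for m in self.manifests)
            self.manifests = [Manifest(total)]


def simulate(mode: str, commits: int, files_per_commit: int = 10) -> ToyTable:
    """Run `commits` appends in either "fast" or "merge" mode."""
    table = ToyTable()
    for _ in range(commits):
        if mode == "fast":
            table.fast_append(files_per_commit)
        else:
            table.merge_append(files_per_commit)
    return table


fast = simulate("fast", commits=10)
merged = simulate("merge", commits=10)
print(len(fast.manifests), fast.manifests_rewritten)      # 10 0
print(len(merged.manifests), merged.manifests_rewritten)  # 1 12
```

   With fast-append, every commit leaves its own manifest behind: more
metadata for a full-table scan, but nothing is rewritten at commit time, and
the most recent commit's files sit in a single manifest that a partition
overwrite can drop wholesale. With merge-append, the manifest list stays small
at the cost of extra rewrite work during some commits.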
   


