Fokko commented on code in PR #363:
URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1478849196


##########
pyiceberg/table/__init__.py:
##########
@@ -2411,11 +2428,29 @@ def _fetch_existing_manifests() -> List[ManifestFile]:
 
         executor = ExecutorFactory.get_or_create()
 
-        added_manifests = executor.submit(_write_added_manifest)
-        delete_manifests = executor.submit(_write_delete_manifest)
-        existing_manifests = executor.submit(_fetch_existing_manifests)
+        added_manifests_future = executor.submit(_write_added_manifest)
+        delete_manifests_future = executor.submit(_write_delete_manifest)
+        existing_manifests_future = executor.submit(_fetch_existing_manifests)
+
+        added_manifests = added_manifests_future.result()
+        delete_manifests = delete_manifests_future.result()
+        existing_manifests = existing_manifests_future.result()
+        unmerged_data_manifests = (
+            added_manifests
+            + delete_manifests
+            + [manifest for manifest in existing_manifests if manifest.content 
== ManifestContent.DATA]
+        )
+        # TODO: need to re-consider the name here: manifest containing 
positional deletes and manifest containing deleted entries
+        unmerged_deletes_manifests = [manifest for manifest in 
existing_manifests if manifest.content == ManifestContent.DELETES]
+
+        data_manifest_merge_manager = ManifestMergeManager(

Review Comment:
   We're changing the append operation from a fast-append to a regular append 
when it hits a threshold. I would be more comfortable with keeping the 
compaction separate. This way we know that an append/overwrite is always fast 
and in constant time. For example, if you have a process that appends data, you 
know how fast it will run (actually it is a function of the number of 
manifests).



##########
pyiceberg/table/__init__.py:
##########
@@ -2304,6 +2313,12 @@ def __init__(self, operation: Operation, table: Table) 
-> None:
         self._parent_snapshot_id = snapshot.snapshot_id if (snapshot := 
self._table.current_snapshot()) else None
         self._added_data_files = []
         self._commit_uuid = uuid.uuid4()
+        self._manifest_num_counter = itertools.count(0)
+
+        # TODO: consider if we can add a TableProperties class to handle these 
like Java

Review Comment:
   Yes, we really need that :3



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to