Fokko commented on code in PR #363:
URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1478849196
##########
pyiceberg/table/__init__.py:
##########
@@ -2411,11 +2428,29 @@ def _fetch_existing_manifests() -> List[ManifestFile]:
executor = ExecutorFactory.get_or_create()
- added_manifests = executor.submit(_write_added_manifest)
- delete_manifests = executor.submit(_write_delete_manifest)
- existing_manifests = executor.submit(_fetch_existing_manifests)
+ added_manifests_future = executor.submit(_write_added_manifest)
+ delete_manifests_future = executor.submit(_write_delete_manifest)
+ existing_manifests_future = executor.submit(_fetch_existing_manifests)
+
+ added_manifests = added_manifests_future.result()
+ delete_manifests = delete_manifests_future.result()
+ existing_manifests = existing_manifests_future.result()
+ unmerged_data_manifests = (
+ added_manifests
+ + delete_manifests
+ + [manifest for manifest in existing_manifests if manifest.content
== ManifestContent.DATA]
+ )
+ # TODO: need to re-consider the name here: manifest containing
positional deletes and manifest containing deleted entries
+ unmerged_deletes_manifests = [manifest for manifest in
existing_manifests if manifest.content == ManifestContent.DELETES]
+
+ data_manifest_merge_manager = ManifestMergeManager(
Review Comment:
We're changing the append operation from a fast-append to a regular append
when it hits a threshold. I would be more comfortable with keeping the
compaction separate. This way we know that an append/overwrite is always fast
and in constant time. For example, if you have a process that appends data, you
know how fast it will run (actually it is a function of the number of
manifests).
##########
pyiceberg/table/__init__.py:
##########
@@ -2304,6 +2313,12 @@ def __init__(self, operation: Operation, table: Table)
-> None:
self._parent_snapshot_id = snapshot.snapshot_id if (snapshot :=
self._table.current_snapshot()) else None
self._added_data_files = []
self._commit_uuid = uuid.uuid4()
+ self._manifest_num_counter = itertools.count(0)
+
+ # TODO: consider if we can add a TableProperties class to handle these
like Java
Review Comment:
Yes, we really need that :3
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]