Fokko commented on code in PR #363: URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1478849196
########## pyiceberg/table/__init__.py: ########## @@ -2411,11 +2428,29 @@ def _fetch_existing_manifests() -> List[ManifestFile]: executor = ExecutorFactory.get_or_create() - added_manifests = executor.submit(_write_added_manifest) - delete_manifests = executor.submit(_write_delete_manifest) - existing_manifests = executor.submit(_fetch_existing_manifests) + added_manifests_future = executor.submit(_write_added_manifest) + delete_manifests_future = executor.submit(_write_delete_manifest) + existing_manifests_future = executor.submit(_fetch_existing_manifests) + + added_manifests = added_manifests_future.result() + delete_manifests = delete_manifests_future.result() + existing_manifests = existing_manifests_future.result() + unmerged_data_manifests = ( + added_manifests + + delete_manifests + + [manifest for manifest in existing_manifests if manifest.content == ManifestContent.DATA] + ) + # TODO: need to re-consider the name here: manifest containing positional deletes and manifest containing deleted entries + unmerged_deletes_manifests = [manifest for manifest in existing_manifests if manifest.content == ManifestContent.DELETES] + + data_manifest_merge_manager = ManifestMergeManager( Review Comment: We're changing the append operation from a fast-append to a regular append when it hits a threshold. I would be more comfortable with keeping the compaction separate. This way we know that an append/overwrite is always fast and in constant time. For example, if you have a process that appends data, you know how fast it will run (actually it is a function of the number of manifests). ########## pyiceberg/table/__init__.py: ########## @@ -2304,6 +2313,12 @@ def __init__(self, operation: Operation, table: Table) -> None: self._parent_snapshot_id = snapshot.snapshot_id if (snapshot := self._table.current_snapshot()) else None self._added_data_files = [] self._commit_uuid = uuid.uuid4() + self._manifest_num_counter = itertools.count(0) + + # TODO: consider if we can add a TableProperties class to handle these like Java Review Comment: Yes, we really need that :3 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org