Fokko commented on code in PR #363:
URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1522768023

##########
tests/integration/test_writes.py:
##########
@@ -355,6 +355,44 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w
     assert [row.deleted_data_files_count for row in rows] == [0, 0, 1, 0, 0]
+@pytest.mark.integration

Review Comment:
   Can you parameterize the test for both V1 and V2 tables?

##########
pyiceberg/table/__init__.py:
##########
@@ -216,6 +221,15 @@ class TableProperties:
     FORMAT_VERSION = "format-version"
     DEFAULT_FORMAT_VERSION = 2
+    MANIFEST_TARGET_SIZE_BYTES = "commit.manifest.target-size-bytes"
+    MANIFEST_TARGET_SIZE_BYTES_DEFAULT = 8 * 1024 * 1024  # 8 MB
+
+    MANIFEST_MIN_MERGE_COUNT = "commit.manifest.min-count-to-merge"
+    MANIFEST_MIN_MERGE_COUNT_DEFAULT = 100
+
+    MANIFEST_MERGE_ENABLED = "commit.manifest-merge.enabled"
+    MANIFEST_MERGE_ENABLED_DEFAULT = True

Review Comment:
   Can you add these to the docs as well? :)

##########
tests/integration/test_writes.py:
##########
@@ -355,6 +355,44 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w
     assert [row.deleted_data_files_count for row in rows] == [0, 0, 1, 0, 0]
+@pytest.mark.integration

Review Comment:
   We want to assert the manifest-entries as well (only for the merge-appended one).

##########
pyiceberg/table/__init__.py:
##########
@@ -2697,6 +2810,9 @@ def __init__(self, transaction: Transaction, io: FileIO) -> None:
     def fast_append(self) -> FastAppendFiles:
         return FastAppendFiles(operation=Operation.APPEND, transaction=self._transaction, io=self._io)
+    def merge_append(self) -> MergeAppendFiles:
+        return MergeAppendFiles(operation=Operation.APPEND, transaction=self._transaction, io=self._io)

Review Comment:
   I see that the signature is the same as the fast-append. My intention was to enable folks to use either merge or fast-append through the API, rather than having to set table properties.
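
For the docs request on the three commit.manifest.* properties, a small sketch of how they could be resolved from a table's string properties may help readers. The property keys and defaults are copied from the TableProperties hunk above. Everything else (ManifestMergeSettings, resolve_merge_settings, should_merge, and the rule "merge once the manifest count reaches the minimum") is a hypothetical illustration and not the code in this PR.

```python
from dataclasses import dataclass
from typing import Mapping

# Keys and defaults as they appear in the TableProperties hunk.
MANIFEST_TARGET_SIZE_BYTES = "commit.manifest.target-size-bytes"
MANIFEST_TARGET_SIZE_BYTES_DEFAULT = 8 * 1024 * 1024  # 8 MB
MANIFEST_MIN_MERGE_COUNT = "commit.manifest.min-count-to-merge"
MANIFEST_MIN_MERGE_COUNT_DEFAULT = 100
MANIFEST_MERGE_ENABLED = "commit.manifest-merge.enabled"
MANIFEST_MERGE_ENABLED_DEFAULT = True


@dataclass(frozen=True)
class ManifestMergeSettings:
    """Hypothetical container; not part of pyiceberg."""

    target_size_bytes: int
    min_count_to_merge: int
    merge_enabled: bool


def resolve_merge_settings(properties: Mapping[str, str]) -> ManifestMergeSettings:
    """Read the three properties from a str-to-str map, falling back to the defaults."""
    enabled_raw = properties.get(MANIFEST_MERGE_ENABLED)
    if enabled_raw is None:
        merge_enabled = MANIFEST_MERGE_ENABLED_DEFAULT
    else:
        # Table properties are strings, so "true"/"false" is parsed by hand.
        merge_enabled = enabled_raw.strip().lower() == "true"
    return ManifestMergeSettings(
        target_size_bytes=int(
            properties.get(MANIFEST_TARGET_SIZE_BYTES, MANIFEST_TARGET_SIZE_BYTES_DEFAULT)
        ),
        min_count_to_merge=int(
            properties.get(MANIFEST_MIN_MERGE_COUNT, MANIFEST_MIN_MERGE_COUNT_DEFAULT)
        ),
        merge_enabled=merge_enabled,
    )


def should_merge(manifest_count: int, settings: ManifestMergeSettings) -> bool:
    """Assumed rule: merge only when enabled and enough manifests have accumulated."""
    return settings.merge_enabled and manifest_count >= settings.min_count_to_merge
```

With an empty property map this falls back to the defaults, so should_merge(100, resolve_merge_settings({})) would be true under the assumed rule, and setting "commit.manifest-merge.enabled" to "false" turns merging off regardless of the count.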