Re: [PR] Core: Parsing and Writing Tests for V3 Metadata [iceberg]

via GitHub Tue, 14 Jan 2025 11:07:10 -0800


HonahX commented on code in PR #11947:
URL: https://github.com/apache/iceberg/pull/11947#discussion_r1915429803



##########
core/src/test/java/org/apache/iceberg/TestMetadataUpdateParser.java:
##########
@@ -364,15 +364,15 @@ public void testAddSnapshotToJson() throws IOException {
     String manifestList = createManifestListWithManifestFiles(snapshotId, parentId);
 
     Snapshot snapshot =
-        new BaseSnapshot(
-            0,
-            snapshotId,
-            parentId,
-            System.currentTimeMillis(),
-            DataOperations.REPLACE,
-            ImmutableMap.of("files-added", "4", "files-deleted", "100"),
-            schemaId,
-            manifestList);
+        MetadataTestUtils.buildTestSnapshot()

Review Comment:
   Good point. I will update this to reduce the number of setters.



##########
core/src/main/java/org/apache/iceberg/TableMetadataParser.java:
##########
@@ -352,6 +352,7 @@ public static TableMetadata fromJson(String metadataLocation, JsonNode node) {
       ImmutableList.Builder<Schema> builder = ImmutableList.builder();
       for (JsonNode schemaNode : schemaArray) {
         Schema current = SchemaParser.fromJson(schemaNode);
+        Schema.checkCompatibility(current, formatVersion);

Review Comment:
   I’ve given this some thought.
   
   When parsing metadata from a JSON file, we need to perform compatibility 
checks for every schema in the metadata. However, when building new metadata 
from an existing one, we only need to check the compatibility of newly added 
schemas, as the existing schemas in the `TableMetadata` object can be trusted.
   
   The `TableMetadata` constructor is also called directly here: 
https://github.com/apache/iceberg/blob/2551587bf9d340507c2a4ca8ee355ee43c02383c/core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java#L107-L109
   For this and similar use cases in the future, I think we do not need 
additional compatibility checks.
   
   In general, compatibility checks are expensive because they require 
iterating through fields, and the cost will increase as we add more fields and 
features to schemas in v4, v5, and beyond. Therefore, I think it’s better to 
minimize the number of checks whenever possible.
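
   The distinction described above can be illustrated with a minimal, 
self-contained sketch. It is not Iceberg code: `SimpleSchema`, `fromParsed`, 
`addSchema`, and the `checksPerformed` counter are hypothetical names, and the 
"compatibility" rule is reduced to a single minimum-format-version comparison. 
It only shows that the parse path validates every schema, while the build path 
validates just the newly added one.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaCheckSketch {
  /** Hypothetical stand-in for a schema; it only records the format version it requires. */
  public static final class SimpleSchema {
    public final int schemaId;
    public final int minFormatVersion;

    public SimpleSchema(int schemaId, int minFormatVersion) {
      this.schemaId = schemaId;
      this.minFormatVersion = minFormatVersion;
    }
  }

  /** Counts how many compatibility checks ran, to make the cost visible. */
  public static int checksPerformed = 0;

  /** Assumed check: a schema is compatible if the table format version is high enough. */
  public static void checkCompatibility(SimpleSchema schema, int formatVersion) {
    checksPerformed++;
    if (schema.minFormatVersion > formatVersion) {
      throw new IllegalStateException(
          "Schema " + schema.schemaId + " requires format version " + schema.minFormatVersion);
    }
  }

  /** Parse path: the schemas come from a file, so every one of them is checked. */
  public static List<SimpleSchema> fromParsed(List<SimpleSchema> parsed, int formatVersion) {
    for (SimpleSchema schema : parsed) {
      checkCompatibility(schema, formatVersion);
    }
    return List.copyOf(parsed);
  }

  /** Build path: existing schemas are trusted, so only the added schema is checked. */
  public static List<SimpleSchema> addSchema(
      List<SimpleSchema> trusted, SimpleSchema added, int formatVersion) {
    checkCompatibility(added, formatVersion);
    List<SimpleSchema> result = new ArrayList<>(trusted);
    result.add(added);
    return List.copyOf(result);
  }

  public static void main(String[] args) {
    List<SimpleSchema> parsed =
        fromParsed(List.of(new SimpleSchema(0, 1), new SimpleSchema(1, 2)), 3);
    System.out.println("checks after parsing: " + checksPerformed);

    List<SimpleSchema> updated = addSchema(parsed, new SimpleSchema(2, 3), 3);
    System.out.println("checks after adding one schema: " + checksPerformed);
    System.out.println("total schemas: " + updated.size());
  }
}
```

   In this sketch the cost of adding a schema stays constant regardless of how 
many schemas the existing metadata already holds, which is the property the 
comment argues for.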



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


