advancedxy commented on code in PR #8502:
URL: https://github.com/apache/iceberg/pull/8502#discussion_r1650705615


##########
core/src/main/java/org/apache/iceberg/TableMetadataParser.java:
##########
@@ -481,6 +488,13 @@ public static TableMetadata fromJson(String metadataLocation, JsonNode node) {
       statisticsFiles = ImmutableList.of();
     }
 
+    List<PartitionStatisticsFile> partitionStatisticsFiles;
+    if (node.has(PARTITION_STATISTICS)) {
+      partitionStatisticsFiles = partitionStatsFilesFromJson(node.get(PARTITION_STATISTICS));
+    } else {
+      partitionStatisticsFiles = ImmutableList.of();
+    }
+

Review Comment:
   Hi @ajantha-bhat and @aokolnychyi, I have a question about this 
implementation, as I'm exploring adding new fields to TableMetadata. Suppose 
the partition stats of table `db.table` are updated by a new version of Iceberg 
via UpdatePartitionStatistics. After that, an older version of the Iceberg 
library or a PyIceberg client produces a new commit to this table. As I 
understand it, that writer will produce TableMetadata without 
`PARTITION_STATISTICS`, since it knows nothing about that field, which 
effectively loses that information for the table. 
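
To make the concern concrete, here is a minimal sketch of the read-modify-write round trip. It is not Iceberg's parser: `MetadataRoundTripDemo`, `OLD_KNOWN_FIELDS`, and `rewriteAsOldClient` are hypothetical names, the metadata is modeled as a plain map, and the field list is an illustrative subset. It assumes the old writer re-serializes only the fields it recognizes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a metadata round trip; not Iceberg's actual parser.
final class MetadataRoundTripDemo {
  // Top-level fields the simulated "old" writer knows about (illustrative subset).
  static final Set<String> OLD_KNOWN_FIELDS =
      Set.of("format-version", "location", "statistics");

  // Simulates an old client's read-modify-write: only recognized fields reach
  // its in-memory model, so only those fields are written back.
  static Map<String, Object> rewriteAsOldClient(Map<String, Object> metadata) {
    Map<String, Object> rewritten = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : metadata.entrySet()) {
      if (OLD_KNOWN_FIELDS.contains(entry.getKey())) {
        rewritten.put(entry.getKey(), entry.getValue());
      }
    }
    return rewritten;
  }

  public static void main(String[] args) {
    // Metadata as written by a newer client, including partition statistics.
    Map<String, Object> current = new LinkedHashMap<>();
    current.put("format-version", 2);
    current.put("location", "s3://bucket/db/table");
    current.put("statistics", List.of());
    current.put("partition-statistics", List.of("partition-stats-1.parquet"));

    Map<String, Object> afterOldCommit = rewriteAsOldClient(current);
    // The unknown field is silently dropped by the old writer.
    System.out.println(afterOldCommit.containsKey("partition-statistics")); // false
  }
}
```
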
   
   Do you have any solutions or ideas on how to prevent such cases? I can think 
of a few options:
   1. Bump the format_version whenever we add new fields to table metadata; 
all old clients would then be rejected by the version check.
   2. Define a writer_version field: old clients could still read metadata 
produced by new clients, but writers with older versions would be rejected.
   3. Move the check to the REST catalog service? 
   
   A format upgrade feels too heavy when we are only adding new fields to 
TableMetadata. 
   
   Do you have any other ideas? I'd really appreciate your input.
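
As a rough illustration of option 2, the check could look something like the sketch below. Everything here is an assumption for discussion: `WriterVersionGate`, `canRead`, and `checkCanCommit` are hypothetical names, no writer_version field exists in the metadata today, and where the check would run (client or catalog) is left open.

```java
// Hypothetical sketch of a writer-version gate; not an existing Iceberg API.
final class WriterVersionGate {
  // Reads stay permissive: an older client may still read metadata written by
  // a newer one, ignoring fields it does not understand.
  static boolean canRead(int clientWriterVersion, int requiredWriterVersion) {
    return true;
  }

  // Commits are rejected when the client is older than the writer version
  // recorded in the table metadata, so it cannot drop fields it does not know.
  static void checkCanCommit(int clientWriterVersion, int requiredWriterVersion) {
    if (clientWriterVersion < requiredWriterVersion) {
      throw new IllegalStateException(
          "Cannot commit: table requires writer version "
              + requiredWriterVersion
              + " but client supports "
              + clientWriterVersion);
    }
  }

  public static void main(String[] args) {
    checkCanCommit(2, 2); // same version: commit allowed
    try {
      checkCanCommit(1, 2); // older writer: commit rejected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```
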



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

