amogh-jahagirdar commented on code in PR #12886: URL: https://github.com/apache/iceberg/pull/12886#discussion_r2072266707
########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java: ########## @@ -946,7 +946,7 @@ public static List<SparkPartition> getPartitions( Object catalystValue = partition.values().get(fieldIndex, field.dataType()); Object value = CatalystTypeConverters.convertToScala(catalystValue, field.dataType()); - values.put(field.name(), String.valueOf(value)); + values.put(field.name(), (value == null) ? null : value.toString()); Review Comment: I definitely agree that this is a problem since these should be considered as null identity partition values in Iceberg but I feel like this solution is kind of brittle since we're essentially relying on the map having null values. The awkwardness of that is some downstream consumer taking the map<String, string> and checking the partition value in the map cannot differentiate if the partition value is literally null or it's not even a partition field to begin with (not an entry in the map) without an explicit containsKey. Ideally, we can actually start to deprecate the use of Map<String, String> in the SparkPartition and actually just include a StructLike of the identity partition values. I know that's a larger change just to fix this particular problem but it feels like the right thing to do in general. Let me think about this a bit more -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org