[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

GitBox Tue, 15 Nov 2022 02:20:57 -0800


szehon-ho commented on code in PR #5376:
URL: https://github.com/apache/iceberg/pull/5376#discussion_r1022604659



##########
core/src/main/java/org/apache/iceberg/MetricsUtil.java:
##########
@@ -56,4 +63,125 @@ public static MetricsModes.MetricsMode metricsMode(
     String columnName = inputSchema.findColumnName(fieldId);
     return metricsConfig.columnMode(columnName);
   }
+
+  // Utilities for Displaying Metrics
+
+  static final Types.NestedField COLUMN_SIZES_METRICS =
+      optional(
+          300,
+          "column_sizes_metrics",
+          Types.MapType.ofRequired(301, 302, Types.StringType.get(), Types.LongType.get()),
+          "Map of column name to total size on disk");
+  static final Types.NestedField VALUE_COUNT_METRICS =
+      optional(
+          303,
+          "value_counts_metrics",
+          Types.MapType.ofRequired(304, 305, Types.StringType.get(), Types.LongType.get()),
+          "Map of column name to total count, including null and NaN");
+  static final Types.NestedField NULL_VALUE_COUNTS_METRICS =
+      optional(
+          306,
+          "null_value_counts_metrics",
+          Types.MapType.ofRequired(307, 308, Types.StringType.get(), Types.LongType.get()),
+          "Map of column name to null value count");
+  static final Types.NestedField NAN_VALUE_COUNTS_METRICS =
+      optional(
+          309,
+          "nan_value_counts_metrics",
+          Types.MapType.ofRequired(310, 311, Types.StringType.get(), Types.LongType.get()),
+          "Map of column name to number of NaN values in the column");
+  static final Types.NestedField LOWER_BOUNDS_METRICS =
+      optional(
+          312,
+          "lower_bounds_metrics",
+          Types.MapType.ofRequired(313, 314, Types.StringType.get(), Types.StringType.get()),
+          "Map of column name to lower bound in string format");
+  static final Types.NestedField UPPER_BOUNDS_METRICS =
+      optional(
+          315,
+          "upper_bounds_metrics",
+          Types.MapType.ofRequired(316, 317, Types.StringType.get(), Types.StringType.get()),
+          "Map of column name to upper bound in string format");
+  public static final Schema METRICS_DISPLAY_SCHEMA =
+      new Schema(
+          COLUMN_SIZES_METRICS,
+          VALUE_COUNT_METRICS,
+          NULL_VALUE_COUNTS_METRICS,
+          NAN_VALUE_COUNTS_METRICS,
+          LOWER_BOUNDS_METRICS,
+          UPPER_BOUNDS_METRICS);
+
+  public static class Metric {
+    private final String quotedName;
+    private final Types.NestedField field;
+    private final ByteBuffer value;
+
+    Metric(String quotedName, Types.NestedField field, ByteBuffer value) {
+      this.quotedName = quotedName;
+      this.field = field;
+      this.value = value;
+    }
+
+    String quotedName() {
+      return quotedName;
+    }
+
+    boolean valid() {
+      return quotedName != null && field != null && value != null;
+    }
+
+    Optional<String> convertToReadable() {
+      try {
+        return Optional.of(
+            Transforms.identity(field.type())
+                .toHumanString(Conversions.fromByteBuffer(field.type(), value)));
+      } catch (Exception e) { // Ignore

Review Comment:
   Following up on this: it is a non-issue, because the Spark procedures set the
flag schema.name-mapping.default; only this test did not. Fixed the test.
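
   For readers following the diff, the pattern in `convertToReadable()` (attempt the conversion and return an empty `Optional` if it fails) can be sketched without any Iceberg dependency. This is only an illustration under stated assumptions: the class `ReadableBoundSketch` and the helper `readableLong` are hypothetical names, not part of the PR, and the sketch handles only an 8-byte little-endian `long` value rather than going through `Conversions.fromByteBuffer` and `Transforms.identity(...).toHumanString(...)` as the PR does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Optional;

// Hypothetical stand-alone sketch; not the code in MetricsUtil.
public class ReadableBoundSketch {

  // Decodes an 8-byte little-endian long bound into a display string.
  // Returns Optional.empty() instead of throwing when the buffer is
  // missing or cannot be decoded, mirroring the try/catch in the diff.
  static Optional<String> readableLong(ByteBuffer value) {
    if (value == null) {
      return Optional.empty();
    }
    try {
      // duplicate() leaves the caller's position and limit untouched;
      // the byte order is set explicitly on the duplicate.
      ByteBuffer copy = value.duplicate().order(ByteOrder.LITTLE_ENDIAN);
      return Optional.of(Long.toString(copy.getLong()));
    } catch (RuntimeException e) {
      // For example, BufferUnderflowException when fewer than 8 bytes remain.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    ByteBuffer good =
        ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, 42L);
    ByteBuffer tooShort = ByteBuffer.allocate(2);

    System.out.println(readableLong(good));     // prints "Optional[42]"
    System.out.println(readableLong(tooShort)); // prints "Optional.empty"
    System.out.println(readableLong(null));     // prints "Optional.empty"
  }
}
```

   Returning an empty `Optional` lets a caller skip an undecodable bound and still display the remaining metrics. The sketch catches `RuntimeException` rather than `Exception`, which is a narrower choice than the diff makes.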



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

