amogh-jahagirdar commented on code in PR #15251:
URL: https://github.com/apache/iceberg/pull/15251#discussion_r2783735581


##########
core/src/main/java/org/apache/iceberg/MetricsUtil.java:
##########
@@ -476,4 +481,73 @@ public <T> void set(int pos, T value) {
       throw new UnsupportedOperationException("StructWithReadableMetrics is 
read only");
     }
   }
+
+  public static ContentStats fromMetrics(Schema schema, Metrics metrics) {
+    if (null == metrics) {
+      return null;
+    }
+
+    BaseContentStats.Builder builder = 
BaseContentStats.builder().withTableSchema(schema);
+    Map<Integer, BaseFieldStats<?>> map = Maps.newHashMap();

Review Comment:
   Minor: Should this be a map from field ID to a builder instead, and then at 
the end of the method we build? The logic is correct as far as I can tell but 
   
   ```
                 fieldStatsById.merge(
                     id,
                     
setter.apply(BaseFieldStats.builder().fieldId(id).type(type), 
boundValue).build(),
                     (oldVal, newVal) ->
                         setter
                             .apply(
                                 
BaseFieldStats.buildFrom((BaseFieldStats<Object>) oldVal).type(type),
                                 boundValue)
                             .build());
   ```
   
   below means that every time a metric is merged a whole BaseFieldStats object 
is built. So it's essentially an object allocation per field per metric we're 
collecting stats on rather than just mutating an existing builder reference, 
which would just be per field. Given the number of fields we expect to collect 
stats on is generally quite bounded, probably not a real problem but for wider 
schemas and in case a lot of fields do have stats it may start to creep into 
the hot path. Also feels like the code is just as simple that way, but up to you



##########
core/src/main/java/org/apache/iceberg/stats/BaseContentStats.java:
##########
@@ -120,8 +120,7 @@ public <T> T get(int pos, Class<T> javaClass) {
   @SuppressWarnings({"unchecked", "rawtypes", "CyclomaticComplexity"})
   @Override
   public <T> void set(int pos, T value) {
-    if (value instanceof GenericRecord) {
-      GenericRecord record = (GenericRecord) value;
+    if (value instanceof GenericRecord record) {

Review Comment:
   I think this is one of the newer JDK style ways of avoiding the "instanceof" 
and another line for casting? I'm good with it since it's in a relevant code 
path for this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to