amogh-jahagirdar commented on code in PR #15251:
URL: https://github.com/apache/iceberg/pull/15251#discussion_r2783735581
##########
core/src/main/java/org/apache/iceberg/MetricsUtil.java:
##########
@@ -476,4 +481,73 @@ public <T> void set(int pos, T value) {
throw new UnsupportedOperationException("StructWithReadableMetrics is
read only");
}
}
+
+ public static ContentStats fromMetrics(Schema schema, Metrics metrics) {
+ if (null == metrics) {
+ return null;
+ }
+
+ BaseContentStats.Builder builder =
BaseContentStats.builder().withTableSchema(schema);
+ Map<Integer, BaseFieldStats<?>> map = Maps.newHashMap();
Review Comment:
Minor: Should this be a map from field ID to a builder instead, and then at
the end of the method we build? The logic is correct as far as I can tell but
```
fieldStatsById.merge(
id,
setter.apply(BaseFieldStats.builder().fieldId(id).type(type),
boundValue).build(),
(oldVal, newVal) ->
setter
.apply(
BaseFieldStats.buildFrom((BaseFieldStats<Object>) oldVal).type(type),
boundValue)
.build());
```
below means that every time a metric is merged a whole BaseFieldStats object
is built. So it's essentially an object allocation per field per metric we're
collecting stats on rather than just mutating an existing builder reference,
which would just be per field. Given the number of fields we expect to collect
stats on is generally quite bounded, probably not a real problem but for wider
schemas and in case a lot of fields do have stats it may start to creep into
the hot path. Also feels like the code is just as simple that way, but up to you
##########
core/src/main/java/org/apache/iceberg/stats/BaseContentStats.java:
##########
@@ -120,8 +120,7 @@ public <T> T get(int pos, Class<T> javaClass) {
@SuppressWarnings({"unchecked", "rawtypes", "CyclomaticComplexity"})
@Override
public <T> void set(int pos, T value) {
- if (value instanceof GenericRecord) {
- GenericRecord record = (GenericRecord) value;
+ if (value instanceof GenericRecord record) {
Review Comment:
I think this is one of the newer JDK style ways of avoiding the "instanceof"
and another line for casting? I'm good with it since it's in a relevant code
path for this change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]