rdblue commented on code in PR #13039:
URL: https://github.com/apache/iceberg/pull/13039#discussion_r2254733603
##########
core/src/main/java/org/apache/iceberg/MetricsConfig.java:
##########
@@ -107,6 +112,104 @@ public static MetricsConfig forPositionDelete(Table table) {
return new MetricsConfig(columnModes.build(), defaultMode);
}
+ static Set<Integer> limitFieldIds(Schema schema, int limit) {
+ return TypeUtil.visit(
+ schema,
+ new TypeUtil.CustomOrderSchemaVisitor<>() {
+ private final Set<Integer> idSet = Sets.newHashSet();
+
+ private boolean shouldContinue() {
+ return idSet.size() < limit;
+ }
+
+ private boolean metricsEligible(Type type) {
+ return type.isPrimitiveType() || type.isVariantType();
+ }
+
+ @Override
+ @SuppressWarnings("ReturnValueIgnored")
+ public Set<Integer> schema(Schema schema, Supplier<Set<Integer>> structResult) {
+ // We need to call structResult.get() to visit the schema
+ structResult.get();
+ return idSet;
+ }
+
+ @Override
+ public Set<Integer> struct(Types.StructType struct, Iterable<Set<Integer>> fieldResults) {
+ Iterator<Types.NestedField> fields = struct.fields().iterator();
+ while (shouldContinue() && fields.hasNext()) {
+ Types.NestedField field = fields.next();
+ if (metricsEligible(field.type())) {
+ idSet.add(field.fieldId());
+ }
+ }
+
+ // visit children to add more ids
+ Iterator<Set<Integer>> iter = fieldResults.iterator();
+ while (shouldContinue() && iter.hasNext()) {
+ iter.next();
+ }
+
+ return null;
+ }
+
+ @Override
+ @SuppressWarnings("ReturnValueIgnored")
+ public Set<Integer> field(Types.NestedField field, Supplier<Set<Integer>> fieldResult) {
+ if (shouldContinue()) {
Review Comment:
I don't think this _needs_ to call `shouldContinue` everywhere, as long as
it is called before adding an ID to the set. Calling it here isn't a problem,
but it's also fine to traverse a field and return quickly without adding an
ID, rather than skipping the traversal. And for fields specifically,
`shouldContinue` is already called before the field is visited (in the
`struct` loop over `fieldResults`).
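
To illustrate the suggestion, here is a minimal standalone sketch where the limit is checked only at the point an ID is added. It does not use Iceberg's `TypeUtil` visitor API: the `Field` class and the `leaf`, `struct`, and `collect` helpers are hypothetical stand-ins, and the traversal order (primitives at the current level first, then nested structs) is assumed from the diff above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LimitFieldIdsSketch {
  // Hypothetical stand-in for Types.NestedField: a field with no children is a primitive.
  static final class Field {
    final int id;
    final List<Field> children;

    Field(int id, List<Field> children) {
      this.id = id;
      this.children = children;
    }

    boolean isPrimitive() {
      return children.isEmpty();
    }
  }

  static Field leaf(int id) {
    return new Field(id, Collections.emptyList());
  }

  static Field struct(int id, Field... children) {
    return new Field(id, Arrays.asList(children));
  }

  static Set<Integer> limitFieldIds(List<Field> topLevel, int limit) {
    Set<Integer> ids = new LinkedHashSet<>();
    collect(topLevel, limit, ids);
    return ids;
  }

  private static void collect(List<Field> fields, int limit, Set<Integer> ids) {
    // Add primitive fields at this level; the limit check guards only the add itself.
    for (Field field : fields) {
      if (field.isPrimitive() && ids.size() < limit) {
        ids.add(field.id);
      }
    }

    // Always descend into nested structs. Once the limit is reached, the
    // recursive call adds nothing and returns quickly.
    for (Field field : fields) {
      if (!field.isPrimitive()) {
        collect(field.children, limit, ids);
      }
    }
  }

  public static void main(String[] args) {
    List<Field> schema = Arrays.asList(leaf(1), struct(2, leaf(4), leaf(5)), leaf(3));
    System.out.println(limitFieldIds(schema, 3)); // prints [1, 3, 4]
  }
}
```

The trade-off is some extra traversal after the limit is reached, in exchange for a single place where the limit is enforced.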
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]