emkornfield commented on issue #13855: URL: https://github.com/apache/iceberg/issues/13855#issuecomment-3275784592
> I assume you are referring "metrics" when you mention "stats".

Yes, the spec doesn't have a name for them in aggregate, but I was referring to `value_counts` and `null_value_counts` from the spec.

> If you look at the discussions https://github.com/apache/iceberg/pull/13398#discussion_r2170905401, which was one of the primary reasons not to depend on metrics as it DOES NOT cover for ALL columns and driven by write.metadata.metrics.max-inferred-column-defaults configuration. If a schema has 20 columns and max inferred column says 10 columns, metrics would be generated only for 10 columns. Metrics like valueCounts, nullValueCounts, nanValueCounts and so on would be generated only for certain columns in the schema, not for all.

I don't think this should be ruled out, or at least there seems to be a path here without changing the specification. This configuration, as far as I can tell, is not part of the spec. I think the disconnect we have is that one could add a new configuration, named something like `write.metadata.metrics.always_record_all_null_metrics`, which would always write the metrics that implementations need to infer whether a column is entirely null once it exists in the schema.
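To make the inference concrete, here is a minimal sketch of how a reader might decide whether a column is entirely null in a data file, assuming it has the file's `value_counts` and `null_value_counts` maps (both keyed by column id, per the spec). The function name `is_entirely_null` and the plain-dict representation are illustrative assumptions, not an existing Iceberg API. The sketch returns `None` when a column's metrics were not written, which is the case the quoted concern is about and the case the proposed configuration would be meant to rule out.

```python
from typing import Mapping, Optional


def is_entirely_null(
    field_id: int,
    value_counts: Optional[Mapping[int, int]],
    null_value_counts: Optional[Mapping[int, int]],
) -> Optional[bool]:
    """Hypothetical helper: True/False when metrics decide it, None when unknown."""
    # Either map may be absent from the data file entry altogether.
    if value_counts is None or null_value_counts is None:
        return None

    total = value_counts.get(field_id)
    nulls = null_value_counts.get(field_id)

    # Metrics were not recorded for this column (for example, it fell outside
    # the inferred-column limit), so nothing can be concluded.
    if total is None or nulls is None:
        return None

    # value_counts includes nulls, so equality means every value is null.
    return total == nulls


# Example metrics for one data file: column 1 has no nulls, column 2 is all null,
# column 3 has no metrics recorded.
value_counts = {1: 100, 2: 100}
null_value_counts = {1: 0, 2: 100}

print(is_entirely_null(1, value_counts, null_value_counts))
print(is_entirely_null(2, value_counts, null_value_counts))
print(is_entirely_null(3, value_counts, null_value_counts))
```

The three-valued result is deliberate: treating "no metrics" as "not all null" would be safe for some uses but wrong for others, so the caller decides how to handle the unknown case.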
