mzzz-zzm opened a new pull request, #1044: URL: https://github.com/apache/iceberg-go/pull/1044
v1 and v2 manifest writers silently dropped data_file.distinct_counts (field id 111) because it was missing from the data_file_v1 and data_file_v2 record declarations in internal/avro_schemas.go. The hamba/avro encoder writes only declared fields, so the Go-side *[]colMap pointer was discarded on encode for every version.

The Iceberg v1 and v2 specs list distinct_counts as a writable optional field (map<123: int, 124: long>); the v3 spec deprecates it (see apache/iceberg#12182 and #1001/#1039).

This commit:

- Adds the distinct_counts field to data_file_v1 and data_file_v2 with the canonical map element ids (key=123, value=124), inserted after nan_value_counts to match the spec's field ordering.
- Leaves data_file_v3 unchanged. The defensive guard added in #1039 (v3writerImpl.prepareEntry) becomes load-bearing once this lands: it ensures v3 manifests still omit the field even though the encoder is now capable of serializing it via the v1/v2 schema pathway.

Tests:

- TestWriteManifestV2KeepsDistinctCounts: v2 round-trip preserves the supplied distinct counts.
- TestWriteManifestV1KeepsDistinctCounts: v1 round-trip preserves them too.

Both tests fail on origin/main (the encoder drops the field and the round-trip returns nil) and pass after the schema additions, so they exercise the change directly.

Fixes #1038
Related: #1001, #1039
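
For readers unfamiliar with schema-driven encoding, the failure mode described above can be illustrated with a toy model. This is a minimal sketch using only the Go standard library; it does not call hamba/avro or any iceberg-go code. The names encodeDeclared, dataFileBefore, dataFileAfter, and sampleRecord are hypothetical, and the field lists are heavily abbreviated stand-ins for the real data_file record declarations.

```go
package main

import "fmt"

// encodeDeclared is a toy stand-in for a schema-driven encoder (hypothetical,
// not the hamba/avro API): it copies only the fields named in the schema, in
// schema order, and silently skips anything the schema does not declare.
func encodeDeclared(schema []string, record map[string]any) map[string]any {
	out := make(map[string]any, len(schema))
	for _, name := range schema {
		if v, ok := record[name]; ok {
			out[name] = v
		}
	}
	return out
}

// Abbreviated, illustrative field lists. The real data_file_v1/data_file_v2
// declarations contain many more fields.
var (
	// Before the fix: distinct_counts is not declared.
	dataFileBefore = []string{"file_path", "nan_value_counts"}
	// After the fix: distinct_counts is declared right after nan_value_counts.
	dataFileAfter = []string{"file_path", "nan_value_counts", "distinct_counts"}
)

// sampleRecord builds an in-memory record that carries distinct counts.
// The path and values are made up for the example.
func sampleRecord() map[string]any {
	return map[string]any{
		"file_path":        "data/part-0.parquet",
		"nan_value_counts": map[int]int64{1: 0},
		"distinct_counts":  map[int]int64{1: 42},
	}
}

func main() {
	rec := sampleRecord()
	_, before := encodeDeclared(dataFileBefore, rec)["distinct_counts"]
	_, after := encodeDeclared(dataFileAfter, rec)["distinct_counts"]
	fmt.Println("before:", before, "after:", after) // prints "before: false after: true"
}
```

In this model the value is present on the Go side in both cases; whether it survives encoding depends only on the schema declaration, which is why the fix is confined to the schema file.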
