mzzz-zzm opened a new pull request, #1044:
URL: https://github.com/apache/iceberg-go/pull/1044

   v1 and v2 manifest writers silently dropped data_file.distinct_counts (field 
id 111) because it was missing from the data_file_v1 and data_file_v2 record 
declarations in internal/avro_schemas.go. The hamba/avro encoder writes only 
declared fields, so the Go-side *[]colMap pointer was discarded on encode for 
every version.
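
   The failure mode described above can be sketched with a toy encoder. This is not hamba/avro and not the iceberg-go code path; `encodeDeclared` and the string-slice "schema" are hypothetical stand-ins that only illustrate why a field missing from the declaration is lost without any error.

```go
package main

import (
	"fmt"
	"sort"
)

// encodeDeclared is a hypothetical stand-in for a schema-driven encoder:
// it copies only the fields named in the schema and silently skips the rest.
func encodeDeclared(schema []string, record map[string]any) map[string]any {
	out := make(map[string]any, len(schema))
	for _, name := range schema {
		if v, ok := record[name]; ok {
			out[name] = v
		}
	}
	return out
}

// sortedKeys returns the map keys in a stable order for printing.
func sortedKeys(m map[string]any) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	record := map[string]any{
		"nan_value_counts": map[int]int64{1: 0},
		"distinct_counts":  map[int]int64{1: 42},
	}

	// Schema without distinct_counts: the value is dropped, no error raised.
	before := []string{"nan_value_counts"}
	// Schema with distinct_counts declared after nan_value_counts.
	after := []string{"nan_value_counts", "distinct_counts"}

	fmt.Println(sortedKeys(encodeDeclared(before, record)))
	fmt.Println(sortedKeys(encodeDeclared(after, record)))
}
```

   In this sketch the only way to get `distinct_counts` into the output is to declare it in the schema, which mirrors the approach taken by this change.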
   
   The Iceberg v1 and v2 specs list distinct_counts as a writable optional 
field (map<123: int, 124: long>); the v3 spec deprecates it (see 
apache/iceberg#12182 and #1001/#1039). This commit:
   
   - Adds the distinct_counts field to data_file_v1 and data_file_v2 with the 
canonical map element ids (key=123, value=124), inserted after nan_value_counts 
to match the spec's field ordering.
   - Leaves data_file_v3 unchanged. The defensive guard added in #1039 
(v3writerImpl.prepareEntry) becomes load-bearing once this lands: it ensures v3 
manifests still omit the field even though the encoder is now capable of 
serializing it via the v1/v2 schema pathway.
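
   For illustration, the added field could look roughly like the Avro JSON below. This is a sketch, not the declaration in internal/avro_schemas.go: the field id 111 and the element ids 123/124 come from the description above, while the record name `k123_v124` and the `"logicalType": "map"` array wrapper are assumptions based on how Iceberg commonly represents maps with non-string keys in Avro. The helper `parseDistinctCounts` is hypothetical and exists only to check the ids.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// distinctCountsField is an illustrative Avro field declaration.
// Assumption: record name and logicalType follow the usual Iceberg convention.
const distinctCountsField = `{
  "name": "distinct_counts",
  "type": ["null", {
    "type": "array",
    "logicalType": "map",
    "items": {
      "type": "record",
      "name": "k123_v124",
      "fields": [
        {"name": "key", "type": "int", "field-id": 123},
        {"name": "value", "type": "long", "field-id": 124}
      ]
    }
  }],
  "default": null,
  "field-id": 111
}`

// kvField is one field of the key/value record inside the array.
type kvField struct {
	Name    string `json:"name"`
	Type    string `json:"type"`
	FieldID int    `json:"field-id"`
}

// parseDistinctCounts returns the outer field id and the key/value fields.
func parseDistinctCounts(src string) (int, []kvField, error) {
	var field struct {
		FieldID int               `json:"field-id"`
		Type    []json.RawMessage `json:"type"` // union: "null" or the array
	}
	if err := json.Unmarshal([]byte(src), &field); err != nil {
		return 0, nil, err
	}
	if len(field.Type) != 2 {
		return 0, nil, fmt.Errorf("expected a two-branch union, got %d", len(field.Type))
	}
	var arr struct {
		Items struct {
			Fields []kvField `json:"fields"`
		} `json:"items"`
	}
	if err := json.Unmarshal(field.Type[1], &arr); err != nil {
		return 0, nil, err
	}
	return field.FieldID, arr.Items.Fields, nil
}

func main() {
	id, kv, err := parseDistinctCounts(distinctCountsField)
	if err != nil {
		panic(err)
	}
	fmt.Println("field id:", id)
	for _, f := range kv {
		fmt.Printf("%s: %s (id %d)\n", f.Name, f.Type, f.FieldID)
	}
}
```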
   
   Tests:
   - TestWriteManifestV2KeepsDistinctCounts - v2 round-trip preserves the 
supplied distinct counts.
   - TestWriteManifestV1KeepsDistinctCounts - v1 round-trip preserves them too.
   
   Both tests fail on origin/main, where the encoder drops the field and the 
round-trip read returns nil, and pass after the schema additions, so they 
exercise the change directly.
   
   Fixes #1038
   Related: #1001, #1039


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

