laskoviymishka opened a new issue, #987:
URL: https://github.com/apache/iceberg-go/issues/987

   Parent: #589
   
   Once the shredded reader (#986) lands, the writer needs to decide what to 
shred per row and emit the shredded `STRUCT` alongside the `metadata` and 
residual `value` columns. Java's posture is no shredding unless explicitly 
configured; mirror that with a new table property, 
`write.variant.shredding-paths`, carrying a list of `$.path.expressions`. It 
uses the same property name as Java and is parsed by the same property layer 
that already handles `write.metadata.compression-codec` and friends.
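   A rough sketch of how that property might be parsed. The property name 
comes from this issue; the comma-separated encoding and the 
`parseShreddingPaths` helper are assumptions for illustration, not the 
existing property-layer API:

```go
package main

import (
	"fmt"
	"strings"
)

// ShreddingPathsKey is the property name proposed in this issue; the
// comma-separated list encoding below is an assumption, mirroring how other
// list-valued table properties are commonly serialized.
const ShreddingPathsKey = "write.variant.shredding-paths"

// parseShreddingPaths (hypothetical helper) splits the property value into
// individual `$.path.expressions`, trimming whitespace and dropping empties.
func parseShreddingPaths(props map[string]string) []string {
	raw, ok := props[ShreddingPathsKey]
	if !ok || raw == "" {
		return nil // Java's posture: no shredding unless explicitly configured
	}
	var paths []string
	for _, p := range strings.Split(raw, ",") {
		if p = strings.TrimSpace(p); p != "" {
			paths = append(paths, p)
		}
	}
	return paths
}

func main() {
	props := map[string]string{ShreddingPathsKey: "$.event.ts, $.user.id"}
	fmt.Println(parseShreddingPaths(props)) // prints [$.event.ts $.user.id]
}
```

An absent or empty property yields `nil`, which keeps the unconfigured path 
identical to today's non-shredded writer.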
   
   Add `table/internal/variant_shredder.go` with `ShredVariant(value 
variant.Value, schema ShreddingSchema) (typed any, residual []byte, err 
error)`, then hook it into the parquet writer in 
`table/internal/parquet_files.go` so shredded columns appear in the parquet 
schema and are populated per row. Statistics on shredded typed columns 
piggyback the existing column-stats path — typed columns get min/max for free, 
which is the substrate the follow-up stats issue builds on.
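   A minimal sketch of the typed/residual split behind the proposed 
`ShredVariant`. The issue's real signature takes a `variant.Value` and a 
`ShreddingSchema`; the map-based stand-ins, the one-level field handling, and 
returning the residual as a map (rather than encoded bytes) are hypothetical 
simplifications:

```go
package main

import "fmt"

// ShreddingSchema here is a stand-in for the type named in this issue; real
// `$.a.b` path expressions would need recursive descent, while this sketch
// handles only top-level fields.
type ShreddingSchema struct {
	Fields []string // top-level field names to shred out of the variant
}

// shredVariant partitions a decoded variant object into the typed columns
// named by the schema and a residual object holding everything else. The
// real writer would re-encode the residual into the `value` column instead
// of returning it as a map.
func shredVariant(value map[string]any, schema ShreddingSchema) (typed, residual map[string]any) {
	typed = make(map[string]any)
	residual = make(map[string]any)
	shred := make(map[string]bool, len(schema.Fields))
	for _, f := range schema.Fields {
		shred[f] = true
	}
	for k, v := range value {
		if shred[k] {
			typed[k] = v // becomes a typed sub-column of the shredded STRUCT
		} else {
			residual[k] = v // stays in the residual `value` column
		}
	}
	return typed, residual
}

func main() {
	row := map[string]any{"id": int64(7), "name": "a", "extra": true}
	typed, residual := shredVariant(row, ShreddingSchema{Fields: []string{"id", "name"}})
	fmt.Println(len(typed), len(residual)) // prints 2 1
}
```

Because the typed side surfaces as ordinary Parquet columns, it flows through 
the existing column-stats path with no extra work, which is what the 
follow-up stats issue relies on.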
   
   Cross-client coverage: write a shredded variant via iceberg-go, read it 
back via Java and pyiceberg, and assert the result equals the source. A round 
trip through the iceberg-go reader is also required. Spec: [Parquet Variant 
shredding](https://github.com/apache/parquet-format/blob/master/VariantShredding.md).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

