rdblue commented on code in PR #12512: URL: https://github.com/apache/iceberg/pull/12512#discussion_r1996431272
########## parquet/src/main/java/org/apache/iceberg/parquet/ParquetVariantReaders.java: ########## @@ -332,6 +347,91 @@ public void setPageSource(PageReadStore pageStore) { } } + private static class ShreddedArrayReader implements VariantValueReader { + private final int valueDL; + private final VariantValueReader valueReader; + private final int repeatedDL; + private final int repeatedRL; + private final VariantValueReader elementReader; + private final TripleIterator<?> valueColumn; + private final TripleIterator<?> elementColumn; + private final List<TripleIterator<?>> children; + + private ShreddedArrayReader( Review Comment: I don't think that the complexity here is needed. The visitor is going to visit the `element` group using `visitValue` after visiting both `value` and `typed_value` separately. The visitor then calls `ParquetVariantReaders.shredded(valueDL, valueReader, typedDL, typedReader)`. That returns a shredded reader that combines the two fields. That reader then would need to be wrapped in a `RepeatedReader` that produces a `ValueArray` to handle `typed_value`. And finally, that reader needs to be combined with the `value` reader using another `shredded` reader. I think that structure would avoid the need to reimplement the Parquet array reconstruction logic. There's still one issue with this, which is that the `RepeatedReader` needs to implement `VariantValueReader` as well, and the implementation needs to call `read(VariantMetadata)`. To do that, I'd probably copy the `RepeatedReader` and modify it as needed. But the critical difference is that it shouldn't do as much as is done here (combining `value` and `typed_value` isn't needed) and should be nearly identical to the `RepeatedReader` base. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org