rdblue commented on code in PR #12512:
URL: https://github.com/apache/iceberg/pull/12512#discussion_r1996431272


##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetVariantReaders.java:
##########
@@ -332,6 +347,91 @@ public void setPageSource(PageReadStore pageStore) {
     }
   }
 
+  private static class ShreddedArrayReader implements VariantValueReader {
+    private final int valueDL;
+    private final VariantValueReader valueReader;
+    private final int repeatedDL;
+    private final int repeatedRL;
+    private final VariantValueReader elementReader;
+    private final TripleIterator<?> valueColumn;
+    private final TripleIterator<?> elementColumn;
+    private final List<TripleIterator<?>> children;
+
+    private ShreddedArrayReader(

Review Comment:
   I don't think that the complexity here is needed. The visitor is going to 
visit the `element` group using `visitValue` after visiting both `value` and 
`typed_value` separately. The visitor then calls 
`ParquetVariantReaders.shredded(valueDL, valueReader, typedDL, typedReader)`. 
That returns a shredded reader that combines the two fields.
   
   That reader then would need to be wrapped in a `RepeatedReader` that 
produces a `ValueArray` to handle `typed_value`. And finally, that reader needs 
to be combined with the `value` reader using another `shredded` reader.
   
   I think that structure would avoid the need to reimplement the Parquet array 
reconstruction logic.
   
   There's still one issue with this, which is that the `RepeatedReader` needs 
to implement `VariantValueReader` as well, and the implementation needs to call 
`read(VariantMetadata)`. To do that, I'd probably copy the `RepeatedReader` and 
modify it as needed. But the critical difference is that it shouldn't do as 
much as is done here (combining `value` and `typed_value` isn't needed) and 
should be nearly identical to the `RepeatedReader` base.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to