[GitHub] [iceberg] asheeshgarg commented on issue #6415: Vectorized Read Issue

GitBox Wed, 04 Jan 2023 13:57:59 -0800


asheeshgarg commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1371476608


   @nastra @nazq In the meantime, we have merged the pull request and bundled a 
local jar with #3024.
   It works fine for most of the columns, but we are getting
   java.lang.IndexOutOfBoundsException: index: 32749, length: 32 (expected: range(0, 32768))
           at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
           at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:765)
           at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1244)
           at org.apache.arrow.vector.BaseVariableWidthVector.set(BaseVariableWidthVector.java:1025)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.lambda$toVarCharVector$5(DictEncodedArrowConverter.java:153)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.initVector(DictEncodedArrowConverter.java:201)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.toVarCharVector(DictEncodedArrowConverter.java:150)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.toArrowVector(DictEncodedArrowConverter.java:47)
           at org.apache.iceberg.arrow.vectorized.ColumnVector.getArrowVector(ColumnVector.java:66)
           at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
           at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
           at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
           at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
           at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
           at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
           at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
           at org.apache.iceberg.arrow.vectorized.ColumnarBatch.createVectorSchemaRootFromVectors(ColumnarBatch.java:58)
           at com.ReadIcebergTableTestV3.main(ReadIcebergTableTestV3.java:54)
   when reading columns whose dictionaries have a large number of distinct 
values. I will try to create a test case to reproduce it.
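
   The numbers in the exception message can be checked by hand: the write 
starts at byte 32749 and is 32 bytes long, so it would end at byte 32781, which 
is 13 bytes past the end of the 32768-byte buffer. The sketch below is a 
simplified, self-contained model of that kind of bounds check. It does not use 
Arrow, and the class and method names (BoundsCheckSketch, fits, checkedWrite, 
overrun) are hypothetical, chosen only for illustration.

```java
// Simplified model of a fixed-capacity buffer bounds check.
// NOT Arrow's implementation; all names here are hypothetical.
final class BoundsCheckSketch {

    // True when the byte range [index, index + length) lies inside [0, capacity).
    static boolean fits(long index, long length, long capacity) {
        return index >= 0 && length >= 0 && index + length <= capacity;
    }

    // Number of bytes by which the write would run past the buffer (0 if it fits).
    static long overrun(long index, long length, long capacity) {
        return Math.max(0, index + length - capacity);
    }

    // Models a write that fails instead of growing the buffer.
    static void checkedWrite(long index, long length, long capacity) {
        if (!fits(index, length, capacity)) {
            throw new IndexOutOfBoundsException(String.format(
                "index: %d, length: %d (expected: range(0, %d))",
                index, length, capacity));
        }
        // A real buffer would copy the bytes here.
    }

    public static void main(String[] args) {
        // Values taken from the exception message in the trace above.
        long capacity = 32768;
        long index = 32749;
        long length = 32;

        System.out.println("fits: " + fits(index, length, capacity));
        System.out.println("overrun bytes: " + overrun(index, length, capacity));
        try {
            checkedWrite(index, length, capacity);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

   If the trace means what it appears to mean (BaseVariableWidthVector.set 
writing into a data buffer that was not sized for all of the decoded dictionary 
values), the usual directions would be to allocate enough bytes up front or to 
use a write that grows the buffer. In Arrow's Java vectors, the setSafe 
variants reallocate as needed, whereas set does not. This is a guess from the 
trace alone until there is a test case.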
   
   @rdblue Arrow has also added a Dataset API that can read tabular data into 
Arrow vectors. I was able to read the Parquet file directly with it. I am not 
sure whether we would like to add read support using the Arrow Dataset.
    
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


