asheeshgarg commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1371476608
@nastra @nazq In the meantime, we merged pull request #3024 locally and
bundled a local jar with it.
It works fine for most columns, but we are getting:
java.lang.IndexOutOfBoundsException: index: 32749, length: 32 (expected: range(0, 32768))
    at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
    at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:765)
    at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1244)
    at org.apache.arrow.vector.BaseVariableWidthVector.set(BaseVariableWidthVector.java:1025)
    at org.apache.iceberg.arrow.DictEncodedArrowConverter.lambda$toVarCharVector$5(DictEncodedArrowConverter.java:153)
    at org.apache.iceberg.arrow.DictEncodedArrowConverter.initVector(DictEncodedArrowConverter.java:201)
    at org.apache.iceberg.arrow.DictEncodedArrowConverter.toVarCharVector(DictEncodedArrowConverter.java:150)
    at org.apache.iceberg.arrow.DictEncodedArrowConverter.toArrowVector(DictEncodedArrowConverter.java:47)
    at org.apache.iceberg.arrow.vectorized.ColumnVector.getArrowVector(ColumnVector.java:66)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
    at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
    at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
    at org.apache.iceberg.arrow.vectorized.ColumnarBatch.createVectorSchemaRootFromVectors(ColumnarBatch.java:58)
    at com.ReadIcebergTableTestV3.main(ReadIcebergTableTestV3.java:54)
This happens when reading columns where the dictionary has a large number of
distinct values. I will try to create a test case to reproduce it.
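For reference, the numbers in the exception message can be checked by hand: a
32-byte write starting at index 32749 would end at 32781, which is past the
32768-byte range reported in the message. The sketch below reproduces only that
arithmetic in plain Java. It does not use Arrow or Iceberg, the names
BoundsCheckSketch, fits and grownCapacity are hypothetical, and the doubling
growth policy is an assumption for illustration, not a description of what
either library does.

```java
class BoundsCheckSketch {
    // True when a write of `length` bytes starting at `index` stays inside
    // a buffer that holds `capacity` bytes.
    static boolean fits(long index, long length, long capacity) {
        return index >= 0 && length >= 0 && index + length <= capacity;
    }

    // Smallest capacity reachable by repeated doubling that can hold the write.
    // The doubling policy is assumed here purely for illustration.
    static long grownCapacity(long index, long length, long capacity) {
        long needed = index + length;
        long result = Math.max(capacity, 1);
        while (result < needed) {
            result *= 2;
        }
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the exception message above.
        long index = 32749;
        long length = 32;
        long capacity = 32768;

        System.out.println("write end = " + (index + length));
        System.out.println("fits = " + fits(index, length, capacity));
        System.out.println("grown capacity = " + grownCapacity(index, length, capacity));
    }
}
```

Arrow's variable-width vectors also provide setSafe variants that reallocate
before writing. Whether switching to them is the right fix here has not been
established.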
@rdblue Arrow has also added a Dataset API that can read tabular data into
Arrow vectors. I was able to read the Parquet file directly with it. I am not
sure whether we would like to add read support using the Arrow Dataset API.