wypoon commented on issue #11221:
URL: https://github.com/apache/iceberg/issues/11221#issuecomment-2379964488

   In Iceberg 1.1, a different bug occurs when reading the Iceberg table; the read fails altogether due to:
   ```
   ERROR org.apache.iceberg.spark.source.BaseReader - Error reading file(s): file:/Users/wypoon/tmp/downloads/ENGESC-26958/lgim_test_data.parq
   java.lang.IndexOutOfBoundsException: index: 1016, length: 1 (expected: range(0, 1016))
        at org.apache.arrow.memory.ArrowBuf.checkIndexD(ArrowBuf.java:318)
        at org.apache.arrow.memory.ArrowBuf.chk(ArrowBuf.java:305)
        at org.apache.arrow.memory.ArrowBuf.getByte(ArrowBuf.java:507)
        at org.apache.arrow.vector.BitVectorHelper.setBit(BitVectorHelper.java:85)
        at org.apache.arrow.vector.DecimalVector.setBigEndian(DecimalVector.java:216)
        at org.apache.iceberg.arrow.vectorized.parquet.DecimalVectorUtil.setBigEndian(DecimalVectorUtil.java:31)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedDictionaryEncodedParquetValuesReader$FixedLengthDecimalDictEncodedReader.nextVal(VectorizedDictionaryEncodedParquetValuesReader.java:146)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedDictionaryEncodedParquetValuesReader$BaseDictEncodedReader.nextBatch(VectorizedDictionaryEncodedParquetValuesReader.java:67)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedParquetDefinitionLevelReader$FixedLengthDecimalReader.nextDictEncodedVal(VectorizedParquetDefinitionLevelReader.java:513)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedParquetDefinitionLevelReader$BaseReader.nextDictEncodedBatch(VectorizedParquetDefinitionLevelReader.java:356)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator$FixedLengthDecimalPageReader.nextDictEncodedVal(VectorizedPageIterator.java:421)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator$BagePageReader.nextBatch(VectorizedPageIterator.java:186)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedColumnIterator$FixedLengthDecimalBatchReader.nextBatchOf(VectorizedColumnIterator.java:213)
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedColumnIterator$BatchReader.nextBatch(VectorizedColumnIterator.java:77)
        at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.read(VectorizedArrowReader.java:146)
        at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader$ColumnBatchLoader.readDataToColumnVectors(ColumnarBatchReader.java:123)
        at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader$ColumnBatchLoader.loadDataToColumnBatch(ColumnarBatchReader.java:98)
        at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader.read(ColumnarBatchReader.java:72)
        at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader.read(ColumnarBatchReader.java:44)
        at org.apache.iceberg.parquet.VectorizedParquetReader$FileIterator.next(VectorizedParquetReader.java:147)
        at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:130)
        at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119)
        at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:156)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
        at scala.Option.exists(Option.scala:376)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:97)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
           ...
   ```
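   For context on the exception message: the failing frame is Arrow's buffer bounds check, which rejects any read or write starting at or past the buffer's capacity. Here is a minimal sketch in plain Python (not the actual Arrow code) of that check, using the numbers from the trace above; it suggests the reader is touching the validity byte one position past the end of a 1016-byte buffer, i.e. an off-by-one/capacity mismatch:

   ```python
   def check_index(index: int, length: int, capacity: int) -> None:
       # Mirrors the semantics of org.apache.arrow.memory.ArrowBuf#checkIndexD:
       # an access of `length` bytes starting at `index` must fit in `capacity`.
       if index < 0 or index + length > capacity:
           raise IndexError(
               f"index: {index}, length: {length} (expected: range(0, {capacity}))"
           )

   check_index(1015, 1, 1016)      # last valid byte: passes
   try:
       check_index(1016, 1, 1016)  # one past the end: matches the failure above
   except IndexError as e:
       print(e)  # index: 1016, length: 1 (expected: range(0, 1016))
   ```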
   As far as I can tell, this has either never worked or never worked correctly.

