[I] Vectorized reader throws ClassCastException on int-to-long promotion when Parquet file has INT(32) logical type annotation [iceberg]

via GitHub Thu, 14 May 2026 14:58:39 -0700


xndai opened a new issue, #16341:
URL: https://github.com/apache/iceberg/issues/16341


   ### Apache Iceberg version
   
   1.10.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   After an integer column is promoted to long via schema evolution, the vectorized reader throws the following exception when reading Parquet files whose column carries an INT(32, true) logical type annotation:
   
   ```
     java.lang.ClassCastException: class org.apache.iceberg.shaded.org.apache.arrow.vector.BigIntVector cannot be cast to class org.apache.iceberg.shaded.org.apache.arrow.vector.IntVector
         at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader$LogicalTypeVisitor.visit(VectorizedArrowReader.java:592)
         at org.apache.parquet.schema.LogicalTypeAnnotation$IntLogicalTypeAnnotation.accept(LogicalTypeAnnotation.java:812)
         at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.allocateVectorBasedOnLogicalType(VectorizedArrowReader.java:287)
         at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.allocateFieldVector(VectorizedArrowReader.java:239)
         at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.read(VectorizedArrowReader.java:153)
   ```
   Detailed repro steps:
   
   ```
       HadoopTables tables = new HadoopTables();
       Schema schema = new Schema(Types.NestedField.required(1, "col", Types.IntegerType.get()));
       Table table = tables.create(schema, tempDir.toURI() + "/int-promotion-logical");
   
       // Write a Parquet file with an INT(32, signed) logical type annotation.
       // This is what non-Iceberg writers (PyArrow, Spark native, etc.) typically produce.
       // primitive(...) is assumed to be a static import of org.apache.parquet.schema.Types.primitive.
       MessageType parquetSchema =
           new MessageType(
               "test",
               primitive(PrimitiveType.PrimitiveTypeName.INT32, Type.Repetition.REQUIRED)
                   .as(LogicalTypeAnnotation.intType(32, true))
                   .id(1)
                   .named("col"));
       ...
       // Promote the column type from int to long (simulates ALTER TABLE)
       table.updateSchema().updateColumn("col", Types.LongType.get()).commit();
       ...
   
       // Read with the vectorized reader
       int totalRows = 0;
       int rowIndex = 0;
       try (VectorizedTableScanIterable vectorizedReader =
           new VectorizedTableScanIterable(table.newScan(), 1024, false)) {
         for (ColumnarBatch batch : vectorizedReader) { // exception thrown here
           ...
         }
       }
       ...
   
   ```
   Root cause:
   
   In `VectorizedArrowReader.allocateFieldVector()`, the vector is allocated from the Iceberg schema type, which is long after schema evolution, so a `BigIntVector` is created. The `LogicalTypeVisitor` then casts that vector according to the Parquet file's logical type, which is INT(32), and therefore expects an `IntVector`. This mismatch causes the `ClassCastException`.
   
   To fix this, we would need to create the `FieldVector` based on the actual width of the data stored in the Parquet file (32-bit here). The accessor then handles widening to long when the engine calls `getLong()`.
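   
   The mismatch and the proposed direction can be illustrated with a small standalone sketch. It does not use the real Arrow or Iceberg classes: `ToyVector`, `ToyIntVector`, `ToyBigIntVector`, the two enums, and every method name below are hypothetical stand-ins chosen only to mirror the shape of the problem, and the actual fix in `VectorizedArrowReader` may look quite different.
   
   ```java
   /** Toy model of the allocation mismatch. All names are hypothetical, not Arrow/Iceberg APIs. */
   final class PromotionSketch {
   
     /** Stand-in for a generic field vector. */
     interface ToyVector {}
   
     /** Stand-in for a 32-bit integer vector. */
     static final class ToyIntVector implements ToyVector {
       final int[] values;
   
       ToyIntVector(int capacity) {
         values = new int[capacity];
       }
     }
   
     /** Stand-in for a 64-bit integer vector. */
     static final class ToyBigIntVector implements ToyVector {
       final long[] values;
   
       ToyBigIntVector(int capacity) {
         values = new long[capacity];
       }
     }
   
     /** Width of the values stored in the file. */
     enum PhysicalType { INT32, INT64 }
   
     /** Type requested by the table schema after evolution. */
     enum ExpectedType { INTEGER, LONG }
   
     /** Mirrors the reported behaviour: the vector is chosen from the table schema type. */
     static ToyVector allocateFromExpectedType(ExpectedType expected, int capacity) {
       return expected == ExpectedType.LONG
           ? new ToyBigIntVector(capacity)
           : new ToyIntVector(capacity);
     }
   
     /** Mirrors the proposed direction: the vector is chosen from the stored data width. */
     static ToyVector allocateFromPhysicalType(PhysicalType physical, int capacity) {
       return physical == PhysicalType.INT64
           ? new ToyBigIntVector(capacity)
           : new ToyIntVector(capacity);
     }
   
     /** Accessor that widens 32-bit values to long when the caller asks for a long. */
     static long getLong(ToyVector vector, int row) {
       if (vector instanceof ToyIntVector) {
         return ((ToyIntVector) vector).values[row]; // implicit int -> long widening
       }
       return ((ToyBigIntVector) vector).values[row];
     }
   
     /** Reproduces the failing cast: schema says LONG, file annotation says 32-bit int. */
     static boolean buggyPathThrows() {
       ToyVector vector = allocateFromExpectedType(ExpectedType.LONG, 4);
       try {
         // A visitor for a 32-bit annotation assumes it was handed a 32-bit vector.
         ToyIntVector ignored = (ToyIntVector) vector;
         return false;
       } catch (ClassCastException e) {
         return true;
       }
     }
   
     public static void main(String[] args) {
       System.out.println("allocate-by-schema path throws: " + buggyPathThrows());
   
       ToyVector vector = allocateFromPhysicalType(PhysicalType.INT32, 1);
       ((ToyIntVector) vector).values[0] = 42;
       System.out.println("value read through getLong: " + getLong(vector, 0));
     }
   }
   ```
   
   In this sketch the cast only succeeds when the vector is allocated from the stored width, and the promotion to long is deferred to the accessor, which is the same division of responsibility described above.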
   
   
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


