shubham19may commented on code in PR #14499:
URL: https://github.com/apache/iceberg/pull/14499#discussion_r2495555914
##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java:
##########
@@ -541,31 +541,41 @@ public Optional<LogicalTypeVisitorResult> visit(
@Override
public Optional<LogicalTypeVisitorResult> visit(
LogicalTypeAnnotation.TimestampLogicalTypeAnnotation
timestampLogicalType) {
-        FieldVector vector = arrowField.createVector(rootAlloc);
         switch (timestampLogicalType.getUnit()) {
           case MILLIS:
-            ((BigIntVector) vector).allocateNew(batchSize);
+            Field bigIntField =
+                new Field(
+                    icebergField.name(),
+                    new FieldType(
+                        icebergField.isOptional(), new ArrowType.Int(Long.SIZE, true), null, null),
+                    null);
+            FieldVector millisVector = bigIntField.createVector(rootAlloc);
+            ((BigIntVector) millisVector).allocateNew(batchSize);
Review Comment:
I did try `TimeStampMilliTZVector` first, but it fails because
`arrowField.createVector()` creates a `TimeStampMicroTZVector` (from the
Iceberg schema), which causes a `TimeStampMicroTZVector` ->
`TimeStampMilliTZVector` cast exception. Even if we explicitly created a
`TimeStampMilliTZVector`, the values would still be written as raw longs via
`getDataBuffer().setLong()`, not as Arrow timestamp values.
Moreover, Parquet stores `TIMESTAMP_MILLIS` as physical `INT64`, so a
`BigIntVector` (a raw long container) is the right fit. The caller knows it's
timestamp data via `ReadType.TIMESTAMP_MILLIS`, not via the vector type.
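
For reference, a minimal standalone sketch of the same idea (illustrative names like `event_ts`, not the PR code): a field typed as signed 64-bit `Int` yields a `BigIntVector` that just carries raw epoch-millis longs, and nothing in the Arrow type itself marks it as a timestamp:

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;

public class TimestampMillisVectorSketch {
  public static void main(String[] args) {
    try (BufferAllocator allocator = new RootAllocator()) {
      // Same shape as the PR: a nullable signed 64-bit Int field, i.e. the
      // physical INT64 layout Parquet uses for TIMESTAMP_MILLIS.
      Field millisField =
          new Field("event_ts", FieldType.nullable(new ArrowType.Int(64, true)), null);
      try (BigIntVector vector = (BigIntVector) millisField.createVector(allocator)) {
        vector.allocateNew(2);
        vector.set(0, 1_700_000_000_000L); // raw epoch millis
        vector.set(1, 1_700_000_060_000L);
        vector.setValueCount(2);
        // Nothing about the vector says "timestamp"; the caller interprets the
        // longs as millis because the read type is TIMESTAMP_MILLIS.
      }
    }
  }
}
```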
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]