Re: [PR] Arrow: Fix vectorized reads of decimal columns with default values [iceberg]

via GitHub Thu, 21 May 2026 16:50:50 -0700


harperjiang commented on code in PR #16501:
URL: https://github.com/apache/iceberg/pull/16501#discussion_r3284998335



##########
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/TestArrowReader.java:
##########
@@ -388,6 +388,99 @@ public void testTimestampMillisAreReadCorrectly() throws 
Exception {
     assertThat(totalRowsRead).as("Should read all 
rows").isEqualTo(millisValues.size());
   }
 
+  /**
+   * Regression test: a decimal column whose Iceberg field carries an 
initialDefault/writeDefault

Review Comment:
   Thanks @amogh-jahagirdar. Moved the test to 
spark/v4.1/.../TestParquetVectorizedReads.java and added a 
`getParquetWriterWithoutDictionary` helper next to the existing writer helpers. 
The new testDecimalWithDefaultValueNotDictionaryEncoded covers all three 
decimal physical encodings ( INT32, INT64, FIXED), and goes through the regular 
assertRecordsMatch path.
   
   Looked at goldenFilesAndEncodings but that doesn't cover decimal currently. 
Adding decimal to the list will intro extra changes. Happy to do it as a follow 
up.
   
   On UUID: traced the path and the bug doesn't apply. `getPhysicalType` only 
rewrites the field when `primitive.getLogicalTypeAnnotation() instanceof 
DecimalLogicalTypeAnnotation`; ArrowSchemaUtil on UUIDType produces 
FixedSizeBinary(16) directly, which is already the vector class the reader 
populates, so no surrogate Iceberg type is needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Arrow: Fix vectorized reads of decimal columns with default values [iceberg]

Reply via email to