harperjiang commented on code in PR #16501:
URL: https://github.com/apache/iceberg/pull/16501#discussion_r3284998335
##########
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/TestArrowReader.java:
##########
@@ -388,6 +388,99 @@ public void testTimestampMillisAreReadCorrectly() throws
Exception {
assertThat(totalRowsRead).as("Should read all
rows").isEqualTo(millisValues.size());
}
+ /**
+ * Regression test: a decimal column whose Iceberg field carries an
initialDefault/writeDefault
Review Comment:
Thanks @amogh-jahagirdar. Moved the test to
spark/v4.1/.../TestParquetVectorizedReads.java and added a
`getParquetWriterWithoutDictionary` helper next to the existing writer helpers.
The new testDecimalWithDefaultValueNotDictionaryEncoded covers all three
decimal physical encodings ( INT32, INT64, FIXED), and goes through the regular
assertRecordsMatch path.
Looked at goldenFilesAndEncodings but that doesn't cover decimal currently.
Adding decimal to the list will intro extra changes. Happy to do it as a follow
up.
On UUID: traced the path and the bug doesn't apply. `getPhysicalType` only
rewrites the field when `primitive.getLogicalTypeAnnotation() instanceof
DecimalLogicalTypeAnnotation`; ArrowSchemaUtil on UUIDType produces
FixedSizeBinary(16) directly, which is already the vector class the reader
populates, so no surrogate Iceberg type is needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]