aihuaxu commented on code in PR #14279:
URL: https://github.com/apache/iceberg/pull/14279#discussion_r2422249274
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java:
##########
@@ -379,7 +380,8 @@ public <T> Boolean in(BoundReference<T> ref, Set<T>
literalSet) {
// When filtering nested types notNull() is implicit filter passed even
though complex
Review Comment:
nit: correct the comment notNull() => in()
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java:
##########
@@ -329,7 +329,8 @@ public <T> Boolean eq(BoundReference<T> ref, Literal<T>
lit) {
// When filtering nested types notNull() is implicit filter passed even
though complex
// filters aren't pushed down in Parquet. Leave all nested column type
filters to be
// evaluated post scan.
- if (schema.findType(id) instanceof Type.NestedType) {
+ Type type = schema.findType(id);
+ if (type instanceof Type.NestedType || type.isVariantType()) {
Review Comment:
Thanks for details. That helps.
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java:
##########
@@ -329,7 +329,8 @@ public <T> Boolean eq(BoundReference<T> ref, Literal<T>
lit) {
// When filtering nested types notNull() is implicit filter passed even
though complex
Review Comment:
nit: can you help correct the comment here? notNull() => eq()
##########
data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java:
##########
@@ -1048,19 +1041,68 @@ public void testVariantFieldAllNullsNotNull() throws
IOException {
records.add(record);
}
- File parquetFile = writeParquetFile("test-variant-nulls", VARIANT_SCHEMA,
records);
- InputFile inFile = Files.localInput(parquetFile);
+ boolean shouldRead = shouldReadVariant(notNull("variant_field"), records);
- try (ParquetFileReader reader =
ParquetFileReader.open(parquetInputFile(inFile))) {
- BlockMetaData blockMetaData = reader.getRowGroups().get(0);
- MessageType fileSchema = reader.getFileMetaData().getSchema();
- ParquetMetricsRowGroupFilter rowGroupFilter =
- new ParquetMetricsRowGroupFilter(VARIANT_SCHEMA,
notNull("variant_field"), true);
+ assertThat(shouldRead)
+ .as("Should read: variant notNull filters must be evaluated post scan
even for all nulls")
+ .isTrue();
+ }
- assertThat(rowGroupFilter.shouldRead(fileSchema, blockMetaData))
- .as("Should read: variant notNull filters must be evaluated post
scan even for all nulls")
- .isTrue();
- }
+ @TestTemplate
+ public void testVariantFieldEq() throws IOException {
Review Comment:
These tests are very similar. Probably consider parameterized tests or
extract the common part into helper function - seems the only difference is the
test function and error message?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]