amogh-jahagirdar commented on code in PR #9902:
URL: https://github.com/apache/iceberg/pull/9902#discussion_r1518451041
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderWithBloomFilter.java:
##########
@@ -367,11 +374,28 @@ public void testReadWithFilter() {
             .filter(
                 "id = 250 AND id_long = 1250 AND id_double = 10250.0 AND id_float = 100250.0"
                     + " AND id_string = 'BINARY测试_250' AND id_boolean = true AND id_date = '2021-09-05'"
-                    + " AND id_int_decimal = 77.77 AND id_long_decimal = 88.88 AND id_fixed_decimal = 99.99");
+                    + " AND id_int_decimal = 77.77 AND id_long_decimal = 88.88 AND id_fixed_decimal = 99.99"
+                    + " AND id_nested.nested_id = 250");
     record = SparkValueConverter.convert(table.schema(), df.collectAsList().get(0));

     assertThat(df.collectAsList()).as("Table should contain 1 row").hasSize(1);
     assertThat(record.get(0)).as("Table should contain expected rows").isEqualTo(250);
   }
+
+  @TestTemplate
+  public void testBloomCreation() throws IOException {
+    org.apache.hadoop.fs.Path path = new org.apache.hadoop.fs.Path(temp.toString());
+    ParquetMetadata parquetMetadata = ParquetFileReader.readFooter(new Configuration(), path);
+    for (int i = 0; i < 11; i++) {
+      if (useBloomFilter) {
+        assertThat(parquetMetadata.getBlocks().get(0).getColumns().get(0).getBloomFilterOffset())
+            .isNotEqualTo(-1L);
+      } else {
+        assertThat(parquetMetadata.getBlocks().get(0).getColumns().get(0).getBloomFilterOffset())
+            .isEqualTo(-1L);
+      }
+    }

Review Comment:
   I think this is great validation we should add, but in the Spark tests we should use the Spark APIs or Spark SQL to perform the write, and then run this validation to confirm the bloom filters exist. That should help catch the issue where bloom filters aren't being written for nested types when writing via Spark (writing directly through a `FileAppender` masks that).
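   Roughly, a sketch of what I mean (the helper name `assertBloomFiltersPresent` is just a placeholder; it assumes the rows were already written through Spark SQL or the DataFrame writer against this test class's existing `table`, and in practice the per-column check would probably need to be limited to the columns that actually have bloom filters enabled in the table properties):

   ```java
   import static org.assertj.core.api.Assertions.assertThat;

   import java.io.IOException;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.iceberg.FileScanTask;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.io.CloseableIterable;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.hadoop.metadata.BlockMetaData;
   import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
   import org.apache.parquet.hadoop.metadata.ParquetMetadata;

   // Sketch only: run this after the rows have been written through Spark (e.g.
   // spark.sql("INSERT INTO ...") or a DataFrame write), not through a FileAppender,
   // so the footer check validates what the Spark write path actually produced.
   private void assertBloomFiltersPresent(Table table, boolean expectBloomFilter) throws IOException {
     try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
       for (FileScanTask task : tasks) {
         // Read the Parquet footer of each data file the write produced.
         ParquetMetadata footer =
             ParquetFileReader.readFooter(
                 new Configuration(), new Path(task.file().path().toString()));
         for (BlockMetaData block : footer.getBlocks()) {
           for (ColumnChunkMetaData column : block.getColumns()) {
             // Parquet reports -1 for a column chunk that has no bloom filter.
             // In the real test this loop would likely be restricted to the columns
             // with bloom filters enabled in the table properties.
             if (expectBloomFilter) {
               assertThat(column.getBloomFilterOffset()).isNotEqualTo(-1L);
             } else {
               assertThat(column.getBloomFilterOffset()).isEqualTo(-1L);
             }
           }
         }
       }
     }
   }
   ```

   Checking the files returned by `table.newScan().planFiles()` ties the assertion to the data files the Spark write actually committed, rather than to whatever happens to be sitting under `temp`.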