hussein-awala opened a new issue, #9898: URL: https://github.com/apache/iceberg/issues/9898
### Apache Iceberg version

1.4.3 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

I have an Iceberg table, and I want to create two bloom filters: one on a root string column and one on a string column nested in a struct. I set the properties `write.parquet.bloom-filter-enabled.column.a` and `write.parquet.bloom-filter-enabled.column.b.c` to `true`, then checked the written file with `parquet-cli`:

```bash
$ parquet bloom-filter /path/to/file.parquet -c a -v <not existing value>

Row group 0:
--------------------------------------------------------------------------------
value <not existing value> NOT exists.

$ parquet bloom-filter /path/to/file.parquet -c a -v <existing value>

Row group 0:
--------------------------------------------------------------------------------
value <existing value> maybe exists.

$ parquet bloom-filter /path/to/file.parquet -c b.c -v <some value>

Row group 0:
--------------------------------------------------------------------------------
column b.c has no bloom filter

# check if it's an issue with column name parsing:
$ parquet bloom-filter /path/to/file.parquet -c b.d -v <some value>
Argument error: Schema doesn't have column: b.d
```

The filter on the root column `a` is present, but the nested column `b.c` has none, even though the column path itself is recognized.

However, I tried the same thing with Spark and plain Parquet, and it worked without any issue:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import spark.implicits._

val schema = StructType(Array(
  StructField("a", StringType, true),
  StructField("b", StringType, true),
  StructField("nested", StructType(Array(
    StructField("c", StringType, true),
    StructField("d", StringType, true)
  )), true)
))

val data = Seq(
  Row("1", "25", Row("100", "a")),
  Row("2", "30", Row("200", "b")),
  Row("3", "35", Row("300", "c")),
  Row("4", "40", Row("400", "d")),
  Row("5", "45", Row("500", "e"))
)

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)

// Enable bloom filters on the root column `a` and the nested column `nested.c`
df.write.format("parquet")
  .option("parquet.bloom.filter.enabled#a", "true")
  .option("parquet.bloom.filter.enabled#nested.c", "true")
  .save("bloom_parquet")
```

Check with `parquet-cli`:

```bash
$ github parquet bloom-filter bloom_parquet/part-00002-9fac4c38-7113-45df-8db9-d96c3f6b6a8e-c000.snappy.parquet -c a -v "1"

Row group 0:
--------------------------------------------------------------------------------
value 1 maybe exists.

$ github parquet bloom-filter bloom_parquet/part-00002-9fac4c38-7113-45df-8db9-d96c3f6b6a8e-c000.snappy.parquet -c nested.c -v "1"

Row group 0:
--------------------------------------------------------------------------------
value 1 NOT exists.
```
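For comparison, the Iceberg side of the reproduction could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact setup from the report: the table name `local.db.bloom_test`, the helper `bloomFilterKey`, and the sample rows are hypothetical, and only the two property keys are taken from the description above. The snippet only builds and prints the SQL statements, so it runs without a Spark session.

```scala
// Hypothetical helper: builds the Iceberg table-property key that enables
// a Parquet bloom filter for the given column path.
def bloomFilterKey(columnPath: String): String =
  s"write.parquet.bloom-filter-enabled.column.$columnPath"

// Hypothetical catalog/table name; adjust to your environment.
val table = "local.db.bloom_test"

// Root column `a` and nested column `b.c`, as in the report.
val props = Seq("a", "b.c")
  .map(c => s"'${bloomFilterKey(c)}'='true'")
  .mkString(", ")

val statements = Seq(
  // Table with a root string column and a struct holding a nested string column.
  s"CREATE TABLE $table (a string, b struct<c: string, d: string>) USING iceberg TBLPROPERTIES ($props)",
  // A few sample rows (made up for illustration).
  s"INSERT INTO $table VALUES ('1', named_struct('c', '100', 'd', 'a')), ('2', named_struct('c', '200', 'd', 'b'))",
  // The `files` metadata table lists the data files that were written.
  s"SELECT file_path FROM $table.files"
)

statements.foreach(println)
```

In a `spark-shell` session with an Iceberg catalog configured, each statement could then be passed to `spark.sql(...)`. The `file_path` values returned by the last query are the files to inspect with `parquet bloom-filter`.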