huaxingao commented on code in PR #10149:
URL: https://github.com/apache/iceberg/pull/10149#discussion_r1597722756


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetWriter.java:
##########
@@ -116,4 +128,27 @@ public void testCorrectness() throws IOException {
       assertThat(rows).as("Should not have extra rows").isExhausted();
     }
   }
+
+  @Test
+  public void testFpp() throws IOException, NoSuchFieldException, IllegalAccessException {
+    File testFile = File.createTempFile("junit", null, temp.toFile());
+    try (FileAppender<InternalRow> writer =
+        Parquet.write(Files.localOutput(testFile))
+            .schema(SCHEMA)
+            .set(PARQUET_BLOOM_FILTER_COLUMN_ENABLED_PREFIX + "id", "true")
+            .set(PARQUET_BLOOM_FILTER_COLUMN_FPP_PREFIX + "id", "0.05")
+            .createWriterFunc(
+                msgType ->
+                    SparkParquetWriters.buildWriter(SparkSchemaUtil.convert(SCHEMA), msgType))
+            .build()) {
+      // Using reflection to access the private 'props' field in ParquetWriter
+      Field propsField = writer.getClass().getDeclaredField("props");
+      propsField.setAccessible(true);
+      ParquetProperties props = (ParquetProperties) propsField.get(writer);
+      MessageType parquetSchema = ParquetSchemaUtil.convert(SCHEMA, "test");
+      ColumnDescriptor descriptor = parquetSchema.getColumnDescription(new String[] {"id"});
+      double fpp = props.getBloomFilterFPP(descriptor).getAsDouble();
+      assertThat(fpp).isEqualTo(0.05);

Review Comment:
   parquet-mr reads `bloomFilterFPPs` from `ParquetProperties` and uses it to
build the bloom filter, so checking `bloomFilterFPPs` in `ParquetProperties` is
sufficient to verify that Iceberg sets the bloom filter FPP correctly.
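For context on why the FPP stored in `ParquetProperties` is what matters, here is a minimal Java sketch of how a bloom filter FPP can translate into the size of a split-block bloom filter for `n` distinct values. It assumes the approximation `m = -8n / ln(1 - fpp^(1/8))` and rounds up to whole 256-bit blocks. `FppSizingSketch` and `numBits` are hypothetical names; this is an illustration, not parquet-mr's actual sizing code (which, as far as we know, also clamps the size to fixed bounds).

```java
// Hypothetical sketch: maps a false-positive probability (FPP) to an
// approximate split-block bloom filter size. Assumption: uses the
// approximation m = -8n / ln(1 - fpp^(1/8)); not parquet-mr's real code.
public final class FppSizingSketch {
  // One split-block filter block holds eight 32-bit words = 256 bits.
  static final int BITS_PER_BLOCK = 256;

  private FppSizingSketch() {}

  /** Approximate number of bits needed for ndv distinct values at the given fpp. */
  static long numBits(long ndv, double fpp) {
    if (!(fpp > 0.0 && fpp < 1.0)) {
      throw new IllegalArgumentException("fpp must be in (0, 1): " + fpp);
    }
    // Each insert sets one bit in each of the 8 words of a block,
    // which is where the factor 8 and the 1/8 exponent come from.
    double m = -8.0 * ndv / Math.log(1.0 - Math.pow(fpp, 1.0 / 8.0));
    long bits = (long) Math.ceil(m);
    // Round up to a whole number of 256-bit blocks (at least one block).
    long blocks = Math.max(1, (bits + BITS_PER_BLOCK - 1) / BITS_PER_BLOCK);
    return blocks * BITS_PER_BLOCK;
  }

  public static void main(String[] args) {
    long ndv = 1_000_000L;
    for (double fpp : new double[] {0.01, 0.05}) {
      System.out.printf("fpp=%.2f -> %d bits%n", fpp, numBits(ndv, fpp));
    }
  }
}
```

Under this approximation, lowering the FPP (for example from the test's 0.05 to 0.01) increases the number of bits allocated per distinct value.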



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
