pvary commented on code in PR #12771:
URL: https://github.com/apache/iceberg/pull/12771#discussion_r2039250002
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetWriter.java:
##########
@@ -151,4 +152,27 @@ public void testFpp() throws IOException, NoSuchFieldException, IllegalAccessExc
assertThat(fpp).isEqualTo(0.05);
}
}
+
+ @Test
+ public void testColumnStatsEnabled()
+ throws IOException, NoSuchFieldException, IllegalAccessException {
+ File testFile = File.createTempFile("junit", null, temp.toFile());
+ try (FileAppender<InternalRow> writer =
+ Parquet.write(Files.localOutput(testFile))
+ .schema(SCHEMA)
+ .set(PARQUET_COLUMN_STATS_ENABLED_PREFIX + "id_long", "false")
+ .createWriterFunc(
+ msgType ->
+ SparkParquetWriters.buildWriter(SparkSchemaUtil.convert(SCHEMA), msgType))
+ .build()) {
+ // Using reflection to access the private 'props' field in ParquetWriter
+ Field propsField = writer.getClass().getDeclaredField("props");
+ propsField.setAccessible(true);
+ ParquetProperties props = (ParquetProperties) propsField.get(writer);
+ MessageType parquetSchema = ParquetSchemaUtil.convert(SCHEMA, "test");
+ ColumnDescriptor idlDescriptor = parquetSchema.getColumnDescription(new String[] {"id_long"});
+ // Default statisticsEnabled should be true and for column id_long, it is disabled.
+ assertThat(props.getStatisticsEnabled(idlDescriptor)).isEqualTo(false);
+ }
+ }
Review Comment:
Why is this test in Spark?
The test is testing a Parquet writer feature, so I think it should be in
TestParquet, or somewhere near there. That way we don't need reflection to
test the setting: a package-private method with the @VisibleForTesting
annotation would be enough.
It would also be good to test the actual effect on the written Parquet files,
if that is possible.
I see that the previous test for the bloom filter is also done here, but that
just means it should have gone there too. Maybe move it in a separate PR?
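The reflection-free alternative suggested in the comment can be sketched as follows. This is a minimal, self-contained illustration, not Iceberg code: `SketchParquetWriter`, `statisticsEnabled`, and `StatsEnabledSketchTest` are hypothetical names, a plain map stands in for the real `ParquetProperties` object, and the local `VisibleForTesting` annotation stands in for Guava's annotation of the same name so the sketch compiles with only the JDK. The point it shows is that a test in the same package can call a package-private accessor directly instead of using `Field#setAccessible`.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Guava's @VisibleForTesting so the sketch needs only the JDK.
// Like the real annotation, it only documents intent; it changes no visibility.
@interface VisibleForTesting {}

// Hypothetical, heavily simplified writer. Only the per-column statistics
// setting is modeled; nothing is actually written to Parquet.
final class SketchParquetWriter {
  private final boolean statsEnabledByDefault;
  // Private, like the 'props' field the test above reads via reflection.
  private final Map<String, Boolean> statsEnabledByColumn;

  SketchParquetWriter(boolean statsEnabledByDefault, Map<String, Boolean> overrides) {
    this.statsEnabledByDefault = statsEnabledByDefault;
    this.statsEnabledByColumn = new HashMap<>(overrides);
  }

  // Package-private accessor: code in the same package (such as a test next
  // to TestParquet) can call it directly, so Field#setAccessible is not needed.
  @VisibleForTesting
  boolean statisticsEnabled(String column) {
    return statsEnabledByColumn.getOrDefault(column, statsEnabledByDefault);
  }
}

// Same-package test that replaces the reflection-based check.
final class StatsEnabledSketchTest {
  static void testColumnStatsEnabled() {
    // Statistics are on by default but explicitly disabled for "id_long".
    SketchParquetWriter writer = new SketchParquetWriter(true, Map.of("id_long", false));
    if (writer.statisticsEnabled("id_long")) {
      throw new AssertionError("statistics should be disabled for id_long");
    }
    if (!writer.statisticsEnabled("data")) {
      throw new AssertionError("statistics should stay enabled for other columns");
    }
  }

  public static void main(String[] args) {
    testColumnStatsEnabled();
    System.out.println("ok");
  }
}
```

Because the accessor is package-private, such a test has to live in the writer's own package, which for the Iceberg Parquet writer presumably means the Parquet module rather than the Spark one; that is the relocation the comment proposes.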
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.