pvary commented on code in PR #12771:
URL: https://github.com/apache/iceberg/pull/12771#discussion_r2039250002

##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetWriter.java:
##########
@@ -151,4 +152,27 @@ public void testFpp() throws IOException, NoSuchFieldException, IllegalAccessExc
       assertThat(fpp).isEqualTo(0.05);
     }
   }
+
+  @Test
+  public void testColumnStatsEnabled()
+      throws IOException, NoSuchFieldException, IllegalAccessException {
+    File testFile = File.createTempFile("junit", null, temp.toFile());
+    try (FileAppender<InternalRow> writer =
+        Parquet.write(Files.localOutput(testFile))
+            .schema(SCHEMA)
+            .set(PARQUET_COLUMN_STATS_ENABLED_PREFIX + "id_long", "false")
+            .createWriterFunc(
+                msgType ->
+                    SparkParquetWriters.buildWriter(SparkSchemaUtil.convert(SCHEMA), msgType))
+            .build()) {
+      // Using reflection to access the private 'props' field in ParquetWriter
+      Field propsField = writer.getClass().getDeclaredField("props");
+      propsField.setAccessible(true);
+      ParquetProperties props = (ParquetProperties) propsField.get(writer);
+      MessageType parquetSchema = ParquetSchemaUtil.convert(SCHEMA, "test");
+      ColumnDescriptor idlDescriptor = parquetSchema.getColumnDescription(new String[] {"id_long"});
+      // Default statisticsEnabled should be true and for column id_long, it is disabled.
+      assertThat(props.getStatisticsEnabled(idlDescriptor)).isEqualTo(false);
+    }
+  }

Review Comment:
Why is this test in Spark? It exercises a Parquet writer feature, so I think it belongs in TestParquet, or somewhere near it. There we would not need reflection to check the setting: a package-private method with a VisibleForTesting annotation would be enough.

It would also be good to test the actual effect on the written Parquet files, if that is possible.

I see that the earlier bloom filter test was added here as well, but that only means it should have been done there too. Maybe move it in a different PR?

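To illustrate the package-private accessor idea, here is a minimal sketch. Everything in it is hypothetical: `SketchWriter`, `statisticsEnabled`, and the `sketch.column.stats.enabled.` prefix are made-up names, and the locally declared `VisibleForTesting` annotation only stands in for the Guava one so the snippet has no dependencies. It is not Iceberg's `ParquetWriter` and does not touch real Parquet files.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Guava's @VisibleForTesting, declared locally
// so the sketch compiles without extra dependencies.
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.METHOD, ElementType.FIELD})
@interface VisibleForTesting {}

// Hypothetical writer (an assumption for illustration, not Iceberg's ParquetWriter).
final class SketchWriter {
  // Hypothetical property prefix; the column name is appended to it.
  static final String COLUMN_STATS_ENABLED_PREFIX = "sketch.column.stats.enabled.";

  // Per-column overrides stay private; tests never reach in via reflection.
  private final Map<String, Boolean> statsOverrides = new HashMap<>();

  SketchWriter(Map<String, String> config) {
    for (Map.Entry<String, String> entry : config.entrySet()) {
      if (entry.getKey().startsWith(COLUMN_STATS_ENABLED_PREFIX)) {
        String column = entry.getKey().substring(COLUMN_STATS_ENABLED_PREFIX.length());
        statsOverrides.put(column, Boolean.parseBoolean(entry.getValue()));
      }
    }
  }

  // Package-private accessor: a test in the same package can call it directly.
  // Columns without an override fall back to statistics being enabled.
  @VisibleForTesting
  boolean statisticsEnabled(String column) {
    return statsOverrides.getOrDefault(column, true);
  }
}
```

A test living in the same package could then build the writer with the property set to "false" for `id_long` and assert on `statisticsEnabled("id_long")` directly, with no `getDeclaredField` or `setAccessible` calls.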
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org