wypoon commented on code in PR #12482: URL: https://github.com/apache/iceberg/pull/12482#discussion_r1986150146
########## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkScan.java: ########## @@ -279,12 +268,50 @@ public void testTableWithOneColStats() throws NoSuchTableException { table.updateStatistics().setStatistics(statisticsFile).commit(); - checkColStatisticsNotReported(scan, 4L); - withSQLConf(reportColStatsDisabled, () -> checkColStatisticsNotReported(scan, 4L)); + List<SimpleRecord> newRecords = + Lists.newArrayList(new SimpleRecord(5, "a"), new SimpleRecord(6, "b")); + spark + .createDataset(newRecords, Encoders.bean(SimpleRecord.class)) + .coalesce(1) + .writeTo(tableName) + .append(); - Map<String, Long> expectedOneNDV = Maps.newHashMap(); - expectedOneNDV.put("id", 4L); - withSQLConf(reportColStatsEnabled, () -> checkColStatisticsReported(scan, 4L, expectedOneNDV)); + table.refresh(); + long snapshotId2 = table.currentSnapshot().snapshotId(); + + GenericStatisticsFile statisticsFile2 = + new GenericStatisticsFile( + snapshotId2, + "/test/statistics/file2.puffin", + 100, + 42, + ImmutableList.of( + new GenericBlobMetadata( + APACHE_DATASKETCHES_THETA_V1, + snapshotId2, + 2, + ImmutableList.of(1), + ImmutableMap.of("ndv", "6")))); + + table.updateStatistics().setStatistics(statisticsFile2).commit(); + + SparkScanBuilder scanBuilder = + new SparkScanBuilder(spark, table, CaseInsensitiveStringMap.empty()); + SparkScan scan = (SparkScan) scanBuilder.build(); + + Map<String, String> reportColStatsDisabled = + ImmutableMap.of( + SQLConf.CBO_ENABLED().key(), "true", SparkSQLProperties.REPORT_COLUMN_STATS, "false"); + + checkColStatisticsNotReported(scan, 6L); + withSQLConf(reportColStatsDisabled, () -> checkColStatisticsNotReported(scan, 6L)); + + Map<String, String> reportColStatsEnabled = + ImmutableMap.of(SQLConf.CBO_ENABLED().key(), "true"); + + Map<String, Long> expectedNDV = Maps.newHashMap(); + expectedNDV.put("id", 6L); + withSQLConf(reportColStatsEnabled, () -> checkColStatisticsReported(scan, 6L, expectedNDV)); Review Comment: The test is parameterized with three parameters. When run on its own without the fix, from one to three cases will fail. The reason some of the time the correct `StatisticsFile` appears as the first in the `List` is that when `TableMetadata` is built (https://github.com/apache/iceberg/blob/9a8466c95c323eba7c427cafb44a6c91a81d4bf4/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1602), the `List` is built from a `Map` and the order of the entries depend on the hashing of the snapshotId (which is random). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org