wypoon commented on code in PR #12482:
URL: https://github.com/apache/iceberg/pull/12482#discussion_r1986150146
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkScan.java:
##########
@@ -279,12 +268,50 @@ public void testTableWithOneColStats() throws
NoSuchTableException {
table.updateStatistics().setStatistics(statisticsFile).commit();
- checkColStatisticsNotReported(scan, 4L);
- withSQLConf(reportColStatsDisabled, () ->
checkColStatisticsNotReported(scan, 4L));
+ List<SimpleRecord> newRecords =
+ Lists.newArrayList(new SimpleRecord(5, "a"), new SimpleRecord(6, "b"));
+ spark
+ .createDataset(newRecords, Encoders.bean(SimpleRecord.class))
+ .coalesce(1)
+ .writeTo(tableName)
+ .append();
- Map<String, Long> expectedOneNDV = Maps.newHashMap();
- expectedOneNDV.put("id", 4L);
- withSQLConf(reportColStatsEnabled, () -> checkColStatisticsReported(scan,
4L, expectedOneNDV));
+ table.refresh();
+ long snapshotId2 = table.currentSnapshot().snapshotId();
+
+ GenericStatisticsFile statisticsFile2 =
+ new GenericStatisticsFile(
+ snapshotId2,
+ "/test/statistics/file2.puffin",
+ 100,
+ 42,
+ ImmutableList.of(
+ new GenericBlobMetadata(
+ APACHE_DATASKETCHES_THETA_V1,
+ snapshotId2,
+ 2,
+ ImmutableList.of(1),
+ ImmutableMap.of("ndv", "6"))));
+
+ table.updateStatistics().setStatistics(statisticsFile2).commit();
+
+ SparkScanBuilder scanBuilder =
+ new SparkScanBuilder(spark, table, CaseInsensitiveStringMap.empty());
+ SparkScan scan = (SparkScan) scanBuilder.build();
+
+ Map<String, String> reportColStatsDisabled =
+ ImmutableMap.of(
+ SQLConf.CBO_ENABLED().key(), "true",
SparkSQLProperties.REPORT_COLUMN_STATS, "false");
+
+ checkColStatisticsNotReported(scan, 6L);
+ withSQLConf(reportColStatsDisabled, () ->
checkColStatisticsNotReported(scan, 6L));
+
+ Map<String, String> reportColStatsEnabled =
+ ImmutableMap.of(SQLConf.CBO_ENABLED().key(), "true");
+
+ Map<String, Long> expectedNDV = Maps.newHashMap();
+ expectedNDV.put("id", 6L);
+ withSQLConf(reportColStatsEnabled, () -> checkColStatisticsReported(scan,
6L, expectedNDV));
Review Comment:
The test is parameterized with three parameters. When run on its own without
the fix, from one to three cases will fail. The reason some of the time the
correct `StatisticsFile` appears as the first in the `List` is that when
`TableMetadata` is built
(https://github.com/apache/iceberg/blob/9a8466c95c323eba7c427cafb44a6c91a81d4bf4/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1602),
the `List` is built from a `Map` and the order of the entries depend on the
hashing of the snapshotId (which is random).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]