Baunsgaard commented on code in PR #16740:
URL: https://github.com/apache/iceberg/pull/16740#discussion_r3389952253
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -143,6 +150,15 @@ public class TestRewriteDataFilesAction extends TestBase {
@TempDir private File tableDir;
private static final int SCALE = 400000;
+ // Cache of pre-written input data files keyed by table shape (schema/spec/props are
+ // fixed per key), so identical large inputs are materialized via Spark only once per JVM
+ // fork and reused by every test that asks for the same shape. The Spark write of SCALE
+ // rows dominates these tests; the rewrite under test still runs per test on a fresh table.
+ @TempDir private static Path inputCacheDir;
private static final Map<String, List<DataFile>> INPUT_FILE_CACHE = Maps.newConcurrentMap();
Review Comment:
Okay, I added an `@AfterAll` that clears the cache and the lock map and
resets the seq. `@AfterAll` runs once after all tests in the class, so
cross-test caching within a run is unchanged; it only stops a second run in the
same JVM (for example, an IDE re-run) from returning `DataFile`s that point into
the recreated `@TempDir`.
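
For reference, a minimal sketch of the reset described above, not the code in the PR. `InputCacheSketch`, `INPUT_LOCKS`, `INPUT_SEQ`, `cachedInput`, and `resetInputCache` are hypothetical names; `String` stands in for `DataFile`; and the `@AfterAll` annotation appears only as a comment so the sketch compiles without JUnit on the classpath.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the static cache state in the test class.
final class InputCacheSketch {
  // Cached input files per table shape; String replaces DataFile here.
  static final Map<String, List<String>> INPUT_FILE_CACHE = new ConcurrentHashMap<>();
  // One lock object per shape key, so a given shape is written only once.
  static final Map<String, Object> INPUT_LOCKS = new ConcurrentHashMap<>();
  // Sequence used to name cached inputs (assumed purpose of "the seq").
  static final AtomicInteger INPUT_SEQ = new AtomicInteger();

  // Returns the cached files for a shape, creating them on first request.
  static List<String> cachedInput(String shapeKey) {
    Object lock = INPUT_LOCKS.computeIfAbsent(shapeKey, k -> new Object());
    synchronized (lock) {
      return INPUT_FILE_CACHE.computeIfAbsent(
          shapeKey, k -> List.of("input-" + INPUT_SEQ.getAndIncrement() + ".parquet"));
    }
  }

  // In the real test class this method would carry JUnit 5's @AfterAll.
  // Clearing static state here means a later run in the same JVM cannot
  // reuse entries that point into a temp directory that was recreated.
  static void resetInputCache() {
    INPUT_FILE_CACHE.clear();
    INPUT_LOCKS.clear();
    INPUT_SEQ.set(0);
  }

  public static void main(String[] args) {
    List<String> first = cachedInput("shape-a");
    // Second request in the same run is served from the cache.
    System.out.println("reused within run: " + (first == cachedInput("shape-a")));
    resetInputCache();
    // After the reset, the same key is materialized again.
    System.out.println("reused after reset: " + (first == cachedInput("shape-a")));
  }
}
```

Because the reset runs only after the last test, every test in a run still sees the same cached entry for a given key; only state that would outlive the static `@TempDir` is dropped.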
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]