cowwoc commented on code in PR #394:
URL:
https://github.com/apache/maven-build-cache-extension/pull/394#discussion_r2423321701
##########
src/main/java/org/apache/maven/buildcache/CacheControllerImpl.java:
##########
@@ -498,25 +498,23 @@ public void save(
final MavenProject project = context.getProject();
final MavenSession session = context.getSession();
try {
+ attachedResourcesPathsById.clear();
+ attachedResourceCounter = 0;
+
final HashFactory hashFactory = cacheConfig.getHashFactory();
+ final HashAlgorithm algorithm = hashFactory.createAlgorithm();
final org.apache.maven.artifact.Artifact projectArtifact =
project.getArtifact();
- final List<org.apache.maven.artifact.Artifact> attachedArtifacts;
- final List<Artifact> attachedArtifactDtos;
- final Artifact projectArtifactDto;
- if (project.hasLifecyclePhase("package")) {
- final HashAlgorithm algorithm = hashFactory.createAlgorithm();
- attachGeneratedSources(project);
- attachOutputs(project);
- attachedArtifacts = project.getAttachedArtifacts() != null
- ? project.getAttachedArtifacts()
- : Collections.emptyList();
- attachedArtifactDtos = artifactDtos(attachedArtifacts,
algorithm, project);
- projectArtifactDto = artifactDto(project.getArtifact(),
algorithm, project);
- } else {
- attachedArtifacts = Collections.emptyList();
- attachedArtifactDtos = new ArrayList<>();
- projectArtifactDto = null;
- }
+ final boolean hasPackagePhase =
project.hasLifecyclePhase("package");
+
+ attachGeneratedSources(project);
+ attachOutputs(project);
Review Comment:
Ah! That makes much more sense. Thank you for clarifying why this logic was
conditioned no the `package` phase.
To tackle this problem, I've combined timestamp-based filtering with
per-project thread-safe isolation to prevent both the branch-switching scenario
and race conditions in multi-threaded builds.
## How It Works
### 1. Timestamp Filtering
The cache now only captures outputs that were either:
- **Modified during the current build** (timestamp >= buildStartTime), OR
- **Restored from cache during this build** (tracked explicitly)
This elegantly handles your branch-switching scenario:
1. mvn compile on Branch A at 10:00:00 → target/classes modified at 10:00:01
2. git checkout branch-b
3. mvn process-resources on Branch B starts at 10:01:00
4. Check target/classes: lastModified (10:00:01) < buildStartTime (10:01:00)
AND not restored this build → SKIPPED ✓
### 2. Handling Cache Hits
Your question about `mvn package` with a compile cache hit is crucial. The
solution tracks restored outputs:
1. First build: mvn compile at 10:00:00
- Compiles classes, caches them ✓
2. Second build: mvn package at 10:05:00
- Compile phase: Cache hit, restores to target/classes
- Restoration tracked: classifier added to state.restoredOutputClassifiers
- Package phase save() checks:
- Fresh package outputs: timestamp >= 10:05:00 → include ✓
- Restored compile outputs: in restoredOutputClassifiers → include ✓
- Stale old outputs: timestamp < 10:05:00 AND not restored → exclude ✓
### 3. Thread Safety for Multi-Threaded Builds
Per your comment about multi-threaded builds, I've also fixed the existing
thread safety issues:
**Problem:** `CacheControllerImpl` is `@SessionScoped` (one instance shared
across all modules). The original code used:
```java
private final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
private int attachedResourceCounter = 0;
```
With `mvn -T 4`, calling `clear()` in one thread would affect other threads'
modules.
Solution: Per-project isolation using ConcurrentHashMap:
```java
private static class ProjectCacheState {
final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
int attachedResourceCounter = 0;
final Set<String> restoredOutputClassifiers = new HashSet<>();
}
private final ConcurrentMap<String, ProjectCacheState> projectStates = new
ConcurrentHashMap<>();
```
Each module gets isolated state. Cleanup happens per-project in `save()`'s
finally block.
Implementation Details
In `CacheControllerImpl.java`:
1. Capture build start time:
`final long buildStartTime = session.getRequest().getStartTime().getTime();`
2. Track restored outputs in restoreProjectArtifacts():
`state.restoredOutputClassifiers.add(attachedArtifactInfo.getClassifier());`
3. Check timestamps in `attachDirIfNotEmpty()`:
```java
long lastModified = Files.getLastModifiedTime(candidateSubDir).toMillis();
boolean isRestoredThisBuild =
state.restoredOutputClassifiers.contains(classifier);
if (lastModified < buildStartTime && !isRestoredThisBuild) {
// Skip stale outputs
return;
}
```
Testing
All existing tests pass, and the approach has been validated through:
- Compilation verification
- Full test suite execution
- Thread safety analysis of concurrent access patterns
Let me know if you have any concerns about this approach or if you'd like me
to add additional safeguards or test cases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]