cowwoc commented on code in PR #394:
URL: 
https://github.com/apache/maven-build-cache-extension/pull/394#discussion_r2423321701


##########
src/main/java/org/apache/maven/buildcache/CacheControllerImpl.java:
##########
@@ -498,25 +498,23 @@ public void save(
         final MavenProject project = context.getProject();
         final MavenSession session = context.getSession();
         try {
+            attachedResourcesPathsById.clear();
+            attachedResourceCounter = 0;
+
             final HashFactory hashFactory = cacheConfig.getHashFactory();
+            final HashAlgorithm algorithm = hashFactory.createAlgorithm();
             final org.apache.maven.artifact.Artifact projectArtifact = 
project.getArtifact();
-            final List<org.apache.maven.artifact.Artifact> attachedArtifacts;
-            final List<Artifact> attachedArtifactDtos;
-            final Artifact projectArtifactDto;
-            if (project.hasLifecyclePhase("package")) {
-                final HashAlgorithm algorithm = hashFactory.createAlgorithm();
-                attachGeneratedSources(project);
-                attachOutputs(project);
-                attachedArtifacts = project.getAttachedArtifacts() != null
-                        ? project.getAttachedArtifacts()
-                        : Collections.emptyList();
-                attachedArtifactDtos = artifactDtos(attachedArtifacts, 
algorithm, project);
-                projectArtifactDto = artifactDto(project.getArtifact(), 
algorithm, project);
-            } else {
-                attachedArtifacts = Collections.emptyList();
-                attachedArtifactDtos = new ArrayList<>();
-                projectArtifactDto = null;
-            }
+            final boolean hasPackagePhase = 
project.hasLifecyclePhase("package");
+
+            attachGeneratedSources(project);
+            attachOutputs(project);

Review Comment:
   Ah! That makes much more sense. Thank you for clarifying why this logic was 
conditioned no the `package` phase.
   
   To tackle this problem, I've combined timestamp-based filtering with 
per-project thread-safe isolation to prevent both the branch-switching scenario 
and race conditions in multi-threaded builds.
   
   ## How It Works
   
   ### 1. Timestamp Filtering
   
   The cache now only captures outputs that were either:
   - **Modified during the current build** (timestamp >= buildStartTime), OR
   - **Restored from cache during this build** (tracked explicitly)
   
   This elegantly handles your branch-switching scenario:
   
   1. mvn compile on Branch A at 10:00:00 → target/classes modified at 10:00:01
   2. git checkout branch-b
   3. mvn process-resources on Branch B starts at 10:01:00
   4. Check target/classes: lastModified (10:00:01) < buildStartTime (10:01:00)
   AND not restored this build → SKIPPED ✓
   
   ### 2. Handling Cache Hits
   
   Your question about `mvn package` with a compile cache hit is crucial. The 
solution tracks restored outputs:
   
   1. First build: mvn compile at 10:00:00
     - Compiles classes, caches them ✓
   2. Second build: mvn package at 10:05:00
     - Compile phase: Cache hit, restores to target/classes
     - Restoration tracked: classifier added to state.restoredOutputClassifiers
     - Package phase save() checks:
         - Fresh package outputs: timestamp >= 10:05:00 → include ✓
       - Restored compile outputs: in restoredOutputClassifiers → include ✓
       - Stale old outputs: timestamp < 10:05:00 AND not restored → exclude ✓
   
   ### 3. Thread Safety for Multi-Threaded Builds
   
   Per your comment about multi-threaded builds, I've also fixed the existing 
thread safety issues:
   
   **Problem:** `CacheControllerImpl` is `@SessionScoped` (one instance shared 
across all modules). The original code used:
   ```java
   private final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
   private int attachedResourceCounter = 0;
   ```
   
   With `mvn -T 4`, calling `clear()` in one thread would affect other threads' 
modules.
   
   Solution: Per-project isolation using ConcurrentHashMap:
   ```java
   private static class ProjectCacheState {
       final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
       int attachedResourceCounter = 0;
       final Set<String> restoredOutputClassifiers = new HashSet<>();
   }
   
   private final ConcurrentMap<String, ProjectCacheState> projectStates = new 
ConcurrentHashMap<>();
   ```
   Each module gets isolated state. Cleanup happens per-project in `save()`'s 
finally block.
   
   Implementation Details
   
   In `CacheControllerImpl.java`:
   
   1. Capture build start time:
   `final long buildStartTime = session.getRequest().getStartTime().getTime();`
   2. Track restored outputs in restoreProjectArtifacts():
   `state.restoredOutputClassifiers.add(attachedArtifactInfo.getClassifier());`
   3. Check timestamps in `attachDirIfNotEmpty()`:
   ```java
   long lastModified = Files.getLastModifiedTime(candidateSubDir).toMillis();
   boolean isRestoredThisBuild = 
state.restoredOutputClassifiers.contains(classifier);
   
   if (lastModified < buildStartTime && !isRestoredThisBuild) {
       // Skip stale outputs
       return;
   }
   ```
   
   Testing
   
   All existing tests pass, and the approach has been validated through:
   - Compilation verification
   - Full test suite execution
   - Thread safety analysis of concurrent access patterns
   
   Let me know if you have any concerns about this approach or if you'd like me 
to add additional safeguards or test cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to