[ https://issues.apache.org/jira/browse/MRESOLVER-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332767#comment-17332767 ]
Michael Osipov edited comment on MRESOLVER-153 at 4/26/21, 10:00 PM: --------------------------------------------------------------------- So, first of all, I made a mistake. My analysis for removing the synchronization was not good enough. I need to apologize for that. The purpose of the change was to remove synchronization because it is done on the outside ({{SyncContext}}) already and analyzing the hierarchy from above it truly is except one small case: {{o.e.a.internal.impl.DefaultMetadataResolver.ResolveTask.run()}}. So what is causing this race condition? Resolver wants to resolve (download) {{maven-metadata.xml}} from all known repositories for the *same* {{G(A(V)?)?}}. Each file instance has its distinct name on the file system (no race conditions possible), *but* some management data is written to {{resolver-status.properties}} next (same level) to these metadata. Since {{executor.execute( errorForwarder.wrap( task ) )}} is running in parallel multiple threads write to the same file concurrently. The {{SyncContext}} does not apply here and it causes the race condition you see. The read fails because the previous writes have corrupted the file. There are several entry points for {{DefaultMetadataResolver.resolveMetadata()}}. I can reproduce the issue with the IT case. [~cstamas], it'd be nice if you could confirm my observations. While we need to discuss a proper solution, I consider this behavior ugly because it completely breaks our sync idea. I think there are two options: * Add named sync into {{ResolverTask}} * Inline code of {{ResolverTask}}, use {{RepositoryConnector}} directly, and record stuff in serial to avoid race conditions. Looking at {{ResolverTask}} is runs off a parallel executor which hands off to a parallel executor again. I don't think that there is huge gain in that. Please comment. was (Author: michael-o): So, first of all, I made a mistake. My analysis for removing the synchronization was not good enough. I need to apologize for that. The purpose of the change was to remove synchronization because it is done on the outside ({{SyncContext}}) already and analyzing the hierarchy from above it truly is except one small case: {{o.e.a.internal.impl.DefaultMetadataResolver.ResolveTask.run()}}. So what is causing this race condition? Resolver wants to resolve (download) {{maven-metadata.xml}} from all known repositories for the *same* {{G(A(V)?)?}}. Each file instance has its distinct name on the file system (no race conditions possible), *but* some management data is written to {{resolver-status.properties}} next (same level) to these metadata. Since {{executor.execute( errorForwarder.wrap( task ) )}} is running in parallel multiple threads write to the same file concurrently. The {{SyncContext}} does not apply here and it causes the race condition you see. The read fails because the previous writes have corrupted the file. There are several entry points for {{DefaultMetadataResolver.resolveMetadata()}}. I can reproduce the issue with the IT case. [~cstamas], it'd be nice if you could confirm my observations. While we need to discuss a proper solution, I consider this behavior ugly because it completely breaks our sync idea. I think there are two options: * Add named sync into {{ResolverTask}} * Inline code of {{ResolverTask}}, use {{RepositoryConnector}} directly, and record stuff in serial to avoid race conditions. Looking at {{ResolverTask}} is runs off an parallel executor which hands off to a parallel executor again. I don't think that there is huge gain in that. Please comment. > resolver-status.properties file is corrupted due to concurrent writes > --------------------------------------------------------------------- > > Key: MRESOLVER-153 > URL: https://issues.apache.org/jira/browse/MRESOLVER-153 > Project: Maven Resolver > Issue Type: Bug > Components: Resolver > Affects Versions: 1.6.1 > Environment: OSX 11.1 on adoptopenjdk-11.0.8+10 > Reporter: Guy Brand > Assignee: Michael Osipov > Priority: Major > Fix For: waiting-for-feedback, 1.7.0 > > Attachments: screenshot-1.png > > > In our integration tests which run with Maven {{4.0.0-alpha-1-SNAPSHOT}} > after [this > commit|https://github.com/apache/maven/commit/7c7de41562a8d83635e8fa21c3a3340298b508a1], > we face the following error: > {code:java} > [main] [INFO] > ------------------------------------------------------------------------ > [main] [INFO] BUILD FAILURE > [main] [INFO] > ------------------------------------------------------------------------ > [main] [INFO] Total time: 0.243 s > [main] [INFO] Finished at: 2020-12-23T13:48:59+01:00 > [main] [INFO] > ------------------------------------------------------------------------ > [main] [ERROR] Malformed \uxxxx encoding. > java.lang.IllegalArgumentException: Malformed \uxxxx encoding. > at java.util.Properties.loadConvert (Properties.java:675) > at java.util.Properties.load0 (Properties.java:451) > at java.util.Properties.load (Properties.java:404) > at org.eclipse.aether.internal.impl.TrackingFileManager.read > (TrackingFileManager.java:56) > at org.eclipse.aether.internal.impl.DefaultUpdateCheckManager.read > (DefaultUpdateCheckManager.java:511) > at org.eclipse.aether.internal.impl.DefaultUpdateCheckManager.checkMetadata > (DefaultUpdateCheckManager.java:250) > at org.eclipse.aether.internal.impl.DefaultMetadataResolver.resolve > (DefaultMetadataResolver.java:302) > at org.eclipse.aether.internal.impl.DefaultMetadataResolver.resolveMetadata > (DefaultMetadataResolver.java:181) > at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveMetadata > (DefaultRepositorySystem.java:277) > at > org.apache.maven.plugin.version.internal.DefaultPluginVersionResolver.resolveFromRepository > (DefaultPluginVersionResolver.java:134) > at > org.apache.maven.plugin.version.internal.DefaultPluginVersionResolver.resolve > (DefaultPluginVersionResolver.java:97) > at > org.apache.maven.lifecycle.internal.LifecyclePluginResolver.resolveMissingPluginVersions > (LifecyclePluginResolver.java:67) > at > org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments > (DefaultLifecycleTaskSegmentCalculator.java:104) > at > org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments > (DefaultLifecycleTaskSegmentCalculator.java:86) > at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:92) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:321) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:206) > at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:119) > at org.apache.maven.cli.MavenCli.execute (MavenCli.java:982) > at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:296) > at org.apache.maven.cli.MavenCli.main (MavenCli.java:200) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:62) > at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke (Method.java:566) > at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:282) > at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:225) > at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:406) > at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347) > {code} > It's not consistently failing and OSX based CI agents fail more often than > Linux or Windows ones. After some investigations we saw that as part of > https://issues.apache.org/jira/browse/MRESOLVER-132 the synchronization has > been removed on the {{TrackingFileManager}} > ([https://github.com/apache/maven-resolver/pull/67]). This now leads to > concurrent writes on the {{resolver-status.properties}} file in our tests and > causes the error during the {{Properties#load()}} method wich then throws the > above error. See this screenshot on how these files look like (cannot add the > text here as {{null}} characters aren't shown): > !screenshot-1.png! > When enabling debug log we also see that the {{DefaultMetadataResolver}} > threads try to write concurrently to this file: > {code:java} > [DefaultMetadataResolver-0-0] [INFO] Downloading from custom-repo2: > https://repo.maven.apache.org/maven2/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/maven-metadata.xml > [DefaultMetadataResolver-0-2] [INFO] Downloading from central-proxy: > https://repo.maven.apache.org/maven2/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/maven-metadata.xml > [DefaultMetadataResolver-0-1] [INFO] Downloading from custom-repo: > https://repository.apache.org/snapshots/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/maven-metadata.xml > [DefaultMetadataResolver-0-1] [DEBUG] Writing tracking file > /.../core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties > [DefaultMetadataResolver-0-2] [DEBUG] Writing tracking file > /.../core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties > [DefaultMetadataResolver-0-0] [DEBUG] Writing tracking file > /.../core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties > {code} > We [created a Maven integration > test|https://github.com/guylabs/maven-integration-testing/commit/56edeb28895a6704ad28e76655ee792d19797a8c] > which reproduces the issue. The test is not always failing as in some runs > the {{null}} characters are normalized to {{\u0000}} Strings and then written > back to the {{resolver-status.properties}} file which then can be read > properly by the {{Properties#load()}} method. You can monitor the > {{core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties}} > file while running the test as there you'll see the corrupted updates. > Readding the thread synchronization to the {{TrackingFileManager}} would > solve this race condition, as file locking is indeed not required as stated > in MRESOLVER-132. > Can we get this change into a new Maven {{4.0.0-alpha-1-SNAPSHOT}} version to > be able to test the fix? -- This message was sent by Atlassian Jira (v8.3.4#803005)