[ 
https://issues.apache.org/jira/browse/MNG-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576860#comment-17576860
 ] 

ASF GitHub Bot commented on MNG-7509:
-------------------------------------

Yougoss commented on PR #768:
URL: https://github.com/apache/maven/pull/768#issuecomment-1208283097

   
   
   
   > The idea of maintaining pool of immutable objects is good, however the 
implementation proposed isn't: the JVM is free to cleanup dependencyCache as 
soon as execution leaves #toDependency method - there are no more references to 
depCacheKeykey. The author is observing some performance improvements just 
because his JVM has enough free memory to not trigger GC whilst maven builds 
dependency graph (give JVM more memory to save memory).
   
   Hi @andreybpanfilov, thanks for your comments. I think there may be some 
misunderstanding about this change.
   
   First, I should clarify that the main change I want to make is in 
ArtifactDescriptorReaderDelegate (maven-resolver-provider). I noticed that 
the same code exists in RepositoryUtils (maven-core), perhaps for historical 
reasons, so I made the same change there to keep the two consistent.
   
   The toDependency method in RepositoryUtils only collects the 
dependencyManagement of the project pom, so, as you said, the change there 
does not bring much improvement. (Entry point: getDependencies:243, 
LifecycleDependencyResolver (org.apache.maven.lifecycle.internal))
   
   But the convert method in ArtifactDescriptorReaderDelegate is totally 
different. It is used to compute all the dependency relationships: it 
traverses all the dependencies in the project pom and collects the 
dependencyManagement of each dependency.
   Imagine a simple case: the project has two dependencies, and each 
dependency pom has two dependencyManagement entries, so 4 dependency 
instances are created to hold the dependencyManagement information.
   Now imagine a slightly more complicated case: the dependencyManagement of 
both dependencies comes from the same parent pom, and this parent pom manages 
1000 dependencies. Then 1000 * 2 = 2000 instances are created for 
dependencyManagement, although only 1000 remain after removing duplicates.
   In our real case, an eBay project, there are about 3000 dependencies that 
share a parent pom managing the versions of all 3000. So when building the 
project, 3000 * 3000 = 9,000,000 instances are created for 
dependencyManagement, but only 3000 are necessary after removing duplicates. 
That is why the change uses a hash map to cache all the dependencyManagement 
instances.
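   The arithmetic above can be reproduced with a small sketch. It is only an 
illustration under stated assumptions, not the code from the PR: plain 
coordinate strings stand in for the real Dependency objects, and the class 
and method names (DependencyCountSketch, createdWithoutCache, 
retainedWithCache, coordinates) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from Maven or the PR.
class DependencyCountSketch {

    // Assumed stand-in for one managed dependency: a freshly built coordinate string.
    static String coordinates(int index) {
        return "org.example:managed-" + index + ":1.0";
    }

    // Objects created when every library converts the parent's managed list again.
    static long createdWithoutCache(int libraries, int managedDependencies) {
        long created = 0;
        for (int lib = 0; lib < libraries; lib++) {
            for (int i = 0; i < managedDependencies; i++) {
                String dependency = coordinates(i); // a new object on every pass
                created++;
            }
        }
        return created;
    }

    // Distinct objects retained when equal entries are looked up in a cache first.
    static int retainedWithCache(int libraries, int managedDependencies) {
        Map<String, String> cache = new HashMap<>();
        for (int lib = 0; lib < libraries; lib++) {
            for (int i = 0; i < managedDependencies; i++) {
                String candidate = coordinates(i);
                cache.putIfAbsent(candidate, candidate); // keep only the first equal one
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(createdWithoutCache(2, 1000)); // prints 2000
        System.out.println(retainedWithCache(2, 1000));   // prints 1000
    }
}
```

   With 3000 libraries and 3000 managed dependencies the same two methods 
give 9,000,000 created objects against 3000 retained ones.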
   
   The reason for using a WeakHashMap here is to make sure that when a 
dependency instance is no longer referenced by the graph, it can be cleaned 
up by GC instead of staying in memory because of the reference from the hash 
map key.
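   A minimal sketch of a weak canonicalizing cache of this kind is shown 
below. It is not the code from the PR: the WeakInterner class and its 
methods are hypothetical. It assumes the cached type has value-based 
equals/hashCode, and that the canonical instance is used as the map key 
itself, so that the entry lives exactly as long as something else (for 
example the dependency graph) still references that instance.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper, not taken from Maven or the PR.
final class WeakInterner<T> {

    // WeakHashMap holds keys weakly. The value must not hold the key strongly,
    // or the entry could never be collected, so it is wrapped in a WeakReference.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    // Returns the canonical instance equal to candidate, registering candidate
    // as canonical if no live equal instance is known yet.
    synchronized T intern(T candidate) {
        WeakReference<T> ref = pool.get(candidate);
        T canonical = (ref != null) ? ref.get() : null;
        if (canonical == null) {
            pool.put(candidate, new WeakReference<>(candidate));
            canonical = candidate;
        }
        return canonical;
    }

    synchronized int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        WeakInterner<String> interner = new WeakInterner<>();
        String first = new String("org.example:lib:1.0");
        String second = new String("org.example:lib:1.0"); // equal, but a distinct object
        String a = interner.intern(first);
        String b = interner.intern(second);
        System.out.println(a == b);          // prints true: both resolve to 'first'
        System.out.println(interner.size()); // prints 1 while 'first' is still referenced
    }
}
```

   In this sketch the key is the same object that callers keep using, so 
the entry is not eligible for cleanup while the graph still references it. 
If the key were a separate temporary object that nothing else references, 
the entry could be cleared at any time.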
   
   As for the memory cost you mentioned, I have also shared a build 
comparison for our project in the jira ticket, using the command 
'export MAVEN_OPTS="-Xms1g -Xmx1g"' to limit the memory. With the PR the 
build succeeds, but with the current maven code it throws an OOM exception.
   
   Please let me know if I should share a fake project that simulates this 
case to make it clearer.
   
    
   
   




> Huge memory cost when parent pom widely used in a big project for 
> dependencyManagement
> --------------------------------------------------------------------------------------
>
>                 Key: MNG-7509
>                 URL: https://issues.apache.org/jira/browse/MNG-7509
>             Project: Maven
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Xiong Luyao
>            Priority: Critical
>             Fix For: 3.9.0-candidate, 4.0.x-candidate
>
>         Attachments: image-2022-07-09-09-37-53-823.png, 
> image-2022-07-09-09-38-26-354.png, image-2022-07-09-10-27-12-668.png, 
> image-2022-07-09-10-27-56-437.png, image-2022-07-09-10-28-05-706.png, 
> image-2022-07-09-10-28-22-864.png, image-2022-07-09-10-28-35-341.png, 
> image-2022-07-09-10-28-40-612.png, image-2022-07-09-10-29-04-045.png, 
> image-2022-07-09-10-29-15-822.png, image-2022-07-09-10-29-21-991.png, 
> image-2022-07-09-10-29-46-216.png, image-2022-07-09-10-29-51-456.png
>
>
> When maven tries to resolve dependency relationships, it creates many 
> instances of dependency / artifact, even when the dependency / artifact 
> content is exactly the same and only appears in different pom models. This 
> costs a huge amount of memory if there is a parent pom whose 
> dependencyManagement manages a lot of dependencies, and this parent pom is 
> used by many project libraries: (libraries_count * managedDependency_count) 
> dependency instances will be created. For example, if there are 3000 
> libraries, and every library uses the same parent pom, which manages the 
> versions of 3000 dependencies, 3000 * 3000 = 9,000,000 dependency instances 
> will be created. But most of them are identical; in fact, we only need one 
> instance for each dependency in the parent pom (3000 dependency instances).
>  
> I'm from eBay, and here is a real case from an enterprise-level project. We 
> have about 3000 business domain libraries with dependency relationships 
> between them. We need to build all libraries in one release so that all the 
> libraries in a release are based on the same code. So we use a parent pom as 
> central management for all the versions in a release, and those libraries 
> all use it. As in the picture below, when the release starts, it computes 
> and starts with the libraries that don't depend on others, then builds the 
> libraries whose dependencies are already built, and continues this process 
> until all libraries are built.
> With the current maven resolution logic, building the libraries this way 
> costs a huge amount of memory. And even after the libraries have been 
> released, a project that contains a lot of these libraries also costs a 
> huge amount of memory to build.
> So for now, we have to specify the version in each library pom file instead 
> of using the parent pom. We think some enhancement can be made for this case.
>  
> !image-2022-07-09-09-37-53-823.png|width=493,height=226!
>  
> Here is a thread dump taken when building a real project which depends on 
> about 1000 of the above libraries. The top 5 objects are all related to 
> org.eclipse.aether.graph.Dependency.
> !image-2022-07-09-09-38-26-354.png|width=510,height=199!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
