[ 
https://issues.apache.org/jira/browse/MNG-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564491#comment-17564491
 ] 

Xiong Luyao edited comment on MNG-7509 at 7/17/22 1:08 PM:
-----------------------------------------------------------

The cause is that, while resolving dependency relationships, Maven creates 
new instances of the dependencies in dependencyManagement for every library it 
processes. Dependencies managed in a shared parent POM are therefore created 
many times even though they are identical. In the worst case, if a project 
pulls in 3000 libraries and each library uses the same parent POM, which 
manages the versions of 3000 libraries, 9 million (3000 * 3000) Dependency 
instances are created, although only 3000 of them are actually necessary.
 
I think we can resolve this by adding a cache map for the dependencies. If an 
equal Dependency instance is already in the cache map, that instance is 
reused instead of creating a new one.
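
To make the cache-map idea concrete, here is a minimal sketch. It is not the 
code from the PRs: ManagedDep, DepCache and DependencyCacheDemo are 
hypothetical stand-ins invented for illustration, and the sketch assumes the 
cached type is immutable and implements equals/hashCode on its coordinates.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable stand-in for a managed dependency (not Maven's class).
final class ManagedDep {
    final String groupId;
    final String artifactId;
    final String version;

    ManagedDep(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ManagedDep)) return false;
        ManagedDep other = (ManagedDep) o;
        return groupId.equals(other.groupId)
                && artifactId.equals(other.artifactId)
                && version.equals(other.version);
    }

    @Override
    public int hashCode() {
        return Objects.hash(groupId, artifactId, version);
    }
}

// Hypothetical cache map: equal dependencies resolve to one canonical instance.
final class DepCache {
    private final Map<ManagedDep, ManagedDep> cache = new ConcurrentHashMap<>();

    ManagedDep intern(ManagedDep candidate) {
        ManagedDep existing = cache.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}

class DependencyCacheDemo {
    // Simulates `libraries` POMs that each declare the same `managed` entries
    // and returns how many distinct object instances are retained.
    static int distinctInstances(int libraries, int managed) {
        DepCache cache = new DepCache();
        Set<ManagedDep> retained =
                Collections.newSetFromMap(new IdentityHashMap<ManagedDep, Boolean>());
        for (int lib = 0; lib < libraries; lib++) {
            for (int d = 0; d < managed; d++) {
                ManagedDep candidate = new ManagedDep("com.example", "lib-" + d, "1.0");
                retained.add(cache.intern(candidate));
            }
        }
        return retained.size();
    }

    public static void main(String[] args) {
        // 30 * 30 = 900 candidates are built, but only 30 instances are kept.
        System.out.println(distinctInstances(30, 30)); // prints 30
    }
}
```

In this sketch a short-lived candidate is still allocated for each lookup and 
becomes garbage immediately; a real implementation could key the map on the 
coordinates to avoid even that allocation.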

Fortunately, we do not need to worry that different models referring to the 
same Dependency instance will affect each other when one model changes the 
dependency. Dependency (including the DefaultArtifact inside it) is already 
designed to be immutable: changing a value somewhere does not modify the 
existing instance but returns a completely new one.
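
The following minimal sketch shows why sharing is safe under that assumption. 
SharedDep and SharedInstanceDemo are hypothetical classes written for this 
illustration, modelled on the "setter returns a new object" style described 
above; they are not Maven's or Aether's actual classes.

```java
// Hypothetical immutable dependency: no field can change after construction.
final class SharedDep {
    private final String coordinates;
    private final String scope;

    SharedDep(String coordinates, String scope) {
        this.coordinates = coordinates;
        this.scope = scope;
    }

    String getCoordinates() {
        return coordinates;
    }

    String getScope() {
        return scope;
    }

    // "Changing" the scope leaves this instance untouched and returns a copy.
    SharedDep setScope(String newScope) {
        if (scope.equals(newScope)) {
            return this; // nothing to change, so the same instance is reused
        }
        return new SharedDep(coordinates, newScope);
    }
}

class SharedInstanceDemo {
    public static void main(String[] args) {
        SharedDep shared = new SharedDep("com.example:lib:1.0", "compile");

        // Two models hold a reference to the same cached instance.
        SharedDep usedByModelA = shared;
        SharedDep usedByModelB = shared;

        // Model A changes the scope and receives a new object.
        SharedDep changed = usedByModelA.setScope("test");

        System.out.println(usedByModelB.getScope()); // prints compile
        System.out.println(changed.getScope());      // prints test
        System.out.println(changed == shared);       // prints false
    }
}
```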

 

Here are the PRs:
 * [https://github.com/apache/maven/pull/764]
 * [https://github.com/apache/maven/pull/768] 

So far I have only opened PRs for 3.8.x and 3.9.x, but I think the change can 
be applied to other versions as well if the solution is acceptable.



> Huge memory cost when parent pom widely used in a big project for 
> dependencyManagement
> --------------------------------------------------------------------------------------
>
>                 Key: MNG-7509
>                 URL: https://issues.apache.org/jira/browse/MNG-7509
>             Project: Maven
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Xiong Luyao
>            Priority: Major
>         Attachments: image-2022-07-09-09-37-53-823.png, 
> image-2022-07-09-09-38-26-354.png, image-2022-07-09-10-27-12-668.png, 
> image-2022-07-09-10-27-56-437.png, image-2022-07-09-10-28-05-706.png, 
> image-2022-07-09-10-28-22-864.png, image-2022-07-09-10-28-35-341.png, 
> image-2022-07-09-10-28-40-612.png, image-2022-07-09-10-29-04-045.png, 
> image-2022-07-09-10-29-15-822.png, image-2022-07-09-10-29-21-991.png, 
> image-2022-07-09-10-29-46-216.png, image-2022-07-09-10-29-51-456.png
>
>
> When Maven resolves dependency relationships, it creates many instances of 
> dependencies/artifacts whose content is identical and that differ only in 
> the POM model they belong to. This costs a huge amount of memory when a 
> parent POM whose dependencyManagement section manages many dependencies is 
> used by many project libraries: 
> (libraries_count * managedDependency_count) dependency instances are 
> created. For example, if there are 3000 libraries and every library uses the 
> same parent POM, which manages the versions of 3000 dependencies, 
> 3000 * 3000 = 9,000,000 dependency instances are created. Most of them are 
> identical; in fact, we only need one instance for each dependency in the 
> parent POM (3000 dependency instances).
>  
> I'm from eBay, and here is a real case from an enterprise-level project. We 
> have about 3000 business domain libraries with dependency relationships 
> between them. We need to build all libraries in one release so that all 
> libraries in the same release are based on the same code. So we use a parent 
> POM as the central place to manage all versions for a release, and those 
> libraries use it. As shown in the picture below, when the release starts, it 
> works out which libraries do not depend on others and builds those first, 
> then builds the libraries whose dependencies have already been built, and 
> repeats this process until all libraries are built.
> With the current Maven resolution logic, building the libraries this way 
> costs a huge amount of memory. Even after the libraries have been released, 
> building a project that contains many of these libraries also costs a huge 
> amount of memory.
> So for now we have to specify the version in each library's POM file instead 
> of using the parent POM. We think this case can be improved.
>  
> !image-2022-07-09-09-37-53-823.png|width=493,height=226!
>  
> Here is a thread dump taken while building a real project that depends on 
> about 1000 of the above libraries. The top 5 objects are all related to 
> org.eclipse.aether.graph.Dependency.
> !image-2022-07-09-09-38-26-354.png|width=510,height=199!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
