[ 
https://issues.apache.org/jira/browse/MRESOLVER-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450266#comment-17450266
 ] 

wei cai edited comment on MRESOLVER-228 at 11/29/21, 9:46 AM:
--------------------------------------------------------------

[~michael-o] 

Regarding MNG-6357, I think this is a different issue. MNG-6357 is about the 
dependency order when generating the classpath. This Jira is about skipping 
the resolution of a node at a higher depth if a node with the same GAV has 
already been resolved at a lower depth. MNG-6357 could probably be resolved 
by leveraging a BFS solution.

 

Question: You have added a boolean property, does this mean that this solution 
is an opt-out? If yes, why? Why would one want to disable this if the tree does 
not change and it is much faster?

Just for compatibility. I'm confident in the "skip & reconcile" approach 
because we've dry-run it against 2000+ applications in our company, so I 
would raise both hands in favour of not providing a property to disable 
this behavior.

 

Question:  If you visit a node at a lower level and if it has been previously 
resolved at least on the same or higher level and you know the tree is going to 
be identical because version and exclusions are the same you will skip it?

If the versions differ, the node won't be skipped. Different versions are 
treated as a version conflict and are always resolved.

If the versions are the same and the exclusions differ, the node will still 
be skipped, as the node at the higher depth most likely won't be picked. 

*Most likely* a node at a higher depth won't be picked up by maven, as maven 
employs a "nearest transitive dependency in the tree depth and the first in 
resolution" strategy.
I say most likely because this is not true in one case: when there are version 
conflicts among the parent nodes and one of the parent nodes is the conflict 
loser. This is why I need "reconcile/fix" later. 
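
As a rough illustration of the "nearest, then first" rule quoted above, here is a minimal Java sketch. It is not Maven Resolver's ConflictResolver; the class `NearestWinsSketch`, the `Occurrence` type and the `pickWinners` method are hypothetical names, and the conflict key is assumed to be groupId:artifactId.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a simplified "nearest wins, first wins on ties" selection. */
final class NearestWinsSketch {

    /** One occurrence of a dependency in the tree (hypothetical type). */
    static final class Occurrence {
        final String ga;      // groupId:artifactId, assumed to be the conflict key
        final String version;
        final int depth;      // distance from the root node

        Occurrence(String ga, String version, int depth) {
            this.ga = ga;
            this.version = version;
            this.depth = depth;
        }
    }

    /** Picks one winner per conflict key from occurrences given in resolution order. */
    static Map<String, Occurrence> pickWinners(List<Occurrence> inResolutionOrder) {
        Map<String, Occurrence> winners = new LinkedHashMap<>();
        for (Occurrence o : inResolutionOrder) {
            Occurrence current = winners.get(o.ga);
            // A strictly smaller depth replaces the winner; on a tie the earlier one stays.
            if (current == null || o.depth < current.depth) {
                winners.put(o.ga, o);
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        List<Occurrence> seen = Arrays.asList(
                new Occurrence("g:D", "2.0", 2),   // found first, but deeper
                new Occurrence("g:D", "1.0", 1));  // found later, but nearer to the root
        // prints "winner: g:D:1.0"
        System.out.println("winner: g:D:" + pickWinners(seen).get("g:D").version);
    }
}
```

In this sketch D:2.0 loses even though it is seen first, which is the situation that makes a later reconcile step necessary.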

The strategy is to skip as many nodes as possible, and then reconcile only 
the minimum set of nodes that need to be reconciled.

Below is the message printed for one of our apps.
{code:java}
Skipped resolving 31459 nodes, and reconciled 8 nodes to solve 71 dependency conflicts.
{code}
*Skip:*

A -> {color:#4c9aff}*B*{color} -> C
   -> D(excl E) -> {color:#de350b}*B*{color} -> C

The red {color:#de350b}*B*{color} is skipped because the blue 
{color:#4c9aff}*B*{color} above it is at a lower depth. Even though the 
exclusions are different, we skip resolving the red {color:#de350b}*B*{color}, 
as it most likely won't be picked up by maven. To skip it, we simply set 
{color:#de350b}*B*{color}'s children to empty and record that 
{color:#de350b}*B*{color} was skipped by the {color:#4c9aff}*B*{color} with path 
(A>{*}{color:#4c9aff}B{color}{*}). Originally, if the exclusions up the tree 
(exclusions can be inherited) were different, maven would resolve 
{color:#de350b}*B*{color} again, which means both {color:#de350b}*B*{color} and 
{color:#4c9aff}*B*{color} would be resolved by maven. Now we only resolve 
the blue {color:#4c9aff}*B*{color}.

Skipping {color:#de350b}*B*{color} is safe because maven won't pick up 
{color:#de350b}*B*{color} at all. This explains why resolution can be 
much faster this way: we skip calculating the many nodes that fall into 
this case. 
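
The skip decision described above can be sketched in a few lines of Java. This is a simplified illustration, not the code from the PR: the class `SkipSketch` and its `visit` method are hypothetical, nodes are identified by a plain GAV string, and skipping on an equal depth (rather than only on a strictly lower one) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: decides whether a node can be skipped during collection. */
final class SkipSketch {

    /** Lowest depth at which each GAV has been resolved so far. */
    private final Map<String, Integer> resolvedDepth = new HashMap<>();

    /**
     * Returns true if the node should be skipped (its children left empty);
     * otherwise records the node as resolved at this depth and returns false.
     */
    boolean visit(String gav, int depth) {
        Integer best = resolvedDepth.get(gav);
        if (best != null && best <= depth) {
            // Same GAV already resolved at the same or a lower depth.
            // Treating an equal depth as a skip is an assumption of this sketch.
            return true;
        }
        resolvedDepth.put(gav, depth);
        return false;
    }

    public static void main(String[] args) {
        SkipSketch collector = new SkipSketch();
        // Visit order of the example tree: A, blue B, C, D, red B.
        System.out.println("A:      " + collector.visit("g:A:1", 0));
        System.out.println("blue B: " + collector.visit("g:B:1", 1));
        System.out.println("C:      " + collector.visit("g:C:1", 2));
        System.out.println("D:      " + collector.visit("g:D:1", 1));
        System.out.println("red B:  " + collector.visit("g:B:1", 2)); // true, skipped
    }
}
```

Note that exclusions play no part in the decision here, which mirrors the point that the red B is skipped even though its inherited exclusions differ.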

*Reconcile:*

A -> B -> D:2.0 -> *{color:#4c9aff}E{color}* -> F
   -> C -> G -> H -> {color:#de350b}*E*{color} -> F ===> this 
*{color:#de350b}E{color}* is skipped because the {color:#4c9aff}*E*{color} 
above is at a lower depth.
   -> D:1.0 -> G    ===> D:1.0 is at a lower depth, so D:2.0 is the conflict loser.

This means the {color:#4c9aff}*E*{color} in the 1st tree path is no longer 
valid, as D:2.0 is not picked up by maven. However, the 
{color:#de350b}*E*{color} in the 2nd tree path was skipped because of the 
{color:#4c9aff}*E*{color} in the 1st tree path. Thus we need to reconcile/fix 
the {color:#de350b}*E*{color} in the 2nd tree path, as this 
{color:#de350b}*E*{color} would be the winner.
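
One way to picture the reconcile selection is the Java sketch below. It assumes each skipped node remembers the path of the node that caused the skip, and that the set of conflict losers is already known. The names `ReconcileSketch`, `Skipped` and `toReconcile` are hypothetical, and node labels such as "D:2.0" stand in for real GAVs; this is not the code from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: selects which skipped nodes need to be reconciled. */
final class ReconcileSketch {

    /** A skipped node plus the root-to-node path of the node that caused the skip. */
    static final class Skipped {
        final String gav;
        final int depth;
        final List<String> skippedByPath;

        Skipped(String gav, int depth, List<String> skippedByPath) {
            this.gav = gav;
            this.depth = depth;
            this.skippedByPath = skippedByPath;
        }
    }

    /** Keeps, per GAV, the shallowest skipped node whose skipping path contains a loser. */
    static List<Skipped> toReconcile(List<Skipped> skipped, Set<String> losers) {
        Map<String, Skipped> shallowest = new LinkedHashMap<>();
        for (Skipped s : skipped) {
            if (Collections.disjoint(s.skippedByPath, losers)) {
                continue; // the node that caused the skip is still valid
            }
            Skipped current = shallowest.get(s.gav);
            if (current == null || s.depth < current.depth) {
                shallowest.put(s.gav, s);
            }
        }
        return new ArrayList<>(shallowest.values());
    }

    public static void main(String[] args) {
        // The red E (depth 4) was skipped because of the blue E under D:2.0.
        List<String> blueE = Arrays.asList("A", "B", "D:2.0", "E");
        List<Skipped> skipped = Collections.singletonList(new Skipped("E", 4, blueE));
        Set<String> losers = Collections.singleton("D:2.0");
        for (Skipped s : toReconcile(skipped, losers)) {
            System.out.println("reconcile " + s.gav + " at depth " + s.depth);
        }
    }
}
```

Only skipped nodes whose skipping path runs through a conflict loser are returned, so the number of nodes resolved a second time stays small.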

 


was (Author: wecai):
[~michael-o] 

Regarding MNG-6357, I think this is a different issue.  MNG-6357 is about the 
dependency order when generating classpath. This Jira is about skipping 
resolving a node with higher depth if a node with same GAV has been resolved at 
a lower depth. MNG-6357  could be probably be resolved by leveraging a BFS 
solution.

 

Question: You have added a boolean property, does this mean that this solution 
is an opt-out? If yes, why? Why would one want to disable this if the tree does 
not change and it is much faster?

Just for compatibility consideration. As I'm confident with the "skip & 
reconcile" approach as we've dryrun 2000+ applications in our company, so I 
would raise both hands in favour of that we don't provide a property to disable 
this behavior.

 

Question:  If you visit a node at a lower level and if it has been previously 
resolved at least on the same or higher level and you know the tree is going to 
be identical because version and exclusions are the same you will skip it?

If version differs, it won't be skipped. Different versions are considered as 
version conflicts.
If version are the same and exclusions differs, still it will be skipped.

*Most likely* a node at a higher level won't be picked up by maven as maven 
employs a "{*}nearest{*} transitive dependency in the tree depth and the 
*first* in resolution" strategy.
I mean most likely here, however this is not always true in one case: version 
conflicts in parent nodes, this is why I need "reconcile/fix" later. 

*Skip:*

A > {color:#4c9aff}*B* {color}> C
   > D(excl E) -> {color:#de350b}*B*{color} -> C

The red {color:#de350b}*B*{color} would be skipped as above blue 
*{color:#4c9aff}B{color}* is with lower depth, even the exclusion is different, 
we skip resolving {color:#de350b}*B*{color} as {color:#de350b}*B*{color} is 
most likely won't be picked up by maven.  As a skip, we simply set 
{color:#de350b}*B*{color}'s children with empty, then record 
{color:#de350b}*B*{color} is skipped by the {color:#4c9aff}*B*{color} with path 
(A>{*}{color:#4c9aff}B{color}{*}). Originally if exclusions up the tree 
(exclusions can be inherited) are different, maven would resolve 
*{color:#de350b}B{color}* again, this means both *{color:#de350b}B{color}* and 
{color:#4c9aff}*B*{color} will be resolved by maven. And now we only resolve 
the blue {color:#4c9aff}*B*{color}.

The skip of *{color:#de350b}B{color}* is safe as maven won't pick up 
{color:#de350b}*B*{color} at all, this explains why the resolution could be 
much faster in this way because we skipped calculating many nodes of such 
cases. 

*Reconcile:*

A -> B -> D:2.0 -> *{color:#4c9aff}E{color}* ->F
   -> C -> G -> H -> {color:#de350b}*E*{color} -> F ===> this 
*{color:#de350b}E{color}* would be skipped as above {color:#4c9aff}*E*{color} 
is at lower depth.
   -> D:1.0 -> G    ===> D1.0 is with lower depth, D:2.0 is the conflict loser, 
this means *{color:#4c9aff}E in the 1st tree path{color}* is no longer invalid 
as D2.0 is not picked up by maven, however *{color:#de350b}E{color} 
{color:#de350b}in 2nd tree path{color}* is skipped by the *{color:#4c9aff}E in 
the 1st tree path.{color}* {color:#172b4d}Thus we need to reconcile/fix 
the{color} *{color:#de350b}E{color} {color:#de350b}in 2nd tree path{color}* as 
this {color:#de350b}*E*{color} would be the winner.

 

> Improve the maven dependency resolution speed by a skip & reconcile approach
> ----------------------------------------------------------------------------
>
>                 Key: MRESOLVER-228
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-228
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>    Affects Versions: 1.7.2
>            Reporter: wei cai
>            Priority: Major
>         Attachments: Screen Shot 2021-11-27 at 12.58.26 PM.png, Screen Shot 
> 2021-11-27 at 12.58.59 PM.png, Screen Shot 2021-11-27 at 12.59.32 PM.png
>
>
> When it comes to resolving the huge number of dependencies of an 
> enterprise-level project, the maven resolver is very slow to resolve the 
> dependency graph/tree. Take one of our apps as an example: it could take 
> *10+ minutes and 16G memory* to print out the result of {*}mvn dependency:tree{*}.
> This is because there are many dependencies declared in the project, some 
> of the dependencies introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusions are also part of the cache key. This means that when 
> the exclusions up the tree differ, the cached resolution result for the same 
> GAV won't be picked up and needs to be recalculated. 
> !Screen Shot 2021-11-27 at 12.58.26 PM.png!
> From the above figure, we know:
>  * In the 1st case, D will be resolved only once as there are no 
> exclusions/same exclusions up the tree.
>  * In the 2nd case, B and C have different exclusions and D needs to be 
> recalculated. If D is a heavy dependency which introduces many transitive 
> dependencies, D and all of its children need to be recalculated. Recalculating 
> all of these nodes introduces 2 issues:
>  ** Dependency resolution is slow.
>  ** Lots of DependencyNodes are cached (all calculated/recalculated nodes 
> are cached), which consumes a huge amount of memory.
> To improve the speed of maven resolver's dependency resolution, I 
> implemented a skip & reconcile approach. Here is the *skip* part.
> !Screen Shot 2021-11-27 at 12.58.59 PM.png!
> In the above figure, the 1st R is resolved at depth 3, and the 2nd R is 
> resolved again because its depth is 2, which is lower. The 3rd R at depth 3 
> and the 4th R at depth 4 are simply skipped, as R is already resolved at depth 
> 2. This is because the same node at a deeper depth most likely won't be 
> picked up by maven, as maven employs a "{*}nearest{*} transitive dependency in 
> the tree depth and the *first* in resolution" strategy.
> The 3rd R and 4th R will have their children set to empty and be marked as 
> skipped by the R at depth 2 in the 2nd tree path.
>  
> Here is the *reconcile* part:
> !Screen Shot 2021-11-27 at 12.59.32 PM.png!
> When there are dependency conflicts, some of the skipped nodes need to be 
> reconciled.
> In the above figure, there are 4 tree paths.
>  * The D1 (D with version 1) in the 1st tree path is resolved; the children 
> of E and R at depth 3 are resolved and cached.
>  * In the 2nd tree path, when resolving E & R of H, we simply skip these 2 
> nodes as they are at a deeper depth (depth: 4) than the E & R in the 1st tree 
> path.
>  * In the 3rd tree path, an R node at a lower depth is resolved, and an E node 
> at depth 5 is skipped.
>  * In the 4th path, a D2 (D with version 2) node is resolved. As its depth is 
> lower than D1's, maven will pick D2, which means the children of E & R cached 
> in the 1st tree path should be {*}discarded{*}. 
> Thus we might need to reconcile the E & R nodes in the 2nd, 3rd and 4th tree 
> paths. Here only the E in the 2nd tree path needs to be reconciled. This is 
> because:
>  * The R in the 3rd tree path won't be picked up as there is already an R in 
> the 2nd tree path with a lower depth.
>  * The E in the 3rd tree path won't be picked up: it is enough to reconcile 
> the E in the 2nd tree path, as the E in the 2nd tree path (depth 4) is at a 
> lower depth than the E in the 3rd tree path (depth 5).
> Here is what we've updated in the maven-resolver logic:
>  * Resolve dependencies by leveraging a skip approach. A node at a deeper 
> depth will be skipped if a node with the same GAV has been resolved at a lower 
> depth.
>  * Use maven's ConflictResolver (Transformer) to find the conflict 
> winners. Figure out the nodes that conflict with the winner. E.g., g:a:D1 
> conflicts with g:a:D2 in the above case.
>  * Find all skipped nodes that are affected by D1, as D1 is the 
> loser and D2 is the winner.
>  * Reconcile all skipped nodes from the above step; for nodes with the same 
> GAV, only the node at the lowest depth will be reconciled.
>  
> After we enabled the resolver patch in maven, we are seeing build time 
> reduced by 10% ~ 70% for different projects, depending on how complex the 
> dependencies are, and the results of *mvn dependency:tree* and *mvn 
> dependency:list* remain the same.
> We've verified the resolver performance patch using an automated 
> solution that certified 2000+ apps of our company by comparing the *mvn 
> dependency:tree* and *mvn dependency:list* results with/without the 
> performance patch.
> Please help review the PR.
> [https://github.com/apache/maven-resolver/pull/136]
>  
> Another approach that comes to mind is to modify the ConflictResolver to 
> make it reconcile the nodes. The logic would be:
>  * Resolve dependencies by leveraging a skip approach. A node at a deeper 
> depth will be skipped if a node with the same GAV has been resolved at a lower 
> depth.
>  * Modify the [ConflictResolver|#L183]: when the ConflictResolver determines 
> the winner and finds it has been skipped, it should reconcile it 
> immediately.
> The problem here is that maven-resolver-impl relies on maven-resolver-util, 
> where the ConflictResolver resides, and the ConflictResolver would in turn 
> need to do the reconcile, which depends on maven-resolver-impl. This is 
> a sort of cyclic dependency. Probably we should go this way:
> the DefaultDependencyCollector should pass an IDependencyReconciler to the 
> ConflictResolver, and the ConflictResolver can then do the reconcile when it 
> considers it necessary. Please share your expertise on how to make the 
> code clean, tidy and maintainable. It would be my great honor to refine the 
> PR until it meets your acceptance criteria.
>  
>  
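
The callback idea at the end of the issue description (passing an IDependencyReconciler into the ConflictResolver to avoid a cyclic module dependency) could look roughly like the Java sketch below. Only the name `IDependencyReconciler` comes from the proposal; its `reconcile` method, the `ConflictResolverSketch` and `CollectorSketch` classes, and the use of plain GAV strings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Would be declared on the util side so the conflict resolver never needs to
 * import anything from the impl module. The method shape is hypothetical.
 */
interface IDependencyReconciler {
    void reconcile(String skippedNodeGav);
}

/** Stand-in for the util-side conflict resolver; it depends only on the interface. */
final class ConflictResolverSketch {
    private final IDependencyReconciler reconciler;

    ConflictResolverSketch(IDependencyReconciler reconciler) {
        this.reconciler = reconciler;
    }

    /** Called once a winner is chosen; reconciles it right away if it had been skipped. */
    void onWinner(String gav, boolean wasSkipped) {
        if (wasSkipped) {
            reconciler.reconcile(gav);
        }
    }
}

/** Stand-in for the impl-side collector, which supplies the callback. */
final class CollectorSketch implements IDependencyReconciler {
    final List<String> reconciled = new ArrayList<>();

    @Override
    public void reconcile(String skippedNodeGav) {
        // A real collector would resolve the node's children here; this sketch only records the call.
        reconciled.add(skippedNodeGav);
    }

    public static void main(String[] args) {
        CollectorSketch collector = new CollectorSketch();
        ConflictResolverSketch resolver = new ConflictResolverSketch(collector);
        resolver.onWinner("g:E:1.0", true);   // skipped winner: reconciled
        resolver.onWinner("g:B:1.0", false);  // resolved normally: nothing to do
        System.out.println(collector.reconciled);
    }
}
```

With this shape the dependency direction stays impl -> util, because the resolver side only sees the interface.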



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
