[ https://issues.apache.org/jira/browse/MRESOLVER-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466825#comment-17466825 ]
wei cai commented on MRESOLVER-228: ----------------------------------- [~michael-o] [~ibabiankou] Cool! Have read the comments in [https://github.com/apache/maven-resolver/pull/142.] Looks like we now have a road map: * DFS -> BFS to make sure dependency tree no changes, might solve MRESOLVER-133 * BFS + Skip & Reconcile to make sure dependency tree no changes which is for MRESOLVER-228 * Download artifacts in parallel which is for MRESOLVER-7 I already get a BFS solution implementation, let me prepare the very first PR. > Improve the maven dependency resolution speed by a skip & reconcile approach > ---------------------------------------------------------------------------- > > Key: MRESOLVER-228 > URL: https://issues.apache.org/jira/browse/MRESOLVER-228 > Project: Maven Resolver > Issue Type: Improvement > Components: Resolver > Affects Versions: 1.7.2 > Reporter: wei cai > Priority: Major > Fix For: 1.8.0 > > Attachments: Screen Shot 2021-11-27 at 12.58.26 PM.png, Screen Shot > 2021-11-27 at 12.58.59 PM.png, Screen Shot 2021-11-27 at 12.59.32 PM.png > > > When comes to resolve the huge amount of dependencies of an enterprise level > project, the maven resolver is very slow to resolve the dependency > graph/tree. Take one of our app as example, it could take *10minutes+ and 16G > memory* to print out the result of {*}mvn dependency:tree{*}. > This is because there are many dependencies declared in the project, and some > of the dependencies would introduce *600+* transitive dependencies, and > exclusions are widely used to solve dependency conflicts. > By checking the > [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500], > we know the exclusion is also part of the cache key. This means when the > exclusions up the tree differs, the cached resolution result for the same GAV > won't be picked up and need s to be recalculated. > !Screen Shot 2021-11-27 at 12.58.26 PM.png! > From above figure, we know: > * In 1st case, D will be resolved only once as there are no exclusions/same > exclusions up the tree. > * In 2nd case, the B and C have different exclusions and D needs to be > recalculated, if D is a heavy dependency which introduce many transitive > dependencies, all D and its children needs to be recalculated. Recalculating > all of these nodes introduces 2 issues: > ** Slow in resolving dependencies. > ** Lots of DependencyNodes cached (all calculated/recalculated nodes would > be cached) and will consume huge memory. > To improve the speed of maven resolver's dependency resolution, I > implemented a skip & reconcile approach. Here is the *skip* part. > !Screen Shot 2021-11-27 at 12.58.59 PM.png! > From above figure, the 1st R is resolved at depth 3, and the 2nd R is > resolved again because the depth is at 2 which is lower, the 3rd R at depth 3 > and the 4th R at depth 4 are simply skipped as R is already resolved at depth > 2. This is because the same node with deeper depth is most likely won't be > picked up by maven as maven employs a "{*}nearest{*} transitive dependency in > the tree depth and the *first* in resolution" strategy. > The 3rd R and 4th R will have children set as zero and marked as skipped by > the R at depth 2 in 2nd tree path. > > Here is the *reconcile* part: > !Screen Shot 2021-11-27 at 12.59.32 PM.png! > When there are dependency conflicts, some of the skipped nodes need to be > reconciled. > In above figure, there are 4 tree paths. > * The D1 (D with version 1) in the 1st tree path is get resolved, children > of E and R at depth 3 are resolved and cached. > * In the 2nd tree path, when resolving E & R of H, we simply skip these 2 > nodes as they are in deeper depth (depth: 4) than the E & R in 1st tree path. > * In the 3rd tree path, a R node with lower path is resolved, and a E node > at depth 5 is skipped. > * In the 4th path, a D2 (D with version 2) node is resolved, as the depth is > lower than D1, so maven will pick D2, this means the E & R's children cached > in tree depth 1 should be {*}discarded{*}. > Thus we might need to reconcile the E & R nodes in 2nd, 3rd and 4th tree > paths. Here only E in 2nd tree path needs to be reconciled. This is because: > * R in 3rd tree path won't be picked up as there is already a R in 2nd tree > path with a lower depth. > * E in 3rd tree path won't be picked up as it is enough to reconcile the E > in 2nd tree path as the E in 2nd tree path is deeper than E in 3rd tree path. > Here is what we've updated in the maven-resolver logic: > # Resolve dependencies by leveraging a skip approach. The node in deeper > depth will be skipped if a node with same GAV has been resolved with a lower > depth. > # Cloned the nodes in step 1. > Use maven's ConflictResolver (Transformer) to transform the cloned nodes and > find out the conflict winners & the nodes to reconcile. > Ex, D1 conflicts with D2 in above digram. > E in 2nd tree path will be reconciled. > # Reconcile all skipped nodes identified in above step. > > After we enabled the resolver patch in maven, we are seeing 10% ~70% build > time reduced for different projects depend on how complex the dependencies > are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain > the same. > We've verified the resolver performance patch leveraging an automation > solution to certify 2000+ apps of our company by comparing the *mvn > dependency:tree* and *mvn dependency:list* result with/without the > performance patch. > Please help review the PR. > [https://github.com/apache/maven-resolver/pull/136] > > -- This message was sent by Atlassian Jira (v8.20.1#820001)