[ https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wei cai updated MRESOLVER-247: ------------------------------ Attachment: skip-duplicate.png > Avoid unnecessary dependency resolution by a Skip solution based on BFS > ----------------------------------------------------------------------- > > Key: MRESOLVER-247 > URL: https://issues.apache.org/jira/browse/MRESOLVER-247 > Project: Maven Resolver > Issue Type: Improvement > Components: Resolver > Affects Versions: 1.7.3 > Reporter: wei cai > Priority: Major > Attachments: exclusion-matters.png, skip-duplicate.png, > skip-version-conflicts.png > > > h1. *Background* > This Jira is related with MRESOLVER-240 and MRESOLVER-228 > There were discussions about the DFS or BFS algorithm for maven resolver in > MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much > easier to implement. Here is the plan for multiple changes requested recently: > * DFS > BFS - preparation for parallel download > * Skip approach - avoid unnecessary version resolution (Covered in this JIRA) > * Download descriptors in parallel (MRESOLVER-7) > h1. *The phenomenon* > When comes to resolve the huge amount of dependencies of an enterprise level > project, the maven resolver is very slow to resolve the dependency > graph/tree. Take one of our app as example, it could take *10minutes+ and 16G > memory* to print out the result of {*}mvn dependency:tree{*}. > This is because there are many dependencies declared in the project, and some > of the dependencies would introduce *600+* transitive dependencies, and > exclusions are widely used to solve dependency conflicts. > By checking the > [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500], > we know the exclusion is also part of the cache key. This means when the > exclusions up the tree differs, the cached resolution result for the same GAV > won't be picked up and need s to be recalculated. > > !exclusion-matters.png! > > From above figure, we know: > * In 1st case, D will be resolved only once as there are no exclusions/same > exclusions up the tree. > * In 2nd case, the B and C have different exclusions and D needs to be > recalculated, if D is a heavy dependency which introduce many transitive > dependencies, all D and its children (D & E in this case) need to be > recalculated. Recalculating all of these nodes introduces 2 issues: > ** Slow in resolving dependencies. > ** Lots of DependencyNodes cached (all calculated/recalculated nodes would > be cached) and will consume huge memory. > h1. Solution > To improve the speed of maven resolver's dependency resolution, I > implemented a skip approach to avoid unnecessary dependency resolution. > h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)* > !skip-duplicate.png! > From above figure: > * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 > is the winner. > * The R#5 won't be the winner, however here we force resolved this node as > it is more left than the last resolved R#3 node. This is because maven picks > up widest scope present among conflicting dependencies (scopes of the > conflicts may differ), we need to *retain the conflict paths* by using a > strategy like: > ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while > R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve > R#5. > ** If there is a R#6 which is more left than R#5, then need to force resolve > R#6. > * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is > already resolved at depth 2. This is because the same node with deeper depth > won't be picked up by maven as maven employs a "nearest transitive dependency > in the tree depth and the first in resolution" strategy. > h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)* > *!skip-version-conflicts.png!* > In above figure. > * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already > know D1 is the winner > * When comes to resolve D2 (D with version 2, after #4), we know D2 is > having a different version and it conflicts with D1, D2 will be skipped as it > won't be picked up by maven, all D2's children *won't be resolved* then. > After we enabled the resolver patch in maven, we are seeing 10% ~70% build > time reduced for different projects depend on how complex the dependencies > are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain > the same. > We've verified the resolver performance patch leveraging an automation > solution to certify 2000+ apps of our company by comparing the *mvn > dependency:tree* and *mvn dependency:list* result with/without the > performance patch. > Please help review the PR. > -- This message was sent by Atlassian Jira (v8.20.1#820001)