caiwei-ebay edited a comment on pull request #158: URL: https://github.com/apache/maven-resolver/pull/158#issuecomment-1068901110
@michael-o @jebeaudet @cstamas With the skip solution there will be fewer POMs or JARs, but not that many fewer, because the skip solution only skips in version-conflict cases. For example, if D1 is the winner, then when Maven comes to resolve D2, D2 will still be downloaded (to get the artifact version), but D2's children will be skipped because Maven already knows D2 is the conflict loser. In practice D1 and D2 may have most of their children in common, so only the children of D2 (and those children's transitive dependencies) that differ from the children of D1 are actually skipped.

The key benefit of this solution is to avoid unnecessary dependency resolution for:
- Version-conflicting nodes
- Duplicate nodes with different exclusions. Originally, when Maven resolves D1, it caches D1's children; when Maven later comes to resolve D1 (yes, D1, the same GAV) with a different exclusion, it cannot reuse D1 from the cache and has to resolve it again.

For most projects there won't be many conflicts matching the two cases above, so this solution helps only a little.

Enterprise-level projects are different. Suppose a user puts a heavy dependency (one with many transitive dependencies) into a core/low-level library, then uses the core library with different exclusions to solve conflicts everywhere in a multi-module project, then builds the project as new libraries and shares them with other teams. Those teams start using the libraries and may add all kinds of exclusions of their own to solve conflicts, and then share the result with yet another team... It gets more and more complex: because Maven dependencies are transitive by default, people have to use exclusions to avoid conflicts in some cases. No dependency is born heavy; it becomes heavy as more and more code and dependencies are added to the project...

And exclusions in Maven can be inherited! Given A -> B -> C and D -> E -> F -> C, if any of A, B or D, E, F has a different exclusion, C will be resolved twice.
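
To make the A -> B -> C / D -> E -> F -> C case concrete, here is a minimal toy model in Python. It is only a sketch of the caching behaviour described above, not maven-resolver's actual code: the graph, the `collect` function and the `(artifact, inherited exclusions)` cache key are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical toy graph: artifact -> direct children (names are illustrative only).
GRAPH: Dict[str, List[str]] = {
    "A": ["B"], "B": ["C"],
    "D": ["E"], "E": ["F"], "F": ["C"],
    "C": ["X", "Y"], "X": [], "Y": [],
}


def collect(roots: Iterable[str],
            exclusions_by_node: Dict[str, Iterable[str]]) -> List[str]:
    """Walk the graph and return every artifact whose children were computed.

    Assumption: the cache key is (artifact, inherited exclusions), so the same
    artifact reached under a different exclusion set is a cache miss.
    """
    cache: Set[Tuple[str, FrozenSet[str]]] = set()
    resolved: List[str] = []

    def visit(artifact: str, inherited: FrozenSet[str]) -> None:
        key = (artifact, inherited)
        if key in cache:
            return  # cache hit: this subtree is not walked again
        cache.add(key)
        resolved.append(artifact)
        # Exclusions attached to this node are inherited by everything below it.
        passed_down = inherited | frozenset(exclusions_by_node.get(artifact, ()))
        for child in GRAPH[artifact]:
            if child not in passed_down:
                visit(child, passed_down)

    for root in roots:
        visit(root, frozenset())
    return resolved


no_exclusions = collect(["A", "D"], {})
with_exclusion = collect(["A", "D"], {"E": ["Z"]})  # Z is not even in the graph
print(no_exclusions.count("C"), with_exclusion.count("C"))
```

In this toy model a single exclusion on E, even one that matches nothing, changes the key under which C is looked up on the second path, so C and its children X and Y are walked a second time.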
If C is a dependency that in turn introduces lots of transitive dependencies, all transitive dependencies of C will be resolved twice as well. Here the combination of transitive dependency propagation, different exclusions and exclusion inheritance leads to the build slowness. This is probably why Gradle would like to make dependencies non-transitive by default and Spring would like to declare Maven dependencies in the POM as **optional**.

I don't think transitive dependency propagation is a fault, but it does increase the learning curve for end users. Misusing a heavy dependency in a low-level library makes every project that uses this library with different exclusions suffer slow performance, and fixing it needs cross-team effort if the library belongs to another domain team.

We could guide users to fix their POMs one by one with methods such as:
- Use optional to stop transitive dependency propagation
- Manage exclusions in the parent POM so that the exclusions stay the same
- Remove all unused dependencies from the POM to keep the dependency tree clean
- Do not use a heavy dependency in a low-level library
- Manage dependencies in the parent POM or a centrally managed BOM

Yes, we tuned around 10 projects with the methods above, and it did speed up their builds a lot. However, it is very time-consuming and error-prone, and we are sure the slowness would happen again because many people are probably Maven newbies. This is why we want to make the enhancement in maven-resolver, where it could benefit all Maven users.

Here is a brief history of this solution:
1. We started a project aiming to improve Maven build performance for all projects in our company.
2. We developed a Maven extension to collect data from users' builds and visualize build insights (download time, plugin execution time, test time…) in ElasticSearch & Kibana.
3. One project came to us for help: it took 20 GB of memory and 30+ minutes to build.
   We checked the Kibana dashboard of that project and found that the total time (the same value printed out by Maven) was not simply equal to download time + sum of plugin execution times + test time. The question that came to my mind was: where is the additional time spent?
4. We debugged maven-resolver and found the problem: a huge number of duplicate nodes were resolved again and again, and the node objects were cached in memory with exclusions as part of the cache key, which eventually used up the memory.
5. We made our extension visualize this part of the time as “resolution time” and found that quite a few projects were experiencing the same dependency resolution slowness.
6. We tuned the project by adjusting its dependencies with the methods mentioned above. We also tuned a few other projects with the same methods. It could cost about one person-week to tune 1~2 apps.
7. We no longer wanted to tune projects one by one, so we implemented this solution and shipped it with the Maven extension mentioned above. It sped up users' Maven builds and also reduced memory cost.

It could bring up to a 70% performance gain, depending on the complexity of the project. I have to admit it is hard to measure the performance gain for non-complex projects, maybe 0% - 10%?

I was thinking we could make it an apples-to-apples comparison (same version of Maven) first; here are the commands I would recommend. First we need to build the custom mvn.
1. run `mvn clean install -DskipTests -Dmaven.repo.local=m1` (nuked repo case)
2. run `mvn clean install -DskipTests -Dmaven.repo.local=m1` (fully cached repo now, as this is the 2nd run)
3. run `mvn clean install -DskipTests -Dmaven.repo.local=m2 -Daether.dependencyCollector.useSkip=false` (nuked repo case and no skip solution)
4. run `mvn clean install -DskipTests -Dmaven.repo.local=m2 -Daether.dependencyCollector.useSkip=false` (fully cached repo and no skip solution)

Then we can check the number of files and the file sizes in the local Maven repos, and also the build time.
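
For the file-count and file-size check, a small script along these lines could be used. It is only a sketch: it assumes the two local repos from the commands above ended up as directories named `m1` and `m2` under the directory where it is run, and the `repo_stats` helper name is made up for this example.

```python
import os
from typing import Tuple


def repo_stats(repo_dir: str) -> Tuple[int, int]:
    """Return (number of files, total size in bytes) of everything under repo_dir."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(repo_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                count += 1
                total += os.path.getsize(path)
    return count, total


if __name__ == "__main__":
    # Assumed locations: m1 = repo filled with skip enabled, m2 = repo filled with useSkip=false.
    for label, path in (("skip (m1)", "m1"), ("no skip (m2)", "m2")):
        if os.path.isdir(path):
            files, size = repo_stats(path)
            print(f"{label}: {files} files, {size} bytes")
        else:
            print(f"{label}: directory {path!r} not found")
```

Build time is not covered by this script; it would still be read from the Maven output of each run.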