caiwei-ebay commented on pull request #158:
URL: https://github.com/apache/maven-resolver/pull/158#issuecomment-1068901110


   @michael-o @jebeaudet @cstamas 
   
   With the skip solution, fewer POMs or JARs are downloaded, but not that many 
fewer, because the solution only skips in version conflict cases. For example, 
if D1 is the winner, then when Maven comes to resolve D2, D2 itself is still 
downloaded (to get the artifact version), but D2's children are skipped because 
D2 is already known to be the conflict loser. In practice D1 and D2 may have 
most of their children in common, so the only downloads actually avoided are 
the children of D2 (and their transitive dependencies) that differ from the 
children of D1.
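
   To make the D1/D2 example concrete, here is a minimal toy sketch of the idea. 
It is not the resolver's actual code: the class `SkipLoserDemo`, the methods 
`collect` and `sampleGraph`, and the "first version seen wins" rule are all 
assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SkipLoserDemo {
    // Hypothetical graph: "groupId:artifactId:version" -> direct children.
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
            "root:app:1", List.of("g:d:1", "g:b:1"),
            "g:b:1", List.of("g:d:2"),
            "g:d:1", List.of("g:x:1"),
            "g:d:2", List.of("g:x:1", "g:y:1"));
    }

    /** Breadth-first walk; returns the nodes whose children were expanded. */
    static List<String> collect(Map<String, List<String>> graph, String root, boolean useSkip) {
        Map<String, String> winners = new HashMap<>(); // groupId:artifactId -> first version seen
        List<String> expanded = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            int idx = node.lastIndexOf(':');
            String ga = node.substring(0, idx);
            String version = node.substring(idx + 1);
            String winner = winners.putIfAbsent(ga, version);
            if (useSkip && winner != null && !winner.equals(version)) {
                continue; // conflict loser: the node is seen, but its children are not expanded
            }
            expanded.add(node);
            queue.addAll(graph.getOrDefault(node, List.of()));
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(collect(sampleGraph(), "root:app:1", true));
        System.out.println(collect(sampleGraph(), "root:app:1", false));
    }
}
```

   In this toy graph, `g:d:2` loses to `g:d:1`, so with skipping enabled its 
children are never expanded. The shared child `g:x:1` was needed anyway because 
of `g:d:1`, so the real saving is the differing child `g:y:1`.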
   
   The key benefit of this solution is that it avoids unnecessary dependency 
resolution for:
   - Version-conflicting nodes.
   - Duplicate nodes with different exclusions. Originally, when Maven resolves 
D1, it caches D1's children. When Maven later resolves D1 again (yes, D1, the 
same GAV) with different exclusions, it cannot reuse D1 from the cache and has 
to resolve it again. 
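
   The second case can be illustrated with a small toy cache. This is only a 
sketch under stated assumptions, not the resolver's real data structures: the 
class `ExclusionCacheDemo`, the `cacheKey` format, and the `children` method are 
hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ExclusionCacheDemo {
    private final Map<String, List<String>> cache = new HashMap<>();
    int resolutions = 0; // how many times children were actually resolved

    // Hypothetical key: the exclusions are part of the key, so the same GAV
    // with different exclusions maps to a different cache entry.
    static String cacheKey(String gav, Set<String> exclusions) {
        return gav + " excl=" + new TreeSet<>(exclusions);
    }

    List<String> children(String gav, Set<String> exclusions) {
        return cache.computeIfAbsent(cacheKey(gav, exclusions), key -> {
            resolutions++; // stand-in for the expensive resolution work
            return List.of(gav + " -> child");
        });
    }

    public static void main(String[] args) {
        ExclusionCacheDemo demo = new ExclusionCacheDemo();
        demo.children("g:d:1", Set.of());       // first resolution
        demo.children("g:d:1", Set.of());       // cache hit
        demo.children("g:d:1", Set.of("g:x"));  // same GAV, other exclusions: miss
        System.out.println(demo.resolutions);   // prints 2
    }
}
```

   Every distinct exclusion set adds another entry for the same GAV, which is 
why both the resolution work and the cache size grow with the number of 
exclusion variants.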
   
   Most projects do not have many conflicts matching the two cases above, so 
this solution will help them only a little.
   
   Enterprise-level projects are different. Suppose a user puts a heavy 
dependency (one with many transitive dependencies) in a core/low-level library, 
and then uses that core library with different exclusions to solve conflicts 
all over a multi-module project. The user then publishes the project as new 
libraries and shares them with other teams, who start using them and may add 
all kinds of exclusions of their own to solve conflicts. This gets more and 
more complex: because Maven dependencies are transitive by default, users have 
to use exclusions to avoid conflicts in some cases. 
   
   Here the combination of transitive dependency propagation, heavy 
dependencies, and different exclusions leads to the build slowness. This is 
probably why Gradle would like to make dependencies non-transitive by default 
and Spring would like to declare Maven dependencies in the pom as 
**optional**. I don't think transitive dependency propagation is a fault, but 
it does steepen the learning curve for end users. Misusing a heavy dependency 
in a low-level library makes every project that uses this library with 
different exclusions suffer slow performance, and fixing it takes cross-team 
effort.
   
   We could guide users to fix their poms one by one with methods such as: 
   
   - Use optional to stop transitive dependency propagation.
   - Manage exclusions in the parent pom so that they stay consistent.
   - Remove all unused dependencies from the pom to keep the dependency tree clean.
   - Do not use a heavy dependency in a low-level library.
   - Manage dependencies in the parent pom or in a centrally managed BOM.
   
   Yes, we tuned around 10 projects with the above methods, and it did speed up 
their builds a lot. However, it is very time-consuming and error-prone, and we 
are sure the slowness will come back because many users are probably Maven 
newbies. This is why we want to make the enhancement in maven-resolver, where 
it can benefit all Maven users.
   
   Here is a brief history of this solution:
   1. We started a project that aims to improve Maven build performance for 
all projects in our company.
   2. We developed a Maven extension to collect data from users' builds and 
visualize build insights (download time, plugin execution time, test time…) in 
ElasticSearch & Kibana.
   3. One project came to us for help: it took 20G of memory and 30+ minutes 
to build.
   4. We checked the Kibana dashboard for that project and found that the 
total time (the same value printed by Maven) was not simply equal to download 
time + the sum of plugin execution times + test time. The question that came to 
my mind was: where is the additional time spent?
   5. We debugged maven-resolver and found the problem: a huge number of 
duplicate nodes were resolved again and again, and the node objects were cached 
in memory with exclusions as part of the cache key, which eventually used up 
the memory.
   6. We made our extension visualize this part of the time as “resolution 
time” and found that quite a few projects were experiencing the same dependency 
resolution slowness.
   7. We implemented this solution and shipped it with the above Maven 
extension. It sped up their Maven builds and also reduced memory usage. It can 
bring up to a 70% performance gain, depending on the complexity of the project.
   
   I have to admit it is hard to measure the performance gain for non-complex 
projects, maybe 0% - 10%? I was thinking we could first make an 
apples-to-apples comparison (same version of Maven). Here are the commands I 
would recommend.
   
   First we need to build the custom mvn.
   
   1. run `mvn clean install -DskipTests -Dmaven.repo.local=m1` (nuked repo 
case)
   2. run `mvn clean install -DskipTests -Dmaven.repo.local=m1` (fully cached 
repo for the 2nd run)
   3. run `mvn clean install -DskipTests -Dmaven.repo.local=m2 -Daether.dependencyCollector.useSkip=false` 
(nuked repo case, without the skip solution)
   4. run `mvn clean install -DskipTests -Dmaven.repo.local=m2 -Daether.dependencyCollector.useSkip=false` 
(fully cached repo, without the skip solution)
   
   Then we can compare the number and total size of the files in each local 
Maven repo (`m1` and `m2`), and also the build times.
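
   For the file count and size comparison, a small helper like the following 
could be used. It is only a sketch: the class name `RepoStats` and the `stats` 
method are made up for this example, and it assumes the two local repos are the 
`m1` and `m2` directories from the commands above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RepoStats {
    /** Returns {fileCount, totalBytes} for all regular files under dir. */
    static long[] stats(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            List<Path> files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
            long bytes = 0;
            for (Path file : files) {
                bytes += Files.size(file);
            }
            return new long[] {files.size(), bytes};
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the two repo directories used in the commands above.
        String[] repos = args.length > 0 ? args : new String[] {"m1", "m2"};
        for (String repo : repos) {
            long[] s = stats(Paths.get(repo));
            System.out.printf("%s: %d files, %d bytes%n", repo, s[0], s[1]);
        }
    }
}
```

   Run it from the directory that contains `m1` and `m2` after the four builds 
have finished, or pass other repo paths as arguments.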
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@maven.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

