[GitHub] tomcat pull request: [Bug 58242] Scanning jars in classpath to get...
GitHub user wangyf2010 opened a pull request: https://github.com/apache/tomcat/pull/22 [Bug 58242] Scanning jars in classpath to get annotations in parallel instead of synchronously During tomcat startup, it takes most of time on annotation scanning. For our case, our apps have over 500 jars in class path. Tomcat startup takes 28 seconds and annotation scanning takes 23 seconds. So idea is: scan jars in parallel, per our benchmark testing, it could save 9 seconds. And jetty9 already support this feature. Per Mark Thomas comments: New features are first added to trunk and then back-ported. So file another PR for trunk. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyf2010/tomcat trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tomcat/pull/22.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22 commit 9e560a87e1325f46dc316b13c5c428ff7b73be05 Author: Wang, Simon Date: 2015-08-13T06:32:42Z [Bug 58242] Scanning jars in classpath to get annotations in parallel instead of synchronously --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
[GitHub] tomcat pull request: [Bug 58242] Scanning jars in classpath to get...
Github user wangyf2010 commented on the pull request: https://github.com/apache/tomcat/pull/22#issuecomment-131473216 Thanks for your comments, market! Understood your concern, let me explain it one by one: >The patch appears to be quite large for a marginal benfit. Separating the refactoring and the changes into separate commits would enable the true scale of the changes to be judged. >I remain on the fence about this enhancement, pending a patch that clearly shows the scale of the changes. Benefit of this feature: For enterprise applications, there are tons of jars in class path. For our case, one application have over 500 jars in class path, it took about 23s on annotation scanning. Scan them in parallel could reduce it to be 14s. That's big gains. And also scan it in parallel isn't a new idea, jetty already supported, anyway, it's a reasonable feature for tomcat also. As for refactoring code on ContextConfig: ContextConfig class has too many code inside it, and annotation scanning is an independent functionality indeed. And parallel scanning's idea is: split each jar's scanning as separate task to run. That's why I'd prefer to have separate class and let it focus on one jar's annotation scanning. BTW, it's only move that jar scanning logic into new class, no change on all methods' logics. So from this perspective, change is few. >I am -1 on the patch as current written since: >it is not thread safe (look at the Java class cache) >it does not follow the coding style of the existing code: >no tabs (use 4 * spaces for indent) >no new line before { >no space after ( >no space before ) >it uses system properties for configuration when they are not necessary. Configuration for this feature >should be at the Context level >no documentation provided >no license header in new files >max coding line length is 100 >max comment line length is 80 >it needs to be dsiabled by default 1. code style Could you let me know where is the check style xml for tomcat coding? I tried to find it, but haven't get it. 2. document & license are problem, you're right. I will add them. 3. system property to disable this feature should be in context level. Agree, will cover that. >I also have some more general concerns about this feature: >Is parallel threads scanning the same class an issue and if so, how do we prevent that? It isn't possible to let parallel threads to scan same class, because, this feature is split each jar's scanning as separate task and dispatch separate thread to scanning. Won't have the case that multiple threads scan same jar or same class. >The deployment process is already (or can be) multi-threaded. How much benefit does this feature >provide when multiple apps are being deployed in parallel? This feature is for one app's performance improvement. Even multiple apps are deployed in parallel, this feature could benefit each app's startup. As for how much, it depends on each app's dependency jars. If there are tons of dependency jars, this feature could benefit a lot. >What will be the impact on performance for already deployed applications if this feature is enabled? First, you mean "deployed", means already started it or just deployed onto tomcat? If you're meaning latter one. This feature is to improve server startup performance, not care whether this app is deployed or not. >Is a 20s default tieout to scan a JAR appropriate? yes, that's for one jar, that should be good enough, per benchmark metrics, 20s could scan one jar with size 200MB. Thanks for your detail comments, again. From my view, scanning annotations in parallel is better way, and as web server, tomcat also need it. It could benefit those monster apps, especially for enterprise level apps. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
[GitHub] tomcat pull request: [Bug 58242] Scanning jars in classpath to get...
Github user wangyf2010 commented on the pull request: https://github.com/apache/tomcat/pull/22#issuecomment-131489980 >it is not thread safe (look at the Java class cache) Please refer AnnotationScanningResult, it will fork a new instance for Java class cache or maps, that means for each thread, they're operation their own instances. They're safe. After completed all threads operations, then will merge results together into shared instances. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
[GitHub] tomcat pull request: [Bug 58242] Scanning jars in classpath to get...
Github user wangyf2010 commented on the pull request: https://github.com/apache/tomcat/pull/22#issuecomment-131649270 >My comment regarding separating the refactoring and the functional changes stands. It is not the refactoring I am against, it is the mixing of refactoring and functional change in a single commit. make sense, I can split it into separate commits for refactoring & feature. >/res/checkstyle >Not every rule I mentioned is enforced mainly because not all of the current code follows it (due to the age of the code). We could go through and make all the changes but that creates a different problem (back-ports get more difficult). The general rule is a patch should follow the style of the code it is changing. Got it, thanks for detail explanation for that. >Regarding thread safety I can now see that my concern regarding the Java class cache was mis-placed. However, scanning an individual class triggers the scanning of all the classes in the hierarchy. If those classes are in different JARs then you will get the same class scanned by multiple threads. How much of an issue that is is TBD. You're right. I can see JavaClassCacheEntry isn't thread safe. Should cover it. >While this feature is likely to improve the startup of a Tomcat instance with a single, large application deployed on a multi-core machine I am far from convinced it will help when there are multiple applications deployed particularly when there are more applications deployed than cores available. That's the feature will benefit single, large application better. Might not huge benefit for multiple applications. But that's fair enough, we can't rely on one feature to benefit all cases. And in enterprise environment, one single web app is deployed in a single tomcat indeed. >Your lack of concern for the impact this feature may have on running applications is worring. As currently written it is likely that a deployment using this feature would have a noticeable performance impact on running applications if the system was moderately loaded at the time. It only need resources when scan annotations during server startup, impact is little. Even without this feature, deploy a new web app onto running tomcat, it will have performance impact also. What I'm thinking is: mark those forked maps to be null and let GC recycle them ASAP. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org