CorsFilter's Vary: Origin still a bit broken, can poison a cache to deny access
Hello. Due to external input and internal development I recently had to work with CORS and the Tomcat CorsFilter and encountered the issues https://bz.apache.org/bugzilla/show_bug.cgi?id=61101 - CorsFilter should add Vary header to response https://bz.apache.org/bugzilla/show_bug.cgi?id=62343 - CORS security: reflecting any origin header value when configured to * is dangerous It seems these issues have been fixed and we told our customer to update to the latest release (contains at least fix for 61101) to get the “Vary: Origin” header of which absence they complained about. But we were still getting complaints from the customer, and looking at the source code it seems “Vary: Origin” is only set if the request contains an origin header. I don’t believe this is sufficient. The spec states: 6.4 Implementation Considerations This section is non-normative. Resources that wish to enable themselves to be shared with multiple Origins but do not respond uniformly with "*" must in practice generate the Access-Control-Allow-Origin header dynamically in response to every request they wish to allow. As a consequence, authors of such resources should send a Vary: Origin HTTP header or provide other appropriate control directives to prevent caching of such responses, which may be inaccurate if re-used across-origins. So if the same resource is accessed without origin (CORSRequestType.NOT_CORS) or origin is local (also CORSRequestType.NOT_CORS, i.e. the origin is discarded right now and no Vary: Origin set) and later with non-local origin, a caching layer in between can still be confused. Even worse, there are user agents (Edge, older Firefox) that do html form posts w/o origin header at all and thus could be actively used to poison the cache (aside from doing worse to the servlet handling the call): https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/10482384/ A confused/poisoned cache will return the local origin response to all callers thus effectively denying any non-local client access since the access control headers will be missing. This has affected people before: - https://bugs.chromium.org/p/chromium/issues/detail?id=409090 - https://stackoverflow.com/questions/44800431/caching-effect-on-cors-no-access-control-allow-origin-header-is-present-on-th I believe that for all URLs the CorsFilter matches, any unique Origin header’s value is a variant, since the returned access control headers differ. But also its absence is a variant (allowed headers are missing altogether) and thus warrants a “Vary: Origin”. I.e. always set this header unless it’s an invalid CORS request. I had to write a little filter that is configured next to the CorsFilter and checks for the missing “Vary: Origin” header to add it when being called by CorsFilter in the filter chain. That aside, setting “Vary: Origin” is of course very bad for caching. But then the CorsFilter should probably only be mapped to APIs called remotely. As improvement, since with Bug 62343 allowed.origins = ‘*’ will always set Access-Control-Allow-Origin to '*', if any origin is allowed the filter could help caching by: - always setting the access control response headers, i.e. even if not a CORS request (should not have negative effects) - not setting “Vary: Origin” This would probably greatly improve caching for public APIs that do not rely on credentials. Kind of discussed here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=443530 A single allowed origin is similar, but alas Access-Control-Allow-Origin headers will differ for this origin and the local origin (set to allowed host / missing) and thus cannot be improved the same way. Sincerely, Gunnar Brand -- Gunnar Brand | Softwareentwickler interface projects GmbH | www.intergator.de Zwinglistraße 11/13 | 01277 Dresden | Deutschland Tel: +49-351-31809-41 | Fax: +49-351-31809-33 Folgen Sie intergator auf: Twitter : https://twitter.com/intergator xing: https://www.xing.com/companies/interfaceprojectsgmbh LinkedIn: http://www.linkedin.com/company/interface-projects-gmbh YouTube : http://www.youtube.com/user/TheEnterpriseSearch?feature=watch Facebook: https://www.facebook.com/intergator/ RSS : https://www.intergator.de/feed/ Geschäftsführer: Dr. Uwe Crenze, Eduard Daoud Amtsgericht Dresden HRB 8740 Haftungsausschluss / Disclaimer http://www.intergator.de/email-disclaimer.shtml - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
Performance bottleneck in WebappClassLoaderBase
Hello. Recently we received complaints from a customer about slow response times (45 seconds) every morning. Every morning starting a fresh session in the browser would require a long wait time for the first user, but once this hurdle had been taken everything seemed to work fine. We could reproduce this in house, and doing a thread dump during this time span showed the class loader trying to open a jar file. Immediately we suspected a virus scanner slowing access to jar files (and since signatures are updated daily, this happened daily). Blocking the virus scanner from the tomcat folder helped but this is just alleviating the symptoms, not the cure. Doing a full trace log of the class loader package showed the problem: For any unknown resource or class the class loader seems to populate all known jars, and either finds it, then loads it, and keeps a cache entry (with the class for classes), or if it can’t find it, asks the parent class loader (this is the spec order of things). The log was filled classes and resources that couldn’t be found and were delegated to the parent. Over and over again. Each and every one of theses requests scans all jar files. Clear the OS cache and the problem appears, too. For classes one would believe this cannot be, once a class is loaded, it’s loaded. But with dynamic script languages, they often use Class.forName() and for example Rhino even does that for packages (that might not even exist!). We fixed at least Rhino with an application wide global top level , so for classes this problem did go away (there might be other, unless they implement their own caching, there is a risk). Resources are still a problem, our log is filled with 24-Sep-2019 17:22:55.482 FINE [http-nio-8080-exec-6] org.apache.catalina.loader.WebappClassLoaderBase.findResources findResources(META-INF/services/javax.xml.parsers.DocumentBuilderFactory) 24-Sep-2019 17:22:55.484 FINE [http-nio-8080-exec-6] org.apache.catalina.loader.WebappClassLoaderBase.getResourceAsStream getResourceAsStream(META-INF/services/org.apache.xerces.xni.parser.XMLParserConfiguration) 24-Sep-2019 17:22:55.484 FINE [http-nio-8080-exec-6] org.apache.catalina.loader.WebappClassLoaderBase.getResourceAsStream Searching local repositories 24-Sep-2019 17:22:55.485 FINE [http-nio-8080-exec-6] org.apache.catalina.loader.WebappClassLoaderBase.getResourceAsStream Delegating to parent classloader unconditionally java.net.URLClassLoader@578486a3 24-Sep-2019 17:22:55.485 FINE [http-nio-8080-exec-6] org.apache.catalina.loader.WebappClassLoaderBase.getResourceAsStream --> Resource not found, returning null If there is only a single repeated “Searching local repositories” for a certain URL, the problem is there (doesn’t matter if the resource or class exists or not). It’s obviously a bad idea to scan the filesystem or jars every time. The WebappClassLoaderBase keeps information about every resource and class it finds, so repeated calls for these do not cost that much time, especially for classes that are stored in the cache. I believe it might work to keep a negative entry for classes it doesn’t find but the parent class loader knows and immediately ask the parent class loader. Maybe also for resources. But I don’t know what is supposed to happen if somebody overwrites a resource (or class) later. Even worse if the class/resource does not exist yet. Maybe populating the whole resource tree in memory is the only solution (with background update?!) . I mainly wanted to raise awareness of this problem, I am not well versed with class loading, so I can’t really help there. Sincerely, Gunnar Brand Gunnar Brand | Softwareentwickler interface projects GmbH | www.intergator.de Zwinglistraße 11/13 | 01277 Dresden | Deutschland Tel: +49-351-31809-41 | Fax: +49-351-31809-33 Folgen Sie intergator auf: Twitter : https://twitter.com/intergator xing: https://www.xing.com/companies/interfaceprojectsgmbh LinkedIn: http://www.linkedin.com/company/interface-projects-gmbh YouTube : http://www.youtube.com/user/TheEnterpriseSearch?feature=watch Facebook: https://www.facebook.com/intergator/ RSS : https://www.intergator.de/feed/ Geschäftsführer: Dr. Uwe Crenze, Eduard Daoud Amtsgericht Dresden HRB 8740 Haftungsausschluss / Disclaimer http://www.intergator.de/email-disclaimer.shtml