https://bz.apache.org/bugzilla/show_bug.cgi?id=63235
--- Comment #5 from Phillip Webb <pw...@pivotal.io> ---

@Mark Thanks for pointing me at those previous issues, I should have thought to search first! It seems I may have misunderstood the reason the cache exists in the first place. My original assumption was that it was necessary for case-insensitive name lookups, but it appears that, on OpenJDK at least, `Charset.forName` works with any case combination. The following runs without error:

    Set<String> names = new LinkedHashSet<>();
    Charset.availableCharsets().forEach((key, value) -> {
        names.add(value.name());
        names.addAll(value.aliases());
    });
    names.forEach(name -> {
        System.out.println(Charset.forName(name.toLowerCase()));
    });

So, as I understand it, the main reasons for the cache are:

1) Improve lookup performance for initial requests by having a hot cache
2) Prevent DoS attacks by limiting the impact of a user requesting an invalid charset

Looking at each of these in turn:

1) Improve lookup performance for initial requests by having a hot cache

> populate the cache values at start-up for 'popular' charsets (ISO-8859-1,
> UTF-8 plus any others we think are likely to be used)

I like this suggestion and I think it would be a good match for most users. Popular charsets would be fast, and initial requests to others would take a slight hit. The `Charset` class itself includes a cache, so this could be as simple as calling `Charset.forName` a few times.

2) Prevent DoS attacks by limiting the impact of a user requesting an invalid charset

Looking at the existing JDK `Charset` and `AbstractCharsetProvider`[1] classes, it appears that asking for charsets that don't exist has no lasting impact on memory.
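To make the warm-up idea from point 1 concrete, a minimal sketch might look like the following. The particular list of "popular" charsets and the `warmUpCharsets` name are just assumptions for illustration, not actual Tomcat code:

```java
import java.nio.charset.Charset;

public class CharsetWarmup {

    // Hypothetical start-up hook: calling Charset.forName once per name
    // primes the JDK's own internal charset cache, so later lookups are fast.
    static void warmUpCharsets() {
        String[] popular = {"ISO-8859-1", "UTF-8", "US-ASCII", "UTF-16"};
        for (String name : popular) {
            // The result is discarded; the useful side effect is the JDK
            // caching the lookup for subsequent Charset.forName calls.
            Charset.forName(name);
        }
    }

    public static void main(String[] args) {
        warmUpCharsets();
        // Lookups remain case-insensitive regardless of the warm-up.
        System.out.println(Charset.forName("utf-8")); // prints UTF-8
    }
}
```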
When I run the following code, I don't see a large change:

    for (int i = 0; i < 10000000; i++) {
        try {
            Charset.forName("missing-" + i);
        } catch (Exception ex) {
        }
        if (i % 10000 == 0) {
            System.out.println(Runtime.getRuntime().totalMemory());
            System.out.println(Runtime.getRuntime().maxMemory());
            System.out.println(Runtime.getRuntime().freeMemory());
            System.out.println("--");
        }
    }

Of course, this might be JVM specific, so it may well be worth protecting against it anyway. One simple option might be to count lookup misses and, after a certain threshold, switch back to a set limited by `availableCharsets`. Something like:

    public static Charset getCharset(String enc) throws UnsupportedEncodingException {
        Map<String, Charset> cache = encodingToCharsetCache;
        if (cache != null) {
            Charset charset = cache.get(enc.toLowerCase());
            if (charset == null) {
                throw new UnsupportedEncodingException(
                        sm.getString("b2cConverter.unknownEncoding", enc));
            }
            return charset;
        }
        try {
            return Charset.forName(enc);
        } catch (Exception ex) {
            int count = charsetMissCount.incrementAndGet();
            if (count > THRESHOLD) {
                encodingToCharsetCache = populateEncodingCache();
            }
            throw ex;
        }
    }

I'm happy to rework the PR if that's a sensible suggestion. Even if it were opt-in, I think it would be very nice not to have to pay the startup cost that the existing code incurs, especially if you use embedded Tomcat and restart frequently during application development.

[1] http://cr.openjdk.java.net/~sherman/6898310/webrev/src/share/classes/sun/nio/cs/AbstractCharsetProvider.java.html
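For reference, here is a self-contained sketch of that miss-counter idea. The threshold value, the `LazyCharsetLookup` class name, and the `populateEncodingCache` body are all placeholder assumptions to make the snippet runnable, not actual Tomcat code (the real version would use Tomcat's `sm` string manager for the error message):

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyCharsetLookup {

    private static final int THRESHOLD = 1000; // arbitrary demo value

    private static final AtomicInteger charsetMissCount = new AtomicInteger();

    // Stays null until the miss threshold is crossed; after that it acts as a
    // validity filter so repeated bogus names never reach Charset.forName.
    private static volatile Map<String, Charset> encodingToCharsetCache;

    public static Charset getCharset(String enc) throws UnsupportedEncodingException {
        Map<String, Charset> cache = encodingToCharsetCache;
        if (cache != null) {
            Charset charset = cache.get(enc.toLowerCase(Locale.ENGLISH));
            if (charset == null) {
                throw new UnsupportedEncodingException("Unknown encoding: " + enc);
            }
            return charset;
        }
        try {
            // Fast path: delegate directly to the JDK's own cached lookup.
            return Charset.forName(enc);
        } catch (RuntimeException ex) {
            // Miss: count it, and fall back to the bounded cache once the
            // threshold suggests someone is probing with invalid names.
            if (charsetMissCount.incrementAndGet() > THRESHOLD) {
                encodingToCharsetCache = populateEncodingCache();
            }
            throw new UnsupportedEncodingException("Unknown encoding: " + enc);
        }
    }

    // Builds the lower-cased name/alias -> Charset map from availableCharsets.
    private static Map<String, Charset> populateEncodingCache() {
        Map<String, Charset> cache = new HashMap<>();
        Charset.availableCharsets().forEach((key, charset) -> {
            cache.put(charset.name().toLowerCase(Locale.ENGLISH), charset);
            for (String alias : charset.aliases()) {
                cache.put(alias.toLowerCase(Locale.ENGLISH), charset);
            }
        });
        return cache;
    }
}
```

Until the threshold is crossed, valid lookups stay on the JDK fast path; afterwards, every request is answered from the bounded map, which caps the memory and CPU cost of invalid names.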