https://issues.apache.org/bugzilla/show_bug.cgi?id=51400
--- Comment #11 from Konstantin Preißer <prei...@web.de> 2011-06-23 15:25:01 UTC ---
Hi Christopher,

(In reply to comment #10)
> If you read some of the online posts linked from this BZ issue, you'll see
> claims that pre-populating such a cache does not have a noticeable impact on
> performance. Honestly, I'm okay not pre-populating things because there are
> probably a dozen encodings that get any significant amount of real use on the
> web, while Charset.availableCharsets returns 163 different obscure character
> sets.
>
> I suppose it's a fairly small set of encodings, but with little benefit,
> there's no reason IMO to pre-populate.

You're right; however, if I read the reports correctly, this only holds when the lookups use valid charset names. Every time there is a lookup for a non-existent Charset, the JVM-synchronized Charset.lookup() is called. Presumably to speed this up, Konstantin Kolinko suggested caching charset misses. If a list of all available charsets, including their aliases, were pre-populated, missing charsets could also be detected quickly.

> Actually, I might leave the case in-tact for performance considerations. Yes,
> it's true that utf-8, UTF-8, uTf-8, UTf-8, UtF-8, etc. would all be the same,
> I suspect that only "utf-8" and "UTF-8" will be used in the wild with any
> reasonable frequency. Normalizing case for every lookup is probably a waste of
> time, unless there are significant concerns of DOS using long, non-normalized
> permutations of valid encodings (longest is x-MacCentralEurope with 17
> characters to play with). 17 characters is a lot of permutations (~2MiB),
> though.

Well, on my Windows machine the longest alias (not canonical name) of a charset is "Extended_UNIX_Code_Packed_Format_for_Japanese", which consists of 39 mutable characters.
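To make the permutation arithmetic concrete: every character whose case can be flipped doubles the number of distinct spellings, so a name with n such characters has 2^n spellings. The Java sketch below counts them; the class `CasePermutations` and its methods are hypothetical illustrations, not Tomcat code. Digits, hyphens and underscores have no case and are not counted.

```java
// Hypothetical helper (not part of Tomcat) for counting case permutations of a charset name.
final class CasePermutations {

    private CasePermutations() {
    }

    // Counts characters whose upper- and lower-case forms differ (i.e. letters).
    static int mutableChars(String name) {
        int n = 0;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.toUpperCase(c) != Character.toLowerCase(c)) {
                n++;
            }
        }
        return n;
    }

    // Each mutable character doubles the number of distinct spellings: 2^n.
    static long permutations(String name) {
        return 1L << mutableChars(name);
    }
}
```

For "x-MacCentralEurope" this gives 17 mutable characters and 2^17 = 131,072 spellings; for "Extended_UNIX_Code_Packed_Format_for_Japanese" it gives 39 and 2^39 = 549,755,813,888.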
The current (trunk) implementation in o.a.tomcat.util.buf.B2CConverter.getCharset() does not normalize the name, so a Client could send requests with 2^39 permutations in a Content-Type header (which would make 49 TiB of Charset strings) ;-)
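A minimal sketch of the pre-populated, case-normalized approach discussed above, assuming a map built once from Charset.availableCharsets() with every canonical name and every entry of Charset.aliases() as keys, lower-cased with Locale.ENGLISH. The class `CharsetCache` and its `getCharset()` method are hypothetical; this is not the B2CConverter implementation. Because keys are normalized, all 2^n case spellings of a name hit the same entry, and unknown names are answered from the map without calling Charset.forName() (and so without reaching the synchronized lookup mentioned above).

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not Tomcat code: a lookup table built once from every
// installed charset's canonical name and aliases, keyed in lower case.
final class CharsetCache {

    // Read-only after class initialization, so concurrent get() calls need no locking.
    private static final Map<String, Charset> BY_NAME = build();

    private CharsetCache() {
    }

    private static String normalize(String name) {
        return name.toLowerCase(Locale.ENGLISH);
    }

    private static Map<String, Charset> build() {
        Map<String, Charset> map = new HashMap<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            // Canonical names always win over a clashing alias.
            map.put(normalize(cs.name()), cs);
            for (String alias : cs.aliases()) {
                map.putIfAbsent(normalize(alias), cs);
            }
        }
        return map;
    }

    // Returns the charset for any case variant of a known name or alias,
    // or null for an unknown name; never calls Charset.forName().
    static Charset getCharset(String name) {
        if (name == null) {
            return null;
        }
        return BY_NAME.get(normalize(name));
    }
}
```

One trade-off of this design: charsets made available after the class is initialized would not be found, whereas Charset.forName() would see them.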