On 29/06/2019 21:01, Mark Thomas wrote: > On 29/06/2019 02:26, GitBox wrote: >> alpire opened a new pull request #176: CoyoteAdapter: fix out-of-bounds read >> in checkNormalize >> URL: https://github.com/apache/tomcat/pull/176 >> >> >> On malformed requests, checkNormalize would throw an >> ArrayIndexOutOfBoundsException leading to a 500 response. This change fixes >> checkNormalize to return false instead of throwing exception on those >> inputs, and adds a few tests to check the new functionality. > > That there are URI sequences that can trigger this is certainly > something that we need to fix. > > On a slightly different topic, this got me thinking.> > checkNormalize() is a test that runs on every request. It is essentially > there to ensure that whatever character set has been specified for the > Connector is "safe". In this case "safe" means when the bytes are > converted to characters we don't get any unexpected "." or "/" > characters in the result that make the character version of the URI > non-normalized. > > Rather than run this test on every request, we could: > - pre-calculate the character sets we consider to be safe > - throw an exception if the user attempts to configure the Connector to > use one > > I think we could safely exclude any character set where any of the > following were not true: > > - 0x00 <-> '\u0000' > - 0x2e <-> '.' > - 0x2f <-> '/' > - 0x5c <-> '\' > > We could create a unit test that maintains/checks the list of "safe" > character set canonical names. After creating the character set in > Connector.setURIEncoding(), if the canonical name of the resulting > character set is not in the safe list, throw an exception. > > Then remove checkNormalize() and replace with a comment that explains > why the conversion is known to be safe. > > There are several loops through the URI in checkNormalize(). I think - > I'd need to test to confirm - that removing them would provide a > measurable performance benefit. > > Thoughts?
I thought about this some more over the weekend and I think it is a good idea in theory but a bad idea in practice. checkNormalize() serves a purpose that wasn't addressed in my comments above. Broken decoders. There is a history of issues with Java's UTF-8 decoder that could be exploited in the way checkNormalize() is designed to protect against. While I am reasonably confident Java's UTF-8 decoder is now safe, I am not confident that every Java decoder is bug free. Therefore, I think the checkNormalize() call needs to be retained for every request. Mark --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org