URIs, %nn decoding and error handling

Mark Thomas Fri, 01 Mar 2013 12:36:12 -0800

Due to bug 54602 [1] I have been writing some test cases to examine how we handle invalid byte sequences in URIs.


My expectation was:
- valid byte sequence for expected encoding -> 200 (assuming no other
  problems)
- invalid byte sequence for expected encoding -> 400
- partial byte sequence for expected encoding -> 400
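
To make the expectation above concrete, here is a minimal sketch using only the JDK's own decoder. It is not Tomcat code: the class name UriStatusSketch and the helper statusFor are hypothetical, and the input is assumed to be the URI bytes after %nn unescaping. Behaviour may differ on JVMs whose UTF-8 decoder has bugs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class UriStatusSketch {

    // Hypothetical helper: maps the outcome of a strict UTF-8 decode
    // to the status code the request would be expected to receive.
    static int statusFor(byte[] uriBytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // This convenience method treats the buffer as the complete
            // input, so a trailing partial sequence is also an error.
            decoder.decode(ByteBuffer.wrap(uriBytes));
            return 200;
        } catch (CharacterCodingException e) {
            return 400;
        }
    }

    public static void main(String[] args) {
        // Valid: '/' followed by the two-byte sequence C3 A9.
        System.out.println(statusFor(new byte[] {'/', (byte) 0xC3, (byte) 0xA9}));
        // Invalid: FF never appears in UTF-8.
        System.out.println(statusFor(new byte[] {'/', (byte) 0xFF}));
        // Partial: C3 starts a two-byte sequence that is never completed.
        System.out.println(statusFor(new byte[] {'/', (byte) 0xC3}));
    }
}
```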

However, that isn't what happens, and I currently believe that it is what should happen. The purpose of this e-mail is, therefore, to get agreement on what should happen.

There are multiple moving parts here so forgive me if this e-mail gets a little long. There are multiple decisions and I expect some to be less contentious than others.

These issues were observed with UTF-8. Other encodings may have similar issues. My aim is to get a consistent approach regardless of encoding.


Issue 1: URI ends with partial byte sequence

Currently the partial byte sequences are ignored. I think the B2CConverter should throw an Exception if the full input (i.e. when endOfInput == true) ends with a partial byte sequence.
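
The role of endOfInput can be illustrated with the JDK's java.nio.charset.CharsetDecoder, which takes the same kind of flag. This is only a sketch of the idea, not the B2CConverter implementation; the class EndOfInputSketch and its decode helper are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EndOfInputSketch {

    // Hypothetical helper: decodes the bytes once and returns the raw
    // CoderResult so the effect of the endOfInput flag can be inspected.
    static CoderResult decode(byte[] bytes, boolean endOfInput) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // UTF-8 never yields more chars than input bytes, so this is enough.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        return decoder.decode(ByteBuffer.wrap(bytes), out, endOfInput);
    }

    public static void main(String[] args) {
        // '/' followed by the first two bytes of a three-byte sequence.
        byte[] partial = {'/', (byte) 0xE2, (byte) 0x82};

        // More input may follow: the decoder just asks for more bytes.
        System.out.println(decode(partial, false).isUnderflow());

        // No more input will follow: the leftover bytes are an error.
        System.out.println(decode(partial, true).isMalformed());
    }
}
```

A converter that wants the behaviour proposed here would check for the malformed result (or call CoderResult.throwException()) on the final call rather than discarding the leftover bytes.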


Issue 2: URI ends with invalid byte sequence

This appears to be a bug in the UTF-8 decoder provided by the JVM. [1] has provided one set of input bytes that triggers this. Currently the invalid data is ignored. I think that B2CConverter should throw an Exception as soon as it can determine that input is invalid. This would require:

- switching to the Harmony based UTF-8 decoder used by WebSocket

- further testing of the JRE and Harmony UTF-8 decoders to check for other potential issues


Issue 3: Fall back to 'ASCII'

If the conversion fails (i.e. throws an exception for any reason) [2], the CoyoteAdapter attempts to decode the provided URI using 'ASCII' rather than the configured connector encoding. I say 'ASCII' because the comments say ASCII but it is actually ISO-8859-1. I don't believe it is appropriate to fall back to anything here. The fallback code has been present since conversion support was added but I can't think of any scenario where this stands any chance of working reliably. I would like to remove this fallback code.
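
One way to see why such a fallback cannot work reliably: ISO-8859-1 assigns a character to every possible byte value, so decoding with it never fails and simply reinterprets the bytes. The sketch below uses only JDK classes; FallbackSketch and decodeLatin1 are hypothetical names and this is not the CoyoteAdapter code.

```java
import java.nio.charset.StandardCharsets;

class FallbackSketch {

    // Hypothetical stand-in for the fallback step: every byte maps
    // directly to the code point with the same numeric value.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A truncated UTF-8 sequence: no error is raised, and the stray
        // byte C3 becomes the character U+00C3.
        String fromPartial = decodeLatin1(new byte[] {'/', (byte) 0xC3});
        System.out.println(fromPartial.length());

        // A valid UTF-8 sequence for U+00E9 comes out as two characters
        // (U+00C3, U+00A9) rather than the one the client intended.
        String fromValid = decodeLatin1(new byte[] {(byte) 0xC3, (byte) 0xA9});
        System.out.println(fromValid.equals("\u00C3\u00A9"));
    }
}
```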


I would like to make these changes in trunk and 7.0.x.

I expect to have a similar discussion about request bodies once URIs are resolved, where I have essentially the same view: a decoding error should lead to a request failure.


Thoughts?

Mark


[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=54602

[2] http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/connector/CoyoteAdapter.java?view=annotate (line ~1054)

