Due to bug 54602 [1] I have been writing some test cases to examine how we handle invalid byte sequences in URIs.

My expectation was:
- valid byte sequence for expected encoding -> 200 (assuming no other
  problems)
- invalid byte sequence for expected encoding -> 400
- partial byte sequence for expected encoding -> 400
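
As a rough illustration of that expectation, here is a minimal sketch using the JDK's CharsetDecoder configured to report errors instead of replacing them. The names ExpectedStatus and statusFor are hypothetical and are not Tomcat code, and the sketch assumes the URI bytes have already been %-decoded. Results could differ on a JRE whose decoder has the kind of bug described under Issue 2.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of Tomcat.
final class ExpectedStatus {

    // Returns 200 if the bytes decode cleanly as UTF-8, otherwise 400.
    static int statusFor(byte[] uriBytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // decode(ByteBuffer) treats the buffer as the complete input,
            // so a trailing partial sequence is reported as malformed.
            decoder.decode(ByteBuffer.wrap(uriBytes));
            return 200;
        } catch (CharacterCodingException e) {
            return 400;
        }
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // U+20AC
        byte[] invalid = {(byte) 0xFF};              // 0xFF never occurs in UTF-8
        byte[] partial = {(byte) 0xE2, (byte) 0x82}; // truncated U+20AC
        System.out.println("valid   -> " + statusFor(valid));
        System.out.println("invalid -> " + statusFor(invalid));
        System.out.println("partial -> " + statusFor(partial));
    }
}
```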

However, that isn't what happens, and I currently believe it is what should happen. The purpose of this e-mail is, therefore, to get agreement on what the behaviour should be.

There are multiple moving parts here so forgive me if this e-mail gets a little long. There are multiple decisions and I expect some to be less contentious than others.

These issues were observed with UTF-8. Other encodings may have similar issues. My aim is to get a consistent approach regardless of encoding.

Issue 1: URI ends with partial byte sequence
Currently the partial byte sequences are ignored. I think the B2CConverter should throw an Exception if the full input (i.e. when endOfInput == true) ends with a partial byte sequence.
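
To show why the endOfInput flag matters, here is a small sketch against the plain JDK API (not B2CConverter itself). The class name EndOfInputDemo is hypothetical. With endOfInput == false a trailing partial sequence is only an underflow, because more bytes might follow. With endOfInput == true the CharsetDecoder contract treats any remaining undecoded input as malformed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Tomcat code.
final class EndOfInputDemo {

    // Runs a single decode call and returns the decoder's verdict.
    static CoderResult decode(byte[] bytes, boolean endOfInput) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more than one char per input byte.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        return decoder.decode(in, out, endOfInput);
    }

    public static void main(String[] args) {
        byte[] partial = {'a', (byte) 0xE2, (byte) 0x82}; // 'a' + truncated U+20AC
        System.out.println("more input may follow: " + decode(partial, false));
        System.out.println("end of input:          " + decode(partial, true));
    }
}
```

A converter that never passes endOfInput == true, or that ignores the result of the final call, would silently drop the trailing bytes.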

Issue 2: URI ends with invalid byte sequence
This appears to be a bug in the UTF-8 decoder provided by the JVM. [1] has provided one set of input bytes that triggers this. Currently the invalid data is ignored. I think that B2CConverter should throw an Exception as soon as it can determine that input is invalid. This would require:
- switching to the Harmony based UTF-8 decoder used by WebSocket
- further testing of the JRE and Harmony UTF-8 decoders to check for other potential issues
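
For the further testing mentioned above, one possible probe is sketched below. It feeds ever longer prefixes of the input to a decoder and records the shortest prefix at which an error is reported while more input could still follow. EarlyRejectProbe and firstErrorPrefix are hypothetical names. The sketch only exercises whatever decoder the running JRE provides for the given Charset, and it does not reproduce the specific bytes from [1].

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper, not Tomcat code.
final class EarlyRejectProbe {

    // Returns the shortest prefix length at which the decoder reports an
    // error with endOfInput == false, or -1 if no prefix is rejected.
    static int firstErrorPrefix(Charset charset, byte[] bytes) {
        for (int len = 1; len <= bytes.length; len++) {
            // Fresh decoder per prefix so no state leaks between attempts.
            CharsetDecoder decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            int capacity = (int) Math.ceil(len * decoder.maxCharsPerByte()) + 1;
            CharBuffer out = CharBuffer.allocate(capacity);
            CoderResult result =
                    decoder.decode(ByteBuffer.wrap(bytes, 0, len), out, false);
            if (result.isError()) {
                return len;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] input = {'a', (byte) 0xFF, 'b'}; // 0xFF never occurs in UTF-8
        System.out.println(firstErrorPrefix(StandardCharsets.UTF_8, input));
    }
}
```

Running the same inputs through two decoder implementations and comparing the returned offsets would show where one of them detects invalid data later than the other, or not at all.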

Issue 3: Fall back to 'ASCII'
If the conversion fails (i.e. throws an exception for any reason) [2], the CoyoteAdapter attempts to decode the provided URI using 'ASCII' rather than the configured connector encoding. I say 'ASCII' because the comments say ASCII but it is actually ISO-8859-1. I don't believe it is appropriate to fall back to anything here. The fall back code has been present since conversion support was added but I can't think of any scenario where this stands any chance of working reliably. I would like to remove this fall back code.
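
A short sketch of why such a fall back is unreliable: every byte value maps to a character in ISO-8859-1, so decoding never fails, and bytes intended for another encoding simply come out as different characters. FallbackDemo and decodeLatin1 are hypothetical names and this is not the CoyoteAdapter code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Tomcat code.
final class FallbackDemo {

    // ISO-8859-1 maps each byte 0x00-0xFF directly to U+0000-U+00FF,
    // so this can never report a decoding error.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
        byte[] utf8 = "\u00E9".getBytes(StandardCharsets.UTF_8);
        String fallback = decodeLatin1(utf8);
        // Print code points rather than glyphs to avoid console encoding issues.
        for (char c : fallback.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The application would receive two unrelated characters in place of the one the client sent, with no indication that anything went wrong.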

I would like to make these changes in trunk and 7.0.x.

I expect to have a similar discussion about request bodies once URIs are resolved where I have essentially the same view - a decoding error should lead to a request failure.

Thoughts?

Mark


[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=54602
[2] http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/connector/CoyoteAdapter.java?view=annotate (line ~1054)