Due to bug 54602 [1] I have been writing some test cases to examine how
we handle invalid byte sequences in URIs.
My expectation was:
- valid byte sequence for expected encoding -> 200 (assuming no other
problems)
- invalid byte sequence for expected encoding -> 400
- partial byte sequence for expected encoding -> 400
However, that isn't what happens, and I currently believe that it is
what should happen. The purpose of this e-mail is, therefore, to get
agreement on what should happen.
There are multiple moving parts here so forgive me if this e-mail gets a
little long. There are multiple decisions and I expect some to be less
contentious than others.
These issues were observed with UTF-8. Other encodings may have similar
issues. My aim is to get a consistent approach regardless of encoding.
Issue 1: URI ends with partial byte sequence
Currently the partial byte sequences are ignored. I think the
B2CConverter should throw an Exception if the full input (i.e. when
endOfInput == true) ends with a partial byte sequence.
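To make the proposed behaviour concrete, here is a minimal sketch using the
standard java.nio.charset API directly. It is not the B2CConverter code; the
class and method names (PartialSequenceCheck, decodeStrict) are hypothetical,
and it assumes Java 7 for StandardCharsets. The point is only that passing
endOfInput == true together with CodingErrorAction.REPORT makes a trailing
partial sequence surface as an error rather than being dropped.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat.
final class PartialSequenceCheck {

    // Decodes the complete input as UTF-8 and throws if any part of it,
    // including a truncated trailing sequence, cannot be decoded.
    static String decodeStrict(byte[] input) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(input);
        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(input.length);

        // endOfInput == true: bytes left over after decoding are an error.
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            result.throwException();
        }
        result = decoder.flush(out);
        if (result.isError()) {
            result.throwException();
        }
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // "/" followed by the first two bytes of a three byte sequence.
        byte[] truncated = {0x2F, (byte) 0xE2, (byte) 0x82};
        try {
            System.out.println("decoded: " + decodeStrict(truncated));
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

In the connector, a rejection like this would be what gets mapped to the 400
response.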
Issue 2: URI ends with invalid byte sequence
This appears to be a bug in the UTF-8 decoder provided by the JVM. [1]
has provided one set of input bytes that triggers this. Currently the
invalid data is ignored. I think that B2CConverter should throw an
Exception as soon as it can determine that input is invalid. This would
require:
- switching to the Harmony based UTF-8 decoder used by WebSocket
- further testing of the JRE and Harmony UTF-8 decoders to check for
other potential issues
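For the "further testing" item, a small harness along the following lines
could be used to see how early a given decoder reports an error when bytes
arrive incrementally. This is a sketch only: EarlyErrorProbe and
firstErrorOffset are hypothetical names, it exercises whatever UTF-8 decoder
the running JRE provides, and it does not reproduce the specific input from
[1]. The 0xFF byte in main is used because it can never appear in valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper, not part of Tomcat.
final class EarlyErrorProbe {

    // Feeds the input to the decoder chunkSize bytes at a time and returns
    // the number of bytes that had been made available when the decoder
    // first reported an error, or -1 if the whole input decoded cleanly.
    static int firstErrorOffset(byte[] input, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(input);
        CharBuffer out = CharBuffer.allocate(input.length);
        in.limit(0);
        int available = 0;
        while (available < input.length) {
            available = Math.min(input.length, available + chunkSize);
            in.limit(available);
            // Only the final chunk is flagged as the end of the input.
            boolean endOfInput = (available == input.length);
            CoderResult result = decoder.decode(in, out, endOfInput);
            if (result.isError()) {
                return available;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] input = {0x61, 0x62, (byte) 0xFF, 0x63, 0x64};
        System.out.println("error reported after "
                + firstErrorOffset(input, 1) + " byte(s) were available");
    }
}
```

Running the same inputs against the Harmony based decoder and comparing the
reported offsets would show where the two implementations differ.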
Issue 3: Fall back to 'ASCII'
If the conversion fails (i.e. throws an exception for any reason) [2],
the CoyoteAdapter attempts to decode the provided URI using 'ASCII'
rather than the configured connector encoding. I say 'ASCII' because the
comments say ASCII but it is actually ISO-8859-1.
I don't believe it is appropriate to fall back to anything here. The fall
back code has been present since conversion support was added but I
can't think of any scenario where this stands any chance of working
reliably. I would like to remove this fall back code.
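To illustrate why the fall back cannot work reliably, the sketch below mimics
the pattern in isolation. It is not the CoyoteAdapter code; FallbackDemo and
decodeWithFallback are hypothetical names. Because every byte value maps to a
character in ISO-8859-1, the fall back never fails, so a URI that is invalid
in the configured encoding is silently turned into different characters
instead of being rejected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the fall back pattern, not Tomcat code.
final class FallbackDemo {

    // Tries strict UTF-8 first and, on any decoding error, falls back to
    // ISO-8859-1, which accepts every possible byte value.
    static String decodeWithFallback(byte[] input) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(input, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // "/" followed by the first byte of a two byte UTF-8 sequence.
        byte[] truncated = {0x2F, (byte) 0xC3};
        String decoded = decodeWithFallback(truncated);
        // The invalid request is accepted, with U+00C3 in the path.
        System.out.println("length=" + decoded.length()
                + " second char=U+00"
                + Integer.toHexString(decoded.charAt(1)).toUpperCase());
    }
}
```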
I would like to make these changes in trunk and 7.0.x.
I expect to have a similar discussion about request bodies once URIs are
resolved where I have essentially the same view - a decoding error
should lead to a request failure.
Thoughts?
Mark
[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=54602
[2]
http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/connector/CoyoteAdapter.java?view=annotate
(line ~1054)