DO NOT REPLY [Bug 45406] Decoding URI encoded in UTF-16 does not work correctly.

bugzilla Mon, 21 Jul 2008 03:42:16 -0700

https://issues.apache.org/bugzilla/show_bug.cgi?id=45406

--- Comment #11 from Ran Rubinstein <[EMAIL PROTECTED]>  2008-07-21 03:41:16 PST ---
(In reply to comment #10)

Will, I'm sorry to drag this on, but I want to understand fully where I'm wrong
in this.

AFAIK, an ASCII URL with one character represented in %-encoding, such as
http://www.google.com/q=%D7%05, does represent a legal UTF-16 encoded URL.

UTF-16 %-encoding does not mean the client sends two bytes or a wchar for each
letter in the URL, but rather that it sends the URL in ASCII, except for the
parts of the query string that are not ASCII, which are encoded using
%-encoding, with the bytes there determined by the selected encoding (usually
UTF-8). This is also the behavior of Java's built-in URLEncoder.encode() and
URLDecoder.decode() functions.
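
To make the Java behavior concrete, here is a minimal sketch using only
java.net.URLEncoder and java.net.URLDecoder. The class name UrlCodecDemo and
the wrapper methods are made up for illustration. I am assuming the
little-endian variant "UTF-16LE" here, because that is the byte order in which
U+05D7 comes out as the bytes D7 05; plain "UTF-16" in Java also writes a byte
order mark, so its output looks different.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; only URLEncoder/URLDecoder are real JDK APIs.
class UrlCodecDemo {
    // U+05D7 (HEBREW LETTER HET); in UTF-16LE it is the two bytes D7 05.
    static final String HET = "\u05D7";

    // Thin wrapper that turns the checked exception into an unchecked one.
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The ASCII letter 'q' is emitted literally; only HET is %-encoded.
        System.out.println(encode("q" + HET, "UTF-16LE"));
        // The same character with UTF-8 as the selected encoding.
        System.out.println(encode(HET, "UTF-8"));
        // Decoding leaves "q=" alone and decodes only the %XX run.
        System.out.println(decode("q=%D7%05", "UTF-16LE").equals("q=" + HET));
    }
}
```

The point of the sketch is that the unreserved ASCII characters never pass
through the selected charset at all; only the %XX bytes do.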

So a UTF-16 encoded URL can look like this:

http://www.google.com/q=%D7%05

and be legal.

Is my concept completely off-base?

If this is true, I see no reason for Tomcat not to support this (except, of
course, that the architecture right now does not support it, since the
%-decoding and string building classes are separate: ByteChunk expects, well,
a chunk of bytes, which it translates to a string according to the given
encoding, and UDecoder translates the URL to this chunk of bytes).
I suggest that instead of this, when processing URLs/URIs, Tomcat use a
combined approach that is compatible with the %-encoding rule that only
non-ASCII characters are %-encoded.
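
To illustrate the difference between the two approaches, here is a rough
sketch. It is not Tomcat's code: the class CombinedPercentDecoder and its
methods twoStage() and combined() are hypothetical names, and twoStage() is
only my simplified picture of "%-decode to bytes first, then convert all the
bytes with the given encoding". The combined version keeps literal characters
as they are and runs only the %XX bytes through the charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not taken from Tomcat's ByteChunk or UDecoder.
class CombinedPercentDecoder {

    // Two-stage: %-decode everything into one byte array, then decode
    // the whole array (literal ASCII included) with the charset.
    static String twoStage(String uri, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < uri.length()) {
            char c = uri.charAt(i);
            if (c == '%') {
                bytes.write(hexByte(uri, i));
                i += 3;
            } else {
                bytes.write(c); // a literal ASCII char becomes a single byte
                i++;
            }
        }
        return new String(bytes.toByteArray(), cs);
    }

    // Combined: literal characters are appended unchanged; only runs of
    // consecutive %XX escapes are decoded with the charset.
    static String combined(String uri, Charset cs) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        int i = 0;
        while (i < uri.length()) {
            char c = uri.charAt(i);
            if (c == '%') {
                run.write(hexByte(uri, i));
                i += 3;
            } else {
                flush(run, out, cs);
                out.append(c);
                i++;
            }
        }
        flush(run, out, cs);
        return out.toString();
    }

    // Decode the pending run of escaped bytes, if any, and clear it.
    private static void flush(ByteArrayOutputStream run, StringBuilder out, Charset cs) {
        if (run.size() > 0) {
            out.append(new String(run.toByteArray(), cs));
            run.reset();
        }
    }

    // Parse the two hex digits that follow the '%' at index i.
    private static int hexByte(String s, int i) {
        if (i + 3 > s.length()) {
            throw new IllegalArgumentException("truncated escape at index " + i);
        }
        int hi = Character.digit(s.charAt(i + 1), 16);
        int lo = Character.digit(s.charAt(i + 2), 16);
        if (hi < 0 || lo < 0) {
            throw new IllegalArgumentException("bad escape at index " + i);
        }
        return (hi << 4) | lo;
    }

    public static void main(String[] args) {
        Charset utf16le = StandardCharsets.UTF_16LE; // assumed byte order
        System.out.println(combined("q=%D7%05", utf16le).equals("q=\u05D7"));
        System.out.println(twoStage("q=%D7%05", utf16le).equals("q=\u05D7"));
    }
}
```

With an ASCII-compatible encoding such as UTF-8 both methods agree, which is
probably why the two-stage design goes unnoticed; with UTF-16 the two-stage
version pairs the bytes of "q=" into a single unrelated character.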


