On Mon, Aug 1, 2011 at 2:58 PM, Leonardo Uribe <[email protected]> wrote:
> Hi
>
> 2011/8/1 Curtiss Howard <[email protected]>:
>> Hi,
>>
>> One of the teams we work with was recently complaining that whenever
>> they pass an already-encoded URL to <h:graphicImage> that they get a
>> double encoding, which never happened before in our previous release
>> (which corresponded to the 1.2 timeframe).  It appears that this is
>> the result of a bug in encodeURIAttribute(), introduced after 1.2 and
>> before 2.0:
>>
>>                    else if (c == '%')
>>                    {
>>                        if (i + 2 < string.length())
>>                        {
>>                            char c1 = string.charAt(i+1);
>>                            char c2 = string.charAt(i+2);
>>                            if (((c1 >= '0' && c1 <= '9') || (c1 >= 'A' && c1 <= 'Z')) &&
>>                                ((c2 >= '0' && c2 <= '9') || (c2 >= 'A' && c2 <= 'Z')))
>>                            {
>>                                // do not percent encode, because it could be
>>                                // already encoded and we don't want to encode it twice
>>                            }
>>                            else
>>                            {
>>                                app = percentEncode(c, UTF8);
>>                            }
>>                        }
>>                        else
>>                        {
>>                            app = percentEncode(c, UTF8);
>>                        }
>>                    }
>>
>> (Forgive me if the formatting doesn't come out right...)
>
> Just for reference, the issue you are talking about is MYFACES-1841:
> HtmlResponseWriterImpl.writeURIAttribute does not perform proper URL
> encoding (e.g. & should be encoded as &amp;)
>

That's the issue where this code was introduced, yes.

>>
>> So whenever we find a %, we try to look ahead and see if the next two
>> characters constitute two hex characters.  If so, we leave the % alone
>> since this is clearly a portion of the string that has already been
>> encoded.
>>
>> There are two problems with that code however:
>>
>> 1. Checking for '0' <= c <= '9' || 'A' <= c <= 'Z' means that "%XX"
>> would remain unchanged, when really it needs to become "%25XX".  We
>> should be checking for 'A' <= c <= 'F', since we're talking about hex
>> characters.
>
> Are you mixing some javascript in here? In theory, if the percent
> encoding was already done, there is no reason to do it again, because
> that could break the generated URI. The algorithm checks for that
> condition, and doing so is intentional. Maybe the problem could be
> solved by detecting this special condition when h:graphicImage does
> its job.
>

No, there's no javascript.  Just URLs.  Specifically, this is in a
Portal environment when using WSRP.  The image URLs are altered by
some Portal/WSRP magic into some sort of query string that includes
the original URL as a parameter (I don't follow all of the details,
I'm not a Portal guy).  Hence, they %-encode the URL before it reaches
the h:graphicImage renderer, which in turn calls
ResponseWriter.writeURIAttribute(), which in turn calls into
HTMLEncoder.encodeURIAttribute().

I've argued back and forth with them, but I don't see any reason why
they can't pass a %-encoded value into
ResponseWriter.writeURIAttribute() and assume that it won't be encoded
twice.  In other words, the argument is that encoding should be
idempotent:

writeURIAttribute(writeURIAttribute(value)) === writeURIAttribute(value)

The code I've highlighted does what we're talking about, but it
clearly doesn't work with lowercase characters, which we both agree on
(and is the source of their problem; the URLs they pass in use
lowercase hex characters).
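As an illustration (this is a stripped-down stand-in, not the actual
MyFaces code), the following shows how the uppercase-only lookahead
double-encodes a lowercase triplet like "%2f" while leaving "%2F" alone:

```java
// Illustrative stand-in for the buggy lookahead in
// HTMLEncoder.encodeURIAttribute(); names are mine, not MyFaces'.
public class DoubleEncodeDemo {

    // The current check: only 0-9 and A-Z pass.
    static boolean oldCheck(char c1, char c2) {
        return ((c1 >= '0' && c1 <= '9') || (c1 >= 'A' && c1 <= 'Z'))
            && ((c2 >= '0' && c2 <= '9') || (c2 >= 'A' && c2 <= 'Z'));
    }

    // Encode '%' as "%25" unless the next two characters pass oldCheck.
    static String encodePercents(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && !(i + 2 < s.length()
                    && oldCheck(s.charAt(i + 1), s.charAt(i + 2)))) {
                out.append("%25");   // percent-encode the '%'
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodePercents("a%2fb"));  // -> "a%252fb": 'f' fails the uppercase-only check
        System.out.println(encodePercents("a%2Fb"));  // -> "a%2Fb": left alone
    }
}
```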

The second point amounts to something I think is a bug in the code
(rather than just a missing check).  The code decides whether to leave
a % alone if the next two characters are alphanumeric.  I think this
is wrong; it should be checking that the next two characters are
_hex digits only_.  After all, if my value is "%XX", then the result
should be "%25XX", since "XX" isn't valid hexadecimal.  But I can
understand if changing the character range has greater implications.
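A minimal sketch of the proposed fix (helper names like isHexDigit are
mine, not from the MyFaces source): accept 0-9, A-F, and a-f after the
'%', and percent-encode the '%' in every other case:

```java
// Sketch of the corrected lookahead; not the actual MyFaces patch.
public class PercentLookahead {

    // True for 0-9, A-F and a-f: the digits that can legally follow
    // '%' in a percent-encoded triplet.
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }

    // Decide whether the '%' at position i starts a valid triplet and
    // should therefore be left alone rather than encoded to "%25".
    static boolean startsValidTriplet(String s, int i) {
        return i + 2 < s.length()
            && isHexDigit(s.charAt(i + 1))
            && isHexDigit(s.charAt(i + 2));
    }

    public static void main(String[] args) {
        System.out.println(startsValidTriplet("%2f", 0));  // lowercase hex now accepted
        System.out.println(startsValidTriplet("%XX", 0));  // "XX" isn't hex, so '%' must become "%25"
    }
}
```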

>> 2. The problem our colleagues saw: we aren't handling lowercase characters!
>>
>> So I just want to make sure everyone is in agreement with my
>> assessment that we need to change the character range from A..Z to
>> A..F and also check for a..f.  If so, I'll make the change.
>
> I checked it, and RFC 2141 specifies that lowercase hex chars are
> valid in this case, so this change is valid. See:
>
> http://www.ietf.org/rfc/rfc2141.txt
>
> regards,
>
> Leonardo Uribe
>
>>
>> Thanks,
>>
>>
>> Curtiss Howard
>>
>
