On 8/18/13 5:02 PM, Geoff Field wrote:
> Reading through that RFC, I note the following paragraph:
> 
> 2.3.  Unreserved Characters
> 
>    Characters that are allowed in a URI but do not have a reserved
>    purpose are called unreserved.  These include uppercase and lowercase
>    letters, decimal digits, hyphen, period, underscore, and tilde.
> 
>       unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
> 
> To me, this quite clearly states that underscores ARE permitted in URLs.  Any 
> code that fails to allow them is contrary to the RFC.

That section doesn't cover the scheme. Unreserved characters are allowed
in components such as the path, but the scheme has its own, stricter grammar.

The relevant section is 3.1, which defines what can be in the scheme:

   scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
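
To make that rule concrete, here is a minimal sketch of a check that
follows the grammar above. The function name is_valid_scheme and the
helper names are mine for illustration; this is not the apr-util code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: ASCII-only tests, independent of the C locale. */
static bool is_ascii_alpha(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static bool is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Checks a NUL-terminated string against the RFC 3986 section 3.1 rule:
 *   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
 * The argument is the scheme only, without the trailing ':'. */
bool is_valid_scheme(const char *scheme)
{
    /* The first character must be a letter, so "" is rejected too. */
    if (scheme == NULL || !is_ascii_alpha(scheme[0]))
        return false;

    for (size_t i = 1; scheme[i] != '\0'; i++) {
        char c = scheme[i];
        if (!(is_ascii_alpha(c) || is_ascii_digit(c) ||
              c == '+' || c == '-' || c == '.'))
            return false;   /* '_' and everything else ends up here */
    }
    return true;
}
```

With this sketch a scheme such as "svn+ssh" passes, while a made-up
"svn_ssh" fails because of the underscore.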

>> The related change was not in Subversion itself, but in apr-util 1.5.2.
>> [[
>>   *) apr_uri_parse(): Do not accept invalid characters in the scheme.
>>      Per RFC 3986 3.3, enforce that the first segment of a relative path
>>      does not contain a colon. PR 52479.
>> ]]
> 
> That's about a colon, not an underscore.

That's what he was fixing, but in doing so he specifically started
enforcing the RFC. I think the "3.3" in his commit message is a typo.

He adds the following to limit the characters allowed in the scheme:
+#define T_SCHEME          0x10        /* '0' ... '9', '-', '+', '.'
+                                       * (allowed in scheme except first char)
+                                       */
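
For anyone wondering how a flag like that could be used, here is a rough
sketch of a flag-based scan over the start of a URI. All the SK_ and sk_
names and the SK_ALPHA value are my own assumptions for illustration;
see the real diff for what apr_uri.c actually does.

```c
#include <stddef.h>

#define SK_ALPHA   0x08  /* hypothetical flag: 'A'..'Z', 'a'..'z' */
#define SK_SCHEME  0x10  /* digits, '-', '+', '.': not valid as first char */

/* Classify one character into the flags above (0 = not a scheme char). */
static unsigned char sk_class(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return SK_ALPHA;
    if ((c >= '0' && c <= '9') || c == '-' || c == '+' || c == '.')
        return SK_SCHEME;
    return 0;
}

/* Returns the length of a well-formed scheme at the start of uri,
 * i.e. the number of characters before the ':', or 0 if there is none. */
static size_t sk_scheme_len(const char *uri)
{
    size_t i;

    /* The first character must be a letter. */
    if (!(sk_class((unsigned char)uri[0]) & SK_ALPHA))
        return 0;

    /* Later characters may carry either flag. */
    for (i = 1; sk_class((unsigned char)uri[i]) & (SK_ALPHA | SK_SCHEME); i++)
        ;

    /* A scheme only counts if it is terminated by ':'. */
    return uri[i] == ':' ? i : 0;
}
```

Under these assumptions an underscore stops the scan before the colon
is reached, so no scheme is recognized for such a URI.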

You can see the full change here:
http://svn.apache.org/viewvc/apr/apr-util/branches/1.5.x/uri/apr_uri.c?r1=1460351&r2=1462434&pathrev=1462434
