[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus

Stephen Day Mon, 13 Feb 2012 12:46:54 -0800

Stephen Day <[email protected]> added the comment:

I apologize for reopening this bug, but I find your interpretation to be
inaccurate. It may be technically valid, but the combination of the
documentation, the function name and the main use cases leads to pathological
uses of urlencode. My bug report is meant to help mitigate these problems.


The main use case for "url encoding" of mapping types is not for posting form 
data; the main use case is appending url parameters to a url:

>>> from urllib import urlencode
>>> from urlparse import urlunparse
>>> urlunparse(('http', 'example.com', '/', None,
...             urlencode({'a': 'some string'}), None))
'http://example.com/?a=some+string'

Any sane person would naturally gravitate to a function called "urlencode" to 
url encode a mapping type. If the urllib.urlencode function is indeed intended 
for form-encoding, as I agree is hinted in the documentation, it should 
indicate that its result is 'application/x-www-form-urlencoded' or it should be 
called "formencode".

Neither quote nor quote_plus is "what I am looking for"; I am quite
familiar with these library functions. They are for encoding individual
component strings and do not meet the use case described at all:

>>> from urllib import quote
>>> quote({'a': 1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1248, in quote
    if not s.rstrip(safe):
AttributeError: 'dict' object has no attribute 'rstrip'

In addition, Java's URLEncoder implementation is hardly a good example of 
standards compliant URL manipulation. Python is not Java. The Python community 
needs to make its own, independent, mature language decisions. In general, the 
use of '+' to encode spaces in content, even if it is compliant with an
arbitrary standard, is pathological, especially when used in urls. Even though
Python's quote_plus function works symmetrically on its own, when pluses are
used in a multi-language environment it can become impossible to tell whether a 
plus is a literal '+' or an encoded space. In addition, the usage of '%20' for 
spaces will work in almost all cases.
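
To make the ambiguity concrete, here is a minimal sketch. It assumes the
Python 3 names in urllib.parse (the Python 2 urllib functions behave the same
way for this input), and the sample string is made up for illustration:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

original = "a+b c"  # made-up value with a literal plus and a space

plus_encoded = quote_plus(original)  # space becomes '+', literal '+' becomes %2B
pct_encoded = quote(original)        # space becomes %20, literal '+' becomes %2B

# A consumer that treats '+' as a literal character does not get the space back.
literal_plus_decode = unquote(plus_encoded)

# A consumer that follows the form-encoding convention does.
form_decode = unquote_plus(plus_encoded)

# The '%20' form decodes to the original string under either convention.
pct_decoded_plain = unquote(pct_encoded)
pct_decoded_form = unquote_plus(pct_encoded)
```

Only the '+' form depends on which convention the receiving side happens to
apply; the '%20' form round-trips either way.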

RFC3986, Section 2 [1] describes the use of percent-encoding as a solution to 
representing reserved characters. In practice, percent-encoding is used on the 
value component of 'key=value' productions and this works in nearly all cases. 
The referenced standard [2], while relevant to the "implied" use case, is not 
applicable to url assembly.

Given your interpretation, it seems that there is no function in the Python
standard library to meet the use case of correctly assembling url parameter 
values, leaving application developers to come up with something like this:

>>> '&'.join(['='.join((quote(k), quote(v)))
...           for k, v in {'a': '1', 'b': 'with spaces'}.iteritems()])
'a=1&b=with%20spaces'

In most cases, people will just use urlencode, which uses pluses for spaces, 
yielding pathological, noncompliant urls.

Given that this bug was closed, I see a few options:

1. Close this issue and keep polluting the world's urls with pluses for spaces.

2. Make urlencode target path/query parameter encoding and then create a new 
function, formencode, for use in encoding form data, breaking backwards 
compatibility.

3. Simply add a keyword argument to urlencode to allow the caller to specify 
the encoding function and separator, retaining compatibility and satisfying all 
of the above use cases.

Naturally, 3 seems to be a very reasonable solution to this bug.
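
A rough sketch of what option 3 could look like, written as a standalone
helper rather than a patch. The name urlencode_with and its keyword arguments
quote_func and sep are hypothetical, chosen only for illustration, and the
Python 3 urllib.parse names are assumed:

```python
from urllib.parse import quote, quote_plus


def urlencode_with(query, quote_func=quote_plus, sep="&"):
    """Hypothetical helper: encode a mapping (or a sequence of pairs) using a
    caller-supplied quoting function and separator. The defaults keep the
    existing form-encoding behaviour of pluses for spaces."""
    items = query.items() if hasattr(query, "items") else query
    return sep.join(
        "{}={}".format(quote_func(str(key)), quote_func(str(value)))
        for key, value in items
    )


params = {"a": "1", "b": "with spaces"}

form_style = urlencode_with(params)                    # default: '+' for spaces
url_style = urlencode_with(params, quote_func=quote)   # '%20' for spaces
semicolon_style = urlencode_with(params, quote_func=quote, sep=";")
```

Existing callers would see no change, while callers assembling urls could opt
in to percent-encoding by passing quote.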

[1] http://tools.ietf.org/html/rfc3986#section-2 explicitly covers 
[2] http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

----------
resolution: invalid -> 
status: closed -> open

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13866>
_______________________________________