Stephen Day <[email protected]> added the comment:
I apologize for reopening this bug, but I find your interpretation to be
inaccurate. It may be technically valid, but the combination of the
documentation, the function name and the main use cases yields pathological
invocations of urlencode. My bug report is meant to help mitigate these problems.
The main use case for "url encoding" of mapping types is not for posting form
data; the main use case is appending url parameters to a url:
>>> from urllib import urlencode
>>> from urlparse import urlunparse
>>> urlunparse(('http', 'example.com', '/', None,
...             urlencode({'a': 'some string'}), None))
'http://example.com/?a=some+string'
Any sane person would naturally gravitate to a function called "urlencode" to
url encode a mapping type. If the urllib.urlencode function is indeed intended
for form-encoding, as I agree is hinted in the documentation, it should
indicate that its result is 'application/x-www-form-urlencoded' or it should be
called "formencode".
Neither quote nor quote_plus is "what I am looking for"; I am quite
familiar with these library functions. They are for encoding individual
component strings and don't meet the use case described at all:
>>> from urllib import quote
>>> quote({'a': 1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1248, in quote
    if not s.rstrip(safe):
AttributeError: 'dict' object has no attribute 'rstrip'
In addition, Java's URLEncoder implementation is hardly a good example of
standards-compliant URL manipulation. Python is not Java. The Python community
needs to make its own, independent, mature language decisions. In general, the
use of '+' to encode spaces in content, even if it is compliant with an
arbitrary standard, is pathological, especially when used in urls. Even though
Python's quote_plus function works symmetrically on its own, when pluses are
used in a multi-language environment it can become impossible to tell whether a
plus is a literal '+' or an encoded space. In addition, the usage of '%20' for
spaces will work in almost all cases.
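The ambiguity can be shown in a few lines. This is only a sketch; the
try/except import is my own addition so that it runs on both Python 2 and
Python 3, and the variable names are just for illustration:

```python
try:
    from urllib.parse import quote, quote_plus, unquote, unquote_plus  # Python 3
except ImportError:
    from urllib import quote, quote_plus, unquote, unquote_plus  # Python 2

# quote_plus turns a space into '+'
encoded_space = quote_plus('a b')

# A decoder that only does percent-decoding leaves the '+' alone,
# while a form decoder turns it back into a space.
as_url = unquote(encoded_space)
as_form = unquote_plus(encoded_space)

# Percent-encoding the space decodes the same way with either decoder.
unambiguous = quote('a b')
same_either_way = unquote(unambiguous) == unquote_plus(unambiguous)
```

Here the two decoders disagree about 'a+b' but agree about 'a%20b', which is
why a consumer that does not know which encoder produced the string cannot
safely interpret the plus.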
RFC3986, Section 2 [1] describes the use of percent-encoding as a solution to
representing reserved characters. In practice, percent-encoding is used on the
value component of 'key=value' productions and this works in nearly all cases.
The referenced standard [2], while relevant to the "implied" use case, is not
applicable to url assembly.
Given your interpretation, it seems that there is no function in the Python
standard library to meet the use case of correctly assembling url parameter
values, leaving application developers to come up with something like this:
>>> '&'.join(['='.join((quote(k), quote(v)))
...           for k, v in {'a': '1', 'b': 'with spaces'}.iteritems()])
'a=1&b=with%20spaces'
In most cases, people will just use urlencode, which uses pluses for spaces,
yielding pathological, noncompliant urls.
With respect to the closure of this bug, there are a few options:
1. Close this issue and keep polluting the world's urls with pluses for spaces.
2. Make urlencode target path/query parameter encoding and then create a new
function, formencode, for use in encoding form data, breaking backwards
compatibility.
3. Simply add a keyword argument to urlencode to allow the caller to specify
the encoding function and separator, retaining compatibility and satisfying all
of the above use cases.
Naturally, option 3 seems to be a very reasonable solution to this bug.
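To make option 3 concrete, here is a rough sketch of what such a signature
might look like. The function name urlencode_with and the keyword names
quote_func and sep are hypothetical, chosen only for this illustration, and
the sketch leaves out the doseq handling that the real urlencode has:

```python
try:
    from urllib.parse import quote, quote_plus  # Python 3
except ImportError:
    from urllib import quote, quote_plus  # Python 2


def urlencode_with(query, quote_func=quote_plus, sep='&'):
    """Hypothetical urlencode variant with a pluggable quoting function.

    query is a mapping or a sequence of (key, value) pairs. The default
    quote_func keeps the existing form-encoding behaviour.
    """
    pairs = query.items() if hasattr(query, 'items') else query
    return sep.join(
        quote_func(str(k)) + '=' + quote_func(str(v)) for k, v in pairs
    )


params = [('a', '1'), ('b', 'with spaces')]

# Default behaviour is unchanged: spaces become '+'
form_body = urlencode_with(params)

# Callers assembling a url opt in to percent-encoding
query_string = urlencode_with(params, quote_func=quote)
```

Existing callers would see no change, while code that builds urls could pass
quote and get '%20' for spaces.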
[1] http://tools.ietf.org/html/rfc3986#section-2 explicitly covers
[2] http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
----------
resolution: invalid ->
status: closed -> open
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13866>
_______________________________________