Luke Plant wrote:
This is important:
http://www.w3.org/2001/tag/doc/whenToUseGet-20030709.html#i18n
There is a note in that text saying that when the encoding is unknown,
browsers may use the encoding that was used for outputting the form,
which they really do.
But as I understood Hugo's note, he was talking about GET params that
come from the request URL. Either way, in most cases an application also
expects data in the same encoding that was used for outputting HTML.
This means that decoding from DEFAULT_CHARSET would almost always work.
And for cases where the input is invalid, I think we should just do
decode(errors='replace') rather than raising an exception.
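
To make the suggestion concrete, here is a rough sketch of what I mean.
It is only an illustration: decode_param is a made-up helper name, and
the DEFAULT_CHARSET constant below is a stand-in for the real setting,
assumed here to be utf-8.

```python
# Hypothetical stand-in for the project-wide setting mentioned above.
DEFAULT_CHARSET = "utf-8"


def decode_param(raw: bytes, charset: str = DEFAULT_CHARSET) -> str:
    """Decode a raw request parameter without raising on bad bytes."""
    # errors="replace" puts U+FFFD in place of undecodable bytes.
    return raw.decode(charset, errors="replace")


# Well-formed input in the expected charset round-trips unchanged.
good = "тест".encode("utf-8")
print(decode_param(good))

# Input sent in a legacy encoding (windows-1251 here) is not valid UTF-8.
bad = "тест".encode("cp1251")
try:
    bad.decode(DEFAULT_CHARSET)  # strict decoding raises
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# The lenient version returns text containing U+FFFD markers instead.
print(repr(decode_param(bad)))
```

The application then carries on with visibly odd text instead of an
error page.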
This is what happens, for example, on some Russian search engines that
expect input in a legacy encoding but get utf-8 (a Firefox bookmark
keyword being a common case). The user just gets results for something
that looks like "тест" and can actually see that "something is wrong
with the language".
This is unfortunately still not so uncommon and, in my opinion, explains
the problem better than an error page mentioning scary things like
"encoding", which says little to a user.