Luke Plant wrote:

This is important:

http://www.w3.org/2001/tag/doc/whenToUseGet-20030709.html#i18n

There is a note in that text saying that when the encoding is unknown, browsers may use the encoding that was used for outputting the form. In practice, they really do.

But as I understood Hugo's note, he was talking about GET params that come from the request URL. In any case, an application in most cases also expects data in the same encoding that was used for outputting the HTML.
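
To make the point about form encoding concrete, here is a small sketch of how the same query text ends up as different bytes in the URL depending on the page encoding. The choice of windows-1251 (cp1251) as the "legacy" encoding and the sample word are my assumptions for illustration only.

```python
from urllib.parse import quote, unquote_to_bytes

query = "тест"  # sample text, chosen only for illustration

# What a browser would send if the form page was served as UTF-8.
as_utf8 = quote(query, encoding="utf-8")

# What it would send if the form page was served as windows-1251
# (an assumed legacy encoding).
as_cp1251 = quote(query, encoding="cp1251")

print(as_utf8)
print(as_cp1251)

# The server only sees the raw bytes; it has to know (or guess) the charset.
raw = unquote_to_bytes(as_cp1251)
print(raw.decode("cp1251"))
```

The URL itself carries no charset label, so the server can only recover the text if it decodes with the same encoding the form page was output in.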

This means that decoding from DEFAULT_CHARSET would almost always work. And for cases where the input is invalid, I think we should just do decode(errors='replace') rather than raise an exception.
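
Roughly what I mean, as a minimal sketch in plain Python rather than actual Django code. The helper name `decode_param` is hypothetical, and `DEFAULT_CHARSET` here is just a module-level stand-in for the setting, assumed to be UTF-8.

```python
DEFAULT_CHARSET = "utf-8"  # stand-in for the site setting (assumed value)


def decode_param(raw: bytes, charset: str = DEFAULT_CHARSET) -> str:
    """Decode a raw request parameter without ever raising.

    Bytes that are invalid in `charset` become U+FFFD (the replacement
    character) instead of triggering UnicodeDecodeError.
    """
    return raw.decode(charset, errors="replace")


# Well-formed input decodes normally.
valid = decode_param("тест".encode("utf-8"))

# The same word sent in windows-1251 is not valid UTF-8; with
# errors="replace" we get replacement characters, not an exception.
broken = decode_param(b"\xf2\xe5\xf1\xf2")

print(valid)
print(broken)
```

With the default `errors="strict"`, the second call would raise `UnicodeDecodeError` and the user would get an error page.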

This happens, for example, on some Russian search engines that expect input in a legacy encoding but get UTF-8 (from Firefox's bookmark keywords, as a common case). The user just gets results for something looking like "тест" and actually sees that "something is wrong with the language". This is unfortunately still not so uncommon and, in my opinion, explains the problem better than an error page mentioning scary things like "encoding", which says little to a user.
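
A sketch of that mismatch, assuming the legacy encoding is windows-1251 (I am not naming any particular search engine, and the encoding is my assumption): the browser sends UTF-8 bytes, and the server decodes them as cp1251.

```python
# Bytes the browser actually sends (UTF-8).
sent = "тест".encode("utf-8")

# What a server expecting windows-1251 (assumed) makes of them.
seen = sent.decode("cp1251", errors="replace")

print(len(sent))  # number of bytes on the wire
print(seen)       # garbled text the user ends up searching for

# For this input nothing is lost, the text is only misread:
# reversing the wrong step recovers the original word.
print(seen.encode("cp1251").decode("utf-8"))
```

No exception is raised anywhere here; the user simply sees a search for garbled text, which is the behaviour I am arguing for.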
