Re: Detection of unlabeled UTF-8

Joshua Cranmer 🐧 Fri, 30 Aug 2013 11:39:44 -0700

On 8/30/2013 4:01 AM, Anne van Kesteren wrote:

On Fri, Aug 30, 2013 at 9:40 AM, Gervase Markham <g...@mozilla.org> wrote:

We don't want people to try and move to UTF-8, but move back because
they haven't figured out how (or are technically unable) to label it
correctly and "it comes out all wrong".

You also don't want it to be wrong half of the time. Given that full
content scans won't fly (we try to restrict scanning for encodings as
much as possible), that's a very real possibility, especially given
forums such as in OP that are mostly ASCII.


Labeling is what people ought to do, and it's very easy: <meta
charset=utf-8> (if all other files end up unlabeled, they'll inherit
from this one).

The problem I have with this approach is that it assumes that the page is authored by someone who definitively knows the charset, which is not a scenario which universally holds. Suppose you have a page that serves up the contents of a plain text file, so your source data has no indication of its charset. What charset should the page report? The choice is between guessing (presumably UTF-8) or saying nothing (which causes the browser to guess Windows-1252, generally).
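
To make the trade-off concrete, here is a rough Python sketch of a middle path a server could take for such a file: label it UTF-8 only when the bytes actually decode as UTF-8, and otherwise say nothing. The names guess_charset and content_type_for are hypothetical, made up for illustration. It also assumes the server can afford to read the whole file, which is exactly the kind of full scan browsers try to avoid.

```python
from typing import Optional


def guess_charset(data: bytes) -> Optional[str]:
    """Return "utf-8" if the bytes are well-formed UTF-8, else None.

    Hypothetical helper: this is a validity check, not real charset
    detection. Pure ASCII input also passes, which is harmless because
    ASCII reads the same in UTF-8 and Windows-1252.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"


def content_type_for(data: bytes) -> str:
    """Build a Content-Type value for a plain text file of unknown origin."""
    charset = guess_charset(data)
    if charset is None:
        # Say nothing and let the browser fall back to its own default.
        return "text/plain"
    return f"text/plain; charset={charset}"
```

For example, content_type_for("archæologist".encode("utf-8")) gives a UTF-8 label, while the same word encoded as Windows-1252 gets no label at all. This is still a guess: a short file in a legacy encoding can happen to be valid UTF-8, just not very often.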


--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
