On Fri, Aug 30, 2013 at 1:03 PM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
> This is true if you run the heuristic over the entire byte stream.
> Unfortunately, since we support incremental loading of HTML (and will
> have to continue to do so), we don't have the entire byte stream
> available at the time when we need to make a decision of what encoding
> to assume.

In particular, you need to decide on the encoding before you start
running any user script, because you don't want document.characterSet
etc. to change once it might already have been accessed. For performance
reasons, we want to be able to run scripts immediately after receiving
the initial TCP response, if it already contains any. This implies we
need to decide on the character set after reading the first segment,
which typically will not contain the actual content of the page that we
would want to sniff on pages like http://www.eyrie-productions.com/.
Right?

(I say this only because my initial reaction was that we could hold off
on deciding what encoding to use until we find the first non-ASCII byte,
without any ill effects, if we really wanted to. That would probably
make the site in question work. But then I realized it would break
document.characterSet, so it's not an option even if we wanted more
sniffing.)
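
To make the ordering problem concrete, here is a toy Python sketch. It is
not how Gecko's parser works: sniff(), observed_charsets(), SEGMENTS and
the windows-1252 fallback are all made up for illustration, and the
"heuristic" only checks whether the bytes happen to be valid UTF-8. It
just shows what a script would read back after each network segment
under the two strategies.

```python
FALLBACK = "windows-1252"  # assumed default; purely illustrative


def sniff(data: bytes):
    """Toy heuristic: return a guess, or None if the bytes are pure ASCII."""
    if data.isascii():
        return None  # no evidence either way
    try:
        # Simplification: ignores multi-byte sequences split across segments.
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return FALLBACK


def observed_charsets(segments, lazy):
    """Return the charset a script would see after each segment arrives.

    lazy=False: commit after the first segment, using the fallback if the
                segment carries no evidence.
    lazy=True:  keep waiting until the first non-ASCII byte shows up.
    """
    decided = None
    seen = []
    for segment in segments:
        if decided is None:
            decided = sniff(segment)
            if decided is None and not lazy:
                decided = FALLBACK  # eager: must commit right now
        # A script in this segment reads the current value; an undecided
        # document still has to report something, so it reports the fallback.
        seen.append(decided or FALLBACK)
    return seen


# First segment: ASCII-only markup with a script. Second: the real content.
SEGMENTS = [
    b"<html><head><script>/* reads document.characterSet */</script>",
    "<body>caf\u00e9</body>".encode("utf-8"),
]

eager_view = observed_charsets(SEGMENTS, lazy=False)
lazy_view = observed_charsets(SEGMENTS, lazy=True)
```

With the eager strategy both reads agree (but the page is mis-decoded);
with the lazy one the second read differs from the first, which is the
document.characterSet breakage described above.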