On 06/09/13 16:45, Robert Kaiser wrote:
> Henri Sivonen wrote:
>> Considering what Aryeh said earlier in this thread, do you have a
>> suggestion how to do that so that
>> [...]
>
> Hmm, do we have to treat the whole document as a consistent charset?
> Could we instead, if we don't know the charset, look at every
> rendered-as-text node/attribute in the DOM tree and run some kind of
> charset detection on it?
> May be a dumb idea but might avoid the problem on the parsing level.
>
> Robert Kaiser

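To make the per-node idea concrete: a detector run on one text node or attribute may see only a few non-ASCII bytes. The sketch below is purely illustrative. It uses Python's built-in codecs rather than any real browser charset detector, and the candidate list `CANDIDATES` and helper `plausible_decodings` are hypothetical names. It shows that the UTF-8 bytes for "café" decode without error under every candidate, so byte validity alone cannot pick the right encoding for such a short snippet.

```python
# Hypothetical set of encodings a per-node detector might choose among.
# This list is an illustrative assumption, not any browser's real candidate set.
CANDIDATES = ["utf-8", "windows-1252", "iso-8859-2", "koi8-r"]


def plausible_decodings(snippet: bytes) -> dict[str, str]:
    """Return every candidate encoding under which the snippet decodes without error."""
    results = {}
    for enc in CANDIDATES:
        try:
            results[enc] = snippet.decode(enc)
        except UnicodeDecodeError:
            # Invalid byte sequence for this encoding: rule it out.
            continue
    return results


# "café" encoded as UTF-8; as a short attribute value, this is all the detector sees.
snippet = "café".encode("utf-8")  # b'caf\xc3\xa9'
for enc, text in plausible_decodings(snippet).items():
    print(f"{enc:>13}: {text!r}")
```

A real detector typically weighs byte-frequency statistics rather than mere validity, but with only a handful of bytes per node that statistical evidence is thin.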
I think that would create a whole lot more problems than it would fix,
and would be unworkable in practice.
Charset detection from content is a probabilistic matter at best, and
treating the document as many small snippets of text would not only
increase the probability of the detection algorithm getting it wrong for
each node (a short snippet gives it far less statistical evidence than a
whole document), but also give a large number of opportunities per page
for at least one of those detections to go wrong.
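The second half of that argument is simple arithmetic: if each of n detections on a page independently goes wrong with probability p, the chance that at least one goes wrong is 1 - (1 - p)^n. The sketch below assumes independence and uses a hypothetical 1% per-node error rate; neither figure comes from a measured detector.

```python
def p_any_wrong(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes independent detections is wrong,
    given a per-detection error rate p_node (independence is an assumption)."""
    return 1.0 - (1.0 - p_node) ** n_nodes


# Hypothetical per-node error rate of 1%, for illustration only.
for n in (1, 10, 100, 1000):
    print(n, round(p_any_wrong(0.01, n), 3))
```

Under these assumptions, a page with 100 separately detected nodes would end up with at least one misdetected node roughly 63% of the time, even though each individual detection is 99% reliable.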
-- N.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform