On 06/09/13 16:45, Robert Kaiser wrote:
> Henri Sivonen wrote:
>> Considering what Aryeh said earlier in this thread, do you have a
>> suggestion how to do that so that
>>> [...]
>
> Hmm, do we have to treat the whole document as having one consistent charset? Could we instead, if we don't know the charset, look at every rendered-as-text node/attribute in the DOM tree and run some kind of charset detection on each of them?
>
> Maybe a dumb idea, but it might avoid the problem at the parsing level.
>
> Robert Kaiser


I think that would create a whole lot more problems than it would fix, and would be unworkable in practice.

Charset detection from content is a probabilistic matter at best. Treating the document as many small snippets of text would not only increase the probability of the detection algorithm getting it wrong for each node, since a short snippet gives the detector far less evidence to work with than the whole document does, but would also give a large number of opportunities per page for at least one of those detections to go wrong.
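To put rough numbers on the "many opportunities" point, here is a minimal sketch that assumes each detection fails independently with the same probability p (a simplifying assumption; the error rates and node count below are made-up illustrative figures, not measurements). The chance that at least one of n detections goes wrong is 1 - (1 - p)^n.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n detections is wrong.

    Assumes (hypothetically) that every detection fails independently
    with the same per-detection error rate p.
    """
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    # Complement of "every one of the n detections succeeds".
    return 1.0 - (1.0 - p) ** n


# Illustrative figures only: one guess over the whole document versus
# 200 guesses over short per-node snippets, each assumed to be less reliable.
whole_doc = prob_any_failure(0.01, 1)
per_node = prob_any_failure(0.05, 200)
print(f"whole document: {whole_doc:.2%}")
print(f"per node:       {per_node:.2%}")
```

Even if each short snippet were detected as reliably as a whole document (say a 1% error rate, again a made-up figure), 200 independent per-node guesses would still give roughly an 87% chance that at least one of them is wrong, so the per-page failure rate grows with the number of nodes regardless of how good the detector is.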

-- N.


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
