On 9/9/13 02:31, Henri Sivonen wrote:
> We don't have telemetry for the question "How often are pages that are not
> labeled as UTF-8, UTF-16 or anything that maps to their replacement
> encoding according to the Encoding Standard and that contain non-ASCII
> bytes in fact valid UTF-8?" How rare would the mislabeled UTF-8 case need
> to be for you to consider the UI that you're proposing not worth it?

I'd think it would depend somewhat on the severity of the misencoding. For example, interpreting a page of UTF-8 as Windows-1252 generally isn't going to completely ruin a page that contains only the occasional accented Latin character, although it will certainly be an obvious defect. I'd be happy to leave the situation be if this happened to fewer than 1% of users over a six-week period.

On the other hand, misrendering a page of UTF-8 that consists predominantly of a non-Latin script is pretty catastrophic, and it's going to tend to happen to the same subset of users over and over again. For that situation, I think I'd want to see fewer than 0.1% of users running builds localized for languages with non-Latin scripts impacted over a six-week period before I was happy leaving things as-is.
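To make the difference in severity concrete, here's a quick illustration (Python for brevity; the sample strings are my own) of what UTF-8 bytes look like when decoded as Windows-1252:

    # Decode UTF-8 bytes as Windows-1252 to compare mojibake severity.
    latin_text = "café résumé"       # mostly ASCII, occasional accents
    greek_text = "καλημέρα κόσμε"    # entirely non-Latin

    for label, text in (("Latin", latin_text), ("Greek", greek_text)):
        mojibake = text.encode("utf-8").decode("windows-1252", errors="replace")
        print(f"{label}: {mojibake}")

    # Latin: cafÃ© rÃ©sumÃ©   -- defaced, but still legible
    # Greek: ÎºÎ±Î»Î·Î¼...     -- not one character survives

The Latin page is ugly; the Greek page is unusable.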

> However, we do have telemetry for the percentage of Firefox sessions in
> which the current character encoding override UI has been used at least
> once. See https://bugzilla.mozilla.org/show_bug.cgi?id=906032 for the
> results broken down by desktop versus Android and then by locale.

I don't think measuring the behavior of the few people who already know about this feature is particularly relevant. The status quo works for them, by definition. I'm far more concerned about the users who get garbled pages and don't have the knowledge to do anything about it.

> I would accept a (performance-conscious) patch for gathering telemetry for
> the UTF-8 question in the HTML parser. However, I'm not volunteering to
> write one myself immediately, because I have bugs on my todo list that have
> been caused by previous attempts of Gecko developers to be well-intentioned
> about DWIM and UI around character encodings. Gotta fix those first.

Great. I'll see if I can wedge in some time to put one together (although I'm similarly swamped, so I don't have a good timeframe for this). If anyone else has time to roll one out, that would be even better.
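For whoever ends up writing it, here's roughly the shape of the measurement I have in mind, as a minimal Python sketch (the function and histogram names are made up for illustration; the real patch would live in Gecko's C++ parser and validate incrementally during parsing rather than re-decoding the whole document):

    from collections import Counter

    histogram = Counter()  # stand-in for a real telemetry histogram

    def probe_unlabeled_page(body: bytes, labeled_encoding: str | None) -> None:
        """Record whether a page not labeled as UTF-8/UTF-16 (or as anything
        mapping to the replacement encoding) that contains non-ASCII bytes
        would in fact have decoded cleanly as UTF-8."""
        if labeled_encoding in {"utf-8", "utf-16", "utf-16le", "utf-16be"}:
            return                  # labeled pages aren't the case in question
        if body.isascii():
            return                  # all-ASCII pages decode identically anyway
        try:
            body.decode("utf-8")
            histogram["unlabeled_non_ascii_valid_utf8"] += 1
        except UnicodeDecodeError:
            histogram["unlabeled_non_ascii_not_utf8"] += 1

That would give us both the numerator and the denominator for the question Henri posed above.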

> Even non-automatic correction means authors can take the attitude that
> getting the encoding wrong is no big deal since the fix is a click away for
> the user.

I'll repeat that it's not our job to police the web. I'm firmly of the opinion that those developers who don't care about doing things right won't do them right no matter how big a stick you personally choose to beat them with. On the other hand, I'm quite worried about collateral damage to our users in your crusade to control publishers.

Give the publishers the tools to understand their errors, and the users the tools to use the web the way they want to use it. Those publishers who aren't bad actors will correct their own behavior -- those who _are_ bad actors aren't going to behave anyway. There's no point getting authoritarian about it and making the web a less accessible place as a consequence.

--
Adam Roach
Principal Platform Engineer
a...@mozilla.com
+1 650 903 0800 x863
