In article <[EMAIL PROTECTED]>,
 Boris Zbarsky <[EMAIL PROTECTED]> wrote:

> Henri Sivonen wrote:
> 
> > More often in practice such anchors get styled as links which isn't cool 
> > at all. Depends on the style sheet in use, of course.
> 
> This will be OUR stylesheet.  I trust we can avoid such mistakes.
> 
> > Suggested replacement:
> > <li><p><strong>Use the ISO-8859-1 aka. Latin-1 character 
> > encoding.</strong>
> 
> Why is this there, if I may ask?  

Back in 1998 almost everyone creating an English-language site was using 
US-ASCII or ISO-8859-1, because ISO-8859-1 was *the* character encoding 
for HTML before anything else was officially taken into account. The 
original requirement is mostly about not using the non-ISO CP1252 
characters that Windows users might accidentally insert in documents and 
about warning clueless Mac users who don't realize MacRoman is not 
ISO-8859-1.

> Why not use something like UTF8 instead?

Some reasons for not using UTF-8 on www.mozilla.org:
* Many American and European contributors use Emacs but 
  haven't bothered to figure out how to make it use UTF-8.
  If UTF-8 documents were allowed, Emacs users could easily
  introduce invalid byte sequences to the files.
* Currently the pages served by www.mozilla.org don't come 
  with a proper charset parameter, which is bad. Using any 
  encoding besides ISO-8859-1 while at the same time banning 
  the <meta> thingy would make matters even worse, 
  because then even Americans and Western Europeans would 
  have an unpleasant encounter with the Character Encoding 
  menu (which would not need to exist if people used HTTP 
  features right). Of course, the right way to approach 
  the issue would be migrating to Apache with contributor-
  writable .htaccess files, but I've been around for long 
  enough to remember the time when Gerv was drafting the 
  newsgroup reorg document, so I'm not overly optimistic.
* Doctor isn't UTF-8-aware. It isn't ISO-8859-1-aware, 
  either. (Dodging this same issue early on has come to 
  haunt Bugzilla later...)
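(As a sketch of the Apache migration mentioned above: with contributor-writable .htaccess files, fixing the missing charset parameter would be a one-line directive per directory. Assuming mod_mime/core defaults, something like:

```apache
# Append a charset parameter to text/html and text/plain
# responses that don't declare one themselves.
AddDefaultCharset ISO-8859-1
```

would make the server send "Content-Type: text/html; charset=ISO-8859-1" instead of a bare media type.)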

(Personally, if I were to write a content management system from scratch 
now, I'd go with UTF-8 all the way.)

> If I'm authoring a Mozilla.org page and need some non-English 
> text on it (eg testcases for rendering or something), am I supposed to 
> encode every single char as an entity?

If you need only some non-English text, then using NCRs is workable and 
even appropriate given the problems with UTF-8 outlined above. Test 
cases are outside the scope of the style guide. For documents in 
languages other than English (such documents are often hosted elsewhere 
anyway), I'd go with UTF-8 or an encoding commonly used for the language 
*and* I'd use the <meta> thingy (assuming that no migration to Apache 
has happened).
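(To illustrate the NCR approach, here's a quick Python sketch, not anything the style guide itself prescribes: the standard library's xmlcharrefreplace error handler rewrites every non-ASCII character as a numeric character reference, so the stored file stays pure US-ASCII and is safe under ISO-8859-1.)

```python
# Rewrite non-ASCII characters as HTML numeric character
# references (NCRs) so the document can be served as
# US-ASCII/ISO-8859-1 without any charset mix-up.
text = "häntä kiinnostaa Unicode"
ncr_encoded = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ncr_encoded)  # h&#228;nt&#228; kiinnostaa Unicode
```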

> > | Unfortunately, that tag makes 3.0-vintage Navigators load 
> > | the document twice and generally lose their minds.
> > 
> > Come on. That's an obsolete excuse.
> 
> This goes in the category of doing things you know will break browsers 
> for no good reason other than "I can do it."  We should not be doing 
> such things, imo.

Using the <meta> thingy would be useful. Considering that the real HTTP 
headers will remain broken indefinitely, the <meta> workaround is the 
only thing that spares users from having to reach for the Character 
Encoding menu. (People who routinely read badly served non-ISO-8859-1 
pages may not have ISO-8859-1 or Windows-1252 as their default.)
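(For completeness, the workaround in question is the http-equiv form of <meta>, placed early in <head>; ISO-8859-1 is just the example here, any registered charset name works:)

```html
<head>
  <!-- Declares the encoding when the HTTP Content-Type
       header lacks a charset parameter. -->
  <meta http-equiv="Content-Type"
        content="text/html; charset=ISO-8859-1">
</head>
```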

> > | Add meta description and keywords to help indexing.
> > 
> > What's the concrete use case that justifies this requirement?
> 
> The fact that we may want to write 

Is it appropriate to require author effort until the piece of software 
justifying the requirement has actually been written?

> an indexing tool that does a better 
> job of searching _documentation_ in particular than Google does.  Google 
> indexes a whole lot of non-documentation crap on the Mozilla.org site.

PageRank should take care of less relevant documents appearing later in 
the results.

> Google doesn't use such metadata because out in the wide world it is 
> unreliable.  On our own website, it will be reliable, since we control 
> all of it.

It won't be reliable: authors are too lazy to include useful metadata, 
authors forget to keep the metadata up to date, keywords without a 
controlled thesaurus of accepted keywords aren't particularly useful, 
and badgering both authors *and* the people doing the searches to use 
one properly is too difficult. (If you wanted to look up some 
documentation, would you want to learn the controlled set of keywords 
first?)

-- 
Henri Sivonen
[EMAIL PROTECTED]
http://www.iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html