Bug#292330: project: UTF-8 as default

Harald Thingelstad Wed, 26 Jan 2005 04:53:17 -0800

Package: project
Severity: wishlist
Tags: l10n


Most basic problems with the use of UTF-8 (both in programming languages
and in standard libraries) should have been fixed by now, and as I see it,
it's time to work towards easier system-wide integration of UTF-8.

By this, I don't mean enforcing this character encoding on the
whole Debian system, but seeing to it that:
1) Installing systems with UTF-8 is easier, also with locales that do
not strictly need it. UTF-8 as default is not necessarily my
ultimate goal (as the title suggests); the goal is having the option of
using UTF-8 (or other encodings) system-wide, no matter what languages
are chosen.
2) All Debian packages handle UTF-8 properly.

The problem with choosing one character encoding per language is
multilingual environments.
When one language suggests one encoding and another language suggests
something else, trying to mix these languages will always give you
unreadable text, one way or another.

As written in http://www.jw-stumpel.nl/stestu.html:
"Traditionally, for storing texts in various languages, special encoding
methods are used, for instance Latin-1 (1 byte per character) for
West-European languages with accented letters, KOI-8 for Russian, or
EUC-JP (2 bytes per character) for Japanese.

Only very limited 'mixing' of languages (..) is possible in these
systems."
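
A minimal Python sketch of that limitation, using the encodings named in
the quote. This is only an illustration: the sample words and the helper
name can_encode are assumptions made for this example, not something
taken from the page above.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# An arbitrary Russian sample word, stored the way a KOI8-R system would.
russian = "Привет"
koi8_bytes = russian.encode("koi8_r")

# A Latin-1 system reading the same bytes gets six unrelated letters.
as_latin1 = koi8_bytes.decode("latin-1")
print(as_latin1)

# A document mixing Cyrillic with Norwegian fits neither legacy encoding,
# but it does fit UTF-8.
mixed = "Привет, blåbær"
for encoding in ("koi8_r", "latin-1", "utf-8"):
    print(encoding, can_encode(mixed, encoding))
```

Decoding with Latin-1 never fails, because every byte value maps to some
character. That is why the wrong guess shows up as gibberish rather than
as an error.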


Some examples:
1)
I've been working in Eritrea lately, setting up computers in a school.
Eritreans have nine official languages, all treated equally. 
One is Arabic, using Arabic script of course.
Two, Tigre and Tigrinya, use an ancient script called Geez. It is
written left to right like Western scripts, but its more than two
hundred letters look nothing like Latin.
The rest of the languages use the Latin alphabet.
Adding to that, the official language in school, from secondary level
up, is English. That doesn't stop them from wanting to use their own
languages from time to time.
So the situation is this:
They'll mostly use English, but sometimes other languages, covering
up to three scripts. This applies to documents, file names, etcetera.
And even when using English desktop settings, they'll want to be able to
read these other scripts.
The only option is to use UTF-8 on the whole system, no matter what
language.

2)
There's an ethnic minority in this country of mine, called the Sami.
They have their own language. Basically they use Latin characters, but
with some additional letters that ISO-8859-1 lacks and UTF-8 covers.
The rest of us use ISO-LATIN-1, also called ISO-8859-1, a popular
eight-bit charset.
Now: most of us only see the Sami language occasionally. We can't even
read that other language, so it doesn't bother us if ISO-8859-1 is the
default. Debian-installer enforces it quite heavily.
But some people use both, some mostly one and some mostly the other.
So what do you make your default language, when one of them (the most
popular) will give you gibberish in every second word?
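
A small Python sketch makes the gap concrete. It assumes the Northern
Sami alphabet for the sample letters; the helper name missing_from is
made up for this illustration.

```python
# Letters the Northern Sami alphabet uses beyond plain a-z:
# á č đ ŋ š ŧ ž (written as escapes so the file encoding cannot interfere)
sami_letters = "\u00e1\u010d\u0111\u014b\u0161\u0167\u017e"

def missing_from(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)

# Only the first letter (a with acute) exists in ISO-8859-1.
print(missing_from(sami_letters, "iso-8859-1"))

# UTF-8 can represent all of them, so nothing is reported as missing.
print(missing_from(sami_letters, "utf-8"))
```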


So:
For people only using English, it doesn't matter. Nor does it matter
much in Western Europe.
But the rest of the world uses several languages and even several
scripts, especially when using computers, English-dominated as they are.
Character encodings that do not support all characters can only be used
for a few languages at a time. Red Hat solved this a long time ago, so
why can't we?

I think it's time to wake up and smell the coffee.

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-1-386
Locale: LANG=nb_NO, LC_CTYPE=nb_NO (charmap=ISO-8859-1)

