Package: project
Severity: wishlist
Tags: l10n
Most basic problems with the use of UTF-8 (both in programming languages and in standard libraries) should be fixed by now. As I see it, it's time to aim for easier, system-wide integration of UTF-8. By this I don't mean enforcing this character encoding on the whole Debian system. I mean making sure that:

1) Installing a system with UTF-8 is easier, even with locales that don't strictly need it. UTF-8 as the default is not necessarily my ultimate goal (as the title might suggest). The goal is to have the option of using UTF-8 (or other encodings) system-wide, no matter which languages are chosen.

2) All Debian packages handle UTF-8 properly.

Choosing one character encoding per language causes problems in multilingual environments. When one language suggests one encoding and another language suggests a different one, mixing these languages will always give you unreadable text, one way or another. As written in http://www.jw-stumpel.nl/stestu.html:

"Traditionally, for storing texts in various languages, special encoding methods are used, for instance Latin-1 (1 byte per character) for West-European languages with accented letters, KOI-8 for Russian, or EUC-JP (2 bytes per character) for Japanese. Only very limited 'mixing' of languages (..) is possible in these systems."

Some examples:

1) I've been working in Eritrea lately, setting up computers in a school. Eritrea has nine official languages, all treated equally. One is Arabic, which is of course written in Arabic script. Two of them, Tigre and Tigrinya, use an ancient script called Geez. It is written left to right like Western scripts, but it has more than two hundred letters that look nothing like Latin ones. The remaining languages use the Latin alphabet. In addition, English is the official language in school from secondary level upward. That doesn't stop people from wanting to use their own languages from time to time. So the situation is this: they'll mostly use English, but sometimes other languages as well, covering up to three scripts. This applies to documents, file names and so on.
Even when they use English desktop settings, they'll want to be able to read these other scripts. The only option is to use UTF-8 on the whole system, no matter what language.

2) There's an ethnic minority in my country called the Sami. They have their own language. It is written mostly with Latin characters, but it also has some additional letters that ISO-8859-1 does not cover. The rest of us use ISO Latin-1, also called ISO-8859-1, a popular eight-bit character set.

Now, most of us only see the Sami language occasionally. We can't even read it, so it doesn't bother us that ISO-8859-1 is the default, which debian-installer enforces quite heavily. But some people use both languages, sometimes more of one and sometimes more of the other. So which language do you make your default, when one of them (the more popular one) will give you gibberish in every second word?

So: for people who only use English, this doesn't matter, and it doesn't matter much in Western Europe either. But the rest of the world uses several languages and even several scripts, especially on computers, English-dominated as they are. A character encoding that doesn't support all characters can only be used for a few languages at a time. Red Hat solved this a long time ago, so why can't we? I think it's time to wake up and smell the coffee.

-- System Information:
Debian Release: 3.1
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-1-386
Locale: LANG=nb_NO, LC_CTYPE=nb_NO (charmap=ISO-8859-1)