On Tue, Nov 26, 2002 at 12:38:52AM +0530, Krishna Dagli wrote:
> > Why is a Unix program using UTF16BE (or UCS2BE) for its internal
> > representation of localization data?
>
> As per the upstream author:
> UTF16LE or UTF16BE shows that the file is Unicode (Gammu supports
> both). I use Unicode in the localisation data to avoid this problem:
> someone preparing localisation data for language X may have a
> different codepage set in their OS than the one on my PC, even though
> my codepage contains the same characters. Using Unicode avoids such
> problems - on my PC all the characters are displayed correctly too,
> and I can open the file in a Unicode editor and see the correct
> accented characters, etc.
Except that UTF16 is the absolute dumbest Unicode encoding in existence, inheriting compatibility problems from both the widechar and the multibyte encoding styles. The Unix convention is to use UTF8 as the encoding for such things, to maintain compatibility with C strings -- and with tools like diff. If you're stuck with changes to such files in your package, you must encode those changes in a format diff can understand, either using something like sharutils to store the binary data in a text format, or something like 'iconv' to convert the text to a sensible encoding.

-- 
Steve Langasek
postmodern programmer
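P.S. For what it's worth, the iconv route is a one-liner ("iconv -f UTF-16BE -t UTF-8 foo.utf16 > foo.utf8", and the reverse before shipping). Below is a minimal sketch of the same round trip in Python, in case you want it inside a build script; the filenames are made-up placeholders for this example, not anything from the Gammu source:

    # Convert a UTF16BE localisation file to UTF-8 so that diff/patch can
    # work with it, then convert it back before handing it to upstream.
    # "gsm_lang.utf16" / "gsm_lang.utf8" are hypothetical names.

    def to_utf8(src, dst):
        with open(src, "rb") as f:
            text = f.read().decode("utf-16-be")    # the upstream encoding
        with open(dst, "wb") as f:
            f.write(text.encode("utf-8"))          # diff-friendly form

    def to_utf16be(src, dst):
        with open(src, "rb") as f:
            text = f.read().decode("utf-8")
        with open(dst, "wb") as f:
            f.write(text.encode("utf-16-be"))      # restore upstream's encoding

    if __name__ == "__main__":
        to_utf8("gsm_lang.utf16", "gsm_lang.utf8")
        # edit / diff gsm_lang.utf8 here
        to_utf16be("gsm_lang.utf8", "gsm_lang.utf16")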