On Tue, Nov 26, 2002 at 12:38:52AM +0530, Krishna Dagli wrote:
> > Why is a Unix program using UTF16BE (or UCS2BE) for its internal
> > representation of localization data?
>
> As per the upstream author:
> UTF16LE or UTF16BE shows that the file is Unicode (Gammu supports
> both). I use Unicode in the localisation data to avoid this problem:
> someone preparing localisation data for language X may have a
> different codepage set in their OS than the one on my PC, even though
> my codepage contains the same characters. Using Unicode avoids such
> problems - on my PC all the characters are displayed correctly too,
> and I can open the file in a Unicode editor and see the correct
> accented characters, etc.
Except that UTF16 is the absolute dumbest Unicode encoding in existence, inheriting compatibility problems from both the widechar and the multibyte encoding styles. The Unix convention is to use UTF8 as the encoding for such things, to maintain compatibility with C strings -- and with tools like diff. If you're stuck with changes to such files in your package, you must encode those changes in a format diff can understand, either using something like sharutils to store the binary data in a text format, or something like 'iconv' to convert the text to a sensible encoding.

-- 
Steve Langasek
postmodern programmer
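P.S. For what it's worth, the iconv route is a one-liner ("iconv -f UTF-16BE -t UTF-8 foo.utf16 > foo.utf8", and the reverse before shipping). Below is a minimal sketch of the same round trip in Python, in case you want it inside a build script; the filenames are made-up placeholders for this example, not anything from the Gammu source:

    # Convert a UTF16BE localisation file to UTF-8 so that diff/patch can
    # work with it, then convert it back before handing it to upstream.
    # "gsm_lang.utf16" / "gsm_lang.utf8" are hypothetical names.

    def to_utf8(src, dst):
        with open(src, "rb") as f:
            text = f.read().decode("utf-16-be")    # the upstream encoding
        with open(dst, "wb") as f:
            f.write(text.encode("utf-8"))          # diff-friendly form

    def to_utf16be(src, dst):
        with open(src, "rb") as f:
            text = f.read().decode("utf-8")
        with open(dst, "wb") as f:
            f.write(text.encode("utf-16-be"))      # restore upstream's encoding

    if __name__ == "__main__":
        to_utf8("gsm_lang.utf16", "gsm_lang.utf8")
        # edit / diff gsm_lang.utf8 here
        to_utf16be("gsm_lang.utf8", "gsm_lang.utf16")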