Re: Can somebody explain to me what u32tochar in /lib/sh/unicode.c is trying to do?

John Kearney Tue, 06 Mar 2012 21:07:36 -0800

You really should stop using this function. It is just plain wrong, and
is not predictable.

It may encode BIG5 and SJIS, but it is more by accident than intent.

If you want to do something like this then do it properly.

Basically, all of the multibyte systems need a way to detect multibyte
characters: most of them rely on bit 7 to indicate a multibyte sequence,
or use VT100 SS3 escape sequences. You really can't just inject random
data into a text buffer. Even returning UTF-8 as a fallback is a bug.
The most that should be done in the error case is to return ASCII, and
I mean U+0000 to U+007F only, and to ignore or warn about any
unsupported characters.
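A minimal sketch of the fallback proposed above, in C (the language of bash's lib/sh/unicode.c). It assumes the caller only reaches this point after iconv is unavailable or has failed; the name `u32_to_ascii_fallback` and the warning text are hypothetical, not part of bash. Code points U+0000 to U+007F are emitted as a single byte; anything else is refused with a warning and a -1 return.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical ASCII-only fallback, assumed to run only when iconv is
   unavailable or has failed. s must have room for 2 bytes.
   Returns 1 if one byte was written, -1 if cp is not ASCII. */
static int
u32_to_ascii_fallback (uint32_t cp, char *s)
{
  if (cp <= 0x7F)
    {
      /* U+0000..U+007F are the same byte in every ASCII-compatible locale. */
      s[0] = (char) cp;
      s[1] = '\0';
      return 1;
    }

  /* Not representable: write nothing and say so instead of guessing. */
  fprintf (stderr, "warning: U+%04lX cannot be represented in this locale\n",
           (unsigned long) cp);
  s[0] = '\0';
  return -1;
}
```

Returning -1 rather than an empty "success" lets the caller tell U+0000 (one NUL byte written) apart from a character that could not be represented, and decide for itself whether to skip it or report an error.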

Using this function is dangerous and pointless.

I mean, seriously, in what world does it make sense to inject UTF-8 into
a Big5 string? Or indeed into an ASCII string? Code should behave like an
adult, not like a frightened kid. By which I mean it shouldn't pretend
it knows what it's doing when it doesn't; it should admit the problem so
that the problem can be fixed.

On 02/21/2012 04:28 AM, Chet Ramey wrote:
> On 2/19/12 5:07 PM, John Kearney wrote:
>> Can somebody explain to me what u32tochar is trying to do?
>>
>> It seems like dangerous code?
>>
>> From the context I'm guessing it's trying to make a Hail Mary pass at
>> converting UTF-32 to multibyte (not UTF-8 multibyte).
> 
> Pretty much.  It's a big-endian representation of a 32-bit integer
> as a character string.  It's what you get when you don't have iconv
> or iconv fails and the locale isn't UTF-8.  It may not be useful,
> but it's predictable.  If we have a locale the system doesn't know
> about or can't translate, there's not a lot we can do.
> 
> Chet
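To make the objection concrete, here is a hedged sketch of the behavior Chet describes: the code point written out as big-endian bytes. This is not bash's source; the name `u32_to_be_bytes` is hypothetical, and dropping leading zero bytes (so small values do not begin with NUL) is an assumption about how such a function would avoid an empty string.

```c
#include <stdint.h>

/* Hypothetical sketch: write cp as big-endian bytes, dropping leading
   zero bytes but always keeping at least one byte. s must have room for
   5 bytes. Returns the byte count, excluding the terminating NUL.
   Zero bytes after the first nonzero byte are still written. */
static int
u32_to_be_bytes (uint32_t cp, char *s)
{
  unsigned char bytes[4];
  int i, start, len;

  bytes[0] = (cp >> 24) & 0xFF;
  bytes[1] = (cp >> 16) & 0xFF;
  bytes[2] = (cp >> 8) & 0xFF;
  bytes[3] = cp & 0xFF;

  /* Skip leading zero bytes, stopping at the last byte. */
  for (start = 0; start < 3 && bytes[start] == 0; start++)
    ;

  len = 4 - start;
  for (i = 0; i < len; i++)
    s[i] = (char) bytes[start + i];
  s[len] = '\0';
  return len;
}
```

With this scheme U+4E2D (中) comes out as the bytes 0x4E 0x2D, which any ASCII-compatible locale reads as the text "N-". U+00E9 becomes a lone 0xE9, a byte with bit 7 set that a Big5 or SJIS decoder treats as the lead byte of a two-byte character and pairs with whatever follows. U+1F600 yields 0x01 0xF6 0x00, whose trailing zero byte silently shortens the C string.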
