On 23/09/12 05:03, Steven D'Aprano wrote:
On 23/09/12 04:29, Paul Crawford wrote:

What I hate about unicode was the idea of adopting 16-bit characters and
thus breaking so much byte-orientated code that was written, tested, and
integrated over the history of computing.

You make it sound like the Unicode Consortium hacked into people's
computers
and changed their existing 8-bit ASCII files into 16-bit UCS-2 files. I'm
pretty sure that never happened.

The point I was hoping to make was not to denigrate the desirability of a single universal character set, but to question the specific idea of the UCS-2 representation.

For example, it is (was?) the case that if you wanted proper multi-language support on Windows NT (and later), you had to rewrite any application to use 16-bit 'wide' character strings, thus breaking anything written in the past that assumed byte-orientated text.
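To make that concrete, here is a quick C sketch of the sort of change involved (my own untested illustration; _wfopen is the Microsoft CRT's wide-character counterpart of fopen and is Windows-only):

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* Legacy byte-orientated code: 8-bit chars throughout. */
        const char *name = "report.txt";
        FILE *f = fopen(name, "rb");

        /* The same call rewritten for the NT 'wide' world: every
         * string literal, buffer and library call changes type.
         * (_wfopen: Microsoft CRT / Windows only.) */
        const wchar_t *wname = L"report.txt";
        FILE *wf = _wfopen(wname, L"rb");

        if (f)  fclose(f);
        if (wf) fclose(wf);
        return 0;
    }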

And that is a *lot* of useful stuff that we are talking about: libraries, applications, storage devices, file compression utilities, etc.

Now you may have a point that the use of byte-orientated and NUL-terminated strings as developed for C/UNIX was possibly short-sighted, but in the context of 1960s/70s computing it was reasonable, and quite possibly necessary, in order to be usably fast on the hardware of the day.

UCS-2 breaks that by going 16-bit wide, with NUL upper bytes in the most common cases, and it requires a byte-order mark to cope with differing CPU architectures. Both problems should have been obvious at the time, so I don't know why it was adopted in that form.
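A small C sketch (again, just my own illustration of the two problems) shows why byte-orientated code chokes on it:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "Hi" in UCS-2, little-endian: every ASCII character carries
         * a NUL upper byte, so strlen() stops after one byte. */
        const char ucs2_le[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };
        printf("strlen() sees %zu byte(s)\n", strlen(ucs2_le));  /* 1 */

        /* The same text on a big-endian machine is a different byte
         * sequence, hence the need for a byte-order mark (U+FEFF). */
        const char ucs2_be[] = { 0x00, 0x48, 0x00, 0x69, 0x00, 0x00 };
        printf("strlen() sees %zu byte(s)\n", strlen(ucs2_be));  /* 0 */

        return 0;
    }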

UTF-8, on the other hand, allows a universal character set (and one much bigger than UCS-2) *and* it works with legacy code that relies on byte-represented text with NUL string terminators and all of the corresponding infrastructure built around that.
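For example (a sketch, assuming the text "café" encoded as UTF-8):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* UTF-8: ASCII characters keep their single-byte values, and
         * the bytes of a multi-byte sequence are always 0x80-0xFF, so
         * a NUL byte never appears inside a character. */
        const char *s = "caf\xC3\xA9";   /* "café": e-acute is C3 A9 */
        printf("%zu bytes, NUL-terminated as before\n", strlen(s)); /* 5 */

        /* Legacy strcpy()/strcat()/fopen()-style code passes this
         * through unchanged; only code that counts or inspects
         * individual *characters* has to know about UTF-8 at all. */
        return 0;
    }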

Regards, Paul

