On 23/09/12 05:03, Steven D'Aprano wrote:
> On 23/09/12 04:29, Paul Crawford wrote:
>> What I hate about unicode was the idea of adopting 16-bit characters and
>> thus breaking so much byte-orientated code that was written, tested, and
>> integrated over the history of computing.
> You make it sound like the Unicode Consortium hacked into people's
> computers and changed their existing 8-bit ASCII files into 16-bit UCS-2
> files. I'm pretty sure that never happened.
The point I was hoping to make was not to denigrate the desirability of
a single universal character set, but to question the specific idea of
UCS-2 representation.
For example, it is (was?) the case that if you wanted to make proper use
of the multi-language support on Windows NT (and later) you had to
rewrite any application to use 16-bit 'wide' character strings, thus
breaking anything written in the past that assumed byte-orientated text.
And that is a *lot* of useful stuff that we are talking about:
libraries, applications, storage devices, file compression utilities, etc.
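
To make that concrete, here is a minimal C sketch of the sort of change
involved (the function names are just illustrative, not from any real
codebase): the traditional routine works on char and strlen(), while the
'wide' build of the same thing needs wchar_t, L"..." literals and wcslen()
instead.

    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>

    /* Legacy byte-orientated version: NUL-terminated char string. */
    size_t byte_length(const char *s)
    {
        return strlen(s);            /* counts bytes up to '\0' */
    }

    /* Wide-character version needed for UCS-2/UTF-16 text: the string
       type, the literal syntax and the library call all change. */
    size_t wide_length(const wchar_t *s)
    {
        return wcslen(s);            /* counts wchar_t units up to L'\0' */
    }

    int main(void)
    {
        printf("%zu\n", byte_length("hello"));    /* 5 */
        printf("%zu\n", wide_length(L"hello"));   /* 5, but a different API */
        return 0;
    }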
Now you may have a point that the use of byte-orientated and
NUL-terminated strings as developed for C/UNIX was possibly
short-sighted, but in the context of 1960s/70s computing it was
reasonable, and quite possibly necessary, in order to be usably fast on
the hardware of the day.
UCS-2 breaks that by going 16-bit wide, with NUL upper bytes in most
common cases, and it requires a byte-order mark to cope with differing
CPU architectures. Both problems should have been obvious at the time,
so I don't know why it was adopted in that form.
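
A minimal illustration of the NUL problem (the byte values are simply what
little-endian UCS-2 produces for ASCII text):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "Hi" as little-endian UCS-2, preceded by the byte-order mark
           U+FEFF, which appears as the bytes 0xFF 0xFE. */
        const char ucs2_le[] = { '\xFF', '\xFE', 'H', '\0', 'i', '\0', '\0', '\0' };

        /* Byte-orientated code hits the 0x00 high byte of 'H' and thinks
           the string ends there: it reports 3 bytes, not the 2 characters
           (plus BOM) actually stored. */
        printf("strlen sees %zu bytes\n", strlen(ucs2_le));
        return 0;
    }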
UTF-8, on the other hand, allows a universal character set (and one much
bigger than UCS-2) *and* it works with legacy code that relies on
byte-represented text with NUL string terminators and all of the
corresponding stuff built around that.
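
For comparison, a minimal sketch of why UTF-8 gets away with it: no byte of
any UTF-8 sequence is 0x00, so the old byte-orientated string functions
still behave sensibly.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* UTF-8 for "héllo": 'é' is the two bytes 0xC3 0xA9, and no NUL
           bytes appear anywhere in the encoded text. */
        const char *utf8 = "h" "\xC3\xA9" "llo";

        printf("strlen sees %zu bytes\n", strlen(utf8));   /* 6 */

        /* strcpy(), strcmp(), fopen() and friends keep working unchanged;
           only code that counts *characters* has to learn about UTF-8. */
        char copy[16];
        strcpy(copy, utf8);
        printf("copy: %s\n", copy);
        return 0;
    }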
Regards, Paul
_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users