On 23/09/12 04:29, Paul Crawford wrote:
> What I hate about unicode was the idea of adopting 16-bit characters and thus breaking so much byte-orientated code that was written, tested, and integrated over the history of computing.
You make it sound like the Unicode Consortium hacked into people's computers and changed their existing 8-bit ASCII files into 16-bit UCS-2 files. I'm pretty sure that never happened. The actual problem was with the application writers, for failing to distinguish between file formats correctly.

If people failed to distinguish PNG files from GIF files (say, they decided to keep the .gif file extension for PNG files), would you blame PNG for breaking the code? Would you insist that all progress in creating better image formats should cease, that 8-bit colour is enough for everybody? That people should just stick to "plain images"? Of course not. You would recognise that the fault lay with the people who stupidly decided that there was no need to distinguish between GIF and PNG, and with the programs that were already broken because they made certain assumptions about the data they would be given but didn't cope well when those assumptions were violated.

The problem you describe predates Unicode. The same problem occurs with the older "code page" standards such as Latin-1, so-called "ANSI text", and dozens of other encodings that predate Unicode, some of which were multibyte. The actual problem was two-fold:

* The writers of ASCII text editors and ASCII-only tools foolishly believed that there was such a thing as "plain text". Historically, that is understandable. That some people continue to think so is unforgivable willful ignorance.

* The writers of text editors foolishly had no mechanism for accurately determining the format used. On Unix, they assumed there was only one text format. On Windows, they used the same .txt file extension for all text files, regardless of format.

The second is equivalent to insisting that all image files (JPEGs, TIFFs, GIFs, PNGs, and dozens of others) should either have no file extension at all, or all use (say) ".bmp".
That's fine, *if* you write your program to detect formats you can't deal with and gracefully decline to handle them. But people didn't do this, because they had this idea that they were dealing with "plain text" instead of dozens of different formats.

"Plain text" is one of the most pernicious, harmful, and *idiotic* memes in computing, about up there with the idea that you only need two digits to specify the year. There has *never* been such a thing as "plain text": ASCII post-dates older text encodings such as IBM's BCD codes (the ancestors of EBCDIC), and there have *always* been multiple single-byte text formats, to say nothing of different conventions for line endings.

Adding multi-byte Unicode didn't create the problem. It just made the problem obvious to those who were ignorant of it because they hardly ever interchanged "plain text" files between (say) Unix and DOS, or Windows and Macintosh, or IBM mainframes and Commodore home computers. (And when they did, *both* sides grumbled that the *other* side didn't know what "plain text" was.)

That's at least six different "plain texts" right there:

- ASCII with \n line endings
- ASCII with \r\n line endings and ^Z end-of-file marker
- "extended ASCII" with any of dozens of different code pages
- Mac 8-bit "extended ASCII" character set (MacRoman)
- EBCDIC
- PETSCII

People got away with these wrong-headed assumptions for so long because, before the Internet, folks hardly ever interchanged text with users of different languages and formats. But that was then, this is now, and interchanging text in different languages and formats is all we do on the Internet. Every file has a filename, and every webpage is text.

Unicode is the solution to these dozens of incompatible text formats; it is not the cause. The sooner people stop pining for a Golden Age of "good ol' plain ASCII text" that never existed, and start using Unicode, the better off we'll all be.
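The argument above can be made concrete with a small Python sketch. Everything in it is illustrative rather than taken from the discussion: the byte value 0xE9, the choice of codecs, and the hypothetical `sniff_text` helper with its heuristics (byte-order marks, NUL bytes as a hint of BOM-less UTF-16, a UTF-8 validity check, and line-ending counts) are all assumptions. The first part shows that a single byte means something different under Latin-1, MacRoman, DOS code page 437 and EBCDIC (code page 037). The second part follows the "detect formats you can't deal with and gracefully decline" approach: it identifies the few cases it can be reasonably sure of and raises an error for the rest instead of guessing.

```python
import codecs

# --- Part 1: one byte, four legacy "plain text" encodings ---------------
# Codec names are those built into Python's codecs module; cp037 is EBCDIC.
LEGACY_ENCODINGS = ["latin-1", "mac_roman", "cp437", "cp037"]

def decode_all(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Return how each encoding interprets the very same bytes."""
    return {enc: raw.decode(enc) for enc in encodings}

# --- Part 2: detect what we can handle, decline the rest ----------------
class UnsupportedText(ValueError):
    """Raised when the input is not a text format this sketch can identify."""

def line_endings(text: str) -> str:
    """Classify line endings as 'CRLF', 'LF', 'CR', 'mixed' or 'none'."""
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf  # bare LF (Unix)
    cr = text.count("\r") - crlf  # bare CR (classic Mac OS)
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"

def _decode(raw: bytes, encoding: str) -> str:
    """Decode, turning codec errors into UnsupportedText."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        raise UnsupportedText(f"not valid {encoding}: {exc}") from None

def sniff_text(raw: bytes) -> tuple[str, str]:
    """Hypothetical helper: return (encoding, line-ending style) or raise."""
    # UTF-32 BOMs are checked first because the UTF-32-LE BOM begins with
    # the UTF-16-LE BOM; this sketch simply declines UTF-32.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        raise UnsupportedText("UTF-32 is not handled by this sketch")
    # A byte-order mark is the only reliable in-band hint of the encoding.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", line_endings(_decode(raw, "utf-8-sig"))
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to choose the byte order.
        return "utf-16", line_endings(_decode(raw, "utf-16"))
    # ASCII characters stored as UTF-16 carry a NUL byte each, which is
    # exactly what trips up NUL-terminated, byte-oriented code.
    if b"\x00" in raw:
        raise UnsupportedText("NUL bytes: maybe BOM-less UTF-16, maybe binary")
    if all(b < 0x80 for b in raw):
        return "ascii", line_endings(raw.decode("ascii"))
    # Bytes >= 0x80 that are not valid UTF-8 could be Latin-1, MacRoman,
    # a DOS code page, EBCDIC, ...; the bytes alone cannot say which.
    return "utf-8", line_endings(_decode(raw, "utf-8"))

if __name__ == "__main__":
    for enc, ch in decode_all(b"\xe9", LEGACY_ENCODINGS).items():
        print(f"byte 0xE9 as {enc}: {ch!r}")
    for sample in (b"plain\r\n", "caf\u00e9\n".encode("latin-1")):
        try:
            print(sample, "->", sniff_text(sample))
        except UnsupportedText as exc:
            print(sample, "-> declined:", exc)
```

A caller would catch `UnsupportedText` and tell the user, rather than silently mangling the file. The sketch deliberately refuses to choose among 8-bit code pages: as the first part shows, the bytes alone do not say which one was intended, and heuristic detectors can only guess.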
-- 
Steven
_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users