On 2020-12-29, Walter Dnes <[email protected]> wrote:
> On Tue, Dec 29, 2020 at 05:11:36PM +0200, Andreas K. Huettel wrote
>> Hi Walter,
>>
>> > "-pch -roaming -sendmail -spell -tcpd -udev -udisks -unicode -upower
>> > -xinerama"
>>
>> mostly out of curiosity, why do you want to disable unicode support
>> here?
>>
>> This feels odd to me since utf8 has effectively become the standard
>> encoding over the past years.
>
> I don't know if this has improved over the years, but my initial
> experience with unicode was rather negative. The fact that text
> files were twice as large wasn't a major problem in itself. The
> real showstopper was that importing text files into spreadsheets
> and text-editors and word processors failed miserably.
You must be talking about some sort of weird "wide" encoding (is
there such a thing as UTF-16?). I've never seen a file like that.
Everybody and everything uses UTF-8 these days and has for years.
UTF-8 is a superset of ASCII, and doesn't increase the size of the
file unless non-ASCII characters are used. Converting an ASCII file
to UTF-8 encoding is a no-op. An ASCII file _is_ UTF-8.

> I looked at a unicode text file with a binary viewer. It turns out
> that a simple text string like "1234" was actually...
>
> "1" binary-zero "2" binary-zero "3" binary-zero "4" binary-zero, etc.
>
> This padding explains why the file was twice as large, and also why
> "a simple textfile import" failed miserably.

I've never seen a file like that. All the Unicode I run into is
UTF-8, and a UTF-8 file with the string "1234" is the same exact 4
bytes as an ASCII file with the string "1234".

> On top of that Cyrillic letters like "m", "i", "c", and "o" are
> considered different from their English equivalents. Security experts
> showed proof-of-concept attacks where clicking on "microsoft.com" can
> take you to a hostile domain (cue the jokes). I don't speak or read
> or write any languages which have thousands of unique characters.
> Seeing Chinese spam "as it was intended to be seen", is not a priority
> for me.
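
Going back to the "1234" example: the byte comparison is easy to check.
Here is a minimal Python sketch. It assumes the file with the binary
zeros was UTF-16 little-endian without a byte-order mark, which is only
a guess based on the byte pattern described; the variable names are
made up for illustration.

```python
# Encode the same four-character string three ways and compare the bytes.
text = "1234"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
# Assumption: "utf-16-le" (little-endian, no byte-order mark) is the
# encoding that yields digit, zero, digit, zero, ...
utf16_bytes = text.encode("utf-16-le")

print(ascii_bytes)                # b'1234'
print(utf8_bytes)                 # b'1234'
print(utf16_bytes)                # b'1\x002\x003\x004\x00'
print(ascii_bytes == utf8_bytes)  # True
print(len(utf8_bytes), len(utf16_bytes))  # 4 8
```

So for pure ASCII text the UTF-8 file is byte-for-byte the ASCII file,
and only the 16-bit encoding doubles the size.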

