>>>>> Paul Eggert <egg...@cs.ucla.edu> writes:
>>>>> On 05/29/2012 06:11 AM, Reuben Thomas wrote:

 >> I find UTF-8 to be a great boon precisely for making plain text more
 >> legible.

	I'd say that it allows the machine to discern certain things
	better.  For example, it distinguishes the “ambivalent” quote
	(', as used in programming languages, with the notable
	exception of M4, which pairs it with `) from the proper
	typographic single quotes (‘ and ’), an arrow from an
	ASCII-based C (or GNU R) construct, etc.

 > UTF-8 is sometimes necessary and usually works, but even today it
 > fails often enough that I'd rather avoid it if it's merely a minor
 > style issue such as arrows.  For example, if from my Fedora desktop I
 > run plain "ssh" into a random Solaris 11 host and try to paste that
 > "→" into Emacs, Emacs says "Regexp I-search backward:",

	The problem is that the eighth bit (bit 7), which ASCII leaves
	undefined, has historically been used for several purposes, one
	of which is to indicate the use of the Meta key.

	Now, UTF-8 encodes the arrow (U+2192) as follows:

$ enable -n printf ; LC_ALL=en_US.UTF-8 printf \\u2192 | od -t o1
0000000 342 206 222
0000003
$

	With the high bit of each byte read as Meta, Emacs interprets
	this as M-b C-M-f C-M-r, or, given the bindings currently in
	effect in my Emacs instance (I assume they're the defaults):
	backward-word, forward-sexp, isearch-backward-regexp.

 > and if I try to visit a file containing the "→" I see "?".  I'm sure
 > that I can work around this issue with the proper ssh flags and
 > environment settings and whatnot, but who has the time?

	I've never seen an SSH that isn't 8-bit clean, but you may
	still need to set a UTF-8 locale (e.g., en_US.UTF-8 in GNU;
	I'm not sure about Solaris) and check your terminal emulator's
	settings.

	As for Emacs, I guess that (set-language-environment "UTF-8")
	is sufficient.

-- 
FSF associate member #7257
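
	For anyone who wants to check the byte arithmetic by hand, here
	is a minimal Python sketch.  It assumes a simplified model in
	which every input byte with the high bit set is read as Meta
	plus the remaining 7-bit character; the helper name describe_key
	is made up for this illustration and is not an Emacs function,
	and real Emacs input handling depends on its input-meta-mode and
	terminal settings.

```python
# Sketch only: models "high bit means Meta" terminal input.
# describe_key is a hypothetical helper, not part of Emacs.

def describe_key(byte):
    """Return an Emacs-style key name for a single input byte."""
    meta = bool(byte & 0x80)   # high bit set: read as the Meta modifier
    code = byte & 0x7F         # the remaining 7-bit ASCII code
    if code < 0x20:
        # Control characters 0x00..0x1F map to C-@ .. C-_
        base = "C-" + chr(code + 0x40).lower()
    elif code == 0x7F:
        base = "DEL"
    else:
        base = chr(code)
    if not meta:
        return base
    # Emacs conventionally writes Control before Meta, e.g. C-M-r
    if base.startswith("C-"):
        return "C-M-" + base[2:]
    return "M-" + base


encoded = "\u2192".encode("utf-8")   # UTF-8 bytes of the arrow, U+2192

# Octal dump of the bytes, in the same notation as od -t o1
print(" ".join(format(b, "03o") for b in encoded))

# The same bytes read as key presses under the model above
print(" ".join(describe_key(b) for b in encoded))
```

	Under this model the three bytes come out as M-b, C-M-f and
	C-M-r; the last one matches the "Regexp I-search backward:"
	prompt quoted above.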