Re: UTF-8 in jessie

Ian Jackson Wed, 28 Aug 2013 08:22:27 -0700

Adam Borowski writes ("UTF-8 in jessie"):
> I would like to propose full UTF-8 support.  I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a""combining ¨" is out of scope of this proposal.


I agree with everything you propose except that I have one reservation
regarding this:

> 4. all text files should be encoded in UTF-8

I agree with this except that I think it should be permitted that a
text file uses ASCII codepoints.

You may say "but UTF-8 is a superset of ASCII".  Well, no, it isn't.
UTF-8 is a superset of ISO-646 but ISO-646 is not identical to ASCII.
In particular the descriptions of the codepoints ` ' in ISO-646
effectively forbid them from being used as matching single quotes,
despite that being specified as allowed in ASCII.

I don't think that better UTF-8 support should involve needlessly
converting 7-bit ASCII text files which use ` ' as matched quotes,
into UTF-8 text files which use non-ISO-646 codepoints.
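
To illustrate the byte-level side of this point, here is a minimal sketch,
assuming Python 3. The function names and the sample string are hypothetical
and not taken from this thread. It only shows that a 7-bit file using ` ' as
quotes already decodes as UTF-8 unchanged, so any "conversion" would be an
editorial replacement of the quote characters rather than a re-encoding.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit range 0..127."""
    return all(b < 0x80 for b in data)


def needs_reencoding(data: bytes) -> bool:
    """Return True only if the bytes are not already valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical file content using ` ' as matched quotes.
sample = b"Use `quoted' text like this.\n"

print(is_seven_bit(sample))
print(needs_reencoding(sample))
# Decoding and re-encoding leaves the bytes untouched.
print(sample.decode("utf-8").encode("utf-8") == sample)
```

A check along these lines could let a migration tool skip 7-bit files
entirely and touch only files that fail to decode as UTF-8.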

(In fact I would like to see Markus Kuhn's decision about ` ' reversed
- our default character set should be ASCII for 0..127 plus UTF-8 for
the rest.  That's not an argument I expect to win but at the very
least we shouldn't have to worsify things for ASCII users.)

Thanks,
Ian.


