On Tue, Dec 10, 2024 at 02:52:05PM +0100, Gioele Barabucci wrote:
> NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.
>
> The IdentifierClass of RFC 8264 explicitly disallows all these "security
> land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3
>
> The "Security considerations" section is quite extensive (5 pages long):
> https://www.rfc-editor.org/rfc/rfc8264.html#section-12

Oh, good. I was just getting worried when discussion on the list seemed to be treating NFC as a silver bullet, and people were suggesting that the canonicalization should be done both by readers and writers of /etc/passwd --- which would imply linking libunicode into setuid programs like sudo and login, with the (to my view) inevitable results of hilarity ensuing.

As I look at RFC 8264, I note that it does not take a position about which version of Unicode should be considered canonical, and in fact talks about one of the features (tm) of RFC 8264 being that it is agile with respect to newer versions of Unicode. However, it should be noted that RFC 8264 also states that code points which are not defined in whatever version of Unicode is supported by "the application" shall be disallowed.

From Debian's perspective, though, if we are going to apply these rules to "the application(s)" that read and write /etc/passwd, we *will* need to take a position on what version of Unicode should be supported, and therefore, what set of characters will be disallowed. It also means that we need to be careful about what happens when we want to upgrade to newer versions of Unicode in future versions of Debian.

If the system administrator wants to support more than one version of Debian, then it would be advisable for the Unicode version to be configurable, especially if the passwd entries are being supplied via some kind of network protocol such as LDAP or Hesiod (for those people who remember MIT Project Athena :-P).

There is also the question (admittedly, only an edge case) of what to do if a newer version of Unicode disallows or removes characters. This rarely happens, but it has in the past (in particular in the case of various security disasters, or in the case of characters getting deprecated in favor of newer characters, many of which are mentioned in RFC 8264).
So we can probably just ignore this case and hope that the Unicode Consortium will be more careful in the future, but I thought I'd just mention it.

The bottom line is that while I am sympathetic to the desire to support Unicode --- heck, I was one of the primary drivers of libunicode into the kernel so we could support case folding for more than just the ASCII character set --- the meme of "One does not simply walk into Mordor" also applies to "adopting Unicode".

And I am reminded of one of my IETF mentors, an Internationalization expert, telling me two decades ago that, late at night, in the bar after a standards meeting, one of the things that I18N folks would say, just amongst themselves, was, "It would be easier just to teach everyone English" --- and this was with I18N experts who understood everything that was involved in doing full I18N support. No doubt this was only half-joking, but I think the point is valid.

So if we're going to do this, let's do it right. :-)

- Ted