Marc Haber left as an exercise for the reader:
> > * any upstream tool could say "bad idea" and refuse patches,
> >   requiring their long term management,
>
> Depending of how important this tool is, we could get away without
> patching and probably not even documenting this failure.

This kind of attitude seems self-defeating. Despite being *strongly* in
favor of this effort, I would oppose it if it were strictly a Debian
thing. We can inspire the move, but going it alone seems a recipe for
present and future pain (think SSHing from/to Debian and a non-Debian
machine).

> > * the Linux framebuffer console is pretty limited in what
> >   glyphs it has available, and the number of glyphs it can
> >   support,
>
> Probably, yes. But people working on the Linux framebuffer console are
> unlikely to actually use UTF-8 user names, so the only really bad

With all due respect, this seems totally unsupported by anything other
than vibes =].

> > * broken localization (or failure to call setlocale()) could be
> >   a bigger problem, especially for root/system accounts.
>
> I don't think we should allow UTF-8 characters in the string "root" or in
> system account names. And if a local admin decides to do so, Debian
> packages should still restrict themselves to using US-ASCII in their
> system accounts.

Why? This would require multiple code paths for what seems to me a very
questionable objective. You point out later in your response that
diverging code paths already exist, but isn't unifying such things
always a goal?

> Do you have a suggestion for a perl regexp that allows this? My current
> development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".

I do not. This is not a regex problem in my mind and experience; you
need full access to complicated libraries. Any such effort should go
through Unicode Standard Annex #15 normalization before being inspected
at all. At that point, you're well past regular languages so far as I
can tell. I do not see this goal as achievable with small surgeries on
the adduser code base, but rather as something that requires work
across the chain.

> > Names containing invalid UTF-8 sequences ought be rejected.
>
> Agreed. How do I check for this in perl?

I have no idea. It's not very simple.

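To make the ordering concrete (reject invalid UTF-8 first, then apply
Unicode Standard Annex #15 normalization, and only then inspect the
name), here is a minimal sketch. It is in Python rather than Perl,
purely for illustration, and uses only the standard library. The
function name canonicalize_name is hypothetical, and the choice of NFC
over the other normalization forms is an assumption, not something
adduser does today.

```python
import unicodedata
from typing import Optional


def canonicalize_name(raw: bytes) -> Optional[str]:
    """Return the NFC form of raw, or None if raw is not valid UTF-8.

    Hypothetical helper for illustration only; it is not part of adduser.
    """
    try:
        # Strict decoding raises on truncated, overlong, and
        # surrogate-encoding byte sequences.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
    # Assumption: NFC is the form wanted. After this step, canonically
    # equivalent spellings of a name are the same string.
    return unicodedata.normalize("NFC", text)
```

With this, "e" followed by U+0301 and the precomposed U+00E9 come back
as the same string, and malformed bytes are refused outright. It says
nothing about which characters a name policy should permit, about
grapheme clusters, or about what the rest of the toolchain does with
the result.
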
Here's code from my Notcurses library that extracts a single EGC
(extended grapheme cluster) from a UTF-8 string:

https://github.com/dankamongmen/notcurses/blob/a5c7d2262a333353bd5c3428c9397de4864c79ff/src/lib/egcpool.h#L87

> > My printer is administered by
> > i̸̒n̴͛e̵̎l̴͝u̷̾c̴̉t̵́å̵b̷͋l̷͐e̴̋m̸̆o̷̚d̴̐ä̸́l̶͝i̷̋t̷͗ẏ̷ȏ̵f̸̃t̶͘h̷͗e̴̿v̶͘i̷̛s̸̈́ì̵b̷̃l̶̎e̷͊.
>
> That really renders strangely here.

That was intended, to demonstrate the complexity of potential strings we
might have to deal with.

> > It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
> > properly set LANG and programs calling setlocale(). This, as
> > alluded to above, has the potential for a big mess.
>
> Our default is C.UTF-8 and has been like that for a while.

Yes, but that can be changed.

With all due respect, I admire your gung-ho can-do spirit, but adduser
alone is not IMHO the place. This is a major change requiring support
from libraries, applications, and UI to do right, and thus wide buy-in.
I love the idea, but it's not going to happen with a few Perl regexes.
Please don't read this as commentary on you or your code.

-- 
nick black -=- https://nick-black.com
to make an apple pie from scratch, you need first invent a universe.