Marc Haber left as an exercise for the reader:
> >  * any upstream tool could say "bad idea" and refuse patches,
> >    requiring their long term management,
> 
> Depending of how important this tool is, we could get away without
> patching and probably not even documenting this failure.

This kind of attitude seems self-defeating. Despite being
*strongly* in favor of this effort, I would oppose it if it were
strictly a Debian thing. We can inspire the move, but going it
alone seems a recipe for present and future pain (think SSHing
from/to Debian and a non-Debian machine).

> >  * the Linux framebuffer console is pretty limited in what
> >    glyphs it has available, and the number of glyphs it can
> >    support,
> 
> Probably, yes. But people working on the Linux framebuffer console are
> unlikely to actually use UTF-8 user names, so the only really bad

With all due respect, this seems totally unsupported by anything
other than vibes =].

> >  * broken localization (or failure to call setlocale()) could be
> >    a bigger problem, especially for root/system accounts.
> 
> I don't think we should allow UTF-8 charactes in the string "root" or in
> system account names. And if a local admin decides to do so, Debian
> packages should still restrict themselves to using US-ASCII in their
> system accounts.

Why? This would require multiple code paths for what seems to me a
very questionable objective. You point out later in your
response that there already exist diverging codepaths, but isn't
unifying such things always a goal?

> Do you have a suggestion for a perl regexp that allows this? My current
> development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".

I do not. This is not a regex problem in my mind and experience;
you need full access to complicated libraries. Any candidate name
should go through Unicode Annex #15 (UAX #15) normalization before
being inspected at all. At that point, you're well past regular
languages so far as I can tell. I do not see this goal as
possible with small surgeries on the adduser code base, but
rather something that requires work across the chain.
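
To illustrate why normalization has to come first, here is a
minimal sketch. It is in Python rather than Perl purely for
brevity, and the helper name same_username is hypothetical. It
shows two different code point spellings of one name comparing
equal only after NFC normalization as defined by UAX #15:

```python
import unicodedata

def same_username(a: str, b: str) -> bool:
    """Compare two names after NFC normalization (UAX #15)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "andr\u00e9"      # e-acute as one code point (U+00E9)
decomposed = "andre\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

print(composed == decomposed)               # False: raw code points differ
print(same_username(composed, decomposed))  # True: identical once normalized
```

A regex applied to the raw strings would see two distinct
names here, which is the kind of trouble I mean.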

> > Names containing invalid UTF-8 sequences ought be rejected.
> Agreed. How do I check for this in perl?

I have no idea. It's not very simple. Here's code from my
Notcurses library that extracts a single EGC (extended grapheme
cluster) from a UTF-8 string:

https://github.com/dankamongmen/notcurses/blob/a5c7d2262a333353bd5c3428c9397de4864c79ff/src/lib/egcpool.h#L87
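
For the narrower question of merely rejecting invalid byte
sequences, a strict decode is probably enough. The following is
a sketch in Python, not Perl, and the function name
is_valid_utf8 is made up for illustration. It says nothing about
whether a well-formed name is a *sensible* one:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8, False otherwise."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8("müller".encode("utf-8")))  # well-formed
print(is_valid_utf8(b"m\xfcller"))              # Latin-1 byte 0xFC, not UTF-8
print(is_valid_utf8(b"\xc0\xaf"))               # overlong encoding of "/"
print(is_valid_utf8(b"\xed\xa0\x80"))           # UTF-8-encoded surrogate
```

Only the first call should report a valid name; the other
three are the classic malformed cases a checker has to refuse.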

> > My printer is administered by 
> > i̸̒n̴͛e̵̎l̴͝u̷̾c̴̉t̵́å̵b̷͋l̷͐e̴̋m̸̆o̷̚d̴̐ä̸́l̶͝i̷̋t̷͗ẏ̷ȏ̵f̸̃t̶͘h̷͗e̴̿v̶͘i̷̛s̸̈́ì̵b̷̃l̶̎e̷͊.
> That really renders strangely here.

That was intended, to demonstrate the complexity of potential
strings we might have to deal with.
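
As a rough sketch of what that means in practice (Python again,
with a shortened two-letter stand-in for the name above), the
visible length, the code point count, and the byte count all
disagree:

```python
import unicodedata

# "in" with two combining marks stacked on each letter
name = "i\u0338\u0312n\u0334\u035b"

code_points = len(name)
# unicodedata.combining() returns a nonzero class for combining marks
combining = sum(1 for ch in name if unicodedata.combining(ch) != 0)
utf8_bytes = len(name.encode("utf-8"))

print(code_points, combining, utf8_bytes)  # 6 4 10
```

Two letters on screen, six code points, ten bytes. Any length
limit or character class has to say which of those it means.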

> > It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
> > properly set LANG and programs calling setlocale(). This, as
> > alluded to above, has the potential for a big mess.
> Our default is C.UTF-8 and has been like that for a while.

Yes, but that can be changed.

With all due respect, I admire your gung-ho, can-do spirit, but
adduser alone is not IMHO the place. This is a major change
requiring support from libraries, applications, and UI to do
right, and thus wide buy-in. I love the idea, but it's not going
to happen with a few Perl regexes. Please don't read this as
commentary on you or your code.

-- 
nick black -=- https://nick-black.com
to make an apple pie from scratch,
you need first invent a universe.
