On 04/05/2025 at 15:27, Marc Haber wrote:

It looks like \p{L} and the other Unicode character classes don't match anything if libperl is not installed.

According to my tests, they match at least ASCII letters, digits, regular ASCII space and non-breakable space.

So we just extend the regexp to match explicitly what would be in ISO-8859-x, yielding the kind of uncomfortable

commentre => qr/[-"_\.+!\$%&()\]\[;\/'’ A-Za-z0-9\x{a1}-\x{ac}\x{ae}-\x{ff}\p{L}\p{Nd}\p{Zs}]*/,

So this allows the safe special characters below 0x40, a regular space, the Latin letters in both cases, digits, the high-order characters that are different in any ISO-8859 charset (explicitly excluding the non-breaking space and soft hyphen), followed by the Unicode letters, Unicode digits and Unicode whitespace.
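To make the description above easier to check, here is a minimal sketch that runs a few strings through the proposed class. The helper name comment_ok and the \A...\z anchoring are assumptions added for illustration; they are not taken from adduser. The literals are decoded by "use utf8", so this does not reproduce locale-dependent input handling, and results for non-ASCII strings will depend on whether libperl is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals in this file are UTF-8

# The proposed character class, with the second range written as \x{ae}-\x{ff}.
my $commentre = qr/[-"_\.+!\$%&()\]\[;\/'’ A-Za-z0-9\x{a1}-\x{ac}\x{ae}-\x{ff}\p{L}\p{Nd}\p{Zs}]*/;

# Hypothetical helper: true only if every character of $s is in the class.
# The anchors are added here; without them a "*" pattern matches any string.
sub comment_ok {
    my ($s) = @_;
    return $s =~ /\A$commentre\z/ ? 1 : 0;
}

binmode(STDOUT, ':encoding(UTF-8)');
for my $s ('Marc Haber', 'àœæßéÀÔùñ', 'a,b', 'a:b') {
    printf "%-12s %s\n", $s, comment_ok($s) ? 'OK' : 'KO';
}
```

The comma and colon cases are included because neither character appears in the class, so they should be rejected regardless of the Unicode support available.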

My test results with àœæßéÀÔùñ:

* with libperl5.40 and perl & perl-modules-5.40
 * with LANG=fr_FR.UTF-8 or C.UTF-8
  \p{L}\p{Nd}\p{Zs}: OK
  \x{a1}-\x{ac}\x{ae}-\x{ff}: OK except œŒ
 * with LANG=C
  \p{L}\p{Nd}\p{Zs}: non-ASCII KO
  \x{a1}-\x{ac}\x{ae}-\x{ff}: non-ASCII KO

Note: with LANG=C and either the original or the new regex, adduser hangs indefinitely with high CPU load if the gecos field contains more than 5 non-ASCII characters. This does not happen without libperl5.40. It currently affects the installer.

* without libperl5.40 and perl, with or without perl-modules-5.40
 * LANG=fr_FR.UTF-8 or C.UTF-8 or C
  \p{L}\p{Nd}\p{Zs}: non-ASCII KO except à
  \x{a1}-\x{ac}\x{ae}-\x{ff}: àœæÆß and uppercase accented letters KO
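
For anyone who wants to repeat this kind of per-character check, a rough sketch follows. It is not the procedure used for the results above: the names %class and rejected_chars are made up for illustration, and the sample is decoded by "use utf8" rather than read through the locale, so changing LANG will not affect it the way it affects adduser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals in this file are UTF-8

# The two parts of the class, tested separately, one character at a time.
my %class = (
    'unicode props' => qr/\A[\p{L}\p{Nd}\p{Zs}]\z/,
    'latin1 ranges' => qr/\A[\x{a1}-\x{ac}\x{ae}-\x{ff}]\z/,
);

# Hypothetical helper: return the characters of $sample that $re rejects.
sub rejected_chars {
    my ($re, $sample) = @_;
    return grep { $_ !~ $re } split //, $sample;
}

binmode(STDOUT, ':encoding(UTF-8)');
my $sample = 'àœæßéÀÔùñŒ';
for my $name (sort keys %class) {
    my @ko = rejected_chars($class{$name}, $sample);
    printf "%s: %s\n", $name, @ko ? 'KO for ' . join('', @ko) : 'OK';
}
```

Running it once with libperl installed and once without should show which characters each part of the class loses.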

So, on a system without full perl (and probably with a non-UTF-8 locale), this will match most languages that have an ISO-8859 charset. In a full system, we have full Unicode support.

d-i always installs C.UTF-8, so there is at least one UTF-8 locale.

Would this help the installer?

It looks like a step forward, but the new regex still does not match some letters, or uppercase accented letters, when libperl is not installed.
