Package: less
Version: 551-1
Severity: important

[Severity set to important as this regression breaks scripts and most file formats, while keeping the cause from even showing up in "git diff" etc.]

Hi,

since v494 (released upstream in 2017 but not uploaded to Debian until a few days ago), control characters with the Unicode "category" property of Cf are ignored instead of being displayed as <1234> as before.

These characters generally cannot be displayed without reader-specific code for each particular character. Their function is either:
 * reformatting a piece of text, requiring a text processing engine with a view of a whole line (RTL redirections, etc) or paragraph (vertical formatting, ...)
 * invisible symbols ("tags", "invisible times" for math ("ab" as in "a*b"))

Other than U+FEFF ZERO WIDTH NO-BREAK SPACE (aka BOM), none of the above are seen in normal use.

Alas, some Windows text editors inject U+FEFF as the first character of text that's being saved. This goes against the explicit recommendation from Unicode, and is not even needed by new Microsoft products (which are finally transitioning to UTF-8). But you can't fix what's already out there. For this reason, less v494 started silently hiding U+FEFF so the text looks better.

Alas, this breaks text that's supposed to be interpreted by machines rather than humans -- and on a Unix system, that's the majority of text. We don't tend to write a "letter to Mom" as a text file, we write code or some markup. And that's broken by such invisible characters.

The main case is hashbangs. A script that has U+FEFF before #! does not invoke the specified interpreter but is run by some unspecified shell instead. This causes a mysterious failure -- or, worse, a different shell being used when the script is run interactively (from bash) than when it is invoked from a sh script.

With less <494, the cause was obvious the moment you glanced at the file. With less >=494, expect hours of troubleshooting, especially if you did not know to look for it. And less is not only the most used file viewer on its own, it's also used by git and others, letting stray U+FEFF escape review.

A proposed fix: let's revert both commits in 494.
The actual change is a two-liner -- but as less doesn't build a good part of its code from source, this requires manually calling "make -f Makefile.aut $SOMETARGET", thus reverting both is less work (obviously, we'd want to build from actual source for DFSG reasons, but that's not a topic for this bug).

Meow!

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'stable'), (150, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.3.1-00048-g49ab9d355af6 (SMP w/6 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages less depends on:
ii  libc6      2.29-2
ii  libtinfo6  6.1+20190803-1

less recommends no packages.

less suggests no packages.

-- no debconf information
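
For anyone who wants to check a file for the problem described in this report without relying on the pager: the following is a minimal sketch in Python, not anything taken from less itself. The function names find_format_chars and shebang_hidden_by_bom are made up for this illustration, and the sketch assumes the input is UTF-8. It lists every character whose Unicode general category is Cf, and separately tests whether a UTF-8 encoded U+FEFF sits directly in front of the #! line.

```python
import unicodedata


def find_format_chars(text):
    """Return (line, column, 'U+XXXX') for every character of category Cf.

    Lines and columns are 1-based and counted in code points.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


def shebang_hidden_by_bom(data):
    """True if the raw bytes start with a UTF-8 BOM directly followed by '#!'."""
    return data.startswith(b"\xef\xbb\xbf#!")
```

One would read the file in binary mode, pass the bytes to shebang_hidden_by_bom(), and pass the decoded text to find_format_chars(). Matching on the general category rather than on U+FEFF alone is a deliberate choice here, since the report concerns the whole Cf class and not just the BOM.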