Package: less
Version: 551-1
Severity: important

[Severity set to important as this regression hides the cause of breakage in
scripts and most file formats, so that it no longer shows up even in
"git diff" etc.]

Hi, since v494 (released upstream in 2017 but not uploaded to Debian until a
few days ago), format control characters (Unicode General_Category Cf) are
silently hidden instead of being displayed as <1234> as before.

These characters cannot generally be displayed without reader-specific code
for each particular character.  Their function is either:
 * reformatting a piece of text, which requires a text processing engine
   with a view of a whole line (RTL redirections, etc) or paragraph
   (vertical formatting, ...)
 * acting as invisible symbols ("tags", "invisible times" for math ("ab" as
   in "a*b"))
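
To make the affected set concrete, here is a rough Python sketch (not less's
actual code) that uses the standard unicodedata module to pick out category
Cf characters and render them visibly.  The helper name make_visible and the
exact <U+XXXX> notation are my own choices for illustration.

```python
import unicodedata


def make_visible(text: str) -> str:
    """Replace every Unicode category-Cf character with a <U+XXXX> marker."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Format control character: show its code point instead of hiding it.
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # U+FEFF (BOM) and U+2062 (INVISIBLE TIMES) are both category Cf.
    print(make_visible("\ufeff#!/bin/sh"))
    print(make_visible("a\u2062b"))
```

Piping suspicious text through something like this gives roughly the view
that less <494 provided by default.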

Other than U+FEFF ZERO WIDTH NO-BREAK SPACE (aka BOM) none of the above are
seen in normal use.  Alas, some Windows text editors inject U+FEFF as the
first character of text that's being saved.

This goes against an explicit recommendation from Unicode, and is not even
needed by new Microsoft products (which are finally transitioning to UTF-8).
But you can't fix what's already out there.

For this reason, less v494 started silently hiding U+FEFF so the text looks
better.  Alas, this breaks text that's supposed to be interpreted by machines
rather than humans -- and on a Unix system, that's the majority of text.
We don't tend to write a "letter to Mom" as a text file, we write code or
some markup.  And that's broken by such invisible characters.

The main case is hashbangs.  A script that has U+FEFF before #! no longer
starts with the bytes "#!", so instead of invoking the specified interpreter
it is run by some unspecified shell.  This causes a mysterious failure -- or,
worse, a different shell being used when run interactively (from bash) than
when invoked from an sh script.  With less <494, the cause was obvious the
moment you glanced at the file.  With less >=494, expect hours of
troubleshooting, especially if you were not looking for it.
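
As a stopgap while the character is invisible in less, the check can be done
on raw bytes.  Below is a minimal Python sketch, assuming the file is UTF-8
(where U+FEFF is encoded as EF BB BF); the function name bom_before_shebang
is hypothetical and only meant to illustrate the check.

```python
import sys

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def bom_before_shebang(data: bytes) -> bool:
    """Return True if data starts with a UTF-8 BOM directly followed by '#!'."""
    return data.startswith(UTF8_BOM + b"#!")


if __name__ == "__main__":
    # Usage: python3 check.py script1 script2 ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            head = f.read(5)  # 3 BOM bytes + 2 bytes for '#!'
        if bom_before_shebang(head):
            print(f"{path}: U+FEFF precedes the #! line")
```

This only covers the hashbang case; a stray U+FEFF elsewhere in a file would
need a full scan.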

And less is not only the most used file viewer on its own, it's also used as
a pager by git and others, letting stray U+FEFF escape review.


A proposed fix:
let's revert both commits in 494.  The actual change is a two-liner, but as
less doesn't build a good part of its code from source, applying it requires
manually calling "make -f Makefile.aut $SOMETARGET", so reverting both is
less work.  (Obviously, we'd want to build from actual source for DFSG
reasons, but that's not a topic for this bug.)


Meow!
-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'stable'), (150, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.3.1-00048-g49ab9d355af6 (SMP w/6 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages less depends on:
ii  libc6      2.29-2
ii  libtinfo6  6.1+20190803-1

less recommends no packages.

less suggests no packages.

-- no debconf information
