Hi,

Martijn van Duren wrote on Thu, Apr 01, 2021 at 09:30:36AM +0200:

> When it comes to these discussions I prefer to go back to the standards

I would propose an even more rigorous stance: not only go back to
the standards, but use whatever the Unicode data files (indirectly,
via the Perl modules) parsed by gen_ctype_utf8.pl specify.  Manually
changing properties of individual characters should be restricted
to very rare cases of crystal clear, absolutely unambiguous errors.
When there is the slightest doubt or when there are arguments both
ways, follow the Unicode data files and how Perl interprets them.

We have

  iswcntrl = 1  because UnicodeData.txt has class Cf (format control char)
  iswprint = 1  because the class is neither Cc nor Cs
  wcwidth  = 0  because the class starts with C (control char)

This is also neither obviously nor unambiguously wrong, so it should
not be changed.
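
To make the three rules above concrete, here is a minimal sketch in C
of how they map a general category from UnicodeData.txt to the three
properties.  The struct and function names are made up for illustration
and are not taken from gen_ctype_utf8.pl.  Treating class Cc as cntrl
and leaving the width of non-C classes undecided are assumptions of
this sketch, since only class Cf is discussed here.

```c
#include <string.h>

/* Hypothetical result type; not part of libc or gen_ctype_utf8.pl. */
struct ctype_props {
	int cntrl;	/* 1 if iswcntrl() should return non-zero */
	int print;	/* 1 if iswprint() should return non-zero */
	int width;	/* 0 for zero width, -1 if not decided here */
};

/*
 * Derive the properties from a two-letter general category
 * as found in UnicodeData.txt, for example "Cf" or "Lu".
 */
struct ctype_props
props_from_category(const char *gc)
{
	struct ctype_props p;

	/* Cf is the rule stated above; Cc is an assumption. */
	p.cntrl = strcmp(gc, "Cf") == 0 || strcmp(gc, "Cc") == 0;

	/* Printable unless the class is Cc or Cs. */
	p.print = strcmp(gc, "Cc") != 0 && strcmp(gc, "Cs") != 0;

	/* Every class starting with C gets width 0. */
	p.width = gc[0] == 'C' ? 0 : -1;

	return p;
}
```

With these rules, props_from_category("Cf") yields cntrl = 1,
print = 1, and width = 0, matching the table above.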

The choice of iswcntrl = 1 is most definitely correct because
that's what class Cf says; there can be no doubt about that at all.
Consequently, NetBSD, glibc, and musl are definitely buggy insofar
as they return iswcntrl = 0.
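
To check what a given libc actually returns, a small probe along the
following lines can be used.  This is only a sketch: the struct and
function names are hypothetical, the locale name has to be one that
the system really provides, and passing a code point such as 0xAD
(SOFT HYPHEN, one example of a class Cf character) assumes that
wchar_t values are Unicode code points on the platform.

```c
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical result type for this sketch. */
struct probe_result {
	int ok;		/* 0 if the locale could not be selected */
	int cntrl;	/* 1 if iswcntrl() returned non-zero */
	int print;	/* 1 if iswprint() returned non-zero */
};

/*
 * Select the LC_CTYPE locale by name and ask libc how it
 * classifies one wide character.
 */
struct probe_result
probe_char(const char *locale, wint_t wc)
{
	struct probe_result r = { 0, 0, 0 };

	/* Without a usable locale, the answers would be meaningless. */
	if (setlocale(LC_CTYPE, locale) == NULL)
		return r;

	r.ok = 1;
	r.cntrl = iswcntrl(wc) != 0;
	r.print = iswprint(wc) != 0;
	return r;
}
```

For example, probe_char("en_US.UTF-8", 0xAD) reports how the system
classifies that character, provided that locale is installed;
wcwidth(3) can be queried in the same way.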

Whether class Cf is always printable is maybe not absolutely clear.
There are arguments both ways.  The stronger argument seems to be
that these format control chars usually appear in the middle of
printable characters and they are printed together with the
surrounding characters.  But maybe the FreeBSD argument that
some of them are sometimes not printed and hence iswprint = 0
can also be made, though somewhat dubiously because sometimes
they are printed.  Besides, which property would you use for
deciding printability?  Please, don't resort to deciding that
character-by-character.

Whether all control chars are always width 0 can maybe also be
disputed.  Again, the stronger argument seems to me that they are.
If they weren't, they would not be control characters but alphanumeric,
punctuation, spaces, or special printable characters, none of which
they are.  I say width 1 and 2 require standalone glyphs that are
normally used for the character.  Besides, no operating system
correctly identifies this as a control character and yet gives it
width 1.

I insist that the discussion should remain very strictly formal,
about the properties and classification in the Unicode data files
and nothing else.  If people start arguing about what makes sense
for any particular character, that's already an argument going
astray.


> So going by this phrase the character should not be printed

When formatting a document, for example for printing on paper or
the online equivalent like PostScript or PDF, I agree.  But I
strongly prefer the terminal to always display this character because
the terminal's usual purpose is not nice text formatting for visual
consumption.  It should usually show the full content of strings
or files, be it for inspection or for editing.  Omitting characters
in such contexts sets nasty traps for the person working with the
terminal.

So I say nothing should be changed at all in OpenBSD.

Yes, that means column counting is wrong on the terminal, but that's
a very minor problem, if it's a problem at all, compared to the havoc
that could result from not showing the character on the terminal at
all, and it cannot be fixed without causing worse problems in situations
that matter more.

The bug in NetBSD and Linux should be fixed, but that's off-topic here.

Yours,
  Ingo
