Hi,

Martijn van Duren wrote on Thu, Apr 01, 2021 at 09:30:36AM +0200:

> When it comes to these discussions I prefer to go back to the standards

I would propose an even more rigorous stance: not only go back to the
standards, but use whatever the Unicode data files (indirectly, via
the Perl modules) parsed by gen_ctype_utf8.pl specify.

Manually changing properties of individual characters should be
restricted to very rare cases of crystal clear, absolutely unambiguous
errors.  When there is the slightest doubt or when there are arguments
both ways, follow the Unicode data files and how Perl interprets them.

We have

  iswcntrl = 1  because UnicodeData.txt has class Cf (format control char)
  iswprint = 1  because the class is neither Cc nor Cs
  wcwidth  = 0  because the class starts with C (control char)

This is also neither obviously nor unambiguously wrong,
so it should not be changed.

The choice of iswcntrl = 1 is most definitely correct because that's
what class Cf says, there can be no doubt about that at all.
Consequently, NetBSD, glibc, and musl are definitely buggy in so far
as they return iswcntrl = 0.

Whether class Cf is always printable is maybe not absolutely clear.
There are arguments both ways.  The stronger argument seems to be
that these format control chars usually appear in the middle of
printable characters and they are printed together with the
surrounding characters.  But maybe the FreeBSD argument that some
of them are sometimes not printed and hence iswprint = 0 can also
be made, though somewhat dubiously because sometimes they are printed.
Besides, which property would you use for deciding printability?
Please, don't resort to deciding that character-by-character.

Whether all control chars are always width 0 can maybe also be
disputed.  Again, the stronger argument seems to me that they are.
If they weren't, they would not be control characters but alphanumeric,
punctuation, spaces, or special printable characters, none of which
they are.  I say width 1 and 2 require standalone glyphs that are
normally used for the character.
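
To make the three rules listed above concrete, here is a minimal
sketch in Python.  It is not what gen_ctype_utf8.pl does internally;
it merely applies the three rules as stated in this mail to the
general category reported by Python's unicodedata module.  The
function name derived_properties is made up for illustration, and
U+00AD SOFT HYPHEN is used only as one sample character of class Cf.
Treating Cc like Cf for iswcntrl and leaving the width undecided for
classes not starting with C are assumptions of the sketch.

```python
import unicodedata


def derived_properties(ch):
    """Apply the three rules from the mail to a single character.

    Simplified sketch: real ctype tables handle many more cases.
    """
    cat = unicodedata.category(ch)  # general category, e.g. "Cf", "Cc", "Lu"
    return {
        "category": cat,
        # Rule 1: class Cf (format control) counts as a control character.
        # Assumption: class Cc is treated the same way.
        "iswcntrl": cat in ("Cc", "Cf"),
        # Rule 2: printable unless the class is Cc or Cs.
        "iswprint": cat not in ("Cc", "Cs"),
        # Rule 3: width 0 when the class starts with "C".
        # The mail does not cover other classes, so leave those as None.
        "wcwidth": 0 if cat.startswith("C") else None,
    }


if __name__ == "__main__":
    for sample in ("\u00ad", "\x07", "A"):
        print("U+%04X" % ord(sample), derived_properties(sample))
```

The point of the sketch is that all three properties follow
mechanically from one field of the data files, with no
character-by-character decisions.
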
Besides, no operating system correctly identifies this as a control
character and yet gives it width 1.

I insist that the discussion should remain very strictly formal,
about the properties and classification in the Unicode data files
and nothing else.  If people start arguing about what makes sense
for any particular character, that's already an argument going astray.

> So going by this phrase the character should not be printed

When formatting a document, for example for printing on paper or
the online equivalent like PostScript or PDF, i agree.

But i strongly prefer the terminal to always display this character
because the terminal's usual purpose is not nice text formatting for
visual consumption.  It should usually show the full content of
strings or files, be it for inspection or for editing.  Omitting
characters in such contexts sets nasty traps for the person working
with the terminal.

So i say nothing should be changed at all in OpenBSD.
Yes, that means column counting is wrong on the terminal, but that's
a very minor problem, if it's a problem at all, compared to the havoc
that could result from not showing the character on the terminal at
all, and it cannot be fixed without causing worse problems in
situations that matter more.

The bug in NetBSD and Linux should be fixed, but that's off-topic here.

Yours,
  Ingo