Eli Zaretskii wrote:
> > From: Bruno Haible <br...@clisp.org>
> > Cc: bug-texinfo@gnu.org
> > Date: Mon, 09 Oct 2023 18:15:05 +0200
> >
> > Eli Zaretskii wrote:
> > > unless the locale's codeset is UTF-8, any character that is not
> > > printable _in_the_current_locale_ will return -1 from wcwidth. I'm
> > > guessing that no one has ever tried to run the test suite in a
> > > non-UTF-8 locale before?
> >
> > I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,

Oops, typo: What I tested was the de_DE.ISO-8859-1 locale:
  $ export LC_ALL=de_DE.ISO-8859-1

> > texinfo 7.0.93 builds fine and all tests pass.

And likewise on FreeBSD 13.2 with
  $ export LC_ALL=de_DE.ISO8859-1

> > This character is U+0237 LATIN SMALL LETTER DOTLESS J. It *should* be
> > recognized as having a width of 1 in all implementations of wcwidth.
>
> But if U+0237 cannot be represented in the locale's codeset, its width
> can not be 1, because it cannot be printed. This is my interpretation
> of the standard's language (emphasis mine):
>
>   DESCRIPTION
>
>     The wcwidth() function shall determine the number of column
>     positions required for the wide character wc. The application
>     shall ensure that the value of wc is a character representable
>     as a wchar_t, and is a wide-character code corresponding to a
>     valid character in the current locale.
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   RETURN VALUE
>
>     The wcwidth() function shall either return 0 (if wc is a null
>     wide-character code), or return the number of column positions
>     to be occupied by the wide-character code wc, or return -1 (if
>     wc does not correspond to a printable wide-character code).
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Since U+0237 is not printable in my locale (it isn't supported by the
> system codepage), the value -1 is correct. Am I missing something?

True. But why don't we see the same test failure on glibc and on FreeBSD
systems, then, in a locale with ISO-8859-1 encoding?

> > This "simpler approximation" would not return a good result when wc
> > is a control character (such as CR, LF, TAB, or such). It is important
> > that the caller of wcwidth() or wcswidth() is able to recognize that
> > the string as a whole does not have a definite width.
>
> It is still better than returning -1, don't you agree?

No, I don't agree. Returning -1 tells the caller "watch out, you cannot
assume anything about the printed outline of this string".
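
To illustrate the point about -1, here is a minimal sketch of a caller that
propagates the "no definite width" signal instead of guessing. The helper
name display_width() is hypothetical and is not taken from Texinfo or
gnulib; it only assumes the POSIX wcwidth() behaviour quoted above.

```c
#define _XOPEN_SOURCE 700   /* ask <wchar.h> to declare wcwidth() */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of Texinfo): sum the column widths of a
   wide string.  If wcwidth() reports -1 for any character (a control
   character, or one that is not printable in the current locale), the
   whole string has no definite width, so -1 is returned to the caller. */
int display_width(const wchar_t *s)
{
    int total = 0;

    for (; *s != L'\0'; s++) {
        int w = wcwidth(*s);
        if (w < 0)
            return -1;      /* caller must not assume any printed outline */
        total += w;
    }
    return total;
}
```

A caller would check for a negative result before doing any column
arithmetic, e.g. treat display_width(L"a\tb") as "unknown" rather than as
a width of 2 or 3.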

> But for some reason you completely ignored my more general comment
> about what Texinfo needs from wcwidth.

That's because I am not familiar with the Texinfo code. I don't know
whether and where Texinfo calls wcwidth(), and I don't know with
which expectations it does so.

Bruno