On Sun, Oct 08, 2023 at 07:31:12PM +0300, Eli Zaretskii wrote:
> I see a very large diff, full of non-ASCII characters. A typical hunk
> is below:
>
> -(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ
> -(ȷ) ‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å)
> -‘@tieaccent{a}’ a͡ ‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ
> -(ạ) ‘@v{a}’ ǎ (ǎ) @,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)
> +(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ (ȷ)
> +‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å) ‘@tieaccent{a}’ a͡
> +‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ (ạ) ‘@v{a}’ ǎ (ǎ)
> +@,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)
>
> It looks like a filling problem to me, perhaps because something
> counts bytes instead of characters?
It's almost certainly a filling problem, as you say. In the C (XS) code, the return value of wcwidth is used for each character to compute the width of each line. The pure Perl code, as far as I know, doesn't use wcwidth; instead it keeps a count for each line based on regex character classes. The relevant code is in Texinfo/Convert/Unicode.pm, in the 'string_width' function.

Do you know whether the XS modules are in use? You could try "export TEXINFO_XS=omit" or "export TEXINFO_XS=require" to see whether it makes a difference. That would narrow down which version of the code has the problem (or whether both do).

I remember that in the past I broke up some of these lines to avoid test failures on a platform that gave different wcwidth results for some characters.

> The diffs like above are followed by diffs in the Index part, where it
> looks like the differences are just line counts:
>
> * Menu:
>
> -* truc: chapter. (line 2236)
> +* truc: chapter. (line 2234)
>
> Probably due to the same problem of incorrect filling of lines?

Yes, that follows from the different line-breaking decisions, and these parts of the diff will go away once the other problem is fixed.
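To make the bytes vs. characters vs. display-columns distinction in this thread concrete, here is a minimal C sketch that measures the same string three ways. It is not the Texinfo XS code: all function names are hypothetical, and it assumes a UTF-8 locale is installed under one of the names "C.UTF-8" or "en_US.UTF-8" (needed so that mbrtowc() decodes UTF-8). How negative wcwidth() results are handled is a choice made for this sketch only.

```c
#define _XOPEN_SOURCE 700  /* needed for the wcwidth() declaration in <wchar.h> */
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes in a UTF-8 string. */
static size_t count_bytes(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: number of code points, counting every byte that
   is not a UTF-8 continuation byte (10xxxxxx).  Needs no locale. */
static size_t count_code_points(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale.  Returns 1 on
   success, 0 if neither assumed locale name is available. */
static int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: display columns, summing wcwidth() over each
   decoded character.  Combining marks report width 0.  Non-printable
   characters (wcwidth() == -1) are counted as 0 in this sketch.
   Returns -1 on an invalid or truncated multibyte sequence. */
static int display_width(const char *s)
{
    mbstate_t state;
    size_t len = strlen(s);
    int width = 0;

    memset(&state, 0, sizeof state);
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;
        int w = wcwidth(wc);
        if (w > 0)
            width += w;
        s += n;
        len -= n;
    }
    return width;
}
```

For ‘a̋’ (a followed by U+030B COMBINING DOUBLE ACUTE ACCENT, "a\xCC\x8B" in C), count_bytes() gives 3 and count_code_points() gives 2, while display_width() typically gives 1 with glibc because combining marks have zero width. A filler that uses the first or second measure would therefore break lines earlier than one using the third, and other C libraries may return different wcwidth() values for some characters.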