Hi Adam,

On 2025-10-09 00:14:26 +0100, Adam Sampson wrote:
> On Thu, Oct 09, 2025 at 12:41:22AM +0200, Vincent Lefevre wrote:
> > In the Find dialogue box, one can double-click on a word to select
> > this word. But this does not work correctly when there are non-ASCII
> > letters. For instance, type "abcdéfg", and double-click on "b"; the
> > result is that only "abcd" is selected instead of the whole word.
> 
> I don't think I've got any control over that in xpopple -- it does the
> same in the text box in the Open dialog, and other Motif apps behave the
> same way (unless they don't set the locale at all).

So perhaps reassign the bug to libxm4.

> Looking at the Motif code, this behaviour is controlled by the FindWord
> and _XmTextFieldIsWordBoundary functions in lib/Xm/TextF.c.
> The latter looks like it will always treat characters that take up more
> than one byte in the encoding as word boundaries, which would explain
> this behaviour...

No, according to my analysis of the code, this is not exactly what
it does, and it seems to me that the code is inconsistent (so this
is clearly unintended). The code is

  if (tf->text.max_char_size == 1) { /* data is char* and one-byte per char */
    if (isspace((unsigned char)TextF_Value(tf)[pos1]) || 
        isspace((unsigned char)TextF_Value(tf)[pos2])) return True;
  } else {
    size_pos1 = wctomb(s1, TextF_WcValue(tf)[pos1]);
    size_pos2 = wctomb(s2, TextF_WcValue(tf)[pos2]);
    if (size_pos1 == 1 && (size_pos2 != 1 || isspace((unsigned char)*s1)))
      return True;
    if (size_pos2 == 1 && (size_pos1 != 1 || isspace((unsigned char)*s2)))
      return True;
  }
  return False;

For text with multibyte characters, this is the "else" case.

So, for "dé", 'é' is regarded as a space due to size_pos2 != 1.
Note that if one has 2 ASCII spaces, this also returns True.
But with "éé", one would have size_pos1 > 1 and size_pos2 > 1,
so that neither "if" is satisfied, and this returns False, i.e.
in such a case, 'é' is regarded as a non-space character.

In short, a non-ASCII letter (like 'é') is regarded as a space
only if the other adjacent character is an ASCII character.
This does not make sense!
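
To make the cases above easy to check, here is a minimal stand-alone
model of the "else" branch. It is only a sketch and not the Motif code:
the function name and its parameters are hypothetical, and the wctomb()
results are passed in directly (byte length and first byte of each
character) so that it does not depend on the locale.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical model of the multibyte branch quoted above.
 * size1/size2 stand for the values returned by wctomb() for the two
 * adjacent characters; b1/b2 stand for the first byte of each encoding.
 */
static bool boundary_model(int size1, unsigned char b1,
                           int size2, unsigned char b2)
{
    /* Same two tests as in the quoted code. */
    if (size1 == 1 && (size2 != 1 || isspace(b1)))
        return true;
    if (size2 == 1 && (size1 != 1 || isspace(b2)))
        return true;
    return false;
}
```

Assuming UTF-8, where 'é' is encoded on 2 bytes starting with 0xC3,
boundary_model(1, 'd', 2, 0xC3) is true (the "dé" case), while
boundary_model(2, 0xC3, 2, 0xC3) is false (the "éé" case).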

I think that either iswspace() should be used or non-ASCII characters
should consistently be regarded as non-space characters (this is not
always true, but probably the best behavior if iswspace() isn't used).
In the latter case, "size_pos2 != 1 ||" and "size_pos1 != 1 ||" would
just have to be removed.
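
The two options could look roughly like the following. This is only an
untested sketch against a stand-alone model, not a patch for TextF.c:
both function names are hypothetical, and the second function takes the
wctomb() results (byte length and first byte) as parameters instead of
calling wctomb() itself.

```c
#include <ctype.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Option 1 (hypothetical helper): test the wide characters directly
 * with iswspace(), mirroring what the one-byte branch does with
 * isspace().
 */
static bool boundary_iswspace(wchar_t c1, wchar_t c2)
{
    return iswspace((wint_t)c1) || iswspace((wint_t)c2);
}

/*
 * Option 2 (hypothetical helper): keep the wctomb()-based tests, but
 * drop "size_pos2 != 1 ||" and "size_pos1 != 1 ||", so that a
 * multibyte character is never regarded as a space.
 */
static bool boundary_ascii_space_only(int size1, unsigned char b1,
                                      int size2, unsigned char b2)
{
    if (size1 == 1 && isspace(b1))
        return true;
    if (size2 == 1 && isspace(b2))
        return true;
    return false;
}
```

With either variant, "dé" would no longer be split: for instance
boundary_ascii_space_only(1, 'd', 2, 0xC3) is false, and so is
boundary_iswspace(L'd', L'\xe9'), assuming iswspace() does not classify
'é' as a space in the current locale.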

-- 
Vincent Lefèvre <[email protected]> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
