Hi,

John Darrington wrote:
> In lib/isblank.c I see the following:
>
>   /* The "blank" characters are '\t', ' ',
>      U+1680, U+180E, U+2000..U+2006, U+2008..U+200A, U+205F, U+3000, and none
>      except the first two is present in a common 8-bit encoding.  Therefore
>      the substitute for other platforms is not more complicated than this.  */
>   return (c == ' ' || c == '\t');
>
> This is incorrect.  In iso-8859-1 (a very common 8-bit encoding), U+00A0 is
> the non-breaking-space character.

U+00A0 NO-BREAK SPACE is a glyph that carries no ink, but in other respects
it behaves like a non-blank punctuation character. In particular, its very
definition is that, unlike U+0020 SPACE, it is not an opportunity for line
breaking.

The function isblank() is not used in graphical rendering engines; it is
used in programs that do line breaking, such as 'fold':

coreutils/src/fold.c:178:
              if (isblank (to_uchar (line_out[logical_end])))

For this reason, isblank(U+00A0) *must* return false. Otherwise many
programs would treat it like U+0020 SPACE.

Bruno
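
To illustrate the line-breaking argument, here is a minimal sketch. It is not
the code of 'fold'; the helper last_break_opportunity() and the variant
blank_with_nbsp() are hypothetical names invented for this example, and the
input is assumed to be ISO-8859-1, where byte 0xA0 encodes U+00A0.

```c
#include <assert.h>
#include <stddef.h>

/* Classification as in the quoted lib/isblank.c fallback: space and tab only. */
int blank_space_tab(unsigned char c)
{
    return c == ' ' || c == '\t';
}

/* Hypothetical variant, for comparison only: additionally accepts 0xA0,
   which is NO-BREAK SPACE in ISO-8859-1. */
int blank_with_nbsp(unsigned char c)
{
    return c == ' ' || c == '\t' || c == 0xA0;
}

/* Hypothetical helper, loosely modelled on what a fold-like program does:
   return the index of the last blank among the first `width` bytes of
   `line` (which holds `len` bytes), or -1 if there is none. */
long last_break_opportunity(const char *line, size_t len, size_t width,
                            int (*is_blank)(unsigned char))
{
    assert(line != NULL && is_blank != NULL);
    size_t limit = len < width ? len : width;
    for (size_t i = limit; i > 0; i--)
        if (is_blank((unsigned char) line[i - 1]))
            return (long) (i - 1);
    return -1;
}
```

With the ISO-8859-1 string "100\xA0km away" and a width of 6, the space/tab
classification finds no break opportunity, so "100" and "km" stay together.
The variant that accepts 0xA0 reports a break at index 3, which is exactly
the break that NO-BREAK SPACE is meant to forbid.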