On Sat, Sep 13, 2025, at 8:46 AM, Greg Wooledge wrote:
> On Fri, Sep 12, 2025 at 11:23:17PM -0400, Lawrence Velázquez wrote:
> > On Fri, Sep 12, 2025, at 10:56 PM, Duncan Roe via Bug reports for the GNU
> > Bourne Again SHell wrote:
> > > Bash is not recognising U+00A0 as whitespace. What to do about it, if
> > > anything?
> >
> > I believe bash mostly tokenizes on <blank> characters.  Is U+00A0
> > considered a <blank> in your locale?
>
> I'm not sure that's true.

Yeah, I misunderstood how locale_setblanks [1] works.  Its function
comment is:

        Set every character in the <blank> character class to be a
        shell break character for the lexical analyzer when the
        locale changes.

But the function itself seems to consider only the values 0x00 through
0xFF, which are tabulated into syntax.c at build time [2].

  [1] https://cgit.git.savannah.gnu.org/cgit/bash.git/tree/locale.c?h=bash-5.3#n592
  [2] https://cgit.git.savannah.gnu.org/cgit/bash.git/tree/mksyntax.c?h=bash-5.3
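
For anyone who wants to poke at the tokenizer's behavior directly,
here is a rough sketch (not taken from the bash sources).  It counts
the words the parser sees when two letters are separated by a given
string.  The helper name count_words is made up for illustration,
and the NBSP bytes assume UTF-8 encoding.

```sh
#!/bin/sh
# Sketch only: count_words is a hypothetical helper, not part of bash.
count_words() {
    # $1 is the candidate separator.  eval re-parses the assembled
    # text, so the tokenizer (not IFS field splitting) decides how
    # many words result.
    eval "set -- a${1}b"
    echo "$#"
}

nbsp=$(printf '\302\240')   # U+00A0 encoded as UTF-8 (assumption)
tab=$(printf '\t')

count_words ' '       # ordinary space
count_words "$tab"    # tab
count_words "$nbsp"   # no-break space
```

Space and tab should each give two words.  What NBSP gives depends
on the shell and locale, which is the point of the experiment.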


> In my locale, only space and tab are considered :blank: out of the set
> space, tab, carriage return, newline, non-breaking space.  However, even
> if NBSP were considered a :blank:, I'm not sure the bash parser would
> care.  (In fact, I'd consider it a bug if it did.)

I suppose a theoretical locale with additional low <blank>s could
produce different behavior, whether acceptably or not.  There's
already at least one workaround for something like that:

https://cgit.git.savannah.gnu.org/cgit/bash.git/tree/locale.c?h=bash-5.3#n586
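
To check which of the characters discussed above a given locale
classes as <blank>, something like the following might do.  It is
only a sketch: is_blank and report are hypothetical helpers, the
NBSP bytes assume a UTF-8 locale, and a pattern match against
[[:blank:]] says nothing about how the parser itself tokenizes.

```sh
#!/bin/sh
# Sketch only: is_blank and report are hypothetical helpers.
is_blank() {
    # Succeeds if $1 is a single character in the locale's blank class.
    case $1 in
        [[:blank:]]) return 0 ;;
        *)           return 1 ;;
    esac
}

report() {
    # $1 = label, $2 = character to test
    if is_blank "$2"; then
        echo "$1: blank"
    else
        echo "$1: not blank"
    fi
}

tab=$(printf '\t')
cr=$(printf '\r')
nbsp=$(printf '\302\240')   # U+00A0 as UTF-8 (assumption)
nl='
'

report space "$(printf ' ')"
report tab "$tab"
report CR "$cr"
report newline "$nl"
report NBSP "$nbsp"
```

Run it under different LC_ALL or LC_CTYPE settings to compare locales.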


-- 
vq
