Hi Pádraig,
> A related question is, would it be useful to replace c32isblank() etc.
> to be IS30112¹ compliant or at least more standard?
>
> I.e. adjust c32isblank() to return true for:
> U+0009, U+0020, U+1680, U+180E?, U+2000..U+2006, U+2008..U+200A, U+205F,
> and U+3000
>
> Then on musl, macOS etc. c32isblank() etc. would behave much like glibc?
Thanks for mentioning ISO 30112; I wasn't aware of it.
As I see it, there are two projects regarding OS-independent i18n data:
CLDR and ISO 30112. Since ISO 30112 is based on glibc (see page 8 / 13),
if you want something that behaves like glibc, ISO 30112 is a reasonable
starting point, but the glibc sources would be more up-to-date.
I defined the c32is* and c32to* functions in Gnulib with the goal of
keeping the platform's character properties when possible. In other
words, I wanted the char32_t type to behave identically to the wchar_t
type when wchar_t represents a Unicode code point, as it does on glibc.
If we were to make the c32is* and c32to* functions behave differently
from isw* and tow*, people might start questioning the migration from
wchar_t to char32_t.
Instead, for your purpose, I would suggest that you use
- the Gnulib module uchar-h-c23 (not just uchar-h),
- the Gnulib module unictype/ctype-blank, with the function
uc_is_blank (that takes a Unicode code point as argument).
As you can see from gnulib/tests/unictype/test-ctype_blank.c,
this function returns true for exactly those Unicode code points
that you listed above (*). (The one exception is U+180E, for which it
returns false; this is irrelevant in practice, since no one uses
Mongolian scripts with command-line tools.)
(*) This is not a coincidence, but rather a consequence of the fact
that the gnulib tables are generated from gnulib/lib/gen-uni-tables.c,
which is a modified version of the generator that I had written for
glibc a couple of years earlier (and which has since been converted
from C to Python).
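For illustration only, here is a minimal sketch in C of the set of code
points discussed above. The function name is_blank_codepoint is
hypothetical and merely hard-codes the list from this thread (without
U+180E); it is not the Gnulib implementation. Real code should call
uc_is_blank from <unictype.h> instead, which uses the generated tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for uc_is_blank (), for illustration only.
   It hard-codes the code points listed in this thread, minus U+180E.
   In real code, include <unictype.h> and call uc_is_blank (uc).  */
static bool
is_blank_codepoint (uint32_t uc)
{
  return uc == 0x0009                        /* CHARACTER TABULATION */
         || uc == 0x0020                     /* SPACE */
         || uc == 0x1680                     /* OGHAM SPACE MARK */
         || (uc >= 0x2000 && uc <= 0x2006)   /* EN QUAD .. SIX-PER-EM SPACE */
         || (uc >= 0x2008 && uc <= 0x200A)   /* PUNCTUATION SPACE .. HAIR SPACE */
         || uc == 0x205F                     /* MEDIUM MATHEMATICAL SPACE */
         || uc == 0x3000;                    /* IDEOGRAPHIC SPACE */
}
```

Note that U+2007 (FIGURE SPACE) and U+00A0 (NO-BREAK SPACE) are not in
the list, so the sketch returns false for them.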
Bruno