Re: character class "alpha"

Bruno Haible via Cygwin Mon, 31 Jul 2023 14:37:57 -0700

Brian Inglis wrote:
> It seems to me that most application developers needing to support 
> non-Western-European languages might want a non-POSIX interpretation of 
> digits.


Sure. GNU libunistring has dedicated APIs for this:
  - https://www.gnu.org/software/libunistring/manual/html_node/Object-oriented-API.html
    UC_DECIMAL_DIGIT_NUMBER
  - https://www.gnu.org/software/libunistring/manual/html_node/Decimal-digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-objects.html
    UC_PROPERTY_DECIMAL_DIGIT
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-functions.html
    uc_is_property_decimal_digit

I'm sure ICU4C has similar APIs too.

> Are the Unicode character attribute classes supported for those application
> use cases that need more than POSIX limitations allow?

POSIX allows the libc to define additional character classes. But these will be
platform and locale dependent, and I don't know of any application which makes
use of such additional character classes via wctype() and iswctype().
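
For reference, looking up a class by name goes roughly like the sketch below.
It uses only the standard wctype() and iswctype() calls; the helper name
count_in_class and its "known" out-parameter are made up for this example.
Since wctype() returns 0 for a name the current locale does not define, a
caller relying on a non-standard class has to handle that case.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the wide characters of s that belong to
   the character class named class_name in the current locale.
   *known is set to 0 if the locale does not define that class. */
size_t count_in_class(const wchar_t *s, const char *class_name, int *known)
{
    assert(s != NULL && class_name != NULL && known != NULL);

    wctype_t desc = wctype(class_name);   /* 0 means "unknown class" */
    *known = (desc != (wctype_t) 0);
    if (!*known)
        return 0;

    size_t count = 0;
    for (; *s != L'\0'; s++)
        if (iswctype((wint_t) *s, desc))
            count++;
    return count;
}
```

With the standard names ("alpha", "digit", ...) this works everywhere; with
any other name the result depends on the platform and on the locale selected
via setlocale().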

> I know that I sometimes want to see some alternative numeric digit forms and 
> expect to be able to find those with an appropriate grep expression.

I think you can do so with GNU 'grep', if it was built with PCRE support
(option -P).
PCRE includes support for Unicode character classes.
<https://www.pcre.org/current/doc/html/pcre2pattern.html>

Bruno




-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
