The command-line Racket REPL doesn't handle Unicode input well. If you try
(regexp-match? #px"^[a-zA-Z]+$" "héllo") in DrRacket, or put it in a program
file and run it, you will find that it evaluates to #f: [a-zA-Z] matches only
ASCII letters.
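If you want something closer to Python's isalpha(), here is a small sketch you can run as a file. It assumes the pregexp \p{L} property class (any Unicode letter) and the built-in char-alphabetic? predicate. The helper name string-alpha? is hypothetical, and it only approximates isalpha(), since Unicode's Alphabetic property and Python's letter categories are not exactly the same set.

```racket
#lang racket
;; Sketch: ASCII-only character class vs. Unicode-aware alternatives.

;; [a-zA-Z] is a set of code-point ranges, so it covers ASCII letters only.
(define ascii-alpha-rx #px"^[a-zA-Z]+$")

;; \p{L} (pregexp syntax) matches any character in a Unicode letter category.
;; The backslash is doubled because it sits inside a string literal.
(define unicode-alpha-rx #px"^\\p{L}+$")

;; Hypothetical helper approximating Python's str.isalpha(): the string must
;; be non-empty and every character must satisfy char-alphabetic?.
(define (string-alpha? s)
  (and (positive? (string-length s))
       (for/and ([c (in-string s)])
         (char-alphabetic? c))))

(regexp-match? ascii-alpha-rx "héllo")   ; => #f
(regexp-match? unicode-alpha-rx "héllo") ; => #t
(string-alpha? "héllo")                  ; => #t
(string-alpha? "h1llo")                  ; => #f

;; é is encoded in UTF-8 as the two bytes 195 (#xC3) and 169 (#xA9),
;; which match the low bytes of the U+FFC3 U+FFA9 seen in the terminal REPL.
(bytes->list (string->bytes/utf-8 "é"))  ; => '(195 169)
```

This suggests the terminal REPL handed the regexp something other than "héllo", which is why the original test looked like a match.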

On Thu, Jul 9, 2020 at 7:19 AM Peter W A Wood <[email protected]> wrote:

> I was experimenting with regular expressions to try to emulate the Python
> isalpha() String method. Using a simple [a-zA-Z] character class worked for
> the English alphabet (ASCII characters):
>
> > (regexp-match? #px"^[a-zA-Z]+$" "hello")
> #t
> > (regexp-match? #px"^[a-zA-Z]+$" "h1llo")
> #f
>
> It then dawned on me that Python's isalpha() method was Unicode aware:
>
> >>> "é".isalpha()
> True
>
> I started scratching my head as to how to achieve the equivalent using a
> regular expression in Racket. I tried the same regular expression with a
> non-English character in the string. To my surprise, the regular expression
> recognised the non-ASCII characters.
>
> > (regexp-match? #px"^[a-zA-Z]+$" "h\U+FFC3\U+FFA9llo")
> #t
>
> Are Racket regular expression character classes Unicode aware or is there
> some other explanation why this regular expression matches?
>
> Peter
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/2197C34F-165D-4D97-97AD-F158153316F5%40gmail.com
> .
>

