The Racket REPL doesn’t handle Unicode well. If you try (regexp-match? #px"^[a-zA-Z]+$" "héllo") in DrRacket, or write it as a program in a file and run it, you will find that it evaluates to #f: the [a-zA-Z] character class matches only ASCII letters.
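
If it helps, here is a minimal sketch of the difference, meant to be saved in a file and run rather than typed at the REPL. The names ascii-alpha? and unicode-alpha? are made up for illustration, and the é is written as the escape \u00E9 so that terminal encoding cannot interfere. The second function assumes that the \p{L} property class (any Unicode letter) is close enough to what Python's isalpha() checks; I have not verified that the two agree on every character.

```racket
#lang racket/base

;; ASCII-only character class: rejects any letter outside a-z / A-Z.
(define (ascii-alpha? s)
  (regexp-match? #px"^[a-zA-Z]+$" s))

;; Unicode-aware variant (hypothetical helper): \p{L} matches any
;; character in a Unicode letter category. The backslash is doubled
;; because the pattern is written as a string literal.
(define (unicode-alpha? s)
  (regexp-match? #px"^\\p{L}+$" s))

(ascii-alpha? "hello")        ; ASCII letters only
(ascii-alpha? "h\u00E9llo")   ; contains é, outside [a-zA-Z]
(unicode-alpha? "h\u00E9llo") ; é is a Unicode letter
(unicode-alpha? "h1llo")      ; the digit 1 is not a letter
```

Run as a program, the four expressions should print #t, #f, #t and #f in that order.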

On Thu, Jul 9, 2020 at 7:19 AM Peter W A Wood <[email protected]> wrote:

> I was experimenting with regular expressions to try to emulate the Python
> isalpha() String method. Using a simple [a-zA-Z] character class worked for
> the English alphabet (ASCII characters):
>
> > (regexp-match? #px"^[a-zA-Z]+$" "hello")
> #t
> > (regexp-match? #px"^[a-zA-Z]+$" "h1llo")
> #f
>
> It then dawned on me that the Python isalpha() method was Unicode aware:
>
> >>> "é".isalpha()
> True
>
> I started scratching my head as how to achieve the equivalent using a
> regular expression in Python. I tried the same regular expression with a
> non-English character in the string. To my surprise, the regular expression
> recognised the non-ASCII characters.
>
> > (regexp-match? #px"^[a-zA-Z]+$" "h\U+FFC3\U+FFA9llo")
> #t
>
> Are Racket regular expression character classes Unicode aware or is there
> some other explanation why this regular expression matches?
>
> Peter
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/2197C34F-165D-4D97-97AD-F158153316F5%40gmail.com

