On Mon, Dec 24, 2012 at 2:51 AM, Albert-Jan Roskam wrote:
>
> First, check if the first character is a (unicode) letter
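You can use unicode.isalpha, with a caveat. On a narrow build, isalpha
fails for characters in the supplementary planes, because each one is
stored as a surrogate pair and neither half counts as a letter. That's
about 50% of all alphabetic characters, +/- depending on the version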
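
A minimal sketch of the "is the first character a letter" check, assuming Python 3.3 or later, where the narrow/wide build distinction no longer exists and each code point is a single string element. The function name starts_with_letter is made up for illustration. It tests the Unicode general category directly rather than calling isalpha:

```python
import unicodedata


def starts_with_letter(text):
    """Return True if the first code point of text is a Unicode letter."""
    if not text:
        return False
    # General categories beginning with "L" (Lu, Ll, Lt, Lm, Lo) are letters.
    return unicodedata.category(text[0]).startswith("L")


if __name__ == "__main__":
    for sample in ["\u00e9clair", "_name", "9lives", "\U00010400bc"]:
        # "\U00010400" is DESERET CAPITAL LETTER LONG I, outside the BMP.
        print(ascii(sample), starts_with_letter(sample))
```

On a narrow Python 2 build, text[0] for the last sample would be a lone surrogate, so the same check would report False there.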
>>Is the code below the only/shortest way to match unicode characters? I would
>>like to match whatever is defined as a character in the unicode reference
>>database. So letters in the broadest sense of the word, but not digits,
>>underscore or whitespace. Until just now, I was convinced that th
On Sat, Dec 22, 2012 at 11:12 PM, Steven D'Aprano wrote:
>
> No. You could install a more Unicode-aware regex engine, and use it instead
> of Python's re module, where Unicode support is at best only partial.
>
> Try this one:
>
> http://pypi.python.org/pypi/regex
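
For comparison, here is a rough sketch of what the standard re module can do without installing anything, assuming Python 3, where str patterns are Unicode-aware by default. The class [^\W\d_] means "a word character that is neither a decimal digit nor an underscore". It is only an approximation of "letters": \w may also cover some numeric characters that \d does not. The regex module linked above supports Unicode property classes such as \p{L}, which state the intent directly. The names LETTER_RUN and find_letter_runs are made up for illustration.

```python
import re

# A word character (\w) that is not a decimal digit (\d) and not "_".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def find_letter_runs(text):
    """Return all maximal runs of letter-like characters in text."""
    return LETTER_RUN.findall(text)


if __name__ == "__main__":
    print(find_letter_runs("na\u00efve_caf\u00e9 42 abc123"))
```

Underscores, whitespace and decimal digits all act as separators here, so "abc123" contributes only "abc".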
Looking over the old docs, I cou
On 23/12/12 07:53, Albert-Jan Roskam wrote:
> Hi,
>
> Is the code below the only/shortest way to match unicode characters?
No. You could install a more Unicode-aware regex engine, and use it instead
of Python's re module, where Unicode support is at best only partial.
Try this one:
http://pypi.py
On Sat, Dec 22, 2012 at 9:53 PM, Albert-Jan Roskam wrote:
> Hi,
>
> Is the code below the only/shortest way to match unicode characters? I
> would like to match whatever is defined as a character in the unicode
> reference database. So letters in the broadest sense of the word, but not
> digits,
Hi,
Is the code below the only/shortest way to match unicode characters? I would
like to match whatever is defined as a character in the unicode reference
database. So letters in the broadest sense of the word, but not digits,
underscore or whitespace. Until just now, I was convinced that the r