On Sun, 5 Aug 2007, Kent Johnson wrote:
Hmm...actually, isupper() works fine on unicode strings:
In [18]: s='H\303\211RON'.decode('utf-8')
In [21]: print 'H\303\211RON'
HÉRON
In [22]: s.isupper()
Out[22]: True
:-)
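For anyone reading this later, here is a minimal sketch of the same check as a standalone script. It assumes Python 3 (the session above is Python 2), where the decoded value is a plain str. The helper name leading_word_is_upper is made up for illustration and is not from the thread.

```python
# Assumes Python 3; the IPython session above was Python 2.
raw = b'H\303\211RON'        # the UTF-8 bytes for HÉRON, as in the session
s = raw.decode('utf-8')      # decode to a unicode string before testing case

def leading_word_is_upper(line):
    """Hypothetical helper: True if the first whitespace-separated
    word of the line contains cased characters and all are uppercase."""
    words = line.split()
    return bool(words) and words[0].isupper()

print(s.isupper())
print(leading_word_is_upper('HÉRON (LE), Normandie.'))
```

Because isupper() only looks at cased characters, trailing punctuation such as the comma in "HEREFORD," does not affect the result.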
I modified uppers to include only the latin characters, and added the
apostroph
Jon Crump wrote:
> Kent, Many thanks again, and thanks too to Paul at
> http://tinyurl.com/yrl8cy.
>
> That's very effective, thanks very much for the detailed explanation;
> however, I'm a little surprised that it's necessary. I would have
> thought that there would be some standard module that included a unicode
> equivalent
Jon Crump wrote:
> I'm parsing a utf-8 encoded file with lines characterized by placenames
> in all caps thus:
>
> HEREFORD, Herefordshire.
> ..other lines..
> HÉRON (LE), Normandie.
> ..other lines..
>
> I identify these lines for parsing using
>
> for line in data:
>     if re.match(r'[A-Z]{2