Re: [Tutor] more encoding confusion

2007-08-05 Thread Jon Crump
On Sun, 5 Aug 2007, Kent Johnson wrote:
Hmm...actually, isupper() works fine on unicode strings:
In [18]: s='H\303\211RON'.decode('utf-8')
In [21]: print 'H\303\211RON'
HÉRON
In [22]: s.isupper()
Out[22]: True
:-)
I modified uppers to include only the latin characters, and added the apostroph
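For readers on Python 3, the session above can be restated roughly as follows. This is only a sketch: the thread uses Python 2 syntax, and the variable name `raw` is introduced here for illustration. The bytes literal is the same UTF-8 sequence as 'H\303\211RON', written with hex escapes.

```python
# Python 3 sketch of the isupper() check; `raw` is a name introduced for illustration.
raw = b'H\xc3\x89RON'      # UTF-8 bytes, the same sequence as 'H\303\211RON'
s = raw.decode('utf-8')    # decode to a text (unicode) string

# Six bytes decode to five characters: the two-byte sequence is a single E-acute.
byte_count = len(raw)
char_count = len(s)

# isupper() on the decoded string takes the non-ASCII letter into account.
all_upper = s.isupper()
print(s, byte_count, char_count, all_upper)
```

The point is that the check is made on the decoded string, where the accented letter is a single cased character, not on the raw bytes.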

Re: [Tutor] more encoding confusion

2007-08-05 Thread Kent Johnson
Jon Crump wrote:
>
> Kent, Many thanks again, and thanks too to Paul at
> http://tinyurl.com/yrl8cy.
>
> That's very effective, thanks very much for the detailed explanation;
> however, I'm a little surprised that it's necessary. I would have
> thought that there would be some standard module

Re: [Tutor] more encoding confusion

2007-08-05 Thread Jon Crump
Kent, Many thanks again, and thanks too to Paul at http://tinyurl.com/yrl8cy. That's very effective, thanks very much for the detailed explanation; however, I'm a little surprised that it's necessary. I would have thought that there would be some standard module that included a unicode equivalent

Re: [Tutor] more encoding confusion

2007-08-03 Thread Kent Johnson
Jon Crump wrote:
> I'm parsing a utf-8 encoded file with lines characterized by placenames
> in all caps thus:
>
> HEREFORD, Herefordshire.
> ..other lines..
> HÉRON (LE), Normandie.
> ..other lines..
>
> I identify these lines for parsing using
>
> for line in data:
>     if re.match(r'[A-Z]{2
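The quoted pattern is cut off in this listing, so the following is not a reconstruction of it. It is a minimal Python 3 sketch of one way to pick out such lines without an ASCII-only character class, assuming (from the two sample lines only) that the placename is the text before the first comma. The helper name `is_placename_line` is hypothetical.

```python
import re

def is_placename_line(line):
    """Hypothetical helper: True if the text before the first comma is all caps.

    Assumes the placename is everything before the first comma, as in the
    two sample lines quoted above.
    """
    head, sep, _rest = line.partition(',')
    # str.isupper() ignores uncased characters such as spaces and parentheses,
    # and treats accented capitals as uppercase.
    return bool(sep) and head.strip().isupper()

lines = [
    'HEREFORD, Herefordshire.',
    '..other lines..',
    'HÉRON (LE), Normandie.',
]
matches = [ln for ln in lines if is_placename_line(ln)]

# For comparison: an ASCII-only class stops at the first accented capital.
ascii_prefix = re.match(r'[A-Z]+', 'HÉRON').group()
print(matches, ascii_prefix)
```

Whether the comma rule holds for the whole file is an assumption; the thread shows only these two placename lines.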