Re: [Tutor] Stuck: unicode in regular expressions

Kent Johnson Tue, 09 Aug 2005 08:04:09 -0700

Ron Phillips wrote:
> I am expecting users to cut-and-paste DMS data into an application — 
> like:  +40 30 15   E40 15 34.56, -81 0 0,   81 57 34.27E, W 40° 13’ 
> 27.343”, 40° 13’ 27.343” S, 140° 13’ 27.343”S, S40° 13’ 27.34454,  
> 81:57:34.27E 
>  
> I've been able to write a regex that seems to work in redemo.py, but it 
> doesn't do at all what I want when I try to code it using the re module. 
> The problem seems to be the way I am using unicode — specifically all 
> those punctuation marks that might get pasted in. I anticipate the 
> program getting its input from a browser; maybe that will narrow down 
> the range somewhat.


I'm guessing a bit here, but you need to know what encoding you are getting 
from the browser. If the input comes from a form, I think it will arrive in 
the same encoding as the page containing the form. Then I think you can either
- decode the form data to unicode and use a unicode pattern in the regex, or
- encode the regex pattern in the same encoding as the form data
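
Here is a rough sketch of the two options above, written for current Python 3, where str is unicode text and bytes is the encoded form. It assumes the page (and so the form data) is UTF-8; the sample value and every variable name are made up for illustration, so check what your own form really sends.

```python
import re

# Hypothetical form data: the bytes a browser might send from a UTF-8 page.
form_bytes = "W 40° 13’ 27.343”".encode("utf-8")

# Option 1: decode the bytes to text, then use a text (unicode) pattern.
text = form_bytes.decode("utf-8")
text_pattern = re.compile("[\u2019\u201d]")  # right single / right double quote
text_hits = text_pattern.findall(text)

# Option 2: keep the bytes and encode the pattern pieces the same way.
# Each mark is three bytes in UTF-8, so alternation is used here instead
# of a character class (a class would match single bytes).
quote_bytes = [mark.encode("utf-8") for mark in ("\u2019", "\u201d")]
bytes_pattern = re.compile(b"|".join(re.escape(q) for q in quote_bytes))
bytes_hits = bytes_pattern.findall(form_bytes)

print(text_hits)
print(bytes_hits)
```

Option 1 is usually easier to live with, since one pattern then works whatever encoding the page used, as long as the decode step matches it.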

A good way to start would be to
    print repr(formdata)
which will show you exactly what is in the data.
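
As a small illustration of what that inspection can tell you, here is a Python 3 sketch. It assumes formdata has already been decoded to text; the sample value is invented. It also suggests why the \x2019 and \x92 escapes from the question quoted below may not have matched: \x takes exactly two hex digits, and \x92 is the Windows-1252 byte for the curly apostrophe rather than its code point.

```python
import re

# Hypothetical pasted value, assumed to be already decoded text.
formdata = "40° 13’ 27.343” S"

# ascii() escapes every non-ASCII character, much as repr() did for
# unicode strings in Python 2.
escaped = ascii(formdata)
print(escaped)

# \x takes exactly two hex digits, so "\x2019" is a space (\x20)
# followed by the two characters "19".
x_escape = "\x2019"

# \x92 only means the curly apostrophe as a Windows-1252 byte;
# decoding that byte gives the real code point, U+2019.
from_cp1252 = b"\x92".decode("cp1252")

# \u takes four hex digits and names the code point directly.
marks = re.findall("[\u2019\u201d]", formdata)
print(marks)
```

If the escaped output shows something other than \u2019 and \u201d (for example \x92 or a run of \xe2\x80 bytes), the data probably has not been decoded yet, or was decoded with the wrong encoding.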

Kent

>  
> Anyway, given the string above, what regex will match the  ” and    ’ 
> characters, please? I have tried \x02BC and \x92 and \x2019 for the ’ , 
> but no result. I am sure it's simple; I am sure some other newbie has 
> asked it, but I have Googled my brains out, and can't find it.
>  
> Ron 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor

