Hello,

I have a couple of regex questions:

1 -- In the code below, how can I match the connecting words 'van de',
'van der', etc. (all quite common in Dutch family names)?
2 -- It is quite hard to make a regex for all surnames, but easier to make
regexes for the initials and the connecting words. How could I 'subtract'
those two regexes to end up with something that matches the surnames? (I used
two .replace() calls in my code, which roughly work, but I'm thinking there's
a re way to do it, perhaps with carets (^).)
3 -- Suppose I want to yank up my nerd rating by adding a re.NONDIACRITIC flag
to the re module (matching letters regardless of their accents). How would I
go about it? Should I subclass from re and implement the flag, using the other
existing flags as an example? I would find this a very useful addition.
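For question 3, a possible workaround, sketched under assumptions rather than offered as a built-in feature: the re module has no flag of this kind, and since re is a module rather than a class there is nothing to subclass in the usual sense. Instead, one can strip the accents from both the pattern and the text with the standard unicodedata module and then run an ordinary search. The helper names strip_accents and search_nondiacritic are hypothetical, and the strings must be unicode (hence the u"" literals, needed on Python 2.7).

```python
# -*- coding: utf-8 -*-
import re
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (u"Noël" -> u"Noel")."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks themselves.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_nondiacritic(pattern, text, flags=0):
    """Hypothetical stand-in for a re.NONDIACRITIC flag: strip accents from
    both pattern and text, then run an ordinary re.search on the results."""
    return re.search(strip_accents(pattern), strip_accents(text), flags)


m = search_nondiacritic(u"Noel", u"J. Noël")
if m is not None:
    print(m.group(0))
```

Because the search runs on the accent-stripped copy, m.group() and m.span() refer to that copy, not to the original string. Letters that have no Unicode decomposition, such as 'ø' or 'ß', are left unchanged by this approach.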

Thanks in advance for your thoughts!

Python 2.7.0+ (r27:82500, Sep 15 2010, 18:04:55) 
[GCC 4.4.5] on linux2

>>> import re
>>> names = ["J. van der Meer", "J. van den Meer", "J. van Meer",
             "Meer, J. van der", "Meer, J. van den", "Meer, J. van de",
             "Meer, J. van"]
>>> for name in names:
    print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1)

van der
van den
Traceback (most recent call last):
  File "<pyshell#26>", line 2, in <module>
    print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
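One way to approach questions 1 and 2 is sketched below, assuming the only connecting words to handle are 'van' optionally followed by 'de', 'den' or 'der', and that initials look like 'J.' or 'J.H.'. The names PARTICLE, INITIALS, find_particle and surname are hypothetical. The traceback above arises because re.search returns None for "J. van Meer" (there is no 'de' to match), and .group(1) is then called on None. Separately, "\b" in an ordinary string literal is a backspace character rather than a word boundary, so the pattern belongs in a raw string. The "subtraction" is done here with re.sub: delete the initials and the particle, and what remains is the surname.

```python
import re

# Hypothetical particle pattern: "van", optionally followed by "de", "den" or "der".
# The raw string keeps \b as a regex word boundary instead of a backspace.
PARTICLE = re.compile(r"\bvan(?:\s+de[nr]?)?\b", re.IGNORECASE)

# Assumption: initials are single capitals followed by a dot, e.g. "J." or "J.H.".
INITIALS = re.compile(r"\b(?:[A-Z]\.)+")


def find_particle(name):
    """Return the particle ('van', 'van de', ...) or None if there is none."""
    match = PARTICLE.search(name)
    # Check for None before calling .group() to avoid the AttributeError.
    return match.group(0) if match else None


def surname(name):
    """'Subtract' the initials and the particle, leaving the surname."""
    rest = INITIALS.sub("", name)
    rest = PARTICLE.sub("", rest)
    # Drop the comma of the "Meer, J. van" form and collapse leftover spaces.
    return " ".join(rest.replace(",", " ").split())


names = ["J. van der Meer", "J. van den Meer", "J. van Meer",
         "Meer, J. van der", "Meer, J. van den", "Meer, J. van de",
         "Meer, J. van"]

for name in names:
    print("%s -> particle: %s, surname: %s" % (name, find_particle(name), surname(name)))
```

This sketch simply ignores prefixes it does not list, such as 'ter' or a bare 'de', so PARTICLE would need extending for those.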

 Cheers!!
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public
order, irrigation, roads, a fresh water system, and public health, what have the
Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



      
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
