Hello, I have a couple of regex questions:
1 -- In the code below, how can I match the connecting words 'van de' , 'van der', etc. (all quite common in Dutch family names)? 2 -- It is quite hard to make a regex for all surnames, but easier to make regexes for the initials and the connecting words. How could I ' subtract' those two regexes to end up with something that matches the surnames (I used two .replaces() in my code, which roughly work, but I'm thinking there's a re way to do it, perhaps with carets (^). 3 -- Suppose I want to yank up my nerd rating by adding a re.NONDIACRITIC flag to the re module (matches letters independent of their accents), how would I go about? Should I subclass from re and implement the method, using the other existing methods as an example? I would find this a very useful addition. Thanks in advance for your thoughts! Python 2.7.0+ (r27:82500, Sep 15 2010, 18:04:55) [GCC 4.4.5] on linux2 >>> import re >>> names = ["J. van der Meer", "J. van den Meer", "J. van Meer", "Meer, J. van >>>der", "Meer, J. van den", "Meer, J. van de", "Meer, J. van"] >>> for name in names: print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1) van der van den Traceback (most recent call last): File "<pyshell#26>", line 2, in <module> print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1) AttributeError: 'NoneType' object has no attribute 'group' Cheers!! Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
_______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor