Hi Santosh,

On Tue, Feb 18, 2014 at 9:52 AM, Santosh Kumar <rhce....@gmail.com> wrote:
>
> Hi All,
>
> If you notice the below example, case I is working as expected.
>
> Case I:
> In [41]: string = "<H*>test<H*>"
>
> In [42]: re.match('<H\*>',string).group()
> Out[42]: '<H*>'
>
> But why is the raw string 'r' not working as expected ?
>
> Case II:
>
> In [43]: re.match(r'<H*>',string).group()
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-43-d66b47f01f1c> in <module>()
> ----> 1 re.match(r'<H*>',string).group()
>
> AttributeError: 'NoneType' object has no attribute 'group'
>
> In [44]: re.match(r'<H*>',string)

It is working as expected, but you're not expecting the right thing ;).
Raw strings don't escape anything; they just prevent backslash escapes
from expanding. Case I works because "\*" is not an escape sequence
that Python recognizes (unlike "\n" or "\t"), so it leaves the backslash
in place:

   >>> print('<H\*>')
   <H\*>

The equivalent raw string is exactly the same in this case:

   >>> print(r'<H\*>')
   <H\*>

The raw string you provided doesn't have the backslash, and Python will
not add backslashes for you:

   >>> print(r'<H*>')
   <H*>

The purpose of raw strings is to prevent Python from recognizing
backslash escapes. For example:

   >>> path = 'C:\temp\new\dir'  # Windows paths are notorious...
   >>> path   # it looks mostly ok... [1]
   'C:\temp\new\\dir'
   >>> print(path)  # until you try to use it
   C:      emp
   ew\dir
   >>> path = r'C:\temp\new\dir'  # now try a raw string
   >>> path   # Now it looks like it's stuffed full of backslashes [2]
   'C:\\temp\\new\\dir'
   >>> print(path)  # but it works properly!
   C:\temp\new\dir

[1] Count the backslashes in the repr of 'path'. Notice that there is
only one before the 't' and the 'n', but two before the 'd'. "\d" is
not a recognized escape, so Python didn't do anything to it. There are
two backslashes in the repr of "\d" because that's the only way to
show a real backslash unambiguously; the "\t" and "\n" are actually the
TAB and LINE FEED characters, as seen when printing 'path'.

[2] They are all real backslashes now, so they have to be shown escaped
("\\") in the repr.

In your regex, since you're looking for, literally, "<H*>", you'll need
to backslash escape the "*" since it is a special character *in regular
expressions*. Unescaped, "H*" means "zero or more H", which is why
Case II found no match and returned None.

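To make the difference between the two patterns concrete, here is a small sketch. The extra test strings ("<HHH>" and "<>") and the variable names are mine, purely for illustration, and re.escape() is just one more option if you would rather not place the backslash by hand:

```python
import re

string = "<H*>test<H*>"

# Unescaped, "*" is a regex quantifier: r'<H*>' means "<", zero or more "H", ">".
as_quantifier = re.match(r'<H*>', string)
print(as_quantifier)                          # None: the text has "*" where ">" is expected
print(re.match(r'<H*>', "<HHH>").group())     # <HHH>
print(re.match(r'<H*>', "<>").group())        # <>

# Escaped, "\*" matches exactly one literal asterisk.
as_literal = re.match(r'<H\*>', string)
print(as_literal.group())                     # <H*>

# re.escape() inserts whatever backslashes the literal text needs.
escaped_pattern = re.escape("<H*>")
via_escape = re.match(escaped_pattern, string)
print(via_escape.group())                     # <H*>
```

The exact string re.escape() returns differs a little between Python versions, so it is safer to rely on what it matches than on how it prints.
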
To avoid having to keep track of what's special to Python as well as
to regular expressions, you'll need to make sure the backslash itself
survives Python's string parsing, so that the regex sees "\*". The
easiest way to do that is a raw string:

   >>> re.match(r'<H\*>', string).group()
   '<H*>'

I hope this makes some amount of sense; I've had to write it up
piecemeal and will never get it posted at all if I don't go ahead and
post :). If you still have questions, I'm happy to try again. You may
also want to have a look at the Regex HowTo in the Python docs:
http://docs.python.org/3/howto/regex.html

Hope this helps,

--
Zach
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor