Liam Clarke wrote:
> Hi all,
> 
> Using Beautiful Soup and regexes.. I've noticed that all the examples
> used regexes like so - anchors = parseTree.fetch("a",
> {"href":re.compile("pattern")} )  instead of precompiling the pattern.
> 
> Myself, I have the following code -
> 
> >>> z = []
> >>> x = q.findNext("a", {"href":re.compile(".*?thread/[0-9]*?/.*",
> ...     re.IGNORECASE)})
> >>> while x:
> ...   num = x.findNext("td", "tableColA")
> ...   h = (x.contents[0],x.attrMap["href"],num.contents[0])
> ...   z.append(h)
> ...   x = x.findNext("a",{"href":re.compile(".*?thread/[0-9]*?/.*",
> ...       re.IGNORECASE)})
> ...
> 
> This gives me a correct set of results. However, using the following -
> 
> 
> >>> z = []
> >>> pattern = re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)
> >>> x = q.findNext("a", {"href":pattern)})
> >>> while x:
> ...   num = x.findNext("td", "tableColA")
> ...   h = (x.contents[0],x.attrMap["href"],num.contents[0])
> ...   z.append(h)
> ...   x = x.findNext("a",{"href":pattern} )
> 
> will only return the first found tag.
> 
> Is the regex only evaluated once or similar?

I don't know why there should be any difference unless BS modifies the
compiled regex object and for some reason needs a fresh one each time. That
would be odd, and I don't see it in the source code.
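
As a quick sanity check outside of BS, here is a minimal sketch using only
the re module. The hrefs are made up for illustration, and this says nothing
about what BS does internally; it only shows that a precompiled pattern gives
the same answers on every use as one compiled fresh each time.

```python
import re

# Hypothetical href values, invented for this example.
hrefs = ["/forum/thread/28606/index.html", "/about.html", "/Thread/17/x"]

# Compile once and reuse the same pattern object for every search.
pattern = re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)
reused = [bool(pattern.search(h)) for h in hrefs]

# Compile a new pattern object for every search, as in the first version.
fresh = [
    bool(re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE).search(h))
    for h in hrefs
]

print(reused)
print(fresh)
```

If the two lists agree, the difference you are seeing is unlikely to come
from the regex itself.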

The code above has a syntax error (an extra paren in the first findNext()
call). Can you post the exact non-working code?
> 
> (Also any pointers on how to get negative lookahead matching working
> would be great.
> the regex (/thread/[0-9]*)(?!\/) still matches "/thread/28606/" and
> I'd assumed it wouldn't.

Putting these expressions into Regex Demo is enlightening: the regex matches
"/thread/2860". In other words, [0-9]* gives back the last digit, and the
"not /" lookahead is then tested against the 6, where it succeeds.
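
The same thing can be seen from the interpreter. This is a small sketch with
the re module, using the pattern and test string from your message; the
variable names are mine.

```python
import re

text = "/thread/28606/"
m = re.search(r"(/thread/[0-9]*)(?!\/)", text)

# The digits are matched greedily first, but the lookahead fails in front
# of the final "/", so the engine backtracks by one digit and tries again.
matched = m.group(1)

# The character the lookahead was actually tested against.
next_char = text[m.end():m.end() + 1]

print(matched, next_char)
```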

You don't give an example of what you do want to match, so it's hard to know
what a better solution is. Some possibilities:
- match anything except a digit or a slash - [^0-9/]
- match the end of the string - $
- both of the above - ([^0-9/]|$)
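
Here is a rough sketch of the third option, plus a lookahead variant that I
am adding for comparison, (?![0-9/]), which rejects a following digit as well
as a slash. The test strings and the helper first_group() are made up for
illustration; adjust them to whatever you actually want to match.

```python
import re

# Hypothetical inputs: with a trailing slash, without, and with a query.
candidates = ["/thread/28606/", "/thread/28606", "/thread/28606?page=2"]

# Option 3 from the list: a non-digit, non-slash character or end of string.
tail = re.compile(r"(/thread/[0-9]*)([^0-9/]|$)")

# Assumed variant: negative lookahead that also excludes digits, so the
# engine cannot satisfy it by giving back a digit.
lookahead = re.compile(r"(/thread/[0-9]*)(?![0-9/])")


def first_group(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


tail_results = [first_group(tail, s) for s in candidates]
lookahead_results = [first_group(lookahead, s) for s in candidates]

print(tail_results)
print(lookahead_results)
```

Note that the first pattern consumes the character after the digits, while
the lookahead version does not; group 1 is the same either way.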

Kent

> 
> Regards,
> 
> Liam Clarke
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
> 
> 

