Good afternoon Clayton,

!A regex doesn't understand the structure of an html document. For example
!you need to keep track of the nesting level manually to find the cells of
!the inner of two nested tables.
!
!> question still remains: does the
!> search start at the beginning of the line each time or does it step
!> forward from the last search?
!
!re.search() doesn't keep track of prior searches; whatever string you feed
!it (in your case a line cut out of an html document) is searched.
!

So, you are saying that each regex starts at the beginning of the long line? Is there a way to start the next search at the end of the last one?

Well, it depends on how you are using the re module (if you really want to do that). A compiled pattern's search() method accepts an optional starting position, so have a look at:

  https://docs.python.org/2/library/re.html#re.RegexObject.search

But...I'll add my voice to the admonition against using regex here.

Consider the following events that could happen in the future after you have labored over your program and are able to get it to work, based on today's HTML.

  1. Somebody inserts a line-break in the middle of the element you
     were searching for with regex.
  2. A week from now, somebody runs 'tidy' on the HTML or changes
     or removes the line endings.
  3. Somebody adds an HTML comment which causes your regex to match.

These are the first three reasons that occur to me for why regex is the wrong tool for the job here, given that you know precisely the format of the data. It is HTML.
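
To make reasons 1 and 3 concrete, here is a minimal sketch. It assumes a hypothetical line-by-line script and a made-up pattern for a table cell; neither is taken from your program, so treat it as an illustration only.

```python
import re

# Hypothetical pattern: expects an entire <td> cell on a single line.
cell = re.compile(r'<td>(.*?)</td>')

one_line = '<td>42</td>'
wrapped = '<td>\n42</td>'            # reason 1: a line break inside the element
commented = '<!-- <td>old</td> -->'  # reason 3: markup inside an HTML comment


def cells_by_line(html):
    """Search each line separately, the way a line-oriented script would."""
    found = []
    for line in html.splitlines():
        found.extend(cell.findall(line))
    return found


print(cells_by_line(one_line))    # ['42']
print(cells_by_line(wrapped))     # []      the cell is silently missed
print(cells_by_line(commented))   # ['old'] a commented-out cell is reported
```

The wrapped cell vanishes without any error, and the commented-out cell shows up as if it were live data.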

The good thing is that there are other tools for processing HTML.
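
As one possible illustration, the standard library's HTML parser can do the nesting bookkeeping for you. This is only a sketch: the CellCollector class, its attribute names, and the sample markup are all hypothetical, and a dedicated library may well be more convenient for real pages.

```python
try:
    from html.parser import HTMLParser      # Python 3
except ImportError:
    from HTMLParser import HTMLParser       # Python 2


class CellCollector(HTMLParser):
    """Collect the text of every <td>, tagged with its table nesting depth."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0        # number of <table> elements currently open
        self.open_cells = []  # stack of (depth, text fragments) for open <td>s
        self.cells = []       # finished (depth, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.depth += 1
        elif tag == 'td':
            self.open_cells.append((self.depth, []))

    def handle_endtag(self, tag):
        if tag == 'table':
            self.depth -= 1
        elif tag == 'td' and self.open_cells:
            depth, parts = self.open_cells.pop()
            self.cells.append((depth, ''.join(parts).strip()))

    def handle_data(self, data):
        # Text belongs to the innermost <td> that is still open.
        if self.open_cells:
            self.open_cells[-1][1].append(data)


# Made-up sample: an inner table nested in an outer cell, plus a comment.
sample = ('<table><tr><td>outer'
          '<table><tr><td>inner\n1</td><td>inner 2</td></tr></table>'
          '</td></tr></table>'
          '<!-- <td>commented out</td> -->')

parser = CellCollector()
parser.feed(sample)
parser.close()

# Keep only the cells of the inner of the two nested tables.
inner = [text for depth, text in parser.cells if depth == 2]
print(inner)
```

The line break inside the first inner cell does no harm here, and the commented-out cell is never reported, because the parser hands comments to a separate handler.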

Anyway, if you want to use regexes, nobody can stop you, so see the footnote below [0]: a bit of nonsense text in which you can search for the 2 distinct instances of the string "ei".

!> I will check out beautiful soup as suggested
!> in a subsequent mail; I'd still like to finish this process:<}}

!Do you say that when someone points out that you are eating your shoe?
Depends on the flavor of the shoe:<)))

Root beer float.

-Martin

 [0] If you really, really want to use regex, here's an example of how to
     keep track of where you last sought, and how to search from
     that place in the string.

       from __future__ import print_function

       import re

       def main():
           s = 'Wo lattenzaun aneinander erhaltenen vorpfeifen grasgarten.'
           pattern = re.compile('ei', re.IGNORECASE)
           # Start at position 0, then resume each search where the
           # previous match ended.
           matched = pattern.search(s, 0)
           while matched:
               endpos = matched.end()
               print(matched.group(0), matched.start(), matched.end())
               matched = pattern.search(s, endpos)

       if __name__ == '__main__':
           main()
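
     If the manual bookkeeping in the loop above feels fiddly, the re
     module can also do the stepping itself. A minimal sketch using
     finditer() on the same nonsense text (the name 'hits' is just for
     illustration):

```python
import re

s = 'Wo lattenzaun aneinander erhaltenen vorpfeifen grasgarten.'
pattern = re.compile('ei', re.IGNORECASE)

# finditer() yields successive non-overlapping matches, scanning left to
# right, so there is no need to pass the end position back in by hand.
hits = [(m.group(0), m.start(), m.end()) for m in pattern.finditer(s)]
print(hits)
```

     Each tuple holds the matched text and its start and end offsets,
     the same three values the while loop prints.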


--
Martin A. Brown
http://linux-ip.net/
