Here is a pyparsing approach to your question. I've added some comments to walk you through the various steps. With pyparsing's makeHTMLTags helper function, it is easy to write short programs that extract selected tags and their contents from an HTML page.
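The tolerance that makeHTMLTags provides can be checked in isolation. Below is a minimal sketch that reuses the same `aStart + aEnd + bStart + SkipTo(bEnd)("text") + bEnd` pattern as the program in this message and runs it over a few variations of the same markup. It assumes pyparsing 3.x, where the camelCase names `makeHTMLTags` and `searchString` are still available alongside `make_html_tags` and `search_string`. The helper name `anchor_titles` and the sample strings are hypothetical, made up for illustration.

```python
# Assumes pyparsing 3.x is installed; the camelCase names used here are
# kept as aliases of make_html_tags / search_string.
from pyparsing import makeHTMLTags, SkipTo

aStart, aEnd = makeHTMLTags("A")
bStart, bEnd = makeHTMLTags("B")
pattern = aStart + aEnd + bStart + SkipTo(bEnd)("text") + bEnd


def anchor_titles(snippet):
    """Hypothetical helper: return (name attribute, bold text) pairs."""
    return [(m.startA.name, m.text) for m in pattern.searchString(snippet)]


# Made-up variations: tag-name case, attribute-name case, quoting style,
# extra attributes, and whitespace inside the opening tag all differ.
samples = [
    "<A name=4></a><b>Table of Contents</b>",
    '<a NAME="5"></A><B>Preface</B>',
    "<a href='#x' name='6'></a><b>Index</b>",
    '<a  name = "7" ></a><b>Glossary</b>',
]

for s in samples:
    print(anchor_titles(s))
```

Each sample should yield a single (name, text) pair. The upper-case `NAME` still works because makeHTMLTags lowercases attribute names, and quoted values come back with their quotes removed.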
-- Paul

```python
from pyparsing import makeHTMLTags, SkipTo

html = """
<A name=4></a><b>Table of Contents</b>
.........
<A name=5></a><b>Preface</b>
"""

# define the pattern to search for, using pyparsing makeHTMLTags helper
# makeHTMLTags constructs a very tolerant mini-pattern, to match HTML
# tags with the given tag name:
# - caseless matching on the tag name
# - embedded whitespace is handled
# - detection of empty tags (opening tags that end in "/")
# - detection of tag attributes
# - returning parsed data using results names for attribute values
# makeHTMLTags actually returns two patterns, one for the opening tag
# and one for the closing tag
aStart, aEnd = makeHTMLTags("A")
bStart, bEnd = makeHTMLTags("B")
pattern = aStart + aEnd + bStart + SkipTo(bEnd)("text") + bEnd

# search the input string - dump matched structure for each match
for pp in pattern.searchString(html):
    print(pp.dump())
    print(pp.startA.name, pp.text)

# parse input and build a dict using the results
nameDict = {pp.startA.name: pp.text for pp in pattern.searchString(html)}
print(nameDict)
```

The last line of the output is the dict that is created:

{'4': 'Table of Contents', '5': 'Preface'}

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor