"Shriphani Palakodety" <[EMAIL PROTECTED]> wrote in > I have a html document here which goes like this: > > <A name=4></a><b>Table of Contents</b> > ......... > <A name=5></a><b>Preface</b> > > Can someone tell me how I can get the string between the <b> tag for > an a tag for a given value of the name attribute.

Here's an example using the standard library HTML parser (from an
unfinished topic in my tutorial...). You could also use BeautifulSoup,
and I recommend that if your needs get any more complex...

----------------------------------------------
In practice we usually want to extract more specific data from a page,
maybe the content of a particular row in a table or similar. For that
we need to use the handle_starttag() and handle_endtag() methods.
As an example let's extract the text of the second H1 level header:

# Written for Python 2 (HTMLParser module, print statement)
html = '''
<html><head><title>Test page</title></head>
<body>
<center>
<h1>Here is the first heading</h1>
</center>
<p>A short paragraph
<h1>A second heading</h1>
<p>A paragraph containing a
<a href="www.google.com">hyperlink to google</a>
</body></html>
'''

from HTMLParser import HTMLParser

class H1Parser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.h1_count = 0        # how many <h1> tags seen so far
        self.isHeading = False   # True while inside an <h1> element

    def handle_starttag(self,tag,attributes=None):
        if tag == 'h1':
            self.h1_count += 1
            self.isHeading = True

    def handle_endtag(self,tag):
        if tag == 'h1':
            self.isHeading = False

    def handle_data(self,data):
        # only report text found inside the second <h1>
        if self.isHeading and self.h1_count == 2:
            print "Second Header contained: ", data

parser = H1Parser()
parser.feed(html)
parser.close()
------------------------------

Hopefully you can see how to alter that pattern to suit your scenario.

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
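
Adapting that pattern to the original question might look something like the sketch below. It is not from the tutorial: the class name AnchorTitleParser and the helper title_for_anchor are made up for illustration. It assumes the <b> element you want is the first one after the matching <a name=...> tag, with no other <a> tag in between. The import is written so that it should run on either Python 2 or Python 3.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2


class AnchorTitleParser(HTMLParser):
    """Collect the text of the first <b> after <a name=TARGET> (illustrative)."""

    def __init__(self, target_name):
        HTMLParser.__init__(self)
        self.target_name = target_name
        self.anchor_seen = False   # last <a> seen had the wanted name
        self.in_bold = False       # currently inside the <b> we want
        self.done = False          # stop after the first match
        self.chunks = []           # pieces of text found inside that <b>

    def handle_starttag(self, tag, attributes):
        if self.done:
            return
        if tag == 'a':
            # attributes is a list of (name, value) pairs; tag and
            # attribute names arrive lowercased from the parser
            self.anchor_seen = dict(attributes).get('name') == self.target_name
        elif tag == 'b' and self.anchor_seen:
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == 'b' and self.in_bold:
            self.in_bold = False
            self.anchor_seen = False
            self.done = True

    def handle_data(self, data):
        if self.in_bold:
            self.chunks.append(data)


def title_for_anchor(html, name):
    """Return the <b> text that follows <a name=NAME>, or '' if not found."""
    parser = AnchorTitleParser(name)
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks)


sample = '''
<A name=4></a><b>Table of Contents</b>
<p>some other content</p>
<A name=5></a><b>Preface</b>
'''

print(title_for_anchor(sample, '5'))   # prints: Preface
```

Note that the name is compared as a string ('5', not 5), since the parser hands attribute values over as text. If the real document nests other tags between the anchor and the <b>, or is badly malformed, BeautifulSoup is probably the easier route.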