> Please, see the attachment and examine a code I have provide. The
> problem is, I want fetch data from <H2>Comments</H2> until the first
> </TD> occurrence,

Do you mean the unmatched /td that occurs after the dd section?

> import re
> import string
>
> htmlData = """
> <h2>Instructions</h2>....
> <h2>Comments</h2>
>
> <dl>
> <dd>None
> </dd></dl>
> </td>

To this one here?
It's probably a bad idea to use a regular tag as a marker; some browsers
get confused by unmatched tags. Using a comment is usually better.

> <td valign="top" width="50%"><h2>Classification</h2>
>
> <h2><table border="1" cellpadding="1" cellspacing="0" height="60"
> width="100%">
> <tbody><tr>
> <td width="50%"><b> Utility:</b></td>

But regexes don't like working with nested tags. You have a table cell
inside another, and writing regexes to match that can get very tricky.
So if you want to search into this part of the string you should
probably look at using Beautiful Soup or a similar HTML parser.

> if __name__ == '__main__':
>     # Extract comments
>     p = re.search('<H2>Comments</H2>(.+)</TD>', htmlData,
>                   re.I | re.S | re.M)

Looks like you are getting caught out by the "greedy" nature of
regexes: they grab as much as they can, so (.+) runs on to the last
</TD> in the string rather than the first. You can control that by
adding a ? immediately after the +, but given the nature of your HTML
I'd try using BeautifulSoup instead.

You'll find a short section on greedy expressions in the regex topic
on my tutorial site.

HTH,

Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld
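
To illustrate the greedy versus non-greedy point, here is a minimal sketch. The html_data string is a shortened, made-up stand-in for the page in the original post (not the actual attachment), and the names html_data, FLAGS, greedy and lazy are only illustrative.

```python
import re

# Hypothetical, shortened stand-in for the htmlData string in the post.
html_data = """
<h2>Instructions</h2>
<h2>Comments</h2>

<dl>
<dd>None
</dd></dl>
</td>
<td valign="top" width="50%"><h2>Classification</h2>
<table border="1">
<tbody><tr>
<td width="50%"><b> Utility:</b></td>
</tr></tbody></table>
</td>
"""

# re.I ignores case, re.S lets "." match newlines.
# re.M is left out here because the pattern has no ^ or $ anchors.
FLAGS = re.I | re.S

# Greedy: (.+) grabs as much as it can, so the match ends at the LAST </td>.
greedy = re.search(r'<h2>Comments</h2>(.+)</td>', html_data, FLAGS)

# Non-greedy: (.+?) grabs as little as it can, so it stops at the FIRST </td>.
lazy = re.search(r'<h2>Comments</h2>(.+?)</td>', html_data, FLAGS)

print("greedy:", repr(greedy.group(1)))
print("lazy:  ", repr(lazy.group(1)))
```

The greedy match swallows the Classification cell as well, while the non-greedy one stops right after the closing </dl>. This only behaves because the wanted text contains no nested </td>; once cells nest, an HTML parser is likely the safer route.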