On Fri, Jan 15, 2010 at 4:24 AM, Paul Melvin <p...@assured-networks.co.uk> wrote:
> Hi,
>
> Thanks very much for all your suggestions, I am looking into the suggestions
> of Hugo and Alan.
>
> The file is not very big, only 700KB (~20000 lines), which I think should be
> fine to be loaded into memory?
>
> I have two further questions though please, the lines are like this:
>
> <img width="13" height="15" alt="NEW"
> src="/m/I/I/star.png" />
> <strong><a href="/browse/post/5354361/">Revenge
> (2011)</a></strong>
>
> </td>
> <td class="final">
> <span title="Exact date/time: 05-01-2011 23:08"
> class="ageVeryNew">5 days </span>
> </td>
> <td class="final">
> <span title="Exact date/time: 18-01-2011 16:06"
> class="ageVeryNew">65 minutes </span>
>
> Etc with a chunk (between each NEW) being about 60 lines, I need to extract
> info from these lines, e.g. /browse/post/5354361/ and Revenge (2011) to pass
> back to the output. Is re the best option to get all these various bits,
> maybe a generic function that I pass the search strings to?

You might be better off using an HTML parser such as BeautifulSoup or lxml.

Kent
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
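
For illustration only, here is a rough sketch of the parser route applied to the lines quoted above. It uses the standard library's html.parser as a stand-in, so that it runs without installing anything; BeautifulSoup or lxml would do the same job with less code. The names PostExtractor and extract_posts are made up for this example, and it assumes that every chunk starts at an <img alt="NEW"> tag, that the first link after it is the post link, and that the ages sit in <span title="..."> tags. Adjust these to the real page.

```python
from html.parser import HTMLParser

# Sample markup adapted from the quoted message (wrapped lines rejoined).
SAMPLE = """
<img width="13" height="15" alt="NEW" src="/m/I/I/star.png" />
<strong><a href="/browse/post/5354361/">Revenge (2011)</a></strong>
</td>
<td class="final">
<span title="Exact date/time: 05-01-2011 23:08" class="ageVeryNew">5 days </span>
</td>
<td class="final">
<span title="Exact date/time: 18-01-2011 16:06" class="ageVeryNew">65 minutes </span>
"""


class PostExtractor(HTMLParser):
    """Collect one dict per chunk; a chunk starts at an <img alt="NEW"> (assumption)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._capture = None  # "title" or "age" while inside a tag of interest

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("alt") == "NEW":
            # Start of a new chunk.
            self.posts.append({"href": None, "title": "", "ages": []})
        elif not self.posts:
            return  # ignore everything before the first NEW marker
        elif tag == "a" and self.posts[-1]["href"] is None:
            # First link in the chunk is taken to be the post link.
            self.posts[-1]["href"] = attrs.get("href")
            self._capture = "title"
        elif tag == "span" and "title" in attrs:
            self.posts[-1]["ages"].append({"exact": attrs["title"], "text": ""})
            self._capture = "age"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.posts[-1]["title"] += data
        elif self._capture == "age":
            self.posts[-1]["ages"][-1]["text"] += data


def extract_posts(html):
    """Parse the markup and return a list of dicts with tidied-up text."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    for post in parser.posts:
        # Collapse line breaks and repeated spaces inside the link text.
        post["title"] = " ".join(post["title"].split())
        for age in post["ages"]:
            age["text"] = age["text"].strip()
    return parser.posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE):
        print(post["href"], post["title"], [a["text"] for a in post["ages"]])
```

With a 700KB file, reading the whole thing into a string and passing it to extract_posts in one go should be fine. The advantage over a set of regular expressions is that attribute order, line wrapping and extra whitespace inside the tags no longer matter.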