Hi,

Thanks very much for all your suggestions. I am looking into the ones from Hugo and Alan.
The file is not very big, only 700KB (~20000 lines), which I think should be fine to load into memory?

I have two further questions though, please. The lines are like this:

    <img width="13" height="15" alt="NEW" src="/m/I/I/star.png" />
    <strong><a href="/browse/post/5354361/">Revenge (2011)</a></strong>
    </td>
    <td class="final">
    <span title="Exact date/time: 05-01-2011 23:08" class="ageVeryNew">5 days </span>
    </td>
    <td class="final">
    <span title="Exact date/time: 18-01-2011 16:06" class="ageVeryNew">65 minutes </span>

and so on, with each chunk (the text between one NEW and the next) being about 60 lines.

1. I need to extract info from these lines, e.g. /browse/post/5354361/ and Revenge (2011), to pass back to the output. Is the re module the best option for getting all these various bits, maybe with a generic function that I pass the search strings to?

2. If I use Alan's split suggestion, I assume the last item would be the rest of the file. Would the next() option just let me search for the next /browse/post/5354361/ etc. after the NEW (maybe putting this info into a list)?

Thanks again,
Paul

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
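
For reference, the approach asked about above (split on the NEW marker, then search each chunk with re) can be sketched roughly as follows. This is only a sketch under assumptions: the names SAMPLE, LINK_RE, DATE_RE and extract_posts are made up for illustration, the patterns are guessed from the few lines quoted in the post, and the real 60-line chunks may need different or additional patterns.

```python
import re

# Sample modelled on the lines quoted in the post; real chunks are ~60 lines.
SAMPLE = '''<img width="13" height="15" alt="NEW" src="/m/I/I/star.png" />
<strong><a href="/browse/post/5354361/">Revenge (2011)</a></strong>
</td>
<td class="final">
<span title="Exact date/time: 05-01-2011 23:08" class="ageVeryNew">5 days </span>
</td>
<td class="final">
<span title="Exact date/time: 18-01-2011 16:06" class="ageVeryNew">65 minutes </span>
'''

# Hypothetical patterns, inferred only from the sample above.
LINK_RE = re.compile(r'<a href="(/browse/post/\d+/)">([^<]*)</a>')
DATE_RE = re.compile(r'title="Exact date/time: ([^"]+)"')


def extract_posts(html, marker='alt="NEW"'):
    """Return one dict per chunk that follows `marker`."""
    posts = []
    # split() gives the text before the first marker, then one chunk per
    # marker; the last chunk runs to the end of the input.
    for chunk in html.split(marker)[1:]:
        link = LINK_RE.search(chunk)
        if link is None:
            continue  # no post link in this chunk, skip it
        posts.append({
            "url": link.group(1),
            "title": link.group(2),
            "dates": DATE_RE.findall(chunk),
        })
    return posts


posts = extract_posts(SAMPLE)
for post in posts:
    print(post["url"], post["title"], post["dates"])
```

Because each chunk is searched separately, a match can never spill over into the next NEW entry, and the results end up in a plain list of dicts. For HTML that is less regular than this sample, a real parser such as the standard library's html.parser is likely to be more robust than regular expressions.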