Hello, I'm writing a small Tkinter application that retrieves news from various news websites, such as http://news.bbc.co.uk/, and displays the headlines in a Tk Listbox. All I need is each story's title and URL. Since every news site has a different layout, I think I need some template-based technique to build a news extractor for each site, one that ignores the content I'm not interested in, such as tables, images, advertisements, and Flash.
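One way to read "template-based" is a small per-site configuration that says which links count as news stories. The sketch below assumes exactly that: each hypothetical template holds a URL regex that article links are assumed to match, plus a minimum number of words in the link text to filter out navigation links such as "Home". Image-only links (for example, banner ads) end up with an empty title and are dropped. The template name `"example-news"`, its `/news/<digits>` pattern, and the word threshold are illustrative assumptions, not the real layout of any site. It uses Python 3's `html.parser` because `htmllib` no longer exists in Python 3.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical per-site templates. The patterns are illustrative assumptions,
# not the actual URL scheme of any real news site.
TEMPLATES = {
    "example-news": {
        "link_pattern": re.compile(r"/news/\d+"),  # assumed article URL shape
        "min_title_words": 3,                       # skip short nav links like "Home"
    },
}


class NewsLinkExtractor(HTMLParser):
    """Collect (title, url) pairs for <a> tags that match a site template."""

    def __init__(self, base_url, template):
        super().__init__()
        self.base_url = base_url
        self.template = template
        self.items = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        # Collapse whitespace; image-only links produce an empty title.
        title = " ".join("".join(self._text).split())
        url = urljoin(self.base_url, self._href)
        if (self.template["link_pattern"].search(url)
                and len(title.split()) >= self.template["min_title_words"]):
            self.items.append((title, url))
        self._href = None


def extract_news(html, base_url, site):
    """Return (title, url) pairs from html using the named site template."""
    parser = NewsLinkExtractor(base_url, TEMPLATES[site])
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    page = (
        '<a href="/">Home</a>'
        '<a href="/news/101">Storm closes schools across the region</a>'
        '<a href="/news/102"><img src="ad.gif"></a>'
        '<a href="http://ads.example.com/x">Buy now and save big today</a>'
    )
    for title, url in extract_news(page, "http://news.example.com/", "example-news"):
        print(title, "->", url)
```

On the sample page only the second link survives: "Home" is too short and does not match the pattern, the image link has no text, and the ad points outside the assumed article URL shape. The resulting `(title, url)` pairs could then be inserted into a Tkinter `Listbox`. Supporting a new site would mean adding a template entry rather than writing a new parser, although real sites will likely need more than a URL regex.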
So far I have built a simple GUI with Tkinter and a link extractor that uses htmllib to pull the HREFs out of a web page. But I really have no idea how to extract the news items themselves. Is anyone aware of general techniques for extracting news from web pages, or can anyone point me to similar projects? I have seen some search engines do this, for example http://news.ithaki.net/, but I don't know what technique they use. Any tips?

Thanks in advance,
Zhang Le
