Haibao Tang wrote:
> with high accuracy...
>
> My temporary plan is to first recognized consecutive two or three
> initial-capitalized words, but certainly we need to do more than that?
> Anyone has suggestions?
>
> Thanks first.

It's not easy to say without seeing the HTML. If the structure allows it, a couple of str.split() calls are probably the easiest way, but there is always BeautifulSoup as a fallback.

http://www.crummy.com/software/BeautifulSoup/
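
To make the str.split() idea concrete, here is a rough sketch using only the standard library. The HTML snippet, the `<li class="author">` markers, and the helper names `between` and `capitalized_runs` are all made up for illustration, since the real page structure is unknown. The second helper is one possible reading of the "two or three consecutive initial-capitalized words" plan from the quoted message, not a complete name recognizer.

```python
import re

# Hypothetical markup; the actual page was not shown in the thread.
html = '<li class="author">Haibao Tang</li><li class="author">John Q Public</li>'


def between(text, start, end):
    """Return every substring that sits between a `start` and an `end` marker."""
    pieces = []
    # Everything before the first `start` marker is skipped with [1:].
    for chunk in text.split(start)[1:]:
        # Keep only the part of the chunk that precedes the first `end` marker.
        pieces.append(chunk.split(end, 1)[0])
    return pieces


# Two or three consecutive words, each starting with an ASCII capital letter.
# Assumption: names are plain ASCII and separated by single spaces.
NAME_RE = re.compile(r"\b[A-Z][a-z]*(?: [A-Z][a-z]*){1,2}\b")


def capitalized_runs(text):
    """Return runs of two or three initial-capitalized words found in `text`."""
    return NAME_RE.findall(text)


names = between(html, '<li class="author">', '</li>')
candidates = capitalized_runs("Written by Haibao Tang and john smith")
```

If the markers are regular, `between` alone is usually enough. The regular expression is only a heuristic: it will miss lowercase particles, hyphenated or accented names, and it will happily match any capitalized phrase such as a place name, so it needs checking against real data.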
