Tiago Saboga wrote:
> I'm still there, trying to parse man pages (I want to gather a list of all
> options with their help strings). I've tried to use regex on both the
> formatted output of man and the source troff files and I discovered what is
> already said in the doclifter man page: you have to do a number of hints, and
> it's really not simple. So I'm now using doclifter, and it's working, but it is
> terribly slow. Doclifter itself takes around a second to parse the troff file,
> but my few lines of code take 25 seconds to parse the resultant xml. I've
> pasted the code at http://pastebin.ca/166941
> and I'd like to hear from you how I could possibly optimize it.

How big is the XML? 25 seconds is a long time... I would look at
cElementTree (an implementation of ElementTree in C); it is pretty fast.
http://effbot.org/zone/celementtree.htm

In particular, iterparse() might be helpful:
http://effbot.org/zone/element-iterparse.htm

I would also try specifying a buffer size in the call to os.popen2(); if
the I/O is unbuffered or the buffer is small, that might be the bottleneck.

Kent
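
A minimal sketch of the iterparse() idea, not the code from the pastebin.
It assumes the doclifter output is DocBook-style XML in which each option
sits in a <varlistentry> with a <term> and a <listitem>; the function name
iter_options and the SAMPLE data are made up for illustration. In current
Python versions the C implementation is available as
xml.etree.ElementTree, so that is what is imported here.

```python
import io
import xml.etree.ElementTree as ET


def iter_options(source):
    """Yield (option, help_text) pairs from a file name or binary file object.

    Assumes DocBook-style <varlistentry> elements (an assumption about the
    doclifter output, adjust the tag names to match the real document).
    """
    # "end" events fire once an element and all of its children are complete.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "varlistentry":
            continue
        term = elem.find("term")
        item = elem.find("listitem")
        # itertext() collects text from nested tags such as <option> or <para>;
        # split/join collapses line breaks and runs of whitespace.
        name = " ".join("".join(term.itertext()).split()) if term is not None else ""
        text = " ".join("".join(item.itertext()).split()) if item is not None else ""
        yield name, text
        # Drop the processed subtree so memory does not grow with document size.
        elem.clear()


# Hypothetical sample input, standing in for real doclifter output.
SAMPLE = b"""<refentry><refsect1><variablelist>
<varlistentry><term><option>-a</option></term>
<listitem><para>Show all entries.</para></listitem></varlistentry>
<varlistentry><term><option>-l</option></term>
<listitem><para>Use a long
listing format.</para></listitem></varlistentry>
</variablelist></refsect1></refentry>"""

if __name__ == "__main__":
    for name, text in iter_options(io.BytesIO(SAMPLE)):
        print(name, "=>", text)
```

Because iter_options accepts any binary file object, the stdout pipe of a
child process could be passed in directly, so the XML is parsed as it
arrives instead of being read into one big string first.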