Re: [Tutor] man pages parsing (still)

Kent Johnson Mon, 11 Sep 2006 07:22:30 -0700

Tiago Saboga wrote:
> I'm still there, trying to parse man pages (I want to gather a list of all 
> options with their help strings). I've tried to use regex on both the 
> formatted output of man and the source troff files and I discovered what is 
> already said in the doclifter man page: you have to use a number of hints, and
> it's really not simple. So I'm now using doclifter, and it's working, but it is
> terribly slow. Doclifter itself takes around a second to parse the troff file,
> but my few lines of code take 25 seconds to parse the resulting xml. I've
> pasted the code at http://pastebin.ca/166941
> and I'd like to hear from you how I could possibly optimize it.


How big is the XML? 25 seconds is a long time. I would look at
cElementTree (a C implementation of ElementTree); it is pretty fast.
http://effbot.org/zone/celementtree.htm

In particular iterparse() might be helpful:
http://effbot.org/zone/element-iterparse.htm
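A minimal sketch of the iterparse() approach, using the standard-library xml.etree.ElementTree. On Python 3 that module picks up the C accelerator automatically (the separate cElementTree module was removed in Python 3.9). The element names varlistentry, term and listitem are an assumption about how doclifter's DocBook output marks up options; check them against your actual XML. Only the first term of each entry is used here.

```python
import xml.etree.ElementTree as ET  # C-accelerated automatically on Python 3


def extract_options(source):
    """Yield (option, help_text) pairs from DocBook-style XML.

    source is a filename or a binary file object. Assumes (hypothetically)
    that each option is a <varlistentry> with a <term> naming the option
    and a <listitem> holding its description.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "varlistentry":
            continue
        term = elem.find("term")
        item = elem.find("listitem")
        name = "".join(term.itertext()).strip() if term is not None else ""
        # Collapse the whitespace left over from the troff line breaks.
        text = " ".join("".join(item.itertext()).split()) if item is not None else ""
        yield name, text
        # Drop the finished subtree so memory stays flat on large pages.
        elem.clear()
```

Usage would be something like `for name, text in extract_options("ls.1.xml"): print(name, text)`, where the filename is only illustrative. Because each entry is handled as soon as its closing tag is seen, the whole document never has to be walked a second time.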

I would also try specifying a buffer size in the call to os.popen2(); if
the I/O is unbuffered or the buffer is small, that might be the bottleneck.
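os.popen2() was deprecated in Python 2.6 and removed in Python 3; its replacement is subprocess.Popen, which also takes a bufsize argument. The sketch below pipes a command's stdout straight into the XML parser through a buffered pipe. The function name and the 64 KiB size are illustrative choices, not tuned values, and the exact doclifter command line is left to you.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_command_output(cmd, bufsize=64 * 1024):
    """Run cmd (a list of arguments) and parse its stdout as XML.

    bufsize is handed to subprocess.Popen so the parser reads from a
    buffered file object instead of issuing many tiny reads on the pipe.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=bufsize) as proc:
        tree = ET.parse(proc.stdout)
    # Leaving the with-block closes the pipe and waits for the child.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return tree.getroot()
```

On current Python 3 the default bufsize (-1) is already buffered, so passing an explicit size mainly makes the choice visible; on older code built around os.popen2() the difference can matter more.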

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
