|> I'm interested in building a troff parser to extract information from
|> manpages (e.g. what do the flags mean when we say `rm -rf *`?).
For the mdocmx(7) project I have written a simple mdoc(7) parser in awk(1). The entire thing is 18966 bytes, but that includes comments and a shell wrapper that finds a suitable awk(1) and ensures a signal-safe temporary file lifetime for use in (man(1)ual) pipes. Many lines are also devoted to table-of-contents handling. It is commented and shouldn't be too hard to follow (note that it even supports """Bla""" quoting, which is not even mentioned in the official mdoc(7) manual). It works with [nmg]awk(1).

An older version is in the groff@ archive (from before Christmas 2014). The latest one (largely identical as far as the awk(1) part is concerned) is for now only in my S-roff repository, at [1] or [2] (in the root directory, not yet sorted in). Depending on what you want, it may be a starting point.

[1] http://sourceforge.net/p/s-roff/code/
[2] https://gitorious.org/s-roff/s-roff/

--steffen
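The """Bla""" quoting mentioned above relies on the roff convention, as implemented by groff, that inside a double-quoted macro argument a doubled quote `""` stands for one literal `"`. The following is a minimal Python sketch, not the awk(1) code from S-roff. It tokenizes a single macro line under that assumption. The function names `split_macro_args` and `parse_macro_line` are hypothetical. The sketch deliberately ignores backslash escapes, `\"` comments, tabs and continuation lines, so it is only a starting point.

```python
from typing import List, Optional, Tuple


def split_macro_args(tail: str) -> List[str]:
    # Simplified rules (an assumption, not the full roff grammar):
    #  - arguments are separated by runs of spaces;
    #  - an argument starting with '"' runs up to an unpaired closing '"';
    #  - inside a quoted argument, a doubled quote yields one literal '"'.
    args: List[str] = []
    i, n = 0, len(tail)
    while i < n:
        # Skip the blanks that separate arguments.
        while i < n and tail[i] == ' ':
            i += 1
        if i >= n:
            break
        if tail[i] == '"':
            i += 1  # consume the opening quote
            buf: List[str] = []
            while i < n:
                if tail[i] == '"':
                    if i + 1 < n and tail[i + 1] == '"':
                        buf.append('"')  # doubled quote -> literal quote
                        i += 2
                        continue
                    i += 1  # unpaired quote closes the argument
                    break
                buf.append(tail[i])
                i += 1
            args.append(''.join(buf))
        else:
            # Unquoted argument: everything up to the next space.
            start = i
            while i < n and tail[i] != ' ':
                i += 1
            args.append(tail[start:i])
    return args


def parse_macro_line(line: str) -> Optional[Tuple[str, List[str]]]:
    # Return (macro_name, arguments) for a control line like '.Fl r',
    # or None for ordinary text lines.
    if not line.startswith('.'):
        return None
    rest = line[1:].lstrip(' ')
    name, _, tail = rest.partition(' ')
    return name, split_macro_args(tail)
```

For example, `parse_macro_line('.Dq """Bla"""')` gives `('Dq', ['"Bla"'])`, and `parse_macro_line('.It Fl r')` gives `('It', ['Fl', 'r'])`. A flag extractor for the `rm -rf *` question could then pair each `.It Fl x` item with the text lines that follow it.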