At 2020-09-18T01:18:06+1000, John Gardner wrote:
> To preserve metadata, or identify regions of semantic or structural
> interest, write a preprocessor to delineate unprocessed roff(7) syntax
> with device control functions:
>
>     .TH \X'meta: begin title'TITLE\X'meta: end title'

Yes.

> Which comes out looking like this in troff's intermediate output:
>
>     x X meta: begin title
>     t TITLE
>     x X meta: end title
>
> Which postprocessors can use if they have some reason to care about
> semantic data.

Yes!

> Even if you only care about extracting abstract info instead of
> rendering a document, there's no reason a postprocessor actually has
> to be a typesetter:
>
>     $ infer | troff | post-infer --extract-outline --xml ./outline.xml \
>       | grotty | less
>
> Of course, this would require infer to have prior knowledge of
> specific macro packages, but I fail to see that being an issue.
> Moreover, infer can also identify preprocessor markup, such as tables,
> pictures, equations, and any other shite that's impossible to
> recognise in preprocessor output.
>
> This is similar in spirit to what Werner Lemberg started with
> devtag.tmac, which grohtml(1) already uses to identify numbered
> headings and section titles. Personally, there's a lot more we could
> be doing with that same technique.

Yes! Yes! Yes!

I've poked my snout a little bit into grohtml recently because I had
to test my changes to the handling of the man(7) registers (C, D, P,
X) that aren't honored when the output device is -Thtml. I dimly
perceive a lot of good infrastructure there that could be put to some
excellent use. We just need a contributor to pick up the mission.

Regards,
Branden
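
`post-infer` is a hypothetical tool from John's message, not an existing program. As a rough illustration of the postprocessor side only, here is a minimal Python sketch with a hypothetical `extract_regions` helper (not part of groff) that reads intermediate output of the kind `groff -Z` produces and collects the text printed between the `meta: begin NAME` and `meta: end NAME` device control commands. The `meta:` naming is the convention from the quoted example, not something groff defines, and the sketch assumes one command per line, ignoring `x +` continuation lines and glyph commands such as `C`.

```python
from typing import Dict, Iterable, List

# Marker convention taken from the quoted example; hypothetical, not a groff standard.
BEGIN = "meta: begin "
END = "meta: end "


def extract_regions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the text printed between "x X meta: begin NAME" and
    "x X meta: end NAME" in groff intermediate output.

    Simplified sketch: assumes one command per line, ignores "x +"
    continuation lines and glyph commands such as "C", and assumes a
    region name is never nested inside itself.
    """
    regions: Dict[str, List[str]] = {}
    buffers: Dict[str, List[str]] = {}  # text gathered for each open region

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("x X "):
            payload = line[len("x X "):]
            if payload.startswith(BEGIN):
                buffers[payload[len(BEGIN):]] = []
            elif payload.startswith(END):
                name = payload[len(END):]
                if name in buffers:
                    # Collapse the spaces contributed by "w" and "n" commands.
                    text = " ".join("".join(buffers.pop(name)).split())
                    regions.setdefault(name, []).append(text)
            continue
        if not buffers:
            continue  # outside every region: nothing to record
        if line.startswith("t"):
            # "t" prints a word; tolerate a space after the command letter.
            piece = line[1:].lstrip()
        elif line.startswith(("w", "n")):
            piece = " "  # word space or end of an output line
        else:
            continue  # positioning, font, size, and other commands
        for buf in buffers.values():
            buf.append(piece)
    return regions


if __name__ == "__main__":
    # Intermediate output as quoted in the message.
    sample = ["x X meta: begin title", "t TITLE", "x X meta: end title"]
    for name, texts in extract_regions(sample).items():
        for text in texts:
            print(f"{name}: {text}")  # prints "title: TITLE"
```

To try it on real output, pass the lines of `groff -Z` output (for example `sys.stdin`) to `extract_regions`; exactly what lands between the markers depends on the macro package and where it emits the title.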