Some of this is really cool, and ties in with a couple things I’ve tried in the past.
John Gardner <gardnerjo...@gmail.com> wrote:

> *1. Handling semantics*
> We all know you can't draw semantics from cold, low-level formatting
> commands. But for certain contexts - hierarchically sorted documents,
> consistently indented code-samples and tables marked as tables, I believe
> (okay, *hoping)* it's possible to reconstruct meaning from... well, stuff
> that looks like this:
>
> n12000 0 V84000 H72000
> x X devtag:.NH 1
> x font 36 TB
> f36s10950V84000H72000
>
> How? See the x X devtag line? That's what inspired this whole landslide of
> absurd ambition. I wondered what we could do if more metadata were provided
> that way – as device-specific control strings from, say, a preprocessor.

So you’re going to insert devtags to pass semantic info to the postprocessor? Cool idea. I wrote a script called “htbl” some years back to go with grohtml; it turns a subset of tbl markup into HTML tables. I never thought of using devtags to mark rows and cells like that; it might have worked better.

> ...
>
> We know the widths and heights of each mounted device-font, their
> kerning-pairs, ligatures, and lord knows what else. We milk this for all
> it's worth: by plotting each glyph's bounding box in a scaled space
> representing the output medium, we identify the most obvious constructs
> first.

That’s pretty similar to the PDF-to-markup thing I blithered about earlier. I think a more skilled programmer than myself (I’m a jumped-up tech writer) could really make it work well… although, as I said before, each document is unique. Personally, I think you’re going to have better luck passing semantic hints through as you mentioned above.

But it does sound like fun! I hope you keep us posted.

Larry
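
P.S. For what it's worth, here is a rough sketch of how a postprocessor might pick the devtag lines out of the intermediate output and remember where on the page each one landed. It is only an illustration, not how grohtml or anyone's actual tool does it. The function name collect_devtags is made up, and the sample assumes one command per line, so I have split the packed commands from the snippet quoted above onto separate lines.

```python
import re

# Sample intermediate output based on the quoted snippet.
# Assumption: one command per line (the packed commands are split up here).
SAMPLE = """\
n12000 0
V84000
H72000
x X devtag:.NH 1
x font 36 TB
f36
s10950
V84000
H72000
"""

# Matches a device control line of the form "x X devtag:<tag> <argument>".
DEVTAG = re.compile(r"^x\s+X\s+devtag:(\S+)\s*(.*)$")


def collect_devtags(text):
    """Return (vertical, horizontal, tag, argument) for each devtag line.

    Hypothetical helper: tracks only the absolute V and H positioning
    commands and ignores every other command.
    """
    v = h = None
    found = []
    for line in text.splitlines():
        m = DEVTAG.match(line)
        if m:
            found.append((v, h, m.group(1), m.group(2)))
        elif re.fullmatch(r"V\d+", line):
            v = int(line[1:])  # absolute vertical position
        elif re.fullmatch(r"H\d+", line):
            h = int(line[1:])  # absolute horizontal position
    return found


if __name__ == "__main__":
    for entry in collect_devtags(SAMPLE):
        print(entry)
```

A real postprocessor would also have to handle relative motions and several commands packed onto one line, which this sketch deliberately skips.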