Some of this is really cool, and ties in with a couple things I’ve tried in the 
past.

John Gardner <gardnerjo...@gmail.com> wrote:

> *1. Handling semantics*
> We all know you can't draw semantics from cold, low-level formatting
> commands. But for certain contexts - hierarchically sorted documents,
> consistently indented code-samples and tables marked as tables, I believe
> (okay, *hoping)* it's possible to reconstruct meaning from... well, stuff
> that looks like this:
> 
> n12000 0 V84000 H72000
> x X devtag:.NH 1
> x font 36 TB
> f36s10950V84000H72000
> 
> How? See the x X devtag line? That's what inspired this whole landslide of
> absurd ambition. I wondered what we could do if more metadata were provided
> that way – as device-specific control strings from, say, a preprocessor.

So you’re going to insert devtags to pass semantic info to the postprocessor?
Cool idea. I wrote a script called “htbl” some years back to go with grothml; it
turns a subset of tbl markup into HTML tables. I never thought of using devtags
to mark rows/cells like that; it might have worked better.
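
For what it's worth, picking those hints out on the postprocessor side could look something like the sketch below. It is only a sketch, in Python: the `x X devtag:` line shape is assumed purely from the sample quoted above, and `extract_devtags` is a made-up name for illustration, not anything in groff or grothml.

```python
import re

# Assumed shape, taken only from the quoted sample: "x X devtag:.NH 1"
# (tag name right after the colon, optional space-separated arguments).
DEVTAG = re.compile(r"^x X devtag:(\S+)\s*(.*)$")


def extract_devtags(lines):
    """Yield (line_number, tag, args) for each devtag control line."""
    for number, line in enumerate(lines, start=1):
        match = DEVTAG.match(line.rstrip("\n"))
        if match:
            # Everything that is not a devtag line is simply skipped here.
            yield number, match.group(1), match.group(2).split()


# The four lines of intermediate output quoted above.
sample = [
    "n12000 0 V84000 H72000",
    "x X devtag:.NH 1",
    "x font 36 TB",
    "f36s10950V84000H72000",
]

tags = list(extract_devtags(sample))
print(tags)
```

A real postprocessor would keep the positioning commands between tags too; this only shows that the semantic hints are cheap to recover once they are in the stream.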

> ...
> 
> We know the widths and heights of each mounted device-font, their
> kerning-pairs, ligatures, and lord knows what else. We milk this for all
> it's worth: by plotting each glyph's bounding box in a scaled space
> representing the output medium, we identify the most obvious constructs
> first.

That’s pretty similar to the PDF-to-markup thing I blithered about earlier.
I think a more skilled programmer than myself (I’m a jumped-up tech writer)
could really make it work well… although as I said before, each document
is unique. Personally, I think you’re going to have better luck passing semantic
hints through as you mentioned above. But it does sound like fun! I hope you
keep us posted.

        Larry

