I wanted a paper hardcopy of a (simple) web page to hand out to new people here at work, but I wasn't happy with the results from either Firefox or wkhtmltopdf. (Due to my viewing preferences, the output from Firefox was completely unusable [why aren't there different settings for screen display and printing?], and wkhtmltopdf had the nasty habit of introducing subtle, seemingly random irregular spacing between the letters of every word [wtf?].) So I decided to roll my own.
My solution consists of a sed script that converts the HTML stream into a roff-compatible, line-oriented format, and a small set of macros that read a sort of style sheet and typeset the converted input accordingly. There is no real box model, and therefore no floats or side-by-side layout; classes and inline styles aren't supported either, and neither are tables (my web page didn't require them). But for very simple HTML pages it works quite well. Oh yes, and it's *blazingly fast* compared to the other options. I'm not sure when (or if) I will continue working on the project, as it already does what it was designed to do. But if anybody is interested in grabbing some ideas, you can find it in ~hoffmann/roff/html on www.usm.uni-muenchen.de.
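To give a rough idea of what "line-oriented" means here, the following is a minimal sketch of the conversion step in Python. It is not the actual sed script, just an illustration of the idea: every tag ends up on its own line as a roff request, and every run of text on a line of its own. The request names `.TAG` and `.END` and the function names are made up for this example; the real macro set may look quite different.

```python
import html
import re

# Hypothetical request names, chosen for illustration only.
OPEN, CLOSE = ".TAG", ".END"

# Matches opening and closing tags; attributes are matched but discarded.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def html_to_lines(source):
    """Return a list of lines: one request per tag, one line per text run."""
    out = []
    pos = 0
    for m in TOKEN.finditer(source):
        _emit_text(source[pos:m.start()], out)
        request = CLOSE if m.group(1) else OPEN
        out.append(f"{request} {m.group(2).lower()}")
        pos = m.end()
    _emit_text(source[pos:], out)
    return out


def _emit_text(text, out):
    # Decode entities and collapse whitespace; drop empty runs.
    text = " ".join(html.unescape(text).split())
    if not text:
        return
    # Write backslashes as \e so roff prints them literally.
    text = text.replace("\\", "\\e")
    # Prefix \& so a leading . or ' is not read as a control line.
    if text[0] in ".'":
        text = "\\&" + text
    out.append(text)


if __name__ == "__main__":
    for line in html_to_lines("<p>Hello <b>world</b>.</p>"):
        print(line)
```

The sketch ignores comments, doctype declarations, and attributes, and it glosses over one real problem: putting the text after an inline tag on a new line makes roff insert a space there, so a proper converter has to deal with that as well (for example with `\c`).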
