> On Sep 3, 2019, at 10:46 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> PDF is a problematic format as headers and footers are not specified per se
> as headers and footers in the document, but only as drawing instructions on
> the page. There is no chance for a software to find them based on the
> structure.
I worked with someone who observed that turning PDF back into structured text was about as likely as turning hamburger back into a cow.

PDF throws away the structure, then turns text into instructions for monkeys with rubber stamps. In order to find word breaks, the program needs the width of every character in the current font.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
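To make the word-break point concrete, here is a minimal Python sketch. It assumes the glyphs on one line have already been pulled out of the page as (character, x position, advance width) triples, with widths in 1/1000 em as font width tables usually give them. The `Glyph` class, the `join_glyphs` function, and the 0.25 em `gap_ratio` threshold are hypothetical illustrations, not any particular library's API; real extractors use more elaborate heuristics. The idea is that a space is only inferred when the gap left after a glyph's advance width is large, so the result depends on knowing every width.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str     # the character the glyph draws
    x: float      # x position of the glyph origin, in points
    width: float  # advance width in 1/1000 em (hypothetical input format)


def join_glyphs(glyphs, font_size, gap_ratio=0.25):
    """Rebuild one line of text from positioned glyphs.

    A space is inserted when the gap between the end of one glyph and the
    start of the next exceeds gap_ratio * font_size (an assumed threshold).
    """
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        # Where the previous glyph ends depends on its width in the current font.
        prev_end = prev.x + prev.width * font_size / 1000.0
        gap = cur.x - prev_end
        if gap > gap_ratio * font_size:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


if __name__ == "__main__":
    line = [Glyph("h", 0, 500), Glyph("i", 5, 500), Glyph("t", 13, 500), Glyph("o", 18, 500)]
    print(join_glyphs(line, font_size=10))  # hi to
    # Same positions, but pretend the widths are unknown (zero):
    print(join_glyphs([Glyph(g.char, g.x, 0) for g in line], font_size=10))  # h i t o
```

With the correct 500-unit widths at 10 pt, each glyph advances 5 pt, so only the 3 pt gap before "t" reads as a space. Drop the widths and every glyph looks separated, which is the hamburger problem in miniature.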