> On Sep 3, 2019, at 10:46 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> PDF is a problematic format as headers and footers are not specified per se 
> as headers and footers in the document, but only as drawing instructions on 
> the page. There is no chance for a software to find them based on the 
> structure.

I worked with someone who observed that turning PDF back into structured text 
was about as likely as turning hamburger back into a cow.

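PDF throws away the structure, then turns text into instructions for monkeys 
with rubber stamps. Spaces are often not stored at all; each glyph is just 
stamped at a position on the page. To find word breaks, the program needs the 
width of every character in the current font, so it can tell a real gap from 
ordinary letter spacing.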
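
To make the word-break point concrete, here is a rough sketch in Python. It 
is not how any particular PDF library does it. The function name, the 
(character, x position) input, the widths table, and the 0.2 gap threshold 
are all made up for illustration. The widths are given in thousandths of the 
font size, which is a common convention for PDF font metrics.

```python
from typing import Dict, List, Tuple


def join_glyphs(
    glyphs: List[Tuple[str, float]],
    widths: Dict[str, float],
    font_size: float,
    gap_ratio: float = 0.2,
) -> str:
    """Rebuild one line of text from (character, x_position) pairs.

    Hypothetical helper: a space is inserted whenever the gap between the
    end of the previous glyph and the start of the next one is larger than
    gap_ratio * font_size. The threshold is an assumption, not a standard.
    """
    out = []
    prev_end = None
    # Drawing order is not guaranteed to be reading order, so sort by x.
    for ch, x in sorted(glyphs, key=lambda g: g[1]):
        if prev_end is not None and x - prev_end > gap_ratio * font_size:
            out.append(" ")
        out.append(ch)
        # Widths are assumed to be in 1/1000 of the font size.
        prev_end = x + widths[ch] * font_size / 1000.0
    return "".join(out)


# Illustrative data only: four glyphs stamped on one line at font size 10.
sample_widths = {"h": 500, "i": 250, "t": 300, "o": 500}
sample_glyphs = [("h", 0.0), ("i", 5.0), ("t", 10.0), ("o", 13.0)]
result = join_glyphs(sample_glyphs, sample_widths, font_size=10.0)
```

In the sample data, "h" to "i" and "i" to "t" are both 5.0 units apart, 
measured from glyph origin to glyph origin. Only the widths show that "h" 
runs right up to "i" while "i" ends 2.5 units before "t" starts, which is 
where the space belongs.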

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
