> man pdftotext says -raw is no longer recommended, but
> $ wget \
> http://www.phac-aspc.gc.ca/ncfv-cnivf/familyviolence/pdfs/husbandenglish.pdf
> $ pdftotext -raw -f 12 -l 13 -enc Latin1 husbandenglish.pdf
> $ pdftotext -f 12 -l 13 -enc Latin1 husbandenglish.pdf husbandenglishNOraw.txt
> $ less +/11 husbandenglish.txt
> $ less +/11 husbandenglishNOraw.txt
> Show that it is a lifesaver around the area of the 11.
> Therefore the man page shouldn't say it is so bad.
> In fact it should mention how to make it the default in the config
> file. It seems that there is no way to put it in the config file.

Regarding that PDF file -- I'm seeing a misordering of columns on page
12 (the one with the title "The Debate about Husband Abuse").  Are you
seeing any other problems besides that?

The reason that raw mode is marked as deprecated is that it's not
reliable.  I'm not planning to remove it any time soon, but the plan is
to make the other two modes better.  In particular, you probably don't
want to make it your default because it will occasionally mess up.
(For example, I've seen two-column PDF files where the text was drawn in
the order "column 1, line 1", "col 2, line 1", "col 1, line 2", "col 2,
line 2", etc.  In that case, raw mode would have produced something
completely unreadable.)
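
To make the interleaved case concrete, here is a toy sketch in shell.
The fragment text is made up, and sorting on explicit column/line
numbers is only a stand-in for real layout analysis; it is not how
pdftotext works internally, which has to infer columns from
coordinates.

```shell
#!/bin/sh
# Hypothetical text fragments, listed in the order the PDF draws them.
# Fields: column number, line number, then the text itself.
fragments='1 1 The quick
2 1 Lorem ipsum
1 2 brown fox
2 2 dolor sit'

# Raw-style output: emit the text in drawing order, untouched.
raw=$(printf '%s\n' "$fragments" | cut -d' ' -f3-)

# Reading-order-style output: finish column 1 before starting column 2.
reading=$(printf '%s\n' "$fragments" | sort -k1,1n -k2,2n | cut -d' ' -f3-)

printf 'raw:\n%s\n\nreading order:\n%s\n' "$raw" "$reading"
```

The raw output interleaves the two columns ("The quick", "Lorem
ipsum", "brown fox", "dolor sit"), while the sorted output keeps each
column together.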

So the short answer is: reading-order mode (the default) still needs
work, but the plan is to fix it eventually.

- Derek


