> man pdftotext says -raw is no longer recommended, but
> $ wget \
> http://www.phac-aspc.gc.ca/ncfv-cnivf/familyviolence/pdfs/husbandenglish.pdf
> $ pdftotext -raw -f 12 -l 13 -enc Latin1 husbandenglish.pdf
> $ pdftotext -f 12 -l 13 -enc Latin1 husbandenglish.pdf husbandenglishNOraw.txt
> $ less +/11 husbandenglish.txt
> $ less +/11 husbandenglishNOraw.txt
> Show that it is a lifesaver around the area of the 11.
> Therefore the man page shouldn't say it is so bad.
> In fact it should mention how to make it the default in the config
> file. It seems that there is no way to put it in the config file.
Regarding that PDF file: I'm seeing a misordering of columns on page 12 (the one with the title "The Debate about Husband Abuse"). Are you seeing any other problems besides that?

The reason raw mode is marked as deprecated is that it's not reliable. I'm not planning to remove it any time soon, but the plan is to make the other two modes (the default reading-order mode and -layout) better. In particular, you probably don't want to make raw mode your default, because it will occasionally mess up. (For example, I've seen two-column PDF files where the text was drawn in the order "col 1, line 1", "col 2, line 1", "col 1, line 2", "col 2, line 2", etc. In that case, raw mode would have produced something completely unreadable.)

So the short answer is: reading-order mode (the default) still needs work, but the plan is to eventually fix it.

- Derek
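To make the two-column case described above concrete, here is a toy shell model. It is only an illustration under stated assumptions, not pdftotext's actual algorithm: each record is a hypothetical text line tagged with its column and row, listed in an assumed content-stream order that interleaves the columns row by row. Printing the records as drawn mimics what raw mode does; regrouping them column by column mimics what a reading-order pass is trying to achieve. The function names `drawn_records`, `raw_order` and `column_order` are made up for this sketch.

```shell
#!/bin/sh
# Toy model only, not pdftotext's real algorithm. Each record is
# "column<TAB>row<TAB>text", listed in a hypothetical content-stream
# order that interleaves the two columns row by row.
TAB="$(printf '\t')"

drawn_records() {
  printf '1\t1\tcol 1, line 1\n'
  printf '2\t1\tcol 2, line 1\n'
  printf '1\t2\tcol 1, line 2\n'
  printf '2\t2\tcol 2, line 2\n'
}

# Raw-style output: keep the drawing order, drop the position tags.
raw_order() {
  drawn_records | cut -f3
}

# Reading-order-style output: all of column 1 top to bottom, then column 2.
column_order() {
  drawn_records | sort -t "$TAB" -k1,1n -k2,2n | cut -f3
}

echo "raw-style:"
raw_order
echo "column-by-column:"
column_order
```

In this toy the column number is handed over for free; a real extractor has to infer it from where the text sits on the page, which is the part that is hard to get right.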