Hi, I'm attempting to use pdftohtml and pdftotext on Fedora 25 (poppler-utils-0.45.0-5.fc25.x86_64), but I can't get either tool to extract the text from a particular PDF.
I'm trying to use poppler-utils with a SpamAssassin plugin to extract text from PDFs that may be malicious. Here is one such example:

https://www.dropbox.com/s/b97pcvl1wm1oocq/pdf-phish.pdf?dl=0

The tools appear to extract the header information (author, date, etc.) but no text from within the PDF. Would someone be willing to try extracting the URL from this PDF for me?

Is there a big difference between version 0.45 and the latest release that might affect this? Testing the latest release would require compiling it locally.

podofopdfinfo is able to identify the URL within the PDF, but I'm not sure if that's helpful.

Any ideas greatly appreciated.
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
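
For anyone wanting to check where the URL actually lives, here is a minimal Python sketch that scans the raw PDF bytes for `/URI (...)` entries. It rests on an assumption the post does not confirm: that the URL is stored as a link action and not as page text, which would be consistent with podofopdfinfo seeing it while the text extraction comes back empty. It only finds entries stored uncompressed and written as literal strings, so URIs inside compressed object streams or written as hex strings will be missed. The function name `find_uris` and the `example.com` addresses are illustrative only.

```python
import re

# Matches a literal-string URI entry such as: /URI (http://example.com/path)
# The inner group accepts backslash-escaped characters inside the parentheses.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def find_uris(pdf_bytes):
    """Return URI strings found in uncompressed parts of a PDF (best effort)."""
    uris = []
    for match in URI_PATTERN.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo simple backslash escapes for parentheses and backslashes.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        # latin-1 maps every byte to a character, so decoding never fails.
        uris.append(raw.decode("latin-1"))
    return uris


# Illustrative input only; this is not taken from the PDF in the post.
sample = b"<< /Type /Action /S /URI /URI (http://example.com/login) >>"
print(find_uris(sample))
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `find_uris(open("pdf-phish.pdf", "rb").read())`. If that returns nothing, the entries may be compressed, and a proper PDF parser would be needed.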
