On 25/08/2017 19:17, Alex wrote:
> I'm trying to use the poppler-utils to work with a spamassassin plugin
> to extract text from PDFs that may be malicious.

Anti-phishing tools, admirable.


> Would someone be interested in trying to extract the URL from within
> this PDF for me?

cat pdf-phish.pdf | tr '\0 ' '\n' | grep http | sed 's/.*[(<]\(http.*\)[>)].*/\1/'

If needed, you can add other delimiter characters inside the square brackets.
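
As a rough sketch of that idea, the pipeline can be wrapped in a small shell function with double quotes added as an extra delimiter. The function name extract_pdf_urls is made up for illustration, and the capture is narrowed to "anything up to the next closing delimiter", which is a variation on the command above rather than the same expression. It assumes the URI is stored uncompressed in the file; links inside compressed streams would not be found this way.

```shell
# Hypothetical helper (the name is an assumption, not part of poppler-utils):
# print http(s) URLs found in the raw bytes of the file given as $1.
extract_pdf_urls() {
    # 1. turn NUL bytes and spaces into newlines so each token is on its own line
    # 2. keep only tokens containing "http" (-a: treat binary data as text)
    # 3. strip everything outside the delimiters ( ) < > " around the URL
    LC_ALL=C tr '\0 ' '\n\n' < "$1" \
        | LC_ALL=C grep -a http \
        | LC_ALL=C sed 's/.*[(<"]\(http[^)>"]*\)[>)"].*/\1/'
}
```

Usage would be something like: extract_pdf_urls pdf-phish.pdf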


> Is there a big difference between version 0.45 and
> the latest that may affect this?

0.57.0 does not find the link in that PDF; I don't know why.

Valerio
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
