Package: poppler-utils
Version: 0.10.6-1
Severity: wishlist
File: /usr/bin/pdftotext
X-debbugs-Cc: der...@foolabs.com

pdftotext has no option to copy link locations from the document into
the output stream.

E.g., here are 50 or so email addresses that one has to dig straight out
of the PDF, as one would never get to see them in any pdftotext output:
$ GET http://portal.gsdi.org/files/?artifact_id=448 |
perl -nwle 'print for /\...@\w+\.[\w.]+/g'|wc -l
51

Those were the "/Subtype/Link/A<</URI(mailto:" mailto links. I bet
pdftotext does no better for http links.
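
For illustration only, here is a rough Python sketch of the same kind of
digging: it scans the raw bytes of a PDF for URI actions. It is not how
pdftotext or poppler works. It assumes the link annotations are stored
uncompressed and that each URI is a literal (parenthesised) string without
nested unescaped parentheses. The names URI_ACTION and find_link_uris are
made up for this example.

```python
import re

# Matches an uncompressed URI action such as /URI(mailto:someone@example.org).
# Assumption: the target is a literal PDF string; hex strings and
# annotations inside compressed object streams are not handled.
URI_ACTION = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def find_link_uris(pdf_bytes: bytes) -> list:
    """Return the URI targets found in the raw bytes of a PDF."""
    uris = []
    for match in URI_ACTION.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo only the backslash escapes \( \) and \\.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        uris.append(raw.decode("latin-1"))
    return uris
```

A real implementation inside pdftotext would presumably walk the page
annotations through the PDF library rather than pattern-match bytes, since
many PDFs keep these objects in compressed streams.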

How to print them, e.g.,
Nordsburg[http://nordsburg.net/]
Nordsburg(http://nordsburg.net/)
Nordsburg (http://nordsburg.net/)
etc. is up to you.
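
To make the three variants above concrete, here is a small Python sketch
that renders a piece of link text together with its target. The function
name annotate and the style names are hypothetical, picked only for this
example; they do not correspond to any existing pdftotext option.

```python
# Hypothetical style names mapped to the three output shapes listed above.
LINK_STYLES = {
    "brackets": "{text}[{uri}]",
    "parens": "{text}({uri})",
    "spaced-parens": "{text} ({uri})",
}


def annotate(text: str, uri: str, style: str = "brackets") -> str:
    """Return the link text followed by its target in the chosen style."""
    return LINK_STYLES[style].format(text=text, uri=uri)


if __name__ == "__main__":
    for name in LINK_STYLES:
        print(annotate("Nordsburg", "http://nordsburg.net/", name))
```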


