There's an open-source tool named OCRmyPDF which claims to do what you're trying
to do: see https://github.com/fritz-hh/OCRmyPDF
As far as I understand, it relies on standard GNU/Linux software and produces
a searchable PDF file (which, as I understand it, means the text is
extractable). I haven't used this tool myself, but its source code might give
you some hints.
-- 
Regards,
jvp.



-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/m35bvo$735$1...@ger.gmane.org