There's an open source tool named OCRmyPDF which claims to do what you're trying to do; see https://github.com/fritz-hh/OCRmyPDF

As far as I understand, it makes use of standard GNU/Linux software and produces a searchable PDF file (which, in my understanding, implies that the text is extractable). I haven't used this tool myself. Perhaps the source code could give you some hints.

--
Regards,
jvp.
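P.S. An untested sketch of how one might check the "searchable implies extractable" assumption: run OCRmyPDF on a scanned file, then try to pull the text back out with pdftotext (from poppler-utils). This assumes both commands are installed and on the PATH; the helper names and file names are made up for illustration, and only the basic `ocrmypdf input.pdf output.pdf` invocation is used.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path) -> list[str]:
    # Basic invocation: ocrmypdf <input.pdf> <output.pdf>
    return ["ocrmypdf", str(src), str(dst)]


def build_extract_command(pdf: Path) -> list[str]:
    # The trailing "-" asks pdftotext to write the text to stdout
    return ["pdftotext", str(pdf), "-"]


def ocr_and_extract(src: Path, dst: Path) -> str:
    """Hypothetical helper: OCR src into dst, then return dst's text layer."""
    # Assumption: both tools are installed; fail early if they are not.
    for tool in ("ocrmypdf", "pdftotext"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    # check=True raises CalledProcessError if either tool exits non-zero
    subprocess.run(build_ocr_command(src, dst), check=True)
    result = subprocess.run(
        build_extract_command(dst),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `ocr_and_extract(Path("scan.pdf"), Path("scan-ocr.pdf"))` should return the recognized text if the output really carries a text layer; an empty or whitespace-only result would suggest it does not.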