Re: [OCR] Extract text layer, fix errors, re-import?

Gilles Fri, 30 Aug 2024 03:47:28 -0700

Never mind: I'll just convert the PDF to EPUB, and edit the HTML files it contains.
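
A minimal sketch of that workaround, for illustration only: an EPUB is a ZIP archive, so the HTML files can be corrected in bulk and repacked. The function name `rewrite_epub` and the `fixes` mapping are hypothetical, and the sketch assumes the HTML files are UTF-8 encoded and that plain string replacement is enough for the typos at hand.

```python
import zipfile


def rewrite_epub(src, dst, fixes):
    """Copy the EPUB at src to dst, applying {wrong: right} fixes to its HTML files."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # infolist() preserves archive order, so "mimetype" stays first if it was first.
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".html", ".xhtml", ".htm")):
                text = data.decode("utf-8")  # assumption: UTF-8 content
                for wrong, right in fixes.items():
                    text = text.replace(wrong, right)
                data = text.encode("utf-8")
            # The EPUB container expects the "mimetype" entry to be stored uncompressed.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zout.writestr(info.filename, data, compress_type=method)
```

For example, `rewrite_epub("book.epub", "book-fixed.epub", {"Bonjonr": "Bonjour"})` would write a corrected copy and leave the original untouched (the file names here are placeholders).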


On 29/08/2024 21:08, Gilles wrote:

Hello,
I noticed some typos in the text layer added by OCR to a "bitmap" PDF, i.e. the pages are actually scanned images.
I first tried opening the EPUB generated by Abbyy Finereader, but LibreOffice couldn't open it at all, while Sigil could after showing an error message but lacks a French dictionary to do the job (as far as I can tell).
As an alternative, pdftotext or mutool (convert) can extract the text layer from such a PDF, but can they put it back after I have fixed the typos?
Thank you.
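
For reference, the extraction half of the question can be sketched as below. This covers only getting the text layer out with pdftotext; it does not address putting corrected text back, which the message leaves open. The helper names are hypothetical, and the sketch assumes the `pdftotext` command-line tool is installed and on the PATH.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path):
    """Build the pdftotext command line for extracting a PDF's text layer."""
    # -layout keeps the physical page layout; -enc sets the output text encoding.
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, txt_path]


def extract_text_layer(pdf_path, txt_path):
    """Run pdftotext; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
```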
