Calibre uses "pdftohtml" to convert PDF files into other formats.
Older versions of "pdftohtml" produce incorrect output around surrogate pair characters.
This makes the Python lxml library choke.
Use the newest "pdftohtml" to solve this problem.
Install the newest "poppler-utils" package (0.85.0-2) from Debian unst
The official version is still based on Python 2, and the error message points to
problems with Python 3, which completely changed how characters are handled.
Basically, it means that some parts are not yet ready for Python 3.
If you can provide a small PDF that fails, please send it here or to me
personally.
I get exactly the same error (the one with "surrogates not
allowed") with far too many PDFs. This has been going on for months, always with the
newest calibre version in testing.
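
For anyone who has not seen this message before, here is a minimal sketch of how Python 3 produces "surrogates not allowed". It is not calibre's or lxml's actual code path; the helper names has_lone_surrogate and encode_error are made up for illustration, and the assumption is that the bad "pdftohtml" output ends up in a Python string as two separate surrogate halves instead of one character.

```python
def has_lone_surrogate(text: str) -> bool:
    """Return True if text contains a code point in the UTF-16 surrogate range."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def encode_error(text: str) -> str:
    """Return the UTF-8 encoding error reason for text, or '' if it encodes cleanly."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.reason
    return ""


# One character outside the Basic Multilingual Plane, as a single code point.
good = "\U0001F600"
# The same character split into its two surrogate halves (hypothetical bad input).
bad = "\ud83d\ude00"

print(has_lone_surrogate(good), repr(encode_error(good)))
print(has_lone_surrogate(bad), repr(encode_error(bad)))
```

A check like has_lone_surrogate can be run over the text extracted from a failing PDF to see whether this is really the cause before upgrading anything.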
Now I've just installed the official version as described in
https://calibre-ebook.com/download_linux
and that version does