On Wednesday, 5 February 2020, at 12:20:10 CET, Albretch Mueller wrote:
> pdftotext has the option
>
> -layout : maintain original physical layout
>
> but pdftohtml doesn't

pdftotext and pdftohtml use different code/algorithms; you'd have to see
whether one can be adapted/improved for the other.

Cheers,
  Albert

>
> $ pdftohtml --help
> pdftohtml version 0.48.0
> Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
> Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
> Copyright 1996-2011 Glyph & Cog, LLC
>
> Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
>   -f <int>        : first page to convert
>   -l <int>        : last page to convert
>   -q              : don't print any messages or errors
>   -h              : print usage information
>   -?              : print usage information
>   -help           : print usage information
>   --help          : print usage information
>   -p              : exchange .pdf links by .html
>   -c              : generate complex document
>   -s              : generate single document that includes all pages
>   -i              : ignore images
>   -noframes       : generate no frames
>   -stdout         : use standard output
>   -zoom <fp>      : zoom the pdf document (default 1.5)
>   -xml            : output for XML post-processing
>   -hidden         : output hidden text
>   -nomerge        : do not merge paragraphs
>   -enc <string>   : output text encoding name
>   -fmt <string>   : image file format for Splash output (png or jpg)
>   -v              : print copyright and version info
>   -opw <string>   : owner password (for encrypted files)
>   -upw <string>   : user password (for encrypted files)
>   -nodrm          : override document DRM settings
>   -wbt <fp>       : word break threshold (default 10 percent)
>   -fontfullname   : outputs font full name
> $
> ~
> is it some sort of "hidden" parameter? Or how do I work around it?
>
> lbrtchx
> _______________________________________________
> poppler mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/poppler
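
As for a workaround until someone adapts the code: one possible approach, sketched below in Python, is to run pdftotext with the -layout option quoted above and wrap its output in an HTML <pre> element so the spacing survives in a browser. This is only an illustration, not something pdftohtml does itself. The function names wrap_pre and pdf_to_layout_html are made up for this example, and the sketch assumes pdftotext is on the PATH and emits UTF-8 (its default). It keeps the physical layout as monospaced text only; fonts, images, and links are lost.

```python
import html
import subprocess


def wrap_pre(text: str, title: str = "converted") -> str:
    """Wrap plain text in a minimal HTML page; <pre> preserves the spacing."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><pre>{html.escape(text)}</pre></body></html>\n"
    )


def pdf_to_layout_html(pdf_path: str) -> str:
    """Hypothetical helper: layout-preserving text from pdftotext, as HTML."""
    # "-layout" is the pdftotext option mentioned in the thread; passing "-"
    # as the output file makes pdftotext write to standard output.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    return wrap_pre(text, title=pdf_path)
```

Calling pdf_to_layout_html("input.pdf") (with a placeholder file name) returns the page as a string, which can then be written to an .html file.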
