pdftotext has the option -layout : maintain original physical layout
but pdftohtml doesn't $ pdftohtml --help pdftohtml version 0.48.0 Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch Copyright 1996-2011 Glyph & Cog, LLC Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>] -f <int> : first page to convert -l <int> : last page to convert -q : don't print any messages or errors -h : print usage information -? : print usage information -help : print usage information --help : print usage information -p : exchange .pdf links by .html -c : generate complex document -s : generate single document that includes all pages -i : ignore images -noframes : generate no frames -stdout : use standard output -zoom <fp> : zoom the pdf document (default 1.5) -xml : output for XML post-processing -hidden : output hidden text -nomerge : do not merge paragraphs -enc <string> : output text encoding name -fmt <string> : image file format for Splash output (png or jpg) -v : print copyright and version info -opw <string> : owner password (for encrypted files) -upw <string> : user password (for encrypted files) -nodrm : override document DRM settings -wbt <fp> : word break threshold (default 10 percent) -fontfullname : outputs font full name $ ~ is it some sort of "hidden" parameter?, or, how do work around it? lbrtchx _______________________________________________ poppler mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/poppler
