On Sunday 05 October 2008, Warren Toomey wrote:
> pdftohtml used to have a "raw" mode which has been removed. In "raw" mode,
> text from a PDF document is processed in the order that it occurs. However,
> the current version of pdftohtml reorders the text to be in increasing
> y-value, i.e. from the top of a page going down to the bottom.
>
> This text reordering plays merry havoc with multi-column pages, as the text
> from the columns becomes interleaved instead of remaining separate.
> The attached patch restores the -raw command-line option to pdftohtml. The
> program retains its current behaviour if the -raw option is not used, but
> reverts to the "text as it appears" behaviour with the -raw option enabled.
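
To illustrate the interleaving described above, here is a minimal sketch in Python. It is not pdftohtml's actual code: the fragment tuples, coordinates, and function names are all hypothetical, and it only assumes that "raw" means document order and that the reordering is a stable sort on the y-value alone.

```python
# Hypothetical text fragments from a two-column page: (x, y, text).
# y grows downward, so smaller y is nearer the top of the page.
# The list is in the order the text occurs in the PDF:
# the whole left column first, then the whole right column.
fragments = [
    (50, 100, "L1"), (50, 120, "L2"), (50, 140, "L3"),
    (300, 100, "R1"), (300, 120, "R2"), (300, 140, "R3"),
]


def raw_order(frags):
    """Return the text in the order it occurs (assumed 'raw' behaviour)."""
    return [text for _, _, text in frags]


def y_sorted_order(frags):
    """Return the text sorted by increasing y only (assumed default behaviour)."""
    # sorted() is stable, so fragments with equal y keep their document order.
    return [text for _, _, text in sorted(frags, key=lambda f: f[1])]


if __name__ == "__main__":
    print("raw:     ", " ".join(raw_order(fragments)))
    print("y-sorted:", " ".join(y_sorted_order(fragments)))
```

With these made-up fragments, the raw order keeps each column together, while the y-sorted order alternates between the left and right columns line by line.
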

I've had a look at all the pdftohtml tarballs present at
http://sourceforge.net/project/showfiles.php?group_id=45839 and none of them
had the raw option enabled for the user to use. Are you sure this is ok to
enable?

Albert

>
> Cheers,
> Warren
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
