On Sun, Dec 19, 2004 at 01:39:49PM -0800, Karsten M. Self wrote:
> on Wed, Dec 01, 2004 at 01:48:33AM +0100, Gerard Robin ([EMAIL PROTECTED]) wrote:
> > Hello,
> >
> > I have a few problems with pdftohtml (unstable) :
> >
> > with one pdf file I get a suitable html file but with another one I get an
> > unreadable html file.
> >
> > I tried "pdftohtml -c -l 1 file.pdf" but the output is always unreadable
> > and I get the message:
> >
> > free(): invalid pointer 0x80f02e0!
> > Page-1
> >
> >
> > However xpdf (or gv) displays correctly this file.pdf.
> >
> > I guess that the problem comes out of the feature of this pdf file and
> > I would like to know if it
>
> Note first that 'PDF' isn't a simple file format.  Some PDFs are little
> more than marked-up text, others are essentially large image files
> (scanned in faxes from lawyers, such as are posted to Groklaw, are
> infamous for this).
>
> There are also a few different versions of the PDF and PS formats.
>
>
> If you can post or point to the file you're trying to convert, this
> could be helpful.  Knowing how that file was created and with what
> tools, ditto.
>
> 'ps2ps' on a Postscript file sometimes works around bugs that stymie
> some viewers (or printers).  It's a roundabout way, but:
>
>     pdf2ps file.pdf file.ps
>     ps2ps file.ps file-new.ps
>     ps2pdf file-new.ps file-new.pdf
>     pdftohtml file-new.pdf file-new.html
>
> ...might get you somewhere.  Most likely, a really broken hash of a
> file.
>
>
> Alternatively, if the source of the PDF file is available, converting
> *it* to HTML directly should provide far superior results.

I have joined the pdftohtml-general list and obtained part of the
solution:

Copy the file /etc/xpdf/xpdfrc to your home directory (as .xpdfrc) and
add this line to it:

    unicodeMap Latin2 /usr/share/xpdf/latin2/Latin2.unicodeMap

After that, run the command:

    pdftohtml -enc Latin2 file.pdf

Normally I would expect this to produce file.html, but what I got was
a segmentation fault ;-)

I tried again with:

    pdftohtml -c -enc Latin2 file.pdf

and then it works. The result is better than with plain
"pdftohtml file.pdf", but it is not perfect yet: the accents are almost
right, except for è and ê, and the underline (image.png) is not in the
right place.

The member of the pdftohtml-general list who helped me was surprised
that "pdftohtml -enc Latin2 file.pdf" gives me a segmentation fault,
since the same command works fine for him. He wondered whether the
problem comes from my OS (unstable).

The PDF file I am using (cobjet.pdf) is in this archive:

http://perso.wanadoo.fr/aymeric.sabine/developpement/bibliotheque/c/libal.zip

Thanks.

--
Gerard
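
P.S. In case it helps anyone reproduce this, here is a rough sketch of the steps above as a shell script. It is only a sketch under a few assumptions: the paths are the ones from my system and may differ elsewhere, the function names (setup_xpdfrc, convert_pdf) and the script name are made up for illustration, and I keep -c only because plain "-enc Latin2" segfaults for me.

```shell
#!/bin/sh
# Sketch of the workaround from this thread. Paths are the ones quoted in
# the mail (Debian unstable); adjust them if your system differs.

SYSTEM_RC="${SYSTEM_RC:-/etc/xpdf/xpdfrc}"   # system-wide xpdf config
USER_RC="${USER_RC:-$HOME/.xpdfrc}"          # per-user copy
MAP_LINE='unicodeMap Latin2 /usr/share/xpdf/latin2/Latin2.unicodeMap'

# Create the per-user xpdfrc if needed and make sure it has the Latin2 map.
# (Function name is hypothetical, chosen for this sketch.)
setup_xpdfrc() {
    if [ ! -f "$USER_RC" ]; then
        if [ -f "$SYSTEM_RC" ]; then
            cp "$SYSTEM_RC" "$USER_RC"
        else
            : > "$USER_RC"
        fi
    fi
    # Append the line only if it is not already present.
    if ! grep -qxF "$MAP_LINE" "$USER_RC"; then
        # Make sure the file ends with a newline before appending.
        if [ -s "$USER_RC" ] && [ -n "$(tail -c 1 "$USER_RC")" ]; then
            echo >> "$USER_RC"
        fi
        printf '%s\n' "$MAP_LINE" >> "$USER_RC"
    fi
}

# Convert one PDF with the Latin2 encoding; -c is the complex-output mode
# that avoided the segmentation fault in my case.
convert_pdf() {
    pdftohtml -c -enc Latin2 "$1"
}

# Hypothetical usage: ./latin2-convert.sh cobjet.pdf
if [ "$#" -ge 1 ]; then
    setup_xpdfrc && convert_pdf "$1"
fi
```

Running it twice should not duplicate the unicodeMap line, since the line is only appended when it is missing. It does nothing about the remaining problems with è, ê and the underline.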