1) I compared the rendering of pdftohtml with [-c], [-s] and [-c -s]. The -c -s combination doesn't produce valid xhtml, because several <html> elements end up in the same html file (as with -s alone). You could merge the contents of the <style> elements, do the same for the contents of the <body> elements, and obtain a single xhtml file. I tried to modify your code to do this but I couldn't get it to work... This is why I wrote an XSL and not a patch. Maybe you will manage it. Would it be easy for you?
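
A minimal sketch of the merge described above, written in Python (it is neither the XSL mentioned here nor pdftohtml's own code). It assumes the -c -s output is a plain concatenation of complete <html> documents, each with its own <style> and <body>; the function name merge_pages and the two regexes are hypothetical, chosen only for illustration. The result has a single <html> element, but it is well-formed xhtml only if the page bodies already are.

```python
import re

# Assumption: each page is a full <html> document with <style> and <body>.
STYLE_RE = re.compile(r"<style[^>]*>(.*?)</style>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def merge_pages(raw: str, title: str = "merged") -> str:
    """Collapse several concatenated <html> documents into one document."""
    # Collect the CSS of every page, in page order.
    styles = [s.strip() for s in STYLE_RE.findall(raw)]
    # Collect the visible content of every page, in page order.
    bodies = [b.strip() for b in BODY_RE.findall(raw)]
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n"
        f"<title>{title}</title>\n"
        '<style type="text/css">\n' + "\n".join(styles) + "\n</style>\n"
        "</head>\n"
        "<body>\n" + "\n".join(bodies) + "\n</body>\n"
        "</html>\n"
    )
```

The page contents are copied through unchanged, so uppercase tags or unclosed elements inside a page would still need cleaning before the file validates.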
Did you notice that using -c gives a different (more precise) splitting of the text and a better rendering of font-size, font-color and text-align? Is that normal?

2) I looked carefully at your file and it suits me well! I'm looking forward to using your next stable version!

3) As I said in 1), I can't find my way around your code, but I hope you will find a way to handle right-to-left text!

Justine

2011/9/24 Josh Richardson <[email protected]>

> Sorry for the delay - been on an airplane all day - and had a lot of emails
> to read on the list. ;-)
>
> 1) You can use both -s and -c at the same time.
> 2) Ok, was worth a shot. I've lost track a little bit of where the code base
> is. I haven't yet contributed back everything, just because it takes time
> to format the patches. I definitely have code that embeds the size of each
> paragraph; well, at least I think it's what you want. I've attached a
> sample file, let me know.
> 3) I'm a little surprised, but yes, I confirmed that the Arabic shows up in
> the wrong direction even in my version. Looks like we'll need to do some
> work to make it handle right-to-left text correctly. If you want to write
> the patch, contact me off-list and I'll try and help you do it.
>
> --josh
>
> From: Justine Guillaumont <[email protected]>
> Date: Fri, 23 Sep 2011 04:35:52 -0700
> To: "[email protected]" <[email protected]>
> Subject: [poppler] pdftohtml (width-height and Arabic pdf)
>
> Hi,
>
> It seems that the subject of my first email has diverged... I am opening
> this new subject to let you finish your conversation on the other one.
>
> Thank you for your advice, Josh. I finally succeeded in building the latest
> version from Git! But my problems are the same...
>
> 1) pdftohtml -c does indeed generate xhtml, but I prefer the display of
> pdftohtml -s (all the pages in one html). I will keep (and modify) my xsl
> to obtain xhtml with pdftohtml -s.
>
> 2) The <div> I was talking about (in version 0.16.7) has been replaced by
> <p> in the latest version, and these don't contain width and height
> either...
> Example: <P style="position:absolute;top:2187px;left:364px;white-space: nowrap" class="ft01">
>
> 3) I tried several Arabic PDFs with the latest version and I obtained
> the same results (with pdftohtml -c and pdftohtml -s): all the text is
> backwards (see enclosure). Do you have an Arabic PDF that renders
> correctly?
>
> Justine
>
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
