On Monday, 9 March 2015, at 20:36:12, Børre Gaup wrote:
> Hi!

Hi

> pdftohtml -xml sometimes produces invalid XML, resulting in lines like this:
>
> <text top="218" left="142" width="532" height="16" font="1"><i>lea <b>sadji
> </b></i>(korreláhta),<b> <i>gosa mannat </b>/ Minulla on <b>paikka</b>
> </i>(korreláhta)<b>, <i>jonne </i></b></text>
>
> In our collection of 1078 PDFs, pdftohtml produces 11 documents with this
> 'opening and ending tag mismatch' error.
>
> I made some changes in utils/HtmlOutputDev.cc that make those 11 documents
> well-formed and do not break the well-formedness of the other documents.
>
> The changes are here:
> https://github.com/albbas/poppler/compare/fix_xml_wellformedness
>
> I also made a diff (8174 lines) which shows what kind of changes this
> version makes on our 1078 PDF documents compared to pdftohtml 0.30.0. That
> diff is here:
> https://github.com/albbas/poppler/blob/fix_xml_wellformedness/all-pdf.diff
>
> Would you be interested in incorporating these changes into the main branch?

Can you please link to a PDF with such an error? (If you don't have an
internet link, I'd suggest opening a bug at bugs.freedesktop.org and
attaching both the patch and the file there.)

Cheers,
  Albert

> Regards,
> Børre Gaup
>
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler
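For anyone wanting to reproduce the 'opening and ending tag mismatch' check on their own output, here is a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree parser. The helper name is_well_formed and the MISMATCHED constant are illustrative only and are not part of poppler or of the patch; the sample string is the <text> line quoted above with the mail line breaks removed.

```python
import xml.etree.ElementTree as ET

# The <text> element quoted in the message, joined onto one line.
# Here </b> closes while the inner <i> is still open, so the tags overlap.
MISMATCHED = (
    '<text top="218" left="142" width="532" height="16" font="1">'
    '<i>lea <b>sadji </b></i>(korreláhta),<b> <i>gosa mannat </b>'
    '/ Minulla on <b>paikka</b> </i>(korreláhta)<b>, <i>jonne </i></b></text>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for mismatched or overlapping tags, among other syntax errors.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(MISMATCHED))
    print(is_well_formed("<text><b><i>jonne </i></b></text>"))
```

Running the same function over each file produced by pdftohtml -xml would be one way to count how many documents in a collection fail to parse.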
