Bug#676238: Unable to convert PDF to xml using pdftohtml (empty pages)

2012-06-29, Petter Reinholdtsen
[Pino Toscano]
>> Any idea where the text went? Anything I can do to get access to
>> the text as XML?
>
> Note adding also -hidden to the arguments makes the text show up in the
> XML output.

Thank you for the hint. It provides me with a workaround that allows my PDF scraper to work. No idea
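For anyone hitting the same symptom, here is a minimal Python sketch of the workaround. It builds the pdftohtml argument list with -hidden added to the -xml -noframes options from the original report, and it flags pages in the resulting XML that contain no text elements, which is the "empty pages" symptom. The element names (pdf2xml, page, text) and the number attribute are assumed from pdftohtml's XML output format. The helper names are hypothetical and are not part of poppler.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import List


def build_pdftohtml_command(pdf_path: str, include_hidden: bool = True) -> List[str]:
    """Hypothetical helper: argument list for pdftohtml as used in this thread.

    "-xml" and "-noframes" come from the original report; "-hidden" is the
    workaround suggested by Pino Toscano.
    """
    cmd = ["pdftohtml", "-xml", "-noframes"]
    if include_hidden:
        cmd.append("-hidden")
    # The list is passed to subprocess without a shell, so a "!" in the
    # file name needs no backslash escape.
    cmd.append(pdf_path)
    return cmd


def empty_pages(xml_text: str) -> List[int]:
    """Hypothetical helper: numbers of <page> elements with no <text> descendant.

    Assumes the pdf2xml layout: <pdf2xml><page number="N">...<text>...</text></page></pdf2xml>.
    """
    root = ET.fromstring(xml_text)
    result = []
    for page in root.iter("page"):
        if page.find(".//text") is None:
            result.append(int(page.get("number", "0")))
    return result


def run_pdftohtml(pdf_path: str) -> None:
    # Requires poppler-utils to be installed; pdftohtml picks the
    # output file name itself, so this sketch does not assume where it goes.
    subprocess.run(build_pdftohtml_command(pdf_path), check=True)


if __name__ == "__main__":
    sample = (
        '<pdf2xml>'
        '<page number="1"></page>'
        '<page number="2"><text top="10" left="20" width="50" height="12" font="0">Journal</text></page>'
        '</pdf2xml>'
    )
    print(empty_pages(sample))  # [1]
```

Running the conversion once without -hidden and once with it, then comparing the output of empty_pages, is one way to confirm that the missing text is being treated as hidden.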

2012-06-21, Pino Toscano
forwarded 676238 https://bugs.freedesktop.org/show_bug.cgi?id=50739
found 676238 poppler/0.18.4-2
tag 676238 + confirmed
thanks

Hi Petter,

On Tuesday 5 June 2012, Petter Reinholdtsen wrote:
> Package: poppler-utils
> Version: 0.12.4-1.2

Hm, it is an old poppler (the one in stable), thoug

2012-06-05, Petter Reinholdtsen
I've also reported this upstream, <https://bugs.freedesktop.org/show_bug.cgi?id=50739>.
--
Happy hacking
Petter Reinholdtsen

2012-06-05, Petter Reinholdtsen
Package: poppler-utils
Version: 0.12.4-1.2
Severity: normal

When I convert <http://nrk.no/contentfile/file/1.8116520!offentligjournal02052012.pdf> to XML using

  pdftohtml -xml -noframes 1.8116520\!offentligjournal02052012.pdf

I get the following content-less XML file. I find this rather stran