Package: poppler-utils
Version: 0.12.4-1.2
Severity: normal

When I convert
<URL: http://nrk.no/contentfile/file/1.8116520!offentligjournal02052012.pdf >
to XML using
  pdftohtml -xml -noframes 1.8116520\!offentligjournal02052012.pdf

I get the following content-less XML file. I find this rather strange,
as the PDF is searchable using xpdf, okular and evince. Any idea where
the text went? Anything I can do to get access to the text as XML?

This is the output I get:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="792" width="612">
  <fontspec id="0" size="18" family="Helvetica" color="#000000"/>
  <fontspec id="1" size="5" family="Helvetica" color="#000000"/>
  <fontspec id="2" size="5" family="Helvetica" color="#000000"/>
  <fontspec id="3" size="7" family="Helvetica" color="#000000"/>
</page>
<page number="2" position="absolute" top="0" left="0" height="792" width="612">
  <fontspec id="4" size="6" family="Helvetica" color="#000000"/>
</page>
<page number="3" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="4" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="5" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="6" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="7" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="8" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="9" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="10" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="11" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="12" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="13" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="14" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="15" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="16" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="17" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="18" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="19" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="20" position="absolute" top="0" left="0" height="792" width="612">
</page>
</pdf2xml>

-- System Information:
Debian Release: 6.0.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=nb_NO.UTF-8, LC_CTYPE=nb_NO.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages poppler-utils depends on:
ii  libc6           2.11.3-3                Embedded GNU C Library: Shared lib
ii  libfontconfig1  2.8.0-2.1               generic font configuration library
ii  libgcc1         1:4.4.5-8               GCC support library
ii  libpoppler5     0.12.4-1.2              PDF rendering library
ii  libstdc++6      4.4.5-8                 The GNU Standard C++ Library v3
ii  libxml2         2.7.8.dfsg-2+squeeze4   GNOME XML library

Versions of packages poppler-utils recommends:
ii  ghostscript     8.71~dfsg2-9            The GPL Ghostscript PostScript/PDF

poppler-utils suggests no packages.

-- no debconf information
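
The symptom in the output above (pages that carry only fontspec entries,
or nothing at all) can be checked mechanically. A minimal sketch, assuming
the usual pdftohtml -xml layout in which extracted text appears as <text>
elements directly inside each <page>; the function name empty_pages is made
up for illustration and is not part of poppler:

```python
import sys
import xml.etree.ElementTree as ET


def empty_pages(xml_data):
    """Return the numbers of all <page> elements without any <text> child.

    Assumption: text extracted by pdftohtml -xml is emitted as <text>
    elements that are direct children of <page>.
    """
    root = ET.fromstring(xml_data)
    return [
        int(page.get("number"))
        for page in root.iter("page")
        if page.find("text") is None
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Read as bytes so the encoding declaration in the file is honoured.
    with open(sys.argv[1], "rb") as handle:
        missing = empty_pages(handle.read())
    print("pages without text:", missing)
```

Run against the XML file written by the pdftohtml command above, passing
the file name as the only argument. For the output quoted in this report
it would list all 20 pages.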