Package: poppler-utils
Version: 0.12.4-1.2
Severity: normal

When I convert
<URL: http://nrk.no/contentfile/file/1.8116520!offentligjournal02052012.pdf >
to XML using

  pdftohtml -xml -noframes 1.8116520\!offentligjournal02052012.pdf

I get the following XML file, which contains no text at all.  I find this
rather strange, as the PDF is searchable using xpdf, okular and evince.
Any idea where the text went?  Is there anything I can do to get access to
the text as XML?

This is the output I get:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="792" width="612">
        <fontspec id="0" size="18" family="Helvetica" color="#000000"/>
        <fontspec id="1" size="5" family="Helvetica" color="#000000"/>
        <fontspec id="2" size="5" family="Helvetica" color="#000000"/>
        <fontspec id="3" size="7" family="Helvetica" color="#000000"/>
</page>
<page number="2" position="absolute" top="0" left="0" height="792" width="612">
        <fontspec id="4" size="6" family="Helvetica" color="#000000"/>
</page>
<page number="3" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="4" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="5" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="6" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="7" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="8" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="9" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="10" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="11" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="12" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="13" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="14" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="15" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="16" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="17" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="18" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="19" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="20" position="absolute" top="0" left="0" height="792" width="612">
</page>
</pdf2xml>
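
To confirm that no text was extracted on any page, rather than reading
through the output by eye, a small script along these lines can count the
text elements per page.  This is only a sketch: it assumes that
pdftohtml -xml puts extracted text in <text> elements directly under each
<page>, and the function name count_text_per_page is made up for
illustration.

```python
import sys
import xml.etree.ElementTree as ET


def count_text_per_page(xml_bytes):
    """Map page number -> number of <text> children found on that page.

    Assumption: extracted text appears as <text> elements directly
    under each <page> element of the pdf2xml document.
    """
    root = ET.fromstring(xml_bytes)
    return {
        int(page.get("number")): len(page.findall("text"))
        for page in root.iter("page")
    }


if __name__ == "__main__":
    # Path to the XML file written by pdftohtml, given on the command line.
    with open(sys.argv[1], "rb") as handle:
        counts = count_text_per_page(handle.read())
    for number, count in sorted(counts.items()):
        print(f"page {number}: {count} text element(s)")
```

Run against the output shown above, it should report zero text elements
for each of the 20 pages, since the pages contain only fontspec entries.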

-- System Information:
Debian Release: 6.0.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=nb_NO.UTF-8, LC_CTYPE=nb_NO.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages poppler-utils depends on:
ii  libc6              2.11.3-3              Embedded GNU C Library: Shared lib
ii  libfontconfig1     2.8.0-2.1             generic font configuration library
ii  libgcc1            1:4.4.5-8             GCC support library
ii  libpoppler5        0.12.4-1.2            PDF rendering library
ii  libstdc++6         4.4.5-8               The GNU Standard C++ Library v3
ii  libxml2            2.7.8.dfsg-2+squeeze4 GNOME XML library

Versions of packages poppler-utils recommends:
ii  ghostscript                 8.71~dfsg2-9 The GPL Ghostscript PostScript/PDF

poppler-utils suggests no packages.

-- no debconf information


