On 4/6/12, Albert Astals Cid <[email protected]> wrote:
> On Sunday, 1 April 2012, at 11:57:59, Ihar `Philips` Filipau
> wrote:
>> Add version to produced XML file.
>
> This needs an update to the dtd too, doesn't it?
No clue. I'm not really an XML specialist. My XML reader (libxml2-based)
has optional DTD validation, which I have never used. Beyond that, I have
no idea why a DTD is even needed; to me it kind of defeats the purpose of
XML.
Considering that Googling turned up about 7 distinctly different
pdf2xml.dtd files, I think the best change in this area would be
*removal* of the DTD, or at least renaming it to something else if
it is really needed. But that is too big a change.
Now, a bit more seriously. Is there an easier way to extract PDF document
properties (producer, date, etc.) than what the pdfinfo tool does? It
uses PDFDoc::getDocInfo() to access the Info dictionary and then parses
the data ... well, pretty much manually: assembling Unicode characters
by hand, handling surrogate pairs, UnicodeMap and all. If poppler has a
method that parses the data for me, I would love to include the info in
the XML output too. If not, then let it be.
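For reference, the manual part boils down to roughly the following. This is a minimal sketch, not poppler code: it assumes the raw bytes of an Info dictionary string are already in hand (for instance from the dictionary that PDFDoc::getDocInfo() exposes), and the helper names pdfUtf16beToUtf8 and appendUtf8 are hypothetical. A PDF text string that starts with the bytes FE FF is UTF-16BE; anything else is PDFDocEncoding, which this sketch does not handle. It also hardcodes UTF-8 output instead of going through a UnicodeMap as pdfinfo does.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a poppler API): append code point cp to out as UTF-8.
static void appendUtf8(std::string &out, uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode the raw bytes of a PDF text string that begins
// with the UTF-16BE byte order mark FE FF into UTF-8. Returns false when the
// BOM is missing (i.e. the string is PDFDocEncoding, not handled here).
bool pdfUtf16beToUtf8(const std::string &raw, std::string &out) {
    out.clear();
    if (raw.size() < 2 || static_cast<unsigned char>(raw[0]) != 0xFE ||
        static_cast<unsigned char>(raw[1]) != 0xFF)
        return false;
    // Read one big-endian 16-bit code unit starting at byte offset i.
    auto unitAt = [&raw](size_t i) -> uint32_t {
        return (static_cast<unsigned char>(raw[i]) << 8) |
               static_cast<unsigned char>(raw[i + 1]);
    };
    for (size_t i = 2; i + 1 < raw.size(); i += 2) {
        uint32_t u = unitAt(i);
        if (u >= 0xD800 && u <= 0xDBFF && i + 3 < raw.size()) {
            uint32_t lo = unitAt(i + 2);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // High + low surrogate: combine into one supplementary code point.
                appendUtf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
                i += 2; // skip the low surrogate we just consumed
                continue;
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF)
            u = 0xFFFD; // unpaired surrogate: emit U+FFFD REPLACEMENT CHARACTER
        appendUtf8(out, u);
    }
    return true;
}
```

Mapping unpaired surrogates to U+FFFD is a choice of this sketch, not necessarily what pdfinfo does.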
P.S. The patch adding the poppler version information to the XML output and the DTD is attached.
diff --git a/utils/HtmlOutputDev.cc b/utils/HtmlOutputDev.cc
index 3d8836b..02699e4 100644
--- a/utils/HtmlOutputDev.cc
+++ b/utils/HtmlOutputDev.cc
@@ -1192,7 +1192,7 @@ HtmlOutputDev::HtmlOutputDev(Catalog *catalogA, char *fileName, char *title,
{
fprintf(page, "<?xml version=\"1.0\" encoding=\"%s\"?>\n", htmlEncoding->getCString());
fputs("<!DOCTYPE pdf2xml SYSTEM \"pdf2xml.dtd\">\n\n", page);
- fputs("<pdf2xml>\n",page);
+ fprintf(page,"<pdf2xml producer=\"%s\" version=\"%s\">\n",PACKAGE_NAME,PACKAGE_VERSION);
}
else
{
diff --git a/utils/pdf2xml.dtd b/utils/pdf2xml.dtd
index 389676c..3dcfb11 100644
--- a/utils/pdf2xml.dtd
+++ b/utils/pdf2xml.dtd
@@ -1,5 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT pdf2xml (page+, outline?)>
+<!ATTLIST pdf2xml
+ producer CDATA ""
+ version CDATA ""
+>
<!ELEMENT page (fontspec*, image*, text*)>
<!ATTLIST page
number CDATA #REQUIRED
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler