Hello,
We use PDFBox to read the XMP metadata of PDF documents in the Factur-X
standard, a Franco-German e-invoicing standard.
The XML schema corresponding to this metadata is quite simple, and retrieving
the values works perfectly with the
org.apache.xmpbox.XMPMetadata.getSchema(String) method.
By default, the prefix is fx:
<rdf:Description
    xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#" rdf:about="">
  <fx:DocumentType>INVOICE</fx:DocumentType>
  <fx:DocumentFileName>factur-x.xml</fx:DocumentFileName>
  <fx:Version>1.0</fx:Version>
  <fx:ConformanceLevel>BASIC</fx:ConformanceLevel>
</rdf:Description>
In one case, there was a document with two schemas sharing the same namespace
URI but using different prefixes (fx and zf).
I tried the org.apache.xmpbox.XMPMetadata.getSchema(String, String) method,
which according to the documentation seems to handle this case by filtering by
prefix.
I got a NullPointerException from this method (line 268), because the prefix of
the Factur-X schema in the org.apache.xmpbox.XMPMetadata.schemas collection was
null.
I then ran tests with a hundred example files provided by the Factur-X
consortium, and it seems that for every file, the schema with the Factur-X URI
always gets a null prefix, regardless of whether one or more schemas exist with
this namespace.
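To check what the files themselves declare, independently of XMPBox, the prefixes bound to the Factur-X URI can be listed with the JDK's own DOM parser. This is only a sketch: the class and method names (PrefixesForNamespace, prefixesFor) are made up for illustration, and it assumes the XMP packet has already been extracted from the PDF as a string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of PDFBox or XMPBox.
final class PrefixesForNamespace {

    /** Collects every prefix that an xmlns:* attribute binds to the given namespace URI. */
    static Set<String> prefixesFor(String xml, String namespaceUri) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is needed to split "xmlns:fx" into prefix and local name
            factory.setNamespaceAware(true);
            // The XML comes from untrusted PDFs, so refuse DOCTYPE declarations
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the XMP packet", e);
        }

        Set<String> prefixes = new LinkedHashSet<>();
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                // A declaration such as xmlns:fx="..." has prefix "xmlns" and local name "fx"
                if ("xmlns".equals(attr.getPrefix())
                        && namespaceUri.equals(attr.getNodeValue())) {
                    prefixes.add(attr.getLocalName());
                }
            }
        }
        return prefixes;
    }
}
```

If a file really declares both prefixes for the Factur-X URI, the returned set should contain fx and zf, which would suggest that the prefix is lost during parsing rather than missing from the file.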
This raises two points:
1. If the prefix can be null, the getSchema(String, String) method should
handle it.
2. Does the Factur-X metadata specification conform to the XMP standard, or is
there a bug in the prefix parsing?
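For point 1, here is a minimal sketch of the kind of null-tolerant comparison I have in mind. It does not use or reproduce the XMPBox code: SchemaEntry and NullSafeSchemaLookup are hypothetical stand-ins, and I am only assuming that the NullPointerException comes from calling equals on the null prefix.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a parsed schema; not an XMPBox class.
final class SchemaEntry {
    final String prefix;     // may be null, as observed for the Factur-X schema
    final String namespace;

    SchemaEntry(String prefix, String namespace) {
        this.prefix = prefix;
        this.namespace = namespace;
    }
}

// Hypothetical lookup showing a comparison that tolerates null prefixes.
final class NullSafeSchemaLookup {

    /** Returns the first entry matching both prefix and namespace, or null if none matches. */
    static SchemaEntry find(List<SchemaEntry> schemas, String prefix, String namespace) {
        for (SchemaEntry s : schemas) {
            // Objects.equals accepts null on either side, unlike s.prefix.equals(prefix)
            if (Objects.equals(s.namespace, namespace) && Objects.equals(s.prefix, prefix)) {
                return s;
            }
        }
        return null;
    }
}
```

With this kind of comparison, a schema stored with a null prefix simply does not match "fx", so the caller gets null back and can fall back to the lookup by namespace URI alone.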
Here's the PDF document:
pdfExemple.pdf<https://cegidgroup-my.sharepoint.com/:b:/g/personal/sbabin_cegid_com/EVN8vpGbR1pEvaOuoIjyvfQBuhV1ZWFlYfAIKMfuAhd6Aw?e=cahEv2>
Here's the code I use to retrieve the Factur-X metadata values:
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.XMPSchema;
import org.apache.xmpbox.xml.DomXmpParser;
import org.apache.xmpbox.xml.XmpParsingException;

public class FacturX {
    private static final String FACTUR_X_NS =
            "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#";

    public static void main(String[] args) {
        File inputFile = new File(args[0]);
        // try-with-resources closes the document when we are done
        try (PDDocument doc = PDDocument.load(inputFile)) {
            PDDocumentCatalog catalog = doc.getDocumentCatalog();
            PDMetadata m = catalog.getMetadata();
            if (m == null) {
                System.out.println("This PDF document has no XMP metadata");
                return;
            }
            XMPMetadata metadata;
            try (InputStream xmlInputStream = m.createInputStream()) {
                DomXmpParser p = new DomXmpParser();
                p.setStrictParsing(false);
                metadata = p.parse(xmlInputStream);
            }
            // Getting the Factur-X schema with the default "fx" prefix
            // (case of two Factur-X schemas with different prefixes).
            // This is the call that throws the NullPointerException.
            XMPSchema fx = metadata.getSchema("fx", FACTUR_X_NS);
            // If there is no schema with the fx prefix, search for the
            // schema using only the namespace URI
            if (fx == null) {
                fx = metadata.getSchema(FACTUR_X_NS);
            }
            if (fx == null) {
                System.out.println("This PDF document is not a valid Factur-X file");
            } else {
                String conformanceLevel =
                        fx.getUnqualifiedTextPropertyValue("ConformanceLevel");
                String documentType =
                        fx.getUnqualifiedTextPropertyValue("DocumentType");
                String version =
                        fx.getUnqualifiedTextPropertyValue("Version");
                String documentFileName =
                        fx.getUnqualifiedTextPropertyValue("DocumentFileName");
            }
        } catch (XmpParsingException | IOException e) {
            e.printStackTrace();
        }
    }
}
Thanks for your help,
Sylvère Babin
Developer