Dear list,

I use the Python bindings for Poppler (through GObject introspection) to extract some metadata from PDF documents.
Here is a minimal script:
import os
import gi
gi.require_version('Poppler', '0.18')
from gi.repository import GLib, Poppler
pdf = "a.pdf"
# Poppler.Document.new_from_file() expects a file:// URI, not a plain path;
# GLib can build it, so GStreamer is not needed just for this conversion
uri = GLib.filename_to_uri(os.path.abspath(pdf), None)
doc = Poppler.Document.new_from_file(uri, None)  # None: no password
title = doc.get_title()  # the document's title, or None if it has none
print(title)
Is there a way to extract the /Lang value from the /Catalog
dictionary? (I have attached a PDF document that contains that entry.)
I searched https://lazka.github.io/pgi-docs/, but I'm afraid I wasn't
able to find anything that returns the document's language.
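As a possible stopgap, the catalog's /Lang entry can be read straight from the file bytes without going through Poppler at all. The sketch below is a hypothetical helper (`catalog_lang` and its private functions are illustrative names, not part of any library) and relies on strong assumptions: the catalog object is stored uncompressed (not inside a PDF 1.5+ compressed object stream), /Lang is a direct literal or hexadecimal string rather than an indirect reference, and the last catalog object in the file is the current one. It is a regex scan, not a PDF parser, so a full parser would be more robust.

```python
import re
from typing import Optional

# Hypothetical helper: a regex-based scan, not a real PDF parser.
# Matches "N G obj ... endobj" in the raw (uncompressed) file bytes.
_OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
_CATALOG_RE = re.compile(rb"/Type\s*/Catalog\b")
# /Lang as a literal string, e.g. /Lang (es-ES); unescaped nested parentheses are not handled
_LITERAL_LANG_RE = re.compile(rb"/Lang\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL)
# /Lang as a hexadecimal string, e.g. /Lang <FEFF0065006E>
_HEX_LANG_RE = re.compile(rb"/Lang\s*<([0-9A-Fa-f\s]*)>")
_ESCAPE_RE = re.compile(rb"\\([0-7]{1,3}|.)", re.DOTALL)
_SIMPLE_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def _unescape_literal(raw: bytes) -> bytes:
    """Resolve backslash escapes inside a PDF literal string."""
    def repl(m):
        esc = m.group(1)
        if esc[0] in b"01234567":  # octal escape such as \376
            return bytes([int(esc, 8) & 0xFF])
        if esc in (b"\r", b"\n"):  # backslash at end of line: line continuation
            return b""
        # \n, \t, ... map to control characters; \( \) \\ map to the character itself
        return _SIMPLE_ESCAPES.get(esc, esc)
    return _ESCAPE_RE.sub(repl, raw)


def _decode_text_string(data: bytes) -> str:
    """Decode a PDF text string (UTF-16BE or UTF-8 with BOM, else PDFDocEncoding)."""
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be")
    if data.startswith(b"\xef\xbb\xbf"):  # PDF 2.0 also allows UTF-8 with a BOM
        return data[3:].decode("utf-8")
    # PDFDocEncoding agrees with ASCII, which covers BCP 47 language tags;
    # Latin-1 is only an approximation for the remaining bytes.
    return data.decode("latin-1")


def catalog_lang(data: bytes) -> Optional[str]:
    """Return the /Lang entry of the document catalog, or None if not found."""
    catalogs = [m.group(1) for m in _OBJ_RE.finditer(data)
                if _CATALOG_RE.search(m.group(1))]
    if not catalogs:
        return None  # e.g. the catalog lives in a compressed object stream
    # Incremental updates append a new catalog; assume the last one is current.
    body = catalogs[-1]
    m = _LITERAL_LANG_RE.search(body)
    if m:
        return _decode_text_string(_unescape_literal(m.group(1)))
    m = _HEX_LANG_RE.search(body)
    if m:
        digits = re.sub(rb"\s", b"", m.group(1))
        if len(digits) % 2:  # an odd final digit is padded with 0
            digits += b"0"
        return _decode_text_string(bytes.fromhex(digits.decode("ascii")))
    return None  # no direct /Lang string in the catalog
```

For the attached file it would be called as `catalog_lang(open("a.pdf", "rb").read())`. A `None` result means either that the entry is absent or that the catalog was not visible to this naive scan, so it cannot distinguish "no language" from "compressed catalog".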
Many thanks for your help,
Pablo
