On 24/4/25 10:31, Max Nikulin wrote:
> By the way, PDF files may be tagged for screen readers. Is there a
> dedicated structure to explicitly mark tables? It would be the best
> source for data extraction.
ISO 14289 (PDF/UA) is the accessibility standard for PDF. It is built
on "Tagged PDF", in which semantic information, including table
structures (<Table>, <TR>, <TH>, <TD>), is embedded in a separate
logical structure tree.
The standard can be downloaded for free at https://pdfa.org/resource/iso-14289-pdfua/
Whether your PDF generator uses it is another matter, as is whether your
PDF reading module can handle it.
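
To give an idea of what extraction from such a structure tree could
look like, here is a minimal sketch. It assumes some PDF library has
already loaded the logical structure tree into nested Python dicts,
with "S" holding the structure type and "K" the kids; the function
names and the sample tree are hypothetical. It also assumes the leaf
strings are the cell text, which is a simplification: in a real file
the leaves reference marked content on the page, and the text has to
be resolved from there.

```python
# Sketch only: the dict layout ("S" = structure type, "K" = kids) mirrors
# the keys used in a PDF structure tree, but loading it from a real file
# is left to whatever PDF library is at hand (assumption).

def cell_text(node):
    """Concatenate all text found below a structure element."""
    if isinstance(node, str):
        return node
    return "".join(cell_text(kid) for kid in node.get("K", []))


def iter_tables(node):
    """Yield every Table element in document order."""
    if isinstance(node, str):
        return
    if node.get("S") == "Table":
        yield node
        return  # nested tables are not descended into in this sketch
    for kid in node.get("K", []):
        yield from iter_tables(kid)


def table_rows(table):
    """Return a table as a list of rows, each a list of cell strings."""
    rows = []

    def visit(node):
        if isinstance(node, str):
            return
        if node.get("S") == "TR":
            rows.append([
                cell_text(cell)
                for cell in node.get("K", [])
                if not isinstance(cell, str) and cell.get("S") in ("TH", "TD")
            ])
            return
        # Row groups such as THead or TBody are simply walked through.
        for kid in node.get("K", []):
            visit(kid)

    visit(table)
    return rows


# Hypothetical sample tree with one paragraph and one table.
tree = {"S": "Document", "K": [
    {"S": "P", "K": ["Intro"]},
    {"S": "Table", "K": [
        {"S": "THead", "K": [
            {"S": "TR", "K": [{"S": "TH", "K": ["Name"]},
                              {"S": "TH", "K": ["Qty"]}]},
        ]},
        {"S": "TBody", "K": [
            {"S": "TR", "K": [{"S": "TD", "K": ["Apples"]},
                              {"S": "TD", "K": ["3"]}]},
        ]},
    ]},
]}

tables = [table_rows(t) for t in iter_tables(tree)]
print(tables)
```

Since the rows come straight from the tags, no guessing from the
position of text on the page is involved, which is why a tagged file
would be the preferable source when one is available.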