On Thu, Apr 24, 2025 at 08:48:43AM +0200, to...@tuxteam.de wrote:
> On Thu, Apr 24, 2025 at 11:32:23AM +0800, jeremy ardley wrote:
> >
> > On 24/4/25 10:31, Max Nikulin wrote:
> > >
> > > By the way, PDF files may be tagged for screen readers. Is there a
> > > dedicated structure to explicitly mark tables? It would be the best
> > > source for data extraction.
> > >
> >
> > ISO 14289 is an accessibility standard for PDF. It allows for the creation
> > of a "Tagged PDF" where semantic information, including table structures
> > (<Table>, <TR>, <TH>, <TD>), can be embedded in a separate logical structure
> > tree
> >

Disclaimer: I deal with some accessibility documentation in my day job.

The problem is that very few authors know this - and very few tools support
tagging. Adobe Acrobat is about the best, but only in the paid ($$) versions.

Informal advice is always "Write it in Word, then let Word convert it to PDF".
That works if the author is disciplined and understands tagging, heading
order and so on - but it can still produce tagged PDFs that are nominally
accessible to screen readers but practically unusable.

The result is that PDFs may well be completely fine as a secure archival
format, non-modifiable, readable everywhere - and useless to the segment of
the population that is blind or visually impaired.

Deque University - deque.com - has a whole series of accessibility courses
and a couple of *long* ones on how to write a PDF :(

This also goes for HTML, which has to be well written, with images tagged
with alt-text and so on. There is an ARIA standard which helps make the web
more accessible, but that's an adjunct, to be used over and above
well-written HTML and CSS.

All best, as ever,

Andy
(amaca...@debian.org)

> > You can download it for free at https://pdfa.org/resource/iso-14289-pdfua/
>
> Oh, thanks for this one :)
>
> Cheers
> --
> tomás