On Tue 22 Jul 2025 at 10:14:37 (-0500), Richard Owlett wrote:
> On 7/20/25 5:52 AM, Richard Owlett wrote:
> > I'm running Debian 12.8.
> >
> > I have a 100+ page PDF document.
> > I wish to extract 2 of those pages, each to their own PDF file.
[ … ]
> I should have put more "em-FAY-sis" on my goal for this thread being
> learning how to extract specific pages of a large PDF document.[1] I
> had not fully appreciated how graphically oriented the PDF format is.
>
> The sub-goal being to perceive the byte level structure of *that*
> page in order to extract the semantic content perceived by a human. I
> would then edit/reformat the content to be *useful* to a different
> target audience.

It's very simple to burst a document into individual pages with pdftk:

$ pdftk document.pdf burst
$

The pages, named pg_0001.pdf, pg_0002.pdf, etc., will be in the working
directory, and pdftk may also create a file doc_data.txt containing some
metadata, which you can ignore. Be warned that it will overwrite any
existing files with these names, so run it in the right place. (I use a
script that bursts into a temporary directory and then uses mv -i to
move the pages with more control.)

Cheers,
David.
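
For illustration, a wrapper along the lines described above might look roughly like this. It is only a sketch, not the script mentioned in the message: the function name burst_pdf and its two arguments are hypothetical, and it assumes a POSIX shell with mktemp and pdftk on the PATH.

```shell
# burst_pdf SOURCE.pdf [DEST_DIR]
# Hypothetical helper: burst SOURCE.pdf inside a scratch directory, then
# move the resulting pages into DEST_DIR (default: current directory).
burst_pdf() {
    src=$1
    dest=${2:-.}

    [ -f "$src" ] || { echo "burst_pdf: no such file: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1

    # pdftk will run inside the scratch directory, so it needs an
    # absolute path to the source document.
    case $src in
        /*) abs=$src ;;
        *)  abs=$PWD/$src ;;
    esac

    tmp=$(mktemp -d) || return 1

    # Burst in a subshell so the caller's working directory is unchanged.
    # pg_NNNN.pdf and doc_data.txt land in the scratch directory.
    ( cd "$tmp" && pdftk "$abs" burst ) || { rm -rf "$tmp"; return 1; }

    # mv -i asks before replacing an existing file of the same name.
    for page in "$tmp"/pg_*.pdf; do
        [ -e "$page" ] || continue
        mv -i "$page" "$dest"/
    done

    # Discards doc_data.txt and any page whose overwrite was declined.
    rm -rf "$tmp"
}
```

After defining the function, something like `burst_pdf document.pdf pages` would leave pages/pg_0001.pdf, pages/pg_0002.pdf, and so on, from which the two wanted pages can be copied out. Because the burst happens in a scratch directory, nothing in the working directory is overwritten silently.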