On Tue 22 Jul 2025 at 10:14:37 (-0500), Richard Owlett wrote:
> On 7/20/25 5:52 AM, Richard Owlett wrote:
> > I'm running Debian 12.8.
> > 
> > I have a 100+ page PDF document.
> > I wish to extract 2 of those pages, each to their own PDF file.

[ … ]

> I should have put more "em-FAY-sis" on my goal for this thread being
> learning how to extract specific pages of a large PDF document.[1] I
> had not fully appreciated how graphically oriented the PDF format is.
> 
> The sub-goal being to perceive the byte level structure of *that*
> page in order to extract the semantic content perceived by a human. I
> would then edit/reformat the content to be *useful* to a different
> target audience.

It's very simple to burst a document into individual pages with pdftk:

  $ pdftk document.pdf burst
  $ 

The pages, named pg_0001.pdf, pg_0002.pdf, etc., will be in the
working directory. pdftk may also create a file, doc_data.txt,
containing some metadata, which you can ignore.

Be warned that pdftk will overwrite any existing files with these
names, so run it in the right place. (I use a script that bursts
into a temporary directory and then uses   mv -i   to move the
pages with more control.)
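
A minimal sketch of that kind of wrapper follows. It is not the actual
script mentioned above, only an illustration of the same idea. The
function name burst_into_tmp and its two arguments are made up for the
example, and it assumes mktemp and pdftk are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (name and layout are assumptions, not part of pdftk):
# burst SRC into a scratch directory, then move the pages into DEST,
# asking before anything already there is overwritten.
burst_into_tmp() {
    src=$1          # PDF to split
    dest=${2:-.}    # existing directory for the pages (default: current one)

    # pdftk runs from inside the scratch directory, so SRC must be absolute.
    case $src in
        /*) ;;
        *)  src=$PWD/$src ;;
    esac

    tmp=$(mktemp -d) || return 1

    # Burst in a subshell so the caller's working directory is unchanged.
    ( cd "$tmp" && pdftk "$src" burst ) || return 1

    # Drop the metadata report, then move the pages; -i prompts on a clash.
    rm -f "$tmp/doc_data.txt"
    mv -i "$tmp"/pg_*.pdf "$dest"/

    # rmdir only succeeds if every page was moved; otherwise say where they are.
    rmdir "$tmp" 2>/dev/null || echo "pages left in $tmp" >&2
}
```

Called as   burst_into_tmp document.pdf pages   (with pages/ already
existing), this leaves pg_0001.pdf, pg_0002.pdf, etc. in pages/ and
prompts before replacing any file of the same name.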

Cheers,
David.
