> I am trying to convert a pdf (and for that matter a
> postscript) type file to plain text. Response to my earlier
> mail suggested using File::Slurp, specifically:
>
> #!/usr/bin/perl
>
> use File::Slurp;
> use CGI qw/:standard/;
>
> $pdf_guts = read_file("/path/to/my.pdf");
Yes, $pdf_guts is the 'guts' of the pdf file.
That was just a way for you to get the pdf contents
into the script properly.
After that, yes, you'd have to find a way to convert the contents
of the pdf file ($pdf_guts) to whatever you wanted.
I thought I'd explained that in that message.
There are two ways you could do this:
1) See if there's a module to assist you.
Did you look at http://search.cpan.org ?
2) Pipe $pdf_guts to an external program that can
translate pdf to whatever it is you want.
I used this method to translate html to pdf via the program htmldoc.
So instead of $pdf_guts I had $html_guts, and I ran htmldoc from the
script as I would have from the command line.
But you have to find a program that converts it first.
That's the part I'm not familiar with, and I realize that is what
you're asking about. It's not a complete solution, but hopefully it
gives you a better idea of what you have to do.
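To make option 2 a bit more concrete, here is a minimal sketch of running
a converter from the script and capturing what it prints. It assumes the
pdftotext program (shipped with Xpdf and Poppler) is installed and on your
PATH; that choice is only an assumption for illustration, and the sub names
run_converter and pdf_to_text are made up here. It hands pdftotext the file
path rather than piping $pdf_guts in, since that is the invocation it is
known to accept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run an external command and return everything it prints on stdout.
# The list form of open() bypasses the shell, so odd characters in a
# file name cannot be misread as shell syntax.
sub run_converter {
    my @cmd = @_;
    open(my $out, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                       # slurp mode: read all output at once
    my $text = <$out>;
    close($out) or die "$cmd[0] failed (exit status $?)";
    return defined $text ? $text : '';
}

# Assumption: pdftotext is installed. Giving "-" as the output file
# makes it write the extracted text to stdout instead of a .txt file.
sub pdf_to_text {
    my ($pdf_path) = @_;
    return run_converter('pdftotext', $pdf_path, '-');
}

# Only convert when a file name is given on the command line.
if (@ARGV) {
    print pdf_to_text($ARGV[0]);
}
```

Run it as "perl pdf2text.pl /path/to/my.pdf" (the script name is just an
example). Any other converter could be swapped in by changing the
arguments passed to run_converter.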
>
> ---------------------------------
>
> In the above, however, $pdf_guts is an "unintelligible text
> file" (same contents and size as the pdf). What I will
It is unintelligible, yes, but it's not a text file:
a pdf is a binary format.
Those bytes are what make the pdf file what it is.
You'll need them intact if you are to translate it somehow.
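Since the bytes have to stay intact, it can help to read the file in raw
mode and check that it really is a pdf before handing it on. Here is a
small sketch using only core Perl; read_binary and looks_like_pdf are
names invented for this example. The check relies on the fact that a pdf
file begins with the marker "%PDF-".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file as raw bytes, with no newline or encoding translation.
sub read_binary {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                       # slurp mode: read the whole file
    my $bytes = <$fh>;
    close($fh);
    return defined $bytes ? $bytes : '';
}

# A pdf starts with "%PDF-" followed by a version number.
sub looks_like_pdf {
    my ($bytes) = @_;
    return substr($bytes, 0, 5) eq '%PDF-' ? 1 : 0;
}

# Only inspect a file when a name is given on the command line.
if (@ARGV) {
    my $pdf_guts = read_binary($ARGV[0]);
    print length($pdf_guts), " bytes, ",
          (looks_like_pdf($pdf_guts) ? "looks like" : "does not look like"),
          " a pdf\n";
}
```

If you stay with File::Slurp, read_file also takes a binmode option
(binmode => ':raw') that should have the same effect.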
DMuey