RE: pdf2txt

Dan Muey Mon, 03 Mar 2003 10:27:07 -0800

> I am trying to convert a pdf (and for that matter a 
> postscript) type file to plain text.  Response to my earlier 
> mail suggested using File:Slurp, specifically;
> 
> #!/usr/bin/perl
> 
> use File::Slurp;
> use CGI qw/:standard/;
> 
> $pdf_guts = read_file("/path/to/my.pdf");


Yes, $pdf_guts is the 'guts' of the pdf file.
That was just a way for you to get the pdf contents 
into the script properly.

After that, yes, you'd have to find a way to convert the contents 
of the pdf file ($pdf_guts) to whatever you wanted.

I thought I'd explained that in that message. 


There are two ways you could do this:

1) See if there's a module to assist you.
        Did you look at http://search.cpan.org ?

2) Pipe $pdf_guts to an external program that can 
translate pdf to whatever it is you want.

I used this method to translate html to pdf via the program htmldoc.
So instead of $pdf_guts I had $html_guts and executed it in the 
script as I would have from the command line.

But you have to find a program that converts it first.
That's the part I'm not familiar with, and I realize that is what 
you're asking about. So hopefully this gives you a better idea of 
what you have to do, even though it's not a complete solution.
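
Here is a rough sketch of option 2, just to show the shape of it.
It is not what I ran with htmldoc: instead of a true pipe it writes
the data to a temp file and reads the program's STDOUT, which avoids
having to write to and read from the same process. The sub name
convert_via_program and the '{}' placeholder are made up for this
example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $data to a temp file, run @cmd with every
# '{}' argument replaced by the temp file name, and return whatever the
# command prints on STDOUT.
sub convert_via_program {
    my ($data, @cmd) = @_;

    # Save the raw bytes so the external program can read them.
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    binmode $tmp_fh;
    print {$tmp_fh} $data;
    close $tmp_fh or die "Cannot write $tmp_name: $!";

    # Substitute the placeholder with the real file name.
    my @real_cmd = map { $_ eq '{}' ? $tmp_name : $_ } @cmd;

    # List-form pipe open: no shell is involved, so no quoting worries.
    open(my $out, '-|', @real_cmd) or die "Cannot run $real_cmd[0]: $!";
    binmode $out;
    local $/;                      # slurp all of the output at once
    my $result = <$out>;
    close $out or die "$real_cmd[0] failed, exit status $?";

    return $result;
}
```

If you happen to have a converter such as pdftotext (from xpdf)
installed, and that is an assumption on my part, the call might look
like convert_via_program($pdf_guts, 'pdftotext', '{}', '-'), where
the '-' asks pdftotext to write the text to STDOUT.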


> 
> ---------------------------------
> 
> In the above, however, $pdf_guts is an "unintelligible text 
> file" (same contents and size as the pdf).  What I will 

It is unintelligible, but it's not a "text file": a pdf is a 
binary format.
What you are seeing is the binary data that makes up the pdf file.
You'll need that data intact if you are to translate it somehow.
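
If you want to be sure the bytes come through untouched, something
like the following could replace the read_file() call. It is only a
sketch using core Perl; slurp_binary and looks_like_pdf are names I
made up here. The check relies on pdf files starting with the bytes
"%PDF-".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes, with no newline
# or encoding translation.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                      # slurp mode: read everything at once
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Hypothetical helper: true if the data starts with the pdf header.
sub looks_like_pdf {
    my ($data) = @_;
    return substr($data, 0, 5) eq '%PDF-';
}
```

The size of what slurp_binary() returns should match the size of the
file on disk, which is what you already observed with $pdf_guts.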


DMuey

