> -----Original Message-----
> From: Andrew Ballard [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 05, 2008 9:11 AM
> To: Jim Lucas
> Cc: Shawn McKenzie; php-general@lists.php.net
> Subject: Re: [PHP] How to fetch .DOC or .DOCX file in php
> 
> On Thu, Dec 4, 2008 at 10:35 PM, Jim Lucas <[EMAIL PROTECTED]> wrote:
> > I was going to say that I haven't yet decided on what the final
> > output format is going to be.  Probably either RTF or OpenXML.
> >
> > How about I ask for suggestions on what would be the best format to
> > store the final copy?
> >
> > I figured that this tool would mainly be used for .doc to web
> > conversion, but I guess it could also be used to convert to other
> > document formats.
> >
> > But I would like to have the ability to at least store the formatting
> > inline with the text.  So, some form of XML, be it (X)HTML, plain XML,
> > or even OpenXML.
> >
> > A question to all then.  How would you like to see the text, with
> > formatting, stored?
> 
> It's an excellent start. It pulled in some additional control
> characters in some of the documents I tried, and some documents had
> extra stuff at the end of the document. It was still text, but it
> looked like the text from the page header/footer definitions. It would
> be cool to see this polished and released. I just wish there was
> something this basic that worked this well on PDF files! :-)

Andrew,

There's something to be said for cross-language interoperability. I've become
enamored with the iText package for manipulating, creating, and extracting PDF
documents and associated info/bookmarks/tags/etc. There was, for a time, an
open-source PDF editor built with JPedal/iText that looked like it would soon
compete with Acrobat for fillable PDF forms, but the author has had little time
to work on it.

Anyway, you can set up a Java program (yes, iText is Java) to extract the text
from the fields--or the entire document--and spit it out in whatever format you
like (text, XML, whatever).
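
To give a rough idea of the "spit it out" half, here is a minimal sketch that
assumes the field names and values have already been pulled out of the PDF into
a plain Map. The FieldFormatter class and its method names are hypothetical,
made up for illustration; nothing here is part of iText, and the extraction
step itself is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of iText): formats already-extracted
// PDF form fields as plain text or as simple XML.
class FieldFormatter {

    // Escape the five characters that are special in XML; null becomes "".
    static String escapeXml(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // One <field> element per entry, in the map's iteration order.
    static String toXml(Map<String, String> fields) {
        StringBuilder out = new StringBuilder("<fields>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.append("  <field name=\"").append(escapeXml(e.getKey())).append("\">")
               .append(escapeXml(e.getValue())).append("</field>\n");
        }
        return out.append("</fields>\n").toString();
    }

    // One "name: value" line per entry; a null value prints as empty.
    static String toText(Map<String, String> fields) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = e.getValue() == null ? "" : e.getValue();
            out.append(e.getKey()).append(": ").append(value).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for whatever the extraction step returns.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("name", "Smith & Sons");
        fields.put("city", "Austin");
        System.out.print(toXml(fields));
    }
}
```

A LinkedHashMap keeps the fields in insertion order, so the output follows the
order in which they were read. From PHP you would then call the compiled class
through the bridge, or simply run it from the command line and capture the
output.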

iText - http://www.lowagie.com/iText/ 
PHP/Java bridge - http://php-java-bridge.sourceforge.net/pjb/

HTH,


// Todd
