I like Aperture (http://aperture.sourceforge.net), and Tika (a Lucene subproject) is coming along nicely. Otherwise, you can use the individual libraries, such as PDFBox, that both Aperture and Tika use.

As for NLP applications, there are too many to list: OpenNLP, LingPipe (not free), Carrot, libSVM, and so on. OpenNLP has many pointers to other projects, as does LingPipe.

-Grant

On Nov 23, 2007, at 8:24 PM, Venkatraman S wrote:

Hi,

I would be interested in knowing which open source utilities the community uses for text conversion, e.g. PDF to text, XLS to text, Word to text, PS to text, etc.

Are there any other 'interesting' utilities or libraries (free and available for commercial use) that can be used for text analytics applications? I already know of the OpenNLP library.

--
Venkat
Blog @ http://blizzardzblogs.blogspot.com
