I like Aperture (http://aperture.sourceforge.net), and Tika (a Lucene
subproject) is coming along nicely. Otherwise, you can use the
individual libraries, such as PDFBox, that Aperture and Tika both use.
As for NLP applications, there are too many to list: OpenNLP, LingPipe
(not free), Carrot, libSVM, etc. OpenNLP also has many pointers to
other projects, as does LingPipe.
-Grant
On Nov 23, 2007, at 8:24 PM, Venkatraman S wrote:
Hi,
I would be interested in knowing which open source utilities the
community uses for text conversion, e.g. PDF to text, XLS to text,
Word to text, PS to text, etc.
Are there any other 'interesting' utilities/libraries (free and
available for commercial use) that can be used for text analytics
applications? I already know of the OpenNLP library.
--
Venkat
Blog @ http://blizzardzblogs.blogspot.com