Try SolrCell (ExtractingRequestHandler).
See:
http://wiki.apache.org/solr/ExtractingRequestHandler
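A minimal sketch of sending one file to SolrCell over HTTP, assuming a Solr instance at http://localhost:8983/solr with the example /update/extract handler enabled (both are assumptions; adjust to your install). It uses the handler's literal.id parameter to set the document's unique key and commit=true so the document is searchable right away. The helper name build_extract_request is hypothetical. Because SolrCell ships with Tika in Solr's extraction contrib, this path normally does not require building Tika yourself.

```python
import urllib.parse
import urllib.request


def build_extract_request(data, doc_id, content_type="application/pdf",
                          solr_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST to ExtractingRequestHandler.

    Hypothetical helper; solr_url and the /update/extract path are assumed
    to match the example Solr configuration.
    """
    # literal.id fills the uniqueKey field; commit=true commits after indexing.
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_url}/update/extract?{params}"
    # The raw file bytes are the request body; Tika inside Solr parses them.
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To index a real file, read it with open("file.pdf", "rb").read(), pass the bytes in, and send the request with urllib.request.urlopen(req). For xls or doc files, change content_type to match the file.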
-- Jack Krupansky
-----Original Message-----
From: Tolga
Sent: Monday, May 07, 2012 3:24 PM
To: solr-user@lucene.apache.org
Subject: PDF indexing
Hi,
From what I have read, I think I have to use Tika (?) to index PDF,
xls, doc, etc. files. How do I start? Do I run mvn clean install in the
source directory to get all the jar files to begin? CentOS doesn't
provide mvn, so how do I build Tika after getting it from
http://maven.apache.org?
Sorry for the noob questions, I'm just beginning.