Re: PDF indexing

Tolga Mon, 07 May 2012 12:57:59 -0700

On 05/07/2012 10:35 PM, Jack Krupansky wrote:

Try SolrCell (ExtractingRequestHandler).
See:
http://wiki.apache.org/solr/ExtractingRequestHandler

-- Jack Krupansky
-----Original Message-----
From: Tolga
Sent: Monday, May 07, 2012 3:24 PM
To: solr-user@lucene.apache.org
Subject: PDF indexing
Hi,
From what I have read, I think I have to use Tika (?) to index PDF, xls, doc, etc. files. How do I start? Do I use mvn clean install in the source directory to get all the jar files to begin? CentOS doesn't provide mvn, how do I build Tika after getting it from http://maven.apache.org ?
Sorry for the noob questions, I'm just beginning.

Jack,

Thank you very much, I've managed to index a PDF file after a few tries. With this curl syntax, would it be possible to index an XML file as well, or do we need to use java -jar post.jar file.xml? Or let me put it this way, how is post.jar different than curl?
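
For reference, the kind of request that goes to the ExtractingRequestHandler can be sketched roughly as below. This is only an illustration in Python, not the exact curl command used in this thread: the host and port (localhost:8983), the /update/extract path, the document id, and the helper names build_extract_request and index_file are all assumptions and may not match a given solrconfig.xml. The point is that the upload is an ordinary HTTP POST carrying the file bytes, with literal.id and commit passed as query parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_request(solr_base, doc_id, payload, content_type="application/pdf"):
    """Build (but do not send) a POST for Solr's ExtractingRequestHandler.

    Assumption: the handler is mapped to /update/extract under solr_base.
    """
    # literal.id sets the document's unique key; commit=true makes it searchable right away.
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base.rstrip('/')}/update/extract?{params}"
    return Request(url, data=payload, headers={"Content-Type": content_type}, method="POST")


def index_file(solr_base, doc_id, path, content_type="application/pdf"):
    """Read a file from disk, send it to Solr, and return the HTTP status code."""
    with open(path, "rb") as fh:
        req = build_extract_request(solr_base, doc_id, fh.read(), content_type)
    with urlopen(req) as resp:
        return resp.status
```

With a running Solr instance, a call such as index_file("http://localhost:8983/solr", "doc1", "example.pdf") would send the file; the file name here is hypothetical.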


Regards,
