Hi,
I have just started using Solr as well. I am using the SolrJ client, but I
upload the file directly to Solr. I think we could run Tika in our own code
first instead.
Here I send the file directly to Solr, which does the text extraction:
CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
solr.setRequestWriter(new BinaryRequestWriter());
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
// read a file
File file = new File("tutorial.pdf");
up.addFile(file);
up.setParam("literal.id", "tutorial.pdf");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solr.request(up);
So what we would need to do is run Tika ourselves before sending the document
to Solr.
I have a question about up.setParam: am I able to create my own fields with it?
rgds,
canal
________________________________
From: Steve Johnson <[email protected]>
To: [email protected]
Sent: Sun, June 27, 2010 6:50:01 AM
Subject: How to index rich document with XML payload?
Greetings,
I am new to Solr, but have gotten as far as successfully indexing documents
both by sending XML describing the document and by sending the document itself
using "update/extract". What I want to do now is, in effect, do both of these
on each of my documents. I want to be able to have Tika do its magic first,
and then I want to add additional fields to my document entries using XML.
Is there any way to do this? In general, is there any way to apply multiple
update requests to a single document entry?
I do understand that I can put literal values on the "update/extract" URL to do
what I'm asking. This is what I'll have to do if I can't figure out another
way, but it seems messy to me...I'd much rather send an XML payload.
TIA for any help.
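
For reference, the "literal values on the update/extract URL" approach mentioned above could be sketched roughly as follows. This only builds the URL and does not talk to Solr. The class name, the helper methods, and the "author" field are made up for illustration, and any field passed this way is assumed to be defined in the schema (or to match a dynamic field).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractUrlSketch {

    // Builds an "update/extract" URL whose query string carries one
    // literal.<field>=<value> parameter per entry in `fields`.
    public static String buildExtractUrl(String solrBase, Map<String, String> fields) {
        StringBuilder url = new StringBuilder(solrBase).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : fields.entrySet()) {
            url.append(sep)
               .append(encode("literal." + e.getKey()))
               .append('=')
               .append(encode(e.getValue()));
            sep = '&'; // every parameter after the first is joined with '&'
        }
        return url.toString();
    }

    // URL-encodes a single name or value as UTF-8.
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("id", "tutorial.pdf");
        fields.put("author", "Jane Doe"); // hypothetical extra field
        // prints http://localhost:8983/solr/update/extract?literal.id=tutorial.pdf&literal.author=Jane+Doe
        System.out.println(buildExtractUrl("http://localhost:8983/solr", fields));
    }
}
```

With SolrJ the same name/value pairs would go through up.setParam("literal.<field>", value), as in the literal.id line of the reply at the top of this thread, so the URL never has to be assembled by hand.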