If I understand correctly, you are sending the file to Solr, which then uses
the Tika library to do the preprocessing/extraction and stores the results in
the defined fields.

If you don't want Solr to do the storing and want to change the extracted
fields, just use the Tika library in your client and work with the returned
document yourself. This is less of a network load as well, since only the
extracted content, not the whole file, goes over the wire.
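
Something along these lines might work as a starting point. It is only a
sketch, assuming tika-core and tika-parsers are on the client's classpath.
The class name ClientSideExtraction and the Extracted holder are made up for
illustration; the Tika calls (AutoDetectParser, BodyContentHandler, Metadata,
ParseContext) are the library's standard parsing entry points.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ClientSideExtraction {

    /** Hypothetical holder for the parse result; not part of Tika or SolrJ. */
    static final class Extracted {
        final String text;
        final Map<String, String> metadata;

        Extracted(String text, Map<String, String> metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    /** Parses the stream in memory; nothing is sent to Solr or Lucene here. */
    static Extracted extract(InputStream in)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        // -1 disables the default limit on the amount of extracted text.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        parser.parse(in, handler, metadata, new ParseContext());

        // Copy the first value of each metadata entry into a plain map.
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : metadata.names()) {
            fields.put(name, metadata.get(name));
        }
        return new Extracted(handler.toString(), fields);
    }

    public static void main(String[] args) throws Exception {
        // args[0] is assumed to be the path of the PDF to parse.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Extracted doc = extract(in);
            System.out.println("Extracted " + doc.text.length() + " characters");
            doc.metadata.forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

From there you can put doc.text (and whichever metadata entries you need)
into as many SolrInputDocument instances as you like and send them with a
normal add request.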

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 11, 2013 at 3:55 PM, uwe72 <uwe.clem...@exxcellent.de> wrote:

> I have a slightly strange use case.
>
> When I index a PDF to Solr, I use ContentStreamUpdateRequest.
> The Lucene document then contains, in the "text" field, all the parsed
> items of the physical PDF.
>
> I also need to add these parsed items to another Lucene document.
>
> Is there a way to receive/parse these items just in memory, without
> committing them to Lucene?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
