1 - Puts the Tika extraction work on the Solr server, though.
2 - This is just a SolrJ program and could be run anywhere. See:
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ It gives
you the most flexibility, since you can offload the Tika processing to
N other machines. A rough sketch is below.
3 - This could work, but you'd then be indexing every document twice
as well as loading the server with the Tika work. And you'd have to
store all the fields.
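
For <2>, something along the lines of that article would do it. A sketch
only -- the Solr URL, core name and field names are placeholders, and the
class names are from the SolrJ/Tika 4.x-era APIs, so adjust to your versions:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class CombinedIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    AutoDetectParser parser = new AutoDetectParser();

    File attachment = new File(args[0]);

    // Tika extracts the attachment's text client-side (-1 = no write limit).
    BodyContentHandler text = new BodyContentHandler(-1);
    Metadata tikaMeta = new Metadata();
    InputStream in = new FileInputStream(attachment);
    try {
      parser.parse(in, text, tikaMeta);
    } finally {
      in.close();
    }

    // One Solr document holding both the CMS metadata and the extracted text.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", attachment.getName());     // your CMS document id
    doc.addField("title", "title from the CMS");  // CMS metadata fields...
    doc.addField("content", text.toString());     // ...plus the Tika output

    solr.add(doc);
    solr.commit();
    solr.shutdown();
  }
}

Run one (or N) of these wherever you have spare CPU and the Solr server
never sees the Tika work.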

Personally I like <2>...
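
If you do end up going with <3>, though, the second pass is just an atomic
update against the doc that /update/extract already indexed. Roughly like
this (field names are made up, and everything has to be stored or it gets
lost when the doc is rewritten):

import java.util.Collections;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddCmsMetadata {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Atomic update: "set" overwrites each field on the existing document,
    // identified by its uniqueKey.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-123");  // whatever id the extract step indexed it under
    doc.addField("title", Collections.singletonMap("set", "title from the CMS"));
    doc.addField("author", Collections.singletonMap("set", "author from the CMS"));

    solr.add(doc);
    solr.commit();
    solr.shutdown();
  }
}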

FWIW,
Erick


On Wed, Oct 9, 2013 at 11:50 AM, Jeroen Steggink <jer...@stegg-inc.com> wrote:
> Hi,
>
> In a content management system I have a document and an attachment. The
> document contains the metadata and the attachment contains the actual
> data. I would like to combine the data from both in one Solr document.
>
> I have thought of several options:
>
> 1. Using the ExtractingRequestHandler I would extract the data (extractOnly),
> combine it with the metadata and send it to Solr.
>      But this might be inefficient and increase the network traffic.
> 2. A separate Tika installation, used to extract the data and send it to
> Solr.
>      This would stress an already busy web server.
> 3. First upload the file using ExtractingRequestHandler, then use atomic
> updates to add the other fields.
>
> Or is there another way? First add the metadata and later use the
> ExtractingRequestHandler to add the file contents?
>
> Cheers,
> Jeroen
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
