1 - puts the work on the Solr server though.

2 - This is just a SolrJ program, could be run anywhere. See:
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
It would give you the most flexibility to offload the Tika processing
to N other machines.

3 - This could work, but you'd then be indexing every document twice as
well as loading the server with the Tika work. And you'd have to store
all the fields.
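For <2>, a minimal sketch of what such a SolrJ program could look like, along the lines of the searchhub.org post above: Tika parses the attachment on the client, and the extracted text is combined with the CMS metadata into a single Solr document. The SolrJ and Tika classes are real (4.x-era APIs); the URL, field names, and file path are placeholders.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrIndexer {
  public static void main(String[] args) throws Exception {
    // Runs anywhere with network access to Solr, so the Tika work
    // can be offloaded to N client machines. Placeholder URL.
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    File attachment = new File("attachment.pdf"); // placeholder path
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1); // no size limit
    Metadata tikaMeta = new Metadata();
    try (InputStream in = new FileInputStream(attachment)) {
      parser.parse(in, handler, tikaMeta);
    }

    // Combine the CMS metadata and the extracted text in one document.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");                 // placeholder values
    doc.addField("title", "Title from the CMS");
    doc.addField("content", handler.toString()); // extracted attachment text
    server.add(doc);
    server.commit();
  }
}
```

Indexing this way also avoids the double-indexing and stored-field requirements of <3>, since the document is built once, client-side.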
Personally I like <2>...

FWIW,
Erick

On Wed, Oct 9, 2013 at 11:50 AM, Jeroen Steggink <jer...@stegg-inc.com> wrote:
> Hi,
>
> In a content management system I have a document and an attachment. The
> document contains the metadata and the attachment the actual data.
> I would like to combine the data of both in one Solr document.
>
> I have thought of several options:
>
> 1. Using ExtractingRequestHandler I would extract the data (extractOnly)
> and combine it with the metadata and send it to Solr.
> But this might be inefficient and increase the network traffic.
> 2. Separate Tika installation and use that to extract and send the data
> to Solr.
> This would stress an already busy web server.
> 3. First upload the file using ExtractingRequestHandler, then use atomic
> updates to add the other fields.
>
> Or is there another way? First add the metadata and later use the
> ExtractingRequestHandler to add the file contents?
>
> Cheers,
> Jeroen
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
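For completeness, the atomic-update step of option 3 could look like this in SolrJ — a sketch, assuming the attachment was first posted to the ExtractingRequestHandler with literal.id=doc-1, and that every field in the schema is stored (which atomic updates require, hence Erick's caveat). The URL and field names are placeholders.

```java
import java.util.Collections;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicMetadataUpdate {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // Atomic update: wrapping a value in a Map with a "set" key updates
    // that field on the already-indexed document instead of replacing
    // the whole document.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1"); // must match the literal.id used at upload
    doc.addField("title", Collections.singletonMap("set", "Title from the CMS"));
    doc.addField("author", Collections.singletonMap("set", "Author from the CMS"));
    server.add(doc);
    server.commit();
  }
}
```

Note this still indexes each document twice (once at upload, once at update), which is the cost Erick points out above.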