To second Erick's preference for option 2: the primary benefit I see in processing through your own code is better error handling, exception handling, and logging, all of which are trivial for you to write.
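To make that concrete, here is a rough sketch of the kind of loop I have in mind. It is in Python purely for illustration; the same shape would work as a SolrJ program. The names `build_doc`, `index_all`, `extract_text` and `send_to_solr`, and the field names `id` and `content`, are my own placeholders and not Solr or Tika APIs. In practice `extract_text` would wrap Tika and `send_to_solr` would wrap whatever Solr client you use. The sketch only shows where the per-document error handling and logging would sit.

```python
import logging

logger = logging.getLogger("indexer")


def build_doc(metadata, text):
    """Merge CMS metadata and extracted attachment text into one document."""
    doc = dict(metadata)   # copy so the caller's dict is not modified
    doc["content"] = text  # "content" is an assumed field name
    return doc


def index_all(items, extract_text, send_to_solr):
    """Index (metadata, attachment) pairs and return the ids that failed.

    extract_text and send_to_solr are hypothetical callables supplied by
    the caller, e.g. thin wrappers around Tika and a Solr client.
    """
    failed = []
    for metadata, attachment in items:
        doc_id = metadata.get("id")
        try:
            text = extract_text(attachment)
            send_to_solr(build_doc(metadata, text))
            logger.info("indexed %s", doc_id)
        except Exception:
            # One bad attachment should not stop the whole run.
            logger.exception("failed to index %s", doc_id)
            failed.append(doc_id)
    return failed
```

The returned list of failed ids gives you something to retry or report on, which is hard to get when the extraction happens inside Solr itself.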
Consider that your code could reside on any server, either receiving the data through a push or pulling it from your web server (as suits your needs), and thus offload the work from your busy web server. In the long run, this will be a more flexible, adaptable solution that can meet future needs with minimal effort. Further, it typically doesn't require a "Solr expert" to write, so you can find plenty of people to help as future needs dictate.

On Oct 10, 2013, at 4:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> 1 - puts the work on the Solr server though.
> 2 - This is just a SolrJ program, could be run anywhere. See:
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ It would give
> you the most flexibility to offload the Tika processing to N other
> machines.
> 3 - This could work, but you'd then be indexing every document twice
> as well as loading the server with the Tika work. And you'd have to
> store all the fields.
>
> Personally I like <2>...
>
> FWIW,
> Erick
>
>
> On Wed, Oct 9, 2013 at 11:50 AM, Jeroen Steggink <jer...@stegg-inc.com> wrote:
>> Hi,
>>
>> In a content management system I have a document and an attachment. The
>> document contains the meta data and the attachment the actual data.
>> I would like to combine data of both in one Solr document.
>>
>> I have thought of several options:
>>
>> 1. Using ExtractingRequestHandler I would extract the data (extractOnly)
>> and combine it with the meta data and send it to Solr.
>> But this might be inefficient and increase the network traffic.
>> 2. Separate Tika installation and use that to extract and send the data
>> to Solr.
>> This would stress an already busy web server.
>> 3. First upload the file using ExtractingRequestHandler, then use atomic
>> updates to add the other fields.
>>
>> Or is there another way? First add the meta data and later use the
>> ExtractingRequestHandler to add the file contents?
>>
>> Cheers,
>> Jeroen
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.