Re: Update existing documents when using ExtractingRequestHandler?

Jeroen Steggink Mon, 14 Oct 2013 13:22:59 -0700

Thanks for your advice Erick and Jason.

I implemented the document extraction on a separate server, which indeed gives better load balancing and error handling.


Cheers,
Jeroen

On 10-10-2013 17:09, Jason Hellman wrote:

As an endorsement of Erick's preference, the primary benefit I see to processing 
through your own code is better error handling, exception handling, and logging, 
all of which are trivial for you to write.

Consider that your code could reside on any server, either receiving the data 
through a push or pulling it from your web server (as suits your needs), and 
thus offloading the effort from your busy web server.

In the long run, this will be a more flexible, adaptable solution that meets future needs 
with minimal effort.  Further, it typically doesn't require a "Solr expert" to 
write so you can find plenty of people to help on this as future needs dictate.


On Oct 10, 2013, at 4:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:

1 - puts the work on the Solr server though.
2 - This is just a SolrJ program, could be run anywhere. See:
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ It would give
you the most flexibility to offload the Tika processing to N other
machines.
3 - This could work, but you'd then be indexing every document twice
as well as loading the server with the Tika work. And you'd have to
store all the fields.

Personally I like <2>...
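
For illustration, option 2 could be sketched roughly as below. This is only a minimal sketch in Python rather than SolrJ, and it assumes the attachment text has already been extracted by Tika on another machine. The field name "content", the document id, the metadata fields, and the Solr URL are all hypothetical placeholders; the request is built but not sent.

```python
import json
import urllib.request


def build_solr_doc(doc_id, metadata, extracted_text, text_field="content"):
    """Merge CMS metadata and extracted attachment text into one Solr document.

    text_field is a hypothetical field name; use whatever the schema defines.
    """
    doc = dict(metadata)
    doc["id"] = doc_id
    doc[text_field] = extracted_text
    return doc


def build_update_request(solr_update_url, docs):
    """Build (but do not send) a JSON update request for the given documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: the id, metadata, and URL are placeholders.
doc = build_solr_doc(
    "cms-42",
    {"title": "Quarterly report", "author": "Jeroen"},
    "text extracted from the attachment by Tika",
)
request = build_update_request(
    "http://localhost:8983/solr/collection1/update?commit=true", [doc]
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Because the merge happens in your own code, each document is indexed once and the extraction can run on whichever machine has capacity.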

FWIW,
Erick


On Wed, Oct 9, 2013 at 11:50 AM, Jeroen Steggink <jer...@stegg-inc.com> wrote:

Hi,

In a content management system I have a document and an attachment. The
document contains the meta data and the attachment the actual data.
I would like to combine data of both in one Solr document.

I have thought of several options:

1. Using ExtractingRequestHandler I would extract the data (extractOnly)
and combine it with the meta data and send it to Solr.
     But this might be inefficient and increase the network traffic.
2. Separate Tika installation, used to extract the data and send it
to Solr.
     This would stress an already busy web server.
3. First upload the file using ExtractingRequestHandler, then use atomic
updates to add the other fields.
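
As a rough illustration of option 3, the atomic update sent after the file upload might carry a payload like the one built below. This is a sketch only: the document id and field name are hypothetical, and it assumes the uploaded document already exists in the index under that id.

```python
import json


def build_atomic_update(doc_id, fields):
    """Build an atomic-update document that sets each given field on doc_id."""
    update = {"id": doc_id}
    for name, value in fields.items():
        # "set" replaces the field's value; other fields are left as indexed.
        update[name] = {"set": value}
    return update


# Hypothetical id and metadata field.
payload = json.dumps([build_atomic_update("cms-42", {"title": "Quarterly report"})])
```

The payload would be posted to the update handler as JSON after the extraction request has completed.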

Or is there another way? First add the meta data and later use the
ExtractingRequestHandler to add the file contents?

Cheers,
Jeroen

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
