Re: Custom update processor and race condition with concurrent requests

Erick Erickson Tue, 03 Mar 2020 09:21:24 -0800

I don’t really see how this test setup can work; I think you’re just getting 
lucky with the 4 threads.

But let’s be specific about what optimistic concurrency is. If you update a 
document that has a _version_ field, and that document already exists with a 
value in the _version_ field that differs from the one in the new doc, that doc 
will fail indexing.
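
To make the rule concrete, here is a minimal Go sketch (Go only because the client is). It is a toy model of the _version_ semantics as the Solr Reference Guide describes them, not Solr code: a value greater than 1 must match the indexed version exactly, 1 only requires that the document exists, a negative value requires that it does not, and 0 skips the check. The names checkVersion and errConflict are made up, and using 0 to mean "no such document" is an assumption of the model.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the conflict error (HTTP 409) that a failed
// version check produces. The name is hypothetical.
var errConflict = errors.New("version conflict")

// checkVersion models the _version_ rules.
// indexed is the version currently stored (0 means the document is absent,
// an assumption of this toy model); requested is the _version_ value sent
// with the update.
func checkVersion(indexed, requested int64) error {
	exists := indexed != 0
	switch {
	case requested > 1: // document must exist with exactly this version
		if !exists || indexed != requested {
			return errConflict
		}
	case requested == 1: // document must exist, any version will do
		if !exists {
			return errConflict
		}
	case requested < 0: // document must not exist
		if exists {
			return errConflict
		}
	}
	// requested == 0: no check at all, last write wins
	return nil
}

func main() {
	fmt.Println(checkVersion(1005, 1005)) // versions match
	fmt.Println(checkVersion(1007, 1005)) // the index has moved on
	fmt.Println(checkVersion(0, -1))      // create-only, document is absent
}
```

The second call is the interesting one: the client read version 1005, someone else updated the document in the meantime, and the stale update is refused rather than silently overwriting the newer state.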

Here’s the problem, and I’m assuming that your test shell can get the same 
document in multiple threads and that’s where your problem is. If thread1 and 
thread2 have the same document with the same _version_ stamp, then one of the 
two updates will fail. But which one fails is entirely dependent on whether 
thread1 or thread2 sends the doc first, which Solr doesn’t control at all. If 
your custom update processor is changing the version stamp, all bets are off.
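
A small Go sketch of that situation, under loud assumptions: store is an in-memory toy, not Solr, and update, race and the starting version 100 are all invented for illustration. Several goroutines send an update carrying the same version stamp; the store accepts the first one to arrive and refuses the rest.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy stand-in for an index holding a single document's version.
type store struct {
	mu      sync.Mutex
	version int64
}

// update applies the write only if expected matches the stored version,
// then bumps the version the way a successful versioned update would.
func (s *store) update(expected int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != expected {
		return false // someone else got there first
	}
	s.version++
	return true
}

// race sends the same version stamp from n goroutines and returns the
// outcome seen by each of them.
func race(n int) []bool {
	s := &store{version: 100}
	ok := make([]bool, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ok[i] = s.update(100) // every goroutine holds the same stamp
		}(i)
	}
	wg.Wait()
	return ok
}

func main() {
	// Exactly one entry is true; which one depends on goroutine scheduling.
	fmt.Println(race(2))
}
```

Run it a few times and the winning slot can move around, which is the point: the store guarantees a single winner, but it has no say in who that is.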

Hmmm, _where_ is your custom update processor running? And is this SolrCloud? 
If you run it _after_ the update is distributed (i.e. ensure it’ll run on the 
leader) _and_ you can ensure that your custom update processor is smart enough 
to know which version of the document is the “right” one, I should think you 
can get this to work. You’ll have to use “real time get”, which fetches the 
most current version of the document even if it hasn’t been committed, and 
reject the update if it’s too old. Anything in this path requires that the 
desired update doesn’t depend on the value already having been changed by the 
first update...
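
Roughly what I mean by "fetch the current doc and reject the update if it’s too old", as a Go sketch rather than a real update processor. Everything here is an assumption for illustration: index is an in-memory map, seq is a hypothetical application-level ordering field (not Solr’s _version_) that tells you which update is the "right" one, and the mutex stands in for whatever makes the lookup and the write atomic on the leader.

```go
package main

import (
	"fmt"
	"sync"
)

// doc is a toy document; seq is a hypothetical application-level ordering
// field that says which update should win.
type doc struct {
	seq   int64
	value string
}

// index is an in-memory stand-in for the leader's view of the documents.
type index struct {
	mu   sync.Mutex
	docs map[string]doc
}

// applyIfNewer looks up the current document (the role real-time get plays)
// and rejects the update when it is not newer than what is stored.
// The lookup and the write happen under one lock, so they cannot interleave.
func (ix *index) applyIfNewer(id string, d doc) bool {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if cur, ok := ix.docs[id]; ok && cur.seq >= d.seq {
		return false // too old, reject
	}
	ix.docs[id] = d
	return true
}

// replay sends all updates concurrently and returns the final stored document.
func replay(updates []doc) doc {
	ix := &index{docs: map[string]doc{}}
	var wg sync.WaitGroup
	for _, u := range updates {
		wg.Add(1)
		go func(u doc) {
			defer wg.Done()
			ix.applyIfNewer("doc1", u)
		}(u)
	}
	wg.Wait()
	return ix.docs["doc1"]
}

func main() {
	updates := []doc{{1, "a"}, {3, "c"}, {2, "b"}}
	// The update with the highest seq wins regardless of arrival order.
	fmt.Println(replay(updates))
}
```

Note that each update here carries its complete new value, so the outcome never depends on what an earlier update wrote. That is the restriction in the last sentence above; if the new value has to be computed from the stored one, this approach on its own is not enough.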

Good luck,
Erick

> On Mar 3, 2020, at 09:54, Sachin Divekar <sachin.dive...@merce.co.invalid> 
> wrote:
> 
> Hi,
> 
> We are using Solr where there are many update operations. This may not be
> the right use case for Solr but it's an old application and at this moment
> we are in no mood to replace Solr with something else.
> 
> For one of our use cases, we had to use optimistic concurrency for handling
> concurrent updates. The implementation involves fetching a set of
> documents, checking a field's value, then constructing the update
> instruction with an appropriate version for optimistic concurrency, etc.
> 
> For reducing HTTP communication and network calls I have written a custom
> update processor. We now push the request document blindly. The request
> processor modifies the request document based on the documents already
> present in the index e.g.
> 
> if field1 of indexedDoc == 'x'
>  set field2 of reqDoc to 'y'
>  index reqDoc
> 
> For testing, I'm processing 6k unique records (uniqueKey) with roughly 40k
> updates on the same records in a random sequence. Ideally, the end state
> in Solr should be the same after every round of testing -- that's how the
> algorithm is designed.
> 
> The client is written in Go. It works perfectly when I run fewer than four
> goroutines (concurrent Solr requests). The end state of Solr is exactly the
> same after every run. But if I run more, I see discrepancies in up to
> four records out of 6k. The process finishes in 7 to 12 seconds. If I use
> softCommit=true while pushing the updates to Solr the number of
> discrepancies are fewer than
> 
> I think this is a case of a race condition: while my update processor is
> processing a request document and fetching the indexed or tlogged document,
> another instance of the update processor, handling a concurrent request, has
> already modified the indexed document.
> 
> Does my understanding seem correct? If yes, is there any way this can be
> achieved without using optimistic concurrency in the update processor?
> Basically, can I somehow serialize the update requests in the update
> processor?
> 
> That's a lot of information and also confusing, I guess. Please let me know
> if any specific details or clarification is required.
> 
> Thank you.
> Sachin
