I guess I’m missing something. Assuming that S1 and S2 are sent in different 
batches from different threads by your client, there are any number of ways 
they could arrive out of order: network delays, client delays, etc. So I don’t 
see any way to serialize them reliably.

If they’re sent either in the same batch or by the same thread, then they 
should be sequential and I’d look again at your custom processor to see what’s 
happening there.

I do wonder if it’s possible to ensure that a given doc is always updated from 
the same thread? I’m assuming that the root of your issue is that you’re 
pushing updates in parallel and the same doc is being updated from two 
different places.
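One way to do that (a sketch only; PinnedUpdater and the lane count are made-up 
names, not anything from Solr or your client) is to hash each doc id to a fixed 
single-threaded executor, so every update for a given doc is submitted from the 
same thread and therefore applied in order:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: pin each doc id to one single-threaded executor so all
// updates for a given doc are submitted from the same thread, in order.
public class PinnedUpdater {
    private final ExecutorService[] lanes;

    public PinnedUpdater(int nLanes) {
        lanes = new ExecutorService[nLanes];
        for (int i = 0; i < nLanes; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // All updates for the same docId land on the same executor,
    // so they run sequentially in submission order.
    public void submit(String docId, Runnable update) {
        int lane = Math.floorMod(docId.hashCode(), lanes.length);
        lanes[lane].execute(update);
    }

    // Drain all lanes; swallows the checked interrupt for brevity.
    public void shutdown() {
        for (ExecutorService e : lanes) {
            e.shutdown();
        }
        for (ExecutorService e : lanes) {
            try {
                e.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The trade-off is that throughput for a hot doc id is capped at one lane, but 
updates for the same doc can no longer race each other.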

Best,
Erick

> On Mar 3, 2020, at 11:23, Sachin Divekar <ssd...@gmail.com> wrote:
> 
> Thanks, Erick.
> 
> I think I was not clear enough. With the custom update processor, I'm not
> using optimistic concurrency at all. The update processor just modifies the
> incoming document with updated field values and atomic update instructions,
> then forwards the modified request further down the chain. So, just to be
> clear, in this test setup optimistic concurrency is not in the picture.
> 
> However, it looks like if I want to run concurrent update requests I will
> have to use optimistic concurrency, be it in the update processor or in the
> client. I was wondering if I can avoid that by serializing requests at the
> update processor level.
> 
>> Hmmm, _where_ is your custom update processor running? And is this
> SolrCloud?
> Currently it's a single-node Solr, but eventually it will be SolrCloud. I
> am just testing the idea of doing something like this. Right now I am
> running the custom update processor before the DistributedUpdateProcessor
> in the chain.
> 
>> If you run it _after_ the update is distributed (i.e. ensure it’ll run on
> the leader) _and_ you can ensure that your custom update processor is smart
> enough to know which version of the document is the “right” one, I should
> think you can get this to work.
> I think that's the exact problem. My update processor fetches the document,
> updates the request object, and forwards it in the chain. Two concurrent
> instances (S1 and S2) of the update processor can fetch the document at the
> same time, both get value 'x' of field 'f1', and process it, whereas
> ideally S2 should see the value updated by S1.
> 
> S1: fetches id1 -> gets f1: x -> sets f1: y -> Solr appends it to the tlog
> S2: fetches id1 -> gets f1: x ...... ideally it should get 'y'
> 
> Is that possible with UpdateProcessor? I am using realtimeget (
> RealTimeGetComponent.getInputDocument()) in the update processor to fetch
> the document.
> 
>> You’ll have to use “real time get”, which fetches the most current
> version of the document even if it hasn’t been committed and reject the
> update if it’s too old. Anything in this path requires that the desired
> update doesn’t depend on the value already having been changed by the first
> update...
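That reject-if-stale check is essentially a compare-and-set retry loop. A
minimal stand-alone sketch of the pattern (OptimisticUpdate is a made-up
class simulating the document and its version in memory; this is not Solr's
actual _version_ machinery):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Stand-alone sketch of "fetch, apply change, reject/retry if stale"
// (optimistic concurrency) against a single in-memory document.
public class OptimisticUpdate {
    // A field value paired with a version number.
    record Doc(long version, String f1) {}

    private final AtomicReference<Doc> doc =
            new AtomicReference<>(new Doc(0, "x"));

    // Retry until our change applies against the version we read.
    public void update(UnaryOperator<String> change) {
        while (true) {
            Doc current = doc.get();                   // "real time get"
            Doc updated = new Doc(current.version() + 1,
                                  change.apply(current.f1()));
            if (doc.compareAndSet(current, updated)) { // reject if stale
                return;                                // applied cleanly
            }
            // Lost the race: loop and re-read the newer value, so the
            // second writer always sees the first writer's result.
        }
    }

    public String value() { return doc.get().f1(); }
}
```

In the S1/S2 terms above, a stale S2 would fail the compare-and-set, re-fetch,
and see 'y' before applying its own change.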
> 
> In the case of multiple concurrent instances of the update processor, are
> RealTimeGetComponent.getInputDocument() calls serialized?
> 
> thank you
> Sachin
