This really, really looks like something that should be done with a
database, not with Solr. This assumes a transactional model, which
Solr doesn’t have. 

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 3, 2020, at 7:56 PM, Sachin Divekar <ssd...@gmail.com> wrote:
> 
> Thanks for the reply, Chris. Sure, I will start from the beginning and
> explain the problem I'm trying to solve.
> 
> We have objects which we index in Solr. They go through state transitions
> based on various events in their life, but the events can arrive out of
> sequence. So, to maintain consistency we need to apply rules when updating
> the document state in Solr: e.g. if the old state is X and the new one is
> Y, update the status field; if the old state is Y and the new one is X, do
> not update it; and so on.
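The rule table described here can be sketched as a tiny pure function. This is a hypothetical illustration: the state names and the allowed-transition set below are made-up placeholders, not the poster's actual model.

```java
import java.util.Set;

// Hypothetical sketch of out-of-sequence event handling: a transition
// is applied only if the (old -> new) pair is in the allowed set.
// State names and the table are illustrative examples.
class StateRules {
    private static final Set<String> ALLOWED = Set.of("X->Y"); // Y->X is not allowed

    /** Return the state to store, given the current and incoming states. */
    static String apply(String oldState, String newState) {
        return ALLOWED.contains(oldState + "->" + newState)
                ? newState
                : oldState; // ignore the out-of-sequence event
    }
}
```

With a table like this, an out-of-order event simply leaves the stored state unchanged instead of regressing it.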
> 
> This is a distributed system and the events of the same object can be
> produced on different nodes. They are updated into Solr on the same node.
> This is a SolrCloud setup, so these updates can be received by different
> Solr nodes.
> 
> We have already implemented it by using optimistic concurrency and realtime
> get. The client program runs on each node where the events are produced.
> A summary of the processing the client does is as follows:
> 
> - the client batches multiple events
> - it uses _version_ to /update the records
> - based on various conflicts it modifies the records for which update failed
> - it /updates the modified records
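The steps above can be sketched with an in-memory stand-in for Solr's _version_ check: a conditional update that fails on a version mismatch, plus the client-side read-modify-retry loop. Class and method names here are illustrative, not SolrJ API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

// In-memory stand-in for optimistic concurrency: an update succeeds
// only when the supplied version matches the stored one, mirroring
// the client loop above (read, modify, resend on conflict).
class VersionedStore {
    record Doc(String value, long version) {}

    private final ConcurrentMap<String, Doc> docs = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();

    Doc realtimeGet(String id) { return docs.get(id); }

    /** Conditional update; returns false on a version conflict. */
    boolean update(String id, String value, long expectedVersion) {
        Doc cur = docs.get(id);
        long curVer = (cur == null) ? 0 : cur.version();
        if (curVer != expectedVersion) return false; // conflict: caller must re-read
        Doc next = new Doc(value, clock.incrementAndGet());
        return (cur == null) ? docs.putIfAbsent(id, next) == null
                             : docs.replace(id, cur, next);
    }

    /** Client-side retry loop: re-read and re-apply until the write lands. */
    void updateWithRetry(String id, UnaryOperator<String> modify) {
        while (true) {
            Doc cur = realtimeGet(id);
            long ver = (cur == null) ? 0 : cur.version();
            String oldVal = (cur == null) ? "" : cur.value();
            if (update(id, modify.apply(oldVal), ver)) return;
        }
    }
}
```

The retry loop is what generates the "to and fro" the poster describes: every conflict costs another round trip of read, modify, resend.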
> 
> That works fine, but there is a lot of to and fro between the client and
> Solr, and the implementation is complex.
> 
> So, I thought it could be simplified by moving these state transitions and
> the processing logic into Solr by writing a custom update processor. The
> idea occurred to me when I was thinking about Solr serializing multiple
> concurrent requests for a document on the leader replica. My thought
> process was: if I am getting this serialization for free, I can implement
> the entire processing inside Solr, and a dumb client that just pushes
> records to Solr would be sufficient. But that's not working. Perhaps the
> point I missed is that even though this processing is moved inside Solr, I
> still have a race condition because of the time-of-check to time-of-update
> gap.
> 
> While writing this it just occurred to me that I'm running my custom update
> processor before DistributedProcessor. I'm committing the same XY crime
> again, but if I run it after DistributedProcessor, can this race condition
> be avoided?
> 
> My secondary purpose in doing this exercise is to understand how Solr, and
> distributed databases in general, work. That's the reason I keep coming up
> with these hypotheses and trying to validate them.
> 
> thanks
> Sachin
> 
> On Wed, Mar 4, 2020 at 12:09 AM Chris Hostetter <hossman_luc...@fucit.org>
> wrote:
> 
>> 
>> It sounds like fundamentally the problem you have is that you want Solr to
>> "block" all updates to docId=X ... at the update processor chain level ...
>> until an existing update is done.
>> 
>> but solr has no way to know that you want to block at that level.
>> 
>> ie: you asked...
>> 
>> : In the case of multiple concurrent instances of the update processor,
>> : are RealTimeGetComponent.getInputDocument() calls serialized?
>> 
>> ...but the answer to that question isn't really relevant, because
>> regardless of the answer, there is no guarantee at the Java thread
>> scheduling level that the operations your custom code performs on the
>> results will happen in any particular order -- even if
>> RealTimeGetComponent.getInputDocument(42) were to block other concurrent
>> calls to RealTimeGetComponent.getInputDocument(42), that wouldn't ensure
>> that the custom code you have in Thread1 that calls that method will
>> finish its modifications to the SolrInputDocument *before* the same
>> custom code in Thread2 calls RealTimeGetComponent.getInputDocument(42).
>> 
>> The only way to do something like this would be to add locking in your
>> custom code itself -- based on the uniqueKey of the document -- to say
>> "don't allow another thread to modify this document until I'm done" and
>> keep that lock held until the delegated processAdd call finishes (so you
>> know that the other update processors, including RunUpdateProcessor, have
>> finished) ... but that would only work (easily) in a single-node
>> situation. In a multi-node situation you'd have to first check the state
>> of the request and ensure that your processor (and its locking logic) only
>> runs on the "leader" for that document, and deal with things at a
>> distributed level ... and you've got a whole host of new headaches.
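A single-node sketch of that per-uniqueKey locking idea might look like the following. `processAdd` here is a stand-in for the real UpdateRequestProcessor method, the lock map is never cleaned up, and all names are illustrative, so this is purely a sketch of the shape, not production code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hold a per-document lock across the whole read-modify-delegate span,
// so a second thread updating the same uniqueKey must wait. Only safe
// on a single node, as noted above.
class PerDocLocking {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void processAdd(String uniqueKey, Runnable readModifyDelegate) {
        ReentrantLock lock = locks.computeIfAbsent(uniqueKey, k -> new ReentrantLock());
        lock.lock();
        try {
            // realtime get + modify + delegate down the chain all run
            // while no other thread can act on this uniqueKey
            readModifyDelegate.run();
        } finally {
            lock.unlock();
        }
    }
}
```

The crucial point is that the lock is held across the *entire* get-modify-delegate sequence, not just around the get, which is exactly what blocking inside getInputDocument alone cannot give you.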
>> 
>> I would really suggest you take a step back and re-think your objective,
>> and share with us the "end goal" you're trying to achieve with this custom
>> update processor, because it seems you may have headed down an
>> unnecessarily complex route.
>> 
>> what exactly is it you're trying to achieve?
>> 
>> https://people.apache.org/~hossman/#xyproblem
>> XY Problem
>> 
>> Your question appears to be an "XY Problem" ... that is: you are dealing
>> with "X", you are assuming "Y" will help you, and you are asking about "Y"
>> without giving more details about the "X" so that we can understand the
>> full issue.  Perhaps the best solution doesn't involve "Y" at all?
>> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>> 
>> 
>> 
>> 
>> 
>> : Date: Tue, 3 Mar 2020 23:52:38 +0530
>> : From: Sachin Divekar <ssd...@gmail.com>
>> : Reply-To: solr-user@lucene.apache.org
>> : To: solr-user@lucene.apache.org
>> : Subject: Re: Custom update processor and race condition with concurrent
>> :     requests
>> :
>> : Thanks, Erick.
>> :
>> : I think I was not clear enough. With the custom update processor, I'm not
>> : using optimistic concurrency at all. The update processor just modifies
>> : the incoming document with updated field values and atomic update
>> : instructions. It then forwards the modified request further in the chain.
>> : So, just to be clear, in this test setup optimistic concurrency is not in
>> : the picture.
>> :
>> : However, it looks like if I want to run concurrent update requests I will
>> : have to use optimistic concurrency, be it in the update processor or in
>> : the client. I was wondering if I can avoid that by serializing requests
>> : at the update processor level.
>> :
>> : > Hmmm, _where_ is your custom update processor running? And is this
>> : SolrCloud?
>> : Currently, it's a single-node Solr but eventually it will be SolrCloud.
>> : I am just testing the idea of doing something like this. Right now I am
>> : running the custom update processor before DistributedProcessor in the
>> : chain.
>> :
>> : > If you run it _after_ the update is distributed (i.e. ensure it’ll run
>> : on the leader) _and_ you can ensure that your custom update processor is
>> : smart enough to know which version of the document is the “right” one, I
>> : should think you can get this to work.
>> : I think that's the exact problem. My update processor fetches the
>> : document, updates the request object and forwards it in the chain. The
>> : two concurrent instances (S1 and S2) of the update processor can fetch
>> : the document, get value 'x' of field 'f1' at the same time and process
>> : them, whereas ideally S2 should see the value updated by S1.
>> :
>> : S1: fetches id1 -> gets f1: x -> sets f1: y -> Solr appends it to tlog
>> : S2: fetches id1 -> gets f1: x ...... ideally it should get 'y'
>> :
>> : Is that possible with UpdateProcessor? I am using real-time get
>> : (RealTimeGetComponent.getInputDocument()) in the update processor to
>> : fetch the document.
>> :
>> : > You’ll have to use “real time get”, which fetches the most current
>> : version of the document even if it hasn’t been committed, and reject the
>> : update if it’s too old. Anything in this path requires that the desired
>> : update doesn’t depend on the value already having been changed by the
>> : first update...
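That "reject the update if it's too old" check can be sketched with an event timestamp standing in for Solr's _version_ field. This is not actual Solr API; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keep only the newest event per document: an incoming update is
// rejected when its event time is older than what is already stored.
// A stand-in for the real-time-get + compare step; names are made up.
class RejectStale {
    private final Map<String, Long> latest = new ConcurrentHashMap<>();

    /** Returns true if the update was applied, false if rejected as stale. */
    boolean offer(String docId, long eventTime) {
        // merge() applies the max atomically per key
        return latest.merge(docId, eventTime, Math::max) == eventTime;
    }
}
```

This drops stale events rather than retrying them, which is why, as Erick notes, it only works when the desired update doesn't depend on the earlier value.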
>> :
>> : In the case of multiple concurrent instances of the update processor,
>> : are RealTimeGetComponent.getInputDocument() calls serialized?
>> :
>> : thank you
>> : Sachin
>> :
>> 
>> -Hoss
>> http://www.lucidworks.com/
