It sounds like fundamentally the problem you have is that you want Solr to "block" all updates to docId=X ... at the update processor chain level ... until an in-flight update to that document is done.
But Solr has no way to know that you want to block at that level. ie: you asked...

: In the case of multiple concurrent instances of the update processor
: are RealTimeGetComponent.getInputDocument()
: calls serialized?

...but the answer to that question isn't really relevant, because regardless of the answer, there is no guarantee at the Java thread-scheduling level that the operations your custom code performs on the results will happen in any particular order. Even if RealTimeGetComponent.getInputDocument(42) were to block other concurrent calls to RealTimeGetComponent.getInputDocument(42), that wouldn't ensure that the custom code in Thread1 that calls that method finishes its modifications to the SolrInputDocument *before* the same custom code in Thread2 calls RealTimeGetComponent.getInputDocument(42).

The only way to do something like this would be to add locking in your custom code itself -- keyed on the uniqueKey of the document -- to say "don't allow another thread to modify this document until I'm done", and keep that lock held until the delegated processAdd call finishes (so you know that the other update processors, including RunUpdateProcessor, have finished). But that would only work (easily) in a single-node situation. In a multi-node situation you'd first have to check the state of the request and ensure that your processor (and its locking logic) only runs on the "leader" for that document, and deal with things at a distributed level ... and you've got a whole host of new headaches.

I would really suggest you take a step back and re-think your objective, and share with us the "end goal" you're trying to achieve with this custom update processor, because it seems you may have headed down an unnecessarily complex route. What exactly is it you're trying to achieve?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ...
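A minimal sketch of the per-uniqueKey locking idea described above, in plain Java. The class and method names here are illustrative, not Solr API; in a real custom processor the Runnable would wrap the whole getInputDocument / modify / super.processAdd sequence so the lock stays held until the delegated chain finishes:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: serialize all updates that share the same uniqueKey.
class PerDocLock {
    // One lock per uniqueKey value; created lazily on first use.
    private final ConcurrentHashMap<String, ReentrantLock> locks =
        new ConcurrentHashMap<>();

    // Run `update` while holding the lock for this uniqueKey, so no other
    // thread can fetch-and-modify the same document concurrently.
    void withLock(String uniqueKey, Runnable update) {
        ReentrantLock lock =
            locks.computeIfAbsent(uniqueKey, k -> new ReentrantLock());
        lock.lock();
        try {
            update.run(); // e.g. RTG fetch + modify + delegated processAdd
        } finally {
            lock.unlock();
        }
    }
}
```

A real implementation would also want to evict idle locks (or use a fixed-size striped-lock scheme), and, as noted above, this only serializes threads within a single node -- it does nothing for you in a distributed setup.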
that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

: Date: Tue, 3 Mar 2020 23:52:38 +0530
: From: Sachin Divekar <ssd...@gmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Custom update processor and race condition with concurrent
:     requests
:
: Thanks, Erick.
:
: I think I was not clear enough. With the custom update processor, I'm not
: using optimistic concurrency at all. The update processor just modifies the
: incoming document with updated field values and atomic update instructions.
: It then forwards the modified request further in the chain. So, just to be
: clear, in this test setup optimistic concurrency is not in the picture.
:
: However, it looks like if I want to run concurrent update requests I will
: have to use optimistic concurrency, be it in the update processor or in the
: client. I was wondering if I can avoid that by serializing requests at the
: update processor level.
:
: > Hmmm, _where_ is your custom update processor running? And is this
: > SolrCloud?
:
: Currently, it's a single-node Solr but eventually it will be SolrCloud. I
: am just testing the idea of doing something like this. Right now I am
: running the custom update processor before DistributedProcessor in the
: chain.
:
: > If you run it _after_ the update is distributed (i.e. insure it'll run on
: > the leader) _and_ you can insure that your custom update processor is
: > smart enough to know which version of the document is the "right" one, I
: > should think you can get this to work.
:
: I think that's the exact problem. My update processor fetches the document,
: updates the request object and forwards it in the chain.
: The two concurrent instances (S1 and S2) of the update processor can fetch
: the document, get value 'x' of field 'f1' at the same time and process
: them, whereas ideally S2 should see the value updated by S1.
:
: S1: fetches id1 -> gets f1: x -> sets f1: y -> Solr appends it to tlog
: S2: fetches id1 -> gets f1: x ...... ideally it should get 'y'
:
: Is that possible with an UpdateProcessor? I am using real-time get
: (RealTimeGetComponent.getInputDocument()) in the update processor to fetch
: the document.
:
: > You'll have to use "real time get", which fetches the most current
: > version of the document even if it hasn't been committed, and reject the
: > update if it's too old. Anything in this path requires that the desired
: > update doesn't depend on the value already having been changed by the
: > first update...
:
: In the case of multiple concurrent instances of the update processor,
: are RealTimeGetComponent.getInputDocument()
: calls serialized?
:
: thank you
: Sachin

-Hoss
http://www.lucidworks.com/
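The optimistic-concurrency alternative quoted above (fetch the latest version with real-time get, reject the update if it is stale, re-fetch and retry) can be sketched in plain Java. This is a stand-in model, not Solr API: the AtomicReference plays the role of the index, and the version bump plays the role of Solr's server-side _version_ check that would otherwise come back as an HTTP 409 conflict:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the fetch/check-version/retry loop.
class OptimisticUpdate {
    static final class Versioned {
        final long version;
        final String value;
        Versioned(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    // Apply `f` to the current value; if another writer got in first,
    // the compare-and-set fails and we re-fetch and retry.
    static void update(AtomicReference<Versioned> doc, UnaryOperator<String> f) {
        while (true) {
            Versioned cur = doc.get();               // "real-time get"
            Versioned next = new Versioned(cur.version + 1, f.apply(cur.value));
            if (doc.compareAndSet(cur, next)) {
                return;                              // update accepted
            }
            // version conflict (would be HTTP 409 in Solr) -> retry
        }
    }
}
```

With this pattern S2 in the scenario above would have its write rejected after S1's commit and would re-read 'y' on retry -- but, as Erick's quoted caveat says, that only works when the update does not depend on the first change having already happened.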