This really, really looks like something that should be done with a database, not with Solr. This assumes a transactional model, which Solr doesn’t have.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 3, 2020, at 7:56 PM, Sachin Divekar <ssd...@gmail.com> wrote:
>
> Thanks for the reply, Chris. Sure, I will start from the beginning and
> explain the problem I'm trying to solve.
>
> We have objects which we index in Solr. They go through state transitions
> based on various events in their life, but the events can arrive out of
> sequence. To maintain consistency we need to enforce rules while updating
> the document state in Solr, e.g. if the old state is X and the new one is
> Y then update the status field; if the old state is Y and the new one is
> X then do not update the status field; and so on.
>
> This is a distributed system, and events for the same object can be
> produced on different nodes. Each node pushes its own updates into Solr,
> and because this is a SolrCloud setup, those updates can be received by
> different Solr nodes.
>
> We have already implemented this using optimistic concurrency and
> real-time get. The client program runs on each node where events are
> produced. In summary, the client:
>
> - batches multiple events
> - uses _version_ when calling /update on the records
> - modifies the records whose updates failed, based on the conflict
> - calls /update again with the modified records
>
> That works fine, but there is a lot of back and forth between the client
> and Solr, and the implementation is complex.
>
> So I thought it could be simplified by moving the state-transition and
> processing logic into Solr as a custom update processor. The idea
> occurred to me when I was thinking about how Solr serializes multiple
> concurrent requests for a document on the leader replica. My thought
> process was: if I get that serialization for free, I can implement the
> entire processing inside Solr, and a dumb client that just pushes records
> to Solr would be sufficient. But that's not working.
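The client-side loop described above (batch, conditional update with _version_, resolve conflicts, retry) can be sketched as follows. This is a minimal, self-contained simulation, not Solr client code: SolrStub stands in for a real core's versioned-update semantics (real Solr rejects an update with HTTP 409 when the supplied _version_ does not match the stored one), and the X/Y transition rules are the hypothetical ones from the description.

```python
import itertools

# Allowed state transitions (hypothetical rules from the description):
# an event may move a doc from X to Y, but a late-arriving X event must
# not overwrite Y.
ALLOWED = {("X", "Y")}

class VersionConflict(Exception):
    pass

class SolrStub:
    """Stands in for Solr's optimistic-concurrency /update semantics."""
    def __init__(self):
        self.docs = {}                       # id -> {"state": ..., "_version_": ...}
        self._versions = itertools.count(1)

    def realtime_get(self, doc_id):
        # /get returns the latest version even if it is only in the tlog
        return dict(self.docs.get(doc_id, {"state": None, "_version_": 0}))

    def update(self, doc_id, state, expected_version):
        current = self.docs.get(doc_id, {"_version_": 0})
        if current["_version_"] != expected_version:
            raise VersionConflict(doc_id)    # Solr would return HTTP 409
        self.docs[doc_id] = {"state": state, "_version_": next(self._versions)}

def apply_event(solr, doc_id, new_state, max_retries=5):
    """Conditionally move a doc's state, retrying on version conflicts."""
    for _ in range(max_retries):
        doc = solr.realtime_get(doc_id)
        old = doc["state"]
        if old is not None and (old, new_state) not in ALLOWED:
            return False                     # out-of-sequence event: drop it
        try:
            solr.update(doc_id, new_state, doc["_version_"])
            return True
        except VersionConflict:
            continue                         # lost the race: re-read and retry
    raise RuntimeError("gave up after repeated conflicts")
```

The retry-on-conflict loop is exactly the "to and fro" being described: every conflict costs the client another /get plus another /update round trip.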
> Perhaps the point I missed is that even though this processing is moved
> inside Solr, I still have a race condition because of the
> time-of-check-to-time-of-update gap.
>
> While writing this it just occurred to me that I'm running my custom
> update processor before DistributedUpdateProcessor. I'm committing the
> same XY crime again, but if I run it after DistributedUpdateProcessor,
> can this race condition be avoided?
>
> My secondary purpose in this exercise is to understand how Solr, and
> distributed databases in general, work. That's why I keep coming up with
> these hypotheses and trying to validate them.
>
> thanks
> Sachin
>
> On Wed, Mar 4, 2020 at 12:09 AM Chris Hostetter <hossman_luc...@fucit.org>
> wrote:
>
>>
>> It sounds like fundamentally the problem you have is that you want Solr
>> to "block" all updates to docId=X ... at the update processor chain
>> level ... until an existing update is done.
>>
>> But Solr has no way to know that you want to block at that level.
>>
>> ie: you asked...
>>
>> : In the case of multiple concurrent instances of the update processor
>> : are RealTimeGetComponent.getInputDocument() calls serialized?
>>
>> ...but the answer to that question isn't really relevant, because
>> regardless of the answer, there is no guarantee at the Java thread
>> scheduling level that the operations your custom code performs on the
>> results will happen in any particular order -- even if
>> RealTimeGetComponent.getInputDocument(42) were to block other concurrent
>> calls to RealTimeGetComponent.getInputDocument(42), that wouldn't ensure
>> that the custom code in Thread1 that calls that method will finish its
>> modifications to the SolrInputDocument *before* the same custom code in
>> Thread2 calls RealTimeGetComponent.getInputDocument(42).
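The time-of-check-to-time-of-update gap mentioned above can be made concrete in a few lines. This is a plain threading simulation (not Solr code): a Barrier forces both "processor instances" to finish their check before either performs its update, which is exactly the interleaving that loses a write.

```python
import threading

doc = {"f1": "x"}        # the stored document
seen = []                # what each "processor" observed at check time
barrier = threading.Barrier(2)

def processor(new_value):
    old = doc["f1"]          # time of check
    seen.append(old)
    barrier.wait()           # both threads pass the check before either writes
    doc["f1"] = new_value    # time of update

s1 = threading.Thread(target=processor, args=("y",))
s2 = threading.Thread(target=processor, args=("z",))
s1.start(); s2.start()
s1.join(); s2.join()
# Both threads saw the original "x"; whichever write lands second wins,
# and the other thread's update is silently lost.
```

No matter how fast the fetch itself is, any code that reads, then decides, then writes without holding a lock (or a version check) across all three steps has this gap.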
>>
>> The only way to do something like this would be to add locking in your
>> custom code itself -- keyed on the uniqueKey of the document -- to say
>> "don't allow another thread to modify this document until I'm done", and
>> keep that lock held until the delegated processAdd call finishes (so you
>> know that the other update processors, including RunUpdateProcessor,
>> have finished) ... but that would only work (easily) in a single-node
>> situation. In a multi-node situation you'd have to first check the state
>> of the request and ensure that your processor (and its locking logic)
>> only runs on the "leader" for that document, and deal with things at a
>> distributed level ... and you've got a whole host of new headaches.
>>
>> I would really suggest you take a step back and re-think your objective,
>> and share with us the "end goal" you're trying to achieve with this
>> custom update processor, because it seems you may have headed down an
>> unnecessarily complex route.
>>
>> What exactly is it you're trying to achieve?
>>
>> https://people.apache.org/~hossman/#xyproblem
>> XY Problem
>>
>> Your question appears to be an "XY Problem" ... that is: you are dealing
>> with "X", you are assuming "Y" will help you, and you are asking about
>> "Y" without giving more details about the "X" so that we can understand
>> the full issue. Perhaps the best solution doesn't involve "Y" at all?
>> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>>
>>
>> : Date: Tue, 3 Mar 2020 23:52:38 +0530
>> : From: Sachin Divekar <ssd...@gmail.com>
>> : Reply-To: solr-user@lucene.apache.org
>> : To: solr-user@lucene.apache.org
>> : Subject: Re: Custom update processor and race condition with concurrent
>> :     requests
>> :
>> : Thanks, Erick.
>> :
>> : I think I was not clear enough. With the custom update processor, I'm
>> : not using optimistic concurrency at all.
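The per-document locking described above -- serialize on the uniqueKey value and hold the lock until the delegated processAdd returns -- can be sketched in outline. A real implementation would be a Java UpdateRequestProcessor; this is a language-neutral Python simulation in which mutate stands in for the custom read-modify-write logic and delegate_process_add for the rest of the chain (both names are hypothetical).

```python
import threading
from collections import defaultdict

# One lock per uniqueKey value; defaultdict creates locks lazily.
# (A production version would also need to evict locks for ids that
# are no longer being updated, or the map grows without bound.)
_doc_locks = defaultdict(threading.Lock)
_doc_locks_guard = threading.Lock()

def _lock_for(doc_id):
    with _doc_locks_guard:           # guard the lock map itself
        return _doc_locks[doc_id]

def process_add(doc_id, mutate, delegate_process_add):
    """Serialize read-modify-write per document id.

    The lock is held across both the custom mutation and the delegated
    call, so the whole chain finishes for one update before the next
    update to the same id can even start its check.
    """
    with _lock_for(doc_id):
        doc = mutate()
        delegate_process_add(doc)
```

As the reply notes, this only holds within one JVM: in SolrCloud the lock would have to live on (and only on) the leader for that document, which is where the new headaches start.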
>> : The update processor just modifies the incoming document with updated
>> : field values and atomic update instructions, then forwards the
>> : modified request further down the chain. So, just to be clear, in this
>> : test setup optimistic concurrency is not in the picture.
>> :
>> : However, it looks like if I want to run concurrent update requests I
>> : will have to use optimistic concurrency, be it in the update processor
>> : or in the client. I was wondering if I can avoid that by serializing
>> : requests at the update processor level.
>> :
>> : > Hmmm, _where_ is your custom update processor running? And is this
>> : > SolrCloud?
>> : Currently it's a single-node Solr, but eventually it will be
>> : SolrCloud. I am just testing the idea of doing something like this.
>> : Right now I am running the custom update processor before
>> : DistributedUpdateProcessor in the chain.
>> :
>> : > If you run it _after_ the update is distributed (i.e. ensure it'll
>> : > run on the leader) _and_ you can ensure that your custom update
>> : > processor is smart enough to know which version of the document is
>> : > the "right" one, I should think you can get this to work.
>> : I think that's the exact problem. My update processor fetches the
>> : document, updates the request object, and forwards it down the chain.
>> : Two concurrent instances (S1 and S2) of the update processor can fetch
>> : the document and get value 'x' of field 'f1' at the same time, whereas
>> : ideally S2 should see the value updated by S1:
>> :
>> : S1: fetches id1 -> gets f1: x -> sets f1: y -> Solr appends it to tlog
>> : S2: fetches id1 -> gets f1: x ...... ideally it should get 'y'
>> :
>> : Is that possible with an UpdateProcessor? I am using real-time get
>> : (RealTimeGetComponent.getInputDocument()) in the update processor to
>> : fetch the document.
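The chain placement being discussed here is a solrconfig.xml matter. A sketch of such a chain follows; the custom factory class name is hypothetical, and the behavior hedged here is the documented convention that processors declared before DistributedUpdateProcessorFactory run once on the node that receives the request, while processors declared after it run on each replica as the update is applied locally.

```
<updateRequestProcessorChain name="state-transitions">
  <!-- runs once, on the node that first receives the request -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- anything below runs on each replica as the update is applied
       locally, after the leader has routed the request -->
  <processor class="com.example.MyStateTransitionProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Moving the custom processor after distribution narrows the problem to the leader, but as the thread explains, it does not by itself close the check-to-update gap between two requests racing on the same leader.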
>> :
>> : > You'll have to use "real time get", which fetches the most current
>> : > version of the document even if it hasn't been committed, and reject
>> : > the update if it's too old. Anything in this path requires that the
>> : > desired update doesn't depend on the value already having been
>> : > changed by the first update...
>> :
>> : In the case of multiple concurrent instances of the update processor,
>> : are RealTimeGetComponent.getInputDocument() calls serialized?
>> :
>> : thank you
>> : Sachin
>> :
>>
>> -Hoss
>> http://www.lucidworks.com/