> One concern is that centralized processing (that the Overseer does) can sometimes be more scalable, particularly for lots of concurrent changes to a single collection.
This is indeed a huge blocker. I'm happy that DSU is not the default; it
will perform very poorly compared to the current Overseer-based system.
Are there any perf numbers comparing DSU vs. non-DSU?

Yes. PRS can eliminate most of the updates to state.json files. Until and
unless we move to the PRS model, we SHOULD NOT make DSU the default.

On Thu, Sep 22, 2022 at 3:30 AM David Smiley <dsmi...@apache.org> wrote:

> Thanks for raising this topic Houston!
>
> With respect to Distributed State Updates (DSU), I think it's a much
> cleaner model to understand & debug. It's one thread vs sender & queue &
> receiver. It would get tremendously simpler if the code that DSU supports
> no longer went through the Overseer (which would still exist for some
> things) because a great deal of Solr API interactions serialize &
> deserialize requests to JSON messages to go on the Overseer queue. It'd be
> amazing to jettison that!
>
> I think it's a shame that DSU isn't the new default in Solr 9 but whatever;
> it can be changed.
>
> One concern is that centralized processing (that the Overseer does) can
> sometimes be more scalable, particularly for lots of concurrent changes to
> a single collection. I forget; maybe PRS addresses this performance
> concern already? Anyway, I would rather us solve that in different ways
> from the Overseer. Again, for now, the Overseer _still_ needs to exist, so
> I'm just speaking with respect to operations that are DSU enabled. Perhaps
> if one replica in a collection did the DSU processing for its collection (a
> hint/preference, not a guarantee), it would lead to a way to do many
> changes efficiently?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> Houston said:
>
> > We've seen some interesting developments over the last 2 years in the way
> > that Solr state and distributed logic is handled. Notably we've seen the
> > introduction of PerReplicaStates (PRS) and the Distributed State Updates
> > (no overseer).
> >
> > I think for the health of our code and future maintainability, we should
> > really look to decide on what implementations we want to use for State
> > management and Distributed operations. Basically do we want to adopt or
> > abandon PRS/Distributed State Updates. Note that these are separate
> > concepts, so the decision on each will be separate.
> > ...
> > I don't see the Distributed State Update logic nearly as much, but I
> > imagine our code can only get cleaner with one implementation versus two.
> >
> > This is just my opinion, let me know what y'all think about making
> > decisions or going forward with the status quo.
> >
>

--
-----------------------------------------------------
Noble Paul