Re: The design of ClusterStateProvider & ClusterState

David Smiley Tue, 09 Apr 2024 20:03:45 -0700

On Fri, Apr 5, 2024 at 8:52 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> I would suggest doing any such change in two independent steps:
> - Moving classes around without any functional change ("pure" refactoring)
> - A change to what a class exposes, its behavior etc.
> Otherwise it is very hard to track what has simply moved and what has
> changed.


Makes sense.  I heard this approach is championed by Kent Beck in his
"Tidy First" book.

> Is the principal motivation here class content refactoring or change of
> behavior?

My initial motivation is *refactoring* to avoid expensive methods that
appear innocent.  In our community we sometimes call them "traps" or
"trappy behavior".  ClusterStateProvider.getClusterState may just be a
getter in ZkClientClusterStateProvider but for
HttpClusterStateProvider -- it's always O(Collections) (with an HTTP
call per collection *every* time).  So there are callers out there
that call it and think nothing of it.  Watch out!  Likewise
org.apache.solr.common.cloud.ClusterState#getCollectionsMap is
O(Collections) -- ouch!  This is *the* way to list collections in
SolrCloud.  So at least with the refactoring, we'd be clear when we're
looping per collection and fetching; wouldn't be hidden behind an
innocent looking method.  Out of scope is reducing/optimizing these
cases, which should probably happen first for some of them.  In this
phase the API changes should be minimized for 9x compatibility (not
strictly but reasonable effort).

My motivation beyond that is to reconsider what the ClusterState API /
behavior *ought* to be, especially to be a cache...

> Independent of which classes do the work, I would like the cluster state to
> be considered for what it is: a *stale cache* of the ZooKeeper data (at
> varying levels of details and staleness).
> Rather than try to keep it up to date with constant watches (it is still
> stale given these are async), consider it can be very stale and deal with
> the staleness when encountered.

> For example, DO NOT watch the whole collection list (for collections not
> represented on the node). If a request for an unknown collection is
> received, check (in ZK) if it exists. Implement of course a level of
> caching of fetched data.
>
> Similarly, for "watched" collections, no need to get all updates. A
> periodic re-fetch (or re-check) from ZooKeeper might be justified, or in
> general fetch from ZooKeeper the collection state when it is absent locally
> or is identified as stale.
>
> I believe approaching the distribution of cluster state to all nodes in
> such a way will greatly limit the load and chatiness with ZooKeeper esp on
> "dynamic" clusters with many state changes.

100% to all that; thanks for expanding upon my economy of words.  I
think the StateCache stuff in CloudSolrClient resembles what we're
after.

~ David

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org

Re: The design of ClusterStateProvider & ClusterState

Reply via email to