I'll definitely create a JIRA for this. Looking at the code in CloudState, I
think we could do the following: as we iterate over shardIdNames, we check
whether oldCloudState already had the slice. If so, we get the state from
there; otherwise we do what is already happening. Something like the following:

    for (String shardIdZkPath : shardIdNames) {
      Slice slice = null;
      if (oldCloudState.liveNodesContain(shardIdZkPath)) {
        slice = oldCloudState.getCollectionStates().get(collection).get(shardIdZkPath);
      }
      if (slice == null) {
        Map<String,ZkNodeProps> shardsMap = readShards(zkClient, shardIdPaths + "/" + shardIdZkPath);
        slice = new Slice(shardIdZkPath, shardsMap);
      }
      slices.put(shardIdZkPath, slice);
    }

I don't see a need to remove the old states, since we only keep the states
that are already in oldCloudState and read the new ones. Does that make sense?

On Wed, Sep 28, 2011 at 11:01 PM, Mark Miller <markrmil...@gmail.com> wrote:
> No, we don't have any patches for it yet. You might make a JIRA issue for it?
>
> I think the big win is a fairly easy one - basically, right now when we
> update the cloud state, we look at the children of the 'shards' node, and
> then we read the data at each node individually. I imagine this is the part
> that breaks down :)
>
> We already likely have most of that info though - really, you should
> just have to compare the children of the 'shards' node with the list we
> already have from the last time we got the cloud state - remove any that are
> no longer in the list, read the data for those not in the list, and get your
> new state efficiently.
>
> - Mark Miller
> lucidimagination.com
> 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
>
> On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote:
>
>> Thanks Mark, found the TODO in ZkStateReader.java
>>
>> // TODO: - possibly: incremental update rather than reread everything
>>
>> Was there a patch they provided back to address this?
>>
>> On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>
>>> On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote:
>>>
>>>> Is there any limitation, be it technical or for sanity reasons, on the
>>>> number of shards that can be part of a solr cloud implementation?
>>>
>>> The loggly guys ended up hitting a limit somewhere. Essentially, whenever
>>> the cloud state is updated, info is read about each shard to update the
>>> state (from zookeeper). There is a TODO that I put in there that says
>>> something like, "consider updating this incrementally" - usually the data
>>> on most shards has not changed, so no reason to read it all. They
>>> implemented that today in their own code, but we have not yet done this in
>>> trunk. What that places the upper limit at, I don't know - I imagine it
>>> takes quite a few shards before it ends up being too much of a problem -
>>> they shard by user I believe, so lots of shards.
>>>
>>> - Mark Miller
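
For reference, the incremental update discussed in this thread could be
sketched roughly as follows. This is a minimal, self-contained illustration
and not actual Solr code: IncrementalSliceUpdate, update, and readFromZk are
hypothetical names, S stands in for Slice, and readFromZk stands in for the
existing readShards(...) plus new Slice(...) path. It assumes the data of a
shard we already know about has not changed, which is the assumption made
earlier in the thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Solr. S is a placeholder for Slice.
final class IncrementalSliceUpdate {

  /**
   * Builds the new shard-id -> slice map for one collection.
   *
   * @param oldSlices       slices from the previous cloud state (may be null)
   * @param currentShardIds children of the 'shards' node as listed right now
   * @param readFromZk      placeholder for the expensive per-shard read
   */
  static <S> Map<String, S> update(Map<String, S> oldSlices,
                                   List<String> currentShardIds,
                                   Function<String, S> readFromZk) {
    Map<String, S> slices = new HashMap<>();
    for (String shardId : currentShardIds) {
      // Reuse the slice from the previous state when we already have it.
      S slice = (oldSlices == null) ? null : oldSlices.get(shardId);
      if (slice == null) {
        // Unknown shard: the only case that costs a read.
        slice = readFromZk.apply(shardId);
      }
      slices.put(shardId, slice);
    }
    // Shards absent from currentShardIds are never copied over, so they
    // drop out of the result without an explicit remove step.
    return slices;
  }
}
```

Because the result is built only from the current child list, removed shards
disappear on their own. The trade-off in this sketch is that a change inside
an existing shard is not picked up, since its cached slice is reused as-is.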