Re: Solr Cloud Number of Shard Limitation?

Jamie Johnson Wed, 28 Sep 2011 22:07:44 -0700

So I tested what I wrote, and man was that wrong.  I have updated it
and created a JIRA for this issue.  I also attached a patch to
CloudState that addresses it.  Feedback is appreciated.


https://issues.apache.org/jira/browse/SOLR-2799
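
For anyone following along, the incremental update discussed in the quoted
thread below can be sketched roughly as follows. This is a minimal
illustration only, not the patch attached to SOLR-2799: the class name
IncrementalSliceUpdate, the update method, and the readFromZk callback are
hypothetical stand-ins, and the state type is left generic instead of using
Solr's Slice and ZkNodeProps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Solr: illustrates reusing cached shard state.
final class IncrementalSliceUpdate {

    // Builds the new name -> state map from the current child names of the
    // 'shards' node. Names already present in oldSlices reuse the cached
    // state; only unseen names trigger a read through readFromZk.
    static <S> Map<String, S> update(Map<String, S> oldSlices,
                                     List<String> currentNames,
                                     Function<String, S> readFromZk) {
        Map<String, S> newSlices = new LinkedHashMap<>();
        for (String name : currentNames) {
            S slice = oldSlices.get(name);
            if (slice == null) {
                // Not seen before: this is the only case that hits ZooKeeper.
                slice = readFromZk.apply(name);
            }
            newSlices.put(name, slice);
        }
        // Names that disappeared are dropped implicitly, because only the
        // current names are iterated.
        return newSlices;
    }
}
```

Note that a sketch like this never rereads a shard whose name is unchanged,
so it would miss data that changed under an existing name; a real
implementation would need some way to detect that.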

On Wed, Sep 28, 2011 at 11:46 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> I'll definitely create a JIRA for this.  Looking at the code in
> CloudState, I think we could do the following.
>
> As we iterate over shardIdNames, we check whether oldCloudState
> already had the slice. If so, we get the state from there; otherwise
> we do what is already happening.  Something like the following:
>
> for (String shardIdZkPath : shardIdNames) {
>   Slice slice = null;
>   if (oldCloudState.liveNodesContain(shardIdZkPath)) {
>     slice = oldCloudState.getCollectionStates().get(collection).get(shardIdZkPath);
>   }
>
>   if (slice == null) {
>     Map<String,ZkNodeProps> shardsMap =
>         readShards(zkClient, shardIdPaths + "/" + shardIdZkPath);
>     slice = new Slice(shardIdZkPath, shardsMap);
>   }
>
>   slices.put(shardIdZkPath, slice);
> }
>
> I don't see a need to remove the old states since we only keep the
> states that are already in oldCloudState and read new ones.  Does that
> make sense?
>
> On Wed, Sep 28, 2011 at 11:01 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> No, we don't have any patches for it yet. You might make a JIRA issue for it?
>>
>> I think the big win is a fairly easy one - basically, right now when we 
>> update the cloud state, we look at the children of the 'shards' node, and 
>> then we read the data at each node individually. I imagine this is the part 
>> that breaks down :)
>>
>> We likely already have most of that info though - really, you should
>> just have to compare the children of the 'shards' node with the list we
>> already have from the last time we got the cloud state - remove any that are
>> no longer in the list, read the data for those not yet in our list, and get
>> your new state efficiently.
>>
>> - Mark Miller
>> lucidimagination.com
>> 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
>>
>> On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote:
>>
>>> Thanks Mark, I found the TODO in ZkStateReader.java:
>>>
>>> // TODO: - possibly: incremental update rather than reread everything
>>>
>>> Was there a patch they provided back to address this?
>>>
>>> On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>
>>>> On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote:
>>>>
>>>>> Is there any limitation, be it technical or for sanity reasons, on the
>>>>> number of shards that can be part of a solr cloud implementation?
>>>>
>>>>
>>>> The loggly guys ended up hitting a limit somewhere. Essentially, whenever 
>>>> the cloud state is updated, info is read about each shard to update the 
>>>> state (from zookeeper). There is a TODO that I put in there that says 
>>>> something like, "consider updating this incrementally" - usually the data 
>>>> on most shards has not changed, so no reason to read it all. They 
>>>> implemented that today in their own code, but we have not yet done this in 
>>>> trunk. What that places the upper limit at, I don't know - I imagine it 
>>>> takes quite a few shards before it ends up being too much of a problem - 
>>>> they shard by user I believe, so lots of shards.
>>>>
>>>>
>>>> - Mark Miller
>>>> lucidimagination.com
>>>> 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
