Re: SolrCloud failover behavior

Erick Erickson Tue, 06 Nov 2012 15:48:59 -0800

I was right for once <G>..

Thanks for updating the Wiki!


Erick


On Tue, Nov 6, 2012 at 9:42 AM, Nick Chase <nch...@earthlink.net> wrote:

> Thanks a million, Erick!  You're right about killing both nodes hosting
> the shard.  I'll get the wiki corrected.
>
> ----  Nick
>
>
> On 11/3/2012 10:51 PM, Erick Erickson wrote:
>
>> SolrCloud doesn't work unless every shard has at least one server that is
>> up and running.
>>
>> I _think_ you might be killing both nodes that host one of the shards. The
>> admin page has a link showing you the state of your cluster. So when this
>> happens, does that page show both nodes for that shard being down?
>>
>> And yeah, SolrCloud requires a quorum of ZK nodes up. So with only one ZK
>> node, killing that will bring down the whole cluster. Which is why the
>> usual recommendation is that ZK be run externally and usually an odd
>> number of ZK nodes (three or more).
>>
>> Anyone can create a login and edit the Wiki, so any clarifications are
>> welcome!
>>
>> Best
>> Erick
>>
>>
>> On Sat, Nov 3, 2012 at 12:17 PM, Nick Chase <nch...@earthlink.net> wrote:
>>
>>> I think there's a change in the behavior of SolrCloud vs. what's in the
>>> wiki, but I was hoping someone could confirm for me.  I checked JIRA and
>>> there were a couple of issues requesting partial results if one server
>>> comes down, but that doesn't seem to be the issue here.  I also checked
>>> CHANGES.txt and don't see anything that seems to apply.
>>>
>>> I'm running "Example B: Simple two shard cluster with shard replicas" from
>>> the wiki at https://wiki.apache.org/solr/SolrCloud and everything starts
>>> out as expected.  However, when I get to the part about failover behavior,
>>> things get a little wonky.
>>>
>>> I added data to the shard running on 7475.  If I kill 7500, a query to any
>>> of the other servers works fine.  But if I kill 7475, rather than getting
>>> zero results on a search to 8983 or 8900, I get a 503 error:
>>>
>>> <response>
>>>     <lst name="responseHeader">
>>>        <int name="status">503</int>
>>>        <int name="QTime">5</int>
>>>        <lst name="params">
>>>           <str name="q">*:*</str>
>>>        </lst>
>>>     </lst>
>>>     <lst name="error">
>>>        <str name="msg">no servers hosting shard:</str>
>>>        <int name="code">503</int>
>>>     </lst>
>>> </response>
>>>
>>> I don't see any errors in the consoles.
>>>
>>> Also, if I kill 8983, which includes the Zookeeper server, everything
>>> dies, rather than just staying in a steady state; the other servers
>>> continually show:
>>>
>>> Nov 03, 2012 11:39:34 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
>>> INFO: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:9983
>>> Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread run
>>> WARNING: Session 0x13ac6cf87890002 for server null, unexpected error,
>>> closing socket connection and attempting reconnect
>>> java.net.ConnectException: Connection refused: no further information
>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
>>> Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
>>>
>>> over and over again, and a call to any of the servers shows a connection
>>> error to 8983.
>>>
>>> This is the current 4.0.0 release, running on Windows 7.
>>>
>>> If this is the proper behavior and the wiki needs updating, fine; I just
>>> need to know.  Otherwise if anybody has any clues as to what I may be
>>> missing, I'd be grateful. :)
>>>
>>> Thanks...
>>>
>>> ---  Nick
>>>
>>>
>>
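
For anyone reading this thread later, the two rules Erick describes (every
shard needs at least one live node, and ZooKeeper needs a majority of its
ensemble up) can be sketched in a few lines of Python. This is only an
illustration, not Solr or ZooKeeper code: the function names and the shard
names "shard1"/"shard2" are made up, and the port-to-shard layout is assumed
from the behavior reported above (7475 and 7500 on one shard, 8983 and 8900
on the other).

```python
def zk_quorum(ensemble_size):
    """Smallest strict majority of a ZooKeeper ensemble of the given size."""
    return ensemble_size // 2 + 1


def zk_has_quorum(ensemble_size, nodes_up):
    """True if enough ZooKeeper nodes are up to form a majority."""
    return nodes_up >= zk_quorum(ensemble_size)


def shards_without_live_node(shard_hosts, down_ports):
    """Return the names of shards whose hosting nodes are all down."""
    return sorted(
        shard
        for shard, ports in shard_hosts.items()
        if all(port in down_ports for port in ports)
    )


# Assumed layout; the shard names are hypothetical.
layout = {"shard1": {8983, 8900}, "shard2": {7475, 7500}}

# Killing only 7500 leaves every shard with a live node.
print(shards_without_live_node(layout, {7500}))
# Killing 7500 and 7475 leaves one shard with no live node.
print(shards_without_live_node(layout, {7500, 7475}))
# A single-node ensemble loses quorum as soon as that node dies.
print(zk_has_quorum(1, 0))
# A three-node ensemble survives the loss of one node.
print(zk_has_quorum(3, 2))
```

With a one-node ensemble the majority is one, so there is no failure the
ensemble can tolerate. With three nodes the majority is two, which is why an
odd-sized ensemble of three or more is the usual recommendation.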
