@Shawn Heisey,

Thanks so much for your input! We will try your suggestion and hope it will 
resolve the issue.

On a side note, do you know whether this is an existing bug (i.e. ZK allows 
adding nodes even when doing so exceeds the buffer)? If so, has it been 
resolved in a later version?

We are currently using ZK 3.4.6 with SolrCloud 5.1.0.

Thanks again!

Best Regards,

Christopher Tarjono
Accenture Pte Ltd

+65 9347 2484
c.a.tarj...@accenture.com
________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: 25 October 2017 20:57:30
To: solr-user@lucene.apache.org
Subject: [External] Re: SolrCloud not able to view cloud page - Loading of 
"/solr/zookeeper?wt=json" failed (HTTP-Status 500)

On 10/24/2017 8:11 AM, Tarjono, C. A. wrote:
> Would like to check if anyone has seen this issue before. We started
> having this a few days ago:
>
> The only error I can see in solr console is below:
>
> 5960847[main-SendThread(172.16.130.132:2281)] WARN
> org.apache.zookeeper.ClientCnxn [ ] – Session 0x65f4e28b7370001 for
> server 172.16.130.132/172.16.130.132:2281, unexpected error, closing
> socket connection and attempting reconnect java.io.IOException: Packet
> len30829010 is out of range!
>

Combining the last part of what I quoted above with the image you shared
later, I am pretty sure I know what is happening.

The overseer queue in zookeeper (at the ZK path of /overseer/queue) has
a lot of entries in it.  Based on the fact that you are seeing a packet
length beyond 30 million bytes, I am betting that the number of entries
in the queue is between 1.5 million and 2 million.  ZK cannot handle
that packet size unless it is started with a larger jute.maxbuffer
system property.  That property defaults to a little over one million
bytes.
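
For anyone who wants to check that estimate, here is a rough
back-of-the-envelope sketch in Python. The packet length is the one
from the log line quoted above. Everything else is an assumption on my
part: that the queue entries have names shaped like "qn-0000000000",
that each name travels as a 4-byte length plus its bytes, and that the
reply carries about 20 bytes of fixed overhead. The default shown is the
one the ZooKeeper documentation gives for jute.maxbuffer (0xfffff).

```python
# Rough estimate of how many children /overseer/queue holds, working
# backwards from the packet length in the "out of range" log line.

PACKET_LEN = 30829010          # from the log message quoted above
DOCUMENTED_DEFAULT = 0xFFFFF   # jute.maxbuffer default per the ZK docs

# Assumptions (not taken from the log):
#  - child names look like "qn-0000000000" (13 characters)
#  - each name is sent as a 4-byte length followed by its bytes
#  - about 20 bytes of fixed overhead (reply header plus child count)
NAME_BYTES = len("qn-0000000000")
PER_CHILD = 4 + NAME_BYTES
FIXED_OVERHEAD = 20


def estimate_children(packet_len, per_child=PER_CHILD, overhead=FIXED_OVERHEAD):
    """Estimate the number of child names that fit in one reply packet."""
    return (packet_len - overhead) // per_child


children = estimate_children(PACKET_LEN)
print("estimated queue entries:", children)
print("packet size / documented default:",
      round(PACKET_LEN / DOCUMENTED_DEFAULT, 1))
```

Under those assumptions the estimate lands at about 1.8 million
entries, inside the range given above. If the entry names in your
queue are longer or shorter, the count shifts accordingly.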

To fix this, you're going to need to wipe out the overseer queue.  ZK
includes a script named zkCli (zkCli.sh on Linux, zkCli.cmd on
Windows).  Note that Solr includes a script called zkcli as well, which
does very different things.  You need the one included with zookeeper.

Wiping out the queue when it is that large is not straightforward.  You
need to start the zkCli script included with zookeeper with the JVM
option -Djute.maxbuffer=31000000 and the same zkHost value used by
Solr.  As far as I know the script does not accept -D options on its
own command line, so the option has to reach the JVM through the
JVMFLAGS environment variable.  Then use a command like
"rmr /overseer/queue" in that command shell to completely remove the
/overseer/queue path.  Running this procedure might also require
temporarily restarting the ZK servers with the same jute.maxbuffer
argument, but I am not sure whether that is required.  If you do that,
restart the ZK servers without the jute.maxbuffer setting once the
queue is gone.  You may also need to restart Solr.
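
The client-side step described above could be scripted roughly like
this.  It is only a sketch under assumptions: that your zkCli.sh adds
$JVMFLAGS to the java command line (the 3.4.x scripts do, as far as I
know), and that it accepts a single command after the -server argument.
The function names, the install path, and the host in the example
comment are placeholders, not anything from your setup.

```python
import os
import subprocess


def build_zkcli_call(zk_home, zk_host, max_buffer=31000000,
                     path="/overseer/queue"):
    """Return (argv, env) for a one-shot 'rmr' through ZooKeeper's zkCli.sh."""
    argv = [
        os.path.join(zk_home, "bin", "zkCli.sh"),
        "-server", zk_host,
        "rmr", path,
    ]
    env = dict(os.environ)
    # Assumption: zkCli.sh passes $JVMFLAGS to the JVM, so the -D option
    # goes here rather than after the script name.
    extra = "-Djute.maxbuffer=%d" % max_buffer
    env["JVMFLAGS"] = (env.get("JVMFLAGS", "") + " " + extra).strip()
    return argv, env


def wipe_overseer_queue(zk_home, zk_host):
    """Run the removal. Destructive: deletes the path and all children."""
    argv, env = build_zkcli_call(zk_home, zk_host)
    return subprocess.run(argv, env=env, check=True)


# Example with a hypothetical install path and zkHost (not run here):
# wipe_overseer_queue("/opt/zookeeper-3.4.6", "zk1.example.com:2181")
```

If your zkHost includes a chroot (for example host:port/solr), pass it
exactly as Solr has it, so that /overseer/queue resolves under the same
root Solr uses.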

The basic underlying problem here is that ZK allows adding new nodes
even when the size of the parent node exceeds the default buffer size.
That issue is documented here:

https://issues.apache.org/jira/browse/ZOOKEEPER-1162

I can't be sure why your cloud is adding so many entries to the
overseer queue.  I have seen this problem happen when restarting a
server in the cloud, particularly when there are a large number of
collections or shard replicas in the cloud.  Restarting multiple servers
or restarting the same server multiple times without waiting for the
overseer queue to empty could also cause the issue.

Thanks,
Shawn

