RE: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)

2017-10-25 Thread Tarjono, C. A.
Thanks Erick for your response; please see the link below for an image of our
SolrCloud dashboard showing the error.

https://imgur.com/QCn9BCl


Best Regards,

Christopher Tarjono
Accenture Pte Ltd

+65 9347 2484
c.a.tarj...@accenture.com


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, October 24, 2017 11:32 PM
To: solr-user 
Subject: [External] Re: SolrCloud not able to view cloud page - Loading of 
"/solr/zookeeper?wt=json" failed (HTTP-Status 500)

The mail server aggressively removes attachments and the like; you'll have to
put it somewhere and provide a link.

Did anything change in that time frame?

Best,
Erick

On Tue, Oct 24, 2017 at 7:11 AM, Tarjono, C. A. 
wrote:

> Hi All,
>
>
>
> Would like to check if anyone has seen this issue before; we started
> having this a few days ago:
>
>
>
> The only error I can see in the Solr console is below:
>
> 5960847 [main-SendThread(172.16.130.132:2281)] WARN org.apache.zookeeper.ClientCnxn [ ] – Session 0x65f4e28b7370001 for server 172.16.130.132/172.16.130.132:2281, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Packet len30829010 is out of range!
>   at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
>   at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
>   at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 5960947 [zkCallback-2-thread-120] INFO org.apache.solr.common.cloud.ConnectionManager [ ] – Watcher org.apache.solr.common.cloud.ConnectionManager@4cf4d11e name:ZooKeeperConnection Watcher:172.16.129.132:2281,172.16.129.133:2281,172.16.129.134:2281,172.16.130.132:2281,172.16.130.133:2281,172.16.130.134:2281 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
> 5960947 [zkCallback-2-thread-120] INFO org.apache.solr.common.cloud.ConnectionManager [ ] – zkClient has disconnected
>
>
>
> We can't find any corresponding error in the ZooKeeper log.
>
> Appreciate any input, thanks!
>
>
>
> Best Regards,
>
>
>
> Christopher Tarjono
>
> *Accenture Pte Ltd*
>
>
>
> +65 9347 2484
>
> c.a.tarj...@accenture.com
>
>
>
> --
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you
> have received it in error, please notify the sender immediately and
> delete the original. Any other use of the e-mail by you is prohibited.
> Where allowed by local law, electronic communications with Accenture
> and its affiliates, including e-mail and instant messaging (including
> content), may be scanned by our systems for the purposes of
> information security and assessment of internal compliance with Accenture 
> policy.
> 
> __
>
> www.accenture.com
>





Facet fields limits

2017-10-25 Thread Vincenzo D'Amore
Hi all,

Do you know if there is a configuration parameter able to limit the number
of concurrent facets a user can submit in one request?

Looking at the documentation, it seems it is not possible.

Best regards,
Vincenzo


RE: Date range queries no longer work 6.6 to 7.1

2017-10-25 Thread Markus Jelsma
Thanks!
 
-Original message-
> From:Shawn Heisey 
> Sent: Tuesday 24th October 2017 19:04
> To: solr-user@lucene.apache.org
> Subject: Re: Date range queries no longer work 6.6 to 7.1
> 
> On 10/24/2017 9:38 AM, Markus Jelsma wrote:
> > We have switched back to 6.6 so we are fine for now. Although I didn't try
> > range queries other than date, I assume other Point fields can also have
> > this problem?
> >
> > That would mean completely switching back to Trie if you can't or don't
> > want to fully reindex all data.
> >
> > Suggestions? A forceMerge at least fixes nothing. I'll take a look at the
> > index upgrade tool.
> 
> As I said in the earlier reply, data written by a Trie field class
> cannot be read by a Point field class.  That's true for any of them --
> Int, Float, Double, Long, etc.
> 
> Lucene's IndexUpgrader just performs a forceMerge on the index.  There's
> nothing special about the job it does.  It is not capable of converting
> one field class to another.  It doesn't know anything about Solr's field
> classes.
> 
> Solr 7.x can still use Trie fields, but they will be gone by the 8.0
> release.  Lucene 7.0 no longer contains the legacy numeric classes that
> Trie fields are built with.  Solr has kept those around for one more
> major version.
> 
> Thanks,
> Shawn
> 
> 
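
For anyone facing the same migration, a minimal schema.xml sketch of the two
options Shawn describes (type names are placeholders, not Markus's schema):
keep the deprecated Trie type while on 7.x, or define the Point type and
reindex from source.

   <!-- deprecated but still usable on 7.x; removed in 8.0 -->
   <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" docValues="true"/>
   <!-- the Point replacement; data written by a Trie field cannot be read by it,
        so moving a field to this type requires a full reindex -->
   <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>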


Re: BlendedTermQuery for Solr?

2017-10-25 Thread Rick Leir
James
It looks as if Markus could help:
http://lucene.472066.n3.nabble.com/BlendedTermQuery-causing-negative-IDF-td4271289.html

Also, ES has such a query; you could look at the source there.
"BlendedTermQuery forms the guts behind Elasticsearch’s cross_field search." --
Doug Turnbull
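
As far as I know, Solr 6.6 ships no query parser that exposes it, so a
workaround means a small custom QParserPlugin around the Lucene builder. A
minimal sketch of that Lucene-level call (field names and the term are
placeholders):

   import org.apache.lucene.index.Term;
   import org.apache.lucene.search.BlendedTermQuery;
   import org.apache.lucene.search.Query;

   public class BlendedExample {
     public static void main(String[] args) {
       // blend one term across two fields; BlendedTermQuery equalizes the
       // term statistics so a rare field does not dominate the score
       Query blended = new BlendedTermQuery.Builder()
           .add(new Term("title", "solr"))   // placeholder field names
           .add(new Term("body", "solr"))
           .build();
       System.out.println(blended);
     }
   }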

Cheers -- Rick

On October 25, 2017 2:11:39 AM EDT, James  wrote:
> 
>
>On my Solr 6.6 server I'd like to use BlendedTermQuery.
>
>https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/BlendedTermQuery.html
>
>I know it is a Lucene class. Is there a Solr API available to access it? If
>not, maybe some workaround?
>
> 
>
>Thanks!

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)

2017-10-25 Thread Shawn Heisey
On 10/24/2017 8:11 AM, Tarjono, C. A. wrote:
> Would like to check if anyone have seen this issue before, we started
> having this a few days ago:
>
>  
>
> The only error I can see in solr console is below:
>
> 5960847[main-SendThread(172.16.130.132:2281)] WARN
> org.apache.zookeeper.ClientCnxn [ ] – Session 0x65f4e28b7370001 for
> server 172.16.130.132/172.16.130.132:2281, unexpected error, closing
> socket connection and attempting reconnect java.io.IOException: Packet
> len30829010 is out of range!
>

Combining the last part of what I quoted above with the image you shared
later, I am pretty sure I know what is happening.

The overseer queue in zookeeper (at the ZK path of /overseer/queue) has
a lot of entries in it.  Based on the fact that you are seeing a packet
length beyond 30 million bytes, I am betting that the number of entries
in the queue is between 1.5 million and 2 million.  ZK cannot handle
that packet size without a special startup argument.  The value of the
special parameter defaults to a little over one million bytes.

To fix this, you're going to need to wipe out the overseer queue.  ZK
includes a script named ZkCli.  Note that Solr includes a script called
zkcli as well, which does very different things.  You need the one
included with zookeeper.

Wiping out the queue when it is that large is not straightforward.  You
need to start the ZkCli script included with zookeeper with a
-Djute.maxbuffer=3100 argument and the same zkHost value used by
Solr, and then use a command like "rmr /overseer/queue" in that command
shell to completely remove the /overseer/queue path.  Then you can
restart the ZK servers without the jute.maxbuffer setting.  You may need
to restart Solr.  Running this procedure might also require temporarily
restarting the ZK servers with the same jute.maxbuffer argument, but I
am not sure whether that is required.
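
A hedged sketch of that procedure, assuming a typical ZooKeeper install path
and using some of the ensemble addresses from your log; the buffer value is an
example that just needs to be comfortably larger than the reported
30829010-byte packet:

   # run on any host that can reach the ZK ensemble
   export CLIENT_JVMFLAGS="-Djute.maxbuffer=31000000"
   /opt/zookeeper/bin/zkCli.sh -server 172.16.130.132:2281,172.16.130.133:2281,172.16.130.134:2281
   # inside the ZK shell:
   rmr /overseer/queue
   quit

Afterwards, restart the ZK servers without the flag (and Solr if needed), as
described above.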

The basic underlying problem here is that ZK allows adding new nodes
even when the size of the parent node exceeds the default buffer size. 
That issue is documented here:

https://issues.apache.org/jira/browse/ZOOKEEPER-1162

I can't be sure why your cloud is adding so many entries to the
overseer queue.  I have seen this problem happen when restarting a
server in the cloud, particularly when there are a large number of
collections or shard replicas in the cloud.  Restarting multiple servers
or restarting the same server multiple times without waiting for the
overseer queue to empty could also cause the issue.

Thanks,
Shawn



Re: Some problems in SOLR-6.5.1

2017-10-25 Thread Rick Leir
Klin,
You need to use the new version's solrconfig.xml, with modifications as
necessary. Start by looking at the current solrconfig: what was modified there?

Did you re-index? If you cannot reindex then you should upgrade to 5.n then to 
6.m.
Cheers -- Rick

On October 24, 2017 11:21:48 PM EDT, SOLR4189  wrote:
>Two days ago we upgraded our SOLR servers from version 4.10.1 to 6.5.1.
>We explored the logs and saw many errors like:
>
>1)
>org.apache.solr.common.SolrException;
>null:java.lang.NullPointerException
>  at
>org.apache.solr.search.grouping.distributed.responseprocessor.StoredFieldsShardResponseProcessor.process(StoredFieldsShardResponseProcessor.java:41)
>  at
>org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:771)
> . . .
>
>We don't know which queries cause it.
>
>2) A second error, or something strange that we saw in the logs: sometimes
>the SOLR service restarts automatically without any error.
>
>Can somebody help us? Does anyone have problems like ours?
>
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Really slow facet performance in 6.6

2017-10-25 Thread Yonik Seeley
On Mon, Oct 23, 2017 at 3:06 PM, John Davis  wrote:
> Hello,
>
> We are seeing really slow facet performance with the new Solr release. This
> is on an index of 2M documents. A few things we've tried:

What happens when you run this facet request again?
The first time the UIF faceting method runs for a field on a changed
index, the data structure needs to be rebuilt (i.e. it's not good for
NRT). Maybe that build time is being included. Otherwise, I've never
seen faceting this slow, so there must be something else going on here.
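
One hedged way to keep that rebuild off the query path is a warming query in
solrconfig.xml; the field and parameters below are placeholders, not taken
from your setup:

   <listener event="newSearcher" class="solr.QuerySenderListener">
     <arr name="queries">
       <lst>
         <str name="q">*:*</str>
         <str name="rows">0</str>
         <str name="facet">true</str>
         <str name="facet.field">category</str>
         <str name="facet.method">uif</str>
       </lst>
     </arr>
   </listener>

The same entry under event="firstSearcher" covers the searcher opened at startup.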

-Yonik


Re: Some problems in SOLR-6.5.1

2017-10-25 Thread SOLR4189
Of course I did. I made all the changes in solrconfig.xml and used IndexUpgrader
from 4 to 5 and then from 5 to 6.
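
For reference, a hedged sketch of that two-hop IndexUpgrader run (jar versions
and the index path are examples, not the actual setup):

   # 4.10.x index -> 5.x format
   java -cp lucene-core-5.5.5.jar:lucene-backward-codecs-5.5.5.jar \
     org.apache.lucene.index.IndexUpgrader -delete-prior-commits /var/solr/data/collection1/data/index
   # 5.x format -> 6.x format
   java -cp lucene-core-6.5.1.jar:lucene-backward-codecs-6.5.1.jar \
     org.apache.lucene.index.IndexUpgrader -delete-prior-commits /var/solr/data/collection1/data/index

As noted elsewhere on the list, this only rewrites the index format; it does
not convert Solr field types.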



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Facet fields limits

2017-10-25 Thread Erick Erickson
None that I know of.

On Wed, Oct 25, 2017 at 1:59 AM, Vincenzo D'Amore  wrote:
> Hi all,
>
> Do you know if there is a configuration parameter able to limit the number
> of concurrent facets a user can submit in one request?
>
> Looking at the documentation, it seems it is not possible.
>
> Best regards,
> Vincenzo


TimeoutException, IOException, Read timed out

2017-10-25 Thread Fengtan
Hi,

We run a SolrCloud 6.4.2 cluster with ZooKeeper 3.4.6 on 3 VM's.
Each VM runs RHEL 7 with 16 GB RAM and 8 CPU and OpenJDK 1.8.0_131 ; each
VM has one Solr and one ZK instance.
The cluster hosts 1,000 collections ; each collection has 1 shard and
between 500 and 50,000 documents.
Documents are indexed incrementally every day ; the Solr client mostly does
searching.
Solr runs with -Xms7g -Xmx7g.

Everything has been working fine for about one month but a few days ago we
started to see Solr timeouts: https://pastebin.com/raw/E2prSrQm

Also we have always seen these:
  PERFORMANCE WARNING: Overlapping onDeckSearchers=2


We are not sure what is causing the timeouts, although we have identified a
few things that could be improved:

1) Ignore explicit commits using IgnoreCommitOptimizeUpdateProcessorFactory
-- we are aware that explicit commits are expensive (see the config sketch
after this list)

2) Drop the 1,000 collections and use a single one instead (all our
collections use the same schema/solrconfig.xml), since stability problems
are expected when the number of collections reaches the low hundreds. The
downside is that the new collection would contain 1,000,000 documents, which
may bring new challenges.

3) Tune the GC and possibly switch from CMS to G1, as it seems to bring
better performance according to several articles we found. The downside is
that Lucene explicitly discourages the usage of G1, so we are not sure what
to expect. We use the default GC settings:
so we are not sure what to expect. We use the default GC settings:
  -XX:NewRatio=3
  -XX:SurvivorRatio=4
  -XX:TargetSurvivorRatio=90
  -XX:MaxTenuringThreshold=8
  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:ConcGCThreads=4
  -XX:ParallelGCThreads=4
  -XX:+CMSScavengeBeforeRemark
  -XX:PretenureSizeThreshold=64m
  -XX:+UseCMSInitiatingOccupancyOnly
  -XX:CMSInitiatingOccupancyFraction=50
  -XX:CMSMaxAbortablePrecleanTime=6000
  -XX:+CMSParallelRemarkEnabled
  -XX:+ParallelRefProcEnabled

4) Tune the caches, possibly by increasing autowarmCount on filterCache --
our current config is:
  
  
  

5) Tweak the timeout settings, although this would not fix the underlying
issue
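
For option 1, a minimal solrconfig.xml sketch of the usual chain (the chain
name and processor order follow the common reference-guide pattern, not our
actual config):

   <updateRequestProcessorChain name="ignore-commit-from-client" default="true">
     <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
       <int name="statusCode">200</int>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory"/>
     <processor class="solr.DistributedUpdateProcessorFactory"/>
     <processor class="solr.RunUpdateProcessorFactory"/>
   </updateRequestProcessorChain>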


Does any of these options seem relevant? Is there anything else that might
address the timeouts?

Thanks


Re: TimeoutException, IOException, Read timed out

2017-10-25 Thread Erick Erickson
<1> It's not that explicit commits are expensive, it's that they happen
too fast. An explicit commit and an internal autocommit have exactly
the same cost. Your "overlapping ondeck searchers" warning is definitely an
indication that your commits are coming from somewhere too quickly
and are piling up.

<2> Likely a good thing, each collection increases overhead. And
1,000,000 documents is quite small in Solr's terms unless the
individual documents are enormous. I'd do this for a number of
reasons.

<3> Certainly an option, but I'd put that last. Fix the commit problem first ;)

<4> If you do this, make the autowarm count quite small. That said,
this will be of very little use if you have frequent commits. Let's say
you commit every second. The autowarming will warm caches, which will
then be thrown out a second later, and it will increase the time it takes
to open a new searcher.

<5> Yeah, this would probably just be a band-aid.

If I were prioritizing these, I'd do
<1> first. If you control the client, just don't call commit. If you
do not control the client, then what you've outlined is fine. Tip: set
your soft commit settings to be as long as you can stand. If you must
have very short intervals, consider disabling your caches completely.
Here's a long article on commits
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

<2> Actually, this and <1> are pretty close in priority.

Then re-evaluate. Fixing the commit issue may buy you quite a bit of
time. Having 1,000 collections is pushing the boundaries presently.
Each collection will establish watchers on the bits it cares about in
ZooKeeper, and reducing the watchers by a factor approaching 1,000 is
A Good Thing.

Frankly, between these two things I'd pretty much expect your problems
to disappear. Wouldn't be the first time I've been totally wrong, but
it's where I'd start ;)
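
For <1> and <4>, hedged solrconfig.xml sketches of what that looks like; the
intervals and counts are placeholders to tune, not recommendations for your
exact load:

   <!-- hard commit for durability only; soft commit as seldom as you can stand -->
   <autoCommit>
     <maxTime>60000</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>
   <autoSoftCommit>
     <maxTime>300000</maxTime>
   </autoSoftCommit>

   <!-- small autowarm so opening a new searcher stays cheap -->
   <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>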

Best,
Erick

On Wed, Oct 25, 2017 at 8:54 AM, Fengtan  wrote:
> Hi,
>
> We run a SolrCloud 6.4.2 cluster with ZooKeeper 3.4.6 on 3 VM's.
> Each VM runs RHEL 7 with 16 GB RAM and 8 CPU and OpenJDK 1.8.0_131 ; each
> VM has one Solr and one ZK instance.
> The cluster hosts 1,000 collections ; each collection has 1 shard and
> between 500 and 50,000 documents.
> Documents are indexed incrementally every day ; the Solr client mostly does
> searching.
> Solr runs with -Xms7g -Xmx7g.
>
> Everything has been working fine for about one month but a few days ago we
> started to see Solr timeouts: https://pastebin.com/raw/E2prSrQm
>
> Also we have always seen these:
>   PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
>
> We are not sure what is causing the timeouts, although we have identified a
> few things that could be improved:
>
> 1) Ignore explicit commits using IgnoreCommitOptimizeUpdateProcessorFactory
> -- we are aware that explicit commits are expensive
>
> 2) Drop the 1,000 collections and use a single one instead (all our
> collections use the same schema/solrconfig.xml), since stability problems
> are expected when the number of collections reaches the low hundreds. The
> downside is that the new collection would contain 1,000,000 documents, which
> may bring new challenges.
>
> 3) Tune the GC and possibly switch from CMS to G1, as it seems to bring
> better performance according to several articles we found. The downside is
> that Lucene explicitly discourages the usage of G1, so we are not sure what
> to expect. We use the default GC settings:
>   -XX:NewRatio=3
>   -XX:SurvivorRatio=4
>   -XX:TargetSurvivorRatio=90
>   -XX:MaxTenuringThreshold=8
>   -XX:+UseConcMarkSweepGC
>   -XX:+UseParNewGC
>   -XX:ConcGCThreads=4
>   -XX:ParallelGCThreads=4
>   -XX:+CMSScavengeBeforeRemark
>   -XX:PretenureSizeThreshold=64m
>   -XX:+UseCMSInitiatingOccupancyOnly
>   -XX:CMSInitiatingOccupancyFraction=50
>   -XX:CMSMaxAbortablePrecleanTime=6000
>   -XX:+CMSParallelRemarkEnabled
>   -XX:+ParallelRefProcEnabled
>
> 4) Tune the caches, possibly by increasing autowarmCount on filterCache --
> our current config is:
>autowarmCount="0"/>
>autowarmCount="32"/>
>autowarmCount="0"/>
>
> 5) Tweak the timeout settings, although this would not fix the underlying
> issue
>
>
> Does any of these options seem relevant ? Is there anything else that might
> address the timeouts ?
>
> Thanks


Using Ltr and payload together

2017-10-25 Thread isspek
Hi,

I have a question about Solr version 7. Is it possible to use the LTR and payload
plugins together to enhance search results? I am a newbie on this topic and
would like to know how I can use them if it is possible.

Thanks.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)

2017-10-25 Thread Tarjono, C. A.
@Shawn Heisey,

Thanks so much for your input! We will try your suggestion and hope it will 
resolve the issue.

On a side note, would you know if this is an existing bug? If yes, has it
been resolved in a later version? I.e., ZK allows adding nodes when the
buffer size is exceeded.

We are currently using ZK 3.4.6 with SolrCloud 5.1.0.

Thanks again!

Best Regards,

Christopher Tarjono
Accenture Pte Ltd

+65 9347 2484
c.a.tarj...@accenture.com

From: Shawn Heisey 
Sent: 25 October 2017 20:57:30
To: solr-user@lucene.apache.org
Subject: [External] Re: SolrCloud not able to view cloud page - Loading of 
"/solr/zookeeper?wt=json" failed (HTTP-Status 500)

On 10/24/2017 8:11 AM, Tarjono, C. A. wrote:
> Would like to check if anyone have seen this issue before, we started
> having this a few days ago:
>
>
>
> The only error I can see in solr console is below:
>
> 5960847[main-SendThread(172.16.130.132:2281)] WARN
> org.apache.zookeeper.ClientCnxn [ ] – Session 0x65f4e28b7370001 for
> server 172.16.130.132/172.16.130.132:2281, unexpected error, closing
> socket connection and attempting reconnect java.io.IOException: Packet
> len30829010 is out of range!
>

Combining the last part of what I quoted above with the image you shared
later, I am pretty sure I know what is happening.

The overseer queue in zookeeper (at the ZK path of /overseer/queue) has
a lot of entries in it.  Based on the fact that you are seeing a packet
length beyond 30 million bytes, I am betting that the number of entries
in the queue is between 1.5 million and 2 million.  ZK cannot handle
that packet size without a special startup argument.  The value of the
special parameter defaults to a little over one million bytes.

To fix this, you're going to need to wipe out the overseer queue.  ZK
includes a script named ZkCli.  Note that Solr includes a script called
zkcli as well, which does very different things.  You need the one
included with zookeeper.

Wiping out the queue when it is that large is not straightforward.  You
need to start the ZkCli script included with zookeeper with a
-Djute.maxbuffer=3100 argument and the same zkHost value used by
Solr, and then use a command like "rmr /overseer/queue" in that command
shell to completely remove the /overseer/queue path.  Then you can
restart the ZK servers without the jute.maxbuffer setting.  You may need
to restart Solr.  Running this procedure might also require temporarily
restarting the ZK servers with the same jute.maxbuffer argument, but I
am not sure whether that is required.

The basic underlying problem here is that ZK allows adding new nodes
even when the size of the parent node exceeds the default buffer size.
That issue is documented here:

https://issues.apache.org/jira/browse/ZOOKEEPER-1162

I can't be sure why your cloud is adding so many entries to the
overseer queue.  I have seen this problem happen when restarting a
server in the cloud, particularly when there are a large number of
collections or shard replicas in the cloud.  Restarting multiple servers
or restarting the same server multiple times without waiting for the
overseer queue to empty could also cause the issue.

Thanks,
Shawn






JSON facet not working with dates

2017-10-25 Thread George Petasis

Hi all,

I am using solr 6.5.0, and I want to do pivot faceting including a date 
field. My simple facet.json is:


{
  "dates": {
    "type": "range",
    "field": "observationStart.TimeOP",
    "start": "3000-01-01T00:00:00Z",
    "end": "3000-01-02T00:00:00Z",
    "gap": "%2B15MINUTE",
    "facet": {
      "x": "sum(trafficCnt)"
    }
  }
}

What I get back is an error though:

error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Unable to range facet on 
field:observationStart.TimeOP{type=date_range,properties=indexed,stored,omitTermFreqAndPositions,useDocValuesAsStored}"


On the other hand, if I use the old interface, it seems to work:

"facet":"on",
"facet.range.start":"3000-01-01T00:00:00Z",
"facet.range.end":"3000-01-01T00:00:00Z+1DAY"
"facet.range.gap":"+15MINUTE"

I get:

"facet_ranges":{
  "observationStart.TimeOP":{
    "counts":[
  "3000-01-01T00:00:00Z",258,
  "3000-01-01T00:15:00Z",261,
  "3000-01-01T00:30:00Z",258,
  "3000-01-01T00:45:00Z",254,
  ...


My date fields are of type solr.DateRangeField.

Searching for the error I get, I found this source file:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetRange.java

Where in line 180 it has "if (ft instanceof TrieField || ft.isPointField())".

Is it related to my problem? Is the new json facet interface not working 
with date ranges?


Regards,

George



Re: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)

2017-10-25 Thread Erick Erickson
Later versions of Solr have been changed in two ways:
1> changes have been made to not put so many items in the overseer
queue in the first place
2> changes have been made to process the messages that do get there
much more quickly.

Meanwhile, my guess is you have a lot of replicas out there. I've seen
this happen when there are lots of collections and/or replicas and
people try to start many of them up at once. One strategy to get by is
to start your Solr nodes a few at a time, wait for the Overseer queue
to get processed then start a few more. Unsatisfactory, but if the
precursor to this was starting all your Solr instances and you have a
lot of replicas, it may help until you can upgrade.
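
A hedged way to watch the queue drain between starts, assuming the stock ZK
CLI is available (install path and host are examples):

   # numChildren is the number of pending overseer entries
   echo "stat /overseer/queue" | /opt/zookeeper/bin/zkCli.sh \
     -server 172.16.130.132:2281 2>/dev/null | grep numChildren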

Best,
Erick

On Wed, Oct 25, 2017 at 5:44 PM, Tarjono, C. A.
 wrote:
> @Shawn Heisey,
>
> Thanks so much for your input! We will try your suggestion and hope it will 
> resolve the issue.
>
> On the side note, would you know if this is an existing bug? if yes, has it 
> been resolved in later version? i.e. zk allows adding nodes when it exceeds 
> the buffer.
>
> We are currently using ZK 3.4.6 to use with SolrCloud 5.1.0.
>
> Thanks again!
>
> Best Regards,
>
> Christopher Tarjono
> Accenture Pte Ltd
>
> +65 9347 2484
> c.a.tarj...@accenture.com
> 
> From: Shawn Heisey 
> Sent: 25 October 2017 20:57:30
> To: solr-user@lucene.apache.org
> Subject: [External] Re: SolrCloud not able to view cloud page - Loading of 
> "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
>
> On 10/24/2017 8:11 AM, Tarjono, C. A. wrote:
>> Would like to check if anyone have seen this issue before, we started
>> having this a few days ago:
>>
>>
>>
>> The only error I can see in solr console is below:
>>
>> 5960847[main-SendThread(172.16.130.132:2281)] WARN
>> org.apache.zookeeper.ClientCnxn [ ] – Session 0x65f4e28b7370001 for
>> server 172.16.130.132/172.16.130.132:2281, unexpected error, closing
>> socket connection and attempting reconnect java.io.IOException: Packet
>> len30829010 is out of range!
>>
>
> Combining the last part of what I quoted above with the image you shared
> later, I am pretty sure I know what is happening.
>
> The overseer queue in zookeeper (at the ZK path of /overseer/queue) has
> a lot of entries in it.  Based on the fact that you are seeing a packet
> length beyond 30 million bytes, I am betting that the number of entries
> in the queue is between 1.5 million and 2 million.  ZK cannot handle
> that packet size without a special startup argument.  The value of the
> special parameter defaults to a little over one million bytes.
>
> To fix this, you're going to need to wipe out the overseer queue.  ZK
> includes a script named ZkCli.  Note that Solr includes a script called
> zkcli as well, which does very different things.  You need the one
> included with zookeeper.
>
> Wiping out the queue when it is that large is not straightforward.  You
> need to start the ZkCli script included with zookeeper with a
> -Djute.maxbuffer=3100 argument and the same zkHost value used by
> Solr, and then use a command like "rmr /overseer/queue" in that command
> shell to completely remove the /overseer/queue path.  Then you can
> restart the ZK servers without the jute.maxbuffer setting.  You may need
> to restart Solr.  Running this procedure might also require temporarily
> restarting the ZK servers with the same jute.maxbuffer argument, but I
> am not sure whether that is required.
>
> The basic underlying problem here is that ZK allows adding new nodes
>> even when the size of the parent node exceeds the default buffer size.
> That issue is documented here:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1162
>
> I can't be sure why your cloud is adding so many entries to the
> overseer queue.  I have seen this problem happen when restarting a
> server in the cloud, particularly when there are a large number of
> collections or shard replicas in the cloud.  Restarting multiple servers
> or restarting the same server multiple times without waiting for the
> overseer queue to empty could also cause the issue.
>
> Thanks,
> Shawn
>
>
> 
>

Re: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)

2017-10-25 Thread Shawn Heisey
On 10/25/2017 6:44 PM, Tarjono, C. A. wrote:
> Thanks so much for your input! We will try your suggestion and hope it will 
> resolve the issue.
>
> On the side note, would you know if this is an existing bug? if yes, has it 
> been resolved in later version? i.e. zk allows adding nodes when it exceeds 
> the buffer.
>
> We are currently using ZK 3.4.6 to use with SolrCloud 5.1.0.

The ZOOKEEPER-1162 issue has not been fixed.  It is a very old bug --
opened six years ago.  They probably aren't going to fix it.

If you find that restarting a single Solr instance ends up filling the
queue with too many entries, you may need to increase the jute.maxbuffer
setting on both Solr and ZK so that a large queue won't cause everything
to break.
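
A hedged sketch of where that setting goes on each side; the value is an
example, and the same number should be used on every ZK server and Solr node:

   # ZooKeeper 3.4.x: export before starting each server (or set it in the
   # service environment)
   export JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=31000000"
   bin/zkServer.sh restart

   # Solr 5.x: add to bin/solr.in.sh on each node
   SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=31000000"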

There has been some effort in recent 6.x versions to improve this
situation, as Erick mentioned in his reply.  There's nothing that can be
done for problems like this in 5.x versions.

Thanks,
Shawn