Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Erick Erickson
This has been an occasional problem with clusters with lots of replicas in aggregate. There was a major improvement in how large Overseer queues are handled in SOLR-10619, which was released with Solr 6.6; you might want to look at it. If you can't go to 6.6 (or apply the patch yourself to your v…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
It is a known problem, see <https://cwiki.apache.org/confluence/display/CURATOR/TN4>. There are multiple JIRAs around this, like the one I pointed to earlier, <https://issues.apache.org/jira/browse/SOLR-10524>. There it states: "This JIRA is to break out that part of the discussion as it might be an eas…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
Righto, thanks very much for your help clarifying this. I am not alone :) I have been looking at this for a few days now, and I am seeing people who have experienced this issue going back to Solr 4.x. I am wondering if it is an underlying issue with the way the queue is managed. I would think…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
- stop all Solr nodes
- start ZK with the new jute.maxbuffer setting
- start a ZK client, like zkCli, with the changed jute.maxbuffer setting and check that you can read out the overseer queue
- clear the queue
- restart ZK with the normal settings
- slowly start Solr

On 22.08.2017 15:27, Jeff…
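Why the queue cannot be listed until jute.maxbuffer is raised can be checked with a rough size estimate. The Java sketch below rests on assumptions that are not stated in the thread: queue entries are named like `qn-0000700000` (13 ASCII characters), Jute encodes a list of strings as a 4-byte count followed by a 4-byte length plus the bytes of each name, and the default jute.maxbuffer is 0xfffff bytes. Reply headers are ignored, so the result is a lower bound. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope estimate of a getChildren response for a znode with many
 * children, compared against ZooKeeper's jute.maxbuffer limit.
 * Assumptions (hypothetical): ASCII child names of a fixed length, and a Jute string
 * list encoded as a 4-byte count plus, per entry, a 4-byte length and the name bytes.
 */
final class JuteBufferEstimate {

    /** Assumed default for jute.maxbuffer: 0xfffff bytes (just under 1 MiB). */
    static final long DEFAULT_JUTE_MAXBUFFER = 0xfffffL;

    /** Lower-bound size in bytes of a serialized list of childCount names of nameLength chars. */
    static long estimateBytes(long childCount, int nameLength) {
        long perChild = 4L + nameLength;   // length prefix + name bytes
        return 4L + childCount * perChild; // list count prefix + all entries
    }

    /** Largest child count whose estimated listing still fits within maxBuffer bytes. */
    static long maxChildren(long maxBuffer, int nameLength) {
        return (maxBuffer - 4L) / (4L + nameLength);
    }

    public static void main(String[] args) {
        int nameLength = "qn-0000700000".length(); // hypothetical queue entry name
        long needed = estimateBytes(700_000, nameLength);
        System.out.println("Estimated listing size: " + needed + " bytes");
        System.out.println("Fits in default jute.maxbuffer? " + (needed <= DEFAULT_JUTE_MAXBUFFER));
        System.out.println("Max children under default: " + maxChildren(DEFAULT_JUTE_MAXBUFFER, nameLength));
    }
}
```

Under these assumptions 700k entries need roughly 11.9 MB, more than eleven times the default, while only about 61,680 such names fit under it. The real entry names may differ, so a generous value on both the ZK server and the client doing the cleanup is the safer choice.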

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
I set jute.maxbuffer on the ZK hosts; should this be done to Solr as well? Mine is happening in a severely memory-constrained environment as well. Jeff Courtade M: 240.507.6116 On Aug 22, 2017 8:53 AM, "Hendrik Haddorp" wrote: > We have Solr and ZK running in Docker containers. There is no more than…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
We have Solr and ZK running in Docker containers. There is no more than one Solr/ZK node per host, but a Solr node and a ZK node can run on the same host. So Solr and ZK are spread out separately. I have not seen this problem during normal processing, just when we recycle nodes or when we have nodes fail…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
Thanks very much. I will follow up when we try this. I'm curious: in the environment where this is happening to you, are the ZooKeeper servers residing on Solr nodes? Are the Solr nodes underpowered in RAM and/or CPU? On Aug 22, 2017 8:30 AM, "Hendrik Haddorp" wrote: > I'm always…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
I always use a small Java program to delete the nodes directly. I assume you can also delete the whole node, but that is nothing I have tried myself. On 22.08.2017 14:27, Jeff Courtade wrote: So... using zkCli.sh, I have jute.maxbuffer set up so I can list it now. Can I rmr /ove…
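Hendrik mentions deleting the entries with a small Java program but does not post it. The following is a minimal sketch of that idea, not his program: it lists the children of /overseer/queue and deletes them one at a time, leaving the queue znode itself in place. To keep it runnable without a ZooKeeper jar, the two client calls sit behind a hypothetical `ZnodeStore` interface; with the real client they would map to `ZooKeeper#getChildren(path, false)` and `ZooKeeper#delete(path, -1)`, and that client JVM would still need the raised jute.maxbuffer for the listing to succeed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical abstraction over the two ZooKeeper operations the cleanup needs.
 * With the real client, getChildren maps to ZooKeeper#getChildren(path, false) and
 * delete maps to ZooKeeper#delete(path, -1); an adapter would have to handle or wrap
 * the KeeperException and InterruptedException those calls declare.
 */
interface ZnodeStore {
    List<String> getChildren(String path);
    void delete(String path);
}

final class OverseerQueueCleaner {

    /** Deletes every child of queuePath one by one, keeping queuePath itself. Returns the number deleted. */
    static int clearChildren(ZnodeStore store, String queuePath) {
        // Copy before sorting so the store's own list is never mutated.
        List<String> children = new ArrayList<>(store.getChildren(queuePath));
        Collections.sort(children); // zero-padded sequential names sort oldest-first
        int deleted = 0;
        for (String child : children) {
            store.delete(queuePath + "/" + child);
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.paths.add("/overseer/queue");
        for (int i = 0; i < 5; i++) {
            store.paths.add(String.format("/overseer/queue/qn-%010d", i)); // hypothetical entry names
        }
        int n = clearChildren(store, "/overseer/queue");
        System.out.println(n + " entries deleted; remaining: " + store.getChildren("/overseer/queue"));
    }
}

/** In-memory stand-in for ZooKeeper, used only so the sketch runs without a server. */
final class InMemoryStore implements ZnodeStore {
    final TreeSet<String> paths = new TreeSet<>();

    @Override
    public List<String> getChildren(String path) {
        String prefix = path + "/";
        List<String> names = new ArrayList<>();
        for (String p : paths) {
            // Direct children only: the prefix matches and no further '/' follows it.
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                names.add(p.substring(prefix.length()));
            }
        }
        return names;
    }

    @Override
    public void delete(String path) {
        if (!paths.remove(path)) {
            throw new IllegalStateException("no such node: " + path);
        }
    }
}
```

Deleting the children individually, rather than the parent, matches what Hendrik describes having done. If Solr is still running, an entry may already be gone when `delete` is called; the real client reports that with `KeeperException.NoNodeException`, which a real tool would probably catch and skip.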

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
So... using zkCli.sh, I have jute.maxbuffer set up so I can list it now. Can I rmr /overseer/queue, or do I need to delete individual entries? Will rmr /overseer/queue/* work? On Aug 22, 2017 8:20 AM, "Hendrik Haddorp" wrote: > When Solr is stopped…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
When Solr is stopped, it has not caused a problem so far. I also cleared the queue a few times while Solr was still running. That also didn't result in a real problem, but some replicas might not come up again. In those cases it helps to either restart the node with the replicas that are in state "…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
How does the cluster react to the overseer queue entries disappearing? On Aug 22, 2017 8:01 AM, "Hendrik Haddorp" wrote: > Hi Jeff, > > we ran into that a few times already. We have lots of collections, and when > nodes get started too fast the overseer queue grows fas…

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
Hi Jeff, we ran into that a few times already. We have lots of collections, and when nodes get started too fast the overseer queue grows faster than Solr can process it. At some point Solr tries to redo things like leader votes and adds new tasks to the list, which then gets longer and longer…
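The feedback loop described above, where the queue grows faster than the Overseer drains it and redone work adds still more entries, can be illustrated with a toy model. This is not how Solr's Overseer is implemented: the class, the per-tick arrival and capacity numbers, and the rule that a fixed fraction of the backlog is re-enqueued once it passes a threshold are all hypothetical, chosen only to show the shape of the growth.

```java
import java.util.Arrays;

/** Toy queue model (hypothetical): not Solr's Overseer, just the arithmetic of a growing backlog. */
final class OverseerBacklogModel {

    /**
     * Each tick, `arrivals` tasks are enqueued. If the backlog then exceeds `retryThreshold`,
     * a fraction `retryRate` of it is re-enqueued (standing in for redone work such as leader
     * votes). Finally up to `capacity` tasks are processed. Returns the backlog after each tick.
     */
    static long[] simulate(long arrivals, long capacity, long retryThreshold, double retryRate, int ticks) {
        long[] history = new long[ticks];
        long backlog = 0;
        for (int t = 0; t < ticks; t++) {
            backlog += arrivals;
            if (backlog > retryThreshold) {
                backlog += (long) (backlog * retryRate); // extra work caused by the backlog itself
            }
            backlog = Math.max(0, backlog - capacity);
            history[t] = backlog;
        }
        return history;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 150 tasks arrive per tick, 100 are processed per tick.
        System.out.println(Arrays.toString(simulate(150, 100, Long.MAX_VALUE, 0.0, 5))); // no re-enqueuing
        System.out.println(Arrays.toString(simulate(150, 100, 0, 0.1, 5)));             // with re-enqueuing
    }
}
```

With arrivals below capacity the backlog stays at zero; with arrivals above capacity and no re-enqueuing it grows linearly; once re-enqueuing kicks in it grows faster than linearly, which is one way to read the advice to start Solr nodes slowly.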

700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
Hi, I have an issue with what seems to be a blocked-up /overseer/queue. There are 700k+ entries (SolrCloud 6.x). You cannot addreplica or deletereplica; the commands time out. A full stop and start of Solr and ZooKeeper does not clear it. Is it safe to use the ZooKeeper-supplied zkCli.sh to simpl…