Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Mark Miller
Cool, useful info. As soon as I can duplicate the issue I'll work out what we need to do differently for this case. - Mark On Mar 7, 2013, at 10:19 AM, Brett Hoerner wrote: > As an update to this, I did my SolrCloud dance and made it 2xJVMs per > machine (2 machines still, the same ones) and

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
As an update to this, I did my SolrCloud dance and made it 2xJVMs per machine (2 machines still, the same ones) and spread the load around. Each Solr instance now has 16 total shards (master for 8, replica for 8). *drum roll* ... I can repeatedly run my delete script and nothing breaks. :) On Th

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Mark Miller
No, not a poor idea at all, definitely a valid setup. - Mark On Mar 7, 2013, at 9:30 AM, Brett Hoerner wrote: > As a side note, do you think that was a poor idea? I figured it's better to > spread the master "load" around? > > > On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller wrote: > >> >> O

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
As a side note, do you think that was a poor idea? I figured it's better to spread the master "load" around? On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller wrote: > > On Mar 7, 2013, at 9:03 AM, Brett Hoerner wrote: > > > To be clear, neither is really "the replica", I have 32 shards and each >

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Mark Miller
On Mar 7, 2013, at 9:03 AM, Brett Hoerner wrote: > To be clear, neither is really "the replica", I have 32 shards and each > physical server is the leader for 16, and the replica for 16. Ah, interesting. That actually could be part of the issue - some brain cells are firing. I'm away from home

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
Here is the other server when it's locked: https://gist.github.com/3529b7b6415756ead413 To be clear, neither is really "the replica", I have 32 shards and each physical server is the leader for 16, and the replica for 16. Also, related to the max threads hunch: my working cluster has many, many f

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
Any chance you can grab the stack trace of a replica as well? (also when it's locked up of course). - Mark On Mar 6, 2013, at 3:34 PM, Brett Hoerner wrote: > If there's anything I can try, let me know. Interestingly, I think I have > noticed that if I stop my indexer, do my delete, and restart

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
If there's anything I can try, let me know. Interestingly, I think I have noticed that if I stop my indexer, do my delete, and restart the indexer then I'm fine. Which goes along with the update thread contention theory. On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller wrote: > This is what I see: >

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
This is what I see: We currently limit the number of outstanding update requests at one time to avoid a crazy number of threads being used. It looks like a bunch of update requests are stuck in socket reads and are taking up the available threads. It looks like the deletes are hanging out wait
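
[Editor's illustration, not part of the original message] The starvation pattern Mark describes, a fixed cap on outstanding update requests whose slots are all held by requests blocked in socket reads, can be sketched roughly as below. This is a toy model and not Solr's actual code: the class name UpdateLimiterSketch, the cap of 2, and the latch standing in for a socket read that never returns are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a cap on outstanding update requests (not Solr code).
public class UpdateLimiterSketch {
    private final Semaphore permits;

    public UpdateLimiterSketch(int maxOutstanding) {
        this.permits = new Semaphore(maxOutstanding);
    }

    // Try to claim a request slot; false means no slot freed up within the timeout.
    public boolean tryBegin(long timeoutMillis) throws InterruptedException {
        return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Give the slot back once the request has finished.
    public void end() {
        permits.release();
    }

    // Returns true if a new request is starved while every slot is held by a stuck request.
    public static boolean demoStarvation() {
        UpdateLimiterSketch limiter = new UpdateLimiterSketch(2);
        CountDownLatch stuck = new CountDownLatch(1);   // stands in for a socket read that never returns
        CountDownLatch holding = new CountDownLatch(2); // counts requests that now hold a slot
        for (int i = 0; i < 2; i++) {
            Thread t = new Thread(() -> {
                try {
                    if (limiter.tryBegin(1000)) {
                        holding.countDown();
                        try {
                            stuck.await(); // blocked "in a socket read" while holding the slot
                        } finally {
                            limiter.end();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }
        try {
            holding.await(); // wait until both slots are taken
            boolean starved = !limiter.tryBegin(100); // a third request cannot get a slot
            if (!starved) {
                limiter.end();
            }
            return starved;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            stuck.countDown(); // unblock the stuck requests so the demo can finish
        }
    }

    public static void main(String[] args) {
        System.out.println("new request starved: " + demoStarvation());
    }
}
```

In this model nothing is deadlocked in the strict sense: the third request simply never gets a slot until one of the blocked requests returns, which matches the observation later in the thread that stopping the indexer before running the delete avoids the hang.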

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Alexandre Rafalovitch
It does not look like a deadlock, though it could be a distributed one. Or it could be a livelock, though that's less likely. Here is what we used to recommend in similar situations for large Java systems (BEA Weblogic): 1) Do thread dump of both systems before anything. As simultaneous as you can

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
Thanks Brett, good stuff (though not a good problem). We def need to look into this. - Mark On Mar 6, 2013, at 1:53 PM, Brett Hoerner wrote: > Here is a dump after the delete, indexing has been stopped: > https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e > > An interesting hint that I

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
Here is a dump after the delete, indexing has been stopped: https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e An interesting hint that I forgot to mention: it doesn't always happen on the first delete. I manually ran the delete cron, and the server continued to work. I waited about 5 minut

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
4.1, I'll induce it again and run jstack. On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller wrote: > Which version of Solr? > > Can you use jconsole, visualvm, or jstack to get some stack traces and see > where things are halting? > > - Mark > > On Mar 6, 2013, at 11:45 AM, Brett Hoerner wrote: > >

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
Which version of Solr? Can you use jconsole, visualvm, or jstack to get some stack traces and see where things are halting? - Mark On Mar 6, 2013, at 11:45 AM, Brett Hoerner wrote: > I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, > replication factor of 2) that I've been
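
[Editor's illustration, not part of the original message] jstack, jconsole, and visualvm attach to a running JVM from outside. For comparison, a rough in-process equivalent using the standard java.lang.management API might look like the sketch below. The class name ThreadDumpSketch is made up for illustration, and the code only sees the JVM it runs in, so it would have to be executed inside the Solr process to be useful for this problem.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: prints a jstack-like summary of the current JVM's threads.
public class ThreadDumpSketch {

    // Build a text dump with each live thread's name, state, and stack frames.
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked-monitor and locked-synchronizer details to keep it cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Ids of threads the JVM itself reports as deadlocked; empty if none are found.
    public static long[] deadlockedThreadIds() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

The JVM's deadlock detection only covers lock cycles within a single process, so an empty result here would not rule out a hang that spans two nodes.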