Re: Dealing with bad apples in a SolrCloud cluster

2014-11-26 Thread Ramkumar R. Aiyengar
As Erick mentions, his change to have a state where indexing happens but querying doesn't surely helps in this case. But these are still boolean decisions of send vs don't send. In general, it would be nice to abstract the routing policy so that it is pluggable. You could then do stuff like have a

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread Erick Erickson
Mohsin: As the author of the transient cores stuff I can authoritatively state that it wasn't designed with SolrCloud in mind, so I'd be a little careful about extending that functionality, even by analogy ;). Not to say that it's totally incompatible, but... That said, I may be working on some

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread Mohsin Beg Beg
How about dynamic loading/unloading of some shards (cores) similar to the transient cores feature. Should be ok if the unloaded shard has a replica. If no replica then extending shards.tolerant concept to use some timeout/acceptable-latency value sounds interesting. -Mohsin - Original Mes

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread ralph tice
bq. We ran into one of the failure modes that only AWS can dream up recently, where for an extended amount of time, two nodes in the same placement group couldn't talk to one another, but they could both see Zookeeper, so nothing was marked as down. I had something similar happen with one of my SolrCl

RE: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread steve
"Last Gasp" is the last message that Sun Storage controllers would send to each other when things went sideways... For what it's worth. > Date: Fri, 21 Nov 2014 14:07:12 -0500 > From: michael.della.bi...@appinions.com > To: solr-user@lucene.apache.org > Subject: Re: D

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread Michael Della Bitta
Good discussion topic. I'm wondering if Solr doesn't need some sort of "shoot the other node in the head" functionality. We ran into one of the failure modes that only AWS can dream up recently, where for an extended amount of time, two nodes in the same placement group couldn't talk to one anot

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread Mark Miller
bq. esp. since we've set max threads so high to avoid distributed dead-lock. We should fix this for 5.0 - add a second thread pool that is used for internal requests. We can make it optional if necessary (simpler default container support), but it's a fairly easy improvement I think. - Mark On
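
To illustrate the second-thread-pool idea mentioned above, here is a minimal sketch in Python rather than Solr's actual Java code. It is not how Solr implements or plans to implement this; the names handle_external, handle_internal, and run are hypothetical, and the pool sizes are arbitrary. The point is only that when internal (shard-to-shard) requests get their own pool, external requests that block while waiting on sub-requests cannot starve them, so a small, bounded external pool no longer risks distributed deadlock.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_internal(shard: str, query: str) -> str:
    # Hypothetical per-shard work. It never waits on other pooled tasks,
    # so the internal pool always makes progress.
    return f"{shard}:{query}"


def handle_external(query: str, shards: list, internal_pool: ThreadPoolExecutor) -> list:
    # An external request fans out to every shard and blocks until all
    # sub-requests finish. The sub-requests run on the *internal* pool,
    # not on the pool this function itself is running in.
    futures = [internal_pool.submit(handle_internal, s, query) for s in shards]
    return [f.result(timeout=5) for f in futures]


def run(queries: list, shards: list, external_workers: int = 2, internal_workers: int = 2) -> list:
    # Two separate, bounded pools: one for client-facing requests and one
    # for internal sub-requests (sizes are assumptions for the sketch).
    with ThreadPoolExecutor(max_workers=external_workers) as external_pool:
        with ThreadPoolExecutor(max_workers=internal_workers) as internal_pool:
            futures = [
                external_pool.submit(handle_external, q, shards, internal_pool)
                for q in queries
            ]
            return [f.result(timeout=10) for f in futures]


if __name__ == "__main__":
    print(run(["a", "b", "c"], ["shard1", "shard2"]))
```

If both kinds of request shared a single pool of size 2, two external requests could occupy both threads while their sub-requests sat in the queue forever; that is the deadlock that a very high max-threads setting works around.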