Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Jeff Wartes
>To: "solr-user@lucene.apache.org"
>Cc: Aleksey Mezhva, Hans Zhou
>Subject: Re: SolrCloud replicas consistently out of sync
>
>Gotcha - well that's nice. Still, we seem to be permanently out of sync.
>
>I see this thread with someone having a similar issue:

Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Aleksey Mezhva
:25 PM To: "solr-user@lucene.apache.org" Cc: Aleksey Mezhva , Hans Zhou Subject: Re: SolrCloud replicas consistently out of sync Gotcha - well that's nice. Still, we seem to be permanently out of sync. I see this thread with someone having a similar issue: https://mail-arch

Re: SolrCloud replicas consistently out of sync

2016-05-17 Thread Stephen Weiss
u <mailto:hans.z...@wgsn.com>
> Subject: Re: SolrCloud replicas consistently out of sync
>
> I should add - looking back through the logs, we're seeing frequent errors
> like this now:
>
> 78819692 WARN (qtp110456297-1145) [ ] o.a.s.h.a.LukeRequestHandler Error
> g

RE: SolrCloud replicas consistently out of sync

2016-05-17 Thread Markus Jelsma
Hi, that's a known issue and unrelated: https://issues.apache.org/jira/browse/SOLR-9120
M.
-Original message-
> From: Stephen Weiss
> Sent: Tuesday 17th May 2016 23:10
> To: solr-user@lucene.apache.org; Aleksey Mezhva; Hans Zhou
> Subject: Re: SolrCloud replicas c

Re: SolrCloud replicas consistently out of sync

2016-05-17 Thread Stephen Weiss
I should add - looking back through the logs, we're seeing frequent errors like this now:

78819692 WARN (qtp110456297-1145) [ ] o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_4o] java.nio.file.NoSuchFileException: /var/solr/data/instock_shard5_replica1/data/index.201605

Re: SolrCloud replicas consistently out of sync

2016-05-17 Thread Stephen Weiss
OK, so we did as you suggest, read through that article, and we reconfigured the autocommit to: ${solr.autoCommit.maxTime:3} false ${solr.autoSoftCommit.maxTime:60} However, we see no change, aside from the fact that it's clearly committing more frequently. I will say on our end,
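The `${solr.autoCommit.maxTime:...}` values above use Solr's property substitution in solrconfig.xml: the named system property is used if it is set (for example with `-Dsolr.autoCommit.maxTime=...` at startup), otherwise the default after the colon applies. So editing only the default in the file does nothing on a node where the property is also set on the command line. The sketch below mimics that lookup rule in Python to make it concrete; `resolve` and the `15000`/`30000` values are hypothetical illustrations, not Solr code or this thread's settings.

```python
import re

# Matches ${name} or ${name:default} placeholders (hypothetical re-implementation
# of Solr's substitution rule, for illustration only).
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(value, props):
    """Replace each ${name:default} with props[name] if set, else with default."""
    def sub(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]          # an explicitly set property wins
        if default is not None:
            return default              # fall back to the value after the colon
        # Solr refuses to start when a property has neither a value nor a default.
        raise KeyError(f"no value for property {name!r} and no default")
    return _PLACEHOLDER.sub(sub, value)


print(resolve("${solr.autoCommit.maxTime:15000}", {}))  # 15000
print(resolve("${solr.autoCommit.maxTime:15000}",
              {"solr.autoCommit.maxTime": "30000"}))     # 30000
```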

Re: SolrCloud replicas consistently out of sync

2016-05-17 Thread Erick Erickson
OK, these autocommit settings need revisiting. First off, I'd remove the maxDocs entirely, although with the setting you're using it probably doesn't matter. The maxTime of 1,200,000 is 20 minutes. Which means if you ever ungracefully kill your shards you'll have up to 20 minutes worth of data t
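The arithmetic behind the "20 minutes" point: autoCommit maxTime is in milliseconds, and documents indexed since the last hard commit exist only in the transaction log until the next hard commit. A minimal sketch, assuming a steady indexing rate; the 50 docs/s figure is a hypothetical example, not a number from this thread.

```python
def replay_window_minutes(max_time_ms):
    """Longest possible gap since the last hard commit, in minutes."""
    return max_time_ms / 60_000


def docs_at_risk(max_time_ms, docs_per_second):
    """Upper bound on docs indexed since the last hard commit (assumes a steady rate)."""
    return int(max_time_ms / 1000 * docs_per_second)


print(replay_window_minutes(1_200_000))  # 20.0
print(docs_at_risk(1_200_000, 50))       # 60000 (hypothetical 50 docs/s)
```

Shrinking maxTime shrinks both numbers proportionally, which is the trade-off being discussed: more frequent hard commits in exchange for less transaction-log data to replay after a crash.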

Re: SolrCloud replicas consistently out of sync

2016-05-17 Thread Stephen Weiss
Yes, after startup there was a recovery process, you are right. It's just that this process doesn't seem to happen unless we do a full restart. These are our autocommit settings - to be honest, we did not really use autocommit until we switched up to SolrCloud so it's totally possible they are

Re: SolrCloud replicas consistently out of sync

2016-05-17 Thread Daniel Collins
Terminology question: by nodes I assume you mean machines? So "8 nodes, with 4 shards a piece, all running one collection with about 900M documents", is 1 collection split into 32 shards, with 4 shards located on each machine? Is each shard in its own JVM, or do you have 1 JVM on each machine runn
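The layout in question works out as follows: 8 machines with 4 shards each gives 32 shards, so 900M documents average about 28M per shard if routing spreads them evenly. The helper `layout` is a hypothetical name used only for this arithmetic.

```python
def layout(machines, shards_per_machine, total_docs):
    """Return (total shards, average docs per shard) assuming even routing."""
    total_shards = machines * shards_per_machine
    return total_shards, total_docs // total_shards


shards, per_shard = layout(8, 4, 900_000_000)
print(shards, per_shard)  # 32 28125000
```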

Re: SolrCloud replicas consistently out of sync

2016-05-16 Thread Erick Erickson
OK, this is very strange. There's no _good_ reason that restarting the servers should make a difference. The fact that it took 1/2 hour leads me to believe, though, that your shards are somehow "incomplete", especially given that you are indexing to the system and don't have, say, your autocommit setting

Re: SolrCloud replicas consistently out of sync

2016-05-16 Thread Stephen Weiss
Just one more note - while experimenting, I found that if I stopped all nodes (full cluster shutdown), and then started up all nodes, they do in fact seem to repair themselves then. We have a script to monitor the differences between replicas (just looking at numDocs) and before the full shutdown
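A replica-comparison check like the one described might look roughly like this. It is a sketch, not the poster's script: it counts documents per core with a non-distributed query (`q=*:*`, `rows=0`, `distrib=false`) and reads `response.numFound`, where the original may read numDocs from the Luke or CoreAdmin handlers instead. The base URL, core names, and counts in the example are hypothetical. Replicas can differ briefly until their next commit opens a new searcher, so a single mismatch is weaker evidence than a difference that persists.

```python
import json
import urllib.parse
import urllib.request


def core_num_docs(base_url, core):
    """Live doc count of one core, without fanning out to the rest of the collection."""
    params = urllib.parse.urlencode(
        {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    url = f"{base_url}/solr/{core}/select?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]


def out_of_sync(counts_by_shard):
    """Return the shards whose replicas report different doc counts.

    counts_by_shard maps shard name -> {replica name: doc count}.
    """
    return {shard: replicas
            for shard, replicas in counts_by_shard.items()
            if len(set(replicas.values())) > 1}


# Hypothetical counts, e.g. gathered with core_num_docs() for each replica core.
counts = {
    "shard1": {"replica1": 500, "replica2": 500},
    "shard5": {"replica1": 1000, "replica2": 998},
}
print(out_of_sync(counts))  # {'shard5': {'replica1': 1000, 'replica2': 998}}
```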

Re: SolrCloud replicas consistently out of sync

2016-05-16 Thread Stephen Weiss
Each node has one JVM with 16GB of RAM. Are you suggesting we would put each shard into a separate JVM (something like 32 nodes)? We aren't encountering any OOMs. We are testing this in a separate cloud which no one is even using; the only activity is this very small amount of indexing and st

Re: SolrCloud replicas consistently out of sync

2016-05-16 Thread Erick Erickson
8 nodes, 4 shards apiece? All in the same JVM? People have gotten around the GC pain by running in separate JVMs with less Java memory each on big beefy machines. That's not a recommendation as much as an observation. That aside, unless you have some very strange stuff going on this is totally weir

SolrCloud replicas consistently out of sync

2016-05-16 Thread Stephen Weiss
Hi everyone, I'm running into a problem with SolrCloud replicas and thought I would ask the list to see if anyone else has seen this / gotten past it. Right now, we are running with only one replica per shard. This is obviously a problem because if one node goes down anywhere, the whole collec