John - the first recommendation that pops out is to run (only) 3 ZooKeepers, 
entirely separate from the Solr servers, and then as many Solr servers as you 
need to scale indexing and querying.  Sounds like 3 ZKs + 2 Solr nodes is a 
good start, given you have 5 servers at your disposal.
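
As a rough sketch of the reasoning behind "only 3": ZooKeeper needs a strict 
majority of the ensemble to be up, so the arithmetic below shows how many 
ZK failures each ensemble size can survive. The function names are made up 
for illustration and are not part of any ZooKeeper or Solr API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many ZooKeeper nodes can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a few ensemble sizes, including the 3 suggested above
    # and the 5 in the inherited setup.
    for n in (1, 3, 4, 5):
        print(f"{n} ZK nodes: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Three ZKs survive the loss of one node; going to five only buys one more 
tolerated failure, at the cost of two servers that could be running Solr.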


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com



> On Dec 21, 2015, at 10:37 AM, John Smith <solr-u...@remailme.net> wrote:
> 
> This is my first experience with SolrCloud, so please bear with me.
> 
> I've inherited a setup with 5 servers, 2 of which are Zookeeper only and
> the 3 others SolrCloud + Zookeeper. Versions are 5.4.0 (Solr) and 3.4.7
> (Zookeeper). There's around 80 GB of index; some collections are rather big
> (20 GB) and some very small. All of them have only one shard. The bigger
> ones are almost constantly being updated (and of course queried at the
> same time).
> 
> I've had a huge number of errors, many different ones. At some point the
> system seemed rather stable, but I've tried to add a few new collections
> and things went wrong again. The usual symptom is that some cores stop
> synchronizing; sometimes an entire server is shown as "gone" (although
> it's still alive and well). When I add a core on a server, another (or
> several others) often goes down on that server. Even when the system is
> rather stable some cores are shown as recovering. When restarting a
> server it takes a very long time (30 min at least) to fully recover.
> 
> Some of the many errors I've got (I've skipped the warnings):
> - org.apache.solr.common.SolrException: Error trying to proxy request
> for url
> - org.apache.solr.update.processor.DistributedUpdateProcessor; Setting
> up to try to start recovery on replica
> - org.apache.solr.common.SolrException; Error while trying to recover.
> core=[...]:org.apache.solr.common.SolrException: No registered leader
> was found after waiting
> - update log not in ACTIVE or REPLAY state. FSUpdateLog{state=BUFFERING,
> tlog=null}
> - org.apache.solr.cloud.RecoveryStrategy; Could not publish as ACTIVE
> after succesful recovery
> - org.apache.solr.common.SolrException; Could not find core to call recovery
> - org.apache.solr.common.SolrException: Error CREATEing SolrCore '...':
> Unable to create core
> - org.apache.solr.request.SolrRequestInfo; prev == info : false
> - org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was
> not closed!
> - org.apache.solr.update.SolrIndexWriter; Error closing IndexWriter
> - org.apache.solr.update.SolrIndexWriter; SolrIndexWriter was not closed
> prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
> - org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard
> - org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting
> for connection from pool
> - and so on...
> 
> Any advice on where I should start? I've checked disk space, memory
> usage, max number of open files, everything seems fine there. My guess
> is that the configuration is rather unaltered from the defaults. I've
> extended timeouts in Zookeeper already.
> 
> Thanks,
> John
> 
