Thanks, I'll give it a try. In the current setup, could the load on the
Solr servers slow down ZooKeeper's response times enough to cause the
desync? Is that the reason for the change?

John.
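The question above can be made concrete with a little quorum arithmetic. A ZooKeeper ensemble only commits a write, or elects a leader, once a strict majority of its servers has responded. The sketch below uses hypothetical host names (`zk-a`, `solr-1`, and so on) standing in for the setup described in this thread. It enumerates the smallest majorities of the current 5-node ensemble and shows that every one of them includes at least one host that also runs Solr. It is a simplified illustration, not a diagnosis: it does not model how slow a loaded node must be before sessions or requests time out.

```python
from itertools import combinations


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def quorums_avoiding(nodes, avoided):
    """All smallest-possible majorities of `nodes` that contain none of `avoided`.

    Any larger majority contains one of these, so an empty result means that
    no majority at all can be formed without touching an `avoided` node.
    """
    size = quorum_size(len(nodes))
    return [set(q) for q in combinations(sorted(nodes), size) if not set(q) & avoided]


# Hypothetical host names for the setup described in this thread.
zk_only = {"zk-a", "zk-b"}                   # ZooKeeper only
solr_hosts = {"solr-1", "solr-2", "solr-3"}  # SolrCloud + ZooKeeper
current = zk_only | solr_hosts

print(quorum_size(len(current)))              # 3
print(tolerated_failures(len(current)))       # 2
print(quorums_avoiding(current, solr_hosts))  # [] -> every majority includes a Solr host

# Suggested layout: 3 ZooKeeper servers on hosts that run no Solr.
dedicated = {"zk-a", "zk-b", "zk-c"}
print(quorum_size(len(dedicated)))                   # 2
print(tolerated_failures(len(dedicated)))            # 1
print(len(quorums_avoiding(dedicated, solr_hosts)))  # 3
```

The trade-off shows up in the output: three dedicated servers tolerate one failure instead of two, but no ZooKeeper majority then depends on a machine that is also busy indexing and serving queries.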


On 21/12/15 16:45, Erik Hatcher wrote:
> John, the first recommendation that pops out is to run (only) 3 ZooKeepers,
> entirely separate from the Solr servers, and then as many Solr servers as you
> need to scale indexing and querying. With 5 servers at your disposal,
> 3 ZKs + 2 Solrs sounds like a good start.
>
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
>> On Dec 21, 2015, at 10:37 AM, John Smith <solr-u...@remailme.net> wrote:
>>
>> This is my first experience with SolrCloud, so please bear with me.
>>
>> I've inherited a setup with 5 servers: 2 run ZooKeeper only, and the
>> other 3 run SolrCloud + ZooKeeper. Versions are 5.4.0 (Solr) and 3.4.7
>> (ZooKeeper). There's about 80 GB of index in total; some collections
>> are rather big (20 GB) and some very small. All of them have only one
>> shard. The bigger ones are updated almost constantly (and, of course,
>> queried at the same time).
>>
>> I've been getting a huge number of errors, of many different kinds. At
>> one point the system seemed fairly stable, but then I tried to add a few
>> new collections and things went wrong again. The usual symptom is that
>> some cores stop synchronizing; sometimes an entire server is shown as
>> "gone" (although it's still alive and well). When I add a core on a
>> server, one or more other cores on that server often go down. Even when
>> the system is fairly stable, some cores are shown as recovering. When I
>> restart a server, it takes a very long time (at least 30 min) to fully
>> recover.
>>
>> Some of the many errors I've seen (warnings omitted):
>> - org.apache.solr.common.SolrException: Error trying to proxy request
>> for url
>> - org.apache.solr.update.processor.DistributedUpdateProcessor; Setting
>> up to try to start recovery on replica
>> - org.apache.solr.common.SolrException; Error while trying to recover.
>> core=[...]:org.apache.solr.common.SolrException: No registered leader
>> was found after waiting
>> - update log not in ACTIVE or REPLAY state. FSUpdateLog{state=BUFFERING,
>> tlog=null}
>> - org.apache.solr.cloud.RecoveryStrategy; Could not publish as ACTIVE
>> after succesful recovery
>> - org.apache.solr.common.SolrException; Could not find core to call recovery
>> - org.apache.solr.common.SolrException: Error CREATEing SolrCore '...':
>> Unable to create core
>> - org.apache.solr.request.SolrRequestInfo; prev == info : false
>> - org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was
>> not closed!
>> - org.apache.solr.update.SolrIndexWriter; Error closing IndexWriter
>> - org.apache.solr.update.SolrIndexWriter; SolrIndexWriter was not closed
>> prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
>> - org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard
>> - org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting
>> for connection from pool
>> - and so on...
>>
>> Any advice on where I should start? I've checked disk space, memory
>> usage and the maximum number of open files; everything seems fine there.
>> My guess is that the configuration is largely unchanged from the
>> defaults. I've already extended the timeouts in ZooKeeper.
>>
>> Thanks,
>> John
>>
>
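A minimal sketch of the layout Erik suggests, assuming three dedicated ZooKeeper 3.4 hosts with the hypothetical names `zk1`, `zk2` and `zk3`, the conventional ports (2181 for clients, 2888 and 3888 for peer and leader-election traffic) and a hypothetical data directory. Each ZooKeeper host would carry the same `zoo.cfg`, plus a `myid` file in `dataDir` containing only its own server number (1, 2 or 3):

```properties
# zoo.cfg, identical on zk1, zk2 and zk3 (hostnames and dataDir are hypothetical)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# server.N=host:peerPort:electionPort; N must match the myid file on that host
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Each Solr node would then point at the whole ensemble rather than at a local ZooKeeper, for example with `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`, or by setting `ZK_HOST` to the same connection string in `solr.in.sh`.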
