We set the ZK timeout to 1s and ran load tests (both indexing and
search), and the issue did not reproduce.
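
To compare timings across crashes, the gap between the first warning and the first error can be pulled out of the logs with a small script. What follows is only a minimal sketch: the line pattern is inferred from the timestamped log lines quoted further down in this thread, and the names `parse_line`, `first_event` and `SAMPLE` are hypothetical helpers, not part of Solr.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the lines quoted in this thread, e.g.
# "WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; message"
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+-\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3});\s+"
    r"(?P<logger>[^;]+);\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return (level, timestamp, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return m.group("level"), ts, m.group("logger").strip(), m.group("msg")


def first_event(lines, level):
    """Timestamp of the first parsed line with the given level, or None."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[0] == level:
            return parsed[1]
    return None


# Sample lines copied from this thread, with wrapped lines rejoined.
SAMPLE = [
    "WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; "
    "no frame of reference to tell if we've missed updates",
    "WARN  - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; "
    "File _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879",
    "ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException; "
    "org.apache.solr.common.SolrException: No registered leader was found "
    "after waiting for 4000ms , collection: collection1 slice: shard1",
]

first_warn = first_event(SAMPLE, "WARN")
first_error = first_event(SAMPLE, "ERROR")
gap = first_error - first_warn

print("first WARN: ", first_warn)
print("first ERROR:", first_error)
print("gap:        ", gap)
```

Lines without a timestamp (such as the Stopping recovery warnings as pasted in this thread) do not match the pattern and are skipped, so on a real log file the pattern may need adjusting to the actual log4j layout.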


2014-03-24 17:00 GMT+02:00 Lukas Mikuckis <lukasmikuc...@gmail.com>:

> Garbage Collectors Summary:
> https://apps.sematext.com/spm-reports/s/rgRnwuShgI
>
> Pool Size:
> https://apps.sematext.com/spm-reports/s/H16ndqichM
>
> First Stopping recovery warning: 4:00, OOM error: 6:30.
>
>
> 2014-03-24 16:35 GMT+02:00 Shalin Shekhar Mangar <shalinman...@gmail.com>:
>
>> I am guessing that it is all related to memory issues. I guess that as
>> the used heap increases, full GC cycles increase causing ZK timeouts
>> which in turn cause more recoveries to be initiated. In the end,
>> everything blows up with the out of memory errors. Do you log GC
>> activity on your servers?
>>
>> I suggest that you roll back to 4.6.1 for now and upgrade to 4.7.1 when
>> it is released next week.
>>
>> On Mon, Mar 24, 2014 at 7:51 PM, Lukas Mikuckis <lukasmikuc...@gmail.com>
>> wrote:
>> > Yes, we upgraded Solr from 4.6.1 to 4.7 three weeks ago (two weeks before
>> > Solr started crashing).
>> > When we were upgrading, we just upgraded solr and changed versions in
>> > collections configs.
>> >
>> > When Solr crashes we get an OOM error, but only about 2h after the first
>> > Stopping recovery warnings.
>> >
>> > Do you have any idea when Stopping recovery warnings are logged?
>> > Right now we have no idea what could cause this issue.
>> >
>> > Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar <shalinman...@gmail.com>:
>> >>
>> >> Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can
>> >> cause out of memory issues. Can you check your logs for out of memory
>> >> errors?
>> >>
>> >> On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis <lukasmikuc...@gmail.com> wrote:
>> >> > Solr version: 4.7
>> >> >
>> >> > Architecture:
>> >> > 2 solrs (1 shard, leader + replica)
>> >> > 3 zookeepers
>> >> >
>> >> > Servers:
>> >> > * zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
>> >> > * zookeeper + solr  (heap 4gb) - RAM 8gb, 2 cpu cores
>> >> > * zookeeper
>> >> >
>> >> > Solr data:
>> >> > * 21 collections
>> >> > * Many fields, small docs, docs count per collection from 1k to 500k
>> >> >
>> >> > About a week ago Solr started crashing. It crashes every day, 3-4 times
>> >> > a day, usually at night. I can't tell what it could be related to,
>> >> > because we hadn't made any configuration changes at that time. Load
>> >> > hasn't changed either.
>> >> >
>> >> >
>> >> > Everything starts with Stopping recovery for .. warnings (every warning is
>> >> > repeated several times):
>> >> >
>> >> > WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
>> >> > zkNodeName=core_node1core=******************
>> >> >
>> >> > WARN  org.apache.solr.cloud.ElectionContext; cancelElection did not find
>> >> > election node to remove
>> >> >
>> >> > WARN  org.apache.solr.update.PeerSync; no frame of reference to tell if
>> >> > we've missed updates
>> >> >
>> >> > WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame
>> >> > of reference to tell if we've missed updates
>> >> >
>> >> > WARN  - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File
>> >> > _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
>> >> >
>> >> > WARN  - 2014-03-23 04:00:54.126;
>> >> > org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
>> >> > tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0000000000000003272
>> >> > refcount=2} active=true starting pos=356216606
>> >> >
>> >> > Then again Stopping recovery for .. warnings:
>> >> >
>> >> > WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
>> >> > zkNodeName=core_node1core=******************
>> >> >
>> >> > ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException;
>> >> > org.apache.solr.common.SolrException: No registered leader was found after
>> >> > waiting for 4000ms , collection: collection1 slice: shard1
>> >> >
>> >> > ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException;
>> >> > org.apache.solr.common.SolrException: I was asked to wait on state down for
>> >> > IP:PORT_solr but I still do not see the requested state. I see state:
>> >> > active live:false
>> >> >
>> >> >
>> >> > After this, the servers mostly didn't recover.
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>> >>
>> >>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>
