Emir,

Yes, there is a delete_by_query on every bulk insert. This delete_by_query deletes all documents whose last update is older than one day before the current time. Could the delete_by_query during bulk indexing be the reason?
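For reference, each indexing cycle looks roughly like the following from the application side (a simplified SolrJ sketch; the ZooKeeper hosts, collection name, and the updated_at field are placeholders, not our exact schema):

    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;
    import java.util.UUID;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexCycleSketch {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181")       // placeholder ZK ensemble
                    .build()) {
                client.setDefaultCollection("my_collection");        // placeholder collection name

                // delete_by_query sent with every bulk insert: removes documents
                // whose last update is older than one day before the current time.
                client.deleteByQuery("updated_at:[* TO NOW-1DAY]");  // placeholder date field

                // Bulk insert of ~50 documents in the same cycle.
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = 0; i < 50; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", UUID.randomUUID().toString());
                    doc.addField("updated_at", new Date());
                    batch.add(doc);
                }
                client.add(batch);
                // No explicit commit: we rely on autoCommit (15s, openSearcher=false)
                // and autoSoftCommit (10 min) from solrconfig.xml.
            }
        }
    }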
On Wed, Jan 3, 2018 at 7:58 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Do you have deletes by query while indexing, or is it an append-only index?
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On 3 Jan 2018, at 12:16, sravan <sra...@caavo.com> wrote:
> >
> > SolrCloud Nodes going to recovery state during indexing
> >
> > We have a SolrCloud setup with the settings shared below. We have a collection with 3 shards and a replica for each of them.
> >
> > Normal state (as soon as the whole cluster is restarted):
> > - Status of all the shards is UP.
> > - A bulk update request of 50 documents takes < 100ms.
> > - 6-10 simultaneous bulk updates.
> >
> > Nodes go to recovery state after 15-30 mins of updates:
> > - Some shards start giving the following ERRORs:
> >   - o.a.s.h.RequestHandlerBase org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: Async exception during distributed update: Read timed out
> >   - o.a.s.u.StreamingSolrClients error java.net.SocketTimeoutException: Read timed out
> > - The following error is seen on the shard which goes to recovery state:
> >   - too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
> > - Sometimes the same shard even goes to DOWN state and needs a node restart to come back.
> > - A bulk update request of 50 documents takes more than 5 seconds, sometimes even > 120 secs. This is seen for all requests if at least one node in the whole cluster is in recovery state.
> >
> > We have a standalone setup with the same collection schema which is able to take the update & query load without any errors.
> >
> > We have the following SolrCloud setup:
> > - Setup in AWS.
> >
> > - Zookeeper setup:
> >   - number of nodes: 3
> >   - AWS instance type: t2.small
> >   - instance memory: 2gb
> >
> > - Solr setup:
> >   - Solr version: 6.6.0
> >   - number of nodes: 3
> >   - AWS instance type: m5.xlarge
> >   - instance memory: 16gb
> >   - number of cores: 4
> >   - Java heap: 8gb
> >   - Java version: Oracle Java "1.8.0_151"
> >   - GC settings: default CMS
> >
> > Collection settings:
> > - number of shards: 3
> > - replication factor: 2
> > - total 6 replicas
> > - total number of documents in the collection: 12 million
> > - total number of documents in each shard: 4 million
> > - each document has around 25 fields, with 12 of them using textual analysers & filters
> > - Commit strategy:
> >   - no explicit commits from application code
> >   - hard commit of 15 secs with openSearcher=false
> >   - soft commit of 10 mins
> > - Cache strategy:
> >   - filter queries: size 512, autowarmCount 100
> >   - all other caches: size 512, autowarmCount 0
> > - maxWarmingSearchers: 2
> >
> > We tried the following:
> > - Commit strategy:
> >   - hard commit: 150 secs
> >   - soft commit: 5 mins
> > - G1 garbage collector based on https://wiki.apache.org/solr/ShawnHeisey#Java_8_recommendation_for_Solr:
> >   - the nodes go to recovery state in less than a minute.
> >
> > The issue is seen even when the leaders are balanced across the three nodes.
> >
> > Can you help us find the solution to this problem?
> >
> > --
> > Regards,
> > Sravan