Re: Replicas in Recovery During Atomic Updates

Anshuman Singh Mon, 10 Aug 2020 13:40:22 -0700

Just to give you an idea, this is how we are ingesting:

{"id": 1, "field1": {"inc": 20}, "field2": {"inc": 30}, "field3": 40,
"field4": "some string"}

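For reference, a batch of documents like the one above can be sent as a JSON array to the collection's update handler (e.g. POST /solr/<collection>/update with Content-Type: application/json). The Python sketch below only builds that request body and leaves out the HTTP call. The helper names are hypothetical, and the field names and values are the placeholders from the example.

```python
import json


def build_atomic_doc(doc_id, inc_fields, plain_fields):
    """Build one atomic-update document shaped like the example above.

    inc_fields maps field name -> increment. plain_fields are sent as plain
    values, exactly as in the example. All names here are placeholders.
    """
    doc = {"id": doc_id}
    for name, delta in inc_fields.items():
        doc[name] = {"inc": delta}  # Solr adds delta to the current value
    doc.update(plain_fields)
    return doc


def build_batch(docs):
    """Serialize a batch as a JSON array, one element per document."""
    return json.dumps(docs)


batch = build_batch([
    build_atomic_doc(1, {"field1": 20, "field2": 30},
                     {"field3": 40, "field4": "some string"}),
])
print(batch)
# [{"id": 1, "field1": {"inc": 20}, "field2": {"inc": 30}, "field3": 40, "field4": "some string"}]
```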

We are using Solr 8.5.1 and have not configured any update processor. A hard
commit happens every minute or every 100k docs, and a soft commit every 10
minutes.
We have an external ZK setup with 5 nodes.
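In Solr's units, the commit settings above are autoCommit maxTime=60000 ms / maxDocs=100000 and autoSoftCommit maxTime=600000 ms. A small Python sketch that builds the equivalent Config API body (POST to /solr/<collection>/config). The thread doesn't say whether these were set through the Config API or directly in solrconfig.xml, so treat this as an illustration under that assumption. It also assumes several updateHandler.* properties can be sent in one "set-property" command.

```python
import json

# Commit settings from this message, converted to Solr's units.
HARD_COMMIT_MAX_TIME_MS = 60 * 1000        # hard commit every minute
HARD_COMMIT_MAX_DOCS = 100_000             # ...or every 100k documents
SOFT_COMMIT_MAX_TIME_MS = 10 * 60 * 1000   # soft commit every 10 minutes


def commit_settings_body():
    """Config API request body that would apply the settings above.

    Assumption: these properties can be set together in one "set-property"
    command; check the Config API docs for your Solr version.
    """
    return json.dumps({
        "set-property": {
            "updateHandler.autoCommit.maxTime": HARD_COMMIT_MAX_TIME_MS,
            "updateHandler.autoCommit.maxDocs": HARD_COMMIT_MAX_DOCS,
            "updateHandler.autoSoftCommit.maxTime": SOFT_COMMIT_MAX_TIME_MS,
        }
    })


print(commit_settings_body())
```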

The open-files limit (hard and soft) is 65k, and "max user processes" is unlimited.
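The limits set in limits.conf or a service unit don't always match what the running JVM actually gets. Given the "unable to create new native thread" error in the logs, it may be worth checking the live Solr process. On Linux, the "Max processes" limit counts threads as well as processes. A minimal Linux-only sketch that reads /proc. The helper names are hypothetical, and you would pass the Solr PID yourself (e.g. found with pgrep). The default "self" means the calling process.

```python
def proc_limits(pid="self"):
    """Read (soft, hard) open-file and process limits of a running process.

    Linux-only: parses /proc/<pid>/limits. Values are strings because the
    kernel may report "unlimited".
    """
    wanted = {"Max open files": "open_files", "Max processes": "processes"}
    result = {}
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            for prefix, key in wanted.items():
                if line.startswith(prefix):
                    soft, hard = line[len(prefix):].split()[:2]
                    result[key] = (soft, hard)
    return result


def thread_count(pid="self"):
    """Number of threads in the process, from the "Threads:" line of
    /proc/<pid>/status (Linux-only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise ValueError(f"no Threads: line for pid {pid}")
```

For example, proc_limits(12345) and thread_count(12345), where 12345 stands in for the Solr PID. Comparing the thread count against the "Max processes" soft limit over time shows whether the node is approaching it.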

These are the different ERROR logs I found in the log files:

ERROR (qtp1546693040-2637) [c:collection s:shard27 r:core_node109
x:collection_shard27_replica_n106] o.a.s.s.HttpSolrCall
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: java.net.ConnectException:
Connection refused

ERROR (qtp1546693040-1136) [c:collection s:shard101 r:core_node405
x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall
null:java.io.IOException: java.lang.InterruptedException

ERROR (qtp1546693040-2704) [c:collection s:shard101 r:core_node405
x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall
null:org.eclipse.jetty.io.EofException: Reset cancel_stream_error

ERROR (qtp1546693040-1344) [c:collection s:shard20 r:core_node79
x:collection_shard20_replica_n76] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: No registered leader was found after
waiting for 4000ms , collection: collection slice: shard48 saw
state=DocCollection(collection//collections/collection/state.json/96434)={

ERROR (qtp1546693040-2928) [c:collection s:shard80 r:core_node319
x:collection_shard80_replica_n316] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Request says it is coming from
leader, but we are the leader

ERROR (updateExecutor-5-thread-47-processing-n:192.100.20.19:8985_solr
x:collection_shard161_replica_n641 c:collection s:shard161 r:core_node646)
[c:collection s:shard161 r:core_node646 x:collection_shard161_replica_n641]
o.a.s.u.SolrCmdDistributor
org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:
Error from server at null: Expected mime type application/octet-stream but
got application/json

ERROR (recoveryExecutor-7-thread-16-processing-n:192.100.20.33:8984_solr
x:collection_shard80_replica_n47 c:collection s:shard80 r:core_node48)
[c:collection s:shard80 r:core_node48 x:collection_shard80_replica_n47]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=collection_shard80_replica_n47:java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.SolrServerException: IOException occurred when
talking to server at: http://192.100.20.34:8984/solr

ERROR (zkCallback-10-thread-22) [c:collection s:shard19 r:core_node322
x:collection_shard19_replica_n321] o.a.s.c.ShardLeaderElectionContext There
was a problem trying to register as the
leader:org.apache.solr.common.AlreadyClosedException

ERROR
(OverseerStateUpdate-176461820351853980-192.100.20.34:8985_solr-n_0000002357)
[   ] o.a.s.c.Overseer Overseer could not process the current clusterstate
state update message, skipping the message: {

ERROR (main-EventThread) [   ] o.a.z.ClientCnxn Error while calling watcher
 => java.lang.OutOfMemoryError: unable to create new native thread

ERROR 
(coreContainerWorkExecutor-2-thread-1-processing-n:192.100.20.34:8986_solr)
[   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on
startup => org.apache.solr.cloud.ZkController$NotInClusterStateException:
coreNodeName core_node638 does not exist in shard shard105, ignore the
exception if the replica was deleted

ERROR (qtp836220863-249) [c:collection s:shard162 r:core_node548
x:collection_shard162_replica_n547] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: No registered leader was found after
waiting for 4000ms , collection: collection slice: shard162 saw
state=DocCollection(collection//collections/collection/state.json/43121)={

Regards,
Anshuman

On Mon, Aug 10, 2020 at 9:19 PM Jörn Franke <jornfra...@gmail.com> wrote:

> How exactly do you ingest with atomic updates? Is there an update
> processor in between?
>
> What are your settings for hard/soft commit?
>
> For the shards going into recovery, do you have a log entry or something?
>
> What is the Solr version?
>
> How is ZK set up?
>
> > On 10.08.2020 at 16:24, Anshuman Singh <singhanshuma...@gmail.com>
> > wrote:
> >
> > Hi,
> >
> > We have a SolrCloud cluster with 10 nodes and 6B records ingested in
> > the collection. Our use case requires atomic updates ("inc") on 5 fields.
> > Almost 90% of the documents are atomic updates, and as soon as we start
> > our ingestion pipelines, multiple shards start going into recovery;
> > sometimes all replicas of some shards go into the down state.
> > Ingestion is also much slower with atomic updates: 4-5k records per
> > second, whereas without atomic updates we could ingest 50k records per
> > second without any issues.
> >
> > My suspicion is that the slow rate comes from the "inc" atomic updates
> > having to fetch the existing fields before indexing. What I don't
> > understand is why the replicas go into recovery.
> >
> > Regards,
> > Anshuman
>
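To illustrate the suspicion in the original message: for an "inc" update, Solr cannot just index the fields that were sent. It has to look up the current version of the document (from the transaction log or the index), apply the increments, and then re-index the whole document. This is why atomic updates require the other fields to be stored or to have docValues. The Python model below is deliberately simplified and hypothetical. It handles only "inc" and "set", treats plain values as replacements, and ignores versioning, concurrency and forwarding to replicas.

```python
import copy


def apply_atomic_update(existing, update):
    """Simplified read-modify-write model of an atomic update.

    Not Solr's implementation; it only shows that the full stored document
    must be read before the new version can be written.
    """
    merged = copy.deepcopy(existing)  # "read": start from the stored document
    for field, value in update.items():
        if field == "id":
            continue
        if isinstance(value, dict) and "inc" in value:
            merged[field] = merged.get(field, 0) + value["inc"]
        elif isinstance(value, dict) and "set" in value:
            merged[field] = value["set"]
        else:
            merged[field] = value  # simplification: plain value replaces
    return merged  # "write": the whole document is indexed again


stored = {"id": 1, "field1": 5, "field2": 0, "field3": 1, "field4": "old"}
update = {"id": 1, "field1": {"inc": 20}, "field2": {"inc": 30},
          "field3": 40, "field4": "some string"}
print(apply_atomic_update(stored, update))
# {'id': 1, 'field1': 25, 'field2': 30, 'field3': 40, 'field4': 'some string'}
```

Compared with a plain add, each such update adds a lookup of the existing document before indexing.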
