Just to give you an idea, this is how we are ingesting:

{"id": 1, "field1": {"inc": 20}, "field2": {"inc": 30}, "field3": 40, "field4": "some string"}
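
For illustration only, here is a minimal Python sketch of how a document of that shape could be assembled on the client side. The helper name build_atomic_update is hypothetical and not part of our pipeline; the field names simply mirror the example above. The resulting JSON array is what would be sent to the collection's /update handler.

```python
import json


def build_atomic_update(doc_id, increments, plain_fields):
    """Build one update document (hypothetical helper, for illustration).

    increments:   field name -> amount, emitted as {"inc": amount}
    plain_fields: field name -> value, emitted as-is
    """
    doc = {"id": doc_id}
    for field, amount in increments.items():
        # "inc" is the atomic-update modifier used in the example above
        doc[field] = {"inc": amount}
    doc.update(plain_fields)
    return doc


doc = build_atomic_update(
    1,
    {"field1": 20, "field2": 30},
    {"field3": 40, "field4": "some string"},
)

# A JSON array of documents, as it would appear in the request body
payload = json.dumps([doc])
print(payload)
```

Building the payload with a JSON library rather than by string concatenation also rules out separator mistakes between fields.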

We are using Solr-8.5.1. We have not configured any update processor. Hard commit happens every minute or at 100k docs, soft commit happens every 10 mins. We have an external ZK setup with 5 nodes. Open files hard/soft limit is 65k and "max user processes" is unlimited.

These are the different ERROR logs I found in the log files:

ERROR (qtp1546693040-2637) [c:collection s:shard27 r:core_node109 x:collection_shard27_replica_n106] o.a.s.s.HttpSolrCall null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: Async exception during distributed update: java.net.ConnectException: Connection refused

ERROR (qtp1546693040-1136) [c:collection s:shard101 r:core_node405 x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall null:java.io.IOException: java.lang.InterruptedException

ERROR (qtp1546693040-2704) [c:collection s:shard101 r:core_node405 x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall null:org.eclipse.jetty.io.EofException: Reset cancel_stream_error

ERROR (qtp1546693040-1344) [c:collection s:shard20 r:core_node79 x:collection_shard20_replica_n76] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: collection slice: shard48 saw state=DocCollection(collection//collections/collection/state.json/96434)={

ERROR (qtp1546693040-2928) [c:collection s:shard80 r:core_node319 x:collection_shard80_replica_n316] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Request says it is coming from leader, but we are the leader

ERROR (updateExecutor-5-thread-47-processing-n:192.100.20.19:8985_solr x:collection_shard161_replica_n641 c:collection s:shard161 r:core_node646) [c:collection s:shard161 r:core_node646 x:collection_shard161_replica_n641] o.a.s.u.SolrCmdDistributor org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException: Error from server at null: Expected mime type application/octet-stream but got application/json

ERROR (recoveryExecutor-7-thread-16-processing-n:192.100.20.33:8984_solr x:collection_shard80_replica_n47 c:collection s:shard80 r:core_node48) [c:collection s:shard80 r:core_node48 x:collection_shard80_replica_n47] o.a.s.c.RecoveryStrategy Error while trying to recover. core=collection_shard80_replica_n47:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: http://192.100.20.34:8984/solr

ERROR (zkCallback-10-thread-22) [c:collection s:shard19 r:core_node322 x:collection_shard19_replica_n321] o.a.s.c.ShardLeaderElectionContext There was a problem trying to register as the leader:org.apache.solr.common.AlreadyClosedException

ERROR (OverseerStateUpdate-176461820351853980-192.100.20.34:8985_solr-n_0000002357) [ ] o.a.s.c.Overseer Overseer could not process the current clusterstate state update message, skipping the message: {

ERROR (main-EventThread) [ ] o.a.z.ClientCnxn Error while calling watcher => java.lang.OutOfMemoryError: unable to create new native thread

ERROR (coreContainerWorkExecutor-2-thread-1-processing-n:192.100.20.34:8986_solr) [ ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup => org.apache.solr.cloud.ZkController$NotInClusterStateException: coreNodeName core_node638 does not exist in shard shard105, ignore the exception if the replica was deleted

ERROR (qtp836220863-249) [c:collection s:shard162 r:core_node548 x:collection_shard162_replica_n547] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: collection slice: shard162 saw state=DocCollection(collection//collections/collection/state.json/43121)={

Regards,
Anshuman

On Mon, Aug 10, 2020 at 9:19 PM Jörn Franke <jornfra...@gmail.com> wrote:

> How exactly do you ingest it with atomic updates? Is there an update
> processor in-between?
>
> What are your settings for hard/soft commit?
>
> For the shards going into recovery - do you have a log entry or something?
>
> What is the Solr version?
>
> How do you set up ZK?
>
> > On 10.08.2020 at 16:24, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> >
> > Hi,
> >
> > We have a SolrCloud cluster with 10 nodes. We have 6B records ingested in
> > the Collection. Our use case requires atomic updates ("inc") on 5 fields.
> > Now almost 90% of documents are atomic updates and as soon as we start our
> > ingestion pipelines, multiple shards start going into recovery, sometimes
> > all replicas of some shards go into down state.
> > The ingestion rate is also too slow with atomic updates, 4-5k per second.
> > We were able to ingest records without atomic updates at the rate of 50k
> > records per second without any issues.
> >
> > What I'm suspecting is that these "inc" atomic updates require fetching
> > the fields before indexing, which can cause slow rates, but what I'm not
> > getting is why the replicas are going into recovery.
> >
> > Regards,
> > Anshuman
>