Is zookeeper on the solr hosts or on its own?  Have you tried
opensearcher=false (soft commit?)

On Sun, Dec 6, 2020 at 6:19 PM raj.yadav <rajkum...@cse.ism.ac.in> wrote:
>
> Hi Everyone,
>
>
> matthew sporleder wrote
> > Are you stuck in iowait during that commit?
>
> During commit operation, there is no iowait.
> Infact most of the time cpu utilization percentage is very low.
>
> /*As I mentioned in my previous post that we are getting `SolrCmdDistributor
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server` and `DistributedZkUpdateProcessor` ERROR on
> one of the shards. And this error is always occurring on the shard that is
> used (in culr command) to issue commit. (See below example for better
> understanding)*/
>
> Here is shard and corresponding node details:
> shard1_0=>solr_199
> shard1_1=>solr_200
> shard2_0=> solr_254
> shard2_1=> solr_132
> shard3_0=>solr_133
> shard3_1=>solr_198
>
> We are using the following command to issue commit:
> /curl
> "http://solr_node:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"/
>
> For example, in the above command, if we replace solr_node with solr_254,
> then it's throwing SolrCmdDistributor and DistributedZkUpdateProcessor
> errors on shard2_0. Similarly, if we replace solr_node with solr_200 its
> throws errors on shard1_1.
>
> *I'm not able to figure out why this is happening. Is there any connection
> timeout setting that is affecting this? Is there any limit that, at a time
> only N number of shards can run commit ops simultaneously or is it some
> network related issue?*
>
>
> For a better understanding of what's happening in SOLR logs. I will
> demonstrate here one commit operation.
>
> I used the below command to issue commit at `2020-12-06 18:37:40` (approx)
> curl
> "http://solr_200:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json";
>
>
> /*shard2_0 (node: solr_254) Logs:*/
>
>
> *Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06
> 18:37:47` since there were no changes to commit. And CPU utilization during
> the whole period is around 2%.*
>
>
> 2020-12-06 18:37:47.023 INFO  (qtp2034610694-31355) [c:my_collection
> s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11]
> o.a.s.u.DirectUpdateHandler2 start
> commit{_version_=1685355093842460672,optimize=false,ope
> nSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2020-12-06 18:37:47.023 INFO No uncommitted changes. Skipping IW.commit.
> 2020-12-06 18:37:47.023 INFO end_commit_flush
> 2020-12-06 18:37:47.023 INFO  (qtp2034610694-31355) [c:my_collection
> s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11]
> o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_0_replica_n11]
> webapp=/solr path=/update
>
> params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabi
> n&version=2&expungeDeletes=false}{commit=} 0 3
>
> /*shard2_1 (node: solr_132) Logs:*/
>
> *Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06
> 18:50:46` in between there were some external file reloading operations (our
> solr-5.4.2 system is also taking similar time to reload external files so
> right now this is not a major concern for us)
> CPU utilization before commit (i.e `2020-12-06 18:37:47` timestamp) is 2%
> and between commit ops (i.e from `2020-12-06 18:37:47`  to `2020-12-06
> 18:50:46` timestamp) is 14% and after commit operation is done it agains
> fall back to 2%*
>
>
> 2020-12-06 18:37:47.024 INFO  (qtp2034610694-30058) [c:my_collection
> s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21]
> o.a.s.u.DirectUpdateHandler2 start
> commit{_version_=1685355093844557824,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>
> 2020-12-06 18:50:46.218 INFO  (qtp2034610694-30058) [c:my_collection
> s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21]
> o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_1_replica_n21]
> webapp=/solr path=/update
> params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabin&version=2&expungeDeletes=false}{commit=}
> 0 779196
>
>
> /*shard3_0 (node: solr_133) logs*/
>
> Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by
> `2020-12-06 18:49:24`.
> CPU utilization pattern is same is shard2_1.
>
> /*shard3_1 (node: solr_198) logs.*/
>
> Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by
> `2020-12-06 18:53:57`.
> CPU utilization pattern is same is shard2_1.
>
> /*shard1_0 (node: solr_199) logs.*/
>
> Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by
> `2020-12-06 18:54:51`.
> CPU utilization pattern is same is shard2_1.
>
> /*shard1_1 (node: solr_200) logs.*/
>
> /This is the same solr_node which we used in curl command to issue commit.
> As expected we got SolrCmdDistributor and DistributedZkUpdateProcessor error
> on it./
>
> /Till `2020-12-06 18:46:50` timestamp there is no `start commit`  request
> received. Also CPU utilization is 2% till this time./
> /*Received follwing error at `2020-12-06 18:47:47` timestamp*/
>
> 2020-12-06 18:47:47.013 ERROR
> (updateExecutor-5-thread-6-processing-n:solr_200:8389_solr
> x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
> r:core_node20) [c:my_collection s:shard1_1 r:core_node20
> x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
>         at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
>         at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
>         at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
>         at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
>         at
> org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
>         at
> org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
>         at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.TimeoutException
>         at
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
>         at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
>         ... 13 more
>
> 2020-12-06 18:47:47.018 ERROR
> (updateExecutor-5-thread-2-processing-n:solr_200:8389_solr
> x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
> r:core_node20) [c:my_collection s:shard1_1 r:core_node20
> x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_199:8389/solr/my_collection_shard1_0_replica_n7/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
> ........
> *(REST ERROR MESSAGE SAME AS ABOVE)*
>
>
>
> 2020-12-06 18:47:47.018 ERROR
> (updateExecutor-5-thread-4-processing-n:solr_200:8389_solr
> x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
> r:core_node20) [c:my_collection s:shard1_1 r:core_node20
> x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_132:8389/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
> ........
>
>
> 2020-12-06 18:47:47.018 ERROR
> (updateExecutor-5-thread-5-processing-n:solr_200:8389_solr
> x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
> r:core_node20) [c:my_collection s:shard1_1 r:core_node20
> x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_133:8389/solr/my_collection_shard3_0_replica_n15/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
> ........
>
> 2020-12-06 18:47:47.020 INFO  (qtp1267082248-29872) [c:my_collection
> s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
> o.a.s.u.SolrCmdDistributor SolrCmdDistributor found 4 errors
>
>
> /*Then at `2020-12-06 18:47:47` received `start commit` request. Also by
> this time CPU utilization is moved up to 14%. */
>
>
> 2020-12-06 18:47:47.020 INFO  (qtp1267082248-29872) [c:my_collection
> s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
> o.a.s.u.DirectUpdateHandler2 start
> commit{_version_=1685355722985963520,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>
>
> *THEN IN BW THERE WERE SOME EXTERNAL FILE RELOADING OPERATION. CPU
> utilization is still around 14%.*
>
> *Then at `2020-12-06 19:06:09` timestamp got following warning.*
>
> 2020-12-06 19:06:09.672 WARN  (qtp1267082248-29872) [c:my_collection
> s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
> o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
> http://solr_198:8389/solr =>
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
>         at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
>         at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
> ~[?:?]
>         at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
> ~[?:?]
>         at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
> ~[?:?]
>         at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
> ~[?:?]
>         at
> org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
> ~[?:?]
>         at
> org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
> ~[?:?]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_262]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[?:1.8.0_262]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_262]
>         at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
> ~[metrics-core-4.1.2.jar:4.1.2]
>         at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> ~[?:?]
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ~[?:1.8.0_262]
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ~[?:1.8.0_262]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
> Caused by: java.util.concurrent.TimeoutException
>         at
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
> ~[?:?]
>         at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
> ~[?:?]
>         ... 13 more
>
>
> 2020-12-06 19:06:09.673 WARN  (qtp1267082248-29872) [c:my_collection
> s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
> o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
> http://solr_199:8389/solr =>
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://solr_199:8389/solr/........
>
> 2020-12-06 19:06:09.673 WARN  (qtp1267082248-29872) [c:my_collection
> s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
> o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
> http://solr_132:8389/solr =>
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://solr_132:8389/solr/........
>
>
> 2020-12-06 19:06:09.673 WARN  (qtp1267082248-29872) [c:my_collection
> s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
> o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
> http://solr_133:8389/solr =>
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://solr_133:8389/solr/........
>
> /*
> Then at `2020-12-06 19:06:09` timestamp we got following. Which I'm assuming
> indicates that commit got over, since after this CPU utilization fall back
> again to 2%. And We also got curl reponse by this timestamp with status=0. *
> /
>
>
> 2020-12-06 19:06:09.673 INFO  (qtp1267082248-29872) [c:my_collection
> s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
> o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard1_1_replica_n19]
> webapp=/solr path=/update
> params={openSearcher=true&commit=true&wt=json}{commit=} 0 1702717
>
>
> In short to summarize:
> Commit was issued at `2020-12-06 18:37:40` (approx) timestamp.
> Till `2020-12-06 18:46:50` timestamp there was no `start commit` received
> and CPU was at  2% till it throws `SolrCmdDistributor` error at `2020-12-06
> 18:47:47` timestamp. Then at around the same timestamp `2020-12-06 18:47:47`
> received the `start commit` request. And by this time CPU utilization is
> moved up to 14%.
> Then at `2020-12-06 19:06:09` timestamp got `DistributedZkUpdateProcessor`
> error message and then at `2020-12-06 19:06:09.673` timestamp got some
> commit related message (which I'm assuming its indicates end of commit) and
> CPU is again fall back to 2%.
>
>
> Due to these timeout errors overall commit time is getting double. Can
> someone guide me to resolve this issue.
>
>
>
>
>
>
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Reply via email to