Is zookeeper on the solr hosts or on its own? Have you tried opensearcher=false (soft commit?)
On Sun, Dec 6, 2020 at 6:19 PM raj.yadav <rajkum...@cse.ism.ac.in> wrote: > > Hi Everyone, > > > matthew sporleder wrote > > Are you stuck in iowait during that commit? > > During commit operation, there is no iowait. > Infact most of the time cpu utilization percentage is very low. > > /*As I mentioned in my previous post that we are getting `SolrCmdDistributor > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server` and `DistributedZkUpdateProcessor` ERROR on > one of the shards. And this error is always occurring on the shard that is > used (in culr command) to issue commit. (See below example for better > understanding)*/ > > Here is shard and corresponding node details: > shard1_0=>solr_199 > shard1_1=>solr_200 > shard2_0=> solr_254 > shard2_1=> solr_132 > shard3_0=>solr_133 > shard3_1=>solr_198 > > We are using the following command to issue commit: > /curl > "http://solr_node:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"/ > > For example, in the above command, if we replace solr_node with solr_254, > then it's throwing SolrCmdDistributor and DistributedZkUpdateProcessor > errors on shard2_0. Similarly, if we replace solr_node with solr_200 its > throws errors on shard1_1. > > *I'm not able to figure out why this is happening. Is there any connection > timeout setting that is affecting this? Is there any limit that, at a time > only N number of shards can run commit ops simultaneously or is it some > network related issue?* > > > For a better understanding of what's happening in SOLR logs. I will > demonstrate here one commit operation. > > I used the below command to issue commit at `2020-12-06 18:37:40` (approx) > curl > "http://solr_200:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json" > > > /*shard2_0 (node: solr_254) Logs:*/ > > > *Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06 > 18:37:47` since there were no changes to commit. And CPU utilization during > the whole period is around 2%.* > > > 2020-12-06 18:37:47.023 INFO (qtp2034610694-31355) [c:my_collection > s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11] > o.a.s.u.DirectUpdateHandler2 start > commit{_version_=1685355093842460672,optimize=false,ope > nSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} > 2020-12-06 18:37:47.023 INFO No uncommitted changes. Skipping IW.commit. > 2020-12-06 18:37:47.023 INFO end_commit_flush > 2020-12-06 18:37:47.023 INFO (qtp2034610694-31355) [c:my_collection > s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11] > o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_0_replica_n11] > webapp=/solr path=/update > > params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabi > n&version=2&expungeDeletes=false}{commit=} 0 3 > > /*shard2_1 (node: solr_132) Logs:*/ > > *Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06 > 18:50:46` in between there were some external file reloading operations (our > solr-5.4.2 system is also taking similar time to reload external files so > right now this is not a major concern for us) > CPU utilization before commit (i.e `2020-12-06 18:37:47` timestamp) is 2% > and between commit ops (i.e from `2020-12-06 18:37:47` to `2020-12-06 > 18:50:46` timestamp) is 14% and after commit operation is done it agains > fall back to 2%* > > > 2020-12-06 18:37:47.024 INFO (qtp2034610694-30058) [c:my_collection > s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21] > o.a.s.u.DirectUpdateHandler2 start > commit{_version_=1685355093844557824,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} > > 2020-12-06 18:50:46.218 INFO (qtp2034610694-30058) [c:my_collection > s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21] > o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_1_replica_n21] > webapp=/solr path=/update > params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabin&version=2&expungeDeletes=false}{commit=} > 0 779196 > > > /*shard3_0 (node: solr_133) logs*/ > > Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by > `2020-12-06 18:49:24`. > CPU utilization pattern is same is shard2_1. > > /*shard3_1 (node: solr_198) logs.*/ > > Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by > `2020-12-06 18:53:57`. > CPU utilization pattern is same is shard2_1. > > /*shard1_0 (node: solr_199) logs.*/ > > Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by > `2020-12-06 18:54:51`. > CPU utilization pattern is same is shard2_1. > > /*shard1_1 (node: solr_200) logs.*/ > > /This is the same solr_node which we used in curl command to issue commit. > As expected we got SolrCmdDistributor and DistributedZkUpdateProcessor error > on it./ > > /Till `2020-12-06 18:46:50` timestamp there is no `start commit` request > received. Also CPU utilization is 2% till this time./ > /*Received follwing error at `2020-12-06 18:47:47` timestamp*/ > > 2020-12-06 18:47:47.013 ERROR > (updateExecutor-5-thread-6-processing-n:solr_200:8389_solr > x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 > r:core_node20) [c:my_collection s:shard1_1 r:core_node20 > x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: > http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F > at > org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407) > at > org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753) > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369) > at > org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) > at > org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344) > at > org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.util.concurrent.TimeoutException > at > org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216) > at > org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398) > ... 13 more > > 2020-12-06 18:47:47.018 ERROR > (updateExecutor-5-thread-2-processing-n:solr_200:8389_solr > x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 > r:core_node20) [c:my_collection s:shard1_1 r:core_node20 > x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: > http://solr_199:8389/solr/my_collection_shard1_0_replica_n7/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F > ........ > *(REST ERROR MESSAGE SAME AS ABOVE)* > > > > 2020-12-06 18:47:47.018 ERROR > (updateExecutor-5-thread-4-processing-n:solr_200:8389_solr > x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 > r:core_node20) [c:my_collection s:shard1_1 r:core_node20 > x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: > http://solr_132:8389/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F > ........ > > > 2020-12-06 18:47:47.018 ERROR > (updateExecutor-5-thread-5-processing-n:solr_200:8389_solr > x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 > r:core_node20) [c:my_collection s:shard1_1 r:core_node20 > x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: > http://solr_133:8389/solr/my_collection_shard3_0_replica_n15/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F > ........ > > 2020-12-06 18:47:47.020 INFO (qtp1267082248-29872) [c:my_collection > s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] > o.a.s.u.SolrCmdDistributor SolrCmdDistributor found 4 errors > > > /*Then at `2020-12-06 18:47:47` received `start commit` request. Also by > this time CPU utilization is moved up to 14%. */ > > > 2020-12-06 18:47:47.020 INFO (qtp1267082248-29872) [c:my_collection > s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] > o.a.s.u.DirectUpdateHandler2 start > commit{_version_=1685355722985963520,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} > > > *THEN IN BW THERE WERE SOME EXTERNAL FILE RELOADING OPERATION. CPU > utilization is still around 14%.* > > *Then at `2020-12-06 19:06:09` timestamp got following warning.* > > 2020-12-06 19:06:09.672 WARN (qtp1267082248-29872) [c:my_collection > s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] > o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to > http://solr_198:8389/solr => > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: > http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F > at > org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407) > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: > http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F > at > org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407) > ~[?:?] > at > org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753) > ~[?:?] > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369) > ~[?:?] > at > org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) > ~[?:?] > at > org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344) > ~[?:?] > at > org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333) > ~[?:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_262] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_262] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_262] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) > ~[metrics-core-4.1.2.jar:4.1.2] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[?:1.8.0_262] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ~[?:1.8.0_262] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262] > Caused by: java.util.concurrent.TimeoutException > at > org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216) > ~[?:?] > at > org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398) > ~[?:?] > ... 13 more > > > 2020-12-06 19:06:09.673 WARN (qtp1267082248-29872) [c:my_collection > s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] > o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to > http://solr_199:8389/solr => > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: http://solr_199:8389/solr/........ > > 2020-12-06 19:06:09.673 WARN (qtp1267082248-29872) [c:my_collection > s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] > o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to > http://solr_132:8389/solr => > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: http://solr_132:8389/solr/........ > > > 2020-12-06 19:06:09.673 WARN (qtp1267082248-29872) [c:my_collection > s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] > o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to > http://solr_133:8389/solr => > org.apache.solr.client.solrj.SolrServerException: Timeout occured while > waiting response from server at: http://solr_133:8389/solr/........ > > /* > Then at `2020-12-06 19:06:09` timestamp we got following. Which I'm assuming > indicates that commit got over, since after this CPU utilization fall back > again to 2%. And We also got curl reponse by this timestamp with status=0. * > / > > > 2020-12-06 19:06:09.673 INFO (qtp1267082248-29872) [c:my_collection > s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] > o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard1_1_replica_n19] > webapp=/solr path=/update > params={openSearcher=true&commit=true&wt=json}{commit=} 0 1702717 > > > In short to summarize: > Commit was issued at `2020-12-06 18:37:40` (approx) timestamp. > Till `2020-12-06 18:46:50` timestamp there was no `start commit` received > and CPU was at 2% till it throws `SolrCmdDistributor` error at `2020-12-06 > 18:47:47` timestamp. Then at around the same timestamp `2020-12-06 18:47:47` > received the `start commit` request. And by this time CPU utilization is > moved up to 14%. > Then at `2020-12-06 19:06:09` timestamp got `DistributedZkUpdateProcessor` > error message and then at `2020-12-06 19:06:09.673` timestamp got some > commit related message (which I'm assuming its indicates end of commit) and > CPU is again fall back to 2%. > > > Due to these timeout errors overall commit time is getting double. Can > someone guide me to resolve this issue. > > > > > > > > > > > -- > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html