Thanks for the quick reply.

I assume caches (are they too large?), perhaps uninverted indexes. Docvalues would help with the latter ones. Do you use them?
>> We do not use any cache; we disabled the caches in solrconfig.xml. Here are my solrconfig.xml and schema.xml:
>> https://drive.google.com/file/d/12SHl3YGP7jT4goikBkeyB2s1NX5_C2gz/view
>> https://drive.google.com/file/d/1LwA1d4OiMhQQv806tR0HbZoEjA8IyfdR/view
>> We use docValues on the fields that are used for sorting or faceting.
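For reference, such a field is declared in schema.xml roughly like this (a minimal sketch; the field name here is invented, the real definitions are in the schema linked above):

    <!-- Hypothetical field: docValues lets Solr sort and facet on it without uninverting it onto the heap -->
    <field name="product_code" type="string" indexed="true" stored="true" docValues="true"/>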
You could also try upgrading to the latest version in the 6.x series as a starter.
>> I will surely try.

So the node in question isn't responding quickly enough to http requests and gets put into recovery. The log for the recovering node starts too late, so I can't say anything about what happened before 14:42:43.943 that led to recovery.
>> There is no error before 14:42:43.943; only search and insert requests are
>> there. I understand that the node is not responding, but why is it not responding? Is it due to
>> lack of memory or some other cause? We cannot tell the reason for not responding from the
>> log. Is there any monitoring for Solr from which we can find the root cause?
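One option here: Solr 6.x can expose its runtime statistics as JMX MBeans, so an external monitor such as JConsole or VisualVM can track heap, GC, and request handler activity. A minimal sketch, assuming the default platform MBean server, is to enable this in solrconfig.xml:

    <!-- Register Solr's statistics (handlers, caches, index) as JMX MBeans on the JVM's platform MBean server -->
    <jmx />

Remote connections typically also require ENABLE_REMOTE_JMX_OPTS=true (and an RMI port) in solr.in.cmd.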
Regards,
Vishal Patel

________________________________
From: Ere Maijala <ere.maij...@helsinki.fi>
Sent: Friday, July 10, 2020 4:27 PM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Replica goes into recovery mode in Solr 6.1.0

vishal patel wrote on 10.7.2020 at 12.45:
> Thanks for your input.
>
> Walter already said that setting soft commit max time to 100 ms is a recipe for disaster
>>> I know that, but our application was developed and has run on the live
>>> environment for the last 5 years. Actually, we want to show data very
>>> quickly after the insert.
>
> you have huge JVM heaps without an explanation for the reason
>>> We gave the 55GB heap because our usage is like that: large query searches and
>>> very frequent searching and indexing.
> Here is my memory snapshot which I have taken from GC.

Yes, I can see that a lot of memory is in use, but the question is why. I assume caches (are they too large?), perhaps uninverted indexes. Docvalues would help with the latter ones. Do you use them?

> I have tried a Solr upgrade from 6.1.0 to 8.5.1, but due to some issue we cannot
> do it. I have also asked about it here:
> https://lucene.472066.n3.nabble.com/Sorting-in-other-collection-in-Solr-8-5-1-td4459506.html#a4459562

You could also try upgrading to the latest version in the 6.x series as a starter.

> Why can we not find the reason for the recovery in the log? Like a memory or CPU issue,
> frequent indexing or searching, or a large query hit?
> My log at the time of recovery:
> https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view

Isn't it right there on the first lines?

2020-07-09 14:42:43.943 ERROR (updateExecutor-2-thread-21007-processing-http:////11.200.212.305:8983//solr//products x:products r:core_node1 n:11.200.212.306:8983_solr s:shard1 c:products) [c:products s:shard1 r:core_node1 x:products] o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: 11.200.212.305:8983 failed to respond

followed by a couple more error messages about the same problem and then initiation of recovery:

2020-07-09 14:42:44.002 INFO (qtp1239731077-771611) [c:products s:shard1 r:core_node1 x:products] o.a.s.c.ZkController Put replica core=products coreNodeName=core_node3 on 11.200.212.305:8983_solr into leader-initiated recovery.

So the node in question isn't responding quickly enough to http requests and gets put into recovery. The log for the recovering node starts too late, so I can't say anything about what happened before 14:42:43.943 that led to recovery.

--Ere

> ________________________________
> From: Ere Maijala <ere.maij...@helsinki.fi>
> Sent: Friday, July 10, 2020 2:10 PM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>
> Walter already said that setting soft commit max time to 100 ms is a
> recipe for disaster. That alone can be the issue, but if you're not
> willing to try higher values, there's no way of being sure. And you have
> huge JVM heaps without an explanation for the reason. If those do not
> cause problems, you indicated that you also run some other software on
> the same server. Is it possible that the other processes hog CPU, disk
> or network and starve Solr?
>
> I must add that Solr 6.1.0 is over four years old. You could be hitting
> a bug that has been fixed for years, but even if you encounter an issue
> that's still present, you will need to upgrade to get it fixed. If you
> look at the number of fixes done in subsequent 6.x versions alone in the
> changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
> you'll see that there are a lot of them. You could be hitting something
> like SOLR-10420, which has been fixed for over three years.
>
> Best,
> Ere
>
> vishal patel wrote on 10.7.2020 at 7.52:
>> I've been running Solr for a dozen years and I've never needed a heap larger
>> than 8 GB.
>>>> What is your data size? Is it like ours, 1 TB? Is your searching and indexing
>>>> frequent? An NRT model?
>>
>> My question is why the replica is going into recovery. When the replica went down, I
>> checked the GC log, but the GC pause was not more than 2 seconds.
>> Also, I cannot find any reason for the recovery in the Solr log file. I want
>> to know the reason why the replica goes into recovery.
>>
>> Regards,
>> Vishal Patel
>> ________________________________
>> From: Walter Underwood <wun...@wunderwood.org>
>> Sent: Friday, July 10, 2020 3:03 AM
>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>
>> Those are extremely large JVMs. Unless you have proven that you MUST
>> have 55 GB of heap, use a smaller heap.
>>
>> I've been running Solr for a dozen years and I've never needed a heap
>> larger than 8 GB.
>>
>> Also, there is usually no need to use one JVM per replica.
>>
>> Your configuration is using 110 GB (two JVMs) just for Java
>> where I would configure it with a single 8 GB JVM. That would
>> free up 100 GB for file caches.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
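For illustration, the smaller heap Walter describes would be set in solr.in.cmd along these lines (the 8 GB value is his suggestion, not a tested setting for this index):

    REM Example only: fixed 8 GB heap instead of 55 GB; the freed RAM stays available for OS file caching of the index
    set SOLR_JAVA_MEM=-Xms8g -Xmx8g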
>>> On Jul 8, 2020, at 10:10 PM, vishal patel <vishalpatel200...@outlook.com>
>>> wrote:
>>>
>>> Thanks for the reply.
>>>
>>> What do you mean by "Shard1 Allocated memory"?
>>>>> It means the JVM memory of one Solr node or instance.
>>>
>>> How many Solr JVMs are you running?
>>>>> In one server there are 2 Solr JVMs, of which one is a shard and the other is a replica.
>>>
>>> What is the heap size for your JVMs?
>>>>> 55GB for one Solr JVM.
>>>
>>> Regards,
>>> Vishal Patel
>>>
>>> Sent from Outlook<http://aka.ms/weboutlook>
>>> ________________________________
>>> From: Walter Underwood <wun...@wunderwood.org>
>>> Sent: Wednesday, July 8, 2020 8:45 PM
>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>
>>> I don't understand what you mean by "Shard1 Allocated memory". I don't know of
>>> any way to dedicate system RAM to an application object like a replica.
>>>
>>> How many Solr JVMs are you running?
>>>
>>> What is the heap size for your JVMs?
>>>
>>> Setting soft commit max time to 100 ms does not magically make Solr super fast.
>>> It makes Solr do too much work, makes the work queues fill up, and makes it fail.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Jul 7, 2020, at 10:55 PM, vishal patel <vishalpatel200...@outlook.com>
>>>> wrote:
>>>>
>>>> Thanks for your reply.
>>>>
>>>> One server has 320GB of RAM in total. It runs 2 Solr nodes: one is shard1 and the
>>>> second is the shard2 replica. Each Solr node has 55GB of memory allocated.
>>>> Shard1 has 585GB of data and the shard2 replica has 492GB, so there is almost 1TB of
>>>> data on this server. The server also runs other applications, for which 60GB of
>>>> memory is allocated. So 150GB of memory is left.
>>>>
>>>> Properly formatted details:
>>>> https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
>>>>
>>>> Are you running multiple huge JVMs?
>>>>>> Not huge, but 60GB of memory is allocated for our 11 applications. 150GB of memory
>>>>>> is still free.
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and write
>>>> iops. I expect that the solr processes are blocked on disk reads almost
>>>> all the time.
>>>>>> Is there a chance of going into recovery mode if there is more IO read/write or blocking?
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100" is way too short (100 ms).
>>>>>> Our requirement is NRT, so we keep the time short.
>>>>
>>>> Regards,
>>>> Vishal Patel
>>>> ________________________________
>>>> From: Walter Underwood <wun...@wunderwood.org>
>>>> Sent: Tuesday, July 7, 2020 8:15 PM
>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>>
>>>> This isn't a support list, so nobody looks at issues. We do try to help.
>>>>
>>>> It looks like you have 1 TB of index on a system with 320 GB of RAM.
>>>> I don't know what "Shard1 Allocated memory" is, but maybe half of
>>>> that RAM is used by JVMs or some other process, I guess. Are you
>>>> running multiple huge JVMs?
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and
>>>> write iops. I expect that the solr processes are blocked on disk reads
>>>> almost all the time.
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100" is way too short (100 ms).
>>>> That is probably causing your outages.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
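For comparison, relaxing the setting Walter flags would look like this in solr.in.cmd (10 seconds is an illustrative value only; the right trade-off between freshness and load needs testing):

    REM Example value: soft commit at most every 10 s, so new documents become searchable within 10 s of insert
    set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=10000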
>>>>
>>>>> On Jul 7, 2020, at 5:18 AM, vishal patel <vishalpatel200...@outlook.com>
>>>>> wrote:
>>>>>
>>>>> Is anyone looking at my issue? Please guide me.
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: vishal patel <vishalpatel200...@outlook.com>
>>>>> Sent: Monday, July 6, 2020 7:11 PM
>>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>>>> Subject: Replica goes into recovery mode in Solr 6.1.0
>>>>>
>>>>> I am using Solr version 6.1.0, Java 8, and G1GC in production. We
>>>>> have 2 shards and each shard has 1 replica. We have 3 collections.
>>>>> We do not use any caches; they are disabled in solrconfig.xml. Search and
>>>>> update requests come in frequently on our live platform.
>>>>>
>>>>> *Our commit configuration in solrconfig.xml is below:
>>>>> <autoCommit>
>>>>>   <maxTime>600000</maxTime>
>>>>>   <maxDocs>20000</maxDocs>
>>>>>   <openSearcher>false</openSearcher>
>>>>> </autoCommit>
>>>>> <autoSoftCommit>
>>>>>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>>>> </autoSoftCommit>
>>>>>
>>>>> *We use Near Real Time searching, so we set the following in solr.in.cmd:
>>>>> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>>>>>
>>>>> *Our collection details are below:
>>>>>
>>>>> Collection    Shard1                 Shard1 Replica         Shard2                 Shard2 Replica
>>>>>               Documents   Size(GB)   Documents   Size(GB)   Documents   Size(GB)   Documents   Size(GB)
>>>>> collection1   26913364    201        26913379    202        26913380    198        26913379    198
>>>>> collection2   13934360    310        13934367    310        13934368    219        13934367    219
>>>>> collection3   351539689   73.5       351540040   73.5       351540136   75.2       351539722   75.2
>>>>>
>>>>> *Our server configurations are below:
>>>>>
>>>>>                                             Server1              Server2
>>>>> CPU                                         Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz,
>>>>>                                             10 Core(s), 20 Logical Processor(s)  (same on both servers)
>>>>> HardDisk                                    3845 GB (3.84 TB)    3485 GB (3.48 TB)
>>>>> Total memory(GB)                            320                  320
>>>>> Shard1 allocated memory(GB)                 55                   -
>>>>> Shard2 Replica allocated memory(GB)         55                   -
>>>>> Shard2 allocated memory(GB)                 -                    55
>>>>> Shard1 Replica allocated memory(GB)         -                    55
>>>>> Other applications' allocated memory(GB)    60                   22
>>>>> Number of other applications                11                   7
>>>>>
>>>>> Sometimes one of the replicas goes into recovery mode. Why does the replica go
>>>>> into recovery? Due to heavy searching, heavy updates/inserts, or long GC pause
>>>>> times? If it is one of those, what should we change in the configuration?
>>>>> Should we add shards to address the recovery issue?
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>
>>>
>>
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland