On Mon, Jun 15, 2020 at 1:56 PM Jörn Franke <jornfra...@gmail.com> wrote:
> What is the Service definition of Solr in Redhat?
>

I am not sure what that means. What is a service definition?

I am using Solr in conjunction with Drupal's Search API Solr module:
https://www.drupal.org/project/search_api_solr

>
> > On 15.06.2020 at 19:46, Ryan W <rya...@gmail.com> wrote:
> >
> > It happened again today. Again, no other apparent problems on the server.
> > Nothing else is stopping. Nothing in the logs that strikes me as useful.
> > I'm using Red Hat Linux 7.8 and Solr 7.7.2.
> >
> > Solr is stopping a couple times per week and I don't know how to determine
> > why.
> >
> >> On Sun, Jun 14, 2020 at 9:41 AM Ryan W <rya...@gmail.com> wrote:
> >>
> >> Thank you. I pasted those settings at the end of my /etc/default/solr.in.sh
> >> just now and restarted solr. I will see if that fixes it.
> >> Previously, I had no settings at all in solr.in.sh except for SOLR_PORT.
> >>
> >> On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood <wun...@wunderwood.org>
> >> wrote:
> >>
> >>> 1. You have a tiny heap. 536 Megabytes is not enough.
> >>> 2. I stopped using the CMS GC years ago.
> >>>
> >>> Here is the GC config we use on every one of our 150+ Solr hosts. We’re
> >>> still on Java 8, but will be upgrading soon.
> >>>
> >>> SOLR_HEAP=8g
> >>> # Use G1 GC -- wunder 2017-01-23
> >>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> >>> GC_TUNE=" \
> >>> -XX:+UseG1GC \
> >>> -XX:+ParallelRefProcEnabled \
> >>> -XX:G1HeapRegionSize=8m \
> >>> -XX:MaxGCPauseMillis=200 \
> >>> -XX:+UseLargePages \
> >>> -XX:+AggressiveOpts \
> >>> "
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/ (my blog)
> >>>
> >>>> On Jun 11, 2020, at 10:52 AM, Ryan W <rya...@gmail.com> wrote:
> >>>>
> >>>> On Wed, Jun 10, 2020 at 8:35 PM Hup Chen <chai...@hotmail.com> wrote:
> >>>>
> >>>>> I will check "dmesg" first, to find out any hardware error message.
> >>>>>
> >>>>
> >>>> Here is what I see toward the end of the output from dmesg:
> >>>>
> >>>> [1521232.781785] [118857]    48 118857   108785      677     201      901             0 httpd
> >>>> [1521232.781787] [118860]    48 118860   108785      710     201      881             0 httpd
> >>>> [1521232.781788] [118862]    48 118862   113063     5256     210      725             0 httpd
> >>>> [1521232.781790] [118864]    48 118864   114085     6634     212      703             0 httpd
> >>>> [1521232.781791] [118871]    48 118871   139687    32323     262      620             0 httpd
> >>>> [1521232.781793] [118873]    48 118873   108785      821     201      792             0 httpd
> >>>> [1521232.781795] [118879]    48 118879   140263    32719     263      621             0 httpd
> >>>> [1521232.781796] [118903]    48 118903   108785      812     201      771             0 httpd
> >>>> [1521232.781798] [118905]    48 118905   113575     5606     211      660             0 httpd
> >>>> [1521232.781800] [118906]    48 118906   113563     5694     211      626             0 httpd
> >>>> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
> >>>> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> >>>>
> >>>> Is this a relevant "Out of memory" message? Does this suggest an OOM
> >>>> situation is the culprit?
> >>>>
> >>>> When I grep in the solr logs for oom, I see some entries like this...
> >>>>
> >>>> ./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
> >>>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> >>>> -XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
> >>>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> >>>> -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
> >>>> -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
> >>>> -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
> >>>> -XX:-OmitStackTraceInFastThrow
> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
> >>>> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
> >>>> -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
> >>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> >>>> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
> >>>> -XX:+UseParNewGC
> >>>>
> >>>> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But I
> >>>> think this is just a setting that indicates what to do in case of an OOM.
> >>>> And if I look in that oom_solr.sh file, I see it would write an entry to a
> >>>> solr_oom_kill log. And there is no such log in the logs directory.
> >>>>
> >>>> Many thanks.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> Then use some system admin tools to monitor that server,
> >>>>> for instance, top, vmstat, lsof, iostat ... or simply install some nice
> >>>>> free monitoring tool into this system, like monit, monitorix, nagios.
> >>>>> Good luck!
> >>>>>
> >>>>> ________________________________
> >>>>> From: Ryan W <rya...@gmail.com>
> >>>>> Sent: Thursday, June 11, 2020 2:13 AM
> >>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> >>>>> Subject: Re: How to determine why solr stops running?
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> People keep suggesting I check the logs for errors. What do those errors
> >>>>> look like? Does anyone have examples of the text of a Solr oom error? Or
> >>>>> the text of any other errors I should be looking for the next time solr
> >>>>> fails? Are there phrases I should grep for in the logs? Should I be
> >>>>> looking in the Solr logs for an OOM error, or in the Apache logs?
> >>>>>
> >>>>> There is nothing failing on the server except for solr -- at least not that
> >>>>> I can see. There is no apparent problem with the hardware or anything else
> >>>>> on the server. The OS is Red Hat Enterprise Linux. The server has 16 GB of
> >>>>> RAM and hosts one website that does not get a huge amount of traffic.
> >>>>>
> >>>>> When the start command is given to solr, does it first check to see if solr
> >>>>> is running, or does it always start solr whether it is already running or
> >>>>> not?
> >>>>>
> >>>>> Many thanks!
> >>>>> Ryan
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson <erickerick...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> To add to what Dave said, if you have a particular machine that’s prone to
> >>>>>> suddenly stopping, that’s usually a red flag that you should seriously
> >>>>>> think about hardware issues.
> >>>>>>
> >>>>>> If the problem strikes different machines, then I agree with Shawn that
> >>>>>> the first thing I’d be suspicious of is OOM errors.
> >>>>>>
> >>>>>> FWIW,
> >>>>>> Erick
> >>>>>>
> >>>>>>> On Jun 9, 2020, at 6:05 AM, Dave <hastings.recurs...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> I’ll add that whenever I’ve had a solr instance shut down, for me it’s been a hardware failure. Either the ram or the disk got a “glitch” and both of these are relatively fragile and wear and tear type parts of the machine, and should be expected to fail and be replaced from time to time. Solr is pretty aggressive with its logging so there are a lot of writes always happening and of course reads, if the disk has any issues or the memory it can lock it up and bring her down, more so if you have any spellcheck dictionaries or suggesters being built on start up.
> >>>>>>>
> >>>>>>> Just my experience with this, could be wrong (most likely wrong) but we always have extra drives and memory around the server room for this reason. At least once or twice a year we will have a disk failure in the raid and need to swap in a new one.
> >>>>>>>
> >>>>>>> Good luck though, also solr should be logging it’s failures so it would be good to look there too
> >>>>>>>
> >>>>>>>> On Jun 9, 2020, at 2:35 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>>>>>>>
> >>>>>>>> On 5/14/2020 7:22 AM, Ryan W wrote:
> >>>>>>>>> I manage a site where solr has stopped running a couple times in the past
> >>>>>>>>> week. The server hasn't been rebooted, so that's not the reason. What else
> >>>>>>>>> causes solr to stop running? How can I investigate why this is happening?
> >>>>>>>>
> >>>>>>>> Any situation where Solr stops running and nobody requested the stop is a result of a serious problem that must be thoroughly investigated. I think it's a bad idea for Solr to automatically restart when it stops unexpectedly. Chances are that whatever caused the crash is going to simply make the crash happen again until the problem is solved. Automatically restarting could hide problems from the system administrator.
> >>>>>>>>
> >>>>>>>> The only way a Solr auto-restart would be acceptable to me is if it sends a high priority alert to the sysadmin EVERY time it executes an auto-restart. It really is that bad of a problem.
> >>>>>>>>
> >>>>>>>> The causes of Solr crashes (that I can think of) include the following. I believe I have listed these four options from most likely to least likely:
> >>>>>>>>
> >>>>>>>> * Java OutOfMemoryError exceptions. On non-windows systems, the "bin/solr" script starts Solr with an option that results in Solr's death anytime one of these exceptions occurs. We do this because program operation is indeterminate and completely unpredictable when OOME occurs, so it's far safer to stop running. That exception can be caused by several things, some of which actually do not involve memory at all. If you're running on Windows via the bin\solr.cmd command, then this will not happen ... but OOME could still cause a crash, because as I already mentioned, program operation is unpredictable when OOME occurs.
> >>>>>>>>
> >>>>>>>> * The OS kills Solr because system memory is completely exhausted and Solr is the process using the most memory. Linux calls this the "oom-killer" ... I am pretty sure something like it exists on most operating systems.
> >>>>>>>>
> >>>>>>>> * Corruption somewhere in the system. Could be in Java, the OS, Solr, or data used by any of those.
> >>>>>>>>
> >>>>>>>> * A very serious bug in Solr's code that we haven't discovered yet.
> >>>>>>>>
> >>>>>>>> I included that last one simply for completeness. A bug that causes a crash *COULD* exist, but as of right now, we have not seen any supporting evidence.
> >>>>>>>>
> >>>>>>>> My guess is that Java OutOfMemoryError is the cause here, but I can't be certain. If that is happening, then some resource (which might not be memory) is fully depleted. We would need to see the full OutOfMemoryError exception in order to determine why it is happening. Sometimes the exception is logged in solr.log, sometimes it isn't. We cannot predict what part of the code will be running when OOME occurs, so it would be nearly impossible for us to guarantee logging. OOME can happen ANYWHERE - even in code that the compiler thinks is immune to exceptions.
> >>>>>>>>
> >>>>>>>> Side note to fellow committers: I wonder if we should implement an uncaught exception handler in Solr. I have found in my own programs that it helps figure out thorny problems. And while I am on the subject of handlers that might not be general knowledge, I didn't find a shutdown hook or a security manager outside of tests.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Shawn
> >>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
>
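
For anyone reading this thread with the same question ("are there phrases I should grep for?"), here is a rough shell sketch of the two checks discussed above: a Java OutOfMemoryError recorded in the Solr logs, and the Linux oom-killer recorded in the kernel log. The function names oom_victims and has_java_oom are made up for illustration and are not part of Solr. The "Killed process" pattern is taken from the dmesg excerpt quoted earlier in this thread, and java.lang.OutOfMemoryError is the standard Java exception class name. Treat this as a starting point under those assumptions, not a complete diagnostic.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Solr.

# Print "PID NAME" for every process the Linux oom-killer reports killing.
# Reads kernel log text on stdin.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Succeed (exit 0) if any file under directory $1 mentions a Java
# OutOfMemoryError; fail otherwise, including when the directory is missing.
has_java_oom() {
    grep -rqsF 'java.lang.OutOfMemoryError' "$1"
}

# Example usage (log directory taken from the GC flags quoted in the thread):
#   dmesg | oom_victims
#   has_java_oom /opt/solr/server/logs && echo "Java OOM found in Solr logs"
```

Fed the two "Out of memory" lines from the dmesg excerpt quoted earlier in this thread, oom_victims reports PID 117529 with the name httpd, so in that excerpt the process the kernel killed was an Apache worker and not Solr. A victim named java would point at the second cause in Shawn's list instead.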