On Mon, Jun 15, 2020 at 2:00 PM Ryan W <rya...@gmail.com> wrote:

> What is the Service definition of Solr in Redhat?
>
I think maybe you are talking about systemd. Maybe a service definition looks something like this?
https://gist.github.com/hammady/3d7b5964c7b0f90997865ebef40bf5e1

I haven't used systemd before. I should probably look into that. It isn't something I am currently using, as far as I know.

>
>> >> Thank you. I pasted those settings at the end of my /etc/default/solr.in.sh just now and restarted solr. I will see if that fixes it.
>> >> Previously, I had no settings at all in solr.in.sh except for SOLR_PORT.
>> >>
>> >> On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood <wun...@wunderwood.org> wrote:
>> >>
>> >>> 1. You have a tiny heap. 536 Megabytes is not enough.
>> >>> 2. I stopped using the CMS GC years ago.
>> >>>
>> >>> Here is the GC config we use on every one of our 150+ Solr hosts. We’re still on Java 8, but will be upgrading soon.
>> >>>
>> >>> SOLR_HEAP=8g
>> >>> # Use G1 GC -- wunder 2017-01-23
>> >>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>> >>> GC_TUNE=" \
>> >>> -XX:+UseG1GC \
>> >>> -XX:+ParallelRefProcEnabled \
>> >>> -XX:G1HeapRegionSize=8m \
>> >>> -XX:MaxGCPauseMillis=200 \
>> >>> -XX:+UseLargePages \
>> >>> -XX:+AggressiveOpts \
>> >>> "
>> >>>
>> >>> wunder
>> >>> Walter Underwood
>> >>> wun...@wunderwood.org
>> >>> http://observer.wunderwood.org/ (my blog)
>> >>>
>> >>>> On Jun 11, 2020, at 10:52 AM, Ryan W <rya...@gmail.com> wrote:
>> >>>>
>> >>>> On Wed, Jun 10, 2020 at 8:35 PM Hup Chen <chai...@hotmail.com> wrote:
>> >>>>
>> >>>>> I will check "dmesg" first, to find out any hardware error message.
>> >>>>>
>> >>>>
>> >>>> Here is what I see toward the end of the output from dmesg:
>> >>>>
>> >>>> [1521232.781785] [118857]    48 118857   108785      677     201      901     0 httpd
>> >>>> [1521232.781787] [118860]    48 118860   108785      710     201      881     0 httpd
>> >>>> [1521232.781788] [118862]    48 118862   113063     5256     210      725     0 httpd
>> >>>> [1521232.781790] [118864]    48 118864   114085     6634     212      703     0 httpd
>> >>>> [1521232.781791] [118871]    48 118871   139687    32323     262      620     0 httpd
>> >>>> [1521232.781793] [118873]    48 118873   108785      821     201      792     0 httpd
>> >>>> [1521232.781795] [118879]    48 118879   140263    32719     263      621     0 httpd
>> >>>> [1521232.781796] [118903]    48 118903   108785      812     201      771     0 httpd
>> >>>> [1521232.781798] [118905]    48 118905   113575     5606     211      660     0 httpd
>> >>>> [1521232.781800] [118906]    48 118906   113563     5694     211      626     0 httpd
>> >>>> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
>> >>>> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>> >>>>
>> >>>> Is this a relevant "Out of memory" message? Does this suggest an OOM situation is the culprit?
>> >>>>
>> >>>> When I grep in the solr logs for oom, I see some entries like this...
>> >>>>
>> >>>> ./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
>> >>>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>> >>>> -XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
>> >>>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>> >>>> -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
>> >>>> -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
>> >>>> -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
>> >>>> -XX:-OmitStackTraceInFastThrow
>> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
>> >>>> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
>> >>>> -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
>> >>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
>> >>>> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
>> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
>> >>>> -XX:+UseParNewGC
>> >>>>
>> >>>> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But I think this is just a setting that indicates what to do in case of an OOM. And if I look in that oom_solr.sh file, I see it would write an entry to a solr_oom_kill log. And there is no such log in the logs directory.
>> >>>>
>> >>>> Many thanks.
>> >>>>
>> >>>>> Then use some system admin tools to monitor that server, for instance, top, vmstat, lsof, iostat ... or simply install some nice free monitoring tool into this system, like monit, monitorix, nagios.
>> >>>>> Good luck!
>> >>>>>
>> >>>>> ________________________________
>> >>>>> From: Ryan W <rya...@gmail.com>
>> >>>>> Sent: Thursday, June 11, 2020 2:13 AM
>> >>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> >>>>> Subject: Re: How to determine why solr stops running?
>> >>>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> People keep suggesting I check the logs for errors. What do those errors look like? Does anyone have examples of the text of a Solr oom error? Or the text of any other errors I should be looking for the next time solr fails? Are there phrases I should grep for in the logs? Should I be looking in the Solr logs for an OOM error, or in the Apache logs?
>> >>>>>
>> >>>>> There is nothing failing on the server except for solr -- at least not that I can see. There is no apparent problem with the hardware or anything else on the server. The OS is Red Hat Enterprise Linux. The server has 16 GB of RAM and hosts one website that does not get a huge amount of traffic.
>> >>>>>
>> >>>>> When the start command is given to solr, does it first check to see if solr is running, or does it always start solr whether it is already running or not?
>> >>>>>
>> >>>>> Many thanks!
>> >>>>> Ryan
>> >>>>>
>> >>>>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson <erickerick...@gmail.com> wrote:
>> >>>>>
>> >>>>>> To add to what Dave said, if you have a particular machine that’s prone to suddenly stopping, that’s usually a red flag that you should seriously think about hardware issues.
>> >>>>>>
>> >>>>>> If the problem strikes different machines, then I agree with Shawn that the first thing I’d be suspicious of is OOM errors.
>> >>>>>>
>> >>>>>> FWIW,
>> >>>>>> Erick
>> >>>>>>
>> >>>>>>> On Jun 9, 2020, at 6:05 AM, Dave <hastings.recurs...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> I’ll add that whenever I’ve had a solr instance shut down, for me it’s been a hardware failure. Either the ram or the disk got a “glitch” and both of these are relatively fragile and wear and tear type parts of the machine, and should be expected to fail and be replaced from time to time. Solr is pretty aggressive with its logging so there are a lot of writes always happening and of course reads, if the disk has any issues or the memory it can lock it up and bring her down, more so if you have any spellcheck dictionaries or suggesters being built on start up.
>> >>>>>>>
>> >>>>>>> Just my experience with this, could be wrong (most likely wrong) but we always have extra drives and memory around the server room for this reason. At least once or twice a year we will have a disk failure in the raid and need to swap in a new one.
>> >>>>>>>
>> >>>>>>> Good luck though, also solr should be logging its failures so it would be good to look there too
>> >>>>>>>
>> >>>>>>>> On Jun 9, 2020, at 2:35 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>> >>>>>>>>
>> >>>>>>>> On 5/14/2020 7:22 AM, Ryan W wrote:
>> >>>>>>>>> I manage a site where solr has stopped running a couple times in the past week. The server hasn't been rebooted, so that's not the reason. What else causes solr to stop running? How can I investigate why this is happening?
>> >>>>>>>>
>> >>>>>>>> Any situation where Solr stops running and nobody requested the stop is a result of a serious problem that must be thoroughly investigated.
>> >>>>>>>> I think it's a bad idea for Solr to automatically restart when it stops unexpectedly. Chances are that whatever caused the crash is going to simply make the crash happen again until the problem is solved. Automatically restarting could hide problems from the system administrator.
>> >>>>>>>>
>> >>>>>>>> The only way a Solr auto-restart would be acceptable to me is if it sends a high priority alert to the sysadmin EVERY time it executes an auto-restart. It really is that bad of a problem.
>> >>>>>>>>
>> >>>>>>>> The causes of Solr crashes (that I can think of) include the following. I believe I have listed these four options from most likely to least likely:
>> >>>>>>>>
>> >>>>>>>> * Java OutOfMemoryError exceptions. On non-windows systems, the "bin/solr" script starts Solr with an option that results in Solr's death anytime one of these exceptions occurs. We do this because program operation is indeterminate and completely unpredictable when OOME occurs, so it's far safer to stop running. That exception can be caused by several things, some of which actually do not involve memory at all. If you're running on Windows via the bin\solr.cmd command, then this will not happen ... but OOME could still cause a crash, because as I already mentioned, program operation is unpredictable when OOME occurs.
>> >>>>>>>>
>> >>>>>>>> * The OS kills Solr because system memory is completely exhausted and Solr is the process using the most memory. Linux calls this the "oom-killer" ... I am pretty sure something like it exists on most operating systems.
>> >>>>>>>>
>> >>>>>>>> * Corruption somewhere in the system. Could be in Java, the OS, Solr, or data used by any of those.
>> >>>>>>>>
>> >>>>>>>> * A very serious bug in Solr's code that we haven't discovered yet.
>> >>>>>>>>
>> >>>>>>>> I included that last one simply for completeness. A bug that causes a crash *COULD* exist, but as of right now, we have not seen any supporting evidence.
>> >>>>>>>>
>> >>>>>>>> My guess is that Java OutOfMemoryError is the cause here, but I can't be certain. If that is happening, then some resource (which might not be memory) is fully depleted. We would need to see the full OutOfMemoryError exception in order to determine why it is happening. Sometimes the exception is logged in solr.log, sometimes it isn't. We cannot predict what part of the code will be running when OOME occurs, so it would be nearly impossible for us to guarantee logging. OOME can happen ANYWHERE - even in code that the compiler thinks is immune to exceptions.
>> >>>>>>>>
>> >>>>>>>> Side note to fellow committers: I wonder if we should implement an uncaught exception handler in Solr. I have found in my own programs that it helps figure out thorny problems. And while I am on the subject of handlers that might not be general knowledge, I didn't find a shutdown hook or a security manager outside of tests.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Shawn
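
The dmesg excerpt quoted in this thread shows the kernel oom-killer ending an httpd process, which raises the question of whether Solr itself was ever a victim. Below is a minimal Python sketch of one way to scan saved dmesg text for such kills and check whether a given process name appears among them. The names OomKill, find_oom_kills and was_killed are made up for illustration. The regular expression is modeled only on the "Killed process" line quoted in the thread, and other kernel versions may word that line differently. It is also an assumption that a killed Solr would be reported under the process name "java".

```python
import re
from typing import List, NamedTuple


class OomKill(NamedTuple):
    """One 'Killed process' record from the kernel log (hypothetical helper type)."""
    timestamp: float   # seconds since boot, as printed by dmesg
    pid: int
    name: str
    anon_rss_kb: int


# Modeled on the line quoted in the thread, for example:
# [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, ...
KILLED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<rss>\d+)kB"
)


def find_oom_kills(dmesg_text: str) -> List[OomKill]:
    """Return every oom-killer 'Killed process' record found in the text."""
    kills = []
    for line in dmesg_text.splitlines():
        match = KILLED_RE.search(line)
        if match:
            kills.append(
                OomKill(
                    timestamp=float(match.group("ts")),
                    pid=int(match.group("pid")),
                    name=match.group("name"),
                    anon_rss_kb=int(match.group("rss")),
                )
            )
    return kills


def was_killed(dmesg_text: str, process_name: str) -> bool:
    """True if the oom-killer ended at least one process with this name."""
    return any(kill.name == process_name for kill in find_oom_kills(dmesg_text))


# The two relevant lines from the dmesg output posted in the thread.
SAMPLE = """\
[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
"""

if __name__ == "__main__":
    for kill in find_oom_kills(SAMPLE):
        print(f"{kill.name} (pid {kill.pid}) was killed, anon-rss {kill.anon_rss_kb} kB")
    # Assumption: Solr would be listed as "java" if the kernel had killed it.
    print("java killed by oom-killer:", was_killed(SAMPLE, "java"))
```

To try this against a real host, the output of dmesg could be saved to a file and its contents passed to find_oom_kills. If only httpd entries turn up, that would suggest the kernel ran short of memory but picked a different victim, so the Solr logs would still need checking separately.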