On Wed, Jun 10, 2020 at 8:35 PM Hup Chen <chai...@hotmail.com> wrote:

> I would check "dmesg" first, to look for any hardware error messages.
>

Here is what I see toward the end of the output from dmesg:

[1521232.781785] [118857]    48 118857   108785      677     201      901             0 httpd
[1521232.781787] [118860]    48 118860   108785      710     201      881             0 httpd
[1521232.781788] [118862]    48 118862   113063     5256     210      725             0 httpd
[1521232.781790] [118864]    48 118864   114085     6634     212      703             0 httpd
[1521232.781791] [118871]    48 118871   139687    32323     262      620             0 httpd
[1521232.781793] [118873]    48 118873   108785      821     201      792             0 httpd
[1521232.781795] [118879]    48 118879   140263    32719     263      621             0 httpd
[1521232.781796] [118903]    48 118903   108785      812     201      771             0 httpd
[1521232.781798] [118905]    48 118905   113575     5606     211      660             0 httpd
[1521232.781800] [118906]    48 118906   113563     5694     211      626             0 httpd
[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB

Is this a relevant "Out of memory" message?  Does this suggest an OOM
situation is the culprit?
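You can answer this more systematically by pulling the OOM-killer "Killed process" lines out of the kernel log and checking which process was the victim. The helper below is a hypothetical sketch. It assumes the RHEL 7 message format shown in the excerpt above. Other kernel versions word these lines differently, so the regular expression may need adjusting. If the victim is `java` (Solr), the kernel OOM killer is a strong suspect. In the excerpt above, the process it killed was httpd (PID 117529).

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches the kernel's "Killed process" line in the format seen in the dmesg
# excerpt above (RHEL 7); other kernels may phrase it differently.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


@dataclass
class OomKill:
    pid: int
    name: str
    total_vm_kb: int
    anon_rss_kb: int


def find_oom_kills(dmesg_text: str) -> list[OomKill]:
    """Return every process the kernel OOM killer reports having killed."""
    kills = []
    for line in dmesg_text.splitlines():
        m = KILLED_RE.search(line)
        if m:
            kills.append(
                OomKill(
                    pid=int(m["pid"]),
                    name=m["name"],
                    total_vm_kb=int(m["total_vm"]),
                    anon_rss_kb=int(m["anon_rss"]),
                )
            )
    return kills


# Example using the line quoted above.
sample = ("[1521232.782908] Killed process 117529 (httpd), UID 48, "
          "total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB")
for kill in find_oom_kills(sample):
    print(f"OOM killer killed {kill.name} (pid {kill.pid}), "
          f"anon-rss {kill.anon_rss_kb // 1024} MiB")
    # prints: OOM killer killed httpd (pid 117529), anon-rss 177 MiB
```

On a live system, you could pass it the output of `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. Reading the kernel ring buffer may require root.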

When I grep the Solr logs for "oom", I see entries like this:

./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
-XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
-XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
-XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
-XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
-XX:-OmitStackTraceInFastThrow
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
-XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
-XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
-XX:+UseParNewGC

Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh", but I
think that is just a setting that says what to do in case of an OOM. If I
look in that oom_solr.sh file, I see it would write an entry to a
solr_oom_kill log, and there is no such log in the logs directory.
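Both checks can be scripted. The hypothetical Python sketch below pulls the `-XX:Name=value` flags out of a `CommandLine flags:` line like the one in solr_gc.log and reports the heap limits in MiB. In the excerpt above, `-XX:MaxHeapSize=536870912` works out to 512 MiB. The sketch also lists any files in the logs directory whose names start with `solr_oom_kill`. That glob is an assumption based on the description of oom_solr.sh above, so check the script for the exact file name it writes.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches "-XX:Name=value" flags. Boolean "-XX:+Name"/"-XX:-Name" flags are
# skipped, and a value ends at the first whitespace, so the extra arguments
# after "-XX:OnOutOfMemoryError=..." are not captured.
FLAG_RE = re.compile(r"-XX:(\w+)=(\S+)")


def parse_xx_flags(command_line: str) -> dict[str, str]:
    """Map each -XX:Name=value flag on a GC-log CommandLine line to its value."""
    return dict(FLAG_RE.findall(command_line))


def heap_mib(flags: dict[str, str]) -> dict[str, int]:
    """Convert the byte-valued heap-size flags to MiB."""
    return {
        name: int(flags[name]) // (1024 * 1024)
        for name in ("InitialHeapSize", "MaxHeapSize")
        if name in flags
    }


def find_oom_killer_logs(logs_dir: str) -> list[Path]:
    """List files whose names start with 'solr_oom_kill' (assumed naming)."""
    return sorted(Path(logs_dir).glob("solr_oom_kill*"))
```

For example, `find_oom_killer_logs("/opt/solr/server/logs")` would check the directory passed to oom_solr.sh in the flags above.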

Many thanks.




> Then use some system administration tools to monitor that server,
> for instance top, vmstat, lsof, iostat ..., or simply install a good
> free monitoring tool on this system, such as monit, Monitorix, or Nagios.
> Good luck!
>
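As a lightweight complement to those tools, a small script could record available memory at regular intervals. You could then line a later crash up against the memory pressure at that time. This is a minimal sketch under a few assumptions. It relies on the Linux /proc/meminfo format (values in kB) and on the `MemAvailable` field, which was added in Linux 3.14 and which some distribution kernels backport. It falls back to `MemFree` when that field is missing. The 60-second interval is an arbitrary choice.

```python
from __future__ import annotations

import time


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def available_percent(info: dict[str, int]) -> float:
    """Share of RAM still available; uses MemFree if MemAvailable is absent."""
    avail = info.get("MemAvailable", info.get("MemFree", 0))
    return 100.0 * avail / info["MemTotal"]


def watch(interval_s: float = 60.0, samples: int | None = None) -> None:
    """Print a timestamped availability line every interval (forever by default)."""
    n = 0
    while samples is None or n < samples:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(time.strftime("%Y-%m-%d %H:%M:%S"),
              f"MemAvailable {available_percent(info):.1f}%", flush=True)
        n += 1
        time.sleep(interval_s)
```

Redirecting the output of `watch()` to a file keeps a history you can compare with the timestamps in dmesg and solr.log.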
> ________________________________
> From: Ryan W <rya...@gmail.com>
> Sent: Thursday, June 11, 2020 2:13 AM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: How to determine why solr stops running?
>
> Hi all,
>
> People keep suggesting I check the logs for errors.  What do those errors
> look like?  Does anyone have examples of the text of a Solr oom error?  Or
> the text of any other errors I should be looking for the next time solr
> fails?  Are there phrases I should grep for in the logs?  Should I be
> looking in the Solr logs for an OOM error, or in the Apache logs?
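A starting point is a hypothetical helper that searches log files for three phrases: `java.lang.OutOfMemoryError` (the Java exception class name) and the kernel's `Out of memory` and `Killed process` messages. The search is deliberately narrower than a bare `oom`, because `oom` also matches the `-XX:OnOutOfMemoryError` flag in the GC log. The logs directory is an assumption based on the path passed to oom_solr.sh in the JVM flags quoted earlier in this thread. The `solr.log*` naming of rotated files is also assumed. An OOME is not always logged, so finding nothing does not prove that one never happened.

```python
from __future__ import annotations

import re
from pathlib import Path

# Phrases relevant to this thread; extend as needed.
PATTERNS = re.compile(r"java\.lang\.OutOfMemoryError|Out of memory|Killed process")


def solr_log_files(logs_dir: str = "/opt/solr/server/logs") -> list[Path]:
    """Current and rotated Solr logs (assumed 'solr.log*' naming)."""
    return sorted(Path(logs_dir).glob("solr.log*"))


def grep_logs(paths: list[str | Path]) -> list[tuple[str, int, str]]:
    """Return (file, line_number, line) for every matching line."""
    hits = []
    for path in paths:
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if PATTERNS.search(line):
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits
```

Typical use is `grep_logs(solr_log_files())`. To also cover the OOM-killer case, add a saved copy of the dmesg output to the list of paths.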
>
> There is nothing failing on the server except for solr -- at least not that
> I can see.  There is no apparent problem with the hardware or anything else
> on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
> RAM and hosts one website that does not get a huge amount of traffic.
>
> When the start command is given to solr, does it first check to see if solr
> is running, or does it always start solr whether it is already running or
> not?
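Whatever bin/solr does, you can check yourself whether anything is already listening on Solr's port before you start it. This is a minimal sketch. It assumes Solr runs on localhost port 8983, the port passed to oom_solr.sh in the JVM flags quoted earlier in this thread. A successful connection shows only that some process accepted a TCP connection on that port. It does not show that the process is Solr, or that Solr is healthy.

```python
import socket


def port_in_use(host: str = "127.0.0.1", port: int = 8983, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: nothing is listening.
        return False
```

For example, if `port_in_use()` returns True while you believe Solr is stopped, another process (or a leftover Solr) holds the port.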
>
> Many thanks!
> Ryan
>
>
> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson <erickerick...@gmail.com> wrote:
>
> > To add to what Dave said, if you have a particular machine that’s prone to
> > suddenly stopping, that’s usually a red flag that you should seriously
> > think about hardware issues.
> >
> > If the problem strikes different machines, then I agree with Shawn that
> > the first thing I’d be suspicious of is OOM errors.
> >
> > FWIW,
> > Erick
> >
> > > On Jun 9, 2020, at 6:05 AM, Dave <hastings.recurs...@gmail.com> wrote:
> > >
> > > I’ll add that whenever I’ve had a Solr instance shut down, for me it’s
> > > been a hardware failure. Either the RAM or the disk got a “glitch”. Both
> > > are relatively fragile, wear-and-tear parts of the machine, and should be
> > > expected to fail and be replaced from time to time. Solr is pretty
> > > aggressive with its logging, so there are always a lot of writes
> > > happening, and of course reads. If the disk or the memory has any issues,
> > > it can lock Solr up and bring it down, more so if you have any spellcheck
> > > dictionaries or suggesters being built on startup.
> > >
> > > Just my experience with this, and I could be wrong (most likely wrong),
> > > but we always keep extra drives and memory around the server room for
> > > this reason.  At least once or twice a year we have a disk failure in the
> > > RAID and need to swap in a new one.
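On the hardware angle, and the earlier suggestion to check dmesg for hardware errors, a rough filter can flag kernel messages that commonly indicate disk or memory trouble. The pattern list is an assumption. `I/O error`, `Medium Error`, `EDAC`, `Machine check`, and `Hardware Error` are typical wordings, but drivers and kernel versions vary. RAID controllers may also report failing disks only through their own tools, so treat an empty result as inconclusive.

```python
from __future__ import annotations

import re

# Typical (not exhaustive) kernel wordings for disk and memory faults.
HARDWARE_PATTERNS = {
    "disk": re.compile(r"I/O error|Medium Error", re.IGNORECASE),
    "memory": re.compile(r"EDAC|Machine check|Hardware Error", re.IGNORECASE),
}


def classify_hardware_errors(dmesg_text: str) -> dict[str, list[str]]:
    """Group dmesg lines that look like disk or memory faults by category."""
    found = {kind: [] for kind in HARDWARE_PATTERNS}
    for line in dmesg_text.splitlines():
        for kind, pattern in HARDWARE_PATTERNS.items():
            if pattern.search(line):
                found[kind].append(line)
    return found
```

The sketch only sorts lines into "disk" and "memory" buckets. Deciding whether a match reflects a real fault still requires reading the surrounding messages.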
> > >
> > > Good luck, though. Also, Solr should be logging its failures, so it would
> > > be good to look there too.
> > >
> > >> On Jun 9, 2020, at 2:35 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > >>
> > >> On 5/14/2020 7:22 AM, Ryan W wrote:
> > >>> I manage a site where solr has stopped running a couple times in the past
> > >>> week. The server hasn't been rebooted, so that's not the reason. What else
> > >>> causes solr to stop running?  How can I investigate why this is happening?
> > >>
> > >> Any situation where Solr stops running and nobody requested the stop is
> > >> a result of a serious problem that must be thoroughly investigated.  I
> > >> think it's a bad idea for Solr to automatically restart when it stops
> > >> unexpectedly.  Chances are that whatever caused the crash is going to
> > >> simply make the crash happen again until the problem is solved.
> > >> Automatically restarting could hide problems from the system
> > >> administrator.
> > >>
> > >> The only way a Solr auto-restart would be acceptable to me is if it
> > >> sends a high priority alert to the sysadmin EVERY time it executes an
> > >> auto-restart.  It really is that bad of a problem.
> > >>
> > >> The causes of Solr crashes (that I can think of) include the following.
> > >> I believe I have listed these four options from most likely to least
> > >> likely:
> > >>
> > >> * Java OutOfMemoryError exceptions.  On non-Windows systems, the
> > >> "bin/solr" script starts Solr with an option that results in Solr's death
> > >> anytime one of these exceptions occurs.  We do this because program
> > >> operation is indeterminate and completely unpredictable when OOME occurs,
> > >> so it's far safer to stop running.  That exception can be caused by several
> > >> things, some of which actually do not involve memory at all.  If you're
> > >> running on Windows via the bin\solr.cmd command, then this will not happen
> > >> ... but OOME could still cause a crash, because as I already mentioned,
> > >> program operation is unpredictable when OOME occurs.
> > >>
> > >> * The OS kills Solr because system memory is completely exhausted and
> > >> Solr is the process using the most memory.  Linux calls this the
> > >> "oom-killer" ... I am pretty sure something like it exists on most
> > >> operating systems.
> > >>
> > >> * Corruption somewhere in the system.  Could be in Java, the OS, Solr,
> > >> or data used by any of those.
> > >>
> > >> * A very serious bug in Solr's code that we haven't discovered yet.
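On Linux you can see how the oom-killer currently ranks processes. Each process exposes a badness score in /proc/<pid>/oom_score, and when memory runs out the kernel generally picks the process with the highest score. Older kernels may kill a child of that process instead, which is the "or sacrifice child" wording in the dmesg excerpt earlier in this thread. The sketch below is Linux-only and reads /proc directly; the `top_n` default is arbitrary. It lists the highest-scoring processes. The score also depends on settings such as oom_score_adj, so the largest memory user is usually, but not always, the victim.

```python
from __future__ import annotations

from pathlib import Path


def oom_scores(proc: str = "/proc") -> list[tuple[int, int, str]]:
    """Return (oom_score, pid, command name) for every readable process."""
    rows = []
    for entry in Path(proc).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            score = int((entry / "oom_score").read_text().strip())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited or is not readable
        rows.append((score, int(entry.name), name))
    return sorted(rows, reverse=True)


def top_oom_candidates(top_n: int = 5, proc: str = "/proc") -> list[tuple[int, int, str]]:
    """The processes the oom-killer would most likely pick right now."""
    return oom_scores(proc)[:top_n]
```

Running `top_oom_candidates()` while the server is busy shows whether Solr's java process or the httpd workers currently rank highest.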
> > >>
> > >> I included that last one simply for completeness.  A bug that causes a
> > >> crash *COULD* exist, but as of right now, we have not seen any supporting
> > >> evidence.
> > >>
> > >> My guess is that Java OutOfMemoryError is the cause here, but I can't
> > >> be certain.  If that is happening, then some resource (which might not be
> > >> memory) is fully depleted.  We would need to see the full OutOfMemoryError
> > >> exception in order to determine why it is happening. Sometimes the
> > >> exception is logged in solr.log, sometimes it isn't.  We cannot predict
> > >> what part of the code will be running when OOME occurs, so it would be
> > >> nearly impossible for us to guarantee logging.  OOME can happen ANYWHERE -
> > >> even in code that the compiler thinks is immune to exceptions.
> > >>
> > >> Side note to fellow committers:  I wonder if we should implement an
> > >> uncaught exception handler in Solr.  I have found in my own programs that
> > >> it helps figure out thorny problems.  And while I am on the subject of
> > >> handlers that might not be general knowledge, I didn't find a shutdown hook
> > >> or a security manager outside of tests.
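For readers unfamiliar with these mechanisms: in Java, a process-wide uncaught exception handler is installed with `Thread.setDefaultUncaughtExceptionHandler`, and a shutdown hook with `Runtime.getRuntime().addShutdownHook`. The sketch below is a Python analogue of the same idea, not Solr code. `sys.excepthook` and `threading.excepthook` (Python 3.8+) play the role of the uncaught exception handler, and `atexit` plays the role of the shutdown hook. The logger name is a hypothetical placeholder.

```python
import atexit
import logging
import sys
import threading

log = logging.getLogger("crash-reporter")  # hypothetical logger name


def log_uncaught(exc_type, exc_value, exc_tb):
    """Record any exception that would otherwise end the program unlogged."""
    log.critical("Uncaught exception", exc_info=(exc_type, exc_value, exc_tb))


def log_uncaught_in_thread(args):
    # threading.excepthook receives a single object describing the failure.
    log_uncaught(args.exc_type, args.exc_value, args.exc_traceback)


def log_shutdown():
    # Runs on normal interpreter exit. Like Java shutdown hooks, it does not
    # run if the process is killed with SIGKILL (e.g. by the kernel oom-killer).
    log.warning("Process shutting down")


def install_handlers():
    sys.excepthook = log_uncaught
    threading.excepthook = log_uncaught_in_thread
    atexit.register(log_shutdown)
```

The SIGKILL caveat is relevant to this thread: a process killed by the kernel oom-killer gets no chance to run shutdown hooks or log anything itself.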
> > >>
> > >> Thanks,
> > >> Shawn
> >
> >
>
