On Mon, Jun 15, 2020 at 2:00 PM Ryan W <rya...@gmail.com> wrote:

> What is the Service definition of Solr in Redhat?
>

I think maybe you are talking about systemd.

Maybe a service definition looks something like this?
https://gist.github.com/hammady/3d7b5964c7b0f90997865ebef40bf5e1

I haven't used systemd before. I should probably look into that.  It isn't
something I am currently using, as far as I know.
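
A minimal sketch for checking this on the server, assuming the service is
named "solr" (the name is a guess, and the function name is made up for
illustration). It only reports whether systemd knows a unit by that name
or whether there is an init script under /etc/init.d:

```shell
# Sketch only; "solr" as the service name is an assumption.
# Prints "systemd", "sysv", or "none" depending on what is found.
solr_service_kind() {
    unit="${1:-solr}"
    if command -v systemctl >/dev/null 2>&1 &&
       systemctl cat "$unit" >/dev/null 2>&1; then
        # systemd can show a unit file for this name
        echo "systemd"
    elif [ -x "/etc/init.d/$unit" ]; then
        # no unit found, but there is an executable init script
        echo "sysv"
    else
        echo "none"
    fi
}

# Usage: solr_service_kind solr
```

If it prints "systemd", running "systemctl cat solr" should show the unit
definition itself.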




>> >> Thank you.  I pasted those settings at the end of my
>> >> /etc/default/solr.in.sh just now and restarted solr.  I will see if
>> >> that fixes it.  Previously, I had no settings at all in solr.in.sh
>> >> except for SOLR_PORT.
>> >>
>> >> On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood <
>> wun...@wunderwood.org>
>> >> wrote:
>> >>
>> >>> 1. You have a tiny heap. 536 Megabytes is not enough.
>> >>> 2. I stopped using the CMS GC years ago.
>> >>>
>> >>> Here is the GC config we use on every one of our 150+ Solr hosts.
>> >>> We’re still on Java 8, but will be upgrading soon.
>> >>>
>> >>> SOLR_HEAP=8g
>> >>> # Use G1 GC  -- wunder 2017-01-23
>> >>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>> >>> GC_TUNE=" \
>> >>> -XX:+UseG1GC \
>> >>> -XX:+ParallelRefProcEnabled \
>> >>> -XX:G1HeapRegionSize=8m \
>> >>> -XX:MaxGCPauseMillis=200 \
>> >>> -XX:+UseLargePages \
>> >>> -XX:+AggressiveOpts \
>> >>> "
>> >>>
>> >>> wunder
>> >>> Walter Underwood
>> >>> wun...@wunderwood.org
>> >>> http://observer.wunderwood.org/  (my blog)
>> >>>
>> >>>> On Jun 11, 2020, at 10:52 AM, Ryan W <rya...@gmail.com> wrote:
>> >>>>
>> >>>> On Wed, Jun 10, 2020 at 8:35 PM Hup Chen <chai...@hotmail.com>
>> wrote:
>> >>>>
>> >>>>> I will check "dmesg" first, to find out any hardware error message.
>> >>>>>
>> >>>>
>> >>>> Here is what I see toward the end of the output from dmesg:
>> >>>>
>> >>>> [1521232.781785] [118857]    48 118857   108785      677     201      901             0 httpd
>> >>>> [1521232.781787] [118860]    48 118860   108785      710     201      881             0 httpd
>> >>>> [1521232.781788] [118862]    48 118862   113063     5256     210      725             0 httpd
>> >>>> [1521232.781790] [118864]    48 118864   114085     6634     212      703             0 httpd
>> >>>> [1521232.781791] [118871]    48 118871   139687    32323     262      620             0 httpd
>> >>>> [1521232.781793] [118873]    48 118873   108785      821     201      792             0 httpd
>> >>>> [1521232.781795] [118879]    48 118879   140263    32719     263      621             0 httpd
>> >>>> [1521232.781796] [118903]    48 118903   108785      812     201      771             0 httpd
>> >>>> [1521232.781798] [118905]    48 118905   113575     5606     211      660             0 httpd
>> >>>> [1521232.781800] [118906]    48 118906   113563     5694     211      626             0 httpd
>> >>>> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
>> >>>> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>> >>>>
>> >>>> Is this a relevant "Out of memory" message?  Does this suggest an OOM
>> >>>> situation is the culprit?
>> >>>>
>> >>>> When I grep in the solr logs for oom, I see some entries like this...
>> >>>>
>> >>>> ./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
>> >>>> -XX:CMSInitiatingOccupancyFraction=50
>> >>>> -XX:CMSMaxAbortablePrecleanTime=6000
>> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>> >>>> -XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
>> >>>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>> >>>> -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
>> >>>> -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
>> >>>> -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
>> >>>> -XX:-OmitStackTraceInFastThrow
>> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
>> >>>> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
>> >>>> -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
>> >>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
>> >>>> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
>> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
>> >>>> -XX:+UseParNewGC
>> >>>>
>> >>>> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh".
>> >>>> But I think this is just a setting that indicates what to do in case
>> >>>> of an OOM.  And if I look in that oom_solr.sh file, I see it would
>> >>>> write an entry to a solr_oom_kill log.  And there is no such log in
>> >>>> the logs directory.
>> >>>>
>> >>>> Many thanks.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>> Then use some system admin tools to monitor that server,
>> >>>>> for instance, top, vmstat, lsof, iostat ... or simply install some
>> nice
>> >>>>> free monitoring tool into this system, like monit, monitorix,
>> nagios.
>> >>>>> Good luck!
>> >>>>>
>> >>>>> ________________________________
>> >>>>> From: Ryan W <rya...@gmail.com>
>> >>>>> Sent: Thursday, June 11, 2020 2:13 AM
>> >>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> >>>>> Subject: Re: How to determine why solr stops running?
>> >>>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> People keep suggesting I check the logs for errors.  What do those
>> >>> errors
>> >>>>> look like?  Does anyone have examples of the text of a Solr oom
>> >>> error?  Or
>> >>>>> the text of any other errors I should be looking for the next time
>> solr
>> >>>>> fails?  Are there phrases I should grep for in the logs?  Should I
>> be
>> >>>>> looking in the Solr logs for an OOM error, or in the Apache logs?
>> >>>>>
>> >>>>> There is nothing failing on the server except for solr -- at least
>> not
>> >>> that
>> >>>>> I can see.  There is no apparent problem with the hardware or
>> anything
>> >>> else
>> >>>>> on the server.  The OS is Red Hat Enterprise Linux. The server has
>> 16
>> >>> GB of
>> >>>>> RAM and hosts one website that does not get a huge amount of
>> traffic.
>> >>>>>
>> >>>>> When the start command is given to solr, does it first check to see
>> if
>> >>> solr
>> >>>>> is running, or does it always start solr whether it is already
>> running
>> >>> or
>> >>>>> not?
>> >>>>>
>> >>>>> Many thanks!
>> >>>>> Ryan
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson <
>> erickerick...@gmail.com
>> >>>>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> To add to what Dave said, if you have a particular machine that’s
>> >>> prone
>> >>>>> to
>> >>>>>> suddenly stopping, that’s usually a red flag that you should
>> seriously
>> >>>>>> think about hardware issues.
>> >>>>>>
>> >>>>>> If the problem strikes different machines, then I agree with Shawn
>> >>> that
>> >>>>>> the first thing I’d be suspicious of is OOM errors.
>> >>>>>>
>> >>>>>> FWIW,
>> >>>>>> Erick
>> >>>>>>
>> >>>>>>> On Jun 9, 2020, at 6:05 AM, Dave <hastings.recurs...@gmail.com>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>> I’ll add that whenever I’ve had a solr instance shut down, for me
>> >>> it’s
>> >>>>>> been a hardware failure. Either the ram or the disk got a “glitch”
>> and
>> >>>>> both
>> >>>>>> of these are relatively fragile and wear and tear type parts of the
>> >>>>>> machine, and should be expected to fail and be replaced from time
>> to
>> >>>>> time.
>> >>>>>> Solr is pretty aggressive with its logging so there are a lot of
>> >>> writes
>> >>>>>> always happening and of course reads, if the disk has any issues or
>> >>> the
>> >>>>>> memory it can lock it up and bring her down, more so if you have
>> any
>> >>>>>> spellcheck dictionaries or suggesters being built on start up.
>> >>>>>>>
>> >>>>>>> Just my experience with this, could be wrong (most likely wrong)
>> but
>> >>> we
>> >>>>>> always have extra drives and memory around the server room for this
>> >>>>>> reason.  At least once or twice a year we will have a disk failure
>> in
>> >>> the
>> >>>>>> raid and need to swap in a new one.
>> >>>>>>>
>> >>>>>>> Good luck though, also solr should be logging its failures so it
>> >>>>>>> would be good to look there too
>> >>>>>>>
>> >>>>>>>> On Jun 9, 2020, at 2:35 AM, Shawn Heisey <apa...@elyograg.org>
>> >>> wrote:
>> >>>>>>>>
>> >>>>>>>> On 5/14/2020 7:22 AM, Ryan W wrote:
>> >>>>>>>>> I manage a site where solr has stopped running a couple times in
>> >>> the
>> >>>>>> past
>> >>>>>>>>> week. The server hasn't been rebooted, so that's not the reason.
>> >>>>> What
>> >>>>>> else
>> >>>>>>>>> causes solr to stop running?  How can I investigate why this is
>> >>>>>> happening?
>> >>>>>>>>
>> >>>>>>>> Any situation where Solr stops running and nobody requested the
>> stop
>> >>>>> is
>> >>>>>> a result of a serious problem that must be thoroughly
>> investigated.  I
>> >>>>>> think it's a bad idea for Solr to automatically restart when it
>> stops
>> >>>>>> unexpectedly.  Chances are that whatever caused the crash is going
>> to
>> >>>>>> simply make the crash happen again until the problem is solved.
>> >>>>>> Automatically restarting could hide problems from the system
>> >>>>> administrator.
>> >>>>>>>>
>> >>>>>>>> The only way a Solr auto-restart would be acceptable to me is if
>> it
>> >>>>>> sends a high priority alert to the sysadmin EVERY time it executes
>> an
>> >>>>>> auto-restart.  It really is that bad of a problem.
>> >>>>>>>>
>> >>>>>>>> The causes of Solr crashes (that I can think of) include the
>> >>>>> following.
>> >>>>>> I believe I have listed these four options from most likely to
>> least
>> >>>>> likely:
>> >>>>>>>>
>> >>>>>>>> * Java OutOfMemoryError exceptions.  On non-windows systems, the
>> >>>>>> "bin/solr" script starts Solr with an option that results in Solr's
>> >>> death
>> >>>>>> anytime one of these exceptions occurs.  We do this because program
>> >>>>>> operation is indeterminate and completely unpredictable when OOME
>> >>> occurs,
>> >>>>>> so it's far safer to stop running.  That exception can be caused by
>> >>>>> several
>> >>>>>> things, some of which actually do not involve memory at all.  If
>> >>> you're
>> >>>>>> running on Windows via the bin\solr.cmd command, then this will not
>> >>>>> happen
>> >>>>>> ... but OOME could still cause a crash, because as I already
>> >>> mentioned,
>> >>>>>> program operation is unpredictable when OOME occurs.
>> >>>>>>>>
>> >>>>>>>> * The OS kills Solr because system memory is completely exhausted
>> >>> and
>> >>>>>> Solr is the process using the most memory.  Linux calls this the
>> >>>>>> "oom-killer" ... I am pretty sure something like it exists on most
>> >>>>>> operating systems.
>> >>>>>>>>
>> >>>>>>>> * Corruption somewhere in the system.  Could be in Java, the OS,
>> >>> Solr,
>> >>>>>> or data used by any of those.
>> >>>>>>>>
>> >>>>>>>> * A very serious bug in Solr's code that we haven't discovered
>> yet.
>> >>>>>>>>
>> >>>>>>>> I included that last one simply for completeness.  A bug that
>> >>> causes a
>> >>>>>> crash *COULD* exist, but as of right now, we have not seen any
>> >>> supporting
>> >>>>>> evidence.
>> >>>>>>>>
>> >>>>>>>> My guess is that Java OutOfMemoryError is the cause here, but I
>> >>> can't
>> >>>>>> be certain.  If that is happening, then some resource (which might
>> >>> not be
>> >>>>>> memory) is fully depleted.  We would need to see the full
>> >>>>> OutOfMemoryError
>> >>>>>> exception in order to determine why it is happening. Sometimes the
>> >>>>>> exception is logged in solr.log, sometimes it isn't.  We cannot
>> >>> predict
>> >>>>>> what part of the code will be running when OOME occurs, so it
>> would be
>> >>>>>> nearly impossible for us to guarantee logging.  OOME can happen
>> >>> ANYWHERE
>> >>>>> -
>> >>>>>> even in code that the compiler thinks is immune to exceptions.
>> >>>>>>>>
>> >>>>>>>> Side note to fellow committers:  I wonder if we should implement
>> an
>> >>>>>> uncaught exception handler in Solr.  I have found in my own
>> programs
>> >>> that
>> >>>>>> it helps figure out thorny problems.  And while I am on the
>> subject of
>> >>>>>> handlers that might not be general knowledge, I didn't find a
>> shutdown
>> >>>>> hook
>> >>>>>> or a security manager outside of tests.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Shawn
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>>
>>
>
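
On the question quoted above about which phrases to grep for: a rough
sketch, using only the patterns and paths that appear in the quoted
messages. The function names are made up for illustration. The first
filters kernel log text down to OOM-killer events; the second lists files
in a log directory that mention a Java OutOfMemoryError.

```shell
# Sketch only. Patterns and paths come from the messages quoted above;
# the function names are hypothetical.

# Filter kernel-log text on stdin down to OOM-killer events.
# Usage: dmesg | oom_kills
oom_kills() {
    grep -E 'Out of memory: Kill process|Killed process'
}

# List files under a log directory that mention a Java OOM.
# -F treats the pattern as a fixed string, so the JVM flag
# "-XX:OnOutOfMemoryError=..." in the GC log does not match.
# Usage: solr_oom_files /opt/solr/server/logs
solr_oom_files() {
    grep -rlF 'java.lang.OutOfMemoryError' "$1" 2>/dev/null
}
```

In the dmesg output quoted above, the process named on the "Killed
process" line is httpd. A line naming java instead would suggest the
kernel killed the Solr JVM.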
