Reading the ZK transaction log could be an issue, as ZK seems to be sensitive to this ( https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#The+Log+Directory )
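For reference, transaction-log placement is controlled by the dataDir and dataLogDir settings in zoo.cfg; pointing dataLogDir at a dedicated device is what the linked guide recommends. A minimal sketch (the paths here are hypothetical examples, not our actual layout):

```properties
# zoo.cfg -- example paths only
dataDir=/var/zookeeper/data            # snapshots and, by default, the txn log
dataLogDir=/mnt/zk-txlog               # move the transaction log to its own device
```

When dataLogDir is unset, the transaction log shares a device with the snapshots, which is exactly the "busy device" case the guide warns about.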
> incorrect placement of transaction log
>
> The most performance critical part of ZooKeeper is the transaction log.
> ZooKeeper syncs transactions to media before it returns a response. A
> dedicated transaction log device is key to consistent good performance.
> Putting the log on a busy device will adversely affect performance. If you
> only have one storage device, put trace files on NFS and increase the
> snapshotCount; it doesn't eliminate the problem, but it should mitigate it.

I am not sure the Solr logs and GC logs were evident from my previous
mail. Re-posting them here for your reference:

Here is the full Solr log file (note that it is in INFO mode):
https://raw.githubusercontent.com/ganeshmailbox/har/master/SolrLogFile

Here is the GC log:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMTAvMy8tLTAxX3NvbHJfZ2MubG9nLjUtLTIxLTE5LTU3

Thanks,
Ganesh

On Fri, Oct 5, 2018 at 10:13 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/5/2018 5:15 AM, Ganesh Sethuraman wrote:
> > 1. Do the GC and Solr logs help explain why the Solr replica server
> > continues to be in the recovering state? Our assumption is that the ZK
> > transaction log read we did on Sept 17 at 16:00 hrs might have caused
> > the issue. Is that correct?
> > 2. Can this state cause slowness in Solr read queries?
> > 3. Is there any way to get notified by email if any replica on the
> > servers gets into recovery mode?
>
> Seeing the GC log and Solr log will allow us to look for problems. It
> won't solve anything by itself; it just lets us examine the situation and
> see if there is any evidence pointing to the root issue and maybe a
> solution.
>
> If you're running with a heap that's too small, you can get into a
> situation where you never actually run out of memory, but the amount of
> available memory is so small that Java must continually run full garbage
> collections to keep enough of it free for the program to stay running.
> This can happen to ANY Java program, including your ZK servers.
> If that happens, the program itself will only be running a small
> percentage of the time, and there will be extremely long pauses where
> very little happens other than garbage collection. Then, when the
> program starts running again, it realizes that its timeouts have been
> exceeded, which in SolrCloud will initiate recovery operations ... and
> that will probably keep the GC pause storm happening.
>
> With an 8 GB heap and likely billions of documents being handled by one
> Solr instance, that low-memory situation I just described seems very
> possible. The solution is to make the heap bigger. Your Solr install
> is very large ... it seems unlikely to me that 8GB would be enough.
> Solr is not typically a memory-hog kind of application when what it is
> asked to do is small; when it is asked to do a bigger job, more memory
> will be required.
>
> Running without sufficient system memory to effectively cache the
> indexes that are actively used can also cause performance problems.
> This is memory *NOT* allocated to programs like Solr, which the OS is
> free to use for caching purposes. With a busy enough server,
> performance problems caused by that can spiral and lead to SolrCloud
> recovery issues.
>
> Thanks,
> Shawn
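On question 3 above (getting notified when a replica enters recovery), one approach is to poll the Collections API CLUSTERSTATUS endpoint and flag any replica whose state is not "active". A minimal sketch in Python, using only the standard library; the base URL is an assumption, and wiring the result into email or another alerting channel is left out:

```python
import json
from urllib.request import urlopen

# Assumed base URL for one node of the cluster -- adjust for your setup.
SOLR_URL = "http://localhost:8983/solr"


def recovering_replicas(cluster_status):
    """Return (collection, shard, replica) tuples for every replica whose
    state is not 'active' in a parsed CLUSTERSTATUS response."""
    flagged = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                if replica.get("state") != "active":
                    flagged.append((coll_name, shard_name, replica_name))
    return flagged


def poll():
    """Fetch CLUSTERSTATUS from a live cluster and report non-active replicas."""
    url = SOLR_URL + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url) as resp:
        return recovering_replicas(json.load(resp))
```

Running poll() from cron and emailing whenever it returns a non-empty list would give a basic recovery alert; dedicated monitoring tools can do the same with less plumbing.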