Hi Tom,

SPM for Solr should be helpful here. See http://sematext.com/spm

Otis

 

> On Nov 13, 2015, at 10:00, Tom Evans <tevans...@googlemail.com> wrote:
> 
> Hi all
> 
> We have some issues with our Solr servers spending too much time
> paused doing GC. By turning on GC debug logging and extracting numbers
> from the GC log, we're getting an idea of just how much of a problem
> it is.
> 
> I'm currently doing this in a hacky, inefficient way:
> 
> grep -h 'Total time for which application threads were stopped:' solr_gc* \
>    | awk '($11 > 0.3) { print $1, $11 }' \
>    | sed 's#:.*:##' \
>    | sort -n \
>    | sum_by_date.py
> 
> (Yes, I really am using sed, grep and awk all in one line. Just wrong :)
> 
> The "sum_by_date.py" program simply adds up all the values with the
> same first column, and remembers the largest value seen. This is
> giving me the cumulative GC time for extended pauses (over 0.5s), and
> the maximum pause seen in a given time period (hourly), e.g.:
> 
> 2015-11-13T11 119.124037 2.203569
> 2015-11-13T12 184.683309 3.156565
> 2015-11-13T13 65.934526 1.978202
> 2015-11-13T14 63.970378 1.411700
> 
> 
> This is fine for seeing that we have a problem. However, I really need
> to get this into our monitoring systems - we use munin. I'm
> struggling to work out the best way to extract this information for
> our monitoring systems, and I think this is down to my naivety about
> Java and not knowing what should be logged.
> 
> I've turned on JMX debugging and am looking at the different beans
> available using jconsole, but I'm drowning in information. What would
> be the best thing to monitor?
> 
> Ideally, like the stats above, I'd like to know the cumulative time
> spent paused in GC since the last poll, and the longest GC pause that
> we see. munin polls every 5 minutes; are there suitable counters
> exposed by JMX that it could extract?
> 
> Thanks in advance
> 
> Tom
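
The sum_by_date.py step in the quoted pipeline is described but not shown. A minimal sketch of what such a script might look like follows. It assumes the input is whitespace-separated "key value" lines on stdin (as the awk and sed stages would produce) and that the output is the key, the sum and the maximum with six decimals, matching the sample output. The function name summarize is hypothetical, not taken from the original script.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of sum_by_date.py: total and maximum value per key."""
import sys


def summarize(lines):
    """Return {key: (total, maximum)} for whitespace-separated 'key value' lines."""
    stats = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        try:
            value = float(parts[1])
        except ValueError:
            continue  # skip lines whose second column is not a number
        key = parts[0]
        total, maximum = stats.get(key, (0.0, 0.0))
        stats[key] = (total + value, max(maximum, value))
    return stats


def main():
    # Keys such as 2015-11-13T11 sort chronologically as plain strings.
    for key, (total, maximum) in sorted(summarize(sys.stdin).items()):
        print(f"{key} {total:.6f} {maximum:.6f}")


if __name__ == "__main__":
    main()
```

Because the script groups by key itself and sorts on output, the "sort -n" stage of the pipeline would not be needed in front of it.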
