I like the idea of running a script vs. kill -9 ;-) Right now we have
monitors that check whether a node is up and serving queries. When a
node fails, that triggers a manual investigation and restart process,
part of which is capturing the logs and the heap dump file. Previously,
the log capture step wasn't scripted into the restart process, so the
logs got wiped out when the restart happened :-(
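
For illustration, the capture step could be folded into the OOM hook
itself, so the logs are copied before anything restarts. This is only a
sketch under assumptions: the function names, the default directories
and the oom-events.log file are all hypothetical, and it assumes the
logs are plain *.log files in a single directory.

```shell
#!/bin/sh
# Hypothetical OOM handler sketch: copy the logs aside, record the event,
# then kill the process so a supervisor can restart it.
# All names and paths here are illustrative assumptions.

# Copy *.log files from log_dir into a timestamped directory under
# archive_dir and print the destination path.
capture_logs() {
    log_dir=$1
    archive_dir=$2
    dest="$archive_dir/oom-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    # Copy rather than move, so the process keeps its open log files.
    cp -p "$log_dir"/*.log "$dest"/ 2>/dev/null
    echo "$dest"
}

# Capture the logs, append one line to an event log, then kill the PID.
handle_oom() {
    pid=$1
    log_dir=$2
    archive_dir=$3
    dest=$(capture_logs "$log_dir" "$archive_dir") || dest="(capture failed)"
    echo "$(date '+%Y-%m-%d %H:%M:%S') OOM in pid $pid; logs copied to $dest" \
        >> "$archive_dir/oom-events.log"
    kill -9 "$pid"
}

# Act only when called with a numeric PID; otherwise just define the functions.
case "${1:-}" in
    ''|*[!0-9]*) ;;
    *) handle_oom "$1" "${LOG_DIR:-/var/log/myapp}" "${ARCHIVE_DIR:-/var/tmp/oom-archive}" ;;
esac
```

It could be wired in by pointing the flag quoted below at the script
instead of at kill, e.g. -XX:OnOutOfMemoryError="/path/to/oom-handler.sh %p"
(the path is a placeholder). The event line written to oom-events.log
also gives a timestamp to search the logs for later.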

One question about this: when you say the script "logs the issue",
what kinds of things do you log? I've been relying on the timestamp of
the heap dump (hprof) file as a way to trace back into our log files.
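
As a rough sketch of that trace-back, the helper below prints the log
lines stamped with the same minute as the heap dump's modification
time. It rests on assumptions: GNU date (for -r FILE), log lines that
contain a "YYYY-MM-DD HH:MM" timestamp, and a hypothetical function
name. The modification time reflects when the dump finished writing,
which may be somewhat later than the OOM itself.

```shell
#!/bin/sh
# Hypothetical helper: show log lines from the minute in which the heap
# dump was last written. Assumes GNU date and "YYYY-MM-DD HH:MM" in the logs.

lines_near_dump() {
    hprof=$1
    logfile=$2
    # date -r FILE prints the file's last modification time (GNU coreutils).
    dump_minute=$(date -r "$hprof" '+%Y-%m-%d %H:%M') || return 1
    # Fixed-string match on the minute prefix, anywhere in the line.
    grep -F -- "$dump_minute" "$logfile"
}
```

Usage would look like: lines_near_dump /path/to/dump.hprof /path/to/app.log
(both paths are placeholders). Widening the window to the preceding
minutes would take a bit more date arithmetic.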

Thanks.
Tim

On Wed, Apr 24, 2013 at 10:03 AM, Mark Miller <markrmil...@gmail.com> wrote:
>
> On Apr 24, 2013, at 12:00 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>>> -XX:OnOutOfMemoryError="kill -9 %p" -XX:+HeapDumpOnOutOfMemoryError
>
> The way I like to handle this is to have the OOM trigger a little script or 
> set of cmds that logs the issue and kills the process.
>
> Then if you have the process supervised (via runit or something), it will 
> just start back up (what else do you do after an OOM?), but you will have 
> logged something, triggered a notification, whatever.
>
> - Mark
