Hi Rich,
   Some progress, but also some questions and comments.

On Sat, Oct 19, 2013 at 6:18 PM, Rich Freeman <ri...@gentoo.org> wrote:
> On Sat, Oct 19, 2013 at 8:25 PM, Mark Knecht <markkne...@gmail.com> wrote:
>> OK, it's a good idea just to have a Konsole terminal open. That might
>> catch something.
>
> I'm not sure if panics show up in konsole.  With a virtual console the
> kernel actually outputs the message.  Konsole under X11 is entirely
> user-mode and I'm not sure that ANY user-mode code can ever run after
> a panic.
>
> I think a virtual console is a better bet.
>

I suspect you're right about Konsole sitting on the KDE desktop. I
only meant that sometimes that catches a few messages and was hopeful
it might do that here but it's certainly not a real solution.

That said, I'm not clear about the virtual console point. I thought the
virtual consoles were Ctrl-Alt-F[1,2,3,...]. When this event occurs my
keyboard isn't working, so I don't know how to get there. You must mean
something else?


>> OK, so I remember years ago debugging something for Ingo Molnar using
>> the serial console, but in those days it was a real serial console on
>> a real serial port. None of my machines have those ports anymore. There
>> must be a more modern version of doing that. I'll go look for info.
>> Ethernet? USB? We've recently moved and the only other machine I've
>> got here at the apartment is a Gentoo laptop.
>
> That you'd have to look into.  I'm not sure if the kernel can handle a
> serial console on a PL2302/etc.  It might - it is all kernel-mode I
> think.  You'd have to attach it to another device running a terminal
> emulator, assuming you don't have a vt100/etc lying around.
>
>> There's a wiki.gentoo.org page here:
>>
>> http://wiki.gentoo.org/wiki/Kernel_Crash_Dumps
>>
>> The setup looks reasonably straightforward so I've reconfigured
>> 3.10.17 following those instructions.
>
> Yeah, I forgot - that was actually started based on my blog entry.
> It may very well have been improved on since.
>

To make progress with /etc/local.d/kdump.start it turns out I also
needed to enable

File Systems -> Pseudo File Systems -> /proc/kcore

The Gentoo wiki only talked about vmcore.
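
A minimal sketch of checking that the relevant options made it into the
kernel config. The helper name check_kdump_config and the default
/usr/src/linux/.config path are assumptions for illustration; the
CONFIG_ symbols are the ones behind kexec, crash dumps, /proc/vmcore
and /proc/kcore.

```shell
# Hypothetical helper: report which kdump-related options are enabled
# in a kernel .config file. The default path is an assumption.
check_kdump_config() {
    config_file="${1:-/usr/src/linux/.config}"
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE CONFIG_PROC_KCORE; do
        # Only a built-in (=y) setting counts as enabled here.
        if grep -q "^${opt}=y" "$config_file"; then
            echo "$opt enabled"
        else
            echo "$opt MISSING"
        fi
    done
}
```

It would be called as: check_kdump_config /usr/src/linux/.config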

<SNIP>
>> When turned on it has options for Panic (Reboot) for both types. Seems
>> like I probably want that all turned on?
>
> You could try setting it to no and see if you actually can capture any
> meaningful logs that way - there is a chance you could recover your
> system without rebooting.  However, a panic would be the only real
> sure way to ensure a dump.
>
> Oh, and don't forget that there is a magic sysrq that triggers a
> panic.  Only issue with that is that you'll have to hunt around for
> whatever caused the actual hangup because it won't be in the panic
> backtrace (that will just lead you to the sysrq code).
>

At this point I'm a bit beyond my depth. If the hang created by
Virtualbox isn't a panic, but my keyboard is completely locked up,
then I don't know how I'm going to issue the magic sysrq to get the
dump process going.

As a test however, with all of this stuff set up, I logged out of KDE,
switched to a console, disabled X and tried

echo c>/proc/sysrq-trigger

I get an error screen and the system reboots. The first time I did it I
had a bunch of disk activity - presumably stuff being copied to either
kcore or vmcore - and then much later X & KDE came up running a single
processor. This seemed like a positive result.
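
Before forcing a crash it may be worth confirming that a capture kernel
is actually loaded, since otherwise the panic just reboots with nothing
to catch the dump. A minimal sketch, assuming the kernel exposes
/sys/kernel/kexec_crash_loaded (it reads 1 once a crash kernel has been
loaded); the helper name crash_kernel_loaded is hypothetical.

```shell
# Hypothetical helper: succeed only if a crash (capture) kernel is loaded.
# The flag file path can be overridden, which also makes this testable.
crash_kernel_loaded() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}
```

Used as a guard, the test above becomes:
crash_kernel_loaded && echo c > /proc/sysrq-trigger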

I then thought maybe I shouldn't start xdm in my init scripts so I
disabled it, rebooted the box, logged in as root and tried again. This
time after the error screen and apparent reboot I gave the machine 45
minutes but never got back to a login screen.

QUESTION 1: This machine has 24GB DRAM. I've set crashkernel=256M and
hoped for the best but don't know if that's a good setting.
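
This does not say whether 256M is enough, but a small sketch can at
least confirm the running kernel saw the parameter. The helper name
show_crashkernel is hypothetical; it only parses the command line.

```shell
# Hypothetical helper: print the crashkernel= parameter from a kernel
# command line file (defaults to /proc/cmdline).
show_crashkernel() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -o 'crashkernel=[^ ]*' "$cmdline_file" || echo "crashkernel not set"
}
```

If the kernel exposes /sys/kernel/kexec_crash_size, reading it should
report the reserved size in bytes, which would be 268435456 for 256M.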

QUESTION 2: Am I correct that the captured dump is going to be a file
of roughly 24GB? Maybe saving it takes hours because it's that big and
I'm presumably writing it to a RAID6, which does a lot of parity
calculations, all in single-processor mode. Is there a way to estimate
how long I'd have to wait to even get to a login prompt?
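
A back-of-the-envelope sketch of that estimate: dump size divided by
sustained write rate. The helper name and the 50 MiB/s figure below are
purely illustrative assumptions, not measurements of this RAID6, and
the sketch assumes an uncompressed, unfiltered dump the size of RAM.

```shell
# Rough estimate: whole seconds needed to write SIZE_GIB gibibytes
# at RATE_MIBS mebibytes per second (integer arithmetic, rounds down).
estimate_dump_seconds() {
    size_gib="$1"
    rate_mibs="$2"
    echo $(( size_gib * 1024 / rate_mibs ))
}
```

For example, estimate_dump_seconds 24 50 prints 491, or about 8
minutes. Run the other way, a full 24GB copy that is still going after
45 minutes would imply a sustained rate below roughly 9 MiB/s, or that
the copy is not actually running.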

So far I've been unable to save anything from /proc/vmcore.
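
For reference, the save step itself can be as simple as the sketch
below. This is not the wiki's kdump.start script; the helper name
save_vmcore and the /var/crash destination are assumptions. Note that
/proc/vmcore only exists when running in the capture kernel.

```shell
# Hypothetical sketch: copy a crash dump to disk if one is available.
# Arguments: source (default /proc/vmcore), destination directory.
save_vmcore() {
    src="${1:-/proc/vmcore}"
    dest_dir="${2:-/var/crash}"
    # No source file means we are not in the capture kernel.
    [ -e "$src" ] || { echo "no dump at $src"; return 1; }
    mkdir -p "$dest_dir" &&
        cp "$src" "$dest_dir/vmcore-$(date +%Y%m%d-%H%M%S)"
}
```

A local.d script could call save_vmcore and reboot only if it succeeds.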

>> As I expected about the logs. If the machine's dead then I don't want
>> stuff getting written to disk anyway. kdump sounds like the best
>> solution going right now. I'll try and see if I can get it working.
>
> Yeah - one of these days I'll see if I can get kdump working again.
> What it really needs is an initramfs that will automatically capture
> the dump and reboot.  That's how other distros handle it.  The dumps
> are pretty big though - the size of your RAM.

That would be helpful certainly.

Thanks for all the info and support!

Cheers,
Mark
