Hi Rich,

Some progress, but questions/comments also.

On Sat, Oct 19, 2013 at 6:18 PM, Rich Freeman <ri...@gentoo.org> wrote:
> On Sat, Oct 19, 2013 at 8:25 PM, Mark Knecht <markkne...@gmail.com> wrote:
>> OK, it's a good idea just to have a Konsole terminal open. That might
>> catch something.
>
> I'm not sure if panics show up in konsole. With a virtual console the
> kernel actually outputs the message. Konsole under X11 is entirely
> user-mode and I'm not sure that ANY user-mode code can ever run after
> a panic.
>
> I think a virtual console is a better bet.
>
I suspect you're right about Konsole sitting on the KDE desktop. I only
meant that it sometimes catches a few messages and I was hopeful it might
do that here, but it's certainly not a real solution.

That said, I'm not clear about the virtual console point. I thought the
virtual consoles were Ctrl-Alt-F1, F2, F3, etc. When this event occurs my
keyboard isn't working, so I don't know how to get there. You must mean
something else?

>> OK, so I remember years ago debugging something for Ingo Molnar using
>> the serial console, but in those days it was a real serial console on
>> a real serial port. None of my machines have those ports anymore. There
>> must be a more modern way of doing that. I'll go look for info.
>> Ethernet? USB? We've recently moved and the only other machine I've
>> got here at the apartment is a Gentoo laptop.
>
> That you'd have to look into. I'm not sure if the kernel can handle a
> serial console on a PL2303/etc. It might - it is all kernel-mode I
> think. You'd have to attach it to another device running a terminal
> emulator, assuming you don't have a vt100/etc lying around.
>
>> There's a wiki.gentoo.org page here:
>>
>> http://wiki.gentoo.org/wiki/Kernel_Crash_Dumps
>>
>> The setup looks reasonably straightforward so I've reconfigured
>> 3.10.17 following those instructions.
>
> Yeah, I forgot - that was actually started based on my blog entry.
> It may very well have been improved on since.
>

To make progress with /etc/local.d/kdump.start it turns out I also
needed to enable

    File Systems -> Pseudo File Systems -> /proc/kcore

The Gentoo wiki only talked about vmcore.

<SNIP>

>> When turned on it has options for Panic (Reboot) for both types. Seems
>> like I probably want that all turned on?
>
> You could try setting it to no and see if you actually can capture any
> meaningful logs that way - there is a chance you could recover your
> system without rebooting. However, a panic would be the only real
> sure way to ensure a dump.
>
> Oh, and don't forget that there is a magic sysrq that triggers a
> panic. Only issue with that is that you'll have to hunt around for
> whatever caused the actual hangup because it won't be in the panic
> backtrace (that will just lead you to the sysrq code).
>

At this point I'm a bit out of my depth. If the hang created by
VirtualBox isn't a panic, but my keyboard is completely locked up, then
I don't know how I'm going to issue the magic SysRq to get the dump
process going.

As a test, however, with all of this set up, I logged out of KDE,
switched to a console, disabled X and tried

    echo c > /proc/sysrq-trigger

I got an error screen and the system rebooted. The first time I did it
there was a lot of disk activity, presumably stuff being copied to
either kcore or vmcore, and then much later X and KDE came up running on
a single processor. This seemed like a positive result.

I then thought maybe I shouldn't start xdm from my init scripts, so I
disabled it, rebooted the box, logged in as root and tried again. This
time, after the error screen and apparent reboot, I gave the machine 45
minutes but never got back to a login screen.

QUESTION 1: This machine has 24GB of DRAM. I've set crashkernel=256M
and hoped for the best, but I don't know if that's a good setting.

QUESTION 2: Am I correct that the captured dump is going to be a file of
roughly 24GB? Maybe this takes hours, given that it's that big and I'm
presumably saving it to a RAID6, which is doing a lot more parity
calculations, all in single-processor mode. Is there a way to estimate
how long I'd have to wait to even get to a login prompt? So far I've
been unable to save anything from /proc/vmcore.

>> As I expected about the logs. If the machine's dead then I don't want
>> stuff getting written to disk anyway. kdump sounds like the best
>> solution going right now. I'll try and see if I can get it working.
>
> Yeah - one of these days I'll see if I can get kdump working again.
> What it really needs is an initramfs that will automatically capture
> the dump and reboot. That's how other distros handle it. The dumps
> are pretty big though - the size of your RAM.

That would certainly be helpful.

Thanks for all the info and support!

Cheers,
Mark
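
P.S. For the archives, here are the kernel options this thread touched
on, gathered into one .config fragment. It's only a sketch, assuming a
3.10-era x86_64 kernel. CONFIG_PROC_KCORE and CONFIG_PROC_VMCORE are the
two /proc entries mentioned above. The kexec, crash-dump, relocatable,
SysRq and lockup-panic lines are my assumption of what the wiki setup
and the "Panic (Reboot)" menu map to, so check them in menuconfig before
relying on them.

```shell
# Sketch of kdump-related kernel options (3.10-era x86_64 assumed).
# Let the running kernel load and boot a capture kernel on panic.
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
# Assumption: a relocatable kernel lets the same image also serve as
# the capture kernel inside the crashkernel= reservation.
CONFIG_RELOCATABLE=y
# File Systems -> Pseudo File Systems: /proc/vmcore (the dump itself)
# and /proc/kcore (which kdump.start also turned out to need here).
CONFIG_PROC_VMCORE=y
CONFIG_PROC_KCORE=y
# Magic SysRq, so "echo c > /proc/sysrq-trigger" can force a crash.
CONFIG_MAGIC_SYSRQ=y
# Assumption: these are the "Panic (Reboot)" choices for hard and
# soft lockups under the lockup detector.
CONFIG_LOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
# Not a .config option: memory for the capture kernel is reserved on
# the kernel command line, e.g. crashkernel=256M as used above.
```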