Re: was: CURRENT [r308087] still crashing: Backtrace provided

O. Hartmann Sat, 05 Nov 2016 10:46:01 -0700

On Sun, 30 Oct 2016 11:25:09 -0700,
Mark Johnston <ma...@freebsd.org> wrote:


> On Sun, Oct 30, 2016 at 06:55:00PM +0100, O. Hartmann wrote:
> > On Sun, 30 Oct 2016 09:39:34 -0700,
> > Mark Johnston <ma...@freebsd.org> wrote:
> >   
> > > On Sun, Oct 30, 2016 at 08:25:25AM +0100, O. Hartmann wrote:  
> > > > On Sat, 29 Oct 2016 18:33:45 -0700,
> > > > Mark Johnston <ma...@freebsd.org> wrote:
> > > >     
> > > > > On Sat, Oct 29, 2016 at 04:33:36PM +0200, O. Hartmann wrote:    
> > > > > > On Sun, 23 Oct 2016 15:18:57 -0400 (EDT),
> > > > > > Benjamin Kaduk <ka...@mit.edu> wrote:
> > > > > >       
> > > > > > > On Sun, 23 Oct 2016, O. Hartmann wrote:
> > > > > > >       
> > > > > > > > How can I track a memory leak?        
> > > > > > > 
> > > > > > > > I think I did not read enough of the context, but vmstat and top
> > > > > > > > can track memory usage as a general thing.
> > > > > > >       
> > > > > > > > > How can I write to disk the backtrace given by the debugger when
> > > > > > > > > crashing? The box I can freely test on uses the nVidia BLOB and
> > > > > > > > > vt(), so I cannot see the backtrace. I got a very bad screenshot
> > > > > > > > > on one of my laptops, but it is so ugly/unreadable that I think
> > > > > > > > > it is unusable for this list at a reasonable size (200 kB max is
> > > > > > > > > too small).
> > > > > > > 
> > > > > > > > The backtrace should be part of the crash dump that is written to
> > > > > > > > the (directly connected, non-encrypted, non-USB) swap device.
> > > > > > > > "call doadump" at the debugger prompt (even typing blind) is
> > > > > > > > supposed to make sure there's a dump taken.
> > > > > > > 
> > > > > > > > With respect to the screenshot, you should be able to post the
> > > > > > > > image on an external site and send a link to the list, at least.
> > > > > > > 
> > > > > > > -Ben
> > > > > > 
> > > > > > Hello Benjamin,
> > > > > > 
> > > > > > > thank you for your response. Attached, you'll find the backtrace
> > > > > > > the developers seem to have requested. It was a bit hard, since the
> > > > > > > combination of FreeBSD, vt() and nVidia is broken (black or
> > > > > > > distorted console; on UEFI it is black/locked as long as the
> > > > > > > nvidia-modeset.ko module is loaded). I figured out that I could
> > > > > > > blindly type "dump" once the box had crashed and was sitting at the
> > > > > > > debugger prompt.
> > > > > > 
> > > > > > > I hope that this time I can provide the help needed to fix this
> > > > > > > really nasty problem. On more recent hardware, Haswell and beyond,
> > > > > > > I was able to run CURRENT, even with ZFS and poudriere under heavy
> > > > > > > memory pressure, for three days without a crash. On older machines,
> > > > > > > one older Fujitsu dual socket Core2Duo XEON (2x 4 core, 2x 16 GB
> > > > > > > RAM banks) as well as two of my private boxes (1x IvyBridge XEON,
> > > > > > > one i3-3220, both with a non-UEFI-working ASROCK Z77 Pro4 board),
> > > > > > > FreeBSD crashes if it is newer than r307157. Staying at r307157 on
> > > > > > > those systems leaves the machines "rock-solid" - the XEON box now
> > > > > > > has a week of uptime.
> > > > > 
> > > > > In kgdb, could you execute:
> > > > > 
> > > > > (kgdb) frame 12
> > > > > (kgdb) p *ifp
> > > > > (kgdb) p *ro
> > > > > 
> > > > > and reply with the output?    
> > > > 
> > > > Besides, is there any way to investigate the crashed vmcore.X files?    
> > > 
> > > Besides examining the state contained in the vmcore? Not really.  
> > 
> > Just so that I do not misunderstand you (I'm not familiar with
> > debugging!): I can investigate the saved core files (vmcore.X) with kgdb?
> > My first attempts failed: I simply referred to the specific vmcore.0 via
> > option -n 0 and typed the commands as requested above, but the output
> > looked like an error to me.
> 
> Oh, sorry. Indeed, you should be able to execute
> 
> # kgdb $(sysctl -n kern.bootfile) vmcore.0
> 
> to open the core with kgdb.
> 
> > 
> > 
> >   
> > > 
> > > Based on the stack trace and affected range of revisions, it may be that
> > > reverting r307887 or r307234 helps, but I have no specific evidence for
> > > this without the requested output.  
> > 
> > So far I have also had crashes with revisions > r307300, so that leaves me
> > with r307233 ... I will continue with that revision and report back.
> 
> Hm, I don't see why this excludes r307887? In any case, r307234 looks to
> be the more likely culprit.

Here I am again.

This time, it was r308329 or r308331. WITHOUT the debug options compiled into
the kernel, it took approximately 5 minutes to provoke the crash. WITH the
debug options set, it took more than 45 minutes until the system dumped core.
I really hope we can fix the problem this time. For the moment, I have put the
system back to r307233 to see whether r307234 is causing the crash, as you
suspect.

Attached, you'll find the backtrace report, as last time. I had to type in
"dump" blindly on the system, as a dark screen or a stuck X11 screen blocked
the console (I use vt() and the nVidia BLOB with my nVidia GPUs - and this is
still broken on FreeBSD).

Please let me know how I can assist further. I saved both the core AND, this
time, the culprit kernel.
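
For reference, here is a minimal sketch of how I understand the saved core and
kernel could be examined to produce the output requested earlier (frame 12,
*ifp, *ro). The paths below are hypothetical placeholders for where the culprit
kernel and the core were saved, and feeding the commands to kgdb on standard
input is an assumption on my side, not something confirmed in this thread.
Since the box is now back at r307233, the running kernel reported by
kern.bootfile would presumably no longer match the core, so the sketch points
at the saved kernel instead.

```shell
#!/bin/sh
# Hypothetical locations; adjust to wherever the crashed kernel and the
# core were actually saved.
KERNEL="${KERNEL:-/boot/kernel.crashed/kernel}"
VMCORE="${VMCORE:-/var/crash/vmcore.0}"
CMDS="$(mktemp)"

# Write the commands requested earlier in the thread, one per line.
write_kgdb_commands() {
    cat > "$1" <<'EOF'
frame 12
p *ifp
p *ro
quit
EOF
}

write_kgdb_commands "$CMDS"

# Run kgdb only if it is installed and the core is readable; otherwise
# just print the command line that would have been used.
if command -v kgdb >/dev/null 2>&1 && [ -r "$VMCORE" ]; then
    kgdb "$KERNEL" "$VMCORE" < "$CMDS"
else
    echo "would run: kgdb $KERNEL $VMCORE < $CMDS"
fi
```

Redirecting the output of such a run into a file should give text that can be
sent to the list, instead of having to rely on a screenshot.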

Kind regards,

Oliver

core.txt.0
Description: Binary data
