On 2017/10/21 14:52, Tim Stewart wrote:
> Stuart Henderson <s...@spacehopper.org> writes:
>
> > On 2017/10/21 12:04, Tim Stewart wrote:
> >> *49727 296965 0 0 7 0x14200 crynlk
> >
> > aha, it was that one. Try this diff on top.
> >
> > Index: fpu.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/fpu.c,v
> > retrieving revision 1.38
> > diff -u -p -r1.38 fpu.c
> > --- fpu.c 14 Oct 2017 04:44:43 -0000 1.38
> > +++ fpu.c 21 Oct 2017 16:16:14 -0000
> > @@ -347,7 +347,7 @@ void
> >  fpu_kernel_enter(void)
> >  {
> >      struct cpu_info *ci = curcpu();
> > -    uint32_t cw;
> > +    struct savefpu *sfp;
> >      int s;
> >
> >      /*
> > @@ -376,10 +376,11 @@ fpu_kernel_enter(void)
> >
> >      /* Initialize the FPU */
> >      fninit();
> > -    cw = __INITIAL_NPXCW__;
> > -    fldcw(&cw);
> > -    cw = __INITIAL_MXCSR__;
> > -    ldmxcsr(&cw);
> > +    sfp = &proc0.p_addr->u_pcb.pcb_savefpu;
> > +    memset(&sfp->fp_fxsave, 0, sizeof(sfp->fp_fxsave));
> > +    sfp->fp_fxsave.fx_fcw = __INITIAL_NPXCW__;
> > +    sfp->fp_fxsave.fx_mxcsr = __INITIAL_MXCSR__;
> > +    fxrstor(&sfp->fp_fxsave);
> >  }
> >
> >  void
>
> I've been running with this additional patch for a couple of hours and
> the hang has not reappeared. I'll keep the system active and confirm
> that everything looks good tomorrow.
>
> I swear I've seen this patch before on a list but can't find the
> original. Can someone give me or point me at some context, so I know
> what I've just done? :)

Diff is from mikeb. It initializes the fpu more completely; we suspect
something in the userland state wasn't getting cleared when entering the
kernel. I saw some problems with aes-ni show up after the "Correctly
handle exceptions when restoring an invalid FPU context" commit (aes-ni
uses floating point registers).
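
To make "more completely" a bit more concrete, here is a rough userland
sketch of the idea, not the kernel code itself. The old code only reloaded
the x87 control word and MXCSR, so anything else left in the register file
by userland stayed as it was. The new code builds a zeroed save area, fills
in just those two fields, and restores the whole thing. The struct and
function names below are made up for illustration, the layout is the 64-bit
FXSAVE format, and the two constants are the architectural defaults, which
I am assuming match __INITIAL_NPXCW__ and __INITIAL_MXCSR__.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural defaults; assumed to match the kernel's macros. */
#define SKETCH_INITIAL_FCW   0x037f  /* x87 control word after FNINIT */
#define SKETCH_INITIAL_MXCSR 0x1f80  /* MXCSR reset value, exceptions masked */

/*
 * Hypothetical model of the 512-byte FXSAVE area (64-bit layout).
 * A real FXRSTOR operand must also be 16-byte aligned.
 */
struct fxsave_image {
    uint16_t fcw;         /* offset  0: x87 control word */
    uint16_t fsw;         /* offset  2: x87 status word */
    uint8_t  ftw;         /* offset  4: abridged tag word */
    uint8_t  reserved0;   /* offset  5 */
    uint16_t fop;         /* offset  6: last x87 opcode */
    uint64_t rip;         /* offset  8: last x87 instruction pointer */
    uint64_t rdp;         /* offset 16: last x87 data pointer */
    uint32_t mxcsr;       /* offset 24: SSE control/status */
    uint32_t mxcsr_mask;  /* offset 28 */
    uint8_t  regs[480];   /* offset 32: ST/MM, XMM and reserved bytes */
};

static_assert(sizeof(struct fxsave_image) == 512, "FXSAVE area is 512 bytes");
static_assert(offsetof(struct fxsave_image, mxcsr) == 24, "MXCSR at offset 24");

/* Build a clean image: everything zero except the two control fields. */
void
fxsave_image_init(struct fxsave_image *img)
{
    memset(img, 0, sizeof(*img));
    img->fcw = SKETCH_INITIAL_FCW;
    img->mxcsr = SKETCH_INITIAL_MXCSR;
}

/* Return 1 if no stale status, tag or register contents remain. */
int
fxsave_image_is_clean(const struct fxsave_image *img)
{
    size_t i;

    if (img->fsw != 0 || img->ftw != 0)
        return 0;
    for (i = 0; i < sizeof(img->regs); i++)
        if (img->regs[i] != 0)
            return 0;
    return 1;
}
```

In the kernel the last step is fxrstor() on that image, which loads the
x87 and XMM registers in one go instead of touching only the two control
registers.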