Stuart Henderson <s...@spacehopper.org> writes:

> On 2017/10/21 14:52, Tim Stewart wrote:
>> Stuart Henderson <s...@spacehopper.org> writes:
>>
>> > On 2017/10/21 12:04, Tim Stewart wrote:
>> >> *49727  296965      0      0  7     0x14200                crynlk
>> >
>> > aha, it was that one. Try this diff on top.
>> >
>> > Index: fpu.c
>> > ===================================================================
>> > RCS file: /cvs/src/sys/arch/amd64/amd64/fpu.c,v
>> > retrieving revision 1.38
>> > diff -u -p -r1.38 fpu.c
>> > --- fpu.c  14 Oct 2017 04:44:43 -0000      1.38
>> > +++ fpu.c  21 Oct 2017 16:16:14 -0000
>> > @@ -347,7 +347,7 @@ void
>> >  fpu_kernel_enter(void)
>> >  {
>> >    struct cpu_info *ci = curcpu();
>> > -  uint32_t         cw;
>> > +  struct savefpu  *sfp;
>> >    int              s;
>> >
>> >    /*
>> > @@ -376,10 +376,11 @@ fpu_kernel_enter(void)
>> >
>> >    /* Initialize the FPU */
>> >    fninit();
>> > -  cw = __INITIAL_NPXCW__;
>> > -  fldcw(&cw);
>> > -  cw = __INITIAL_MXCSR__;
>> > -  ldmxcsr(&cw);
>> > +  sfp = &proc0.p_addr->u_pcb.pcb_savefpu;
>> > +  memset(&sfp->fp_fxsave, 0, sizeof(sfp->fp_fxsave));
>> > +  sfp->fp_fxsave.fx_fcw = __INITIAL_NPXCW__;
>> > +  sfp->fp_fxsave.fx_mxcsr = __INITIAL_MXCSR__;
>> > +  fxrstor(&sfp->fp_fxsave);
>> >  }
>> >
>> >  void
>>
>> I've been running with this additional patch for a couple of hours and
>> the hang has not reappeared.  I'll keep the system active and confirm
>> that everything looks good tomorrow.
>>
>> I swear I've seen this patch before on a list but can't find the
>> original.  Can someone give me or point me at some context, so I know
>> what I've just done? :)
>
> Diff is from mikeb. It initializes the FPU more completely; we suspect
> something in the userland state wasn't getting cleared when entering the
> kernel. I saw some problems with aes-ni after the "Correctly handle
> exceptions when restoring an invalid FPU context" commit (aes-ni uses
> floating point registers).

I see.  So I'm guessing that an "unlocked" IPsec is more likely to hit this
bug because it's using AES-NI outside of KERNEL_LOCK() now?  Am I close?
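
To check my own understanding of the quoted diff, here is a minimal
userland sketch of the "build a clean save area, then restore it" idea.
Everything named sketch_* / SKETCH_* is hypothetical and not the kernel's
code: the struct only mirrors the architectural 512-byte FXSAVE layout,
the two constants are the x86 reset defaults (0x037f for the x87 control
word, 0x1f80 for MXCSR) standing in for __INITIAL_NPXCW__ and
__INITIAL_MXCSR__, and the final fxrstor step is deliberately left out so
the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for __INITIAL_NPXCW__ / __INITIAL_MXCSR__ (x86 defaults). */
#define SKETCH_INITIAL_NPXCW 0x037f
#define SKETCH_INITIAL_MXCSR 0x1f80

/* Hypothetical mirror of the 64-bit FXSAVE area (512 bytes). */
struct sketch_fxsave64 {
	uint16_t fx_fcw;        /* x87 control word */
	uint16_t fx_fsw;        /* x87 status word */
	uint8_t  fx_ftw;        /* abridged tag word */
	uint8_t  fx_unused1;
	uint16_t fx_fop;        /* last opcode */
	uint64_t fx_rip;        /* last instruction pointer */
	uint64_t fx_rdp;        /* last data pointer */
	uint32_t fx_mxcsr;      /* SSE control/status */
	uint32_t fx_mxcsr_mask;
	uint64_t fx_st[8][2];   /* x87 / MMX registers */
	uint64_t fx_xmm[16][2]; /* SSE registers (used by AES-NI) */
	uint8_t  fx_unused3[96];
};

_Static_assert(sizeof(struct sketch_fxsave64) == 512, "FXSAVE area is 512 bytes");
_Static_assert(offsetof(struct sketch_fxsave64, fx_mxcsr) == 24, "MXCSR at offset 24");

/*
 * Zero the whole area, then set only the two control words.  Restoring
 * such an area would replace every register, not just the control words,
 * which is the difference from the old fldcw/ldmxcsr sequence.
 */
void
sketch_fxsave_init(struct sketch_fxsave64 *fx)
{
	assert(fx != NULL);
	memset(fx, 0, sizeof(*fx));
	fx->fx_fcw = SKETCH_INITIAL_NPXCW;
	fx->fx_mxcsr = SKETCH_INITIAL_MXCSR;
}

/* Return 1 if no stale x87/SSE register contents remain in the area. */
int
sketch_fxsave_is_clean(const struct sketch_fxsave64 *fx)
{
	size_t i;

	for (i = 0; i < 8; i++)
		if (fx->fx_st[i][0] != 0 || fx->fx_st[i][1] != 0)
			return 0;
	for (i = 0; i < 16; i++)
		if (fx->fx_xmm[i][0] != 0 || fx->fx_xmm[i][1] != 0)
			return 0;
	return 1;
}
```

If I read the diff right, the old code reset only the control words after
fninit(), while the new code loads an entire known-good image, so nothing
left behind by userland survives into the kernel's use of the FPU.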

-TimS

--
Tim Stewart
-----------
Mail:   t...@stoo.org
Matrix: @tim:stoo.org
