On Mon, Sep 23, 2019 at 03:42:27PM +0000, James Dingwall wrote:
> On Thu, Sep 19, 2019 at 12:37:40PM -0400, Boris Ostrovsky wrote:
> > On 9/19/19 12:14 PM, James Dingwall wrote:
> > > On Thu, Sep 19, 2019 at 03:51:33PM +0000, Luck, Tony wrote:
> > >>> I have been investigating a regression in our environment where pstore 
> > >>> (efi-pstore specifically but I suspect this would affect all 
> > >>> implementations) no longer works after upgrading from a 4.4 to 5.0 
> > >>> kernel when running under xen.  (This is an Ubuntu kernel but I don't 
> > >>> think there are patches which affect this area.)
> > >> I don't have any answer for this ... but want to throw out the idea that
> > >> VMM systems could provide some hypercalls to guests to save/return
> > >> some blob of memory (perhaps the "save" triggers automagically if the
> > >> guest crashes?).
> > >>
> > >> That would provide a much better pstore back end than relying on
> > >> emulation of EFI persistent variables (which have severe constraints
> > >> on size, and don't support some pstore modes because you can't
> > >> dynamically update EFI variables hundreds of times per second).
> > >>
> > > For clarification this is a dom0 crash rather than an HVM guest with EFI.
> > > I should probably have also mentioned the Xen version has changed from
> > > 4.8.4 to 4.11.2 in case its behaviour on detection of a crashed domain
> > > has changed.
> > >
> > > (For capturing guest crashes we have enabled xenconsole logging so the
> > > hvc0 log is available in dom0.)
> > 
> > 
> > Do you only see this difference between 4.4 and 5.0 when you crash via
> > sysrq?
> > 
> > Because that's where things changed. On 4.4 we seem to be forcing an
> > oops, which eventually calls kmsg_dump() and then panic(). On 5.0 we call
> > panic() directly from the sysrq handler. And because Xen's panic notifier
> > doesn't return we never get a chance to call kmsg_dump().
> > 
> 
> Ok, I see that change in 8341f2f222d729688014ce8306727fdb9798d37e.  I 
> hadn't tested it any other way before.  Using the null pointer 
> de-reference module code at [1] a pstore record is generated as expected 
> when the module is loaded (panic_on_oops=1).

This change looks correct -- it just gets us directly to the panic()
state instead of exercising the various exception handlers.

> I have also tested swapping the kmsg_dump() / 
> atomic_notifier_call_chain() around in panic.c and this also results in 
> a pstore record being created with sysrq-c.  I don't know if that would 
> be an acceptable solution though since it may break behaviour that other 
> things depend on.

I don't think reordering these is a good idea: as the comments say,
there might be work done in the notifier chain that kmsg_dump() will
want to capture (e.g. the KASLR base offset).

The situation seems to be that panic notifier callbacks must return so
that panic() can go on to call kmsg_dump() -- I think Xen's notifier
needs fixing here.
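
To illustrate the ordering problem, here is a minimal user-space sketch
(not kernel code). It assumes a simplified model in which panic() walks
the notifier chain first and only afterwards dumps the log. All names in
it (model_panic, xen_like_notifier, and so on) are hypothetical, and a
notifier that "never returns" is modelled by a return value rather than
by actually halting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a notifier either hands control back or keeps it. */
enum step { STEP_RETURNS, STEP_NEVER_RETURNS };
typedef enum step (*panic_notifier)(void);

struct panic_result {
    int  notifiers_run;  /* callbacks entered before control was lost */
    bool kmsg_dumped;    /* stands in for kmsg_dump() having been reached */
};

/* Models a notifier that records something useful and returns. */
static enum step offset_notifier(void) { return STEP_RETURNS; }

/* Models a notifier that shuts the domain down and never comes back. */
static enum step xen_like_notifier(void) { return STEP_NEVER_RETURNS; }

/* Models the same notifier changed so that it returns to the caller. */
static enum step returning_notifier(void) { return STEP_RETURNS; }

/* Walk the chain first; the dump step is reached only if all return. */
static struct panic_result model_panic(const panic_notifier *chain, size_t n)
{
    struct panic_result r = { 0, false };

    for (size_t i = 0; i < n; i++) {
        assert(chain[i] != NULL);
        r.notifiers_run++;
        if (chain[i]() == STEP_NEVER_RETURNS)
            return r;    /* control never gets back to the dump step */
    }
    r.kmsg_dumped = true;
    return r;
}

static const panic_notifier chain_xen_like[]   = { offset_notifier, xen_like_notifier };
static const panic_notifier chain_all_return[] = { offset_notifier, returning_notifier };
```

In this model, model_panic(chain_xen_like, 2) leaves kmsg_dumped false,
while model_panic(chain_all_return, 2) sets it, without changing the
order of the two steps.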

-- 
Kees Cook
