Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

Roger Pau Monné Wed, 29 Dec 2021 00:33:04 -0800

Adding xen-devel back.

On Wed, Dec 29, 2021 at 01:44:18AM +0800, G.R. wrote:
> On Tue, Dec 28, 2021 at 3:05 AM Roger Pau Monné <[email protected]> wrote:
> >
> > On Sun, Dec 26, 2021 at 02:06:55AM +0800, G.R. wrote:
> > > > > Thanks. I've raised this on freensd-net for advice [0]. IMO netfront
> > > > > shouldn't receive an mbuf that crosses a page boundary, but if that's
> > > > > indeed a legit mbuf I will figure out the best way to handle it.
> > > > >
> > > > > I have a clumsy patch (below) that might solve this, if you want to
> > > > > give it a try.
> > > >
> > > > Applied the patch and it worked like a charm!
> > > > Thank you so much for your quick help!
> > > > Wish you a wonderful holiday!
> > >
> > > I may have said too quickly...
> > > With the patch I can attach the iscsi disk and neither the dom0 nor
> > > the NAS domU complains this time.
> > > But when I attempt to mount the attached disk it reports I/O errors 
> > > randomly.
> > > By randomly I mean different disks behave differently...
> > > I don't see any error logs from kernels this time.
> > > (most of the iscsi disks are NTFS FS and mounted through the user mode
> > > fuse library)
> > > But since I have a local backup copy of the image, I can confirm that
> > > mounting that backup image does not result in any I/O error.
> > > Looks like something is still broken here...
> >
> > Indeed. That patch was likely too simple, and didn't properly handle
> > the split of mbuf data buffers.
> >
> > I have another version based on using sglist, which I think it's also
> > a worthwhile change for netfront. Can you please give it a try? I've
> > done a very simple test and seems fine, but you certainly have more
> > interesting cases.
> >
> > You will have to apply it on top of a clean tree, without any of the
> > other patches applied.
> 
> Unfortunately this new version is even worse.
> It not only does not fix the known issue on iSCSI, but also creating
> regression on NFS.
> The regression on NFS is kind of random that it takes a
> non-deterministic time to show up.
> Here is a stack trace for reference:
> db:0:kdb.enter.default>  bt
> Tracing pid 1696 tid 100622 td 0xfffff800883d5740
> kdb_enter() at kdb_enter+0x37/frame 0xfffffe009f80d900
> vpanic() at vpanic+0x197/frame 0xfffffe009f80d950
> panic() at panic+0x43/frame 0xfffffe009f80d9b0
> xn_txq_mq_start_locked() at xn_txq_mq_start_locked+0x5bc/frame
> 0xfffffe009f80da50


I think this is hitting a KASSERT, could you paste the text printed as
part of the panic (not just he backtrace)?

Sorry this is taking a bit of time to solve.

Thanks!

Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

Reply via email to