On Sat, Jan 08, 2022 at 01:14:26AM +0800, G.R. wrote:
> On Wed, Jan 5, 2022 at 10:33 PM Roger Pau Monné <[email protected]> wrote:
> >
> > On Wed, Jan 05, 2022 at 12:05:39AM +0800, G.R. wrote:
> > > > > > > But seems like this patch is not stable enough yet and has its own
> > > > > > > issue -- memory is not properly released?
> > > > > >
> > > > > > I know. I've been working on improving it this morning and I'm
> > > > > > attaching an updated version below.
> > > > > >
> > > > > Good news.
> > > > > With this  new patch, the NAS domU can serve iSCSI disk without OOM
> > > > > panic, at least for a little while.
> > > > > I'm going to keep it up and running for a while to see if it's stable 
> > > > > over time.
> > > >
> > > > Thanks again for all the testing. Do you see any difference
> > > > performance wise?
> > > I'm still on a *debug* kernel build to capture any potential panic --
> > > none so far -- no performance testing yet.
> > > Since I'm a home user with a relatively lightweight workload, so far I
> > > didn't observe any difference in daily usage.
> > >
> > > I did some quick iperf3 testing just now.
> >
> > Thanks for doing this.
> >
> > > 1. between nas domU <=> Linux dom0 running on an old i7-3770 based box.
> > > The peak is roughly 12 Gbits/s when domU is the server.
> > > But I do see regression down to ~8.5 Gbits/s when I repeat the test in
> > > a short burst.
> > > The regression can recover when I leave the system idle for a while.
> > >
> > > When dom0 is the iperf3 server, the transfer rate is much lower, down
> > > all the way to 1.x Gbits/s.
> > > Sometimes, I can see the following kernel log repeats during the
> > > testing, likely contributing to the slowdown.
> > >              interrupt storm detected on "irq2328:"; throttling interrupt 
> > > source
> >
> > I assume the message is in the domU, not the dom0?
> Yes, in the TrueNAS domU.
> BTW, I rebooted back to the stock kernel and the message is no longer 
> observed.
> 
> With the stock kernel, the transfer rate from dom0 to nas domU can be
> as high as 30Gbps.
> The variation is still observed, sometimes down to ~19Gbps. There is
> no retransmission in this direction.
> 
> For the reverse direction, the observed low transfer rate still exists.
> It's still within the range of 1.x Gbps, but should still be better
> than the previous test.
> The huge number of re-transmission is still observed.
> The same behavior can be observed on a stock FreeBSD 12.2 image, so
> this is not specific to TrueNAS.

So that's domU sending the data, and dom0 receiving it.

> 
> According to the packet capture, the re-transmission appears to be
> caused by packet reorder.
> Here is one example incident:
> 1. dom0 sees a sequence jump in the incoming stream and begins to send out 
> SACKs
> 2. When SACK shows up at domU, it begins to re-transmit lost frames
>    (the re-transmit looks weird since it show up as a mixed stream of
> 1448 bytes and 12 bytes packets, instead of always 1448 bytes)
> 3. Suddenly the packets that are believed to have lost show up, dom0
> accept them as if they are re-transmission

Hm, so there seems to be some kind of issue with ordering I would say.

> 4. The actual re-transmission finally shows up in dom0...
> Should we expect packet reorder on a direct virtual link? Sounds fishy to me.
> Any chance we can get this re-transmission fixed?

Does this still happen with all the extra features disabled? (-rxcsum
-txcsum -lro -tso)

> So looks like at least the imbalance between two directions are not
> related to your patch.
> Likely the debug build is a bigger contributor to the perf difference
> in both directions.
> 
> I also tried your patch on a release build, and didn't observe any
> major difference in iperf3 numbers.
> Roughly match the 30Gbps and 1.xGbps number on the stock release kernel.

Thanks a lot, will try to get this upstream then.

Roger.

Reply via email to