Hi Pete,

> On Sep 11, 2018, at 00:40, Pete Heist <[email protected]> wrote:
> 
> Subject changed from “Cake on elements of a bridge”...
> 
> On Sep 10, 2018, at 9:55 PM, Dave Taht <[email protected]> wrote:
>> 
>> On Mon, Sep 10, 2018 at 12:29 PM Pete Heist <[email protected]> wrote:
>>> 
>>> For anyone who followed this, yes, the regular soft bridge (i.e. set 
>>> interfaces bridge br0) works fine on the ER-X, as I suspect it would on 
>>> most any Linux. A few notes about it:
>>> 
>>> - Your qdisc must be added to the physical interface (e.g. eth4), not the 
>>> bridge interface
>>> - Unlike the hardware bridge which has its own MAC, the soft bridge seems 
>>> to take the MAC of the lowest (or first listed?) interface port
>>> - On ER-X, bridge-nf-call-iptables=1 is the default so nothing needs to be 
>>> changed there for firewalling
>>> - When firewalling the bridged WAN interface, ‘in’ corresponds to bridged 
>>> traffic and ‘local’ to routed traffic, which is different from the 
>>> semantics for ordinary routed traffic
>>> - I can do stateful firewalling for bridged hosts with “accept established 
>>> and related”, but have to explicitly allow DHCP (UDP source/dest port 
>>> 67-68) in the WAN interface’s ‘in’ rules for DHCP traffic to pass through 
>>> the bridge
>>> 
>>> Performance:
>>> 
>>> Using Cake with this setup, the fun ends at around 110 Mbit with ksoftirqd 
>>> thrashing. Unsurprisingly, there’s probably some overhead here with the 
>>> soft bridge. For my purposes though (50 Mbit), it’s enough, barely…
>> 
>> Can I encourage you to give regular ole htb+fq_codel sqm a shot with a
>> bigger burst and cburst size for htb? Fiddling with the htb quantum
>> isn't helping much,
>> but try this, from: https://github.com/tohojo/sqm-scripts/issues/71
>> 
>> (I am thinking burst and cburst should be about 1.1ms of buffering in size)
> 
> So this has turned into an interesting exercise that produced a result 
> counter to what the common wisdom has been (that fq_codel is “faster” than 
> cake

        I believe the argument is more about htb+fq_codel versus cake instead 
of fq_codel versus cake, as it seems to be the shaper functionality that incurs 
the highest cost.

> ). Because of that, I’m open to criticism of my methodology and different 
> criteria for a successful bitrate for the shaper.
> 
> First, note that these tests still run through a bridge as above, but are for a 
> more typical setup with separate qdisc instances on egress and ingress, as 
> opposed to my “110 Mbit” result from above, which was for egress and ingress 
> through a common IFB.
> 
> It occurs to me that what I really want to know is the maximum set bitrate 
> for the shaper where it still appears to be behaving properly and 
> consistently, meaning, the actual measured TCP throughput is held steady, and 
> at the expected percentage less than the set bitrate. I first find this out 
> by setting a “comfortable” rate of 100Mbit and checking TCP throughput with 
> iperf3, which is typically around 5% less than the set bitrate.

        The expected values depend somewhat on the exact configuration, but 
overall the expected TCP/IPv4 goodput is calculated as follows (I assume you 
are well aware of that, but I believe this is worth repeating to calibrate 
expectations):
Expected overhead percentage: 100 - 100 * ((MTU - IP-Overhead - TCP-Overhead) / 
(MTU + Framing-Overhead))
Assuming MTU 1500, IPv4, no TCP options, and ethernet framing (of which the 
kernel only accounts for 14 bytes) we get
100 - 100 * ((1500 - 20 - 20) / (1500 + 14)) = 3.57 %
so the observed difference between the set gross rate and the measured net TCP 
payload rate matches the theory reasonably well.
With TCP timestamps, which you might/should have enabled, you get:
100 - 100 * ((1500 - 20 - 20 - 12) / (1500 + 14)) = 4.36 %
and with proper accounting for an ethernet carrier:
100 - 100 * ((1500 - 20 - 20 - 12) / (1500 + 38)) = 5.85 %
All of these are close enough to 5% to make the 5% rule a reasonable threshold 
to compare against, at least to me.
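
        In case it helps to play with other MTUs or overheads, here is the 
same arithmetic as a small Python sketch. The function and parameter names are 
just mine, and the defaults are the assumptions stated above (MTU 1500, IPv4, 
14 bytes of framing seen by the kernel):

```python
def overhead_percent(mtu=1500, ip=20, tcp=20, tcp_options=0, framing=14):
    """Percentage by which TCP goodput falls short of the set gross rate."""
    payload = mtu - ip - tcp - tcp_options   # bytes of TCP payload per packet
    accounted = mtu + framing                # bytes the shaper accounts per packet
    return 100 - 100 * payload / accounted


cases = {
    "no TCP options, 14 byte framing": overhead_percent(),
    "TCP timestamps, 14 byte framing": overhead_percent(tcp_options=12),
    "TCP timestamps, 38 byte framing": overhead_percent(tcp_options=12, framing=38),
}
for label, pct in cases.items():
    print(f"{label}: {pct:.2f} %")
```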

> Then I increase the shaper’s bitrate 5Mbit at a time and re-run the test 
> until I find the last bitrate at which iperf3 reports a stable (within a few 
> percent) and correct rate over 10 seconds for several runs in a row. See the 
> attached iperf3 results for sample runs around the threshold rates.

        Except for the 10 seconds this sounds reasonable; I would aim for at 
least 30, even though this will become more important once you also monitor 
the latency under load concurrently with the bandwidth-probing flows...
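
        To make the pass/fail criterion explicit, here is how I read it, as a 
Python sketch. The 5% expected loss and the 3% tolerance ("within a few 
percent") are my assumptions, as are all the names, and the sample numbers 
are made up purely for illustration:

```python
def run_is_ok(set_mbit, measured_mbit, expected_loss_pct=5.0, tolerance_pct=3.0):
    """True if one measured throughput sits near the expected goodput."""
    target = set_mbit * (1 - expected_loss_pct / 100)
    return abs(measured_mbit - target) <= target * tolerance_pct / 100


def highest_stable_rate(results, **kwargs):
    """results maps a set rate (Mbit) to the throughputs measured at that rate.

    Walks the rates upwards and returns the last one before the first failure,
    or None if even the lowest rate fails.
    """
    best = None
    for set_mbit in sorted(results):
        if not all(run_is_ok(set_mbit, m, **kwargs) for m in results[set_mbit]):
            break
        best = set_mbit
    return best


# Made-up numbers purely for illustration, NOT taken from Pete's attachments.
example = {
    100: [95.1, 94.8, 95.3],
    105: [99.9, 99.5, 100.2],
    110: [104.0, 97.2, 89.5],  # throughput no longer held steady
}
print(highest_stable_rate(example))
```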

> 
> qdisc: egress Mbit / ingress Mbit
> 
> cake nat dual-srchost / cake nat dual-dsthost ingress: 135 / 145

        On your box is there actual NAT masquerading happening?

> htb+fq_codel: 125 / 125
> htb+fq_codel with burst/cburst=96000: 155 / 155

        The last time we discussed the burst issue, I could not manage to see 
any difference with or without a specified burst, but I strongly believe I 
simply did not test properly. Btw, is this unidirectional shaping or with 
bidirectional saturation? 

> 
> So with this testing criteria, I’m actually seeing cake “win” (with the 
> exception of setting htb's burst/cburst to 96000, which shows a clear 
> improvement, probably at the expense of something).

        I could be wrong, but the cost of burst/cburst is basically potentially 
more delay and jitter, with the delay bounded by the time required to empty the 
burst bucket. So at 96000 or 96kb and a set rate of 155Mbps I expect an 
additional delay of 1000 * 96/(155*1000) = 0.62 milliseconds, which fits, as I 
believe that the 96k were targeted for 1ms @ 100Mbps. It will also make the 
output of the shaper less smooth and more choppy, as sending is no longer paced 
ideally.
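
        The drain time depends on the unit of the burst value, so here is a 
small Python sketch that computes it for both readings. The 0.62 ms above 
treats 96000 as bits; as far as I know tc takes a bare burst value as bytes, 
which would give a bound eight times larger. The function name is mine, and 
which reading applies to Pete's setup is an assumption either way:

```python
def burst_drain_ms(burst, rate_mbit, burst_in_bytes):
    """Time in ms needed to send a full burst bucket at the shaper rate."""
    bits = burst * 8 if burst_in_bytes else burst
    return 1000 * bits / (rate_mbit * 1e6)


for rate_mbit in (100, 155):
    as_bits = burst_drain_ms(96000, rate_mbit, burst_in_bytes=False)
    as_bytes = burst_drain_ms(96000, rate_mbit, burst_in_bytes=True)
    print(f"{rate_mbit} Mbit: {as_bits:.2f} ms if bits, {as_bytes:.2f} ms if bytes")
```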
        My (too simplistic) mental model is that burst allows the shaper to 
work better in sirq-constrained conditions. The issue basically seems to be 
that the shaper does not run often enough, and without any overcommit 
permission it will keep putting less data onto the NIC than could be sent (at 
the desired shaper rate) in the (longer than expected/desired) interval between 
shaper executions. The burst basically allows the shaper to dump in a batch of 
packets, hopefully allowing the interface to keep sending at the desired 
average rate.
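
        That mental model can be written down as a toy token bucket in Python. 
This is only a sketch of my simplistic picture, not of how htb actually works: 
the queue is assumed to be always backlogged, the shaper is assumed to run at 
a fixed interval, and the 2 ms interval and 15000 byte bucket are made-up 
values:

```python
def achieved_rate_mbit(rate_mbit, burst_bytes, interval_ms, duration_ms=1000):
    """Toy token bucket that only gets to run once every interval_ms."""
    rate_bytes_per_ms = rate_mbit * 1e6 / 8 / 1000
    tokens = 0.0
    sent = 0.0
    for _ in range(int(duration_ms / interval_ms)):
        # Tokens accrue while the shaper is not running, capped at the bucket size.
        tokens = min(burst_bytes, tokens + rate_bytes_per_ms * interval_ms)
        # When the shaper finally runs, it dumps everything it is allowed to send.
        sent += tokens
        tokens = 0.0
    return sent * 8 / (duration_ms / 1000) / 1e6


# Hypothetical: the shaper only runs every 2 ms at a set rate of 155 Mbit.
print(achieved_rate_mbit(155, burst_bytes=15000, interval_ms=2))
print(achieved_rate_mbit(155, burst_bytes=96000, interval_ms=2))
```

With the small bucket the tokens earned between runs are capped and thrown 
away, so the achieved rate stays below the set rate; with the larger bucket 
the whole interval's worth of tokens survives until the next run.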


> I also see that the ingress rate for cake can be held steady to a bit higher 
> of a bitrate than egress. I am using the ‘ingress’ keyword on ingress. I have 
> to be careful here because from run to run there can be slight variations in 
> behavior, but having repeated it several times at each bitrate around the 
> threshold, I’m fairly certain about the results.
> 
> In the ER-X manual (https://dl.ubnt.com/guides/edgemax/EdgeOS_UG.pdf), they 
> give a guideline of 100-250Mbps on the “expected Smart Queue shaping 
> performance” (which means fq_codel) for the ER-X. In reality, 100Mbps is 
> comfortable, and 250Mbps seems impossible. You might be able to get that rate 
> by setting fq_codel to 300+Mbit (and you can’t, through a bridge anyway), but 
> is the queue really controlled? I think I’m applying at least a little more 
> consistent criteria for “success” here at a given bitrate than we have before.
> 
> I suppose I should repeat this test with different hardware to be surer of 
> the claim, but I’m not sure when I’ll have the time. I will say that Cake’s 
> shaper overall produces more satisfyingly consistent rates, and given its NAT 
> support and host fairness, that’s why I’m likely to continue to use it when I 
> can.
> 
> 

        I am quite curious about these files, but I seem incapable of 
downloading/opening them...

Best Regards
        Sebastian

> 
> _______________________________________________
> Cake mailing list
> [email protected]
> https://lists.bufferbloat.net/listinfo/cake

