On 6 May 2016 at 22:43, Dave Taht <dave.t...@gmail.com> wrote:
> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.li...@gmail.com> wrote:
>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.li...@gmail.com> wrote:
>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <bro...@redhat.com> wrote:
>>>>
>>>> I've created an OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>> closed Felix's OpenWRT email account (bad choice! emails bouncing).
>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>> are in some kind of conflict.
>>>>
>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>
>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>
>>> OK, so, after porting the patch to the 4.1 OpenWRT kernel and playing a
>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>
>> Forgot to mention, I've reduced drop_batch_size down to 32

> 0) Not clear to me if that's the right line, there are 4 wifi queues,
> and the third one is the BE queue.

That was an example, sorry, should have stated that. I've applied the
same settings to all 4 queues.
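i.e. something like this (a sketch; the note about drop_batch_size assumes
it was changed in the kernel source, since not every tc build exposes it
as a parameter):

  # apply the same fq_codel settings to all four mq children of wlan0
  for i in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$i fq_codel flows 16 limit 256
  done
  # drop_batch_size itself reduced to 32 in net/sched/sch_fq_codel.c
  # (kernel side), assuming the tc frontend does not expose it yet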
> That is too low a limit, also, for normal use. And:
> for the purpose of this particular UDP test, flows 16 is ok, but not
> ideal.

I played with different combinations, it doesn't make any (significant)
difference: 20-30Mbps, not more. What numbers would you propose?

> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
> (I care about tcp performance a lot more than udp floods - surviving a
> udp flood yes, performance, no)

During the test (both TCP and UDP) it's roughly 5ms on average, and ~2ms
when not running tests. Actually I'm now wondering if target is working
at all, because I had the same result with target 80ms.
So, yes, latency is good, but performance is poor.

> before/after?
>
> tc -s qdisc show dev wlan0 during/after results?

during the test:

qdisc mq 0: root
 Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
 backlog 1545794b 1021p requeues 17
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
 backlog 1541252b 1018p requeues 17
  maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

after the test (60sec):

qdisc mq 0: root
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
  maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
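For anyone reproducing this: something as simple as the loop below, run
alongside the flood, is enough to get the "during" and "after" snapshots
(a sketch; the 60s duration matches the test above):

  # snapshot wlan0 qdisc counters every 10s over a 60s run
  for t in 10 20 30 40 50 60; do
      sleep 10
      echo "=== t=${t}s ==="
      tc -s qdisc show dev wlan0
  done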
> IF you are doing builds for the archer c7v2, I can join in on this... (?)

I'm not, but I have a c7 somewhere, so I can do a build for it and also
test, so we are on the same page.

> I did do a test of the ath10k "before", fq_codel *never engaged*, and
> tcp induced latencies under load, e.g. at 100mbit, cracked 600ms, while
> staying flat (20ms) at 100mbit. (not the same patches you are testing)
> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
> have yet to get anything more on anything I currently have
> before/after patchsets.
>
> I'll go add flooding to the tests, I just finished a series comparing
> two different speed stations and life was good on that.
>
> "before" - fq_codel never engages, we see seconds of latency under load.
>
> root@apu2:~# tc -s qdisc show dev wlp4s0
> qdisc mq 0: root
>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>   new_flows_len 1 old_flows_len 3
> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 1 old_flows_len 0
>
>>> This is certainly better than 30Mbps but still more than two times
>>> less than before (900).
>
> The number that I still am not sure we got is that you were sending
> 900mbit udp and receiving 900mbit on the prior tests?

900 was sending, from the AP's point of view (the wifi client is downloading)

>>> TCP also improved a little (550 to ~590).
>
> The limit is probably a bit low, also. You might want to try target
> 20ms as well.

I've tried limit up to 1024 and target up to 80ms

>>>
>>> Felix, others, do you want to see the ported patch, maybe I did something
>>> wrong?
>>> Doesn't look like it will save ath10k from performance regression.
>
> what was tcp "before"? (I'm sorry, such a long thread)

750Mbps

>>>
>>>> On Fri, 6 May 2016 11:42:43 +0200
>>>> Jesper Dangaard Brouer <bro...@redhat.com> wrote:
>>>>
>>>>> Hi Felix,
>>>>>
>>>>> This is an important fix for OpenWRT, please read!
>>>>>
>>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>>> without also adjusting q->flows_cnt. Eric explains below that you must
>>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>>> adjust it to 128)
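FWIW, for a quick test that combination can also be set from userspace,
without patching the kernel defaults, something like:

  # sketch: the suggested limit/flows ratio applied via tc
  tc qdisc replace dev eth0 root fq_codel limit 1024 flows 128
  # or on the wifi mq children:
  for i in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$i fq_codel limit 1024 flows 128
  done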
>>>>>
>>>>> Problematic OpenWRT commit in question:
>>>>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it
>>>>> from causing too much cpu load with higher speed (#21326)")
>>>>>
>>>>>
>>>>> I also highly recommend you cherry-pick this very recent commit:
>>>>> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>>> https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>>
>>>>> This should fix very high CPU usage in case fq_codel goes into drop mode.
>>>>> The problem is that drop mode was considered rare, and implementation
>>>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>>>> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
>>>>> especially expensive for smaller devices, as it scans a 4K-big array,
>>>>> thus 64 cache misses for small devices!
>>>>>
>>>>> The fix is to allow drop mode to bulk-drop more packets when entering
>>>>> drop mode (default 64 bulk drop). That way we don't suddenly
>>>>> experience a significantly higher processing cost per packet, but
>>>>> instead can amortize this.
>>>>>
>>>>> To Eric, should we recommend OpenWRT to adjust the default (max) 64 bulk
>>>>> drop, given we also recommend the bucket size to be 128? (thus the amount
>>>>> of memory to scan is less, but their CPU is also much smaller)
>>>>>
>>>>> --Jesper
>>>>>
>>>>>
>>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.duma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.duma...@gmail.com> wrote:
>>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>>> > > >
>>>>> > > >>
>>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>>> > > >>  backlog 0b 0p requeues 0
>>>>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>> > > >>   new_flows_len 0 old_flows_len 0
>>>>> > > >
>>>>> > > >
>>>>> > > > A limit of 1024 packets and 1024 flows is not wise I think.
>>>>> > > >
>>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1
>>>>> > > > packet, which is almost the same as having no queue at all)
>>>>> > > >
>>>>> > > > I suggest having at least 8 packets per bucket, to let CoDel have a
>>>>> > > > chance to trigger.
>>>>> > > >
>>>>> > > > So you could either reduce the number of buckets to 128 (if memory is
>>>>> > > > tight), or increase the limit to 8192.
>>>>> > >
>>>>> > > Will try, but what I've posted is the default, I didn't change/configure
>>>>> > > that.
>>>>> >
>>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>>> >
>>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>>> >
>>>>> > If someone changed that in the linux variant you use, he probably should
>>>>> > explain the rationale.
>>>>
>>>> --
>>>> Best regards,
>>>>   Jesper Dangaard Brouer
>>>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>>>   Author of http://www.iptv-analyzer.org
>>>>   LinkedIn: http://www.linkedin.com/in/brouer
>
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org
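Spelling out the packets-per-bucket arithmetic behind Eric's suggestion
(numbers as quoted above; the tiny helper is illustration only):

  # rough packets per bucket = limit / flows
  ppb() { echo "limit=$1 flows=$2 -> $(($1 / $2)) packets per bucket"; }
  ppb 10240 1024   # upstream default: 10
  ppb  1024 1024   # current OpenWRT: 1, so CoDel barely has a queue to manage
  ppb  1024  128   # option A (fewer buckets): 8
  ppb  8192 1024   # option B (bigger limit):  8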