On Thu, May 7, 2020 at 8:52 PM Cong Wang <xiyou.wangc...@gmail.com> wrote:
>
> On Tue, May 5, 2020 at 1:46 AM Václav Zindulka
> <vaclav.zindu...@tlapnet.cz> wrote:
> >
> > On Mon, May 4, 2020 at 7:46 PM Cong Wang <xiyou.wangc...@gmail.com> wrote:
> > >
> > > Sorry for the delay. I lost connection to my dev machine, I am trying
> > > to set this up on my own laptop.
> >
> > Sorry to hear that. I will gladly give you access to my testing
> > machine, where all this nasty stuff happens every time, so you can
> > test it in place. You can try everything there and get results online.
> > I can even give you access to the IPMI console so you can switch the
> > kernel during boot easily. I didn't notice this problem until the time
> > of deployment. My prior testing machines had metallic ethernet ports
> > only, so I didn't know about these problems earlier.
>
> Thanks for the offer! No worries, I set up a testing VM on my laptop.
OK.

> > > I tried to emulate your test case in my VM, here is the script I use:
> > >
> > > ====
> > > ip li set dev dummy0 up
> > > tc qd add dev dummy0 root handle 1: htb default 1
> > > for i in `seq 1 1000`
> > > do
> > > tc class add dev dummy0 parent 1:0 classid 1:$i htb rate 1mbit ceil 1.5mbit
> > > tc qd add dev dummy0 parent 1:$i fq_codel
> > > done
> > >
> > > time tc qd del dev dummy0 root
> > > ====
> > >
> > > And this is the result:
> > >
> > > Before my patch:
> > > real 0m0.488s
> > > user 0m0.000s
> > > sys 0m0.325s
> > >
> > > After my patch:
> > > real 0m0.180s
> > > user 0m0.000s
> > > sys 0m0.132s
> >
> > My results with your test script:
> >
> > before patch:
> > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > real 1.63
> > user 0.00
> > sys 1.63
> >
> > after patch:
> > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > real 1.55
> > user 0.00
> > sys 1.54
> >
> > > This is an obvious improvement, so I have no idea why you didn't
> > > catch any difference.
> >
> > We use hfsc instead of htb. I don't know whether that may cause any
> > difference. I can provide you with my test scripts if necessary.
>
> Yeah, you can try to replace the htb with hfsc in my script,
> I didn't spend time to figure out hfsc parameters.

I replaced the class line with

tc class add dev dummy0 parent 1:0 classid 1:$i hfsc ls m1 0 d 0 m2 13107200 ul m1 0 d 0 m2 13107200

but it behaves the same as htb...

> My point here is, if I can see the difference with merely 1000
> tc classes, you should see a bigger difference with hundreds
> of thousands of classes in your setup. So I don't know why you
> saw a relatively smaller difference.

I saw a relatively big difference. It was about 1.5 s faster on my huge
setup, which is a lot. Yet maybe the problem is caused by something
else? I thought about tx/rx queues. RJ45 ports have up to 4 tx and rx
queues, whereas SFP+ interfaces have much higher limits: 8 or even 64
possible queues.
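For completeness, this is a sketch of the full hfsc variant of your benchmark script as I ran it (the root qdisc has to be hfsc as well; requires root and the dummy module, so times will obviously differ per machine):

```shell
# hfsc variant of the benchmark: 1000 leaf classes, then time the teardown.
ip li set dev dummy0 up
tc qd add dev dummy0 root handle 1: hfsc default 1
for i in `seq 1 1000`
do
    # link-sharing and upper-limit curves both flat at ~100 mbit (13107200 B/s)
    tc class add dev dummy0 parent 1:0 classid 1:$i hfsc \
        ls m1 0 d 0 m2 13107200 ul m1 0 d 0 m2 13107200
    tc qd add dev dummy0 parent 1:$i fq_codel
done

time tc qd del dev dummy0 root
```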
I've tried to increase the number of queues using ethtool from 4 to 8,
and also decreased it to 2, but there was no difference. It was about
1.62 - 1.63 s with an unpatched kernel and about 1.55 - 1.58 s with your
patches applied. I tried it for ifb and RJ45 interfaces, where it took
about 0.02 - 0.03 s with an unpatched kernel and 0.05 s with your
patches applied, which is strange, but it may be caused by the fact that
it was very fast even before.

I have commits c71c00df335f6aff00d3dc7f28e06dc8abc088a7,
13a5aec17cc65f6aa5c3bc470f504650bd465a69,
720cc6b0d12fb7c8a494e441ebd360c62023dad2 and
51287a4bc6f2addd4a8c1919829aab3bb7c706c9 from
https://github.com/congwang/linux/commits/qdisc_reset applied on the
5.4.6 kernel. I can apply them to the newest one if that could have any
impact. I hope I've applied the right patches and haven't missed any
older commits. I've even tried to compile the kernel from your
repository, branch qdisc_reset. Times are a little lower than with the
patched 5.4.6: 1.52 - 1.53 s. Yet I still can't get a big improvement
like the one you saw. Thank you.
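P.S. In case you want to reproduce the queue-count changes, I did them roughly like this (interface name is from my setup; whether the driver exposes "combined" or separate "rx"/"tx" channel counts depends on the NIC):

```shell
# Show current and maximum channel (queue) counts; requires root.
ethtool -l enp1s0f0

# Set the number of combined rx/tx channels, e.g. 8 (or 2 for the
# decreased case). Drivers without combined channels take rx/tx instead.
ethtool -L enp1s0f0 combined 8
```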