Hi Marcelo,

On Thu, Nov 22, 2018 at 07:08:32PM -0200, Marcelo Ricardo Leitner wrote:
> On Thu, Nov 22, 2018 at 02:22:20PM -0200, Marcelo Ricardo Leitner wrote:
> > On Wed, Nov 21, 2018 at 03:51:20AM +0100, Pablo Neira Ayuso wrote:
> > > Hi,
> > > 
> > > This patchset is the third iteration [1] [2] [3] to introduce a kernel
> > > intermediate (IR) to express ACL hardware offloads.
> > 
> > On v2 cover letter you had:
> > 
> > """
> > However, cost of this layer is very small, adding 1 million rules via
> > tc -batch, perf shows:
> > 
> >      0.06%  tc               [kernel.vmlinux]    [k] tc_setup_flow_action
> > """
> > 
> > The above doesn't include time spent on children calls and I'm worried
> > about the new allocation done by flow_rule_alloc(), as it can impact
> > rule insertion rate. I'll run some tests here and report back.
> 
> I'm seeing +60ms on 1.75s (~3.4%) to add 40k flower rules on ingress
> with skip_hw and tc in batch mode, with flows like:
> 
> filter add dev p6p2 parent ffff: protocol ip prio 1 flower skip_hw
> src_mac ec:13:db:00:00:00 dst_mac ec:14:c2:00:00:00 src_ip
> 56.0.0.0 dst_ip 55.0.0.0 action drop
> 
> Only 20ms out of those 60ms were consumed within fl_change() calls
> (considering children calls), though.
> 
> Do you see something similar?  I used current net-next (d59da3fbfe3f)
> and with this patchset applied.

I see lots of send() and recv() in tc -batch via strace, using this
example rule, repeating it N times:

        filter add dev eth0 parent ffff: protocol ip pref 1 flower dst_mac 
f4:52:14:10:df:92 action mirred egress redirect dev eth1

This is taking ~8 seconds for 40k rules from my old laptop [*], this
is already not too fast (without my patchset).

I remember we discussed about adding support for real batching for tc
- probably we can probably do this transparently by assuming that if the
skbuff length mismatches nlmsghdr->len field, then we enter the batch
mode from the kernel. This would require to update iproute2 to use
libmnl batching routines, or code that follows similar approach
otherwise.

[*] 0.5 seconds in nft (similar ruleset), this is using netlink batching.

Reply via email to