Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations

Jakub Kicinski Wed, 07 Aug 2019 11:50:05 -0700

On Tue, 6 Aug 2019 20:33:47 -0600, David Ahern wrote:
> Some time back supported was added for devlink 'resources'. The idea is
> that hardware (mlxsw) has limited resources (e.g., memory) that can be
> allocated in certain ways (e.g., kvd for mlxsw) thus implementing
> restrictions on the number of programmable entries (e.g., routes,
> neighbors) by userspace.
> 
> I contend:
> 
> 1. The kernel is an analogy to the hardware: it is programmed by
> userspace, has limited resources (e.g., memory), and that users want to
> control (e.g., limit) the number of networking entities that can be
> programmed - routes, rules, nexthop objects etc and by address family
> (ipv4, ipv6).


Memory hierarchy for ASIC is more complex and changes more often than
we want to change the model and kernel ABIs. The API in devlink is
intended for TCAM partitioning.

> 2. A consistent operational model across use cases - s/w forwarding, XDP
> forwarding and hardware forwarding - is good for users deploying systems
> based on the Linux networking stack. This aligns with my basic point at
> LPC last November about better integration of XDP and kernel tables.
> 
> The existing devlink API is the right one for all use cases. Most
> notably that the kernel can mimic the hardware from a resource
> management. Trying to say 'use cgroups for s/w forwarding and devlink
> for h/w forwarding' is complicating the lives of users. It is just a
> model and models can apply to more than some rigid definition.

This argument holds no water. Only a tiny fraction of Linux networking
users will have an high performance forwarding ASIC attached to their
CPUs. So we'll make 99.9% of users who never seen devlink learn the
tool for device control to control kernel resource?

Perhaps I'm misinterpreting your point there.

> As for the namespace piece of this, the kernel's tables for networking
> are *per namespace*, and so the resource controller must be per
> namespace. This aligns with another consistent theme I have promoted
> over the years - the ability to divide up a single ASIC into multiple,
> virtual switches which are managed per namespace. This is a very popular
> feature from a certain legacy vendor and one that would be good for open
> networking to achieve. This is the basis of my response last week about
> the devlink instance per namespace, and I thought Jiri was moving in
> that direction until our chat today. Jiri's intention is something
> different; we can discuss that on the next version of his patches.

Resource limits per namespace make perfect sense. Just not configured
via devlink..

Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations

Reply via email to