On Tue, 6 Aug 2019 20:33:47 -0600, David Ahern wrote:
> Some time back supported was added for devlink 'resources'. The idea is
> that hardware (mlxsw) has limited resources (e.g., memory) that can be
> allocated in certain ways (e.g., kvd for mlxsw) thus implementing
> restrictions on the number of programmable entries (e.g., routes,
> neighbors) by userspace.
>
> I contend:
>
> 1. The kernel is an analogy to the hardware: it is programmed by
> userspace, has limited resources (e.g., memory), and that users want to
> control (e.g., limit) the number of networking entities that can be
> programmed - routes, rules, nexthop objects etc and by address family
> (ipv4, ipv6).

The memory hierarchy of an ASIC is more complex, and changes more often,
than we want the model and kernel ABIs to change. The API in devlink is
intended for TCAM partitioning.

> 2. A consistent operational model across use cases - s/w forwarding, XDP
> forwarding and hardware forwarding - is good for users deploying systems
> based on the Linux networking stack. This aligns with my basic point at
> LPC last November about better integration of XDP and kernel tables.
>
> The existing devlink API is the right one for all use cases. Most
> notably that the kernel can mimic the hardware from a resource
> management. Trying to say 'use cgroups for s/w forwarding and devlink
> for h/w forwarding' is complicating the lives of users. It is just a
> model and models can apply to more than some rigid definition.

This argument holds no water. Only a tiny fraction of Linux networking
users will have a high-performance forwarding ASIC attached to their
CPUs. So we'll make the 99.9% of users who have never seen devlink learn
a device-control tool in order to control kernel resources? Perhaps I'm
misinterpreting your point there.

> As for the namespace piece of this, the kernel's tables for networking
> are *per namespace*, and so the resource controller must be per
> namespace. This aligns with another consistent theme I have promoted
> over the years - the ability to divide up a single ASIC into multiple,
> virtual switches which are managed per namespace. This is a very popular
> feature from a certain legacy vendor and one that would be good for open
> networking to achieve. This is the basis of my response last week about
> the devlink instance per namespace, and I thought Jiri was moving in
> that direction until our chat today. Jiri's intention is something
> different; we can discuss that on the next version of his patches.

Resource limits per namespace make perfect sense. Just not configured
via devlink.