On 3/24/19 9:26 PM, Alexei Starovoitov wrote:
> On Sun, Mar 24, 2019 at 06:56:42AM -0600, David Ahern wrote:
>>
>> This change also enables many other key features:
>> 1. IPv4 multipath routes are not evicted just because 1 hop goes down.
>> 2. IPv6 multipath routes with device only nexthops (e.g., tunnels).
>> 3. IPv6 nexthop with IPv4 route (aka, RFC 5549) which enables a more
>>    natural BGP unnumbered.
>> 4. Lower memory consumption for IPv6 FIB entries which has no sharing at
>>    all like IPv4 does.
>> 5. Allows atomic update of nexthop definitions with a single replace
>>    command as opposed to replacing the N routes using it.
>
> Does kernel work as data plane or control plane in any of the above
> features?
> Sadly the patches allow it to do both, but cumulus doesn't use it
> for data path. The kernel on control plane cpu is merely a database.
> And today it doesn't scale when used as a database.
> The kernel has to be fast as a dataplane but these extra features
> will slow down the routing by making kernel-as-database scale a bit better.
> Hence my suggestion in the previous email: use proper database
> to store routes, nexthops and whatever else necessary to program the asic.
> The kernel doesn't need to hold this information.
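To make point 5 concrete, a sketch with the proposed iproute2 syntax (the ids, addresses and device names are illustrative, and command details were still subject to review at the time):

```shell
# Create standalone nexthop objects (hypothetical ids).
ip nexthop add id 1 via 172.16.1.1 dev eth0
ip nexthop add id 2 via 172.16.2.1 dev eth1

# Group them, then point many routes at the one group.
ip nexthop add id 100 group 1/2
ip route add 10.0.0.0/8 nhid 100
ip route add 10.1.0.0/16 nhid 100

# One replace atomically repoints every route using nexthop 1,
# instead of replacing each of the N routes individually.
ip nexthop replace id 1 via 172.16.1.254 dev eth0
```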
The first 40 patches align fib_nh and fib6_nh, providing more consistency between IPv4 and IPv6 and allowing more code re-use between the protocols. The end result is the ability to have IPv6 gateways with IPv4 routes, a much needed control plane feature that other companies have been harassing me about, in addition to the internal need at Cumulus. Throughout the refactoring I have been very careful about changes to data structure layout and cacheline hits, as well as adverse changes to memory use. I believe at the end of this change set there is no impact to existing performance - control plane or data plane.

That is followed by refactoring IPv6 again, in a direction that makes IPv4 and IPv6 more consistent and that enables changes (outside of the nexthop sets) which will improve IPv6 in a number of cases by removing the need to always generate a dst_entry.

After that are a few patches exporting functions for use by the nexthop code, and then the refactoring that enables separate nexthop objects. Again, impacts to performance have been top of mind, and I have done what I can to minimize any overhead in the datapath - to the point of wrapping the few 'if (nh)' checks in unlikely(). With the nexthop code in place, users have an alternative to a broken IPv6 multipath API, as one example.

As far as scalability goes, I can already inject a million routes into the kernel FIB. This work allows me to do that more efficiently, and to manage the FIBs more efficiently in the face of changes such as a link going down, as we move to higher end systems such as Spectrum2.

As for routes in the kernel: they need to be there for any control plane processes to function properly. One example is ping and traceroute to troubleshoot data path problems; another is bgp (or any other service) connecting to a peer through the data plane (do not assume a peer is on a directly connected route). Further, the routes need to go through the kernel to get to the switchdev driver.
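For the IPv6-gateway-with-IPv4-route case (RFC 5549) mentioned above, the sketch below shows what this looks like from userspace once the support is in place; the prefix, link-local address and device are illustrative:

```shell
# IPv4 route whose nexthop is an IPv6 address (RFC 5549),
# as BGP unnumbered wants: peer over the link-local address,
# install IPv4 prefixes via that IPv6 nexthop.
ip route add 10.99.0.0/24 via inet6 fe80::2 dev eth0
```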
The routes also need to be there for XDP forwarding and for routing on the host. Pawel has already expressed interest in using XDP for fast path forwarding with FRR managing the route table.

You keep trying to make this about Cumulus. This is about bringing next level features to Linux, and in the process bringing more consistency and code sharing between IPv4 and IPv6. This is about one API for the data center - be it servers, hosts, switches or routers, regardless of datapath (hardware offload, XDP, or kernel forwarding) - and maintaining consistency in configuring, monitoring and troubleshooting across those systems. That is the common theme of both the netdev talk last summer and the talk at LPC in November.

Again, I have tried to be very careful with the intrusion of checks into the datapath, with the goal of no measurable impact to performance. I am invested in seeing that through and will continue looking for ways to improve it for all use cases.
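For reference, the XDP fast-path case amounts to consulting the kernel FIB from the XDP hook via the bpf_fib_lookup() helper - the same table FRR manages, which is why the routes have to live in the kernel. A minimal sketch (illustrative only; packet parsing and the ethernet MAC rewrite from fib.smac/fib.dmac are elided, and it assumes a clang/libbpf build):

```c
/* Sketch: XDP fast-path forwarding against the kernel FIB. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_fwd(struct xdp_md *ctx)
{
	struct bpf_fib_lookup fib = {};

	fib.ifindex = ctx->ingress_ifindex;
	/* ... parse the packet; fill fib.family, addresses, tot_len ... */

	/* Looks up the kernel FIB - the table the routing daemon manages -
	 * so the XDP datapath stays consistent with the control plane. */
	if (bpf_fib_lookup(ctx, &fib, sizeof(fib), 0) ==
	    BPF_FIB_LKUP_RET_SUCCESS) {
		/* ... rewrite the eth header with fib.smac/fib.dmac ... */
		return bpf_redirect(fib.ifindex, 0);
	}

	return XDP_PASS; /* fall back to the kernel stack */
}

char _license[] SEC("license") = "GPL";
```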