Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes

Jiri Pirko Mon, 19 Sep 2016 23:03:06 -0700

Tue, Sep 20, 2016 at 07:49:47AM CEST, ro...@cumulusnetworks.com wrote:
>On 9/19/16, 8:15 AM, Jiri Pirko wrote:
>> Mon, Sep 19, 2016 at 04:59:22PM CEST, ro...@cumulusnetworks.com wrote:
>>> On 9/18/16, 11:14 PM, Jiri Pirko wrote:
>>>> Mon, Sep 19, 2016 at 01:16:17AM CEST, ro...@cumulusnetworks.com wrote:
>>>>> On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>>>>>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>>>>>> From: Jiri Pirko <j...@mellanox.com>
>>>>>>>
>>>>>>> This is RFC, unfinished. I came across some issues in the process so I
>>>>>>> would like to share those and restart the fib offload discussion in
>>>>>>> order to make it really usable.
>>>>>>>
>>>>>>> So the goal of this patchset is to allow the driver to propagate all
>>>>>>> prefixes configured in the kernel down to HW. This is necessary for
>>>>>>> routing to work as expected. If we don't do that, HW might incorrectly
>>>>>>> forward packets for prefixes known to the kernel. Take an example when
>>>>>>> a default route is set in switch HW and there is an IP address set on
>>>>>>> a management (non-switch) port.
>>>>>>>
>>>>>>> Currently, only fibs related to the switch port netdev are offloaded
>>>>>>> using switchdev ops. This model is not extendable so the first patch
>>>>>>> introduces a replacement: a notifier to propagate fib additions and
>>>>>>> removals to whoever is interested. The second patch makes mlxsw adopt
>>>>>>> this new way, registering one notifier block for each mlxsw (asic)
>>>>>>> instance.
>>>>>> Instead of introducing another specialization of a notifier_block
>>>>>> implementation, could we somehow have a kernel-based netlink listener
>>>>>> which receives the same kind of event information from rtmsg_fib()?
>>>>>>
>>>>>> The reason is that having such a facility would hook directly onto
>>>>>> existing rtmsg_* calls that exist throughout the stack, and that seems
>>>>>> to scale better.
>>>>> I was thinking along the same lines. Instead of proliferating notifier
>>>>> blocks throughout the stack for switchdev offload, putting existing
>>>>> events to use would be nice.
>>>>>
>>>>> But the problem though is drivers having to parse the netlink msg again.
>>>>> Also, the intent here is to do the offload first, before the route is
>>>>> added to the kernel (though I don't see that in the current series).
>>>>> Existing netlink rtmsg_fib events are generated after the route is added
>>>>> to the kernel.
>>>>>
>>>>>
>>>>> Jiri, instead of the notifier, do you see a problem with always calling
>>>>> the existing switchdev offload api for every route for every asic
>>>>> instance? The first device where the route fits wins.
>>>> There is no list of asic instances. Therefore the notifier fits much
>>>> better here.
>>>>
>>>>
>>>>
>>>>> It seems similar to a driver registering for the notifier and looking at
>>>>> every route...
>>>>> Am I missing something?
>>>>> And the policies you mention could help with selecting the asic
>>>>> instance (FCFS or mirror).
>>>>> You will need to abstract out the asic instance for the switchdev api to
>>>>> call on, but I thought you already have that in some form in your
>>>>> devlink infrastructure.
>>>> switchdev asic instances and devlink instances are orthogonal.
>>> Maybe it is not today... but the requirement for devlink was to provide a
>>> way to communicate to the switch driver
>>> - global switch attributes or
>>> - things that cannot go via switch ports (exactly the problem you are
>>>   trying to solve for routes here)
>> Devlink is a general beast, not a switch-specific one. I see no need to
>> use a fib->devlink->driver route inside the kernel. Devlink is
>> userspace-facing.
>
>Yes, sure. It has a dev abstraction and an api. The devlink discussion
>started a few years ago in the context of switch asics for the very same
>reason: that it will help direct the offload call to the switch device
>driver when you can't apply the settings on a per-port basis.
>You have kept the abstraction and api generic, which is a great thing.
>But that can't be the reason for it to not support its original intent, if
>there is a way.
>
>>
>>
>>> So, maybe an instance of switch asic modeled via devlink will help here
>>> and possibly all/other switchdev offload hooks?
>> Maybe, but in case of fibs, the notifier just fits great. I see no need
>> for anything else.
>
>I think it's better to stick with 'offload api or notifier', whichever we
>pick, to be consistent with other switchdev offload areas. That was the
>original intent of introducing the switchdev api layer. If we are now
>replacing the switchdev api with notifiers,


I strongly disagree. Making it uniform is not desirable. For some things,
a direct ndo/sdo makes sense and is better. For some other things, a notifier
fits better. For example, when I was implementing LAG offload,
I also chose a notifier.
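
[Editor's sketch, not part of the original message or the patchset.] The
notifier argument in this thread can be illustrated with a minimal userspace
model in C. Every name here (fib_event, fib_info, fib_notifier_register,
fib_notify, struct asic) is hypothetical and chosen for illustration only; the
kernel's real notifier API differs. The sketch only shows that the core walks
a chain of registered callbacks and never needs its own list of asic
instances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event codes; illustrative names, not the patchset's. */
enum fib_event { FIB_EVENT_ADD, FIB_EVENT_DEL };

/* Hypothetical, heavily reduced description of an IPv4 route. */
struct fib_info {
	uint32_t dst;     /* prefix, host byte order */
	uint8_t dst_len;  /* prefix length in bits */
};

/* One link in the chain: a callback plus a pointer to the next block. */
struct notifier {
	int (*call)(struct notifier *nb, enum fib_event ev,
		    const struct fib_info *fi);
	struct notifier *next;
};

static struct notifier *fib_chain; /* head of the chain, NULL when empty */

static void fib_notifier_register(struct notifier *nb)
{
	assert(nb && nb->call);
	nb->next = fib_chain;
	fib_chain = nb;
}

static void fib_notifier_unregister(struct notifier *nb)
{
	struct notifier **p;

	for (p = &fib_chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* The core only walks the chain; it returns how many listeners it called. */
static int fib_notify(enum fib_event ev, const struct fib_info *fi)
{
	struct notifier *nb;
	int called = 0;

	for (nb = fib_chain; nb; nb = nb->next) {
		nb->call(nb, ev, fi);
		called++;
	}
	return called;
}

/* A hypothetical asic instance that embeds its own notifier block. */
struct asic {
	struct notifier nb;
	int routes; /* number of prefixes this instance has seen added */
};

static int asic_fib_event(struct notifier *nb, enum fib_event ev,
			  const struct fib_info *fi)
{
	/* Recover the enclosing instance from the embedded block. */
	struct asic *a = (struct asic *)((char *)nb - offsetof(struct asic, nb));

	(void)fi; /* a real driver would program the prefix into HW here */
	if (ev == FIB_EVENT_ADD)
		a->routes++;
	else if (a->routes > 0)
		a->routes--;
	return 0;
}
```

Each instance registers its embedded block once and then sees every prefix,
including those not tied to one of its own ports. With a direct op call, the
caller would first have to find every instance to call.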


>assuming 'notifiers are the best way' to offload routes, let's keep it
>consistent with other switchdev offload areas too.
>
>I know you already have them for links... and that is good, because links
>already have notifiers.
>We will need the same thing for acls. Having notifiers for acls too seems
>like overkill.

Acls will reuse the tc ndo infra. No notifiers required there. 


>We will then have to extend this to multicast and mpls routes too. Will all
>these be notifiers too?

I believe so.


>
>Do you see any scale problems with using notifiers? As you know, these asics
>can scale to 32k-128k routes.

I don't see any problem there. What do you think might be wrong?


>
>Let's discuss more at netdev1.2, if your patches are not in by then.
>
>thanks,
>Roopa
>
>
