On Fri, 23 Sep 2016 10:22:59 -0700, Samudrala, Sridhar wrote:
> On 9/23/2016 8:29 AM, Jakub Kicinski wrote:
> > On Fri, 23 Sep 2016 07:23:26 -0700, John Fastabend wrote:
> >> Yep, I like the idea in general. I had a slightly different approach in
> >> mind though. If you look at __dev_queue_xmit() there is a void
> >> accel_priv pointer (gather you found this based on your commit note).
> >> My take was we could extend this a bit so it can be used by the VFR
> >> devices and they could do a dev_queue_xmit_accel(). In this way there is
> >> no need to touch /net/core/{filter, dst, ip_tunnel}.c etc. Maybe the
> >> accel logic needs to be extended to push the priv pointer all the way
> >> through the xmit routine of the target netdev though. This should look
> >> a lot like the macvlan accelerated xmit device path without the
> >> switching logic.
> >>
> >> Of course maybe the name would be extended to dev_queue_xmit_extended()
> >> or something.
> >>
> >> So the flow on ingress would be,
> >>
> >> 1. pkt_received_by_PF_netdev
> >> 2. PF_netdev reads some tag off packet/descriptor and sets correct
> >>    skb->dev field. This is needed so stack "sees" packets from
> >>    correct VF ports.
> >> 3. packet passed up to stack.
> >>
> >> I guess it is a bit "zombie" like on the receive path because the packet
> >> is never actually handled by VF netdev code per se and on egress can
> >> traverse both the VFR and PF netdevs qdiscs. But on the other hand the
> >> VFR netdevs and PF netdevs are all in the same driver. Plus using a
> >> queue per VFR is a bit of a waste as its not needed and also hardware
> >> may not have any mechanism to push VF traffic onto a rx queue.
> >>
> >> On egress,
> >>
> >> 1. VFR xmit is called
> >> 2. VFR xmit calls dev_queue_xmit_accel() with some meta-data if needed
> >>    for the lower netdev
> >> 3. lower netdev sends out the packet.
> >>
> >> Again we don't need to waste any queues for each VFR and the VFR can be
> >> a LLTX device. In this scheme I think you avoid much of the changes in
> >> your patch and keep it all contained in the driver. Any thoughts?
>
> The 'accel' parameter in dev_queue_xmit_accel() is currently only passed
> to ndo_select_queue() via netdev_pick_tx() and is used to select the tx
> queue.
> Also, it is not passed all the way to the driver specific xmit routine.
> Doesn't it require changing all the driver xmit routines if we want to
> pass this parameter?
>
> > Goes without saying that you have a much better understanding of packet
> > scheduling so please bear with me :) My target model is that I have
> > n_cpus x "n_tc/prio" queues on the PF and I want to transmit the
> > fallback traffic over those same queues. So no new HW queues are used
> > for VFRs at all. This is a reverse of macvlan offload which AFAICT has
> > "bastard hw queues" which actually TX for a separate software device.
> >
> > My understanding was that I can rework this model to have software
> > queues for VFRs (#sw queues == #PF queues + #VFRs) but no extra HW
> > queues (#hw queues == #PF queues) but then when the driver sees a
> > packet on sw-only VFR queue it has to pick one of the PF queues (which
> > one?), lock PF software queue to own it, and only then can it
> > transmit. With the dst_metadata there is no need for extra locking or
> > queue selection.
>
> Yes. The VFPR netdevs don't have any HW queues associated with them and
> we would like to use the PF queues for the xmit.
> I was also looking into some way of passing the port id via skb
> parameter to the dev_queue_xmit() call so that the PF xmit routine can
> do a directed transmit to a specifc VF.
> Is skb->cb an option to pass this info?
> dst_metadata approach would work too if it is acceptable.

I don't think we can trust skb->cb to be set to anything meaningful when the skb is received by the lower device.
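
To illustrate the concern, here is a toy userspace model, not kernel code. The struct and function names below are all made up for the example. The assumption it encodes is that the queueing layer of the lower device keeps its own private state in skb->cb (in the kernel the qdisc code does this via struct qdisc_skb_cb), so anything the VFR put there before calling dev_queue_xmit() may be overwritten before the PF xmit routine runs. The separate meta_port_id field only stands in for a slot that intermediate layers leave alone, which is the property the dst_metadata approach relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for struct sk_buff: a shared scratch area plus a
 * dedicated metadata slot (hypothetical, models dst_metadata). */
struct toy_skb {
	unsigned char cb[48];   /* scratch space, owned by the current layer */
	uint32_t meta_port_id;  /* hypothetical slot no other layer touches */
	uint32_t len;
};

/* Hypothetical private layout the VFR would like to keep in cb. */
struct vfr_cb {
	uint32_t port_id;
};

/* Toy analogue of the queueing layer's private cb layout. */
struct toy_qdisc_cb {
	uint32_t pkt_len;
};

/* VFR xmit: stash the port id both ways before handing the skb down. */
static void vfr_xmit(struct toy_skb *skb, uint32_t port_id)
{
	struct vfr_cb vcb = { .port_id = port_id };

	assert(sizeof(vcb) <= sizeof(skb->cb));
	memcpy(skb->cb, &vcb, sizeof(vcb));
	skb->meta_port_id = port_id;
}

/* Queueing layer of the lower device: reuses cb for its own state. */
static void toy_queue_xmit(struct toy_skb *skb)
{
	struct toy_qdisc_cb qcb = { .pkt_len = skb->len };

	memcpy(skb->cb, &qcb, sizeof(qcb));
}

/* What the PF xmit routine would read back if it trusted cb. */
static uint32_t pf_port_from_cb(const struct toy_skb *skb)
{
	struct vfr_cb vcb;

	memcpy(&vcb, skb->cb, sizeof(vcb));
	return vcb.port_id;
}

/* What the PF xmit routine reads from the dedicated slot. */
static uint32_t pf_port_from_meta(const struct toy_skb *skb)
{
	return skb->meta_port_id;
}
```

In this model, calling vfr_xmit(&skb, 7) and then toy_queue_xmit(&skb) on an skb with len 1500 leaves pf_port_from_cb() returning the packet length instead of the port id, while pf_port_from_meta() still returns 7.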