Tue, Nov 01, 2016 at 04:13:32PM CET, john.fastab...@gmail.com wrote:
>[...]
>
>>>> P4 is meant to program programmable hw, not fixed pipeline.
>>>>
>>>
>>> I'm guessing there are no upstream drivers at the moment that support
>>> this though, right? The rocker universe bits could leverage this, though.
>>
>> mlxsw. But this is naturally not implemented yet, as there is no
>> infrastructure.
>
>Really? What is re-programmable?
>
>Can the parser support an arbitrary parse graph?
>Can the table topology be reconfigured?
>Can new tables be created?
>What about "new" actions being defined at configuration time?
>
>Or is this just the normal TCAM configuration of defining key widths and
>fields?

At this point, TCAM configuration.

>
>>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> since I cannot see how one can put the whole p4 language compiler
>>>>>>> into the driver, this last step of p4ast->hw, I presume, will be
>>>>>>> done by firmware, which will be running a full compiler in an embedded cpu
>>>>>>
>>>>>> In case of mlxsw, that compiler would be in the driver.
>>>>>>
>>>>>>
>>>>>>> on the switch. To me that's precisely the kernel bypass, since we won't
>>>>>>> have a clue what HW capabilities actually are and won't be able to
>>>>>>> control them in a fine-grained way.
>>>>>>> Please correct me if I'm wrong.
>>>>>>
>>>>>> You are wrong. By your definition, everything has to be figured out in
>>>>>> the driver and FW does nothing. Otherwise it could do "something else" and
>>>>>> that would be a bypass? That does not make any sense to me whatsoever.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Plus the thing I cannot imagine in the model you propose is table
>>>>>>>> fill-up.
>>>>>>>> For ebpf, you use maps. For p4 you would have to have a separate
>>>>>>>> HW-only API. This is very similar to John's original Flow-API, and
>>>>>>>> therefore a kernel bypass.
>>>>>>>
>>>>>>> I think John's flow api is a better way to expose mellanox switch
>>>>>>> capabilities.
>>>>>>
>>>>>> We are under the impression that p4 suits us nicely. But it is not about
>>>>>> us, it is about finding the common way to do this.
>>>>>>
>>>>>
>>>>> I'll just poke at my FlowAPI question again. For fixed ASICs, what is
>>>>> the Flow-API missing? We have a few proof points that show it is both
>>>>> sufficient and usable for the handful of use cases we care about.
>>>>
>>>> Yeah, it is most probably fine. Even for flex ASICs, up to a point. The
>>>> question is how it stands compared to other alternatives, like p4.
>>>>
>>>
>>> Just to be clear, the Flow-API _was_ generated from the initial P4 spec.
>>> The header files and tools used with it were autogenerated ("compiled"
>>> in a loose sense) from the P4 program.
>>> The piece I never exposed was the set_* operations to reconfigure
>>> running systems. I'm not sure how valuable this is in practice, though.
>>>
>>> Also there is a P4-16 spec that will be released shortly that is more
>>> flexible and also more complex.
>>
>> Would it be easy to extend the Flow-API to include the changes?
>>
>
>P4-16 will allow externs, "functions" to execute in the control flow and
>possibly inside the parse graph. None of this was considered in the
>Flow-API. So none of this is supported.
>
>I still have the question: are you trying to push the "programming" of
>the device via 'tc' or just the runtime configuration of tables? If it
>is just runtime, Flow-API is sufficient IMO. If it's programming the
>device using the complete P4-16 spec, then no, it's not sufficient. But

Sure, we need both.

>I don't believe vendors will expose the complete programmability of the
>device in the driver; this is going to look more like a fw update than
>a runtime change, at least on the devices I'm aware of.

Depends on the driver. I think it is fine if the driver processes it into
some hw configuration sequence or simply pushes the program down to fw.
Both use cases are legit.

>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> I also think it's not fair to call it 'bypass'. I see nothing in it
>>>>>>> that justifies such a 'swear word' ;)
>>>>>>
>>>>>> John's Flow-API was a kernel bypass. Why? It was an API specifically
>>>>>> designed to work directly with HW tables, without the kernel being involved.
>>>>>
>>>>> I don't think that is a fair definition of HW bypass. The SKIP_SW flag
>>>>> does exactly that for 'tc' based offloads and it was not rejected.
>>>>
>>>> No, no, no. You still have the possibility to do the same thing in the
>>>> kernel, same functionality, with the same API. That is a big difference.
>>>>
>>>>
>>>>>
>>>>> The _real_ reason that seems to have fallen out of this and other
>>>>> discussions is the Flow-API didn't provide an in-kernel translation into
>>>>> an emulated path.
>>>>> Note we always had a usermode translation to eBPF.
>>>>> A secondary reason appears to be the overhead of adding yet another
>>>>> netlink family.
>>>>
>>>> Yeah. Maybe you remember, back when the Flow-API was being discussed,
>>>> I suggested wrapping it under TC as cls_xflows and cls_xflowsaction of
>>>> some sort and doing an in-kernel datapath implementation. I believe that
>>>> after that, it would be acceptable.
>>>>
>>>
>>> As I understand the thread, that is exactly the proposal here, right?
>>> With a discussion around whether the structures/etc. are sufficient or
>>> whether any alternative representations exist.
>>
>> Might be the way, yes. But I fear that with other p4 extensions this
>> might not be easy to align with. Therefore I thought about something more
>> generic, like the p4ast.
>>
>
>Same question as above: are we _really_ talking about pushing the entire
>programmability of the device via 'tc'? If so, shouldn't we have a vendor
>say they will support and implement this?

We need some API, and I believe that TC is perfectly suitable for that.
Why do you think it's a problem?

>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> The goal of flow api was to expose HW features to user space, so that
>>>>>>> user space can program it. For something as simple as the mellanox
>>>>>>> switch asic it fits perfectly well.
>>>>>>
>>>>>> Again, this is not mlx-asic-specific. And again, that is a kernel bypass.
>>>>>>
>>>>>>
>>>>>>> Unless I misunderstand the bigger goal of this discussion and it's
>>>>>>> about programming ezchip devices.
>>>>>>
>>>>>> No. For network processors, I believe that BPF is nicely offloadable, no
>>>>>> need to do the exercise for that.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> If the goal is to model hw tcam in the linux kernel then just introduce
>>>>>>> a tcam bpf map type. It will be dog slow in user space, but it will
>>>>>>> match exactly what is happening in the HW and user space can make
>>>>>>> sensible trade-offs.
>>>>>>
>>>>>> No, you got me completely wrong.
>>>>>> This is not about the TCAM. This is about the differences between the
>>>>>> 2 worlds (p4/bpf).
>>>>>> Again, for "p4-ish" devices, you have to translate BPF. And as you
>>>>>> noted, it's an instruction set. It is very hard, if not impossible, to
>>>>>> parse it in order to get back the original semantics.
>>>>>>
>>>>>
>>>>> I think in this discussion "p4-ish" devices means devices with multiple
>>>>> tables in a pipeline? Not devices that have programmable/configurable
>>>>> pipelines, right? And if we get to talking about reconfigurable devices,
>>>>> I believe this should be done out of band as it typically means
>>>>> reloading some ucode, etc.
>>>>
>>>> I'm talking about both. But I think we should focus on reconfigurable
>>>> ones, as we probably won't see that many fixed ones in the future.
>>>>
>>>
>>> hmm maybe, but the 10/40/100Gbps devices are going to be around for some
>>> time. So we need to ensure these work well.
>>
>> Yes, but I would like to emphasize that if we are defining a new API,
>> the primary focus should be on new devices.
>>
>>
>
>What device, though? Back to the mlxsw question about actually supporting
>this stuff.
>
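The thread contrasts reprogrammable pipelines with "the normal TCAM configuration of defining key widths and fields", and one reply suggests modeling the hardware TCAM as a bpf map type. As a rough illustration of what such a ternary table does, here is a minimal C sketch. It assumes a made-up 48-bit key built from a destination IPv4 address and port; it is not the mlxsw driver interface, the Flow-API, tc, or any existing BPF map type, and `make_key`, `struct tcam_entry` and `tcam_lookup` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key layout: a 48-bit key packed into a uint64_t from two
 * configured fields, a 32-bit IPv4 destination address and a 16-bit
 * destination port. Real devices let you choose key width and fields
 * per table; this layout is only an illustration. */
static uint64_t make_key(uint32_t dst_ip, uint16_t dst_port)
{
	return ((uint64_t)dst_ip << 16) | dst_port;
}

/* One ternary entry: key bits set in 'mask' must equal the same bits of
 * 'value'; bits cleared in 'mask' are "don't care". */
struct tcam_entry {
	uint64_t value;
	uint64_t mask;
	int priority; /* higher value wins when several entries match */
	int action;   /* hypothetical action id, e.g. 0 = forward, 1 = drop */
};

/* Software emulation of a TCAM lookup: a linear scan that returns the
 * highest-priority matching entry, or NULL if no entry matches.
 * A hardware TCAM compares the key against all entries in parallel. */
static const struct tcam_entry *tcam_lookup(const struct tcam_entry *tbl,
					    size_t n, uint64_t key)
{
	const struct tcam_entry *best = NULL;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((key & tbl[i].mask) != (tbl[i].value & tbl[i].mask))
			continue;
		if (best == NULL || tbl[i].priority > best->priority)
			best = &tbl[i];
	}
	return best;
}
```

For example, a priority-10 drop entry for 192.0.2.0/24 with the port field fully masked out, plus a priority-20 forward entry for exactly 192.0.2.1:80, forwards traffic to port 80 on that host and drops everything else in the prefix. The linear scan is the reason a pure software emulation of a large TCAM is slow, which may be what the "dog slow" remark refers to.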