Hi Zhang, I think we both want the same thing and share the same basic concepts.
PSB, some answers,

Best,
Ori

> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zh...@intel.com>
> Sent: Thursday, May 18, 2023 1:33 PM
>
> > -----Original Message-----
> > From: Ori Kam <or...@nvidia.com>
> > Sent: Wednesday, May 17, 2023 11:19 PM
> > To: Zhang, Qi Z <qi.z.zh...@intel.com>; dev@dpdk.org
> > Cc: techbo...@dpdk.org; Richardson, Bruce <bruce.richard...@intel.com>;
> > Burakov, Anatoly <anatoly.bura...@intel.com>; Wiles, Keith
> > <keith.wi...@intel.com>; Liang, Cunming <cunming.li...@intel.com>; Wu,
> > Jingjing <jingjing...@intel.com>; Zhang, Helin <helin.zh...@intel.com>;
> > Mcnamara, John <john.mcnam...@intel.com>; Xu, Rosen
> > <rosen...@intel.com>
> > Subject: RE: seeking community input on adapting DPDK to P4Runtime
> > backend
> >
> > Hi Zhang,
> >
> > rte_flow is an excellent candidate for implementing P4.
> > We did some internal tests that show great promise in this regard.
> >
> > I would be very happy to supply any needed information and have a
> > discussion on how to continue with this project.
>
> Thank you Ori! Please check my following comments
>
> Regards
> Qi
>
> > Please see inline detailed answers.
> >
> > Best,
> > Ori Kam
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z <qi.z.zh...@intel.com>
> > > Sent: Monday, May 8, 2023 9:40 AM
> > > Subject: seeking community input on adapting DPDK to P4Runtime
> > > backend
> > >
> > > Hi:
> > >
> > > Our team is currently working on developing a DPDK PMD for a
> > > P4-programmed network controller, based on customer feedback to
> > > integrate DPDK into the P4Runtime backend
> > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > >
> > > (*) However, we are facing challenges in adapting DPDK's rte_flow API
> > > to the P4Runtime API, primarily due to the transition from a
> > > table-based API with fields of arbitrary bit width at arbitrary
> > > offset to a protocol-based API (more detail is described in the
> > > post-script).
> > >
> > > We are seeking suggestions and best practices from the open-source
> > > community to help us with this integration. Specifically, we are
> > > interested in learning:
> > >
> > > (*) If anyone has previously attempted to map rte_flow to P4-based
> > > devices.
> >
> > We did try successfully.
> >
> > > (*) Thoughts on how to map from table-based matching to protocol-based
> > > matching like in rte_flow.
> >
> > Rte_flow is table based (groups), and now with the introduction of the
> > template API rte_flow is even more table based (we added the concept of
> > tables), which is just what P4 requires.
>
> Yes, the rte_flow template can be used to map a sequence of patterns to a
> P4 table and a sequence of actions to a P4 action. However, using a fixed
> rte_flow template can be problematic when handling different P4 programs
> in the same driver. To provide more flexibility, the mapping of patterns and
> actions can be externalized into a configuration file, or made part of the
> firmware and learned by the driver, allowing for customization based on the
> specific requirements of each P4 pipeline. Actually, we have enabled this
> approach in order to accommodate different P4 programs.
>
> However, an alternative approach to consider is whether it would be feasible
> to directly expose the P4 table and action names or IDs to the application,
> rather than relying on rte_flow templates. This approach offers several
> potential benefits:
>
> Integration with P4Runtime Backend: By exposing the P4 table and action
> names or IDs directly, DPDK could be easily integrated as a P4Runtime
> backend. This eliminates the need for translation from the P4Runtime API to
> rte_flow templates in the application, simplifying the integration process.
>
> Elimination of Manual Mapping: Exposing the P4 table and action names or
> IDs to the application would remove the requirement for the engineering
> team to manually map each pipeline to specific rte_flow templates.
> This is particularly beneficial in cases where hardware vendors provide
> customers with a toolchain to create their own P4 pipelines but do not
> necessarily own the P4 programs. By eliminating the dependency on rte_flow
> templates, this approach reduces complexity in using DPDK as the driver.
>
> To be more specific, the proposed API for exposing P4 table and action
> names or IDs directly to the application could be as follows:
>
> /* Get the table info */
> struct rte_p4_table_info tbl_info;
> rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table", &tbl_info);
>
> /* Create the key */
> struct rte_p4_table_key *key;
> rte_p4_table_key_create(port_id, tbl_info.id, &key);
>
> /* Set the key fields (last argument is the value size in bytes) */
> rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);
>
> /* Get the action spec info */
> struct rte_p4_action_spec_info as_info;
> rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd", &as_info);
>
> /* Create the action */
> struct rte_p4_action *action;
> rte_p4_action_create(port_id, as_info.id, &action);
>
> /* Set the action fields */
> rte_p4_table_action_field_set_by_name(port_id, action, "mod_id", &mod_id, 3);
> rte_p4_table_action_field_set_by_name(port_id, action, "port_id", &target_port_id, 2);
>
> /* Add the entry */
> rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
>
> ...
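
A rough sketch of what a by-name field lookup like the one in the proposal above might rest on, assuming the driver holds per-table metadata produced by the P4 compiler. The struct and function names (p4_field_hint, p4_field_lookup, p4_field_check_size) are hypothetical and belong to neither DPDK nor the proposed rte_p4_* API; the field IDs and byte widths are copied from the compiler hint for decap_vxlan_tcp_table quoted further down in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One key field as listed in the compiler hint (hypothetical struct). */
struct p4_field_hint {
    uint32_t id;          /* field ID assigned by the P4 compiler */
    const char *name;     /* name given by @name(...) in the P4 source */
    uint8_t byte_width;   /* size of the match value in bytes */
};

/* Key fields of decap_vxlan_tcp_table, taken from the hint in this thread. */
static const struct p4_field_hint decap_vxlan_tcp_fields[] = {
    { 1, "tun_ip_src", 4 },
    { 2, "tun_ip_dst", 4 },
    { 3, "vni",        3 },
    { 4, "ipv4_src",   4 },
    { 5, "ipv4_dst",   4 },
    { 6, "src_port",   2 },
    { 7, "dst_port",   2 },
};

/* Return the hint entry for `name`, or NULL if the table has no such field. */
const struct p4_field_hint *p4_field_lookup(const char *name)
{
    size_t n = sizeof(decap_vxlan_tcp_fields) / sizeof(decap_vxlan_tcp_fields[0]);

    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(decap_vxlan_tcp_fields[i].name, name) == 0)
            return &decap_vxlan_tcp_fields[i];
    }
    return NULL;
}

/* Check a caller-supplied value size against the hint: 0 if OK, -1 otherwise. */
int p4_field_check_size(const char *name, size_t size)
{
    const struct p4_field_hint *f = p4_field_lookup(name);

    return (f != NULL && f->byte_width == size) ? 0 : -1;
}
```

With this table, p4_field_check_size("vni", 3) succeeds, while "wire_port", which the API example sets but the hint does not list, is rejected.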
>

I think that introducing some API that knows P4 is the way to go, but I think
that this should be a very simple API which calls rte_flow.

> > > (*) Any ideas on how to extend or expand the rte_flow APIs to better
> > > accommodate P4-based or other table-matching based devices.
> >
> > Let's discuss any issue you have.
> >
> > > Your insights and feedback would be greatly appreciated!
> > >
> > > ======================= Post-Script ============================
> > >
> > > More details on the problem below, for anyone interested
> > >
> > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > which provides a standard interface for controlling and configuring
> > > the data plane behavior of network devices. P4Runtime allows network
> > > operators to dynamically add, modify, and remove flow rules in the
> > > hardware forwarding tables of P4-enabled devices.
> > >
> > > The P4Runtime API is a table-based API; it assumes the packet processing
> > > pipeline consists of one or more key/action units (tables). In
> > > P4Runtime, each table defines the fields to be matched and the actions
> > > to be taken on incoming packets. During compilation, the P4 compiler
> > > assigns a unique uint32 ID to each table, action, and field, which is
> > > associated with its corresponding string name. These IDs have no
> > > inherent relationship to any network protocol but instead serve as a
> > > means to identify different components of a P4 program within the
> > > P4Runtime API.
> >
> > This is the concept of tables and groups in rte_flow.
> >
> > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > translation layer is needed in the application to map the P4 tables
> > > and actions to the corresponding rte_flow rules. However, this
> > > translation layer can be problematic as it is not easily scalable.
> > > When the P4 pipeline is refined or updated, the translation rules may
> > > also need to be updated, which can result in errors and reduced
> > > efficiency.
> >
> > I don't understand why.
> >
> > > On the other hand, a hardware vendor that provides a P4-enabled device
> > > is required to implement an rte_flow interface in their DPDK PMD.
> > > Typically, the P4 compiler generates hints for the driver on how to map
> > > P4 tables to hardware resources, and how to convert table entry
> > > add/modify/delete actions into low-level hardware configurations.
> > > However, because rte_flow is protocol-based, it poses an additional
> > > challenge for driver developers, who must create another translation
> > > layer to convert rte_flow tokens into P4 object identifiers. This
> > > translation layer must be carefully designed and implemented to ensure
> > > optimal performance and scalability, and to ensure that the driver can
> > > efficiently handle the dynamic nature of P4 programs.
> >
> > Right, but some of the translation can be done in code shared by all PMDs,
> > and the translation is static for the compilation, so inserting rules can
> > be super fast with no need for extra work.
> >
> > > To better understand the problem, let's consider the following example
> > > that demonstrates how to use the P4Runtime API to program a rule for
> > > processing a VXLAN packet. The rule matches a VXLAN packet,
> > > decapsulates the tunnel header, and forwards it to a specific port.
> > >
> > > The P4 source code below describes the VXLAN decap table
> > > decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner
> > > IP address, and inner TCP port. For each rule, four action
> > > specifications can be selected. We will focus on one action
> > > specification, decap_vxlan_fwd, that performs decapsulation and forwards
> > > the packet to a specific port.
> > >
> > > table decap_vxlan_tcp_table {
> > >     key = {
> > >         hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > >         hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > >         hdrs.vxlan[meta.depth-1].vni  : exact @name("vni");
> > >         hdrs.ipv4[meta.depth].src_ip  : exact @name("ipv4_src");
> > >         hdrs.ipv4[meta.depth].dst_ip  : exact @name("ipv4_dst");
> > >         hdrs.tcp.sport                : exact @name("src_port");
> > >         hdrs.tcp.dport                : exact @name("dst_port");
> > >     }
> > >     actions = {
> > >         @tableonly decap_vxlan_fwd;
> > >         @tableonly decap_vxlan_dnat_fwd;
> > >         @tableonly decap_vxlan_snat_fwd;
> > >         @defaultonly set_exception;
> > >     }
> > > }
> >
> > Translate to rte_flow:
> > template pattern relaxed_mode = 1
> > pattern = Ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> > map structure = {
> >     tun_ip_src = &pattern[ipv4_src]
> >     ....
> > }
> >
> > > ...
> > >
> > > action decap_vxlan_fwd(PortId_t port_id) {
> > >     meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > >     send_to_port(port_id);
> > > }
> >
> > Same as above just with action template
> >
> > > Below is an example of the hint that the compiler will generate for
> > > the decap_vxlan_tcp_table:
> > >
> > > Table ID: 8454144
> > > Name: decap_vxlan_tcp_table
> > > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > > 1         tun_ip_src  exact       32         4           network
> > > 2         tun_ip_dst  exact       32         4           network
> > > 3         vni         exact       24         3           network
> > > 4         ipv4_src    exact       32         4           network
> > > 5         ipv4_dst    exact       32         4           network
> > > 6         src_port    exact       16         2           network
> > > 7         dst_port    exact       16         2           network
> > > Spec ID   Name
> > > 8519716   decap_vxlan_fwd
> > > 8519718   decap_vxlan_dnat_fwd
> > > 8519720   decap_vxlan_snat_fwd
> > > 8519695   set_exception
> > >
> > > And the hint of action spec "decap_vxlan_fwd" as below:
> > >
> > > Spec ID: 8519716
> > > Name: decap_vxlan_fwd
> > > Field ID  Name     Bit Width  Byte Width  Byte Order
> > > 1         port_id  32         4           host
> > >
> > > Please
> > > note that different compilers may assign different IDs.
> > >
> > > Below is an example of how to program a rule using the P4Runtime API
> > > in JSON format. This rule matches fields and directs packets to port 5.
> > >
> > > {
> > >   "type": 1, // INSERT
> > >   "entity": {
> > >     "table_entry": {
> > >       "table_id": 8454144,
> > >       "match": [
> > >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
> > >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
> > >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
> > >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
> > >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
> > >         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
> > >         { "field_id": 7, "exact": { "value": [0, 201] } },       // tcp dst port = 201
> > >       ],
> > >       "action": {
> > >         "action": {
> > >           "action_id": 8519716,
> > >           "params": [
> > >             { "param_id": 1, "value": [5, 0, 0, 0] }
> > >           ]
> > >         }
> > >       },
> > >       ...
> > >     }
> > >   } ...
> > > }
> > >
> > > Please note that this is only a part of the full command. For more
> > > information, please refer to the p4runtime.proto [2].
> > >
> > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > >
> > > Thank you for your attention to this matter.
> >
> > I think that we should schedule a meeting to see how many gaps we
> > really have between rte_flow and P4, and how we can improve rte_flow
> > to allow the best experience.
>
> Sounds like a good idea!
>
> > > Regards
> > > Qi
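
To make the byte arrays in the JSON example concrete, here is a small sketch of how a value could be serialized according to the Byte Width and Byte Order columns of the compiler hint. The function p4_encode_field and the enum are hypothetical names used only for illustration, and ORDER_HOST_LE assumes a little-endian host, which is what the [5, 0, 0, 0] encoding of port 5 in the example suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte orders named in the hint; ORDER_HOST_LE assumes a little-endian host. */
enum p4_byte_order { ORDER_NETWORK, ORDER_HOST_LE };

/*
 * Write `value` into `out` using exactly `byte_width` bytes.
 * Returns 0 on success, -1 if the width is unsupported or the value
 * does not fit in the requested number of bytes.
 */
int p4_encode_field(uint32_t value, size_t byte_width,
                    enum p4_byte_order order, uint8_t *out)
{
    assert(out != NULL);

    if (byte_width == 0 || byte_width > 4)
        return -1;
    if (byte_width < 4 && (value >> (8 * byte_width)) != 0)
        return -1;

    for (size_t i = 0; i < byte_width; i++) {
        uint8_t b = (uint8_t)(value >> (8 * i)); /* i-th least significant byte */

        if (order == ORDER_NETWORK)
            out[byte_width - 1 - i] = b;         /* most significant byte first */
        else
            out[i] = b;                          /* least significant byte first */
    }
    return 0;
}
```

Under these assumptions, encoding VNI 10 with width 3 in network order gives the bytes 0, 0, 10, and encoding port_id 5 with width 4 in little-endian host order gives 5, 0, 0, 0, matching the values in the JSON above.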