RE: [PATCH v2 02/10] net/nfp: fix malloc name problem in secondary process

2024-10-11 Thread Chaoyong He


> On Sat, 12 Oct 2024 10:40:59 +0800
> Chaoyong He  wrote:
> 
> > The original logic keeps using the same name parameter when allocating
> > memory in the secondary process, which may cause errors when using
> > multiple PF cards.
> >
> > Fixes: 3b00109d2b65 ("net/nfp: add PF ID used to format symbols")
> > Cc: peng.zh...@corigine.com
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Chaoyong He 
> > Reviewed-by: Long Wu 
> > Reviewed-by: Peng Zhang 
> 
> However, the name is mostly ignored by malloc; it is only used for tracing.

Okay, if you insist, we can abandon this one.


Re: [PATCH 1/4] ethdev: verify queue ID when Tx done cleanup

2024-10-11 Thread Ferruh Yigit
On 9/5/2024 8:33 AM, Andrew Rybchenko wrote:
> On 9/5/24 09:46, Jie Hai wrote:
>> From: Chengwen Feng 
>>
>> Verify queue_id for rte_eth_tx_done_cleanup API.
> 
> If I'm not mistaken, the function is considered a data path (fast) API.
> If so, it should not validate its parameters, just like rte_eth_tx_burst().
> Validation may be done under RTE_ETHDEV_DEBUG_TX only.
> 
> Maybe the documentation should be fixed to highlight it.
> 
> And yes, the current implementation looks inconsistent from this point of
> view, since port_id and the presence of the callback are checked.
> 
> Anyway, the motivation in the patch description is insufficient.
> 

Hi Chengwen,

I agree with Andrew that we should not add checks to the datapath function.

Can you please explain more about why this check is needed in the first place?
Is it for a specific use case?

>>
>> Fixes: 44a718c457b5 ("ethdev: add API to free consumed buffers in Tx
>> ring")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Chengwen Feng 
>> ---
>>   lib/ethdev/rte_ethdev.c | 4 
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index f1c658f49e80..998deb5ab101 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -2823,6 +2823,10 @@ rte_eth_tx_done_cleanup(uint16_t port_id,
>> uint16_t queue_id, uint32_t free_cnt)
>>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>   dev = &rte_eth_devices[port_id];
>>   +    ret = eth_dev_validate_tx_queue(dev, queue_id);
>> +    if (ret != 0)
>> +    return ret;
>> +
>>   if (*dev->dev_ops->tx_done_cleanup == NULL)
>>   return -ENOTSUP;
>>   
> 



Re: [PATCH v2 2/2] examples/l3fwd: fix read beyond array boundaries in ACL mode

2024-10-11 Thread Stephen Hemminger
On Tue, 30 Jul 2024 13:22:35 +0100
Konstantin Ananyev  wrote:

> From: Konstantin Ananyev 
> 
> With that commit, ACL mode now can use send_packets_multi().
> What I missed with that change: send_packets_multi() can't deal
> properly with input dst_port[i] == BAD_PORT (though it can set
> it itself), as it uses dst_port[i] values to read L2 addresses for the port
> and assumes dst_port[] contains only valid values.
> To fix that, just add a check that all dst_port[] entries are valid before
> calling send_packets_multi(). Otherwhise use send_packets_single().
> An alternative, and probably more logical, approach would be to
> re-arrange send_packets_multi() so that it updates L2 packet headers
> at the very last stage - when dst_port[] is finalized.
> But that would affect all other modes and would require much more
> code changes and testing.
> 
> Bugzilla ID: 1502
> Fixes: aa7c6077c19b ("examples/l3fwd: avoid packets reorder in ACL mode")
> 
> Reported-by: Song Jiale 
> Signed-off-by: Konstantin Ananyev 

Please fix spelling errors in this version.


WARNING:TYPO_SPELLING: 'Otherwhise' may be misspelled - perhaps 'Otherwise'?
#71: 
calling : send_packets_multi(). Otherwhise  use  send_packets_single().

WARNING:TYPO_SPELLING: 'deined' may be misspelled - perhaps 'denied'?
#121: FILE: examples/l3fwd/l3fwd_acl.c:1042:
+   /* bad or deined by ACL rule packets */


[PATCH v2 09/10] net/nfp: modify the comment of some control messages

2024-10-11 Thread Chaoyong He
The comments of some control messages are not right; they conflict
with the data structures and may confuse other developers.

Signed-off-by: Chaoyong He 
Reviewed-by: Long Wu 
Acked-by: Stephen Hemminger 
---
 drivers/net/nfp/flower/nfp_flower_cmsg.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/nfp/flower/nfp_flower_cmsg.h 
b/drivers/net/nfp/flower/nfp_flower_cmsg.h
index 5fc4210d8b..eda047a404 100644
--- a/drivers/net/nfp/flower/nfp_flower_cmsg.h
+++ b/drivers/net/nfp/flower/nfp_flower_cmsg.h
@@ -738,7 +738,7 @@ struct nfp_fl_act_output {
  *3   2   1
  *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |   -   |opcode |   -   |jump_id|   -   |
+ * | res |  opcode |  res  | len_lw|   -   |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * | eth_dst_47_16_mask|
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -777,7 +777,7 @@ struct nfp_fl_act_push_vlan {
  *3   2   1
  *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |   -   |opcode |   |jump_id|   -   |
+ * | res |  opcode |  res  | len_lw|   -   |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * | ipv4_saddr_mask   |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -802,7 +802,7 @@ struct nfp_fl_act_set_ip4_addrs {
  *3   2   1
  *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |   -   |opcode |   |jump_id|ttl_mask   |   tos_mask|
+ * | res |  opcode |  res  | len_lw|ttl_mask   |   tos_mask|
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |   ttl |  tos  |   0   |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -821,7 +821,7 @@ struct nfp_fl_act_set_ip4_ttl_tos {
  *3   2   1
  *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |   -   |opcode |   |jump_id|   -   |
+ * | res |  opcode |  res  | len_lw|   -   |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * | ipv6_addr_127_96_mask |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -854,7 +854,7 @@ struct nfp_fl_act_set_ipv6_addr {
  *3   2   1
  *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |   -   |opcode |   |jump_id|  tclass_mask  |  hlimit_mask  |
+ * | res |  opcode |  res  | len_lw|  tclass_mask  |  hlimit_mask  |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |   0   |  tclass   |  hlimit   |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -879,7 +879,7 @@ struct nfp_fl_act_set_ipv6_tc_hl_fl {
  *3   2   1
  *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |   -   |opcode |   |jump_id|   -   |
+ * | res |  opcode |  res  | len_lw|   -   |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |  src_mask | dst_mask  |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -900,7 +900,7 @@ struct nfp_fl_act_set_tport {
  *3   2   1
  *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |  -  |  opcode |   |jump_id|  -  |M|   - |V|
+ * | res |  opcode |  res  | len_lw|  -  |M|   - |V|
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * | ipv6_daddr_127_96 / ipv4_daddr|
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-- 
2.39.1



[PATCH v2 10/10] net/nfp: fix memory leak in VF initialization logic

2024-10-11 Thread Chaoyong He
Fix a memory leak in the VF initialization logic.

Fixes: d81e2b514dc9 ("net/nfp: move device info into process private data")
Cc: sta...@dpdk.org

Signed-off-by: Chaoyong He 
Reviewed-by: Long Wu 
Reviewed-by: Peng Zhang 
Acked-by: Stephen Hemminger 
---
 drivers/net/nfp/nfp_ethdev_vf.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index ab413a2c5a..0aadca9010 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -322,12 +322,13 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
 
-   net_hw->eth_xstats_base = rte_malloc("rte_eth_xstat",
-   sizeof(struct rte_eth_xstat) * 
nfp_net_xstats_size(eth_dev), 0);
+   net_hw->eth_xstats_base = rte_calloc("rte_eth_xstat",
+   nfp_net_xstats_size(eth_dev), sizeof(struct 
rte_eth_xstat), 0);
if (net_hw->eth_xstats_base == NULL) {
PMD_INIT_LOG(ERR, "No memory for xstats base values on device 
%s!",
pci_dev->device.name);
-   return -ENOMEM;
+   err = -ENOMEM;
+   goto hw_priv_free;
}
 
/* Work out where in the BAR the queues start. */
-- 
2.39.1



[PATCH v2 08/10] net/nfp: fix problem caused by FEC set

2024-10-11 Thread Chaoyong He
The return value of 'nfp_eth_set_fec()' is three-way, but the original
logic wrongly treated it as two-way.

Fixes: 37bd1b843a20 ("net/nfp: support setting FEC mode")
Cc: zerun...@corigine.com
Cc: sta...@dpdk.org

Signed-off-by: Chaoyong He 
Reviewed-by: Long Wu 
Reviewed-by: Peng Zhang 
Acked-by: Stephen Hemminger 
---
 drivers/net/nfp/nfp_net_common.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_net_common.c b/drivers/net/nfp/nfp_net_common.c
index 5c3a9a7ae7..b986ed4622 100644
--- a/drivers/net/nfp/nfp_net_common.c
+++ b/drivers/net/nfp/nfp_net_common.c
@@ -2692,6 +2692,7 @@ int
 nfp_net_fec_set(struct rte_eth_dev *dev,
uint32_t fec_capa)
 {
+   int ret;
uint8_t idx;
enum nfp_eth_fec fec;
uint32_t supported_fec;
@@ -2724,7 +2725,13 @@ nfp_net_fec_set(struct rte_eth_dev *dev,
return -EIO;
}
 
-   return nfp_eth_set_fec(hw_priv->pf_dev->cpp, eth_port->index, fec);
+   ret = nfp_eth_set_fec(hw_priv->pf_dev->cpp, eth_port->index, fec);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "NFP set FEC mode failed.");
+   return ret;
+   }
+
+   return 0;
 }
 
 uint32_t
-- 
2.39.1



Re: [PATCH] devtools: exclude common EAL header from attribute check

2024-10-11 Thread David Marchand
On Thu, Oct 10, 2024 at 7:41 PM Stephen Hemminger
 wrote:
> > With the addition of SKIP_FILES in
> > commit 7e421ae345f4 ("devtools: support skipping forbid rule check"),
> > it is possible to avoid false positive on adding
> > attribute wrapper in rte_common.h.
> >
> > Signed-off-by: David Marchand 
> Acked-by: Stephen Hemminger 

Applied, thanks.


-- 
David Marchand



Re: [PATCH v3 1/2] fib: implement RCU rule reclamation

2024-10-11 Thread David Marchand
On Thu, Oct 10, 2024 at 1:27 PM Vladimir Medvedkin
 wrote:
>
> Currently, for DIR24-8 algorithm, the tbl8 group is freed even though the
> readers might be using the tbl8 group entries. The freed tbl8 group can
> be reallocated quickly. As a result, lookup may be performed incorrectly.
>
> To address that, RCU QSBR is integrated for safe tbl8 group reclamation.
>
> Signed-off-by: Vladimir Medvedkin 
> Acked-by: Stephen Hemminger 

We got one false positive in the bitops unit test, and one strange failure on ARM.

Recheck-request: iol-unit-amd64-testing, iol-unit-arm64-testing


-- 
David Marchand



Re: [PATCH v9 1/7] eal: add static per-lcore memory allocation facility

2024-10-11 Thread Thomas Monjalon
11/10/2024 10:04, Mattias Rönnblom:
> On 2024-10-10 23:24, Thomas Monjalon wrote:
> > Hello,
> > 
> > This new feature looks to bring something interesting to DPDK.
> > There was a good amount of discussion and review,
> > and there is a real effort of documentation.
> > 
> > However, some choices done in this implementation
> > were not explained or advertised enough in the documentation,
> > in my opinion.
> > 
> 
> Are those of relevance to the API user?

I think it helps to understand when we should use this API.
Such design explanation may come in the prog guide RST file.


> > I think the first thing to add is an explanation of the memory layout.
> > Maybe that a SVG drawing would help to show how it is stored.
> 
> That would be helpful to someone wanting to understand the internals. 
> But where should that go? If it's put in the API, it will also obscure 
> the *actual* API documentation.

Of course not in API doc.
I'm talking about moving a lot of explanations in the prog guide,
and add a bit more about the layout.

> I have some drawings already, and I agree they are very helpful - both 
> in explaining how things work, and making obvious why the memory layout 
> resulting from the use of lcore variables are superior to that of the 
> lcore id-index static array approach.

Cool, please add some in the prog guide.

> > We also need to explain why it is not using rte_malloc.
> > 
> > Also please could you re-read the doc and comments in detail?
> > I think some words are missing and there are typos.
> > While at it, please allow for easy update of the text
> > by starting each sentence on a new line.
> > Breaking lines logically is better for future patches.
> > One more advice: avoid very long sentences.
> 
> I've gone through the documentation and will post a new patch set.

OK thanks.

> There's been a lot of comments and discussion on this patch set. Did you 
> have anything in particular in mind?

Nothing more than what I raised in this review.


> > Do you have benchmarks results of the modules using such variables
> > (power, random, service)?
> > It would be interesting to compare time efficiency and memory usage
> > before/after, with different number of threads.
> > 
> 
> I have the dummy modules of test_lcore_var_perf.c, which show the 
> performance benefits as the number of modules using lcore variables 
> increases.
> 
> That said, the gains are hard to quantify with micro benchmarks, and for 
> real-world performance, one really has to start using the facility at 
> scale before anything interesting may happen.
> 
> Keep in mind however, that while this is new to DPDK, similar facilities 
> already exist in your favorite UN*X kernel. The implementation is 
> different, but I think it's accurate to say the goal and the effects 
> should be the same.
> 
> One can also run the perf autotest for RTE random, but such tests only 
> show lcore variables doesn't make things significantly worse when the L1 
> cache is essentially unused. (In fact, the lcore variable-enabled 
> rte_random.c somewhat counter-intuitively generates a 64-bit number 1 
> TSC cycle faster than the old version on my system.)
> 
> Just to be clear: it's the footprint in the core-private caches we are 
> attempting to reduce.

OK


> > 10/10/2024 16:21, Mattias Rönnblom:
> >> Introduce DPDK per-lcore id variables, or lcore variables for short.
> >>
> >> An lcore variable has one value for every current and future lcore
> >> id-equipped thread.
> > 
> > I find it difficult to read "lcore id-equipped thread".
> > Can we just say "DPDK thread"?
> 
> Sure, if you point me to a definition of what a DPDK thread is.
> 
> I can think of at least four potential definitions
> * An EAL thread
> * An EAL thread or a registered non-EAL thread
> * Any thread calling into DPDK APIs
> * Any thread living in a DPDK process

OK I understand your point.
If we move the design explanations in the prog guide,
we can explain this point in the introduction of the chapter.


> > [...]
> >> +An application, a DPDK library or PMD may keep opt to keep per-thread
> >> +state.
> > 
> > I don't understand this sentence.
> 
> Which part is unclear?

"keep opt to keep per-thread"
What do you mean?


[...]
> >> +Variables with thread-local storage are allocated at the time of
> >> +thread creation, and exists until the thread terminates, for every
> >> +thread in the process. Only very small object should be allocated in
> >> +TLS, since large TLS objects significantly slows down thread creation
> >> +and may needlessly increase memory footprint for application that make
> >> +extensive use of unregistered threads.
> > 
> > I don't understand the relation with non-DPDK threads.
> 
> __thread isn't just for "DPDK threads". It will allocate memory on all 
> threads in the process.

OK
Maybe it would be good to add that as a note.


> > [...]
> >> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
> > 
> > With #define RTE_MAX_LCORE_VAR 1048576,
> > LCORE_BUFF

RE: [RFC 0/4] ethdev: rework config restore

2024-10-11 Thread Dariusz Sosnowski


> -Original Message-
> From: Ferruh Yigit 
> Sent: Friday, October 11, 2024 02:03
> To: Konstantin Ananyev ; Dariusz Sosnowski
> ; NBU-Contact-Thomas Monjalon (EXTERNAL)
> ; Andrew Rybchenko 
> Cc: dev@dpdk.org; Bruce Richardson 
> Subject: Re: [RFC 0/4] ethdev: rework config restore
> 
> External email: Use caution opening links or attachments
> 
> 
> On 10/10/2024 11:58 PM, Konstantin Ananyev wrote:
> >
> >
> > config restore
> 
> 
> 
>  On 10/10/2024 1:08 PM, Dariusz Sosnowski wrote:
> >> -Original Message-
> >> From: Ferruh Yigit 
> >> Sent: Thursday, October 10, 2024 01:17
> >> To: Dariusz Sosnowski ; Konstantin Ananyev
> >> ; NBU-Contact-Thomas Monjalon
> >> (EXTERNAL) ; Andrew Rybchenko
> >> 
> >> Cc: dev@dpdk.org; Bruce Richardson 
> >> Subject: Re: [RFC 0/4] ethdev: rework config restore
> >>
> >>
> >>
> >> On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote:
>  -Original Message-
>  From: Ferruh Yigit 
>  Sent: Wednesday, October 9, 2024 03:08
>  To: Konstantin Ananyev ; Dariusz
>  Sosnowski ; NBU-Contact-Thomas Monjalon
>  (EXTERNAL) ; Andrew Rybchenko
>  
>  Cc: dev@dpdk.org; Bruce Richardson 
>  Subject: Re: [RFC 0/4] ethdev: rework config restore
> 
> 
> 
>  On 10/8/2024 6:21 PM, Konstantin Ananyev wrote:
> >
> >
> >> We have been working on optimizing the latency of calls
> >> to rte_eth_dev_start(), on ports spawned by mlx5 PMD.
> >> Most of the work requires changes in the implementation
> >> of
> >> .dev_start() PMD callback, but I also wanted to start a
> >> discussion regarding configuration restore.
> >>
> >> rte_eth_dev_start() does a few things on top of calling
> >> .dev_start()
>  callback:
> >>
> >> - Before calling it:
> >> - eth_dev_mac_restore() - if device supports
> >> RTE_ETH_DEV_NOLIVE_MAC_ADDR;
> >> - After calling it:
> >> - eth_dev_mac_restore() - if device does not support
> > RTE_ETH_DEV_NOLIVE_MAC_ADDR;
> >> - restore promiscuous config
> >> - restore all multicast config
> >>
> >> eth_dev_mac_restore() iterates over all known MAC
> >> addresses - stored in rte_eth_dev_data.mac_addrs array -
> >> and calls
> >> .mac_addr_set() and .mac_addr_add() callbacks to apply
> >> these MAC
>  addresses.
> >>
> >> Promiscuous config restore checks if promiscuous mode is
> >> enabled or not, and calls .promiscuous_enable() or
> >> .promiscuous_disable()
>  callback.
> >>
> >> All multicast config restore checks if all multicast mode
> >> is enabled or not, and calls .allmulticast_enable() or
> >> .allmulticast_disable()
>  callback.
> >>
> >> Callbacks are called directly in all of these cases, to
> >> bypass the checks for applying the same configuration,
> >> which exist in relevant
>  APIs.
> >> Checks are bypassed to force drivers to reapply the
> configuration.
> >>
> >> Let's consider what happens in the following sequence of API
> calls.
> >>
> >> 1. rte_eth_dev_configure() 2. rte_eth_tx_queue_setup() 3.
> >> rte_eth_rx_queue_setup() 4. rte_eth_promiscuous_enable()
> >> - Call dev->dev_ops->promiscuous_enable()
> >> - Stores promiscuous state in dev->data->promiscuous 5.
> >> rte_eth_allmulticast_enable()
> >> - Call dev->dev_ops->allmulticast_enable()
> >> - Stores allmulticast state in dev->data->allmulticast 6.
> >> rte_eth_dev_start()
> >> - Call dev->dev_ops->dev_start()
> >> - Call dev->dev_ops->mac_addr_set() - apply default MAC
> address
> >> - Call dev->dev_ops->promiscuous_enable()
> >> - Call dev->dev_ops->allmulticast_enable()
> >>
> >> Even though all configuration is available in dev->data
> >> after step 5, the library forces reapplying this configuration in 
> >> step 6.
> >>
> >> In mlx5 PMD case all relevant callbacks require
> >> communication with the kernel driver, to configure the
> >> device (mlx5 PMD must create/destroy new kernel flow
> >> rules and/

Re: [PATCH v12 6/7] eal: add unit tests for atomic bit access functions

2024-10-11 Thread David Marchand
On Thu, Oct 10, 2024 at 1:56 PM Mattias Rönnblom  wrote:
>
> On 2024-10-10 12:45, David Marchand wrote:
> > On Fri, Sep 20, 2024 at 12:57 PM Mattias Rönnblom
> >  wrote:
> >> +   static int  \
> >> +   run_parallel_test_and_modify ## size(void *arg) \
> >> +   {   \
> >> +   struct parallel_test_and_set_lcore ## size *lcore = arg; \
> >> +   uint64_t deadline = rte_get_timer_cycles() +\
> >> +   PARALLEL_TEST_RUNTIME * rte_get_timer_hz(); \
> >> +   do {\
> >> +   bool old_value; \
> >> +   bool new_value = rte_rand() & 1;\
> >> +   bool use_assign = rte_rand() & 1;   \
> >> +   \
> >> +   if (use_assign) \
> >> +   old_value = 
> >> rte_bit_atomic_test_and_assign( \
> >> +   lcore->word, lcore->bit, 
> >> new_value, \
> >> +   rte_memory_order_relaxed);  \
> >> +   else\
> >> +   old_value = new_value ? \
> >> +   rte_bit_atomic_test_and_set(\
> >> +   lcore->word, lcore->bit, \
> >> +   rte_memory_order_relaxed) 
> >> : \
> >> +   rte_bit_atomic_test_and_clear(  \
> >> +   lcore->word, lcore->bit, \
> >> +   rte_memory_order_relaxed); 
> >> \
> >> +   if (old_value != new_value) \
> >> +   lcore->flips++; \
> >> +   } while (rte_get_timer_cycles() < deadline);\
> >> +   \
> >> +   return 0;   \
> >> +   }   \
> >> +   \
> >> +   static int  \
> >> +   test_bit_atomic_parallel_test_and_modify ## size(void)  \
> >> +   {   \
> >> +   unsigned int worker_lcore_id;   \
> >> +   uint ## size ## _t word = 0;\
> >> +   unsigned int bit = rte_rand_max(size);  \
> >> +   struct parallel_test_and_set_lcore ## size lmain = {\
> >> +   .word = &word,  \
> >> +   .bit = bit  \
> >> +   };  \
> >> +   struct parallel_test_and_set_lcore ## size lworker = {  \
> >> +   .word = &word,  \
> >> +   .bit = bit  \
> >> +   };  \
> >> +   \
> >> +   if (rte_lcore_count() < 2) {\
> >> +   printf("Need multiple cores to run parallel 
> >> test.\n"); \
> >> +   return TEST_SKIPPED;\
> >> +   }   \
> >> +   \
> >> +   worker_lcore_id = rte_get_next_lcore(-1, 1, 0); \
> >> +   \
> >> +   int rc = 
> >> rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
> >> +  &lworker, worker_lcore_id); 
> >> \
> >> +   TEST_ASSERT(rc == 0, "Worker thread launch failed");\
> >> +   \
> >> +   run_parallel_test_and_modify ## size(&lmain);   \
> >> +   \
> >> +   rte_eal_mp_wait_lcore();\
> >> +   \
> >> +

RE: [PATCH v9 1/7] eal: add static per-lcore memory allocation facility

2024-10-11 Thread Morten Brørup
> >> +/**
> >> + * Get pointer to lcore variable instance of the current thread.
> >> + *
> >> + * May only be used by EAL threads and registered non-EAL threads.
> >> + */
> >> +#define RTE_LCORE_VAR_VALUE(handle) \
> >
> > RTE_LCORE_VAR_LOCAL?
> >
> 
> Why is that better?
> 
> Maybe Morten can remind me here, but I think we had a discussion about
> RTE_LCORE_VAR() versus RTE_LCORE_VAR_VALUE() at some point, and
> RTE_LCORE_VAR_VALUE() was deemed more clear.

Yes, we had the discussion, and reached this conclusion.

However, having been away from it for a while, and now coming back to it, I lean
towards the shorter names, although they are not 100% correct.

I am usually a proponent of (very) long - self-explanatory - variable names.
But in this case, brevity will be better for reviewing code using the library.



[PATCH v10 1/7] eal: add static per-lcore memory allocation facility

2024-10-11 Thread Mattias Rönnblom
Introduce DPDK per-lcore id variables, or lcore variables for short.

An lcore variable has one value for every current and future lcore
id-equipped thread.

The primary use case is for statically allocating
small, frequently-accessed data structures, for which one instance
should exist for each lcore.

Lcore variables are similar to thread-local storage (TLS, e.g., C11
_Thread_local), but decouple the values' lifetime from that of the
threads.

Lcore variables are also similar in functionality to the FreeBSD
kernel's DPCPU_*() family of macros and the associated build-time
machinery. DPCPU uses linker scripts, which effectively prevents the
reuse of its otherwise seemingly viable approach.

The currently prevailing way to solve the same problem as lcore
variables is to keep a module's per-lcore data as an RTE_MAX_LCORE-sized
array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
lcore variables over this approach is that data related to the same
lcore is now close together (spatially, in memory), rather than data
used by the same module. This in turn avoids excessive use of padding
and keeps caches from being polluted with unused data.

Signed-off-by: Mattias Rönnblom 
Acked-by: Morten Brørup 
Acked-by: Konstantin Ananyev 
Acked-by: Chengwen Feng 
Acked-by: Stephen Hemminger 

--

PATCH v10:
 * Improve documentation grammar and spelling. (Stephen Hemminger,
   Thomas Monjalon)
 * Add version.map DPDK version comment. (Thomas Monjalon)

PATCH v9:
 * Fixed merge conflicts in release notes.

PATCH v8:
 * Work around missing max_align_t definition in MSVC. (Morten Brørup)

PATCH v7:
 * Add () to the FOREACH lcore id macro parameter, to allow arbitrary
   expression, not just a simple variable name, being passed.
   (Konstantin Ananyev)

PATCH v6:
 * Have API user provide the loop variable in the FOREACH macro, to
   avoid subtle bugs where the loop variable name clashes with some
   other user-defined variable. (Konstantin Ananyev)

PATCH v5:
 * Update EAL programming guide.

PATCH v2:
 * Add Windows support. (Morten Brørup)
 * Fix lcore variables API index reference. (Morten Brørup)
 * Various improvements of the API documentation. (Morten Brørup)
 * Elimination of unused symbol in version.map. (Morten Brørup)

PATCH:
 * Update MAINTAINERS and release notes.
 * Stop covering included files in extern "C" {}.

RFC v6:
 * Include  to get aligned_alloc().
 * Tweak documentation (grammar).
 * Provide API-level guarantees that lcore variable values take on an
   initial value of zero.
 * Fix misplaced __rte_cache_aligned in the API doc example.

RFC v5:
 * In Doxygen, consistently use @ (and not \).
 * The RTE_LCORE_VAR_GET() and SET() convenience access macros
   covered an uncommon use case, where the lcore value is of a
   primitive type, rather than a struct, and are thus eliminated
   from the API. (Morten Brørup)
 * In the wake of the GET()/SET() removal, rename RTE_LCORE_VAR_PTR()
   to RTE_LCORE_VAR_VALUE().
 * The underscores are removed from __rte_lcore_var_lcore_ptr() to
   signal that this function is a part of the public API.
 * Macro arguments are documented.

RFC v4:
 * Replace large static array with libc heap-allocated memory. One
   implication of this change is there no longer exists a fixed upper
   bound for the total amount of memory used by lcore variables.
   RTE_MAX_LCORE_VAR has changed meaning, and now represent the
   maximum size of any individual lcore variable value.
 * Fix issues in example. (Morten Brørup)
 * Improve access macro type checking. (Morten Brørup)
 * Refer to the lcore variable handle as "handle" and not "name" in
   various macros.
 * Document lack of thread safety in rte_lcore_var_alloc().
 * Provide API-level assurance the lcore variable handle is
   always non-NULL, to allow applications to use NULL to mean
   "not yet allocated".
 * Note zero-sized allocations are not allowed.
 * Give API-level guarantee the lcore variable values are zeroed.

RFC v3:
 * Replace use of GCC-specific alignof() with alignof().
 * Update example to reflect FOREACH macro name change (in RFC v2).

RFC v2:
 * Use alignof to derive alignment requirements. (Morten Brørup)
 * Change name of FOREACH to make it distinct from 's
   *per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup)
 * Allow user-specified alignment, but limit max to cache line size.
---
 MAINTAINERS   |   6 +
 config/rte_config.h   |   1 +
 doc/api/doxy-api-index.md |   1 +
 .../prog_guide/env_abstraction_layer.rst  |  43 +-
 doc/guides/rel_notes/release_24_11.rst|  14 +
 lib/eal/common/eal_common_lcore_var.c |  85 
 lib/eal/common/meson.build|   1 +
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_lcore_var.h   | 389 ++
 lib/eal/version.map   |   3 +
 10 files changed, 538 insertions(+), 6 deletions(-)
 create mode 100644 lib/eal/common/eal

[PATCH v10 2/7] eal: add lcore variable functional tests

2024-10-11 Thread Mattias Rönnblom
Add functional test suite to exercise the  API.

Signed-off-by: Mattias Rönnblom 
Acked-by: Morten Brørup 
Acked-by: Chengwen Feng 
Acked-by: Stephen Hemminger 

--

PATCH v6:
 * Update FOREACH invocations to match new API.

RFC v5:
 * Adapt tests to reflect the removal of the GET() and SET() macros.

RFC v4:
 * Check all lcore id's values for all variables in the many variables
   test case.
 * Introduce test case for max-sized lcore variables.

RFC v2:
 * Improve alignment-related test coverage.
---
 app/test/meson.build  |   1 +
 app/test/test_lcore_var.c | 436 ++
 2 files changed, 437 insertions(+)
 create mode 100644 app/test/test_lcore_var.c

diff --git a/app/test/meson.build b/app/test/meson.build
index e29258e6ec..48279522f0 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -103,6 +103,7 @@ source_file_deps = {
 'test_ipsec_sad.c': ['ipsec'],
 'test_kvargs.c': ['kvargs'],
 'test_latencystats.c': ['ethdev', 'latencystats', 'metrics'] + 
sample_packet_forward_deps,
+'test_lcore_var.c': [],
 'test_lcores.c': [],
 'test_link_bonding.c': ['ethdev', 'net_bond',
 'net'] + packet_burst_generator_deps + virtual_pmd_deps,
diff --git a/app/test/test_lcore_var.c b/app/test/test_lcore_var.c
new file mode 100644
index 00..2a1f258548
--- /dev/null
+++ b/app/test/test_lcore_var.c
@@ -0,0 +1,436 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define MIN_LCORES 2
+
+RTE_LCORE_VAR_HANDLE(int, test_int);
+RTE_LCORE_VAR_HANDLE(char, test_char);
+RTE_LCORE_VAR_HANDLE(long, test_long_sized);
+RTE_LCORE_VAR_HANDLE(short, test_short);
+RTE_LCORE_VAR_HANDLE(long, test_long_sized_aligned);
+
+struct int_checker_state {
+   int old_value;
+   int new_value;
+   bool success;
+};
+
+static void
+rand_blk(void *blk, size_t size)
+{
+   size_t i;
+
+   for (i = 0; i < size; i++)
+   ((unsigned char *)blk)[i] = (unsigned char)rte_rand();
+}
+
+static bool
+is_ptr_aligned(const void *ptr, size_t align)
+{
+   return ptr != NULL ? (uintptr_t)ptr % align == 0 : false;
+}
+
+static int
+check_int(void *arg)
+{
+   struct int_checker_state *state = arg;
+
+   int *ptr = RTE_LCORE_VAR_VALUE(test_int);
+
+   bool naturally_aligned = is_ptr_aligned(ptr, sizeof(int));
+
+   bool equal = *(RTE_LCORE_VAR_VALUE(test_int)) == state->old_value;
+
+   state->success = equal && naturally_aligned;
+
+   *ptr = state->new_value;
+
+   return 0;
+}
+
+RTE_LCORE_VAR_INIT(test_int);
+RTE_LCORE_VAR_INIT(test_char);
+RTE_LCORE_VAR_INIT_SIZE(test_long_sized, 32);
+RTE_LCORE_VAR_INIT(test_short);
+RTE_LCORE_VAR_INIT_SIZE_ALIGN(test_long_sized_aligned, sizeof(long),
+ RTE_CACHE_LINE_SIZE);
+
+static int
+test_int_lvar(void)
+{
+   unsigned int lcore_id;
+
+   struct int_checker_state states[RTE_MAX_LCORE] = {};
+
+   RTE_LCORE_FOREACH_WORKER(lcore_id) {
+   struct int_checker_state *state = &states[lcore_id];
+
+   state->old_value = (int)rte_rand();
+   state->new_value = (int)rte_rand();
+
+   *RTE_LCORE_VAR_LCORE_VALUE(lcore_id, test_int) =
+   state->old_value;
+   }
+
+   RTE_LCORE_FOREACH_WORKER(lcore_id)
+   rte_eal_remote_launch(check_int, &states[lcore_id], lcore_id);
+
+   rte_eal_mp_wait_lcore();
+
+   RTE_LCORE_FOREACH_WORKER(lcore_id) {
+   struct int_checker_state *state = &states[lcore_id];
+   int value;
+
+   TEST_ASSERT(state->success, "Unexpected value "
+   "encountered on lcore %d", lcore_id);
+
+   value = *RTE_LCORE_VAR_LCORE_VALUE(lcore_id, test_int);
+   TEST_ASSERT_EQUAL(state->new_value, value,
+ "Lcore %d failed to update int", lcore_id);
+   }
+
+   /* take the opportunity to test the foreach macro */
+   int *v;
+   unsigned int i = 0;
+   RTE_LCORE_VAR_FOREACH_VALUE(lcore_id, v, test_int) {
+   TEST_ASSERT_EQUAL(i, lcore_id, "Encountered lcore id %d "
+ "while expecting %d during iteration",
+ lcore_id, i);
+   TEST_ASSERT_EQUAL(states[lcore_id].new_value, *v,
+ "Unexpected value on lcore %d during "
+ "iteration", lcore_id);
+   i++;
+   }
+
+   return TEST_SUCCESS;
+}
+
+static int
+test_sized_alignment(void)
+{
+   unsigned int lcore_id;
+   long *v;
+
+   RTE_LCORE_VAR_FOREACH_VALUE(lcore_id, v, test_long_sized) {
+   TEST_ASSERT(is_ptr_aligned(v, alignof(long)),
+   "Type-derived alignment failed");
+   }
+
+   RTE_LCORE_VAR_FOREACH_VALUE(lcore_i

[PATCH v10 4/7] random: keep PRNG state in lcore variable

2024-10-11 Thread Mattias Rönnblom
Replace keeping PRNG state in a RTE_MAX_LCORE-sized static array of
cache-aligned and RTE_CACHE_GUARDed struct instances with keeping the
same state in a more cache-friendly lcore variable.

Signed-off-by: Mattias Rönnblom 
Acked-by: Morten Brørup 
Acked-by: Konstantin Ananyev 
Acked-by: Chengwen Feng 
Acked-by: Stephen Hemminger 

--

RFC v3:
 * Remove cache alignment on unregistered threads' rte_rand_state.
   (Morten Brørup)
---
 lib/eal/common/rte_random.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
index 90e91b3c4f..a8d00308dd 100644
--- a/lib/eal/common/rte_random.c
+++ b/lib/eal/common/rte_random.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct __rte_cache_aligned rte_rand_state {
@@ -19,14 +20,12 @@ struct __rte_cache_aligned rte_rand_state {
uint64_t z3;
uint64_t z4;
uint64_t z5;
-   RTE_CACHE_GUARD;
 };
 
-/* One instance each for every lcore id-equipped thread, and one
- * additional instance to be shared by all others threads (i.e., all
- * unregistered non-EAL threads).
- */
-static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
+RTE_LCORE_VAR_HANDLE(struct rte_rand_state, rand_state);
+
+/* instance to be shared by all unregistered non-EAL threads */
+static struct rte_rand_state unregistered_rand_state;
 
 static uint32_t
 __rte_rand_lcg32(uint32_t *seed)
@@ -85,8 +84,14 @@ rte_srand(uint64_t seed)
unsigned int lcore_id;
 
/* add lcore_id to seed to avoid having the same sequence */
-   for (lcore_id = 0; lcore_id < RTE_DIM(rand_states); lcore_id++)
-   __rte_srand_lfsr258(seed + lcore_id, &rand_states[lcore_id]);
+   for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   struct rte_rand_state *lcore_state =
+   RTE_LCORE_VAR_LCORE_VALUE(lcore_id, rand_state);
+
+   __rte_srand_lfsr258(seed + lcore_id, lcore_state);
+   }
+
+   __rte_srand_lfsr258(seed + lcore_id, &unregistered_rand_state);
 }
 
 static __rte_always_inline uint64_t
@@ -124,11 +129,10 @@ struct rte_rand_state *__rte_rand_get_state(void)
 
idx = rte_lcore_id();
 
-   /* last instance reserved for unregistered non-EAL threads */
if (unlikely(idx == LCORE_ID_ANY))
-   idx = RTE_MAX_LCORE;
+   return &unregistered_rand_state;
 
-   return &rand_states[idx];
+   return RTE_LCORE_VAR_VALUE(rand_state);
 }
 
 uint64_t
@@ -228,6 +232,8 @@ RTE_INIT(rte_rand_init)
 {
uint64_t seed;
 
+   RTE_LCORE_VAR_ALLOC(rand_state);
+
seed = __rte_random_initial_seed();
 
rte_srand(seed);
-- 
2.43.0



[PATCH v10 3/7] eal: add lcore variable performance test

2024-10-11 Thread Mattias Rönnblom
Add basic micro benchmark for lcore variables, in an attempt to assure
that the overhead isn't significantly greater than alternative
approaches, in scenarios where the benefits aren't expected to show up
(i.e., when plenty of cache is available compared to the working set
size of the per-lcore data).

Signed-off-by: Mattias Rönnblom 
Acked-by: Chengwen Feng 
Acked-by: Stephen Hemminger 
Acked-by: Morten Brørup 

--

PATCH v8:
 * Fix spelling. (Morten Brørup)

PATCH v6:
 * Use floating point math when calculating per-update latency.
   (Morten Brørup)

PATCH v5:
 * Add variant of thread-local storage with initialization performed
   at the time of thread creation to the benchmark scenarios. (Morten
   Brørup)

PATCH v4:
 * Rework the tests to be a little less unrealistic. Instead of a
   single dummy module using a single variable, use a number of
   variables/modules. In this way, differences in cache effects may
   show up.
 * Add RTE_CACHE_GUARD to better mimic that static array pattern.
   (Morten Brørup)
 * Show latencies as TSC cycles. (Morten Brørup)
---
 app/test/meson.build   |   1 +
 app/test/test_lcore_var_perf.c | 257 +
 2 files changed, 258 insertions(+)
 create mode 100644 app/test/test_lcore_var_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 48279522f0..d4e0c59900 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -104,6 +104,7 @@ source_file_deps = {
 'test_kvargs.c': ['kvargs'],
 'test_latencystats.c': ['ethdev', 'latencystats', 'metrics'] + 
sample_packet_forward_deps,
 'test_lcore_var.c': [],
+'test_lcore_var_perf.c': [],
 'test_lcores.c': [],
 'test_link_bonding.c': ['ethdev', 'net_bond',
 'net'] + packet_burst_generator_deps + virtual_pmd_deps,
diff --git a/app/test/test_lcore_var_perf.c b/app/test/test_lcore_var_perf.c
new file mode 100644
index 00..2efb8342d1
--- /dev/null
+++ b/app/test/test_lcore_var_perf.c
@@ -0,0 +1,257 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#define MAX_MODS 1024
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+struct mod_lcore_state {
+   uint64_t a;
+   uint64_t b;
+   uint64_t sum;
+};
+
+static void
+mod_init(struct mod_lcore_state *state)
+{
+   state->a = rte_rand();
+   state->b = rte_rand();
+   state->sum = 0;
+}
+
+static __rte_always_inline void
+mod_update(volatile struct mod_lcore_state *state)
+{
+   state->sum += state->a * state->b;
+}
+
+struct __rte_cache_aligned mod_lcore_state_aligned {
+   struct mod_lcore_state mod_state;
+
+   RTE_CACHE_GUARD;
+};
+
+static struct mod_lcore_state_aligned
+sarray_lcore_state[MAX_MODS][RTE_MAX_LCORE];
+
+static void
+sarray_init(void)
+{
+   unsigned int lcore_id = rte_lcore_id();
+   int mod;
+
+   for (mod = 0; mod < MAX_MODS; mod++) {
+   struct mod_lcore_state *mod_state =
+   &sarray_lcore_state[mod][lcore_id].mod_state;
+
+   mod_init(mod_state);
+   }
+}
+
+static __rte_noinline void
+sarray_update(unsigned int mod)
+{
+   unsigned int lcore_id = rte_lcore_id();
+   struct mod_lcore_state *mod_state =
+   &sarray_lcore_state[mod][lcore_id].mod_state;
+
+   mod_update(mod_state);
+}
+
+struct mod_lcore_state_lazy {
+   struct mod_lcore_state mod_state;
+   bool initialized;
+};
+
+/*
+ * Note: it's usually a bad idea have this much thread-local storage
+ * allocated in a real application, since it will incur a cost on
+ * thread creation and non-lcore thread memory usage.
+ */
+static RTE_DEFINE_PER_LCORE(struct mod_lcore_state_lazy,
+   tls_lcore_state)[MAX_MODS];
+
+static inline void
+tls_init(struct mod_lcore_state_lazy *state)
+{
+   mod_init(&state->mod_state);
+
+   state->initialized = true;
+}
+
+static __rte_noinline void
+tls_lazy_update(unsigned int mod)
+{
+   struct mod_lcore_state_lazy *state =
+   &RTE_PER_LCORE(tls_lcore_state[mod]);
+
+   /* With thread-local storage, initialization must usually be lazy */
+   if (!state->initialized)
+   tls_init(state);
+
+   mod_update(&state->mod_state);
+}
+
+static __rte_noinline void
+tls_update(unsigned int mod)
+{
+   struct mod_lcore_state_lazy *state =
+   &RTE_PER_LCORE(tls_lcore_state[mod]);
+
+   mod_update(&state->mod_state);
+}
+
+RTE_LCORE_VAR_HANDLE(struct mod_lcore_state, lvar_lcore_state)[MAX_MODS];
+
+static void
+lvar_init(void)
+{
+   unsigned int mod;
+
+   for (mod = 0; mod < MAX_MODS; mod++) {
+   RTE_LCORE_VAR_ALLOC(lvar_lcore_state[mod]);
+
+   struct mod_lcore_state *state =
+   RTE_LCORE_VAR_VALUE(lvar_lcore_state[mod]);
+
+   mod_init(state);
+   }
+}
+
+static __rte_noinline void
+lvar_update(unsigned int mod)
+{
+   struct mod_

[PATCH v10 5/7] power: keep per-lcore state in lcore variable

2024-10-11 Thread Mattias Rönnblom
Replace static array of cache-aligned structs with an lcore variable,
to slightly benefit code simplicity and performance.

Signed-off-by: Mattias Rönnblom 
Acked-by: Morten Brørup 
Acked-by: Konstantin Ananyev 
Acked-by: Chengwen Feng 
Acked-by: Stephen Hemminger 

--

PATCH v6:
 * Update FOREACH invocation to match new API.

RFC v3:
 * Replace for loop with FOREACH macro.
---
 lib/power/rte_power_pmd_mgmt.c | 35 +-
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index b1c18a5f56..a981db4b39 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -5,6 +5,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -69,7 +70,7 @@ struct __rte_cache_aligned pmd_core_cfg {
uint64_t sleep_target;
/**< Prevent a queue from triggering sleep multiple times */
 };
-static struct pmd_core_cfg lcore_cfgs[RTE_MAX_LCORE];
+static RTE_LCORE_VAR_HANDLE(struct pmd_core_cfg, lcore_cfgs);
 
 static inline bool
 queue_equal(const union queue *l, const union queue *r)
@@ -252,12 +253,11 @@ clb_multiwait(uint16_t port_id __rte_unused, uint16_t 
qidx __rte_unused,
struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
uint16_t max_pkts __rte_unused, void *arg)
 {
-   const unsigned int lcore = rte_lcore_id();
struct queue_list_entry *queue_conf = arg;
struct pmd_core_cfg *lcore_conf;
const bool empty = nb_rx == 0;
 
-   lcore_conf = &lcore_cfgs[lcore];
+   lcore_conf = RTE_LCORE_VAR_VALUE(lcore_cfgs);
 
/* early exit */
if (likely(!empty))
@@ -317,13 +317,12 @@ clb_pause(uint16_t port_id __rte_unused, uint16_t qidx 
__rte_unused,
struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
uint16_t max_pkts __rte_unused, void *arg)
 {
-   const unsigned int lcore = rte_lcore_id();
struct queue_list_entry *queue_conf = arg;
struct pmd_core_cfg *lcore_conf;
const bool empty = nb_rx == 0;
uint32_t pause_duration = rte_power_pmd_mgmt_get_pause_duration();
 
-   lcore_conf = &lcore_cfgs[lcore];
+   lcore_conf = RTE_LCORE_VAR_VALUE(lcore_cfgs);
 
if (likely(!empty))
/* early exit */
@@ -358,9 +357,8 @@ clb_scale_freq(uint16_t port_id __rte_unused, uint16_t qidx 
__rte_unused,
struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
uint16_t max_pkts __rte_unused, void *arg)
 {
-   const unsigned int lcore = rte_lcore_id();
const bool empty = nb_rx == 0;
-   struct pmd_core_cfg *lcore_conf = &lcore_cfgs[lcore];
+   struct pmd_core_cfg *lcore_conf = RTE_LCORE_VAR_VALUE(lcore_cfgs);
struct queue_list_entry *queue_conf = arg;
 
if (likely(!empty)) {
@@ -518,7 +516,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, 
uint16_t port_id,
goto end;
}
 
-   lcore_cfg = &lcore_cfgs[lcore_id];
+   lcore_cfg = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, lcore_cfgs);
 
/* check if other queues are stopped as well */
ret = cfg_queues_stopped(lcore_cfg);
@@ -619,7 +617,7 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
}
 
/* no need to check queue id as wrong queue id would not be enabled */
-   lcore_cfg = &lcore_cfgs[lcore_id];
+   lcore_cfg = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, lcore_cfgs);
 
/* check if other queues are stopped as well */
ret = cfg_queues_stopped(lcore_cfg);
@@ -769,21 +767,22 @@ rte_power_pmd_mgmt_get_scaling_freq_max(unsigned int 
lcore)
 }
 
 RTE_INIT(rte_power_ethdev_pmgmt_init) {
-   size_t i;
-   int j;
+   unsigned int lcore_id;
+   struct pmd_core_cfg *lcore_cfg;
+   int i;
+
+   RTE_LCORE_VAR_ALLOC(lcore_cfgs);
 
/* initialize all tailqs */
-   for (i = 0; i < RTE_DIM(lcore_cfgs); i++) {
-   struct pmd_core_cfg *cfg = &lcore_cfgs[i];
-   TAILQ_INIT(&cfg->head);
-   }
+   RTE_LCORE_VAR_FOREACH_VALUE(lcore_id, lcore_cfg, lcore_cfgs)
+   TAILQ_INIT(&lcore_cfg->head);
 
/* initialize config defaults */
emptypoll_max = 512;
pause_duration = 1;
/* scaling defaults out of range to ensure not used unless set by user 
or app */
-   for (j = 0; j < RTE_MAX_LCORE; j++) {
-   scale_freq_min[j] = 0;
-   scale_freq_max[j] = UINT32_MAX;
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   scale_freq_min[i] = 0;
+   scale_freq_max[i] = UINT32_MAX;
}
 }
-- 
2.43.0



[PATCH v10 7/7] eal: keep per-lcore power intrinsics state in lcore variable

2024-10-11 Thread Mattias Rönnblom
Keep per-lcore power intrinsics state in a lcore variable to reduce
cache working set size and avoid any CPU next-line-prefetching causing
false sharing.

Signed-off-by: Mattias Rönnblom 
Acked-by: Morten Brørup 
Acked-by: Konstantin Ananyev 
Acked-by: Chengwen Feng 
Acked-by: Stephen Hemminger 
---
 lib/eal/x86/rte_power_intrinsics.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/lib/eal/x86/rte_power_intrinsics.c 
b/lib/eal/x86/rte_power_intrinsics.c
index 6d9b64240c..f4ba2c8ecb 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -14,10 +15,14 @@
 /*
  * Per-lcore structure holding current status of C0.2 sleeps.
  */
-static alignas(RTE_CACHE_LINE_SIZE) struct power_wait_status {
+struct power_wait_status {
rte_spinlock_t lock;
volatile void *monitor_addr; /**< NULL if not currently sleeping */
-} wait_status[RTE_MAX_LCORE];
+};
+
+RTE_LCORE_VAR_HANDLE(struct power_wait_status, wait_status);
+
+RTE_LCORE_VAR_INIT(wait_status);
 
 /*
  * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
@@ -172,7 +177,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
if (pmc->fn == NULL)
return -EINVAL;
 
-   s = &wait_status[lcore_id];
+   s = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, wait_status);
 
/* update sleep address */
rte_spinlock_lock(&s->lock);
@@ -264,7 +269,7 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
if (lcore_id >= RTE_MAX_LCORE)
return -EINVAL;
 
-   s = &wait_status[lcore_id];
+   s = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, wait_status);
 
/*
 * There is a race condition between sleep, wakeup and locking, but we
@@ -303,8 +308,8 @@ int
 rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
const uint32_t num, const uint64_t tsc_timestamp)
 {
-   const unsigned int lcore_id = rte_lcore_id();
-   struct power_wait_status *s = &wait_status[lcore_id];
+   struct power_wait_status *s = RTE_LCORE_VAR_VALUE(wait_status);
+
uint32_t i, rc;
 
/* check if supported */
-- 
2.43.0



[PATCH v10 0/7] Lcore variables

2024-10-11 Thread Mattias Rönnblom
This patch set introduces a new API <rte_lcore_var.h> for static
per-lcore id data allocation.

Please refer to the <rte_lcore_var.h> API documentation for both a
rationale for this new API, and a comparison to the alternatives
available.

The adoption of this API would affect many different DPDK modules, but
the author updated only a few, mostly to serve as examples in this
RFC, and to iron out some, but surely not all, wrinkles in the API.

The question of how to best allocate static per-lcore memory has come
up several times on the dev mailing list, for example in the thread on
the "random: use per lcore state" RFC by Stephen Hemminger.

Lcore variables are surely not the answer to all your per-lcore-data
needs, since they only allow for more-or-less static allocation. In the
author's opinion, they do however provide a reasonably simple, clean,
and seemingly very performant solution to a real problem.

Mattias Rönnblom (7):
  eal: add static per-lcore memory allocation facility
  eal: add lcore variable functional tests
  eal: add lcore variable performance test
  random: keep PRNG state in lcore variable
  power: keep per-lcore state in lcore variable
  service: keep per-lcore state in lcore variable
  eal: keep per-lcore power intrinsics state in lcore variable

 MAINTAINERS   |   6 +
 app/test/meson.build  |   2 +
 app/test/test_lcore_var.c | 436 ++
 app/test/test_lcore_var_perf.c| 257 +++
 config/rte_config.h   |   1 +
 doc/api/doxy-api-index.md |   1 +
 .../prog_guide/env_abstraction_layer.rst  |  43 +-
 doc/guides/rel_notes/release_24_11.rst|  14 +
 lib/eal/common/eal_common_lcore_var.c |  85 
 lib/eal/common/meson.build|   1 +
 lib/eal/common/rte_random.c   |  28 +-
 lib/eal/common/rte_service.c  | 117 ++---
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_lcore_var.h   | 389 
 lib/eal/version.map   |   3 +
 lib/eal/x86/rte_power_intrinsics.c|  17 +-
 lib/power/rte_power_pmd_mgmt.c|  35 +-
 17 files changed, 1343 insertions(+), 93 deletions(-)
 create mode 100644 app/test/test_lcore_var.c
 create mode 100644 app/test/test_lcore_var_perf.c
 create mode 100644 lib/eal/common/eal_common_lcore_var.c
 create mode 100644 lib/eal/include/rte_lcore_var.h

-- 
2.43.0



RE: [RFC 0/4] ethdev: rework config restore

2024-10-11 Thread Konstantin Ananyev
> 
>  External email: Use caution opening links or attachments
> 
> 
>  On 10/10/2024 1:08 PM, Dariusz Sosnowski wrote:
> >> -Original Message-
> >> From: Ferruh Yigit 
> >> Sent: Thursday, October 10, 2024 01:17
> >> To: Dariusz Sosnowski ; Konstantin Ananyev
> >> ; NBU-Contact-Thomas Monjalon
> >> (EXTERNAL) ; Andrew Rybchenko
> >> 
> >> Cc: dev@dpdk.org; Bruce Richardson 
> >> Subject: Re: [RFC 0/4] ethdev: rework config restore
> >>
> >> On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote:
>  -Original Message-
>  From: Ferruh Yigit 
>  Sent: Wednesday, October 9, 2024 03:08
>  To: Konstantin Ananyev ; Dariusz
>  Sosnowski ; NBU-Contact-Thomas Monjalon
>  (EXTERNAL) ; Andrew Rybchenko
>  
>  Cc: dev@dpdk.org; Bruce Richardson 
>  Subject: Re: [RFC 0/4] ethdev: rework config restore
> 
>  On 10/8/2024 6:21 PM, Konstantin Ananyev wrote:
> >
> >
> >> We have been working on optimizing the latency of calls to
> >> rte_eth_dev_start(), on ports spawned by mlx5 PMD. Most of
> >> the work requires changes in the implementation of
> >> .dev_start() PMD callback, but I also wanted to start a
> >> discussion regarding configuration restore.
> >>
> >> rte_eth_dev_start() does a few things on top of calling
> >> .dev_start()
>  callback:
> >>
> >> - Before calling it:
> >> - eth_dev_mac_restore() - if device supports
> >> RTE_ETH_DEV_NOLIVE_MAC_ADDR;
> >> - After calling it:
> >> - eth_dev_mac_restore() - if device does not support
> > RTE_ETH_DEV_NOLIVE_MAC_ADDR;
> >> - restore promiscuous config
> >> - restore all multicast config
> >>
> >> eth_dev_mac_restore() iterates over all known MAC addresses -
> >> stored in rte_eth_dev_data.mac_addrs array - and calls
> >> .mac_addr_set() and .mac_addr_add() callbacks to apply these
> >> MAC
>  addresses.
> >>
> >> Promiscuous config restore checks if promiscuous mode is
> >> enabled or not, and calls .promiscuous_enable() or
> >> .promiscuous_disable()
>  callback.
> >>
> >> All multicast config restore checks if all multicast mode is
> >> enabled or not, and calls .allmulticast_enable() or
> >> .allmulticast_disable()
>  callback.
> >>
> >> Callbacks are called directly in all of these cases, to
> >> bypass the checks for applying the same configuration, which
> >> exist in relevant
>  APIs.
> >> Checks are bypassed to force drivers to reapply the 
> >> configuration.
> >>
> >> Let's consider what happens in the following sequence of API 
> >> calls.
> >>
> >> 1. rte_eth_dev_configure()
> >> 2. rte_eth_tx_queue_setup()
> >> 3. rte_eth_rx_queue_setup()
> >> 4. rte_eth_promiscuous_enable()
> >> - Call dev->dev_ops->promiscuous_enable()
> >> - Stores promiscuous state in dev->data->promiscuous 5.
> >> rte_eth_allmulticast_enable()
> >> - Call dev->dev_ops->allmulticast_enable()
> >> - Stores allmulticast state in dev->data->allmulticast 6.
> >> rte_eth_dev_start()
> >> - Call dev->dev_ops->dev_start()
> >> - Call dev->dev_ops->mac_addr_set() - apply default MAC 
> >> address
> >> - Call dev->dev_ops->promiscuous_enable()
> >> - Call dev->dev_ops->allmulticast_enable()
> >>
> >> Even though all configuration is available in dev->data after
> >> step 5, library forces reapplying this configuration in step 6.
> >>
> >> In mlx5 PMD case all relevant callbacks require communication
> >> with the kernel driver, to configure the device (mlx5 PMD
> >> must create/destroy new kernel flow rules and/or change netdev
>  config).
> >>
> >> mlx5 PMD handles applying all configuration in .dev_start(),
> >> so the following forced callbacks force additional
> >> communication with the kernel. The
> > same configuration is applied multiple times.
> >>
> >> As an optimization, mlx5 PMD could check if a given
> >> configuration wa

[PATCH v10 6/7] service: keep per-lcore state in lcore variable

2024-10-11 Thread Mattias Rönnblom
Replace static array of cache-aligned structs with an lcore variable,
to slightly benefit code simplicity and performance.

Signed-off-by: Mattias Rönnblom 
Acked-by: Morten Brørup 
Acked-by: Konstantin Ananyev 
Acked-by: Chengwen Feng 
Acked-by: Stephen Hemminger 

--

PATCH v7:
 * Update to match new FOREACH API.

RFC v6:
 * Remove a now-redundant lcore variable value memset().

RFC v5:
 * Fix lcore value pointer bug introduced by RFC v4.

RFC v4:
 * Remove strange-looking lcore value lookup potentially containing
   invalid lcore id. (Morten Brørup)
 * Replace misplaced tab with space. (Morten Brørup)
---
 lib/eal/common/rte_service.c | 117 +++
 1 file changed, 65 insertions(+), 52 deletions(-)

diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index a38c594ce4..3d2c12c39b 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -77,7 +78,7 @@ struct __rte_cache_aligned core_state {
 
 static uint32_t rte_service_count;
 static struct rte_service_spec_impl *rte_services;
-static struct core_state *lcore_states;
+static RTE_LCORE_VAR_HANDLE(struct core_state, lcore_states);
 static uint32_t rte_service_library_initialized;
 
 int32_t
@@ -103,12 +104,8 @@ rte_service_init(void)
goto fail_mem;
}
 
-   lcore_states = rte_calloc("rte_service_core_states", RTE_MAX_LCORE,
-   sizeof(struct core_state), RTE_CACHE_LINE_SIZE);
-   if (!lcore_states) {
-   EAL_LOG(ERR, "error allocating core states array");
-   goto fail_mem;
-   }
+   if (lcore_states == NULL)
+   RTE_LCORE_VAR_ALLOC(lcore_states);
 
int i;
struct rte_config *cfg = rte_eal_get_configuration();
@@ -124,7 +121,6 @@ rte_service_init(void)
return 0;
 fail_mem:
rte_free(rte_services);
-   rte_free(lcore_states);
return -ENOMEM;
 }
 
@@ -138,7 +134,6 @@ rte_service_finalize(void)
rte_eal_mp_wait_lcore();
 
rte_free(rte_services);
-   rte_free(lcore_states);
 
rte_service_library_initialized = 0;
 }
@@ -288,7 +283,6 @@ rte_service_component_register(const struct 
rte_service_spec *spec,
 int32_t
 rte_service_component_unregister(uint32_t id)
 {
-   uint32_t i;
struct rte_service_spec_impl *s;
SERVICE_VALID_GET_OR_ERR_RET(id, s, -EINVAL);
 
@@ -296,9 +290,11 @@ rte_service_component_unregister(uint32_t id)
 
s->internal_flags &= ~(SERVICE_F_REGISTERED);
 
+   unsigned int lcore_id;
+   struct core_state *cs;
/* clear the run-bit in all cores */
-   for (i = 0; i < RTE_MAX_LCORE; i++)
-   lcore_states[i].service_mask &= ~(UINT64_C(1) << id);
+   RTE_LCORE_VAR_FOREACH_VALUE(lcore_id, cs, lcore_states)
+   cs->service_mask &= ~(UINT64_C(1) << id);
 
memset(&rte_services[id], 0, sizeof(struct rte_service_spec_impl));
 
@@ -467,7 +463,10 @@ rte_service_may_be_active(uint32_t id)
return -EINVAL;
 
for (i = 0; i < lcore_count; i++) {
-   if (lcore_states[ids[i]].service_active_on_lcore[id])
+   struct core_state *cs =
+   RTE_LCORE_VAR_LCORE_VALUE(ids[i], lcore_states);
+
+   if (cs->service_active_on_lcore[id])
return 1;
}
 
@@ -477,7 +476,7 @@ rte_service_may_be_active(uint32_t id)
 int32_t
 rte_service_run_iter_on_app_lcore(uint32_t id, uint32_t serialize_mt_unsafe)
 {
-   struct core_state *cs = &lcore_states[rte_lcore_id()];
+   struct core_state *cs = RTE_LCORE_VAR_VALUE(lcore_states);
struct rte_service_spec_impl *s;
 
SERVICE_VALID_GET_OR_ERR_RET(id, s, -EINVAL);
@@ -499,8 +498,7 @@ service_runner_func(void *arg)
 {
RTE_SET_USED(arg);
uint8_t i;
-   const int lcore = rte_lcore_id();
-   struct core_state *cs = &lcore_states[lcore];
+   struct core_state *cs = RTE_LCORE_VAR_VALUE(lcore_states);
 
rte_atomic_store_explicit(&cs->thread_active, 1, 
rte_memory_order_seq_cst);
 
@@ -546,13 +544,15 @@ service_runner_func(void *arg)
 int32_t
 rte_service_lcore_may_be_active(uint32_t lcore)
 {
-   if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
+   struct core_state *cs = RTE_LCORE_VAR_LCORE_VALUE(lcore, lcore_states);
+
+   if (lcore >= RTE_MAX_LCORE || !cs->is_service_core)
return -EINVAL;
 
/* Load thread_active using ACQUIRE to avoid instructions dependent on
 * the result being re-ordered before this load completes.
 */
-   return rte_atomic_load_explicit(&lcore_states[lcore].thread_active,
+   return rte_atomic_load_explicit(&cs->thread_active,
   rte_memory_order_acquire);
 }
 
@@ -560,9 +560,12 @@ int32_t
 rte_service_lcore_count(void)
 {
int32_t count = 0;
- 

[PATCH dpdk v2] checkpatches: verify in-reply-to header when possible

2024-10-11 Thread Robin Jarry
When using checkpatches.sh locally, verify that there is an In-Reply-To
header when the patch is a respin (i.e. v2, v3, etc.). This is currently
only enforced by the upstream CI but cannot be verified locally.

This cannot be verified when checking commit ids since --in-reply-to is
a git-format-patch option which is not specified by checkpatches.sh when
generating temporary files.

Here is an example:

 $ git format-patch -v6 -1 --stdout | devtools/checkpatches.sh
 warning: [PATCH v6] graph: expose node context as pointers
 warning: respins must be --in-reply-to=<message-id>.
 0/1 valid patch

 $ git format-patch -v12345 -1 --stdout | devtools/checkpatches.sh
 warning: [PATCH v12345] graph: expose node context as pointers
 warning: respins must be --in-reply-to=.
 0/1 valid patch

 $ git format-patch -v6 -1 --stdout --in-reply-to=foo | \
 devtools/checkpatches.sh
 1/1 valid patch

Link: https://git.dpdk.org/tools/dpdk-ci/commit/?id=070b31649e48460b3
Signed-off-by: Robin Jarry 
Acked-by: Stephen Hemminger 
---

Notes:
v2: fix check for v1[0-9]+

 devtools/checkpatches.sh | 13 +
 1 file changed, 13 insertions(+)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 17a05e4986fd..b8ca2f67bbd4 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -414,11 +414,13 @@ status=0
 check () { #  
local ret=0
local subject=''
+   local check_in_reply_to=false
headline_printed=false
 
total=$(($total + 1))
if [ -n "$1" ] ; then
tmpinput=$1
+   check_in_reply_to=true
else
tmpinput=$(mktemp -t dpdk.checkpatches.XX)
trap "rm -f '$tmpinput'" INT
@@ -428,6 +430,7 @@ check () { #  
--no-stat --stdout -1 $commit > "$tmpinput"
else
cat > "$tmpinput"
+   check_in_reply_to=true
fi
fi
 
@@ -435,6 +438,16 @@ check () { #  
subject=$(sed '/^Subject: */!d;s///;N;s,\n[[:space:]]\+, ,;s,\n.*,,;q' 
"$tmpinput")
! $verbose || print_headline "$subject"
 
+   # check In-Reply-To for version > 1
+   if [ "$check_in_reply_to" = true ] \
+   && echo "$subject" | grep -Eq 
'\[[^]]+\[^]]*\]' \
+   && ! grep -qi '^In-Reply-To: ' "$tmpinput"
+   then
+   echo "warning: $subject"
+   echo "warning: respins must be 
--in-reply-to=<message-id>."
+   ret=1
+   fi
+
! $verbose || printf 'Running checkpatch.pl:\n'
report=$($DPDK_CHECKPATCH_PATH $options "$tmpinput" 2>/dev/null)
if [ $? -ne 0 ] ; then
-- 
2.46.2



Re: [PATCH v9 1/7] eal: add static per-lcore memory allocation facility

2024-10-11 Thread Thomas Monjalon
11/10/2024 10:09, Morten Brørup:
> > > +static void *
> > > +lcore_var_alloc(size_t size, size_t align)
> > > +{
> > > + void *handle;
> > > + unsigned int lcore_id;
> > > + void *value;
> > > +
> > > + offset = RTE_ALIGN_CEIL(offset, align);
> > > +
> > > + if (offset + size > RTE_MAX_LCORE_VAR) {
> > > +#ifdef RTE_EXEC_ENV_WINDOWS
> > > + lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
> > > +RTE_CACHE_LINE_SIZE);
> > > +#else
> > > + lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> > > +  LCORE_BUFFER_SIZE);
> > > +#endif
> > > + RTE_VERIFY(lcore_buffer != NULL);
> > 
> > Please no panic in a lib.
> > You can return NULL.
> 
> I agree with Mattias design here.
> Lcore variables are like RTE_PER_LCORE variables and simple "static" 
> variables.
> If the system does not have enough memory for those, the application will not 
> be able to deal with it.
> Panic early (in this lib) is the correct way to deal with it.

There were discussions in the past where we agreed to remove
as much panic as possible in our libs and drivers.
We want to allow the application to have a chance to cleanup.

I don't think returning NULL in an allocation is something disruptive.

I understand you don't want to manage an error return
in variable declarations, so can we have RTE_VERIFY in declaration macros?


> > > + * Lcore variables cannot and need not be freed.
> > 
> > I'm curious about that.
> > If EAL is closed, and the application continues its life,
> > then we want all this memory to be cleaned as well.
> > Do you know rte_eal_cleanup()?
> 
> Good catch, Thomas! I missed that in my review.
> Mattias, it seems you need a chained list of lcore_buffer allocations for 
> this.

Yes


> > > + * The size of an lcore variable's value must be less than the DPDK
> > 
> > size of variable, not size of value
> 
> Initially, I thought the same as Thomas...
> It is confusing considering the handle the variable, and its instances having 
> values.
> 
> However, during the review, Mattias convinced me of its correctness.
> 
> And by the way, RTE_PER_LCORE also does it:
> https://elixir.bootlin.com/dpdk/v24.07/source/lib/eal/include/rte_per_lcore.h#L48

I understand your point of view and I accept it.





Re: [PATCH v2] service: fix deadlock on worker lcore exit

2024-10-11 Thread David Marchand
On Thu, Oct 3, 2024 at 5:50 PM Van Haaren, Harry
 wrote:
> > From: David Marchand 
> > Sent: Thursday, October 3, 2024 10:13 AM
> > To: Mattias Rönnblom ; Van Haaren, Harry 
> > 
> > Cc: dev@dpdk.org ; step...@networkplumber.org 
> > ; suanmi...@nvidia.com ; 
> > tho...@monjalon.net ; sta...@dpdk.org 
> > ; Tyler Retzlaff ; Aaron 
> > Conole 
> > Subject: Re: [PATCH v2] service: fix deadlock on worker lcore exit
> >
> > On Thu, Oct 3, 2024 at 8:57 AM David Marchand  
> > wrote:
> > >
> > > From: Mattias Rönnblom 
> > >
> > > Calling rte_exit() from a worker lcore thread causes a deadlock in
> > > rte_service_finalize().
> > >
> > > This patch makes rte_service_finalize() deadlock-free by avoiding the
> > > need to synchronize with service lcore threads, which in turn is
> > > achieved by moving service and per-lcore state from the heap to being
> > > statically allocated.
> > >
> > > The BSS segment increases with ~156 kB (on x86_64 with default
> > > RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).
> > >
> > > According to the service perf autotest, this change also results in a
> > > slight reduction of service framework overhead.
> > >
> > > Fixes: 33666b448f15 ("service: fix crash on exit")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Mattias Rönnblom 
> > > Acked-by: Tyler Retzlaff 
> > > ---
> > > Changes since v1:
> > > - rebased,
> >
> > I can't merge this patch in its current state.
> >
> > At the moment, two CI report a problem with the
> > eal_flags_file_prefix_autotest unit test.
> >
> > -stdout-
> > RTE>>eal_flags_file_prefix_autotest
> > Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> > '--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
> > Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> > '-m' '18' '--file-prefix=memtest1'
> > Error - hugepage files for memtest1 were not deleted!
> > Test Failed
> > RTE>>
> >
> > Can you have a look?
>
> Not sure how the code change in question relates to the eal-flags 
> failure, but I can reproduce the failure here.
> Reproducing the issue on *all* of the below tags indicates it's likely a 
> board-config issue, and not a true issue (unless it's been there since 
> 23.11??).
>
> Tested commits were all bad:
> b3485f4293 (HEAD, tag: v24.07) version: 24.07.0
> a9778aad62 (HEAD, tag: v24.03) version: 24.03.0
> eeb0605f11 (HEAD, tag: v23.11) version: 23.11.0
>
> So I'm pretty sure this is a board/runner config issue, with the error output 
> as follows here:
> RTE>>eal_flags_file_prefix_autotest
> Running binary with argv[]:'./app/test/dpdk-test' '--proc-type=secondary' 
> '-m' '18' '--file-prefix=memtest'
> EAL: Detected CPU lcores: 64
> EAL: Detected NUMA nodes: 2
> EAL: Detected static linkage of DPDK
> EAL: Cannot open '/var/run/dpdk/memtest/config' for rte_mem_config
> EAL: FATAL: Cannot init config
> EAL: Cannot init config
>
> FAIL:
> DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test  --no-pci
>
> PASS:
> DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test
>
> So it seems like the eal-flags test is NOT able to handle args like "--no-pci"? 
> I tend to run tests in no PCI mode to speed up things :)

Well, speeding up, or hiding the issue, I guess.

> In short, this service-cores patch is not the root cause. Perhaps some of the 
> CI folks can confirm whether extra args are passed to the runner?

To be clear, I can't merge this patch because of this (systematic)
failure in many CI env (GHA, LoongArch, UNH).

Adding CI ml in the loop.


-- 
David Marchand



[PATCH v5 1/1] dmadev: support strict priority configuration

2024-10-11 Thread Vamsi Krishna
From: Vamsi Attunuru 

Some DMA controllers offer the ability to configure priority
level for the DMA channels, allowing for the prioritization
of DMA command execution based on channel importance.

This patch supports such strict priority configuration. If the
dmadev supports it, it should advertise the capability flag
RTE_DMA_CAPA_PRI_POLICY_SP; the application can then enable strict
priority configuration.

Signed-off-by: Vamsi Attunuru 
Signed-off-by: Amit Prakash Shukla 
---
V5 changes:
* Updated comments and commit message
* Addressed V4 review comments

V4 changes:
* Rebased onto the latest

V3 changes:
* Corrected patch title

V2 changes:
* Reverted removed text from release_24_11.rst

V1 changes:
* Added trace support
* Added new capability flag

Deprecation notice:
https://patches.dpdk.org/project/dpdk/patch/20240730144612.2132848-1-amitpraka...@marvell.com/

* Assuming we do not anticipate any advanced scheduling schemes for dmadev 
queues,
  this patch is intended to support a strict priority scheme.

 doc/guides/rel_notes/release_24_11.rst |  8 
 lib/dmadev/rte_dmadev.c| 14 ++
 lib/dmadev/rte_dmadev.h| 19 +++
 lib/dmadev/rte_dmadev_trace.h  |  2 ++
 4 files changed, 43 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst 
b/doc/guides/rel_notes/release_24_11.rst
index e0a9aa55a1..8953e02836 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -67,6 +67,11 @@ New Features
 
   The new statistics are useful for debugging and profiling.
 
+* **Added strict priority capability for dmadev.**
+
+  Added new capability flag ``RTE_DMA_CAPA_PRI_POLICY_SP`` to check if the
+  DMA device supports assigning fixed priority, allowing for better control
+  over resource allocation and scheduling.
 
 Removed Items
 -
@@ -112,6 +117,9 @@ ABI Changes
Also, make sure to start the actual text at the margin.
===
 
+* dmadev: Added ``nb_priorities`` field to ``rte_dma_info`` structure and
+  ``priority`` field to ``rte_dma_conf`` structure to get device supported
+  priority levels and configure required priority from the application.
 
 Known Issues
 
diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index 845727210f..0d92d15d65 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -450,6 +450,11 @@ rte_dma_info_get(int16_t dev_id, struct rte_dma_info 
*dev_info)
if (ret != 0)
return ret;
 
+   if ((dev_info->dev_capa & RTE_DMA_CAPA_PRI_POLICY_SP) && 
(dev_info->nb_priorities <= 1)) {
+   RTE_DMA_LOG(ERR, "Num of priorities must be > 1 for Device %d", 
dev_id);
+   return -EINVAL;
+   }
+
dev_info->dev_name = dev->data->dev_name;
dev_info->numa_node = dev->device->numa_node;
dev_info->nb_vchans = dev->data->dev_conf.nb_vchans;
@@ -497,6 +502,12 @@ rte_dma_configure(int16_t dev_id, const struct 
rte_dma_conf *dev_conf)
return -EINVAL;
}
 
+   if ((dev_info.dev_capa & RTE_DMA_CAPA_PRI_POLICY_SP) &&
+   (dev_conf->priority >= dev_info.nb_priorities)) {
+   RTE_DMA_LOG(ERR, "Device %d configure invalid priority", 
dev_id);
+   return -EINVAL;
+   }
+
if (*dev->dev_ops->dev_configure == NULL)
return -ENOTSUP;
ret = (*dev->dev_ops->dev_configure)(dev, dev_conf,
@@ -769,6 +780,7 @@ dma_capability_name(uint64_t capability)
{ RTE_DMA_CAPA_SILENT,  "silent"  },
{ RTE_DMA_CAPA_HANDLES_ERRORS, "handles_errors" },
{ RTE_DMA_CAPA_M2D_AUTO_FREE,  "m2d_auto_free"  },
+   { RTE_DMA_CAPA_PRI_POLICY_SP,  "pri_policy_sp" },
{ RTE_DMA_CAPA_OPS_COPY,"copy"},
{ RTE_DMA_CAPA_OPS_COPY_SG, "copy_sg" },
{ RTE_DMA_CAPA_OPS_FILL,"fill"},
@@ -955,6 +967,7 @@ dmadev_handle_dev_info(const char *cmd __rte_unused,
rte_tel_data_start_dict(d);
rte_tel_data_add_dict_string(d, "name", dma_info.dev_name);
rte_tel_data_add_dict_int(d, "nb_vchans", dma_info.nb_vchans);
+   rte_tel_data_add_dict_int(d, "nb_priorities", dma_info.nb_priorities);
rte_tel_data_add_dict_int(d, "numa_node", dma_info.numa_node);
rte_tel_data_add_dict_int(d, "max_vchans", dma_info.max_vchans);
rte_tel_data_add_dict_int(d, "max_desc", dma_info.max_desc);
@@ -974,6 +987,7 @@ dmadev_handle_dev_info(const char *cmd __rte_unused,
ADD_CAPA(dma_caps, dev_capa, RTE_DMA_CAPA_SILENT);
ADD_CAPA(dma_caps, dev_capa, RTE_DMA_CAPA_HANDLES_ERRORS);
ADD_CAPA(dma_caps, dev_capa, RTE_DMA_CAPA_M2D_AUTO_FREE);
+   ADD_CAPA(dma_caps, dev_capa, RTE_DMA_CAPA_PRI_POLICY_SP);
ADD_CAPA(dma_caps, dev_capa, RTE_DMA_CAPA_OPS_COPY);
ADD_CAPA(dma_caps, dev_capa, RTE_DMA_CAPA_OPS

[PATCH 2/4] ethdev: add get restore flags driver callback

2024-10-11 Thread Dariusz Sosnowski
Before this patch, the ethdev layer assumed that all drivers require
it to forcefully restore:

- MAC addresses
- promiscuous mode setting
- all multicast mode setting

upon rte_eth_dev_start().

This patch introduces a new callback to eth_dev_ops -
get_restore_flags().
Drivers implementing this callback can explicitly enable/disable
certain parts of config restore procedure.

In order to minimize the changes to all the drivers and
preserve the current behavior, it is assumed that
if this callback is not defined, all configuration should be restored.

Signed-off-by: Dariusz Sosnowski 
---
 lib/ethdev/ethdev_driver.c | 11 +++
 lib/ethdev/ethdev_driver.h | 64 ++
 lib/ethdev/version.map |  1 +
 3 files changed, 76 insertions(+)

diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index c335a25a82..f78f9fb5c1 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -958,3 +958,14 @@ rte_eth_switch_domain_free(uint16_t domain_id)
 
return 0;
 }
+
+void
+rte_eth_get_restore_flags(struct rte_eth_dev *dev,
+ enum rte_eth_dev_operation op,
+ uint32_t *flags)
+{
+   if (dev->dev_ops->get_restore_flags != NULL)
+   dev->dev_ops->get_restore_flags(dev, op, flags);
+   else
+   *flags = RTE_ETH_RESTORE_ALL;
+}
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index ae00ead865..3617c6951d 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1235,6 +1235,47 @@ typedef int (*eth_count_aggr_ports_t)(struct rte_eth_dev 
*dev);
 typedef int (*eth_map_aggr_tx_affinity_t)(struct rte_eth_dev *dev, uint16_t 
tx_queue_id,
  uint8_t affinity);
 
+/**
+ * @internal
+ * Defines types of operations which can be executed by the application.
+ */
+enum rte_eth_dev_operation {
+   RTE_ETH_START,
+};
+
+/**@{@name Restore flags
+ * Flags returned by get_restore_flags() callback.
+ * They indicate to ethdev layer which configuration is required to be 
restored.
+ */
+/** If set, ethdev layer will forcefully reapply default and any other added 
MAC addresses. */
+#define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT32(0)
+/** If set, ethdev layer will forcefully reapply current promiscuous mode 
setting. */
+#define RTE_ETH_RESTORE_PROMISC  RTE_BIT32(1)
+/** If set, ethdev layer will forcefully reapply current all multicast mode 
setting. */
+#define RTE_ETH_RESTORE_ALLMULTI RTE_BIT32(2)
+/**@}*/
+
+/** All configuration which can be restored by ethdev layer. */
+#define RTE_ETH_RESTORE_ALL (RTE_ETH_RESTORE_MAC_ADDR | \
+RTE_ETH_RESTORE_PROMISC | \
+RTE_ETH_RESTORE_ALLMULTI)
+
+/**
+ * @internal
+ * Fetch from the driver what kind of configuration must be restored by ethdev 
layer,
+ * after certain operations are performed by the application (such as 
rte_eth_dev_start()).
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param op
+ *   Type of operation executed by the application.
+ * @param flags
+ *   Flags indicating what configuration must be restored by ethdev layer.
+ */
+typedef void (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
+   enum rte_eth_dev_operation op,
+   uint32_t *flags);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet 
driver.
  */
@@ -1474,6 +1515,9 @@ struct eth_dev_ops {
eth_count_aggr_ports_t count_aggr_ports;
/** Map a Tx queue with an aggregated port of the DPDK port */
eth_map_aggr_tx_affinity_t map_aggr_tx_affinity;
+
+   /** Get configuration which ethdev should restore */
+   eth_get_restore_flags_t get_restore_flags;
 };
 
 /**
@@ -2131,6 +2175,26 @@ struct rte_eth_fdir_conf {
struct rte_eth_fdir_flex_conf flex_conf;
 };
 
+/**
+ * @internal
+ * Fetch from the driver what kind of configuration must be restored by 
ethdev layer,
+ * using get_restore_flags() callback.
+ *
+ * If callback is not defined, it is assumed that all supported configuration 
must be restored.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param op
+ *   Type of operation executed by the application.
+ * @param flags
+ *   Flags indicating what configuration must be restored by ethdev layer.
+ */
+__rte_internal
+void
+rte_eth_get_restore_flags(struct rte_eth_dev *dev,
+ enum rte_eth_dev_operation op,
+ uint32_t *flags);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 1669055ca5..fa0469e602 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -358,4 +358,5 @@ INTERNAL {
rte_eth_switch_domain_alloc;
rte_eth_switch_domain_free;
rte_flow_fp_default_ops;
+   rte_eth_get_restore_flags;
 };
-- 
2.39.5



[PATCH 3/4] ethdev: restore config only when requested

2024-10-11 Thread Dariusz Sosnowski
Use the get_restore_flags() internal API introduced in previous commits
in rte_eth_dev_start() to restore only the configuration
requested by the driver.

Signed-off-by: Dariusz Sosnowski 
---
 lib/ethdev/rte_ethdev.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 362a1883f0..9e82556374 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1719,20 +1719,27 @@ eth_dev_allmulticast_restore(struct rte_eth_dev *dev, 
uint16_t port_id)
 
 static int
 eth_dev_config_restore(struct rte_eth_dev *dev,
-   struct rte_eth_dev_info *dev_info, uint16_t port_id)
+   struct rte_eth_dev_info *dev_info,
+   uint32_t restore_flags,
+   uint16_t port_id)
 {
int ret;
 
-   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
+   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR) &&
+   (restore_flags & RTE_ETH_RESTORE_MAC_ADDR))
eth_dev_mac_restore(dev, dev_info);
 
-   ret = eth_dev_promiscuous_restore(dev, port_id);
-   if (ret != 0)
-   return ret;
+   if (restore_flags & RTE_ETH_RESTORE_PROMISC) {
+   ret = eth_dev_promiscuous_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+   }
 
-   ret = eth_dev_allmulticast_restore(dev, port_id);
-   if (ret != 0)
-   return ret;
+   if (restore_flags & RTE_ETH_RESTORE_ALLMULTI) {
+   ret = eth_dev_allmulticast_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+   }
 
return 0;
 }
@@ -1742,6 +1749,7 @@ rte_eth_dev_start(uint16_t port_id)
 {
struct rte_eth_dev *dev;
struct rte_eth_dev_info dev_info;
+   uint32_t restore_flags;
int diag;
int ret, ret_stop;
 
@@ -1769,8 +1777,11 @@ rte_eth_dev_start(uint16_t port_id)
if (ret != 0)
return ret;
 
+   rte_eth_get_restore_flags(dev, RTE_ETH_START, &restore_flags);
+
/* Lets restore MAC now if device does not support live change */
-   if (*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR)
+   if ((*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR) &&
+   (restore_flags & RTE_ETH_RESTORE_MAC_ADDR))
eth_dev_mac_restore(dev, &dev_info);
 
diag = (*dev->dev_ops->dev_start)(dev);
@@ -1779,7 +1790,7 @@ rte_eth_dev_start(uint16_t port_id)
else
return eth_err(port_id, diag);
 
-   ret = eth_dev_config_restore(dev, &dev_info, port_id);
+   ret = eth_dev_config_restore(dev, &dev_info, restore_flags, port_id);
if (ret != 0) {
RTE_ETHDEV_LOG_LINE(ERR,
"Error during restoring configuration for device (port 
%u): %s",
-- 
2.39.5



[PATCH 4/4] net/mlx5: disable config restore

2024-10-11 Thread Dariusz Sosnowski
mlx5 PMD does not require configuration restore
on rte_eth_dev_start().
Add implementation of get_restore_flags() indicating that.

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5.c|  2 ++
 drivers/net/mlx5/mlx5.h|  3 +++
 drivers/net/mlx5/mlx5_ethdev.c | 19 +++
 3 files changed, 24 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8d266b0e64..9b6acaf7f1 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2571,6 +2571,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
.count_aggr_ports = mlx5_count_aggr_ports,
.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
+   .get_restore_flags = mlx5_get_restore_flags,
 };

 /* Available operations from secondary process. */
@@ -2663,6 +2664,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
.get_monitor_addr = mlx5_get_monitor_addr,
.count_aggr_ports = mlx5_count_aggr_ports,
.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
+   .get_restore_flags = mlx5_get_restore_flags,
 };

 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 869aac032b..a5829fb71a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2228,6 +2228,9 @@ eth_rx_burst_t mlx5_select_rx_function(struct rte_eth_dev 
*dev);
 struct mlx5_priv *mlx5_port_to_eswitch_info(uint16_t port, bool valid);
 struct mlx5_priv *mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev);
 int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+void mlx5_get_restore_flags(struct rte_eth_dev *dev,
+   enum rte_eth_dev_operation op,
+   uint32_t *flags);

 /* mlx5_ethdev_os.c */

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6a678d6dcc..8b78efc3fd 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -796,3 +796,22 @@ mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct 
rte_eth_hairpin_cap *cap)
cap->tx_cap.rte_memory = hca_attr->hairpin_sq_wq_in_host_mem;
return 0;
 }
+
+/**
+ * Indicate to ethdev layer, what configuration must be restored.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device structure.
+ * @param[in] op
+ *   Type of operation which might require config restore.
+ * @param[out] flags
+ *   Restore flags will be stored here.
+ */
+void
+mlx5_get_restore_flags(__rte_unused struct rte_eth_dev *dev,
+  __rte_unused enum rte_eth_dev_operation op,
+  uint32_t *flags)
+{
+   /* mlx5 PMD does not require any configuration restore. */
+   *flags = 0;
+}
--
2.39.5



[PATCH 1/4] ethdev: rework config restore

2024-10-11 Thread Dariusz Sosnowski
Extract promiscuous and all multicast configuration restore
to separate functions.
This change will allow easier integration of disabling
these procedures for supporting PMDs in follow up commits.

Signed-off-by: Dariusz Sosnowski 
---
 lib/ethdev/rte_ethdev.c | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e..362a1883f0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1648,14 +1648,10 @@ eth_dev_mac_restore(struct rte_eth_dev *dev,
 }
 
 static int
-eth_dev_config_restore(struct rte_eth_dev *dev,
-   struct rte_eth_dev_info *dev_info, uint16_t port_id)
+eth_dev_promiscuous_restore(struct rte_eth_dev *dev, uint16_t port_id)
 {
int ret;
 
-   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
-   eth_dev_mac_restore(dev, dev_info);
-
/* replay promiscuous configuration */
/*
 * use callbacks directly since we don't need port_id check and
@@ -1683,6 +1679,14 @@ eth_dev_config_restore(struct rte_eth_dev *dev,
}
}
 
+   return 0;
+}
+
+static int
+eth_dev_allmulticast_restore(struct rte_eth_dev *dev, uint16_t port_id)
+{
+   int ret;
+
/* replay all multicast configuration */
/*
 * use callbacks directly since we don't need port_id check and
@@ -1713,6 +1717,26 @@ eth_dev_config_restore(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+eth_dev_config_restore(struct rte_eth_dev *dev,
+   struct rte_eth_dev_info *dev_info, uint16_t port_id)
+{
+   int ret;
+
+   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
+   eth_dev_mac_restore(dev, dev_info);
+
+   ret = eth_dev_promiscuous_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+
+   ret = eth_dev_allmulticast_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+
+   return 0;
+}
+
 int
 rte_eth_dev_start(uint16_t port_id)
 {
-- 
2.39.5



[PATCH 0/4] ethdev: rework config restore

2024-10-11 Thread Dariusz Sosnowski
This patch series reworks the config restore procedure,
so that drivers are able to enable/disable certain parts of it.
Drivers can provide get_restore_flags() callback,
which will indicate to ethdev library what configuration to restore.

If callback is not defined, then ethdev assumes that
all configuration must be restored, preserving the current behavior for all 
drivers.

This patch series also includes implementation of get_restore_flags()
for mlx5 PMD, which does not require config restore.

RFC: https://inbox.dpdk.org/dev/20240918092201.33772-1-dsosnow...@nvidia.com/

Dariusz Sosnowski (4):
  ethdev: rework config restore
  ethdev: add get restore flags driver callback
  ethdev: restore config only when requested
  net/mlx5: disable config restore

 drivers/net/mlx5/mlx5.c|  2 ++
 drivers/net/mlx5/mlx5.h|  3 ++
 drivers/net/mlx5/mlx5_ethdev.c | 19 ++
 lib/ethdev/ethdev_driver.c | 11 ++
 lib/ethdev/ethdev_driver.h | 64 ++
 lib/ethdev/rte_ethdev.c| 49 ++
 lib/ethdev/version.map |  1 +
 7 files changed, 142 insertions(+), 7 deletions(-)

--
2.39.5



Re: [PATCH] rcu: refactor rcu register/unregister functions

2024-10-11 Thread David Marchand
On Sun, Sep 8, 2024 at 10:43 AM Doug Foster  wrote:
>
> This simplifies the implementation of rte_rcu_qsbr_thread_register()
> and rte_rcu_thread_unregister() functions. The simplified implementation
> is easier to read.
>
> Signed-off-by: Doug Foster 
> Reviewed-by: Honnappa Nagarahalli 
> Reviewed-by: Wathsala Vithanage 

Applied, thanks Doug.


-- 
David Marchand



Re: [PATCH v6 2/3] net/ice: add frequency adjustment support for PTP

2024-10-11 Thread Ferruh Yigit
On 10/11/2024 9:02 AM, Bruce Richardson wrote:
> On Fri, Oct 11, 2024 at 06:34:06AM +, Mingjin Ye wrote:
>> Add ice support for new ethdev API to adjust frequency for IEEE1588
>> PTP. Also, this patch reworks code for converting software update
>> to hardware update.
>>
>> Signed-off-by: Simei Su 
>> Signed-off-by: Mingjin Ye 
>> ---
>>  doc/guides/nics/ice.rst  |  16 
>>  drivers/net/ice/ice_ethdev.c | 176 ++-
>>  drivers/net/ice/ice_ethdev.h |   5 +-
>>  drivers/net/ice/ice_rxtx.c   |   4 +-
>>  4 files changed, 153 insertions(+), 48 deletions(-)
>>
>> diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
>> index ae975d19ad..061c8c7a20 100644
>> --- a/doc/guides/nics/ice.rst
>> +++ b/doc/guides/nics/ice.rst
>> @@ -328,6 +328,22 @@ Forward Error Correction (FEC)
>>  
>>  Supports get/set FEC mode and get FEC capability.
>>  
>> +Time Synchronisation
>> +
>> +
>> +The system operator can run a PTP (Precision Time Protocol) client 
>> application
>> +to synchronise the time on the network card (and optionally the time on the
>> +system) to the PTP master.
>> +
>> +ICE PMD supports PTP client applications that use the DPDK IEEE1588 API to
>> +communicate with the PTP master clock. Note that PTP client application 
>> needs
>> +to run on PF and add the ``--force-max-simd-bitwidth=64`` startup parameter 
>> to
>> +disable vector mode.
>> +
>> +.. code-block:: console
>> +
>> +examples/dpdk-ptpclient -c f -n 3 -a :ec:00.1 
>> --force-max-simd-bitwidth=64 -- -T 1 -p 0x1 -c 1
>> +
> 
> It's a pity that the vector disabling doesn't happen automatically
> somewhere and that we have to ask the user to pass in the flag. Maybe see
> if that can be improved for RC2 or next release?
> 
> Anyway, for this patch:
> 
> Acked-by: Bruce Richardson 
> 

Applied to dpdk-next-net/main, thanks.


Re: [PATCH v6 0/3] add frequency adjustment support for PTP

2024-10-11 Thread Ferruh Yigit
On 10/11/2024 7:34 AM, Mingjin Ye wrote:
> [1/3] ethdev: add frequency adjustment API
> [2/3] net/ice: add frequency adjustment support for PTP
> [3/3] examples/ptpclient: add frequency adjustment
> ---
> v2: rte_eth_timesync_adjust_freq marked as experimental.
> ---
> v3: Add more description for API.
> ---
> v4: Documentation for adding a ptpclient.
> ---
> v5: Fixes in ice.
> ---
> v6: Fix git commit conflict.
> 
> Mingjin Ye (3):
>   ethdev: add frequency adjustment API
>   net/ice: add frequency adjustment support for PTP
>   examples/ptpclient: add frequency adjustment
> 

Merged this patch series partially, 1/3 & 2/3 merged to be able to get
them to -rc1.

Discussions on sample application, patch 3/3, can continue.


Re: [PATCH v6 1/3] ethdev: add frequency adjustment API

2024-10-11 Thread Ferruh Yigit
On 10/11/2024 7:34 AM, Mingjin Ye wrote:
> This patch adds freq adjustment API for PTP high accuracy.
> 
> Signed-off-by: Simei Su 
> Signed-off-by: Mingjin Ye 
>

Reviewed-by: Ferruh Yigit 

Applied to dpdk-next-net/main, thanks.


Re: [PATCH 0/2] enhance the flower service framework

2024-10-11 Thread Ferruh Yigit
On 10/10/2024 8:45 AM, Chaoyong He wrote:
> Enhance the NFP service framework when using the flower firmware; we can
> now use an alarm when there is no service core.
> 
> Long Wu (2):
>   net/nfp: rename flower service flag
>   net/nfp: enhance the flower service framework
>

Series applied to dpdk-next-net/main, thanks.


Re: [PATCH 0/3] net/tap: queue limit patches

2024-10-11 Thread Ferruh Yigit
On 10/11/2024 6:29 PM, Stephen Hemminger wrote:
> Some patches related to recent increase in possible queues.
> 
> Stephen Hemminger (3):
>   net/tap: handle increase in mp_max_fds
>   net/tap: add static assert to make sure max queues less than fd limit
>   net/tap: increase the maximum allowable queues
> 

For series,
Acked-by: Ferruh Yigit 

Series applied to dpdk-next-net/main, thanks.



Re: [PATCH v3 1/3] ethdev: add traffic manager query function

2024-10-11 Thread Ferruh Yigit
On 10/9/2024 11:32 AM, Bruce Richardson wrote:
> Add function to allow querying a node in the scheduler tree.  Returns
> the parameters as were given to the add function. Adding this function
> allows apps to just query the hierarchy rather than having to maintain
> their own copies of it internally.
> 
> Signed-off-by: Bruce Richardson 
>

Acked-by: Ferruh Yigit 


Re: [PATCH v3 0/3] add support for querying ethdev TM nodes

2024-10-11 Thread Ferruh Yigit
On 10/9/2024 11:32 AM, Bruce Richardson wrote:
> Add support for an ethdev TM query function to allow apps to get details
> of the TM nodes they previously configured. This patchset includes:
> 
> * ethdev changes to add the API
> * implementation of the function in "ice" pmd
> * testpmd command to allow testing the function
> 
> v3: updated comments and removed null check on patch 1.
> Added doc to patch 3.
> V2: rebase to the head of next-net tree
> 
> 
> Bruce Richardson (3):
>   ethdev: add traffic manager query function
>   net/ice: add traffic management node query function
>   app/testpmd: add support for querying TM nodes
>

Series applied to dpdk-next-net/main, thanks.


Re: [PATCH v6 0/4] fix segment fault when parse args

2024-10-11 Thread fengchengwen
On 2024/10/11 22:13, David Marchand wrote:
> On Wed, Oct 9, 2024 at 6:50 AM Chengwen Feng  wrote:
>>
>> rte_kvargs_process() was used to parse key-value pairs (e.g. socket_id=0);
>> it also supported parsing only-key arguments (e.g. socket_id). But many
>> drivers' callbacks can only handle key-value and will segfault when handed
>> only-key, so the patchset [1] was introduced.
>>
>> Because the patchset [1] modified too many drivers:
>> 1) A new API, rte_kvargs_process_opt(), was introduced; it inherits the
>> behavior of rte_kvargs_process(), which could parse both key-value and
>> only-key.
>> 2) Constrain rte_kvargs_process() to parse only key-value.
>>
>> [1] 
>> https://patches.dpdk.org/project/dpdk/patch/20230320092110.37295-1-fengcheng...@huawei.com/
>>
>> Chengwen Feng (4):
>>   kvargs: add one new process API
>>   net/sfc: use new API to parse kvargs
>>   net/tap: use new API to parse kvargs
>>   common/nfp: use new API to parse kvargs
> 
> I see Stephen wanted to have a look on the series, but rc1 is close,
> and I decided to merge it with following comments.
> 
> I marked the new API as stable from the get-go.
> The reason being that it provides the "legacy" behavior of previous
> rte_kvargs_process function, and as such, some users may have been
> relying on it.
> If you think this is wrong, please submit a followup patch.

I just reviewed the merged commit, it's OK.

> 
> I was surprised by the change in behavior wrt a NULL kvlist input
> parameter mixed in the change.
> But I left it as is. Idem, a followup patch may be sent.
> 
> 
> Series applied, thanks for fixing this old issue Chengwen.

Thanks David

> 



Re: [PATCH v6 2/3] net/ice: add frequency adjustment support for PTP

2024-10-11 Thread Bruce Richardson
On Fri, Oct 11, 2024 at 06:34:06AM +, Mingjin Ye wrote:
> Add ice support for new ethdev API to adjust frequency for IEEE1588
> PTP. Also, this patch reworks code for converting software update
> to hardware update.
> 
> Signed-off-by: Simei Su 
> Signed-off-by: Mingjin Ye 
> ---
>  doc/guides/nics/ice.rst  |  16 
>  drivers/net/ice/ice_ethdev.c | 176 ++-
>  drivers/net/ice/ice_ethdev.h |   5 +-
>  drivers/net/ice/ice_rxtx.c   |   4 +-
>  4 files changed, 153 insertions(+), 48 deletions(-)
> 
> diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
> index ae975d19ad..061c8c7a20 100644
> --- a/doc/guides/nics/ice.rst
> +++ b/doc/guides/nics/ice.rst
> @@ -328,6 +328,22 @@ Forward Error Correction (FEC)
>  
>  Supports get/set FEC mode and get FEC capability.
>  
> +Time Synchronisation
> +
> +
> +The system operator can run a PTP (Precision Time Protocol) client 
> application
> +to synchronise the time on the network card (and optionally the time on the
> +system) to the PTP master.
> +
> +ICE PMD supports PTP client applications that use the DPDK IEEE1588 API to
> +communicate with the PTP master clock. Note that PTP client application needs
> +to run on PF and add the ``--force-max-simd-bitwidth=64`` startup parameter 
> to
> +disable vector mode.
> +
> +.. code-block:: console
> +
> +examples/dpdk-ptpclient -c f -n 3 -a :ec:00.1 
> --force-max-simd-bitwidth=64 -- -T 1 -p 0x1 -c 1
> +

It's a pity that the vector disabling doesn't happen automatically
somewhere and that we have to ask the user to pass in the flag. Maybe see
if that can be improved for RC2 or next release?

Anyway, for this patch:

Acked-by: Bruce Richardson 



Re: [PATCH v9 1/7] eal: add static per-lcore memory allocation facility

2024-10-11 Thread Mattias Rönnblom

On 2024-10-10 23:24, Thomas Monjalon wrote:

Hello,

This new feature looks to bring something interesting to DPDK.
There was a good amount of discussion and review,
and there is a real effort of documentation.

However, some choices done in this implementation
were not explained or advertised enough in the documentation,
in my opinion.



Are those of relevance to the API user?


I think the first thing to add is an explanation of the memory layout.
Maybe that a SVG drawing would help to show how it is stored.



That would be helpful to someone wanting to understand the internals. 
But where should that go? If it's put in the API, it will also obscure 
the *actual* API documentation.


I have some drawings already, and I agree they are very helpful - both 
in explaining how things work, and in making it obvious why the memory layout 
resulting from the use of lcore variables is superior to that of the 
lcore id-indexed static array approach.



We also need to explain why it is not using rte_malloc.

Also please could you re-read the doc and comments in detail?
I think some words are missing and there are typos.
While at it, please allow for easy update of the text
by starting each sentence on a new line.
Breaking lines logically is better for future patches.
One more advice: avoid very long sentences.



I've gone through the documentation and will post a new patch set.

There's been a lot of comments and discussion on this patch set. Did you 
have anything in particular in mind?



Do you have benchmarks results of the modules using such variables
(power, random, service)?
It would be interesting to compare time efficiency and memory usage
before/after, with different number of threads.



I have the dummy modules of test_lcore_var_perf.c, which show the 
performance benefits as the number of modules using lcore variables 
increases.


That said, the gains are hard to quantify with micro benchmarks, and for 
real-world performance, one really has to start using the facility at 
scale before anything interesting may happen.


Keep in mind however, that while this is new to DPDK, similar facilities 
already exist in your favorite UN*X kernel. The implementation is 
different, but I think it's accurate to say the goal and the effects 
should be the same.


One can also run the perf autotest for RTE random, but such tests only 
show that lcore variables don't make things significantly worse when the L1 
cache is essentially unused. (In fact, the lcore variable-enabled 
rte_random.c somewhat counter-intuitively generates a 64-bit number 1 
TSC cycle faster than the old version on my system.)


Just to be clear: it's the footprint in the core-private caches we are 
attempting to reduce.



Adding more detailed comments below.


10/10/2024 16:21, Mattias Rönnblom:

Introduce DPDK per-lcore id variables, or lcore variables for short.

An lcore variable has one value for every current and future lcore
id-equipped thread.


I find it difficult to read "lcore id-equipped thread".
Can we just say "DPDK thread"?



Sure, if you point me to a definition of what a DPDK thread is.

I can think of at least four potential definitions
* An EAL thread
* An EAL thread or a registered non-EAL thread
* Any thread calling into DPDK APIs
* Any thread living in a DPDK process


The primary  use case is for statically allocating
small, frequently-accessed data structures, for which one instance
should exist for each lcore.

Lcore variables are similar to thread-local storage (TLS, e.g., C11
_Thread_local), but decoupling the values' life time with that of the
threads.


In which situation we need values of a dead thread?



To clean up heap-allocated memory referenced by such variables, for 
example, or other resources.



[...]

+An application, a DPDK library or PMD may keep opt to keep per-thread
+state.


I don't understand this sentence.



Which part is unclear?


+
+Per-thread data may be maintained using either *lcore variables*
+(``rte_lcore_var.h``), *thread-local storage (TLS)*
+(``rte_per_lcore.h``), or a static array of ``RTE_MAX_LCORE``
+elements, index by ``rte_lcore_id()``. These methods allows for


index*ed*



Fixed.


+per-lcore data to be a largely module-internal affair, and not
+directly visible in its API. Another possibility is to have deal


*to* deal ?


+explicitly with per-thread aspects in the API (e.g., the ports of the
+Eventdev API).
+
+Lcore varibles are suitable for small object statically allocated at


vari*a*bles



Fixed.


+the time of module or application initialization. An lcore variable
+take on one value for each lcore id-equipped thread (i.e., for EAL
+threads and registered non-EAL threads, in total ``RTE_MAX_LCORE``
+instances). The lifetime of lcore variables are detached from that of
+the owning threads, and may thus be initialized prior to the owner
+having been created.
+
+Variables with thread-local storage are allocated at the time of
+thread creation, and exists until the thread terminates

RE: [PATCH v9 1/7] eal: add static per-lcore memory allocation facility

2024-10-11 Thread Morten Brørup
Mattias,
Please note that most of Thomas' questions are in the interest of the general 
public, and should be considered requests for further documentation.


> Do you have benchmarks results of the modules using such variables
> (power, random, service)?
> It would be interesting to compare time efficiency and memory usage
> before/after, with different number of threads.

IMO, the main benefit is the reduction of waste of CPU data cache (no need for 
RTE_CACHE_GUARD fillers in per-lcore data structures).

Mattias,
The PMU counters library is too new, so I suppose you cannot yet measure the 
primary benefit, but only derived benefits.
If you have any kind of perf data, please provide them.


> > Lcore variables are similar to thread-local storage (TLS, e.g., C11
> > _Thread_local), but decoupling the values' life time with that of the
> > threads.
> 
> In which situation we need values of a dead thread?

Values of dead threads are not the issue here.
This is:
1. TLS variables are allocated and initialized when ANY thread is created, so 
using TLS for variables increases the cost of creating new threads. This is 
relevant for applications that frequently create (and destroy) short-lived 
threads.
2. TLS variables use memory for ALL threads, regardless of whether those 
threads use the TLS variables or not. This increases the memory footprint for 
applications that create many (long-lived) threads.

Mattias:
Thomas' question might still be relevant...
Is there any situation where the values of the lcore variables are relevant 
after the thread is dead?

> > +Variables with thread-local storage are allocated at the time of
> > +thread creation, and exists until the thread terminates, for every
> > +thread in the process. Only very small object should be allocated in
> > +TLS, since large TLS objects significantly slows down thread
> creation
> > +and may needlessly increase memory footprint for application that
> make
> > +extensive use of unregistered threads.
> 
> I don't understand the relation with non-DPDK threads.

The relationship here is the same as I described above: Using TLS for DPDK 
threads also has a cost for non-DPDK threads.


> [...]
> > +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
> 
> With #define RTE_MAX_LCORE_VAR 1048576,
> LCORE_BUFFER_SIZE can be 100MB, right?

Mattias:
You should document what you have explained...
This huge amount of memory is not really consumed, but relies on paging; it is 
only a large virtual address space.
This also makes it somewhat advantageous that the lcore variables don't use 
hugepages.

> > +static size_t offset = RTE_MAX_LCORE_VAR;
> 
> A comment may be useful for this value: it triggers the first alloc?

Mattias:
If you recall, I also had a hard time understanding this design (instead of 
simply comparing lcore_buffer to NULL).
Please add a comment that this not only triggers the first allocation, but also 
additional allocations, if using a lot of memory for lcore variables.

> 
> > +
> > +static void *
> > +lcore_var_alloc(size_t size, size_t align)
> > +{
> > +   void *handle;
> > +   unsigned int lcore_id;
> > +   void *value;
> > +
> > +   offset = RTE_ALIGN_CEIL(offset, align);
> > +
> > +   if (offset + size > RTE_MAX_LCORE_VAR) {
> > +#ifdef RTE_EXEC_ENV_WINDOWS
> > +   lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
> > +  RTE_CACHE_LINE_SIZE);
> > +#else
> > +   lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> > +LCORE_BUFFER_SIZE);
> > +#endif
> > +   RTE_VERIFY(lcore_buffer != NULL);
> 
> Please no panic in a lib.
> You can return NULL.

I agree with Mattias design here.
Lcore variables are like RTE_PER_LCORE variables and simple "static" variables.
If the system does not have enough memory for those, the application will not 
be able to deal with it.
Panicking early (in this lib) is the correct way to deal with it.


> > +/**
> > + * @file
> > + *
> > + * RTE Lcore variables
> 
> Please don't say "RTE", it is just a prefix.
> You can replace it with "DPDK" if you really want to be specific.

The commit message says:
"Introduce DPDK per-lcore id variables, or lcore variables for short."

Use one of the two here.
I personally prefer the short variant, "Lcore variables", because that is the 
term we are going to use in conversations on the mailing list, documentation 
etc.
The long variant is mainly intended for explaining the library itself.


> > + * Lcore variables cannot and need not be freed.
> 
> I'm curious about that.
> If EAL is closed, and the application continues its life,
> then we want all this memory to be cleaned as well.
> Do you know rte_eal_cleanup()?

Good catch, Thomas! I missed that in my review.
Mattias, it seems you need a chained list of lcore_buffer allocations for this.


> > + *
> > + * The size of an lcore variable's value must be less than the DPDK
> 
> size of variable, not size of value

Initially, I thought the

Re: [PATCH v3 00/50] Provide: flow filter init API, Enable virtual queues, fix ntnic issues for release 24.07

2024-10-11 Thread Ferruh Yigit
On 10/10/2024 3:13 PM, Serhii Iliushyk wrote:
> The list of updates provided by the patchset:
>   * Update the supported version of the FPGA to 9563.55.49
>   * Fix Coverity issues
>   * Fix issues related to release 24.07
>   * Extended and fixed the implementation of the logging
>   * Added NT flow filter init API
>   * Added NT flow backend initialization API
>   * Added initialization of FPGA modules related to flow HW offload
>   * Added basic handling of the virtual queues
>   * Update documentation
>   * Fix logging according to the requirement to not have '\n'
> 
> Danylo Vodopianov (15):
>   net/ntnic: fix coverity issues:
>   net/ntnic: extend and fix logging implementation
>   net/ntnic: add basic queue operations
>   net/ntnic: enhance Ethernet device configuration
>   net/ntnic: add scatter-gather HW deallocation
>   net/ntnic: add queue setup operations
>   net/ntnic: add packet handler for virtio queues
>   net/ntnic: add init for virt queues in the DBS
>   net/ntnic: add split-queue support
>   net/ntnic: add functions for availability monitor management
>   net/ntnic: used writer data handling functions
>   net/ntnic: add descriptor reader data handling functions
>   net/ntnic: virtqueue setup managed packed-ring was added
>   net/ntnic: add functions for releasing virt queues
>   net/ntnic: add functions for retrieving and managing packets
> 
> Oleksandr Kolomeiets (33):
>   net/ntnic: update NT NiC PMD driver with FPGA version
>   net/ntnic: update documentation
>   net/ntnic: remove extra calling of the API for release port
>   net/ntnic: add flow filter init API
>   net/ntnic: add flow filter deinitialization API
>   net/ntnic: add flow backend initialization API
>   net/ntnic: add flow backend deinitialization API
>   net/ntnic: add INFO flow module
>   net/ntnic: add categorizer (CAT) flow module
>   net/ntnic: add key match (KM) flow module
>   net/ntnic: add flow matcher (FLM) flow module
>   net/ntnic: add IP fragmenter (IFR) flow module
>   net/ntnic: add hasher (HSH) flow module
>   net/ntnic: add queue select (QSL) flow module
>   net/ntnic: add slicer (SLC LR) flow module
>   net/ntnic: add packet descriptor builder (PDB) flow module
>   net/ntnic: add header field update (HFU) flow module
>   net/ntnic: add RPP local retransmit (RPP LR) flow module
>   net/ntnic: add copier (Tx CPY) flow module
>   net/ntnic: add checksum update (CSU) flow module
>   net/ntnic: add insert (Tx INS) flow module
>   net/ntnic: add replacer (Tx RPL) flow module
>   net/ntnic: add base init and deinit of the NT flow API
>   net/ntnic: add base init and deinit the NT flow backend
>   net/ntnic: add categorizer (CAT) FPGA module
>   net/ntnic: add key match (KM) FPGA module
>   net/ntnic: add flow matcher (FLM) FPGA module
>   net/ntnic: add hasher (HSH) FPGA module
>   net/ntnic: add queue select (QSL) FPGA module
>   net/ntnic: add slicer (SLC LR) FPGA module
>   net/ntnic: add packet descriptor builder (PDB) FPGA module
>   net/ntnic: add Tx Packet Editor (TPE) FPGA module
>   net/ntnic: add receive MAC converter (RMC) core module
> 
> Serhii Iliushyk (2):
>   net/ntnic: add Tx Packet Editor (TPE) flow module
>   net/ntnic: update FPGA registeris related to DBS
>

Added implied ack for series,
Acked-by: Serhii Iliushyk 

Series applied to dpdk-next-net/main, thanks.



Re: [PATCH v5 0/2] add two commands for testpmd application

2024-10-11 Thread Ferruh Yigit
On 10/10/2024 7:42 AM, Chaoyong He wrote:
> This patch series aims to add two commands for the testpmd application:
> - testpmd> set port  eeprom  magic  \
>   value  offset 
> - testpmd> set port  led 
> 
> ---
> v5:
> * Use 'accept_risk' rather than 'confirm' to make it clear enough.
> * Use dynamic memory alloc rather than VLA.
> * Place the logic in the right place following the reviewer's request.
> v4:
> * Make the log more readable by using 'rte_strerror()'.
> * Make the modification of document following the rules.
> v3:
> * Add acknowledgement for the set eeprom command.
> * Modify logic following the request from reviewer.
> v2:
> * Solve the conflict in document file.
> ---
> 
> James Hershaw (2):
>   app/testpmd: add support for setting device EEPROM
>   app/testpmd: add set dev led on/off command
>

For series,
Acked-by: Ferruh Yigit 

Series applied to dpdk-next-net/main, thanks.



Re: [PATCH v10 0/2] power: introduce PM QoS interface

2024-10-11 Thread lihuisong (C)

Hi Thomas and Ferruh,

Kindly ping for merge.

/Huisong
On 2024/9/12 11:07, fengchengwen wrote:

Series-reviewed-by: Chengwen Feng 

Thanks chengwen.


On 2024/9/12 10:38, Huisong Li wrote:

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a very low
resume time, like interrupt packet receiving mode.

And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on the cpuX for
userspace. Please see the description in kernel document[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to just the shallowest idle state, to lower the delay
after sleep, by setting a strict resume latency (zero value).

[1] 
https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us

---
  v10:
   - replace LINE_MAX with a custom macro and fix two typos.
  v9:
   - move new feature description from release_24_07.rst to release_24_11.rst.
  v8:
   - update the latest code to resolve CI warning
  v7:
   - remove a dead code rte_lcore_is_enabled in patch[2/2]
  v6:
   - update release_24_07.rst based on dpdk repo to resolve CI warning.
  v5:
   - use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
  v4:
   - fix some comments basd on Stephen
   - add stdint.h include
   - add Acked-by Morten Brørup 
  v3:
   - add RTE_POWER_xxx prefix for some macro in header
   - add the check for lcore_id with rte_lcore_is_enabled
  v2:
   - use PM QoS on CPU wide to replace the one on system wide

Huisong Li (2):
   power: introduce PM QoS API on CPU wide
   examples/l3fwd-power: add PM QoS configuration

  doc/guides/prog_guide/power_man.rst|  24 ++
  doc/guides/rel_notes/release_24_11.rst |   5 ++
  examples/l3fwd-power/main.c|  24 ++
  lib/power/meson.build  |   2 +
  lib/power/rte_power_qos.c  | 111 +
  lib/power/rte_power_qos.h  |  73 
  lib/power/version.map  |   4 +
  7 files changed, 243 insertions(+)
  create mode 100644 lib/power/rte_power_qos.c
  create mode 100644 lib/power/rte_power_qos.h


.


[PATCH v2 1/3] net/tap: handle increase in mp_max_fds

2024-10-11 Thread Stephen Hemminger
Now that the max number of MP fds has increased to 253, it is possible that
the number of queues the TAP device can handle is less than that.
Therefore the code to handle MP message should only allow the
number of queues it can handle.

Coverity issue: 445386
Signed-off-by: Stephen Hemminger 
---
 drivers/net/tap/rte_eth_tap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5ad3bbadd1..c486c6f073 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -2391,9 +2391,10 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
reply_param->q_count = 0;
 
RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
-   if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+
+   if (dev->data->nb_rx_queues > RTE_PMD_TAP_MAX_QUEUES) {
TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
-   dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
+   dev->data->nb_rx_queues, RTE_PMD_TAP_MAX_QUEUES);
return -1;
}
 
-- 
2.45.2



[PATCH v2 3/3] net/tap: remove unnecessary checks in configure

2024-10-11 Thread Stephen Hemminger
The ethdev layer already validates that the number of requested
queues is less than the reported max queues.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/tap/rte_eth_tap.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index c486c6f073..46afc9e2cb 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -948,22 +948,6 @@ tap_dev_configure(struct rte_eth_dev *dev)
 {
struct pmd_internals *pmd = dev->data->dev_private;
 
-   if (dev->data->nb_rx_queues > RTE_PMD_TAP_MAX_QUEUES) {
-   TAP_LOG(ERR,
-   "%s: number of rx queues %d exceeds max num of queues %d",
-   dev->device->name,
-   dev->data->nb_rx_queues,
-   RTE_PMD_TAP_MAX_QUEUES);
-   return -1;
-   }
-   if (dev->data->nb_tx_queues > RTE_PMD_TAP_MAX_QUEUES) {
-   TAP_LOG(ERR,
-   "%s: number of tx queues %d exceeds max num of queues %d",
-   dev->device->name,
-   dev->data->nb_tx_queues,
-   RTE_PMD_TAP_MAX_QUEUES);
-   return -1;
-   }
if (dev->data->nb_rx_queues != dev->data->nb_tx_queues) {
TAP_LOG(ERR,
"%s: number of rx queues %d must be equal to number of tx queues %d",
-- 
2.45.2



[PATCH v2 2/3] net/tap: increase the maximum allowable queues

2024-10-11 Thread Stephen Hemminger
The default of 16 is too low for larger systems.
The upper bound is the maximum number of fds that can be passed to a
secondary process (253) and the maximum number of queues allowed
by the Linux kernel (256). Both of these manifest constants are not
exposed in a visible kernel API.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/tap/meson.build | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tap/meson.build b/drivers/net/tap/meson.build
index 5e5a3ad3c6..4853efb236 100644
--- a/drivers/net/tap/meson.build
+++ b/drivers/net/tap/meson.build
@@ -14,7 +14,8 @@ sources = files(
 
 deps = ['bus_vdev', 'gso', 'hash']
 
-max_queues = '-DTAP_MAX_QUEUES=16'
+# TAP device is limited by RTE_MP_MAX_FD_NUM
+max_queues = '-DTAP_MAX_QUEUES=253'
 cflags += max_queues
 
 require_iova_in_mbuf = false
-- 
2.45.2



[PATCH v2 0/3] net/tap: queue limit patches

2024-10-11 Thread Stephen Hemminger
Some patches related to recent queue limit changes.

v2 - up the limit to maximum Linux can support
 dont use static_assert here
 get rid of unreachable checks in configure

Stephen Hemminger (3):
  net/tap: handle increase in mp_max_fds
  net/tap: increase the maximum allowable queues
  net/tap: remove unnecessary checks in configure

 drivers/net/tap/meson.build   |  3 ++-
 drivers/net/tap/rte_eth_tap.c | 21 +++--
 2 files changed, 5 insertions(+), 19 deletions(-)

-- 
2.45.2



Re: [PATCH 0/3] net/tap: queue limit patches

2024-10-11 Thread Stephen Hemminger
On Sat, 12 Oct 2024 02:01:12 +0100
Ferruh Yigit  wrote:

> On 10/11/2024 6:29 PM, Stephen Hemminger wrote:
> > Some patches related to recent increase in possible queues.
> > 
> > Stephen Hemminger (3):
> >   net/tap: handle increase in mp_max_fds
> >   net/tap: add static assert to make sure max queues less than fd limit
> >   net/tap: increase the maximum allowable queues
> >   
> 
> For series,
> Acked-by: Ferruh Yigit 
> 
> Series applied to dpdk-next-net/main, thanks.
> 


I sent a revised one, mostly because of the failure in the loongarch build,
but also after further investigation of what the upper limit is in Linux
and after doing tests at the maximum number of queues.


Re: [PATCH v2 1/4] eal: add bitset type

2024-10-11 Thread Mattias Rönnblom

On 2024-10-10 10:30, David Marchand wrote:

From: Mattias Rönnblom 

Introduce a set of functions and macros that operate on sets of bits,
kept in arrays of 64-bit words.

RTE bitset is designed for bitsets which are larger than what fits in
a single machine word (i.e., 64 bits).


Tiny detail: there is an extra newline here that shouldn't be there.


For very large bitsets, the  API may be a more appropriate
choice.

Signed-off-by: Mattias Rönnblom 
Acked-by: Morten Brørup 
Acked-by: Tyler Retzlaff 
---
  MAINTAINERS|   6 +
  app/test/meson.build   |   1 +
  app/test/test_bitset.c | 861 +
  doc/api/doxy-api-index.md  |   1 +
  doc/guides/rel_notes/release_24_11.rst |   9 +
  lib/eal/common/meson.build |   1 +
  lib/eal/common/rte_bitset.c|  30 +
  lib/eal/include/meson.build|   1 +
  lib/eal/include/rte_bitset.h   | 998 +
  lib/eal/version.map|   3 +
  10 files changed, 1911 insertions(+)
  create mode 100644 app/test/test_bitset.c
  create mode 100644 lib/eal/common/rte_bitset.c
  create mode 100644 lib/eal/include/rte_bitset.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 812463fe9f..00da3dac29 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -255,6 +255,12 @@ M: Jack Bond-Preston 
  F: lib/eal/include/rte_bitops.h
  F: app/test/test_bitops.c
  
+Bitset

+M: Mattias Rönnblom 
+F: lib/eal/include/rte_bitset.h
+F: lib/eal/common/rte_bitset.c
+F: app/test/test_bitset.c
+
  Bitmap
  M: Cristian Dumitrescu 
  F: lib/eal/include/rte_bitmap.h
diff --git a/app/test/meson.build b/app/test/meson.build
index e29258e6ec..fe248b786c 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -33,6 +33,7 @@ source_file_deps = {
  'test_bitcount.c': [],
  'test_bitmap.c': [],
  'test_bitops.c': [],
+'test_bitset.c': [],
'test_bitratestats.c': ['metrics', 'bitratestats', 'ethdev'] + sample_packet_forward_deps,
  'test_bpf.c': ['bpf', 'net'],
  'test_byteorder.c': [],
diff --git a/app/test/test_bitset.c b/app/test/test_bitset.c
new file mode 100644
index 00..fd3e50f396
--- /dev/null
+++ b/app/test/test_bitset.c
@@ -0,0 +1,861 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Ericsson AB
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include "test.h"
+
+#define MAGIC UINT64_C(0xdeadbeefdeadbeef)
+
+static void
+rand_buf(void *buf, size_t n)
+{
+   size_t i;
+
+   for (i = 0; i < n; i++)
+   ((unsigned char *)buf)[i] = rte_rand();
+}
+
+static uint64_t *
+alloc_bitset(size_t size)
+{
+   uint64_t *p;
+
+   p = malloc(RTE_BITSET_SIZE(size) + 2 * sizeof(uint64_t));
+   if (p == NULL)
+   rte_panic("Unable to allocate memory\n");
+
+   rand_buf(&p[0], RTE_BITSET_SIZE(size));
+
+   p[0] = MAGIC;
+   p[RTE_BITSET_NUM_WORDS(size) + 1] = MAGIC;
+
+   return p + 1;
+}
+
+
+static int
+free_bitset(uint64_t *bitset, size_t size)
+{
+   uint64_t *p;
+
+   p = bitset - 1;
+
+   if (p[0] != MAGIC)
+   return TEST_FAILED;
+
+   if (p[RTE_BITSET_NUM_WORDS(size) + 1] != MAGIC)
+   return TEST_FAILED;
+
+   free(p);
+
+   return TEST_SUCCESS;
+}
+
+static bool
+rand_bool(void)
+{
+   return rte_rand_max(2);
+}
+
+static void
+rand_bool_ary(bool *ary, size_t len)
+{
+   size_t i;
+
+   for (i = 0; i < len; i++)
+   ary[i] = rand_bool();
+}
+
+static void
+rand_unused_bits(uint64_t *bitset, size_t size)
+{
+   uint64_t bits = rte_rand() & ~__RTE_BITSET_USED_MASK(size);
+
+   bitset[RTE_BITSET_NUM_WORDS(size) - 1] |= bits;
+}
+
+static void
+rand_bitset(uint64_t *bitset, size_t size)
+{
+   size_t i;
+
+   rte_bitset_init(bitset, size);
+
+   for (i = 0; i < size; i++)
+   rte_bitset_assign(bitset, i, rand_bool());
+
+   rand_unused_bits(bitset, size);
+}
+
+typedef bool test_fun(const uint64_t *bitset, size_t bit_num);
+typedef void set_fun(uint64_t *bitset, size_t bit_num);
+typedef void clear_fun(uint64_t *bitset, size_t bit_num);
+typedef void assign_fun(uint64_t *bitset, size_t bit_num, bool value);
+typedef void flip_fun(uint64_t *bitset, size_t bit_num);
+
+static int
+test_set_clear_size(test_fun test_fun, set_fun set_fun, clear_fun clear_fun, size_t size)
+{
+   size_t i;
+   bool reference[size];
+   uint64_t *bitset;
+
+   rand_bool_ary(reference, size);
+
+   bitset = alloc_bitset(size);
+
+   TEST_ASSERT(bitset != NULL, "Failed to allocate memory");
+
+   rte_bitset_init(bitset, size);
+
+   for (i = 0; i < size; i++) {
+   if (reference[i])
+   set_fun(bitset, i);
+   else
+   clear_fun(bitset, i);
+   }
+
+   for (i = 0; i < size; i++)
+   if (reference[i] != test_fun(bitset, i))
+   

Re: [PATCH v4 2/5] graph: add node fastpath error counters

2024-10-11 Thread Robin Jarry

, Aug 16, 2024 at 17:09:

From: Pavan Nikhilesh 

Add node fastpath error counters advertised during
node registration.

Signed-off-by: Pavan Nikhilesh 


Reviewed-by: Robin Jarry 



Re: [PATCH v4 4/5] node: add error stats for ip4 lookup node

2024-10-11 Thread Robin Jarry

Hi Pavan,

, Aug 16, 2024 at 17:09:

From: Pavan Nikhilesh 

Add error counters for ip4 LPM lookup failures in
ip4_lookup node.

Signed-off-by: Pavan Nikhilesh 


[snip]


diff --git a/lib/node/node_private.h b/lib/node/node_private.h
index 1de7306792..36b2a733db 100644
--- a/lib/node/node_private.h
+++ b/lib/node/node_private.h
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#include 

+
 extern int rte_node_logtype;
 #define RTE_LOGTYPE_NODE rte_node_logtype
 
@@ -88,4 +90,10 @@ node_mbuf_priv2(struct rte_mbuf *m)

return (struct node_mbuf_priv2 *)rte_mbuf_to_priv(m);
 }
 
+#define NODE_INCREMENT_ERROR_ID(node, id, cond, cnt)                          \
+	{                                                                      \
+		if (unlikely(rte_graph_has_stats_feature() && (cond)))         \
+			((uint64_t *)RTE_PTR_ADD(node, node->err_off))[id] += (cnt); \
+	}


This is a private API which is not usable with out-of-tree nodes. Could 
you expose a way to increment a given error stat that does not involve 
a hacky macro?


How about something like this in rte_graph_worker_common.h:

static inline void
rte_node_increment_error(struct rte_node *node, uint16_t err_id, uint64_t value)
{
#ifdef RTE_LIBRTE_GRAPH_STATS
   uint64_t *errors = RTE_PTR_ADD(node, node->err_off);
   errors[err_id] += value;
#else
   RTE_SET_USED(node);
   RTE_SET_USED(err_id);
   RTE_SET_USED(value);
#endif
}

NB: do *not* include a condition in that function. The decision whether 
to increment an error stat or not should remain in the nodes.




Re: [PATCH v3] fib: network byte order IPv4 lookup

2024-10-11 Thread David Marchand
On Thu, Oct 10, 2024 at 1:26 PM Vladimir Medvedkin
 wrote:
> diff --git a/lib/fib/meson.build b/lib/fib/meson.build
> index 6795f41a0a..8c03496cdc 100644
> --- a/lib/fib/meson.build
> +++ b/lib/fib/meson.build
> @@ -25,40 +25,28 @@ if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok
>  # linked into main lib.
>
>  # check if all required flags already enabled (variant a).
> -acl_avx512_flags = ['__AVX512F__','__AVX512DQ__']
> -acl_avx512_on = true
> -foreach f:acl_avx512_flags
> +fib_avx512_flags = ['__AVX512F__','__AVX512DQ__', '__AVX512BW__']
> +fib_avx512_on = true
> +foreach f:fib_avx512_flags
>  if cc.get_define(f, args: machine_args) == ''
> -acl_avx512_on = false
> +fib_avx512_on = false
>  endif
>  endforeach

Repeating comment on v2 that was lost because of duplicate submission (?):

Please reuse the common checks recently merged, see for example:
https://git.dpdk.org/dpdk/diff/drivers/event/dlb2/meson.build?id=ef7a4025cd714189dc333bb19ea60c2abdeffb7d

>
> -if acl_avx512_on == true
> -cflags += ['-DCC_DIR24_8_AVX512_SUPPORT']
> -sources += files('dir24_8_avx512.c')
> -# TRIE AVX512 implementation uses avx512bw intrinsics along with
> -# avx512f and avx512dq
> -if cc.get_define('__AVX512BW__', args: machine_args) != ''
> -cflags += ['-DCC_TRIE_AVX512_SUPPORT']
> -sources += files('trie_avx512.c')
> -endif
> -elif cc.has_multi_arguments('-mavx512f', '-mavx512dq')
> +if fib_avx512_on == true
> +cflags += ['-DCC_DIR24_8_AVX512_SUPPORT', '-DCC_TRIE_AVX512_SUPPORT']

Nit: now that both dir24_8 and trie share the same requirement, can we
go with a simple CC_AVX512_SUPPORT?
This is really a nit, I am ok if you prefer to separate both.


> +sources += files('dir24_8_avx512.c', 'trie_avx512.c')
> +elif cc.has_multi_arguments('-mavx512f', '-mavx512dq', '-mavx512bw')
>  dir24_8_avx512_tmp = static_library('dir24_8_avx512_tmp',
>  'dir24_8_avx512.c',
>  dependencies: static_rte_eal,
> -c_args: cflags + ['-mavx512f', '-mavx512dq'])
> +c_args: cflags + ['-mavx512f', '-mavx512dq', '-mavx512bw'])
>  objs += dir24_8_avx512_tmp.extract_objects('dir24_8_avx512.c')
> -cflags += ['-DCC_DIR24_8_AVX512_SUPPORT']
> -# TRIE AVX512 implementation uses avx512bw intrinsics along with
> -# avx512f and avx512dq
> -if cc.has_argument('-mavx512bw')
> -trie_avx512_tmp = static_library('trie_avx512_tmp',
> +trie_avx512_tmp = static_library('trie_avx512_tmp',
>  'trie_avx512.c',
>  dependencies: static_rte_eal,
> -c_args: cflags + ['-mavx512f', \
> -'-mavx512dq', '-mavx512bw'])
> -objs += trie_avx512_tmp.extract_objects('trie_avx512.c')
> -cflags += ['-DCC_TRIE_AVX512_SUPPORT']
> -endif
> +c_args: cflags + ['-mavx512f', '-mavx512dq', '-mavx512bw'])
> +objs += trie_avx512_tmp.extract_objects('trie_avx512.c')
> +cflags += ['-DCC_DIR24_8_AVX512_SUPPORT', '-DCC_TRIE_AVX512_SUPPORT']
>  endif
>  endif
> diff --git a/lib/fib/rte_fib.c b/lib/fib/rte_fib.c
> index 4f9fba5a4f..991e48b5ea 100644
> --- a/lib/fib/rte_fib.c
> +++ b/lib/fib/rte_fib.c
> @@ -42,6 +42,7 @@ EAL_REGISTER_TAILQ(rte_fib_tailq)
>  struct rte_fib {
> charname[RTE_FIB_NAMESIZE];
> enum rte_fib_type   type;   /**< Type of FIB struct */
> +   int flags;  /**< Flags */
> struct rte_rib  *rib;   /**< RIB helper datastructure */
> void*dp;/**< pointer to the dataplane struct*/
> rte_fib_lookup_fn_t lookup; /**< FIB lookup function */
> @@ -110,7 +111,7 @@ init_dataplane(struct rte_fib *fib, __rte_unused int socket_id,
> if (fib->dp == NULL)
> return -rte_errno;
> fib->lookup = dir24_8_get_lookup_fn(fib->dp,
> -   RTE_FIB_LOOKUP_DEFAULT);
> +   RTE_FIB_LOOKUP_DEFAULT, !!(fib->flags & RTE_FIB_FLAG_LOOKUP_BE));
> fib->modify = dir24_8_modify;
> return 0;
> default:
> @@ -214,6 +215,7 @@ rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
> rte_strlcpy(fib->name, name, sizeof(fib->name));
> fib->rib = rib;
> fib->type = conf->type;
> +   fib->flags = conf->flags;

In addition to Robin comments, I also have a concern on the
extensibility aspect.

conf->flags must be validated against known flags.
Otherwise existing applications may pass wrong stuff and "work fine",
until the day we had one more flag.


> fib->def_nh = conf->default_nh;
> ret = init_dataplane(fib, socket_id, conf);
> if (ret

RE: [PATCH] eal/x86: cache queried CPU flags

2024-10-11 Thread Konstantin Ananyev



> Rather than re-querying the HW each time a CPU flag is requested, we can
> just save the return value in the flags array. This should speed up
> repeated querying of CPU flags, and provides a workaround for a reported
> issue where errors are seen with constant querying of the AVX-512 CPU
> flag from a non-AVX VM.
> 
> Bugzilla Id: 1501
> 
> Signed-off-by: Bruce Richardson 
> ---
>  lib/eal/x86/rte_cpuflags.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/eal/x86/rte_cpuflags.c b/lib/eal/x86/rte_cpuflags.c
> index 26163ab746..62e782fb4b 100644
> --- a/lib/eal/x86/rte_cpuflags.c
> +++ b/lib/eal/x86/rte_cpuflags.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "rte_cpuid.h"
> 
> @@ -21,12 +22,14 @@ struct feature_entry {
>   uint32_t bit;   /**< cpuid register bit */
>  #define CPU_FLAG_NAME_MAX_LEN 64
>   char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
> + bool has_value;
> + bool value;
>  };
> 
>  #define FEAT_DEF(name, leaf, subleaf, reg, bit) \
>   [RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },
> 
> -const struct feature_entry rte_cpu_feature_table[] = {
> +struct feature_entry rte_cpu_feature_table[] = {
>   FEAT_DEF(SSE3, 0x0001, 0, RTE_REG_ECX,  0)
>   FEAT_DEF(PCLMULQDQ, 0x0001, 0, RTE_REG_ECX,  1)
>   FEAT_DEF(DTES64, 0x0001, 0, RTE_REG_ECX,  2)
> @@ -147,7 +150,7 @@ const struct feature_entry rte_cpu_feature_table[] = {
>  int
>  rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
>  {
> - const struct feature_entry *feat;
> + struct feature_entry *feat;
>   cpuid_registers_t regs;
>   unsigned int maxleaf;
> 
> @@ -156,6 +159,8 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
>   return -ENOENT;
> 
>   feat = &rte_cpu_feature_table[feature];
> + if (feat->has_value)
> + return feat->value;
> 
>   if (!feat->leaf)
>   /* This entry in the table wasn't filled out! */
> @@ -163,8 +168,10 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
> 
>   maxleaf = __get_cpuid_max(feat->leaf & 0x8000, NULL);
> 
> - if (maxleaf < feat->leaf)
> - return 0;
> + if (maxleaf < feat->leaf) {
> + feat->value = 0;
> + goto out;
> + }
> 
>  #ifdef RTE_TOOLCHAIN_MSVC
>   __cpuidex(regs, feat->leaf, feat->subleaf);
> @@ -175,7 +182,10 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
>  #endif
> 
>   /* check if the feature is enabled */
> - return (regs[feat->reg] >> feat->bit) & 1;
> + feat->value = (regs[feat->reg] >> feat->bit) & 1;
> +out:
> + feat->has_value = true;
> + return feat->value;

If that function can be called by 2 (or more) threads simultaneously,
then in theory 'feat->has_value = true;' can be reordered with
'feat->value = (regs[feat->reg] >> feat->bit) & 1;' (by the CPU or the
compiler), and some thread(s) can read a wrong feat->value.
The probability of such a collision is really low, but it still seems not 
impossible. 

>  }
> 
>  const char *
> --
> 2.43.0



Re: [PATCH v2 2/2] test/pcapng: test chained mbufs

2024-10-11 Thread David Marchand
On Sat, Oct 5, 2024 at 6:14 PM Stephen Hemminger
 wrote:
> On Fri, 13 Sep 2024 16:06:00 +0300
> Oleksandr Nahnybida  wrote:
>
> > Adjust test to check if pcapng works with chained mbufs
> >
> > Signed-off-by: Oleksandr Nahnybida 
> Acked-by: Stephen Hemminger 

Series applied, thanks Oleksandr.


-- 
David Marchand



Re: [PATCH] app/dumpcap: fix handling of jumbo frames

2024-10-11 Thread David Marchand
On Fri, Oct 4, 2024 at 12:11 AM Stephen Hemminger
 wrote:
>
> If dumpcap (in legacy pcap mode) tried to handle a large segmented
> frame it would core dump because rte_pktmbuf_read() would return NULL.
> Fix by using same logic as in pcap PMD.
>
> Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application")
Cc: sta...@dpdk.org

> Reported-by: Tianli Lai 
> Signed-off-by: Stephen Hemminger 

Applied, thanks.


-- 
David Marchand
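The failure mode being fixed — rte_pktmbuf_read() returning NULL when asked for more data than fits the caller's linear buffer — can be sketched with a stand-in function (this is an illustration of the guard, not the actual driver code, which reuses the pcap PMD's copy logic):

```c
#include <stddef.h>
#include <string.h>

/* stand-in for rte_pktmbuf_read(): NULL when the requested span
 * does not fit in the supplied linear buffer */
static const void *frame_read(const unsigned char *pkt, size_t pkt_len,
                              unsigned char *buf, size_t buf_len)
{
	if (pkt_len > buf_len)
		return NULL;
	memcpy(buf, pkt, pkt_len);
	return buf;
}

/* capture path: truncate to the buffer size first, so the read
 * can never return NULL and crash the dumper */
static size_t capture_len(size_t pkt_len, size_t buf_len)
{
	return pkt_len < buf_len ? pkt_len : buf_len;
}
```

With a jumbo frame larger than the buffer, the unguarded read is exactly the NULL path that caused the core dump; capping the length first avoids it.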



Re: [PATCH] eal/x86: cache queried CPU flags

2024-10-11 Thread Bruce Richardson
On Fri, Oct 11, 2024 at 12:42:01PM +, Konstantin Ananyev wrote:
> 
> 
> > Rather than re-querying the HW each time a CPU flag is requested, we can
> > just save the return value in the flags array. This should speed up
> > repeated querying of CPU flags, and provides a workaround for a reported
> > issue where errors are seen with constant querying of the AVX-512 CPU
> > flag from a non-AVX VM.
> > 
> > Bugzilla Id: 1501
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  lib/eal/x86/rte_cpuflags.c | 20 +++-
> >  1 file changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/eal/x86/rte_cpuflags.c b/lib/eal/x86/rte_cpuflags.c
> > index 26163ab746..62e782fb4b 100644
> > --- a/lib/eal/x86/rte_cpuflags.c
> > +++ b/lib/eal/x86/rte_cpuflags.c
> > @@ -8,6 +8,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > 
> >  #include "rte_cpuid.h"
> > 
> > @@ -21,12 +22,14 @@ struct feature_entry {
> > uint32_t bit;   /**< cpuid register bit */
> >  #define CPU_FLAG_NAME_MAX_LEN 64
> > char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
> > +   bool has_value;
> > +   bool value;
> >  };
> > 
> >  #define FEAT_DEF(name, leaf, subleaf, reg, bit) \
> > [RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },
> > 
> > -const struct feature_entry rte_cpu_feature_table[] = {
> > +struct feature_entry rte_cpu_feature_table[] = {
> > FEAT_DEF(SSE3, 0x0001, 0, RTE_REG_ECX,  0)
> > FEAT_DEF(PCLMULQDQ, 0x0001, 0, RTE_REG_ECX,  1)
> > FEAT_DEF(DTES64, 0x0001, 0, RTE_REG_ECX,  2)
> > @@ -147,7 +150,7 @@ const struct feature_entry rte_cpu_feature_table[] = {
> >  int
> >  rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
> >  {
> > -   const struct feature_entry *feat;
> > +   struct feature_entry *feat;
> > cpuid_registers_t regs;
> > unsigned int maxleaf;
> > 
> > @@ -156,6 +159,8 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
> > return -ENOENT;
> > 
> > feat = &rte_cpu_feature_table[feature];
> > +   if (feat->has_value)
> > +   return feat->value;
> > 
> > if (!feat->leaf)
> > /* This entry in the table wasn't filled out! */
> > @@ -163,8 +168,10 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
> > 
> > maxleaf = __get_cpuid_max(feat->leaf & 0x8000, NULL);
> > 
> > -   if (maxleaf < feat->leaf)
> > -   return 0;
> > +   if (maxleaf < feat->leaf) {
> > +   feat->value = 0;
> > +   goto out;
> > +   }
> > 
> >  #ifdef RTE_TOOLCHAIN_MSVC
> > __cpuidex(regs, feat->leaf, feat->subleaf);
> > @@ -175,7 +182,10 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
> >  #endif
> > 
> > /* check if the feature is enabled */
> > -   return (regs[feat->reg] >> feat->bit) & 1;
> > +   feat->value = (regs[feat->reg] >> feat->bit) & 1;
> > +out:
> > +   feat->has_value = true;
> > +   return feat->value;
> 
> If that function can be called by 2 (or more) threads simultaneously,
> then in theory 'feat->has_value = true;' can be reordered with
> 'feat->value = (regs[feat->reg] >> feat->bit) & 1;' (by the CPU or compiler)
> and some thread(s) can get the wrong feat->value.
> The probability of such a collision is really low, but it still seems not
> impossible.
> 

Well, since this code is x86-specific, the externally visible store ordering
will match the instruction store ordering. Therefore, I think a compiler
barrier is all that is necessary before the feat->has_value assignment,
correct?

/Bruce
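
A minimal sketch of the ordering concern and the fix, using portable C11 release/acquire atomics in place of DPDK's own helpers (struct and function names here are illustrative, not the driver's actual ones; hw_bit stands in for the cpuid query result):

```c
#include <stdatomic.h>
#include <stdbool.h>

struct feature_entry {
	atomic_bool has_value; /* set last, with release semantics */
	bool value;
};

static int get_flag_cached(struct feature_entry *feat, bool hw_bit)
{
	/* acquire load pairs with the release store below, so a reader
	 * that sees has_value == true also sees the up-to-date value */
	if (atomic_load_explicit(&feat->has_value, memory_order_acquire))
		return feat->value;
	feat->value = hw_bit;
	/* release store: value is guaranteed visible before has_value */
	atomic_store_explicit(&feat->has_value, true, memory_order_release);
	return feat->value;
}
```

On x86 the release store compiles to a plain store plus a compiler barrier, matching the observation above that a compiler barrier before the has_value assignment suffices there.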


Re: [PATCH v9] mempool: test performance with larger bursts

2024-10-11 Thread David Marchand
On Tue, Sep 17, 2024 at 10:10 AM Morten Brørup  
wrote:
>
> Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> maximum tested get and put burst sizes from 32 to 256.
> For convenience, also test get and put burst sizes of
> RTE_MEMPOOL_CACHE_MAX_SIZE.
>
> Some applications keep more than 512 objects, so increase the maximum
> number of kept objects from 512 to 32768, still in jumps of factor four.
> This exceeds the typical mempool cache size of 512 objects, so the test
> also exercises the mempool driver.
>
> Reduced the duration of each iteration from 5 seconds to 1 second.
>
> Increased the precision of rate_persec calculation by timing the actual
> duration of the test, instead of assuming it took exactly 1 second.
>
> Added cache guard to per-lcore stats structure.
>
> Signed-off-by: Morten Brørup 
> Acked-by: Chengwen Feng 
> Acked-by: Bruce Richardson 

Applied, thanks Morten.


-- 
David Marchand
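
The precision improvement described above — dividing by the measured duration instead of an assumed one-second run — can be sketched as follows (function name illustrative, not the test's actual code):

```c
#include <time.h>

/* ops per second from a measured elapsed interval, rather than
 * assuming the test ran for exactly one second */
static double rate_per_sec(unsigned long long ops, clock_t start, clock_t end)
{
	double secs = (double)(end - start) / CLOCKS_PER_SEC;
	return (double)ops / secs;
}
```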



RE: [PATCH v2 2/4] ethdev: add get restore flags driver callback

2024-10-11 Thread Konstantin Ananyev



> Before this patch, ethdev layer assumed that all drivers require that
> it has to forcefully restore:
> 
> - MAC addresses
> - promiscuous mode setting
> - all multicast mode setting
> 
> upon rte_eth_dev_start().
> 
> This patch introduces a new callback to eth_dev_ops -
> get_restore_flags().
> Drivers implementing this callback can explicitly enable/disable
> certain parts of config restore procedure.
> 
> In order to minimize the changes to all the drivers and
> preserve the current behavior, it is assumed that
> if this callback is not defined, all configuration should be restored.
> 
> Signed-off-by: Dariusz Sosnowski 

LGTM, just few nits/suggestions, pls see below.
Series-Acked-by: Konstantin Ananyev 

> ---
>  lib/ethdev/ethdev_driver.c | 11 +++
>  lib/ethdev/ethdev_driver.h | 64 ++
>  lib/ethdev/version.map |  1 +
>  3 files changed, 76 insertions(+)
> 
> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> index c335a25a82..f78f9fb5c1 100644
> --- a/lib/ethdev/ethdev_driver.c
> +++ b/lib/ethdev/ethdev_driver.c
> @@ -958,3 +958,14 @@ rte_eth_switch_domain_free(uint16_t domain_id)
> 
>   return 0;
>  }
> +
> +void
> +rte_eth_get_restore_flags(struct rte_eth_dev *dev,
> +   enum rte_eth_dev_operation op,
> +   uint32_t *flags)

I would go with uint64_t for flags (to avoid limitations in the future).
Also, if it is void anyway, why not just return the flags?
i.e.:
uint64_t rte_eth_get_restore_flags(struct rte_eth_dev *dev, enum
rte_eth_dev_operation op);

Another thing - do we need to update docs?
PMD or PG sections, release notes?

> +{
> + if (dev->dev_ops->get_restore_flags != NULL)
> + dev->dev_ops->get_restore_flags(dev, op, flags);
> + else
> + *flags = RTE_ETH_RESTORE_ALL;
> +}
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index ae00ead865..8ac5328521 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1235,6 +1235,47 @@ typedef int (*eth_count_aggr_ports_t)(struct 
> rte_eth_dev *dev);
>  typedef int (*eth_map_aggr_tx_affinity_t)(struct rte_eth_dev *dev, uint16_t 
> tx_queue_id,
> uint8_t affinity);
> 
> +/**
> + * @internal
> + * Defines types of operations which can be executed by the application.
> + */
> +enum rte_eth_dev_operation {
> + RTE_ETH_START,
> +};
> +
> +/**@{@name Restore flags
> + * Flags returned by get_restore_flags() callback.
> + * They indicate to ethdev layer which configuration is required to be 
> restored.
> + */
> +/** If set, ethdev layer will forcefully restore default and any other added 
> MAC addresses. */
> +#define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT32(0)
> +/** If set, ethdev layer will forcefully restore current promiscuous mode 
> setting. */
> +#define RTE_ETH_RESTORE_PROMISC  RTE_BIT32(1)
> +/** If set, ethdev layer will forcefully restore current all multicast mode 
> setting. */
> +#define RTE_ETH_RESTORE_ALLMULTI RTE_BIT32(2)
> +/**@}*/
> +
> +/** All configuration which can be restored by ethdev layer. */
> +#define RTE_ETH_RESTORE_ALL (RTE_ETH_RESTORE_MAC_ADDR | \
> +  RTE_ETH_RESTORE_PROMISC | \
> +  RTE_ETH_RESTORE_ALLMULTI)
> +
> +/**
> + * @internal
> + * Fetch from the driver what kind of configuration must be restored by 
> ethdev layer,
> + * after certain operations are performed by the application (such as 
> rte_eth_dev_start()).
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param op
> + *   Type of operation executed by the application.
> + * @param flags
> + *   Flags indicating what configuration must be restored by ethdev layer.
> + */
> +typedef void (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
> + enum rte_eth_dev_operation op,
> + uint32_t *flags);
> +
>  /**
>   * @internal A structure containing the functions exported by an Ethernet 
> driver.
>   */
> @@ -1474,6 +1515,9 @@ struct eth_dev_ops {
>   eth_count_aggr_ports_t count_aggr_ports;
>   /** Map a Tx queue with an aggregated port of the DPDK port */
>   eth_map_aggr_tx_affinity_t map_aggr_tx_affinity;
> +
> + /** Get configuration which ethdev should restore */
> + eth_get_restore_flags_t get_restore_flags;
>  };
> 
>  /**
> @@ -2131,6 +2175,26 @@ struct rte_eth_fdir_conf {
>   struct rte_eth_fdir_flex_conf flex_conf;
>  };
> 
> +/**
> + * @internal
> + * Fetch from the driver what kind of configuration must be restored by 
> ethdev layer,
> + * using get_restore_flags() callback.
> + *
> + * If callback is not defined, it is assumed that all supported 
> configuration must be restored.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param op
> + *   Type of operation executed by the application.
> + * @param flags
> + *   Flags indicating what configuration must be
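
A driver opting out of part of the config restore via the new callback might look like this sketch (the flag values are copied from the quoted patch; the driver callback itself is hypothetical):

```c
#include <stdint.h>

/* flag values copied from the quoted patch */
#define RTE_ETH_RESTORE_MAC_ADDR (1u << 0)
#define RTE_ETH_RESTORE_PROMISC  (1u << 1)
#define RTE_ETH_RESTORE_ALLMULTI (1u << 2)

/* hypothetical driver callback: this device keeps MAC addresses
 * across restart, so only promisc/allmulti need restoring */
static void drv_get_restore_flags(void *dev, int op, uint32_t *flags)
{
	(void)dev;
	(void)op;
	*flags = RTE_ETH_RESTORE_PROMISC | RTE_ETH_RESTORE_ALLMULTI;
}
```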

Re: [PATCH v3] fib: network byte order IPv4 lookup

2024-10-11 Thread David Marchand
On Fri, Oct 11, 2024 at 1:29 PM David Marchand
 wrote:
> > @@ -214,6 +215,7 @@ rte_fib_create(const char *name, int socket_id, struct 
> > rte_fib_conf *conf)
> > rte_strlcpy(fib->name, name, sizeof(fib->name));
> > fib->rib = rib;
> > fib->type = conf->type;
> > +   fib->flags = conf->flags;
>
> In addition to Robin comments, I also have a concern on the
> extensibility aspect.
>
> conf->flags must be validated against known flags.
> Otherwise existing applications may pass wrong stuff and "work fine",
> until the day we had one more flag.
>

And about this flag field, please update release notes and remove the
associated deprecation notice.
Thank you.


-- 
David Marchand
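
The validation asked for above might look like this sketch (the flag name and mask are hypothetical placeholders, not the fib library's actual definitions):

```c
#include <errno.h>
#include <stdint.h>

/* hypothetical: the only flag known to this version of the library */
#define FIB_FLAG_NETWORK_ORDER  (1u << 0)
#define FIB_FLAGS_SUPPORTED     (FIB_FLAG_NETWORK_ORDER)

/* reject unknown bits now, so applications cannot pass garbage that
 * silently "works" until a future release defines more flags */
static int fib_check_flags(uint32_t flags)
{
	if (flags & ~FIB_FLAGS_SUPPORTED)
		return -EINVAL;
	return 0;
}
```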



Re: [PATCH v2] doc: announce single-event enqueue/dequeue ABI change

2024-10-11 Thread David Marchand
On Wed, Jul 5, 2023 at 1:18 PM Mattias Rönnblom
 wrote:
>
> Announce the removal of the single-event enqueue and dequeue
> operations from the eventdev ABI.
>
> Signed-off-by: Mattias Rönnblom 
>
> ---
> PATCH v2: Fix commit subject prefix.
> ---
>  doc/guides/rel_notes/deprecation.rst | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index 66431789b0..ca192d838d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -153,3 +153,11 @@ Deprecation Notices
>The new port library API (functions rte_swx_port_*)
>will gradually transition from experimental to stable status
>starting with DPDK 23.07 release.
> +
> +* eventdev: The single-event (non-burst) enqueue and dequeue
> +  operations, used by static inline burst enqueue and dequeue
> +  functions in , will be removed in DPDK 23.11. This
> +  simplification includes changing the layout and potentially also the
> +  size of the public rte_event_fp_ops struct, breaking the ABI. Since
> +  these functions are not called directly by the application, the API
> +  remains unaffected.

Looks like it was missed in 23.11; can/should we finish this cleanup in 24.11?


-- 
David Marchand



Re: [PATCH dpdk] buildtools/cmdline: fix meson error when used as a subproject

2024-10-11 Thread Thomas Monjalon
07/08/2024 15:43, Bruce Richardson:
> On Thu, Aug 01, 2024 at 12:01:54PM +0200, Robin Jarry wrote:
> > Fix the following error when using dpdk as a subproject:
> > 
> >  subprojects/dpdk/buildtools/subproject/meson.build:28:56:
> >  ERROR: Unknown function "file".
> > 
> > This was obviously never tested in its submitted form.
> > 
> > Cc: sta...@dpdk.org
> > Fixes: 7d8c608faa7f ("buildtools/cmdline: allow using script in subproject")
> > 
> > Signed-off-by: Robin Jarry 
> > ---
> 
> Good catch, and easy fix, thanks.
> 
> Interestingly, even without testing the build, vscode meson extension
> flags this issue in the "problems" tab of vscode: "Unknown function `file`"
> 
> Acked-by: Bruce Richardson 

Applied, thanks.




Re: [PATCH] doc: postpone deprecation of pipeline legacy API

2024-10-11 Thread David Marchand
On Wed, Jul 19, 2023 at 5:13 PM Cristian Dumitrescu
 wrote:
>
> Postpone the deprecation of the legacy pipeline, table and port
> library API and gradual stabilization of the new API.
>
> Signed-off-by: Cristian Dumitrescu 
> ---
>  doc/guides/rel_notes/deprecation.rst | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index fb771a0305..56ef906cdb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -145,19 +145,16 @@ Deprecation Notices
>In the absence of such interest, this library will be removed in DPDK 
> 23.11.
>
>  * pipeline: The pipeline library legacy API (functions rte_pipeline_*)
> -  will be deprecated in DPDK 23.07 release and removed in DPDK 23.11 release.
> -  The new pipeline library API (functions rte_swx_pipeline_*)
> -  will gradually transition from experimental to stable status
> -  starting with DPDK 23.07 release.
> +  will be deprecated and subsequently removed in DPDK 24.11 release.
> +  Before this, the new pipeline library API (functions rte_swx_pipeline_*)
> +  will gradually transition from experimental to stable status.
>
>  * table: The table library legacy API (functions rte_table_*)
> -  will be deprecated in DPDK 23.07 release and removed in DPDK 23.11 release.
> -  The new table library API (functions rte_swx_table_*)
> -  will gradually transition from experimental to stable status
> -  starting with DPDK 23.07 release.
> +  will be deprecated and subsequently removed in DPDK 24.11 release.
> +  Before this, the new table library API (functions rte_swx_table_*)
> +  will gradually transition from experimental to stable status.
>
>  * port: The port library legacy API (functions rte_port_*)
> -  will be deprecated in DPDK 23.07 release and removed in DPDK 23.11 release.
> -  The new port library API (functions rte_swx_port_*)
> -  will gradually transition from experimental to stable status
> -  starting with DPDK 23.07 release.
> +  will be deprecated and subsequently removed in DPDK 24.11 release.
> +  Before this, the new port library API (functions rte_swx_port_*)
> +  will gradually transition from experimental to stable status.

v24.11 dev is in progress.
Time to drop deprecated API in those libraries?


-- 
David Marchand



Re: [PATCH dpdk] graph: make graphviz export more readable

2024-10-11 Thread David Marchand
On Wed, Aug 28, 2024 at 3:42 PM Robin Jarry  wrote:
>
> Change the color of arrows leading to sink nodes to dark orange. Remove
> the node oval shape around the sink nodes and make their text dark
> orange. This results in a much more readable output for large graphs.
> See the link below for an example.
>
> Link: https://f.jarry.cc/rte-graph-dot/ipv6.svg

Easier to read, thanks.

> Signed-off-by: Robin Jarry 

Applied, thanks.


-- 
David Marchand



Re: [PATCH dpdk v7] graph: expose node context as pointers

2024-10-11 Thread David Marchand
On Mon, Sep 30, 2024 at 11:28 AM Robin Jarry  wrote:
>
> In some cases, the node context data is used to store two pointers
> because the data is larger than the reserved 16 bytes. Having to define
> intermediate structures just to be able to cast is tedious. And without
> intermediate structures, casting to opaque pointers is hard without
> violating strict aliasing rules.
>
> Add an unnamed union to allow storing opaque pointers in the node
> context. Unfortunately, aligning an unnamed union that contains an array
> produces inconsistent results between C and C++. To preserve ABI/API
> compatibility in both C and C++, move all fast-path area fields into an
> unnamed struct which is itself cache aligned. Use __rte_cache_aligned to
> preserve existing alignment on architectures where cache lines are 128
> bytes.
>
> Add a static assert to ensure that the fast path area does not grow
> beyond a 64 bytes cache line.
>
> Signed-off-by: Robin Jarry 
> Acked-by: Kiran Kumar Kokkilagadda 

Applied, thanks.


-- 
David Marchand
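
The unnamed-union pattern described in the commit message can be sketched as follows (field names here are illustrative only):

```c
#include <stdint.h>

/* 16-byte context area overlaid with two opaque pointers, so users
 * can store pointers without casting through intermediate structs */
struct node_ctx {
	union {
		uint8_t ctx[16];
		struct {
			void *ctx_ptr;
			void *ctx_ptr2;
		};
	};
};
```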



Re: [PATCH] eal: increase max file descriptor for secondary process device

2024-10-11 Thread David Marchand
On Thu, Oct 10, 2024 at 11:51 AM David Marchand
 wrote:
>
> On Thu, Sep 5, 2024 at 6:22 PM Stephen Hemminger
>  wrote:
> >
> > The TAP and XDP driver both are limited to only 8 queues when
> > because of the small limit imposed by EAL. Increase the limit
> > now since this release allows changing ABI.
> >
> > Signed-off-by: Stephen Hemminger 
>
> Applied, thanks.

I suspect this triggered some coverity hiccup:

*** CID 445386:  Memory - illegal accesses  (OVERRUN)
/drivers/net/tap/rte_eth_tap.c: 2401 in tap_mp_sync_queues()
2395                TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
2396                        dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
2397                return -1;
2398        }
2399
2400        for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
>>> CID 445386:  Memory - illegal accesses  (OVERRUN)
>>> Overrunning array "process_private->fds" of 16 4-byte elements at
>>> element index 252 (byte offset 1011) using index "queue" (which
>>> evaluates to 252).
2401                reply.fds[reply.num_fds++] = process_private->fds[queue];
2402                reply_param->q_count++;
2403        }
2404
2405        /* Send reply */
2406        strlcpy(reply.name, request->name, sizeof(reply.name));

Can you have a look?

Thanks.


-- 
David Marchand
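
A minimal sketch of the mismatch Coverity flags — the queue count is validated against the raised EAL limit, but the destination fds array is still only 16 entries, so the copy loop must be bounded by both (the constants here are illustrative):

```c
#include <stddef.h>

#define MP_MAX_FD_NUM 253 /* assumption: the raised EAL limit */
#define FDS_ARRAY_LEN 16  /* fds[] size Coverity reports in the driver */

/* copy queue fds, bounding the loop by BOTH limits; returns the
 * number copied, or 0 if the request cannot fit */
static size_t copy_queue_fds(int *dst, const int *src, size_t nb_queues)
{
	size_t limit = FDS_ARRAY_LEN < MP_MAX_FD_NUM ? FDS_ARRAY_LEN
						     : MP_MAX_FD_NUM;
	if (nb_queues > limit)
		return 0; /* caller should log an error and bail out */
	for (size_t q = 0; q < nb_queues; q++)
		dst[q] = src[q];
	return nb_queues;
}
```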



Re: [PATCH dpdk v2 03/16] net: add structure for ipv6 addresses

2024-10-11 Thread Stephen Hemminger
On Fri, 11 Oct 2024 14:37:35 +0200
Morten Brørup  wrote:

> > From: Robin Jarry [mailto:rja...@redhat.com]
> > Sent: Thursday, 10 October 2024 22.08
> > 
> > Morten Brørup, Oct 06, 2024 at 10:18:  
> > > This has been discussed before, but I want to double check...
> > >
> > > If - sometime in the future - we want to add a union to offer a 2-  
> > byte  
> > > access variant and make the structure to 2-byte aligned (which is the
> > > common case in Ethernet packets), it will break both API and ABI.  
> > This  
> > > seems unlikely to get accepted at a later time, so I think we are
> > > - right now - in a situation where it's now or never:
> > >
> > > struct rte_ipv6_addr {
> > >   __extension__
> > >   union {
> > >   unsigned char a[RTE_IPV6_ADDR_SIZE];
> > >   uint16_t  w[RTE_IPV6_ADDR_SIZE / 2];
> > >   };
> > > };
> > >
> > > Unless some of the CPU folks want the 2-byte aligned variant, stick
> > > with what you offered.  
> > 
> > I was also thinking the same. I could have added an unnamed union which
> > only contains one field.
> > 
> > However, it does not make much sense if we never want to change the
> > default alignment.
> > 
> > Important note: DPDK is compiled with the following C flags:
> > 
> >   -Wno-address-of-packed-member
> > 
> > Added in 2017 https://git.dpdk.org/dpdk/commit/?id=95cd37070af44
> > 
> > If we had struct rte_ipv6_addr aligned on 2 bytes (or more),
> > applications that do not silence that warning would have a hard time.
> > Example in grout:
> > 
> > ../modules/ip6/datapath/ip6_input.c:99:54: error: taking address of
> > packed member of ‘struct rte_ipv6_hdr’ may result in an unaligned
> > pointer value [-Werror=address-of-packed-member]
> >99 | nh = ip6_route_lookup(iface->vrf_id, &ip-  
> > >dst_addr);  
> >   |
> > ^  
> 
> I don't understand the choice of packing for these types...
> 
> Why are the IPv4 [ipv4h] and IPv6 header [ipv6h] types packed? Is it so they 
> can be stored on unaligned addresses?
> 
> E.g. the IPv4 header's natural alignment is 4 bytes (due to the rte_be32_t
> src_addr, dst_addr fields). The IPv4 header is most often 2-byte aligned,
> being located after an Ethernet header.
> 
> Maybe they should be 2-byte aligned, like the Ethernet address type [etha] 
> and the Ethernet header type [ethh].
> 
> The VLAN tag type [vlanh] is packed. Why?
> 
> I wonder if the compiler would spit out the same warnings if the above types 
> were __rte_aligned(2) instead of __rte_packed.

I am in the never-use-packed camp for network headers. The Linux, BSD, and glibc
headers avoid use of packed. Windows seems to love using it.


[DPDK/ethdev Bug 1563] Many I219 variants not supported by E1000 driver

2024-10-11 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1563

Bug ID: 1563
   Summary: Many I219 variants not supported by E1000 driver
   Product: DPDK
   Version: 24.11
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: step...@networkplumber.org
  Target Milestone: ---

There are many e1000 variants, supported by the Linux and FreeBSD kernel
drivers, which are not handled by the DPDK e1000 PMD. It looks like the base
code for e1000 has not been updated in years.

Specifically, the Tiger Lake and related chips are not supported.
This interface would be useful for testing on my machine.

$ lspci -n -v -s 00:1f.6
00:1f.6 0200: 8086:15fc (rev 20)
DeviceName: Onboard - Ethernet
Subsystem: 1458:1000
Flags: bus master, fast devsel, latency 0, IRQ 226, IOMMU group 16
Memory at 8870 (32-bit, non-prefetchable) [size=128K]
Capabilities: 
Kernel driver in use: e1000e
Kernel modules: e1000e

$ sudo ethtool -i eno2
driver: e1000e
version: 6.10.11-amd64
firmware-version: 0.6-4
expansion-rom-version: 
bus-info: :00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

-- 
You are receiving this mail because:
You are the assignee for the bug.

RE: [EXTERNAL] Re: [PATCH v4 3/5] graph: add stats for node specific errors

2024-10-11 Thread Pavan Nikhilesh Bhagavatula



> -Original Message-
> From: Robin Jarry 
> Sent: Friday, October 11, 2024 3:24 PM
> To: Pavan Nikhilesh Bhagavatula ; Jerin Jacob
> ; Nithin Kumar Dabilpuram
> ; Kiran Kumar Kokkilagadda
> ; zhirun@intel.com; Zhirun Yan
> 
> Cc: dev@dpdk.org
> Subject: [EXTERNAL] Re: [PATCH v4 3/5] graph: add stats for node specific
> errors
> 
> 
> Hi Pavan,
> 
> , Aug 16, 2024 at 17:09:
> > From: Pavan Nikhilesh 
> >
> > Add support for retrieving/printing stats for node specific
> > errors using rte_graph_cluster_stats_get().
> >
> > Signed-off-by: Pavan Nikhilesh 
> > ---
> 
> [snip]
> 
> > diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
> > index b28143d737..12b6461cf5 100644
> > --- a/lib/graph/rte_graph.h
> > +++ b/lib/graph/rte_graph.h
> > @@ -223,6 +223,10 @@ struct __rte_cache_aligned
> rte_graph_cluster_node_stats {
> >
> > uint64_t realloc_count; /**< Realloc count. */
> >
> > +   uint8_t node_error_cntrs;  /**< Number of
> Node error counters. */
> > +   char (*node_error_desc)[RTE_NODE_ERROR_DESC_SIZE]; /**< Names
> of the Node error counters. */
> 
> Why do you need the parentheses here?
> 
> > +   uint64_t *node_error_count;/**< Total error
> count per each error. */
> 
> The node_ prefix is redundant here. Can you use something shorter?
> 
>uint8_t errors_num; /**< Number of Node error counters. */
>char (*errors_desc)[RTE_NODE_ERROR_DESC_SIZE];
>uint64_t *errors_value;
> 

I will shrink them in the next version.

> Thanks!



[PATCH 0/9] net/ice: base code update for RC2

2024-10-11 Thread Bruce Richardson
A number of small fixes and other changes to enable Tx scheduler
enhancements to our DPDK driver have been added to the base code.
Upstream these changes for 24.11 RC2. Most have previously been
submitted as part of the scheduler changes [1]

[1] https://patches.dpdk.org/project/dpdk/list/?series=32758&state=*

Bruce Richardson (6):
  net/ice/base: remove 255 limit on sched child nodes
  net/ice/base: set VSI index on newly created nodes
  net/ice/base: optimize subtree searches
  net/ice/base: remove flag checks before topology upload
  net/ice/base: allow init without TC class sched nodes
  net/ice/base: read VSI layer info from VSI

Dave Ertman (1):
  net/ice/base: fix VLAN replay after reset

Fabio Pricoco (1):
  net/ice/base: add bounds check

Jacob Keller (1):
  net/ice/base: re-enable bypass mode for E822

 drivers/net/ice/base/ice_controlq.c |  23 +-
 drivers/net/ice/base/ice_dcb.c  |   3 +-
 drivers/net/ice/base/ice_ddp.c  |  33 
 drivers/net/ice/base/ice_ptp_hw.c   | 117 ++--
 drivers/net/ice/base/ice_ptp_hw.h   |   2 +-
 drivers/net/ice/base/ice_sched.c|  59 +++---
 drivers/net/ice/base/ice_switch.c   |   2 -
 drivers/net/ice/base/ice_type.h |   3 +-
 drivers/net/ice/ice_ethdev.c|   2 +-
 9 files changed, 168 insertions(+), 76 deletions(-)

--
2.43.0



[PATCH 4/9] net/ice/base: remove 255 limit on sched child nodes

2024-10-11 Thread Bruce Richardson
The Tx scheduler in the ice driver can be configured to have large
numbers of child nodes at a given layer, but the driver code implicitly
limited the number of nodes to 255 by using a u8 datatype for the number
of children. Increase this to a 16-bit value throughout the code.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_dcb.c   |  3 ++-
 drivers/net/ice/base/ice_sched.c | 23 +--
 drivers/net/ice/base/ice_type.h  |  2 +-
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ice/base/ice_dcb.c b/drivers/net/ice/base/ice_dcb.c
index 4ef54613b1..e97f35b4cf 100644
--- a/drivers/net/ice/base/ice_dcb.c
+++ b/drivers/net/ice/base/ice_dcb.c
@@ -1585,7 +1585,8 @@ ice_update_port_tc_tree_cfg(struct ice_port_info *pi,
struct ice_aqc_txsched_elem_data elem;
u32 teid1, teid2;
int status = 0;
-   u8 i, j;
+   u16 i;
+   u8 j;
 
if (!pi)
return ICE_ERR_PARAM;
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 373c32a518..1d6dd2af82 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -288,7 +288,7 @@ ice_sched_get_first_node(struct ice_port_info *pi,
  */
 struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 {
-   u8 i;
+   u16 i;
 
if (!pi || !pi->root)
return NULL;
@@ -311,7 +311,7 @@ void ice_free_sched_node(struct ice_port_info *pi, struct 
ice_sched_node *node)
 {
struct ice_sched_node *parent;
struct ice_hw *hw = pi->hw;
-   u8 i, j;
+   u16 i, j;
 
/* Free the children before freeing up the parent node
 * The parent array is updated below and that shifts the nodes
@@ -1503,7 +1503,7 @@ ice_sched_get_free_qgrp(struct ice_port_info *pi,
struct ice_sched_node *qgrp_node, u8 owner)
 {
struct ice_sched_node *min_qgrp;
-   u8 min_children;
+   u16 min_children;
 
if (!qgrp_node)
return qgrp_node;
@@ -2063,7 +2063,7 @@ static void ice_sched_rm_agg_vsi_info(struct 
ice_port_info *pi, u16 vsi_handle)
  */
 static bool ice_sched_is_leaf_node_present(struct ice_sched_node *node)
 {
-   u8 i;
+   u16 i;
 
for (i = 0; i < node->num_children; i++)
if (ice_sched_is_leaf_node_present(node->children[i]))
@@ -2098,7 +2098,7 @@ ice_sched_rm_vsi_cfg(struct ice_port_info *pi, u16 
vsi_handle, u8 owner)
 
ice_for_each_traffic_class(i) {
struct ice_sched_node *vsi_node, *tc_node;
-   u8 j = 0;
+   u16 j = 0;
 
tc_node = ice_sched_get_tc_node(pi, i);
if (!tc_node)
@@ -2166,7 +2166,7 @@ int ice_rm_vsi_lan_cfg(struct ice_port_info *pi, u16 
vsi_handle)
  */
 bool ice_sched_is_tree_balanced(struct ice_hw *hw, struct ice_sched_node *node)
 {
-   u8 i;
+   u16 i;
 
/* start from the leaf node */
for (i = 0; i < node->num_children; i++)
@@ -2240,7 +2240,8 @@ ice_sched_get_free_vsi_parent(struct ice_hw *hw, struct 
ice_sched_node *node,
  u16 *num_nodes)
 {
u8 l = node->tx_sched_layer;
-   u8 vsil, i;
+   u8 vsil;
+   u16 i;
 
vsil = ice_sched_get_vsi_layer(hw);
 
@@ -2282,7 +2283,7 @@ ice_sched_update_parent(struct ice_sched_node *new_parent,
struct ice_sched_node *node)
 {
struct ice_sched_node *old_parent;
-   u8 i, j;
+   u16 i, j;
 
old_parent = node->parent;
 
@@ -2382,8 +2383,9 @@ ice_sched_move_vsi_to_agg(struct ice_port_info *pi, u16 
vsi_handle, u32 agg_id,
u16 num_nodes[ICE_AQC_TOPO_MAX_LEVEL_NUM] = { 0 };
u32 first_node_teid, vsi_teid;
u16 num_nodes_added;
-   u8 aggl, vsil, i;
+   u8 aggl, vsil;
int status;
+   u16 i;
 
tc_node = ice_sched_get_tc_node(pi, tc);
if (!tc_node)
@@ -2498,7 +2500,8 @@ ice_move_all_vsi_to_dflt_agg(struct ice_port_info *pi,
 static bool
 ice_sched_is_agg_inuse(struct ice_port_info *pi, struct ice_sched_node *node)
 {
-   u8 vsil, i;
+   u8 vsil;
+   u16 i;
 
vsil = ice_sched_get_vsi_layer(pi->hw);
if (node->tx_sched_layer < vsil - 1) {
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index 598a80155b..6177bf4e2a 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1030,9 +1030,9 @@ struct ice_sched_node {
struct ice_aqc_txsched_elem_data info;
u32 agg_id; /* aggregator group ID */
u16 vsi_handle;
+   u16 num_children;
u8 in_use;  /* suspended or in use */
u8 tx_sched_layer;  /* Logical Layer (1-9) */
-   u8 num_children;
u8 tc_num;
u8 owner;
 #define ICE_SCHED_NODE_OWNER_LAN   0
-- 
2.43.0
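
The overflow this widening avoids is easy to demonstrate — a u8 child counter wraps at 256 (a standalone sketch, not driver code):

```c
#include <stdint.h>

/* count 300 children with both counter widths */
static unsigned int count_children(int width16)
{
	uint8_t n8 = 0;
	uint16_t n16 = 0;
	for (int i = 0; i < 300; i++) {
		n8++;  /* wraps modulo 256 */
		n16++; /* holds the real count */
	}
	return width16 ? n16 : n8;
}
```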



[PATCH 1/9] net/ice/base: re-enable bypass mode for E822

2024-10-11 Thread Bruce Richardson
From: Jacob Keller 

When removing bypass mode, the code for E822 bypass was completely
removed in error. This code should be maintained in DPDK so re-add the
necessary functions.

Fixes: ce9ad8c5bc6d ("net/ice/base: remove PHY port timer bypass mode")
Cc: sta...@dpdk.org

Signed-off-by: Jacob Keller 
Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_ptp_hw.c | 117 --
 drivers/net/ice/base/ice_ptp_hw.h |   2 +-
 drivers/net/ice/ice_ethdev.c  |   2 +-
 3 files changed, 113 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ice/base/ice_ptp_hw.c 
b/drivers/net/ice/base/ice_ptp_hw.c
index 2a112fea12..1e92e5ff21 100644
--- a/drivers/net/ice/base/ice_ptp_hw.c
+++ b/drivers/net/ice/base/ice_ptp_hw.c
@@ -4468,18 +4468,103 @@ ice_stop_phy_timer_e822(struct ice_hw *hw, u8 port, 
bool soft_reset)
return 0;
 }
 
+/**
+ * ice_phy_cfg_fixed_tx_offset_e822 - Configure Tx offset for bypass mode
+ * @hw: pointer to the HW struct
+ * @port: the PHY port to configure
+ *
+ * Calculate and program the fixed Tx offset, and indicate that the offset is
+ * ready. This can be used when operating in bypass mode.
+ */
+static int ice_phy_cfg_fixed_tx_offset_e822(struct ice_hw *hw, u8 port)
+{
+   enum ice_ptp_link_spd link_spd;
+   enum ice_ptp_fec_mode fec_mode;
+   u64 total_offset;
+   int err;
+
+   err = ice_phy_get_speed_and_fec_e822(hw, port, &link_spd, &fec_mode);
+   if (err)
+   return err;
+
+   total_offset = ice_calc_fixed_tx_offset_e822(hw, link_spd);
+
+   /* Program the fixed Tx offset into the P_REG_TOTAL_TX_OFFSET_L
+* register, then indicate that the Tx offset is ready. After this,
+* timestamps will be enabled.
+*
+* Note that this skips including the more precise offsets generated
+* by the Vernier calibration.
+*/
+
+   err = ice_write_64b_phy_reg_e822(hw, port, P_REG_TOTAL_TX_OFFSET_L,
+total_offset);
+   if (err)
+   return err;
+
+   err = ice_write_phy_reg_e822(hw, port, P_REG_TX_OR, 1);
+   if (err)
+   return err;
+
+   return ICE_SUCCESS;
+}
+
+/**
+ * ice_phy_cfg_fixed_rx_offset_e822 - Configure fixed Rx offset for bypass mode
+ * @hw: pointer to the HW struct
+ * @port: the PHY port to configure
+ *
+ * Calculate and program the fixed Rx offset, and indicate that the offset is
+ * ready. This can be used when operating in bypass mode.
+ */
+static int ice_phy_cfg_fixed_rx_offset_e822(struct ice_hw *hw, u8 port)
+{
+   enum ice_ptp_link_spd link_spd;
+   enum ice_ptp_fec_mode fec_mode;
+   u64 total_offset;
+   int err;
+
+   err = ice_phy_get_speed_and_fec_e822(hw, port, &link_spd, &fec_mode);
+   if (err)
+   return err;
+
+   total_offset = ice_calc_fixed_rx_offset_e822(hw, link_spd);
+
+   /* Program the fixed Rx offset into the P_REG_TOTAL_RX_OFFSET_L
+* register, then indicate that the Rx offset is ready. After this,
+* timestamps will be enabled.
+*
+* Note that this skips including the more precise offsets generated
+* by Vernier calibration.
+*/
+   err = ice_write_64b_phy_reg_e822(hw, port, P_REG_TOTAL_RX_OFFSET_L,
+total_offset);
+   if (err)
+   return err;
+
+   err = ice_write_phy_reg_e822(hw, port, P_REG_RX_OR, 1);
+   if (err)
+   return err;
+
+   return ICE_SUCCESS;
+}
+
 /**
  * ice_start_phy_timer_e822 - Start the PHY clock timer
  * @hw: pointer to the HW struct
  * @port: the PHY port to start
+ * @bypass: if true, start the PHY in bypass mode
  *
  * Start the clock of a PHY port. This must be done as part of the flow to
  * re-calibrate Tx and Rx timestamping offsets whenever the clock time is
  * initialized or when link speed changes.
  *
- * Hardware will take Vernier measurements on Tx or Rx of packets.
+ * Bypass mode enables timestamps immediately without waiting for Vernier
+ * calibration to complete. Hardware will still continue taking Vernier
+ * measurements on Tx or Rx of packets, but they will not be applied to
+ * timestamps.
  */
-int ice_start_phy_timer_e822(struct ice_hw *hw, u8 port)
+int ice_start_phy_timer_e822(struct ice_hw *hw, u8 port, bool bypass)
 {
u32 lo, hi, val;
u64 incval;
@@ -4544,15 +4629,35 @@ int ice_start_phy_timer_e822(struct ice_hw *hw, u8 port)
 
ice_ptp_exec_tmr_cmd(hw);
 
+   if (bypass) {
+   /* Enter BYPASS mode, enabling timestamps immediately. */
+   val |= P_REG_PS_BYPASS_MODE_M;
+   err = ice_write_phy_reg_e822(hw, port, P_REG_PS, val);
+   if (err)
+   return err;
+   }
+
val |= P_REG_PS_ENA_CLK_M;
err = ice_write_phy_reg_e822(hw, port, P_REG_PS, val);
if (err)
return err;
 
-   val |= P_REG_PS

[PATCH 2/9] net/ice/base: add bounds check

2024-10-11 Thread Bruce Richardson
From: Fabio Pricoco 

Refactor the while loop to add a check that the head values read from
hardware are in the correct range.

Fixes: 6c1f26be50a2 ("net/ice/base: add control queue information")
Cc: sta...@dpdk.org

Signed-off-by: Fabio Pricoco 
Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_controlq.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ice/base/ice_controlq.c 
b/drivers/net/ice/base/ice_controlq.c
index af27dc8542..b210495827 100644
--- a/drivers/net/ice/base/ice_controlq.c
+++ b/drivers/net/ice/base/ice_controlq.c
@@ -839,16 +839,35 @@ static u16 ice_clean_sq(struct ice_hw *hw, struct 
ice_ctl_q_info *cq)
struct ice_ctl_q_ring *sq = &cq->sq;
u16 ntc = sq->next_to_clean;
struct ice_aq_desc *desc;
+   u32 head;
 
desc = ICE_CTL_Q_DESC(*sq, ntc);
 
-   while (rd32(hw, cq->sq.head) != ntc) {
-   ice_debug(hw, ICE_DBG_AQ_MSG, "ntc %d head %d.\n", ntc, 
rd32(hw, cq->sq.head));
+   head = rd32(hw, sq->head);
+   if (head >= sq->count) {
+   ice_debug(hw, ICE_DBG_AQ_MSG,
+ "Read head value (%d) exceeds allowed range.\n",
+ head);
+   return 0;
+   }
+
+   while (head != ntc) {
+   ice_debug(hw, ICE_DBG_AQ_MSG,
+ "ntc %d head %d.\n",
+ ntc, head);
ice_memset(desc, 0, sizeof(*desc), ICE_DMA_MEM);
ntc++;
if (ntc == sq->count)
ntc = 0;
desc = ICE_CTL_Q_DESC(*sq, ntc);
+
+   head = rd32(hw, sq->head);
+   if (head >= sq->count) {
+   ice_debug(hw, ICE_DBG_AQ_MSG,
+ "Read head value (%d) exceeds allowed 
range.\n",
+ head);
+   return 0;
+   }
}
 
sq->next_to_clean = ntc;
-- 
2.43.0
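The pattern applied by this patch, validating an index read from hardware before using it as a loop bound, can be sketched in isolation. The structure and function names below are illustrative stand-ins, not the actual ice driver types:

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative stand-in for the control queue ring state; these are
 * not the actual ice driver structures. */
struct ring {
	uint16_t count;         /* number of descriptors in the ring */
	uint16_t next_to_clean; /* software consumer index */
};

/* A head index read from hardware is usable as a loop bound only if
 * it lies inside [0, count). */
static bool head_in_range(const struct ring *r, uint32_t head)
{
	return head < r->count;
}

/* Advance the consumer index toward a validated head, wrapping at the
 * end of the ring; return how many entries were cleaned. */
static uint16_t clean_to_head(struct ring *r, uint32_t head)
{
	uint16_t cleaned = 0;

	if (!head_in_range(r, head))
		return 0; /* bail out on a bogus head, as the patch does */

	while (r->next_to_clean != head) {
		r->next_to_clean = (uint16_t)((r->next_to_clean + 1) % r->count);
		cleaned++;
	}
	return cleaned;
}
```

The early return on an out-of-range head mirrors the patch's behaviour: a bogus value (for example all-ones after a device error) no longer drives the cleaning loop out of the ring.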



[PATCH 3/9] net/ice/base: fix VLAN replay after reset

2024-10-11 Thread Bruce Richardson
From: Dave Ertman 

If there is more than one VLAN defined when any reset that affects the
PF is initiated, after the reset rebuild, no traffic will pass on any
VLAN but the last one created.

This is caused by the iteration through the VLANs during replay, each
clearing the vsi_map bitmap of the VSI that is being replayed.  The
problem is that during the replay, the pointer to the vsi_map bitmap is
used by each successive vlan to determine if it should be replayed on
this VSI.

The logic was that the replay of the VLAN would replace the bit in the
map before the next VLAN would iterate through.  But, since the replay
copies the old bitmap pointer to filt_replay_rules and creates a new one
for the recreated VLANs, it does not do this, and leaves the old bitmap
broken to be used to replay the remaining VLANs.

Since the old bitmap will be cleaned up in post replay cleanup, there is
no need to alter it and break following VLAN replay, so don't clear the
bit.

Fixes: c7dd15931183 ("net/ice/base: add virtual switch code")
Cc: sta...@dpdk.org

Signed-off-by: Dave Ertman 
Signed-off-by: Jacob Keller 
Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_switch.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ice/base/ice_switch.c 
b/drivers/net/ice/base/ice_switch.c
index 96ef26d535..a3786961e6 100644
--- a/drivers/net/ice/base/ice_switch.c
+++ b/drivers/net/ice/base/ice_switch.c
@@ -10110,8 +10110,6 @@ ice_replay_vsi_fltr(struct ice_hw *hw, struct 
ice_port_info *pi,
if (!itr->vsi_list_info ||
!ice_is_bit_set(itr->vsi_list_info->vsi_map, vsi_handle))
continue;
-   /* Clearing it so that the logic can add it back */
-   ice_clear_bit(vsi_handle, itr->vsi_list_info->vsi_map);
f_entry.fltr_info.vsi_handle = vsi_handle;
f_entry.fltr_info.fltr_act = ICE_FWD_TO_VSI;
/* update the src in case it is VSI num */
--
2.43.0



[PATCH 5/9] net/ice/base: set VSI index on newly created nodes

2024-10-11 Thread Bruce Richardson
The ice_sched_node type has a field for the VSI to which the node
belongs. This field was not being set in "ice_sched_add_node", so add
a line configuring this field for each node from its parent node.
Similarly, when searching for a qgroup node, we can check for each node
that the VSI information is correct.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 1d6dd2af82..45934f9152 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -200,6 +200,7 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer,
node->in_use = true;
node->parent = parent;
node->tx_sched_layer = layer;
+   node->vsi_handle = parent->vsi_handle;
parent->children[parent->num_children++] = node;
node->info = elem;
return 0;
@@ -1575,7 +1576,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 
vsi_handle, u8 tc,
/* make sure the qgroup node is part of the VSI subtree */
if (ice_sched_find_node_in_subtree(pi->hw, vsi_node, qgrp_node))
if (qgrp_node->num_children < max_children &&
-   qgrp_node->owner == owner)
+   qgrp_node->owner == owner && qgrp_node->vsi_handle 
== vsi_handle)
break;
qgrp_node = qgrp_node->sibling;
}
-- 
2.43.0



[PATCH 6/9] net/ice/base: optimize subtree searches

2024-10-11 Thread Bruce Richardson
In a number of places throughout the driver code, we want to confirm
that a scheduler node is indeed a child of another node. Currently, this
is confirmed by searching down the tree from the base until the desired
node is hit, a search which may hit many irrelevant tree nodes when
recursing down wrong branches. By switching the direction of the
search, checking upwards from the node to its parent, we can avoid any
incorrect paths and so speed up processing.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_sched.c | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 45934f9152..4c5c19daf3 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1464,25 +1464,16 @@ void ice_sched_get_psm_clk_freq(struct ice_hw *hw)
  * subtree or not
  */
 bool
-ice_sched_find_node_in_subtree(struct ice_hw *hw, struct ice_sched_node *base,
+ice_sched_find_node_in_subtree(struct ice_hw __ALWAYS_UNUSED *hw,
+  struct ice_sched_node *base,
   struct ice_sched_node *node)
 {
-   u8 i;
-
-   for (i = 0; i < base->num_children; i++) {
-   struct ice_sched_node *child = base->children[i];
-
-   if (node == child)
-   return true;
-
-   if (child->tx_sched_layer > node->tx_sched_layer)
-   return false;
-
-   /* this recursion is intentional, and wouldn't
-* go more than 8 calls
-*/
-   if (ice_sched_find_node_in_subtree(hw, child, node))
+   if (base == node)
+   return true;
+   while (node->tx_sched_layer != 0 && node->parent != NULL) {
+   if (node->parent == base)
return true;
+   node = node->parent;
}
return false;
 }
-- 
2.43.0
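The direction switch in this patch can be sketched independently: instead of recursing down from the base through every child, walk up the parent pointers from the candidate node. The minimal node shape below is an illustration, not the full ice_sched_node:

```c
#include <stddef.h>
#include <stdbool.h>

/* Minimal node shape for illustration; only the fields the search
 * needs, not the full ice_sched_node. */
struct node {
	struct node *parent;
	int tx_sched_layer; /* 0 is the root layer */
};

/* 'node' is in the subtree rooted at 'base' exactly when 'base'
 * appears on the path from 'node' up to the root, so walk the parent
 * pointers instead of recursing down through the children. */
static bool in_subtree(const struct node *base, const struct node *node)
{
	if (base == node)
		return true;
	while (node->tx_sched_layer != 0 && node->parent != NULL) {
		if (node->parent == base)
			return true;
		node = node->parent;
	}
	return false;
}
```

The upward walk visits at most one node per tree layer, so its cost is bounded by the tree depth rather than by the branching at each layer.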



[PATCH 7/9] net/ice/base: remove flag checks before topology upload

2024-10-11 Thread Bruce Richardson
DPDK should support more than just 9-level or 5-level topologies, so
remove the checks for those particular settings.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_ddp.c | 33 -
 1 file changed, 33 deletions(-)

diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index 90aaa6b331..c17a58eab8 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -2384,38 +2384,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
return status;
}
 
-   /* Is default topology already applied ? */
-   if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-   hw->num_tx_sched_layers == 9) {
-   ice_debug(hw, ICE_DBG_INIT, "Loaded default topology\n");
-   /* Already default topology is loaded */
-   return ICE_ERR_ALREADY_EXISTS;
-   }
-
-   /* Is new topology already applied ? */
-   if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-   hw->num_tx_sched_layers == 5) {
-   ice_debug(hw, ICE_DBG_INIT, "Loaded new topology\n");
-   /* Already new topology is loaded */
-   return ICE_ERR_ALREADY_EXISTS;
-   }
-
-   /* Is set topology issued already ? */
-   if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
-   ice_debug(hw, ICE_DBG_INIT, "Update tx topology was done by 
another PF\n");
-   /* add a small delay before exiting */
-   for (i = 0; i < 20; i++)
-   ice_msec_delay(100, true);
-   return ICE_ERR_ALREADY_EXISTS;
-   }
-
-   /* Change the topology from new to default (5 to 9) */
-   if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-   hw->num_tx_sched_layers == 5) {
-   ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 
layers\n");
-   goto update_topo;
-   }
-
pkg_hdr = (struct ice_pkg_hdr *)buf;
state = ice_verify_pkg(pkg_hdr, len);
if (state) {
@@ -2462,7 +2430,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
/* Get the new topology buffer */
new_topo = ((u8 *)section) + offset;
 
-update_topo:
/* acquire global lock to make sure that set topology issued
 * by one PF
 */
-- 
2.43.0



[PATCH 8/9] net/ice/base: allow init without TC class sched nodes

2024-10-11 Thread Bruce Richardson
If DCB support is disabled via DDP image, there will not be any traffic
class (TC) nodes in the scheduler tree immediately above the root level.
To allow the driver to work with this scenario, we allow use of the root
node as a dummy TC0 node in cases where there are no TC nodes in the
tree. For any TC other than 0 (used by default in the driver), the
existing behaviour of returning a NULL pointer is maintained.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_sched.c | 8 +++-
 drivers/net/ice/base/ice_type.h  | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 4c5c19daf3..7e255c0337 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -293,6 +293,10 @@ struct ice_sched_node *ice_sched_get_tc_node(struct 
ice_port_info *pi, u8 tc)
 
if (!pi || !pi->root)
return NULL;
+   /* if no TC nodes, use root as TC node 0 */
+   if (!pi->has_tc)
+   return tc == 0 ? pi->root : NULL;
+
for (i = 0; i < pi->root->num_children; i++)
if (pi->root->children[i]->tc_num == tc)
return pi->root->children[i];
@@ -1306,7 +1310,9 @@ int ice_sched_init_port(struct ice_port_info *pi)
if (buf[0].generic[j].data.elem_type ==
ICE_AQC_ELEM_TYPE_ENTRY_POINT)
hw->sw_entry_point_layer = j;
-
+   else if (buf[0].generic[j].data.elem_type ==
+   ICE_AQC_ELEM_TYPE_TC)
+   pi->has_tc = 1;
status = ice_sched_add_node(pi, j, &buf[i].generic[j], 
NULL);
if (status)
goto err_init_port;
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index 6177bf4e2a..35f832eb9f 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1260,6 +1260,7 @@ struct ice_port_info {
struct ice_qos_cfg qos_cfg;
u8 is_vf:1;
u8 is_custom_tx_enabled:1;
+   u8 has_tc:1;
 };
 
 struct ice_switch_info {
-- 
2.43.0



[PATCH 9/9] net/ice/base: read VSI layer info from VSI

2024-10-11 Thread Bruce Richardson
Rather than computing the layer of the VSI from the number of HW
layers, we can instead just read that information from the VSI node
itself. This allows the layer to be changed at runtime.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 7e255c0337..9608ac7c24 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1550,7 +1550,6 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 
vsi_handle, u8 tc,
u16 max_children;
 
qgrp_layer = ice_sched_get_qgrp_layer(pi->hw);
-   vsi_layer = ice_sched_get_vsi_layer(pi->hw);
max_children = pi->hw->max_children[qgrp_layer];
 
vsi_ctx = ice_get_vsi_ctx(pi->hw, vsi_handle);
@@ -1560,6 +1559,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 
vsi_handle, u8 tc,
/* validate invalid VSI ID */
if (!vsi_node)
return NULL;
+   vsi_layer = vsi_node->tx_sched_layer;
 
/* If the queue group and vsi layer are same then queues
 * are all attached directly to VSI
-- 
2.43.0



RE: [EXTERNAL] Re: [PATCH v4 1/5] graph: add support for node specific errors

2024-10-11 Thread Pavan Nikhilesh Bhagavatula
> On Fri, Aug 16, 2024 at 5:10 PM  wrote:
> >
> > From: Pavan Nikhilesh 
> >
> > Add ability for Nodes to advertise error counters
> > during registration.
> >
> > Signed-off-by: Pavan Nikhilesh 
> 
> Such a series deserve a cover letter.
> 
> It also deserves a RN update, and I see no removal of the associated
> deprecation notice.
> 

I will update the series with a cover letter, release notes, and
deprecation notice cleanup in the next version.

Thanks, 
Pavan.

> 
> --
> David Marchand



[PATCH v3 2/4] ethdev: add get restore flags driver callback

2024-10-11 Thread Dariusz Sosnowski
Before this patch, the ethdev layer assumed that all drivers require
it to forcefully restore:

- MAC addresses
- promiscuous mode setting
- all multicast mode setting

upon rte_eth_dev_start().

This patch introduces a new callback to eth_dev_ops -
get_restore_flags().
Drivers implementing this callback can explicitly enable/disable
certain parts of config restore procedure.

In order to minimize the changes to all the drivers and
preserve the current behavior, it is assumed that
if this callback is not defined, all configuration should be restored.

Signed-off-by: Dariusz Sosnowski 
---
 lib/ethdev/ethdev_driver.c |  9 ++
 lib/ethdev/ethdev_driver.h | 66 ++
 lib/ethdev/version.map |  1 +
 3 files changed, 76 insertions(+)

diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index c335a25a82..9afef06431 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -958,3 +958,12 @@ rte_eth_switch_domain_free(uint16_t domain_id)
 
return 0;
 }
+
+uint64_t
+rte_eth_get_restore_flags(struct rte_eth_dev *dev, enum rte_eth_dev_operation 
op)
+{
+   if (dev->dev_ops->get_restore_flags != NULL)
+   return dev->dev_ops->get_restore_flags(dev, op);
+   else
+   return RTE_ETH_RESTORE_ALL;
+}
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index ae00ead865..3974512a5c 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1235,6 +1235,48 @@ typedef int (*eth_count_aggr_ports_t)(struct rte_eth_dev 
*dev);
 typedef int (*eth_map_aggr_tx_affinity_t)(struct rte_eth_dev *dev, uint16_t 
tx_queue_id,
  uint8_t affinity);
 
+/**
+ * @internal
+ * Defines types of operations which can be executed by the application.
+ */
+enum rte_eth_dev_operation {
+   RTE_ETH_START,
+};
+
+/**@{@name Restore flags
+ * Flags returned by get_restore_flags() callback.
+ * They indicate to ethdev layer which configuration is required to be 
restored.
+ */
+/** If set, ethdev layer will forcefully restore default and any other added 
MAC addresses. */
+#define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT64(0)
+/** If set, ethdev layer will forcefully restore current promiscuous mode 
setting. */
+#define RTE_ETH_RESTORE_PROMISC  RTE_BIT64(1)
+/** If set, ethdev layer will forcefully restore current all multicast mode 
setting. */
+#define RTE_ETH_RESTORE_ALLMULTI RTE_BIT64(2)
+/**@}*/
+
+/** All configuration which can be restored by ethdev layer. */
+#define RTE_ETH_RESTORE_ALL (RTE_ETH_RESTORE_MAC_ADDR | \
+RTE_ETH_RESTORE_PROMISC | \
+RTE_ETH_RESTORE_ALLMULTI)
+
+/**
+ * @internal
+ * Fetch from the driver what kind of configuration must be restored by ethdev 
layer,
+ * after certain operations are performed by the application (such as 
rte_eth_dev_start()).
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param op
+ *   Type of operation executed by the application.
+ *
+ * @return
+ *   ORed restore flags indicating which configuration should be restored by 
ethdev.
+ *   0 if no restore is required by the driver.
+ */
+typedef uint64_t (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
+   enum rte_eth_dev_operation op);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet 
driver.
  */
@@ -1474,6 +1516,9 @@ struct eth_dev_ops {
eth_count_aggr_ports_t count_aggr_ports;
/** Map a Tx queue with an aggregated port of the DPDK port */
eth_map_aggr_tx_affinity_t map_aggr_tx_affinity;
+
+   /** Get configuration which ethdev should restore */
+   eth_get_restore_flags_t get_restore_flags;
 };
 
 /**
@@ -2131,6 +2176,27 @@ struct rte_eth_fdir_conf {
struct rte_eth_fdir_flex_conf flex_conf;
 };
 
+/**
+ * @internal
+ * Fetch from the driver what kind of configuration must be restored by ethdev 
layer,
+ * using get_restore_flags() callback.
+ *
+ * If callback is not defined, it is assumed that all supported configuration 
must be restored.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param op
+ *   Type of operation executed by the application.
+ *
+ * @return
+ *   ORed restore flags indicating which configuration should be restored by 
ethdev.
+ *   0 if no restore is required by the driver.
+ */
+__rte_internal
+uint64_t
+rte_eth_get_restore_flags(struct rte_eth_dev *dev,
+ enum rte_eth_dev_operation op);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 1669055ca5..fa0469e602 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -358,4 +358,5 @@ INTERNAL {
rte_eth_switch_domain_alloc;
rte_eth_switch_domain_free;
rte_flow_fp_default_ops;
+   rte_eth_get_restore_flags;
 };
-- 
2.39.5
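As a sketch of how a driver might use the callback introduced above: a hypothetical PMD whose hardware keeps promiscuous and all-multicast state across restarts, but which still wants the ethdev layer to replay MAC addresses, could return only the MAC flag. The flag and enum definitions below mirror the patch for self-containment; the driver callback itself is invented for illustration:

```c
#include <stdint.h>

/* Definitions mirroring the flags added by this patch. */
#define RTE_BIT64(n) (UINT64_C(1) << (n))
#define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT64(0)
#define RTE_ETH_RESTORE_PROMISC  RTE_BIT64(1)
#define RTE_ETH_RESTORE_ALLMULTI RTE_BIT64(2)

enum rte_eth_dev_operation {
	RTE_ETH_START,
};

struct rte_eth_dev; /* opaque for this sketch */

/* Hypothetical PMD: promiscuous/all-multicast settings survive a
 * restart in hardware, so only MAC addresses need replaying by the
 * ethdev layer on start. */
static uint64_t
example_get_restore_flags(struct rte_eth_dev *dev,
			  enum rte_eth_dev_operation op)
{
	(void)dev;
	if (op == RTE_ETH_START)
		return RTE_ETH_RESTORE_MAC_ADDR;
	return 0;
}
```

With no callback defined at all, the ethdev layer falls back to RTE_ETH_RESTORE_ALL, which preserves the pre-patch behaviour.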



[PATCH v3 3/4] ethdev: restore config only when requested

2024-10-11 Thread Dariusz Sosnowski
Use the get_restore_flags() internal API, introduced in previous
commits, in rte_eth_dev_start() to restore only the configuration
requested by the driver.

Signed-off-by: Dariusz Sosnowski 
---
 lib/ethdev/rte_ethdev.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 362a1883f0..c46107c4de 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1719,20 +1719,27 @@ eth_dev_allmulticast_restore(struct rte_eth_dev *dev, 
uint16_t port_id)
 
 static int
 eth_dev_config_restore(struct rte_eth_dev *dev,
-   struct rte_eth_dev_info *dev_info, uint16_t port_id)
+   struct rte_eth_dev_info *dev_info,
+   uint64_t restore_flags,
+   uint16_t port_id)
 {
int ret;
 
-   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
+   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR) &&
+   (restore_flags & RTE_ETH_RESTORE_MAC_ADDR))
eth_dev_mac_restore(dev, dev_info);
 
-   ret = eth_dev_promiscuous_restore(dev, port_id);
-   if (ret != 0)
-   return ret;
+   if (restore_flags & RTE_ETH_RESTORE_PROMISC) {
+   ret = eth_dev_promiscuous_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+   }
 
-   ret = eth_dev_allmulticast_restore(dev, port_id);
-   if (ret != 0)
-   return ret;
+   if (restore_flags & RTE_ETH_RESTORE_ALLMULTI) {
+   ret = eth_dev_allmulticast_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+   }
 
return 0;
 }
@@ -1742,6 +1749,7 @@ rte_eth_dev_start(uint16_t port_id)
 {
struct rte_eth_dev *dev;
struct rte_eth_dev_info dev_info;
+   uint64_t restore_flags;
int diag;
int ret, ret_stop;
 
@@ -1769,8 +1777,11 @@ rte_eth_dev_start(uint16_t port_id)
if (ret != 0)
return ret;
 
+   restore_flags = rte_eth_get_restore_flags(dev, RTE_ETH_START);
+
/* Lets restore MAC now if device does not support live change */
-   if (*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR)
+   if ((*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR) &&
+   (restore_flags & RTE_ETH_RESTORE_MAC_ADDR))
eth_dev_mac_restore(dev, &dev_info);
 
diag = (*dev->dev_ops->dev_start)(dev);
@@ -1779,7 +1790,7 @@ rte_eth_dev_start(uint16_t port_id)
else
return eth_err(port_id, diag);
 
-   ret = eth_dev_config_restore(dev, &dev_info, port_id);
+   ret = eth_dev_config_restore(dev, &dev_info, restore_flags, port_id);
if (ret != 0) {
RTE_ETHDEV_LOG_LINE(ERR,
"Error during restoring configuration for device (port 
%u): %s",
-- 
2.39.5



[PATCH v3 0/4] ethdev: rework config restore

2024-10-11 Thread Dariusz Sosnowski
This patch series reworks the config restore procedure,
so that drivers are able to enable/disable certain parts of it.
Drivers can provide get_restore_flags() callback,
which will indicate to ethdev library what configuration to restore.

If the callback is not defined, then ethdev assumes that all
configuration must be restored, preserving the current behavior for all
drivers.

This patch series also includes implementation of get_restore_flags()
for mlx5 PMD, which does not require config restore.

RFC: https://inbox.dpdk.org/dev/20240918092201.33772-1-dsosnow...@nvidia.com/

v3:
- Change type for restore flags to uint64_t.
- Return restore flags from rte_eth_get_restore_flags() directly,
  instead of returning through pointer.
v2:
- Fix typos in API docs.

Dariusz Sosnowski (4):
  ethdev: rework config restore
  ethdev: add get restore flags driver callback
  ethdev: restore config only when requested
  net/mlx5: disable config restore

 drivers/net/mlx5/mlx5.c|  2 ++
 drivers/net/mlx5/mlx5.h|  2 ++
 drivers/net/mlx5/mlx5_ethdev.c | 18 ++
 lib/ethdev/ethdev_driver.c |  9 +
 lib/ethdev/ethdev_driver.h | 66 ++
 lib/ethdev/rte_ethdev.c| 49 +
 lib/ethdev/version.map |  1 +
 7 files changed, 140 insertions(+), 7 deletions(-)

--
2.39.5



[PATCH v3 4/4] net/mlx5: disable config restore

2024-10-11 Thread Dariusz Sosnowski
The mlx5 PMD does not require configuration restore
on rte_eth_dev_start().
Add an implementation of get_restore_flags() indicating that.

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5.c|  2 ++
 drivers/net/mlx5/mlx5.h|  2 ++
 drivers/net/mlx5/mlx5_ethdev.c | 18 ++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8d266b0e64..9b6acaf7f1 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2571,6 +2571,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
.count_aggr_ports = mlx5_count_aggr_ports,
.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
+   .get_restore_flags = mlx5_get_restore_flags,
 };
 
 /* Available operations from secondary process. */
@@ -2663,6 +2664,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
.get_monitor_addr = mlx5_get_monitor_addr,
.count_aggr_ports = mlx5_count_aggr_ports,
.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
+   .get_restore_flags = mlx5_get_restore_flags,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 869aac032b..33568bc3d3 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2228,6 +2228,8 @@ eth_rx_burst_t mlx5_select_rx_function(struct rte_eth_dev 
*dev);
 struct mlx5_priv *mlx5_port_to_eswitch_info(uint16_t port, bool valid);
 struct mlx5_priv *mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev);
 int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+uint64_t mlx5_get_restore_flags(struct rte_eth_dev *dev,
+   enum rte_eth_dev_operation op);
 
 /* mlx5_ethdev_os.c */
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6a678d6dcc..6f24d649e0 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -796,3 +796,21 @@ mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct 
rte_eth_hairpin_cap *cap)
cap->tx_cap.rte_memory = hca_attr->hairpin_sq_wq_in_host_mem;
return 0;
 }
+
+/**
+ * Indicate to ethdev layer, what configuration must be restored.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device structure.
+ * @param[in] op
+ *   Type of operation which might require configuration restore.
+ * @return
+ *   Restore flags; 0 if no configuration restore is required.
+ */
+uint64_t
+mlx5_get_restore_flags(__rte_unused struct rte_eth_dev *dev,
+  __rte_unused enum rte_eth_dev_operation op)
+{
+   /* mlx5 PMD does not require any configuration restore. */
+   return 0;
+}
-- 
2.39.5



[PATCH v3 1/4] ethdev: rework config restore

2024-10-11 Thread Dariusz Sosnowski
Extract promiscuous and all multicast configuration restore
to separate functions.
This change will allow easier integration of disabling these procedures
for supporting PMDs in follow-up commits.

Signed-off-by: Dariusz Sosnowski 
---
 lib/ethdev/rte_ethdev.c | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e..362a1883f0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1648,14 +1648,10 @@ eth_dev_mac_restore(struct rte_eth_dev *dev,
 }
 
 static int
-eth_dev_config_restore(struct rte_eth_dev *dev,
-   struct rte_eth_dev_info *dev_info, uint16_t port_id)
+eth_dev_promiscuous_restore(struct rte_eth_dev *dev, uint16_t port_id)
 {
int ret;
 
-   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
-   eth_dev_mac_restore(dev, dev_info);
-
/* replay promiscuous configuration */
/*
 * use callbacks directly since we don't need port_id check and
@@ -1683,6 +1679,14 @@ eth_dev_config_restore(struct rte_eth_dev *dev,
}
}
 
+   return 0;
+}
+
+static int
+eth_dev_allmulticast_restore(struct rte_eth_dev *dev, uint16_t port_id)
+{
+   int ret;
+
/* replay all multicast configuration */
/*
 * use callbacks directly since we don't need port_id check and
@@ -1713,6 +1717,26 @@ eth_dev_config_restore(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+eth_dev_config_restore(struct rte_eth_dev *dev,
+   struct rte_eth_dev_info *dev_info, uint16_t port_id)
+{
+   int ret;
+
+   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
+   eth_dev_mac_restore(dev, dev_info);
+
+   ret = eth_dev_promiscuous_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+
+   ret = eth_dev_allmulticast_restore(dev, port_id);
+   if (ret != 0)
+   return ret;
+
+   return 0;
+}
+
 int
 rte_eth_dev_start(uint16_t port_id)
 {
-- 
2.39.5



Re: [PATCH v6 3/3] examples/ptpclient: add frequency adjustment

2024-10-11 Thread Ferruh Yigit
On 10/11/2024 7:34 AM, Mingjin Ye wrote:
> This patch adds PI servo controller to support frequency
> adjustment API for IEEE1588 PTP.
> 
> For example, the command for starting ptpclient with PI controller is:
> dpdk-ptpclient -a :81:00.0 -c 1 -n 3 -- -T 0 -p 0x1 -c 1
> 
> Signed-off-by: Simei Su 
> Signed-off-by: Wenjun Wu 
> Signed-off-by: Mingjin Ye 
> ---
>  doc/guides/sample_app_ug/ptpclient.rst |  12 +-
>  examples/ptpclient/ptpclient.c | 284 +++--
>  2 files changed, 274 insertions(+), 22 deletions(-)
> 
> diff --git a/doc/guides/sample_app_ug/ptpclient.rst 
> b/doc/guides/sample_app_ug/ptpclient.rst
> index d47e942738..89fe575b5f 100644
> --- a/doc/guides/sample_app_ug/ptpclient.rst
> +++ b/doc/guides/sample_app_ug/ptpclient.rst
> @@ -50,6 +50,10 @@ The adjustment for slave can be represented as:
>  If the command line parameter ``-T 1`` is used the application also
>  synchronizes the PTP PHC clock with the Linux kernel clock.
>  
> +If the command line parameter ``-c 1`` is used, the application will also
> +use the servo of the local clock. Only one type of servo is currently
> +implemented, the PI controller. Default 0 (not used).
> +
>  Compiling the Application
>  -
>  
> @@ -65,7 +69,7 @@ To run the example in a ``linux`` environment:
>  
>  .. code-block:: console
>  
> -.//examples/dpdk-ptpclient -l 1 -n 4 -- -p 0x1 -T 0
> +.//examples/dpdk-ptpclient -l 1 -n 4 -- -p 0x1 -T 0 -c 1
>  
>  Refer to *DPDK Getting Started Guide* for general information on running
>  applications and the Environment Abstraction Layer (EAL) options.
> @@ -73,7 +77,13 @@ applications and the Environment Abstraction Layer (EAL) 
> options.
>  * ``-p portmask``: Hexadecimal portmask.
>  * ``-T 0``: Update only the PTP slave clock.
>  * ``-T 1``: Update the PTP slave clock and synchronize the Linux Kernel to 
> the PTP clock.
> +* ``-c 0``: Not used clock servo controller.
> +* ``-c 1``: The clock servo PI controller is used and the log will print 
> information
> +about "master offset".
>  
> +Also, by adding ``-T 1`` and ``-c 1`` , the ``master offset`` value printed 
> in the
> +log will slowly converge and eventually stabilise at the nanosecond level. 
> The
> +synchronisation accuracy is much higher compared to not using a servo 
> controller.
>  

What is the dependency between this new feature / argument and the new
'rte_eth_timesync_adjust_freq()' API?

Right now only one PMD supports it; should the application check for
support of the API when the -c parameter is provided, and fail if it is
not supported?


>  Code Explanation
>  
> diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c
> index afb61bba51..dea8d9d54a 100644
> --- a/examples/ptpclient/ptpclient.c
> +++ b/examples/ptpclient/ptpclient.c
> @@ -46,6 +46,35 @@ static volatile bool force_quit;
>  #define KERNEL_TIME_ADJUST_LIMIT  2
>  #define PTP_PROTOCOL 0x88F7
>  
> +#define KP 0.7
> +#define KI 0.3
> +#define FREQ_EST_MARGIN 0.001
> +
> +enum servo_state {
> + SERVO_UNLOCKED,
> + SERVO_JUMP,
> + SERVO_LOCKED,
> +};
> +
> +struct pi_servo {
> + double offset[2];
> + double local[2];
> + double drift;
> + double last_freq;
> + int count;
> +
> + double max_frequency;
> + double step_threshold;
> + double first_step_threshold;
> + int first_update;
> +};
> +
> +enum controller_mode {
> + MODE_NONE,
> + MODE_PI,
> + MAX_ALL
> +} mode = MODE_NONE;
> +
>  struct rte_mempool *mbuf_pool;
>  uint32_t ptp_enabled_port_mask;
>  uint8_t ptp_enabled_port_nb;
> @@ -135,6 +164,9 @@ struct ptpv2_data_slave_ordinary {
>   uint8_t ptpset;
>   uint8_t kernel_time_set;
>   uint16_t current_ptp_port;
> + int64_t master_offset;
> + int64_t path_delay;
> + struct pi_servo *servo;
>  };
>  
>  static struct ptpv2_data_slave_ordinary ptp_data;
> @@ -293,36 +325,44 @@ print_clock_info(struct ptpv2_data_slave_ordinary 
> *ptp_data)
>   ptp_data->tstamp3.tv_sec,
>   (ptp_data->tstamp3.tv_nsec));
>  
> - printf("\nT4 - Master Clock.  %lds %ldns ",
> + printf("\nT4 - Master Clock.  %lds %ldns\n",
>   ptp_data->tstamp4.tv_sec,
>   (ptp_data->tstamp4.tv_nsec));
>  
> - printf("\nDelta between master and slave clocks:%"PRId64"ns\n",
> + if (mode == MODE_NONE) {
> + printf("\nDelta between master and slave clocks:%"PRId64"ns\n",
>   ptp_data->delta);
>  
> - clock_gettime(CLOCK_REALTIME, &sys_time);
> - rte_eth_timesync_read_time(ptp_data->current_ptp_port, &net_time);
> + clock_gettime(CLOCK_REALTIME, &sys_time);
> + rte_eth_timesync_read_time(ptp_data->current_ptp_port,
> +&net_time);
>  
> - time_t ts = net_time.tv_sec;
> + time_t ts = net_time.tv_sec;
>  
> - printf("\n\n
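For reference, the PI servo update the patch adds can be pictured with a minimal sketch. This is an illustration only, not the patch code; the gain constants match the patch's KP/KI defines, but the structure here is simplified to just the integral term.

```c
/* Simplified PI clock servo: each measured master offset (ns) contributes a
 * proportional term and accumulates into an integral "drift" term; the sum
 * is the frequency adjustment handed to the clock. */
#include <assert.h>

#define KP 0.7
#define KI 0.3

struct pi_state {
	double drift; /* accumulated integral term */
};

static double pi_sample(struct pi_state *s, double offset_ns)
{
	s->drift += KI * offset_ns;        /* integral accumulates */
	return KP * offset_ns + s->drift;  /* proportional + integral */
}
```

Because the integral term persists across samples, the adjustment keeps correcting residual frequency error even when the instantaneous offset reaches zero, which is why the printed master offset converges instead of oscillating.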

Re: [PATCH] doc: correct definition of Stats per queue feature

2024-10-11 Thread Ferruh Yigit
On 10/11/2024 2:38 AM, Stephen Hemminger wrote:
> Change the documentation to match the current usage of this feature
> in the NIC table. Move this subheading to be after basic
> stats because the queue stats are now reported in the same structure.
> 
> Although the "Stats per Queue" feature was originally intended
> to be related to stats mapping, the overwhelming majority of drivers
> report this feature with a different meaning.
> 
> Hopefully in a later release the per-queue stats limitations
> can be fixed, but this requires API, ABI, and lots of driver
> changes.
> 
> Fixes: dad1ec72a377 ("doc: document NIC features")
> Cc: ferruh.yi...@intel.com
> Signed-off-by: Stephen Hemminger 
> 

Acked-by: Ferruh Yigit 


Re: [PATCH v3 0/4] ethdev: rework config restore

2024-10-11 Thread Ferruh Yigit
On 10/11/2024 7:51 PM, Dariusz Sosnowski wrote:
> This patch series reworks the config restore procedure,
> so that drivers are able to enable/disable certain parts of it.
> Drivers can provide get_restore_flags() callback,
> which will indicate to ethdev library what configuration to restore.
> 
> If callback is not defined, then ethdev assumes that
> all configuration must be restored, preserving the current behavior for all 
> drivers.
> 
> This patch series also includes implementation of get_restore_flags()
> for mlx5 PMD, which does not require config restore.
> 
> RFC: https://inbox.dpdk.org/dev/20240918092201.33772-1-dsosnow...@nvidia.com/
> 
> v3:
> - Change type for restore flags to uint64_t.
> - Return restore flags from rte_eth_get_restore_flags() directly,
>   instead of returning through pointer.
> v2:
> - Fix typos in API docs.
> 
> Dariusz Sosnowski (4):
>   ethdev: rework config restore
>   ethdev: add get restore flags driver callback
>   ethdev: restore config only when requested
>   net/mlx5: disable config restore
> 

For series,
Acked-by: Ferruh Yigit 

Series applied to dpdk-next-net/main, thanks.



Re: [PATCH] net/nfb: fix use after free

2024-10-11 Thread Martin Spinler
On Thu, 2024-10-10 at 19:17 +0200, David Marchand wrote:
> On Thu, Oct 10, 2024 at 7:16 PM Thomas Monjalon  wrote:
> > 
> > With the annotations added to the allocation functions,
> > more issues are detected at compilation time:
> > 
> > nfb_rx.c:133:28: error: pointer 'rxq' used after 'rte_free'
> > 
> > It is fixed by moving the assignment before freeing the parent pointer.
> > 
> > Fixes: 80da7efbb4c4 ("eal: annotate allocation functions")
> > 
> > Signed-off-by: Thomas Monjalon 
> Reviewed-by: David Marchand 
> 
> 
Acked-by: Martin Spinler 


RE: [PATCH v14 1/4] lib: add generic support for reading PMU events

2024-10-11 Thread Konstantin Ananyev



> Add support for programming PMU counters and reading their values
> at runtime, bypassing the kernel completely.
> 
> This is especially useful in cases where CPU cores are isolated,
> i.e. run dedicated tasks. In such cases one cannot use the standard
> perf utility without sacrificing latency and performance.
> 
> Signed-off-by: Tomasz Duszynski 
> ---
>  MAINTAINERS|   5 +
>  app/test/meson.build   |   1 +
>  app/test/test_pmu.c|  53 +++
>  doc/api/doxy-api-index.md  |   3 +-
>  doc/api/doxy-api.conf.in   |   1 +
>  doc/guides/prog_guide/profile_app.rst  |  29 ++
>  doc/guides/rel_notes/release_24_11.rst |   7 +
>  lib/meson.build|   1 +
>  lib/pmu/meson.build|  12 +
>  lib/pmu/pmu_private.h  |  32 ++
>  lib/pmu/rte_pmu.c  | 463 +
>  lib/pmu/rte_pmu.h  | 227 
>  lib/pmu/version.map|  14 +
>  13 files changed, 847 insertions(+), 1 deletion(-)
>  create mode 100644 app/test/test_pmu.c
>  create mode 100644 lib/pmu/meson.build
>  create mode 100644 lib/pmu/pmu_private.h
>  create mode 100644 lib/pmu/rte_pmu.c
>  create mode 100644 lib/pmu/rte_pmu.h
>  create mode 100644 lib/pmu/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7d171e3d45..fbd46905b6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1811,6 +1811,11 @@ M: Nithin Dabilpuram 
>  M: Pavan Nikhilesh 
>  F: lib/node/
> 
> +PMU - EXPERIMENTAL
> +M: Tomasz Duszynski 
> +F: lib/pmu/
> +F: app/test/test_pmu*
> +
> 
>  Test Applications
>  -
> diff --git a/app/test/meson.build b/app/test/meson.build
> index e29258e6ec..45f56d8aae 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -139,6 +139,7 @@ source_file_deps = {
>  'test_pmd_perf.c': ['ethdev', 'net'] + packet_burst_generator_deps,
>  'test_pmd_ring.c': ['net_ring', 'ethdev', 'bus_vdev'],
>  'test_pmd_ring_perf.c': ['ethdev', 'net_ring', 'bus_vdev'],
> +'test_pmu.c': ['pmu'],
>  'test_power.c': ['power'],
>  'test_power_cpufreq.c': ['power'],
>  'test_power_intel_uncore.c': ['power'],
> diff --git a/app/test/test_pmu.c b/app/test/test_pmu.c
> new file mode 100644
> index 00..79376ea2e8
> --- /dev/null
> +++ b/app/test/test_pmu.c
> @@ -0,0 +1,53 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2024 Marvell International Ltd.
> + */
> +
> +#include 
> +
> +#include "test.h"
> +
> +static int
> +test_pmu_read(void)
> +{
> + const char *name = NULL;
> + int tries = 10, event;
> + uint64_t val = 0;
> +
> + if (name == NULL) {
> + printf("PMU not supported on this arch\n");
> + return TEST_SKIPPED;
> + }
> +
> + if (rte_pmu_init() == -ENOTSUP) {
> + printf("pmu_autotest only supported on Linux, skipping test\n");
> + return TEST_SKIPPED;
> + }
> + if (rte_pmu_init() < 0)
> + return TEST_SKIPPED;
> +
> + event = rte_pmu_add_event(name);
> + while (tries--)
> + val += rte_pmu_read(event);
> +
> + rte_pmu_fini();
> +
> + return val ? TEST_SUCCESS : TEST_FAILED;
> +}
> +
> +static struct unit_test_suite pmu_tests = {
> + .suite_name = "pmu autotest",
> + .setup = NULL,
> + .teardown = NULL,
> + .unit_test_cases = {
> + TEST_CASE(test_pmu_read),
> + TEST_CASES_END()
> + }
> +};
> +
> +static int
> +test_pmu(void)
> +{
> + return unit_test_suite_runner(&pmu_tests);
> +}
> +
> +REGISTER_FAST_TEST(pmu_autotest, true, true, test_pmu);
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index f9f0300126..805efc6520 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -237,7 +237,8 @@ The public API headers are grouped by topics:
>[log](@ref rte_log.h),
>[errno](@ref rte_errno.h),
>[trace](@ref rte_trace.h),
> -  [trace_point](@ref rte_trace_point.h)
> +  [trace_point](@ref rte_trace_point.h),
> +  [pmu](@ref rte_pmu.h)
> 
>  - **misc**:
>[EAL config](@ref rte_eal.h),
> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> index a8823c046f..658490b6a2 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -69,6 +69,7 @@ INPUT   = 
> @TOPDIR@/doc/api/doxy-api-index.md \
>@TOPDIR@/lib/pdcp \
>@TOPDIR@/lib/pdump \
>@TOPDIR@/lib/pipeline \
> +  @TOPDIR@/lib/pmu \
>@TOPDIR@/lib/port \
>@TOPDIR@/lib/power \
>@TOPDIR@/lib/ptr_compress \
> diff --git a/doc/guides/prog_guide/profile_app.rst 
> b/doc/guides/prog_guide/profile_app.rst
> index a6b5fb4d5e..854c515a61 100644
> --- a/doc/guides/prog_guide/profile_app.rst
> +++

RE: [PATCH dpdk v2 16/16] ipv6: add function to check ipv6 version

2024-10-11 Thread Morten Brørup
> From: Robin Jarry [mailto:rja...@redhat.com]
> Sent: Thursday, 10 October 2024 22.01
> 
> Hi Morten,
> 
> Morten Brørup, Oct 06, 2024 at 11:02:
> > Personally, I would prefer following the convention of rte_ether
> functions to return boolean (as int)...
> >
> > static inline int
> rte_is_ signed>_ether_addr(const struct rte_ether_addr *ea)
> 
> Sorry, I haven't followed your recommendation in v3, but I have a good
> reason :)
> 
> I find that functions following that naming scheme are impossible to
> find since they all start with the same "rte_is_" prefix regardless of
> the DPDK module in which they are defined. It is particularly
> annoying when using code completion with clangd or similar tools.
> 
> rte_is_power_of_2   
> rte_is_aligned  
> rte_is_same_ether_addr  
> rte_is_zero_ether_addr  
> rte_is_unicast_ether_addr   
> rte_is_multicast_ether_addr 
> rte_is_broadcast_ether_addr 
> rte_is_universal_ether_addr 
> rte_is_local_admin_ether_addr   
> rte_is_valid_assigned_ether_addr
> 
> The ethernet address functions are even more extreme as they end all
> with the same suffix:
> 
> rte___
> 
> All other public API seems to follow the inverse logic:
> 
> rte___
> 
> It does not follow natural English word order; it reads more like reverse
> Polish notation. However, it feels more user friendly and better
> discoverable.
> 
> rte_ipv6_addr_eq
> rte_ipv6_addr_eq_prefix
> rte_ipv6_addr_is_linklocal
> rte_ipv6_addr_is_loopback
> rte_ipv6_addr_is_mcast
> rte_ipv6_addr_is_sitelocal
> rte_ipv6_addr_is_unspec
> rte_ipv6_addr_is_v4compat
> rte_ipv6_addr_is_v4mapped
> rte_ipv6_addr_mask
> rte_ipv6_llocal_from_ethernet
> rte_ipv6_mask_depth
> rte_ipv6_mc_scope
> rte_ipv6_solnode_from_addr
> 
> I hope I'm making sense :)

Excellent explanation.
You have convinced me, so I now agree with your decision.

Then, in some other future patch, the ether_addr functions should be renamed to 
follow this convention too. And for backwards compatibility, their old names 
can be preserved as macros/functions calling the new names.

> 
> > static inline int rte_is_ipv6_version(const struct rte_ipv6_hdr *ip)
> > {
> > return (ip->vtc_flow & RTE_IPV6_VTC_FLOW_VERSION_MASK) ==
> RTE_IPV6_VTC_FLOW_VERSION;
> > }
> >
> > Or, at your preference, an optimized variant:
> > static inline int rte_is_version_ipv6(const struct rte_ipv6_hdr *ip)
> > {
> > return (*(const unsigned char *)ip & 0xf0) == 0x60;
> > }
> >
> > The first nibble is also used for version in IPv4, so an IPv4 variant
> would look similar:
> > static inline int rte_is_version_ipv4(const struct rte_ip_hdr *ip)
> > {
> > return (*(const unsigned char *)ip & 0xf0) == 0x40;
> > }
> 
> I had missed this in ipv4. I could rework it for v4 if there are other
> remarks.
> 
> Thanks for the review!
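The variants discussed above boil down to testing the top nibble of the first header byte, which carries the IP version for both IPv4 and IPv6. A standalone sketch (not the DPDK headers) — note the parentheses around the mask, since `==` binds tighter than `&` in C:

```c
/* The IP version lives in the top nibble of the first header byte:
 * 0x4 for IPv4, 0x6 for IPv6. */
#include <assert.h>
#include <stdint.h>

static inline int ip_hdr_is_v6(const void *hdr)
{
	return (*(const uint8_t *)hdr & 0xf0) == 0x60;
}

static inline int ip_hdr_is_v4(const void *hdr)
{
	return (*(const uint8_t *)hdr & 0xf0) == 0x40;
}
```
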



[PATCH v1] net/ice: fix incorrect reading of PHY timestamp

2024-10-11 Thread Soumyadeep Hore
In the ICE PMD, the ready bitmap was previously not checked before
reading the PHY timestamp. This caused incorrect Tx
timestamping.

Ready bitmap checking is now enabled, and the PHY timestamp is read
once the ready bitmap reports a positive value.

Fixes: 881169950d80 ("net/ice/base: implement initial PTP support for E830")
Cc: sta...@dpdk.org

Signed-off-by: Soumyadeep Hore 
---
 drivers/net/ice/ice_ethdev.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7b1bd163a2..2357d6e1da 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -6516,14 +6516,32 @@ ice_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ice_adapter *ad =
ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct ice_tx_queue *txq;
uint8_t lport;
-   uint64_t ts_ns, ns, tstamp;
+   uint64_t ts_ns, ns, tstamp, tstamp_ready = 0;
+   uint64_t start_time, curr_time;
const uint64_t mask = 0x;
int ret;
 
+   txq = dev->data->tx_queues[0];
lport = hw->port_info->lport;
 
-   ret = ice_read_phy_tstamp(hw, lport, 0, &tstamp);
+   start_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+   while (!(tstamp_ready & BIT_ULL(0))) {
+   ret = ice_get_phy_tx_tstamp_ready(hw, lport, &tstamp_ready);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to get phy ready for 
timestamp");
+   return -1;
+   }
+   curr_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 
1000);
+   if (curr_time - start_time > 1000) {
+   PMD_DRV_LOG(ERR, "Timeout to get phy ready for 
timestamp");
+   return -1;
+   }
+   }
+
+   ret = ice_read_phy_tstamp(hw, lport, txq->queue_id, &tstamp);
if (ret) {
PMD_DRV_LOG(ERR, "Failed to read phy timestamp");
return -1;
-- 
2.43.0



RE: rte_ring move head question for machines with relaxed MO (arm/ppc)

2024-10-11 Thread Konstantin Ananyev



> >
> > > > > > > 1. rte_ring_generic_pvt.h:
> > > > > > > =
> > > > > > >
> > > > > > > pseudo-c-code  //
> > > > > > > related armv8 instructions
> > > > > > >   
> > > > > > >--
> > 
> > > > > > >  head.load()  //
> > > > > > > ldr [head]
> > > > > > >  rte_smp_rmb()//dmb 
> > > > > > > ishld
> > > > > > >  opposite_tail.load()//ldr 
> > > > > > > [opposite_tail]
> > > > > > >  ...
> > > > > > >  rte_atomic32_cmpset(head, ...)  //ldrex[head];... 
> > > > > > > stlex[head]
> > > > > > >
> > > > > > >
> > > > > > > 2. rte_ring_c11_pvt.h
> > > > > > > =
> > > > > > >
> > > > > > > pseudo-c-code   //
> > > > > > > related armv8 instructions
> > > > > > >   
> > > > > > >--
> > 
> > > > > > > head.atomic_load(relaxed) //ldr[head]
> > > > > > > atomic_thread_fence(acquire)   //dmb ish
> > > > > > > opposite_tail.atomic_load(acquire)   //lda[opposite_tail]
> > > > > > > ...
> > > > > > > head.atomic_cas(..., relaxed)  //ldrex[haed]; 
> > > > > > > ... strex[head]
> > > > > > >
> > > > > > >
> > > > > > > 3.   rte_ring_hts_elem_pvt.h
> > > > > > > ==
> > > > > > >
> > > > > > > pseudo-c-code   //
> > > > > > > related armv8 instructions
> > > > > > >   
> > > > > > >--
> > 
> > > > > > > head.atomic_load(acquire)//lda [head]
> > > > > > > opposite_tail.load() //ldr 
> > > > > > > [opposite_tail]
> > > > > > > ...
> > > > > > > head.atomic_cas(..., acquire)// ldaex[head]; 
> > > > > > > ... strex[head]
> > > > > > >
> > > > > > > The questions that arose from these observations:
> > > > > > > a) are all 3 approaches equivalent in terms of functionality?
> > > > > > Different, lda (Load with acquire semantics) and ldr (load) are 
> > > > > > different.
> > > > >
> > > > > I understand that, my question was:
> > > > > lda {head]; ldr[tail]
> > > > > vs
> > > > > ldr [head]; dmb ishld; ldr [tail];
> > > > >
> > > > > Is there any difference in terms of functionality (memory ops
> > > > ordering/observability)?
> > > >
> > > > To be more precise:
> > > >
> > > > lda {head]; ldr[tail]
> > > > vs
> > > > ldr [head]; dmb ishld; ldr [tail];
> > > > vs
> > > > ldr [head]; dmb ishld; lda [tail];
> > > >
> > > > what would be the difference between these 3 cases?
> > >
> > > Case A: lda {head]; ldr[tail]
> > > load of the head will be observed by the memory subsystem before the
> > > load of the tail.
> > >
> > > Case B: ldr [head]; dmb ishld; ldr [tail]; load of the head will be
> > > observed by the memory subsystem Before the load of the tail.
> > >
> > >
> > > Essentially both cases A and B are the same.
> > > They preserve following program orders.
> > > LOAD-LOAD
> > > LOAD-STORE
> >
> > Ok, that is crystal clear, thanks for explanation.
> >
> >
> > > Case C: ldr [head]; dmb ishld; lda [tail]; load of the head will be
> > > observed by the memory subsystem before the load of the tail.
> >
> > Ok.
> >
> > > In addition, any load or store program order after lda[tail] will not
> > > be observed by the memory subsystem before the load of the tail.
> >
> > Ok... the question is why we need that extra hoisting barrier here?
> > From what unwanted  re-orderings we are protecting here?
> > Does it mean that without it, ldrex/strex (CAS) can be reordered with
> > load[cons.tail]?
> >
> > Actually, we probably need to look at whole picture:
> >
> > in rte_ring_generic_pvt.h
> > =
> >
> > ldr [prod.head]
> > dmb ishld
> > ldr [cons.tail]
> > ...
> > /* cas */
> > ldrex [prod.head]
> > stlex [prod.head]   /* sink barrier */
> >
> > in rte_ring_c11_pvt.h
> > =
> >
> > ldr [prod.head]
> > dmb ishld
> > lda [cons.tail]  /* exrea hoist */
> > ...
> > /* cas */
> > ldrex [prod.head]
> > strex [prod.head]
> 
> Minor thing: ldrex and strex are the Arm 32-bit way of doing CAS.
> AArch64 has a cas instruction.
> Same code in aarch64 armv9-a https://godbolt.org/z/TMvWx6v4n

Cool, thanks.

> 
> >
> > So, in _genereic_ we don't have that extra hoist barrier after 
> > load[con.tail], but
> > we have extra sink barrier at cas(prod.tail).
> >
> 
> You are right, technically, that lda[cons.tail] is not required because due 
> to the
> dependency chain up until CAS a memory reordering is not possible.
> For that reas
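The two orderings being compared above (case B's relaxed load plus acquire fence versus case A/C's acquire load) map onto C11 atomics as follows. This is a standalone sketch, not the ring code; the armv8 instruction mapping in the comments follows the discussion above.

```c
/* Both helpers guarantee the head load is observed before the tail load.
 * load_pair_fence() is the rte_ring_c11_pvt.h shape (ldr; dmb ishld; ldr
 * on armv8); load_pair_acquire() uses an acquire load of head (lda; ldr). */
#include <assert.h>
#include <stdatomic.h>

static _Atomic unsigned int head, tail;

static unsigned int load_pair_fence(unsigned int *t)
{
	unsigned int h = atomic_load_explicit(&head, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	*t = atomic_load_explicit(&tail, memory_order_relaxed);
	return h;
}

static unsigned int load_pair_acquire(unsigned int *t)
{
	unsigned int h = atomic_load_explicit(&head, memory_order_acquire);

	*t = atomic_load_explicit(&tail, memory_order_relaxed);
	return h;
}
```

Functionally both enforce the LOAD-LOAD program order discussed above; the acquire-load form additionally prevents later loads and stores from being observed before the head load.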

Re: [PATCH v6 0/4] fix segment fault when parse args

2024-10-11 Thread David Marchand
On Wed, Oct 9, 2024 at 6:50 AM Chengwen Feng  wrote:
>
> The rte_kvargs_process() was used to parse key-value pairs (e.g. socket_id=0);
> it also supported parsing only-key tokens (e.g. socket_id). But many drivers'
> callbacks can only handle key-value, and will segfault when handed an
> only-key token, so the patchset [1] was introduced.
>
> Because the patchset [1] modified too many drivers, instead:
> 1) A new API rte_kvargs_process_opt() was introduced; it inherits the
> behavior of rte_kvargs_process(), which could parse both key-value and
> only-key.
> 2) Constrain rte_kvargs_process() to parse only key-value.
>
> [1] 
> https://patches.dpdk.org/project/dpdk/patch/20230320092110.37295-1-fengcheng...@huawei.com/
>
> Chengwen Feng (4):
>   kvargs: add one new process API
>   net/sfc: use new API to parse kvargs
>   net/tap: use new API to parse kvargs
>   common/nfp: use new API to parse kvargs

I see Stephen wanted to have a look on the series, but rc1 is close,
and I decided to merge it with following comments.

I marked the new API as stable from the go.
The reason being that it provides the "legacy" behavior of previous
rte_kvargs_process function, and as such, some users may have been
relying on it.
If you think this is wrong, please submit a followup patch.

I was surprised by the change in behavior wrt a NULL kvlist input
parameter mixed into this change.
But I left it as is. Idem, a followup patch may be sent.


Series applied, thanks for fixing this old issue Chengwen.

-- 
David Marchand
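The crash mode being fixed is easy to reproduce in miniature: a callback written for key-value input dereferences its value argument, which is NULL when an only-key token (e.g. "socket_id" with no '=') is delivered. A defensive callback (a sketch, not any driver's actual code) checks first:

```c
/* A kvargs-style callback must tolerate value == NULL, which is how an
 * only-key token such as "socket_id" (no '=') is delivered. */
#include <assert.h>
#include <stdlib.h>

static int parse_socket_id(const char *key, const char *value, void *extra)
{
	int *socket_id = extra;

	(void)key;
	if (value == NULL)
		return -1; /* only-key token: nothing to parse */
	*socket_id = atoi(value);
	return 0;
}
```

The series takes the opposite approach for existing callbacks: rather than patching every driver this way, the library now guarantees rte_kvargs_process() never passes a NULL value.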



Re: [PATCH v5 1/1] dmadev: support strict priority configuration

2024-10-11 Thread fengchengwen
Acked-by: Chengwen Feng 

On 2024/10/11 17:13, Vamsi Krishna wrote:
> From: Vamsi Attunuru 
> 
> Some DMA controllers offer the ability to configure priority
> level for the DMA channels, allowing for the prioritization
> of DMA command execution based on channel importance.
> 
> This patch supports such strict priority configuration. If the
> dmadev supports it, it should advertise the capability flag
> RTE_DMA_CAPA_PRI_POLICY_SP; the application can then enable strict
> priority configuration.
> 
> Signed-off-by: Vamsi Attunuru 
> Signed-off-by: Amit Prakash Shukla 
> ---



Re: [PATCH v4 1/5] graph: add support for node specific errors

2024-10-11 Thread Robin Jarry

Hi Pavan,

Pavan Nikhilesh, Aug 16, 2024 at 17:09:

From: Pavan Nikhilesh 

Add ability for Nodes to advertise error counters
during registration.

Signed-off-by: Pavan Nikhilesh 
---
v2 Changes:
- Fix compilation.
v3 Changes:
- Resend as 1/5 didn't make it through.
v4 Changes:
- Address review comments.
- Rebase on main branch.

 doc/guides/prog_guide/graph_lib.rst   |  22 +-
 .../prog_guide/img/anatomy_of_a_node.svg  | 329 +--
 .../prog_guide/img/graph_mem_layout.svg   | 921 +-
 lib/graph/graph_private.h |   1 +
 lib/graph/node.c  |  37 +-
 lib/graph/rte_graph.h |   7 +
 6 files changed, 1016 insertions(+), 301 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst 
b/doc/guides/prog_guide/graph_lib.rst
index ad09bdfe26..018900caea 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -21,6 +21,7 @@ Features of the Graph library are:
 - Nodes as plugins.
 - Support for out of tree nodes.
 - Inbuilt nodes for packet processing.
+- Node specific error counts.
 - Multi-process support.
 - Low overhead graph walk and node enqueue.
 - Low overhead statistics collection infrastructure.
@@ -124,6 +125,18 @@ Source nodes are static nodes created using 
``RTE_NODE_REGISTER`` by passing
 While performing the graph walk, the ``process()`` function of all the source
 nodes will be called first. So that these nodes can be used as input nodes for 
a graph.

+nb_errors:
+^^
+
+The number of errors that this node can report. The ``err_desc[]`` stores the 
error
+descriptions which will later be propagated to stats.
+
+err_desc[]:
+^^^
+
+The dynamic array to store the error descriptions that will be reported by this
+node.


This feature could be useful to store detailed statistics per node, not 
only for errors. It would be better to rename "errors" to "stats" or 
"xstats".


See below for a concrete suggestion.


diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d557d55f2d..e663b04d8b 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -61,6 +61,7 @@ struct node {
rte_node_t id;/**< Allocated identifier for the node. */
rte_node_t parent_id; /**< Parent node identifier. */
rte_edge_t nb_edges;  /**< Number of edges from this node. */
+   struct rte_node_errors *errs; /**< Node specific errors. */


How about:

   struct rte_node_xstats *xstats; /**< Node specific extra statistics. */

And maybe const? I didn't check if you make a full copy, or if you only 
copy the pointer.



diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index ecfec2068a..b28143d737 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -29,6 +29,7 @@ extern "C" {

 #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
 #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
+#define RTE_NODE_ERROR_DESC_SIZE 64  /**< Max length of node name. */


#define RTE_NODE_XSTATS_DESC_SIZE 64 /**< Max length of node xstats names. */


 #define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
 #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
 #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
@@ -460,6 +461,11 @@ void rte_graph_cluster_stats_get(struct 
rte_graph_cluster_stats *stat,
  */
 void rte_graph_cluster_stats_reset(struct rte_graph_cluster_stats *stat);

+struct rte_node_errors {
+   uint16_t nb_errors;  /**< Number of errors. 
*/
+   char err_desc[][RTE_NODE_ERROR_DESC_SIZE];   /**< Names of errors. 
*/
+};



struct rte_node_xstats {
   uint16_t xstats_num; /**< Number of xstats. */
   char xstats_desc[RTE_NODE_XSTATS_DESC_SIZE];  /**< Names of xstats. */
};


+
 /**
  * Structure defines the node registration parameters.
  *
@@ -472,6 +478,7 @@ struct rte_node_register {
rte_node_process_t process; /**< Node process function. */
rte_node_init_t init;   /**< Node init function. */
rte_node_fini_t fini;   /**< Node fini function. */
+   struct rte_node_errors *errs; /**< Node specific errors. */


   struct rte_node_xstats *xstats; /**< Node specific extra statistics. */

Cheers!
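The counted flexible-array struct under discussion would be populated along these lines. This is an illustrative sketch using the reviewer's proposed xstats naming, which is not merged API; the allocation and copy logic is the part being demonstrated.

```c
/* Building a node xstats descriptor: a count followed by fixed-size
 * description strings in a flexible array member. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XSTATS_DESC_SIZE 64

struct node_xstats {
	uint16_t xstats_num;                  /* number of xstats */
	char xstats_desc[][XSTATS_DESC_SIZE]; /* flexible array member */
};

static struct node_xstats *
xstats_alloc(const char *const *names, uint16_t n)
{
	struct node_xstats *x =
		malloc(sizeof(*x) + (size_t)n * XSTATS_DESC_SIZE);

	if (x == NULL)
		return NULL;
	x->xstats_num = n;
	for (uint16_t i = 0; i < n; i++) {
		strncpy(x->xstats_desc[i], names[i], XSTATS_DESC_SIZE - 1);
		x->xstats_desc[i][XSTATS_DESC_SIZE - 1] = '\0';
	}
	return x;
}
```

Because the strings live in the flexible array, the whole descriptor is one contiguous allocation, which matters if the library copies it at node registration rather than keeping the caller's pointer.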



Re: [PATCH v3] fib: network byte order IPv4 lookup

2024-10-11 Thread Robin Jarry

Hi Vladimir,

Vladimir Medvedkin, Oct 10, 2024 at 13:26:

Previously, when running rte_fib_lookup, IPv4 addresses had to be in
host byte order.

This patch adds a new flag RTE_FIB_FLAG_LOOKUP_BE that can be passed on
fib create, which allows IPv4 addresses to be passed in network byte
order on lookup.

Signed-off-by: Vladimir Medvedkin 


[snip]


diff --git a/lib/fib/dir24_8.h b/lib/fib/dir24_8.h
index 7125049f15..2c776e118f 100644
--- a/lib/fib/dir24_8.h
+++ b/lib/fib/dir24_8.h
@@ -7,7 +7,9 @@
 #define _DIR24_8_H_
 
 #include 

+#include 
 
+#include 

 #include 
 #include 
 
@@ -237,6 +239,46 @@ dir24_8_lookup_bulk_uni(void *p, const uint32_t *ips,

}
 }
 
+#define BSWAP_MAX_LENGTH	64

+
+typedef void (*dir24_8_lookup_bulk_be_cb)(void *p, const uint32_t *ips,
+   uint64_t *next_hops, const unsigned int n);
+
+static inline void
+dir24_8_lookup_bulk_be(void *p, const uint32_t *ips,
+   uint64_t *next_hops, const unsigned int n,
+   dir24_8_lookup_bulk_be_cb cb)
+{
+   uint32_t le_ips[BSWAP_MAX_LENGTH];
+   unsigned int i;
+
+   for (i = 0; i < n; i += BSWAP_MAX_LENGTH) {
+   int j;
+   for (j = 0; j < BSWAP_MAX_LENGTH && i + j < n; j++)
+   le_ips[j] = rte_be_to_cpu_32(ips[i + j]);
+
+   cb(p, le_ips, next_hops + i, j);
+   }


This should be a noop for big endian platforms. I'm not sure the
compiler will be smart enough to collapse the nested loops.



+}
+
+#define DECLARE_BE_LOOKUP_FN(name) \
+static inline void \
+name##_be(void *p, const uint32_t *ips,
\
+   uint64_t *next_hops, const unsigned int n)  \
+{  \
+   dir24_8_lookup_bulk_be(p, ips, next_hops, n, name); \
+}
+
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_1b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_2b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_4b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_8b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_0)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_1)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_2)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_3)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_uni)
+
 void *
 dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *conf);
 
@@ -244,7 +286,7 @@ void

 dir24_8_free(void *p);
 
 rte_fib_lookup_fn_t

-dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type);
+dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr);
 
 int

 dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,


[snip]


diff --git a/lib/fib/rte_fib.h b/lib/fib/rte_fib.h
index d7a5aafe53..1617235e85 100644
--- a/lib/fib/rte_fib.h
+++ b/lib/fib/rte_fib.h
@@ -28,6 +28,9 @@ struct rte_rib;
 /** Maximum depth value possible for IPv4 FIB. */
 #define RTE_FIB_MAXDEPTH   32
 
+/** If set fib lookup is expecting ipv4 in network byte order */

+#define RTE_FIB_FLAG_LOOKUP_BE 1


I think RTE_FIB_F_NETWORK_ORDER would be more appropriate.


+
 /** Type of FIB struct */
 enum rte_fib_type {
RTE_FIB_DUMMY,  /**< RIB tree based FIB */
@@ -76,6 +79,7 @@ enum rte_fib_lookup_type {
 /** FIB configuration structure */
 struct rte_fib_conf {
enum rte_fib_type type; /**< Type of FIB struct */
+   unsigned int flags;


Maybe use an explicit int size for flags like uint32_t? I doubt we'll 
ever need more than 32 flags.


Also, maybe it would be better to add this field at the end to avoid 
breaking the API?


You forgot to add a doc string for that field:

   uint32_t flags; /**< Optional feature flags from RTE_FIB_F_* **/

Thanks!
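On the no-op concern above: the per-address conversion can be made an explicit identity on big-endian builds, so the bulk loop degenerates to a plain copy there without relying on the compiler to collapse the nested loops. A sketch, independent of the fib sources:

```c
/* Convert a batch of network-order (big-endian) IPv4 addresses to host
 * order; on a big-endian target the conversion is the identity. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static inline uint32_t be32_to_host(uint32_t v)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
	return v; /* already host order: compiles away */
#else
	return __builtin_bswap32(v); /* gcc/clang byte-swap builtin */
#endif
}

static void ips_be_to_host(const uint32_t *ips, uint32_t *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = be32_to_host(ips[i]);
}
```
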



Re: [PATCH v14 1/4] lib: add generic support for reading PMU events

2024-10-11 Thread Stephen Hemminger
On Fri, 11 Oct 2024 11:56:04 +
Konstantin Ananyev  wrote:

> > + *
> > + * @return
> > + *   0 in case of success, negative value otherwise.
> > + */
> > +__rte_experimental
> > +int
> > +rte_pmu_init(void);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Finalize PMU library. This should be called after PMU counters are no 
> > longer being read.
> > + */
> > +__rte_experimental
> > +void
> > +rte_pmu_fini(void);  
> 
> Hmm..., why is _fini_() allowed to be called directly while _init_() isn't? 

Is this handled by destructor, called from rte_eal_cleanup() and/or testpmd?


Re: [PATCH v10 0/7] Lcore variables

2024-10-11 Thread Stephen Hemminger
On Fri, 11 Oct 2024 10:18:54 +0200
Mattias Rönnblom  wrote:

> This patch set introduces a new API  for static
> per-lcore id data allocation.
> 
> Please refer to the  API documentation for both a
> rationale for this new API, and a comparison to the alternatives
> available.
> 
> The adoption of this API would affect many different DPDK modules, but
> the author updated only a few, mostly to serve as examples in this
> RFC, and to iron out some, but surely not all, wrinkles in the API.
> 
> The question on how to best allocate static per-lcore memory has been
> up several times on the dev mailing list, for example in the thread on
> "random: use per lcore state" RFC by Stephen Hemminger.
> 
> Lcore variables are surely not the answer to all your per-lcore-data
> needs, since it only allows for more-or-less static allocation. In the
> author's opinion, it does however provide a reasonably simple and
> clean and seemingly very much performant solution to a real problem.
> 
> Mattias Rönnblom (7):
>   eal: add static per-lcore memory allocation facility
>   eal: add lcore variable functional tests
>   eal: add lcore variable performance test
>   random: keep PRNG state in lcore variable
>   power: keep per-lcore state in lcore variable
>   service: keep per-lcore state in lcore variable
>   eal: keep per-lcore power intrinsics state in lcore variable
> 
>  MAINTAINERS   |   6 +
>  app/test/meson.build  |   2 +
>  app/test/test_lcore_var.c | 436 ++
>  app/test/test_lcore_var_perf.c| 257 +++
>  config/rte_config.h   |   1 +
>  doc/api/doxy-api-index.md |   1 +
>  .../prog_guide/env_abstraction_layer.rst  |  43 +-
>  doc/guides/rel_notes/release_24_11.rst|  14 +
>  lib/eal/common/eal_common_lcore_var.c |  85 
>  lib/eal/common/meson.build|   1 +
>  lib/eal/common/rte_random.c   |  28 +-
>  lib/eal/common/rte_service.c  | 117 ++---
>  lib/eal/include/meson.build   |   1 +
>  lib/eal/include/rte_lcore_var.h   | 389 
>  lib/eal/version.map   |   3 +
>  lib/eal/x86/rte_power_intrinsics.c|  17 +-
>  lib/power/rte_power_pmd_mgmt.c|  35 +-
>  17 files changed, 1343 insertions(+), 93 deletions(-)
>  create mode 100644 app/test/test_lcore_var.c
>  create mode 100644 app/test/test_lcore_var_perf.c
>  create mode 100644 lib/eal/common/eal_common_lcore_var.c
>  create mode 100644 lib/eal/include/rte_lcore_var.h
> 


Are there any trace points in this code? Would be good to have.
Also some optional statistics for telemetry use.
I would presume this is not intended as a hot path API; therefore
it would be ok to always keep statistics.

