> -----Original Message-----
> From: Aaron Conole <acon...@redhat.com>
> Sent: Wednesday, November 27, 2019 2:10 PM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] test/service: fix wait for service core
>
> Harry van Haaren <harry.van.haa...@intel.com> writes:
>
> > This commit fixes a sporadic failure of the service_autotest
> > unit test, as seen in the DPDK CI. The failure occurs as the main test
> > thread did not wait on the service-thread to return, and allowing it
> > to read a flag before the service was able to write to it.
> >
> > The fix changes the wait API call to specify the service-core ID,
> > and this waits for cores with both ROLE_RTE and ROLE_SERVICE.
> >
> > The rte_eal_mp_wait_lcore() call does not (and should not) wait
> > for service cores, so must not be used to wait on service-cores.
> >
> > Fixes: f038a81e1c56 ("service: add unit tests")
> >
> > Reported-by: Aaron Conole <acon...@redhat.com>
> > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
> >
> > ---
>
> It might also be good to document this behavior in the API area. It's
> unclear that the lcore wait function which takes a core id will work,
> but the broad wait will not.

Yes, agreed that the docs can improve here - different patch.

> > Given this is a fix in the unit test, and not a functional change
> > I'm not sure it's worth backporting to LTS / stable releases?
> > I've not added stable on CC yet.
>
> I think it's worth it if the LTS / stable branches use the unit tests
> (otherwise, they will observe sporadic failures).

Ok, I've added sta...@dpdk.org on CC now.

> >  app/test/test_service_cores.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> > index 9fe38f5e0..a922c7ddc 100644
> > --- a/app/test/test_service_cores.c
> > +++ b/app/test/test_service_cores.c
> > @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void)
> >      int ret = rte_eal_remote_launch(service_remote_launch_func, NULL,
> >                      slcore_id);
> >      TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed.");
> > -    rte_eal_mp_wait_lcore();
> > +    rte_eal_wait_lcore(slcore_id);
> >      TEST_ASSERT_EQUAL(1, service_remote_launch_flag,
> >              "Ex-service core function call had no effect.");
>
> Should we also have some change like the following (just a guess):
>
> diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> index 9fe38f5e08..695c35ac6c 100644
> --- a/app/test/test_service_cores.c
> +++ b/app/test/test_service_cores.c
> @@ -773,7 +773,7 @@ service_app_lcore_poll_impl(const int mt_safe)
>
>      /* flag done, then wait for the spawned 2nd core to return */
>      params[0] = 1;
> -    rte_eal_mp_wait_lcore();
> +    rte_eal_wait_lcore(app_core2);
>
>      /* core two gets launched first - and should hold the service lock */
>      TEST_ASSERT_EQUAL(0, app_core2_ret,

I reviewed this usage of the function, and I believe it waits on
application cores (aka, ROLE_RTE, not ROLE_SERVICE). Hence this usage
is actually correct.

Please review and double check my logic though - more eyes are good.
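
For anyone reviewing the logic, here is a minimal toy model of the behavior
described in the commit message: a broad wait that only visits ROLE_RTE cores
versus a per-core wait that works for any role. Everything prefixed toy_ is
hypothetical and is not the EAL implementation. The model is single-threaded
and deliberately pessimistic: launched work is assumed not to have run until
something waits on that core, which is the worst case of the race seen in CI.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every toy_ name is hypothetical, not a DPDK API. */
enum toy_role { TOY_ROLE_OFF, TOY_ROLE_RTE, TOY_ROLE_SERVICE };

typedef int (*toy_lcore_fn)(void *arg);

struct toy_lcore {
    enum toy_role role;
    toy_lcore_fn pending;   /* launched but not yet completed work, or NULL */
    void *arg;
};

#define TOY_MAX_LCORE 4
static struct toy_lcore toy_lcores[TOY_MAX_LCORE];

/* Queue work on one core; fails if the id is invalid or the core is busy. */
static int toy_remote_launch(toy_lcore_fn fn, void *arg, unsigned int id)
{
    if (id >= TOY_MAX_LCORE || toy_lcores[id].pending != NULL)
        return -1;
    toy_lcores[id].pending = fn;
    toy_lcores[id].arg = arg;
    return 0;
}

/* Wait on one specific core, whatever its role: its work is done on return. */
static int toy_wait_lcore(unsigned int id)
{
    int ret = 0;

    if (id < TOY_MAX_LCORE && toy_lcores[id].pending != NULL) {
        ret = toy_lcores[id].pending(toy_lcores[id].arg);
        toy_lcores[id].pending = NULL;
    }
    return ret;
}

/* Broad wait: only visits cores whose role is TOY_ROLE_RTE. */
static void toy_mp_wait_lcore(void)
{
    for (unsigned int id = 0; id < TOY_MAX_LCORE; id++)
        if (toy_lcores[id].role == TOY_ROLE_RTE)
            toy_wait_lcore(id);
}

/* Stand-in for the launched function: writes the flag the test checks. */
static int toy_set_flag(void *arg)
{
    *(int *)arg = 1;
    return 0;
}

/* Returns the flag value the main thread sees after the chosen wait. */
static int toy_flag_after_wait(int use_broad_wait)
{
    int flag = 0;
    const unsigned int slcore_id = 2;

    toy_lcores[1].role = TOY_ROLE_RTE;
    toy_lcores[slcore_id].role = TOY_ROLE_SERVICE;

    int ret = toy_remote_launch(toy_set_flag, &flag, slcore_id);
    assert(ret == 0);
    (void)ret;

    if (use_broad_wait)
        toy_mp_wait_lcore();        /* skips the service core */
    else
        toy_wait_lcore(slcore_id);  /* waits on exactly that core */

    int seen = flag;

    /* Drain leftover work so no pointer to 'flag' outlives this call. */
    toy_wait_lcore(slcore_id);
    return seen;
}
```

In this model, toy_flag_after_wait(1) reads the flag before it is written,
while toy_flag_after_wait(0) reads it afterwards. That mirrors the reasoning
for using the per-core wait in service_lcore_en_dis_able() and for leaving
the broad wait alone where only application cores are involved.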