Hi Morten and Hemant,

YIELD is a NOP on non-SMT CPUs, such as Neoverse.

WFE is universally available on AArch64, but it comes with a caveat: the CPU can remain in a low-power state indefinitely unless an event is triggered. That event can be generated explicitly, via SEV from a different CPU or SEVL on the local CPU, or implicitly through address monitoring: LDAXR arms the exclusive monitor, and a write to the monitored address clears it and generates the event.
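To make the implicit wake-up path concrete, here is a minimal sketch (not taken from DPDK; the name wait_until_equal_32 is hypothetical). The waiter arms the exclusive monitor with LDAXR and then sleeps in WFE until another CPU writes the location. The non-AArch64 branch is only a polling fallback so the snippet builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: block until *addr == expected. */
static inline void wait_until_equal_32(volatile uint32_t *addr,
                                       uint32_t expected)
{
#if defined(__aarch64__)
        uint32_t val;

        /* Set the local event register so the first WFE falls through. */
        __asm__ volatile("sevl" ::: "memory");
        do {
                /*
                 * WFE sleeps until an event is pending. LDAXR then reloads
                 * the value and arms the exclusive monitor; a later write
                 * to *addr clears the monitor and generates the next event.
                 */
                __asm__ volatile("wfe\n\t"
                                 "ldaxr %w0, [%1]"
                                 : "=&r"(val)
                                 : "r"(addr)
                                 : "memory");
        } while (val != expected);
#else
        /* Portable fallback: plain polling with acquire loads. */
        while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
                ;
#endif
}
```

Nothing bounds the sleep here, so this shape only fits cases where some other CPU is guaranteed to write the location eventually.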

WFET (FEAT_WFxT, Armv8.7 and later) is the safer variant where available, because it includes a timeout, so explicit or implicit event-register manipulation is not required.

--wathsala

On 6/12/26 01:11, Hemant Agrawal wrote:
Hi Morten,
On Cortex‑A72 (ARMv8), the only architectural primitives available are YIELD,
WFE, and WFI:

  - YIELD is the only deterministic, low-overhead option (pure CPU relax,
    no entry into a low-power state).
  - WFE can be used as a low-power idle hint, but it is event-driven and
    not time-based (it may return immediately).
  - WFI depends on interrupt wakeup and is therefore not suitable for tight
    latency loops.

For ~1 µs latency targets, the practical approach is a hybrid strategy:

Short waits → spin using YIELD
Slightly longer waits → opportunistically use WFE for power reduction

A simple implementation could look like (not tested):

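static inline void rte_armv8_pause(unsigned int iters)
{
        if (iters < 64) {
                /* Short wait: spin-loop hint, no low-power entry. */
                for (unsigned int i = 0; i < iters; i++)
                        asm volatile("yield" ::: "memory");
        } else {
                /*
                 * SEVL sets the local event register, so this WFE
                 * consumes that event and returns at once.
                 */
                asm volatile("sevl" ::: "memory");
                asm volatile("wfe" ::: "memory");
        }
}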
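
For the ~1 µs target itself, the short-wait branch could be bounded by time instead of by an iteration count. A rough sketch follows, assuming the generic timer (CNTVCT_EL0 / CNTFRQ_EL0) is readable from user space, as it normally is on Linux. pause_for_ns and the helper names are hypothetical, not an existing DPDK API, and nothing here has been measured on Cortex-A72. The non-AArch64 branches exist only so the snippet builds elsewhere.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() in the fallback */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current timer tick count. */
static inline uint64_t ticks_now(void)
{
#if defined(__aarch64__)
        uint64_t t;
        __asm__ volatile("mrs %0, cntvct_el0" : "=r"(t));
        return t;
#else
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical helper: timer frequency in ticks per second. */
static inline uint64_t ticks_per_sec(void)
{
#if defined(__aarch64__)
        uint64_t hz;
        __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(hz));
        return hz;
#else
        return 1000000000ull; /* fallback counts nanoseconds */
#endif
}

static inline void cpu_relax(void)
{
#if defined(__aarch64__)
        __asm__ volatile("yield" ::: "memory");
#else
        __asm__ volatile("" ::: "memory"); /* compiler barrier only */
#endif
}

/*
 * Spin until the tick count equivalent to ns nanoseconds (rounded up)
 * has elapsed; returns the number of ticks actually spent.
 */
static inline uint64_t pause_for_ns(uint64_t ns)
{
        uint64_t hz = ticks_per_sec();
        uint64_t target = (ns * hz + 999999999ull) / 1000000000ull;
        uint64_t start = ticks_now();
        uint64_t spent;

        while ((spent = ticks_now() - start) < target)
                cpu_relax();
        return spent;
}
```

With a timer running at, say, 25 MHz, one tick is 40 ns, so pause_for_ns(1000) waits for only about 25 ticks and the granularity of the pause is limited accordingly.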

@Wathsala Vithanage — would appreciate your thoughts, especially if there are 
any micro-architectural nuances we should consider.

Regards,
Hemant

-----Original Message-----
From: Morten Brørup <[email protected]>
Sent: 03 June 2026 17:26
To: Wathsala Vithanage <[email protected]>; Hemant Agrawal
<[email protected]>; Sachin Saxena (OSS)
<[email protected]>
Cc: [email protected]; Maxime Leroy <[email protected]>
Subject: ARM v8 rte_power_pause
Importance: High

Hi Wathsala, Hemant and Sachin,

Over at the Grout project, we are discussing power management in the
context of 100 Gbit/s latency deadlines [1].

rte_power_pause() is not implemented for ARM v8 / Cortex-A72.
Syscalls such as nanosleep() have too much overhead, and cannot be used.

Any suggestions for a power-reducing method to make a CPU core "sleep" (i.e.
do nothing) for durations in the order of 1 microsecond?

[1]: https://github.com/DPDK/grout/pull/624#issuecomment-4602036364

-Morten
