https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119280
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
res = a[0] * a[0];
__asm volatile("rdcycle %0" : "=r"(end_cycle) : "r"(res) : "memory");
Passing res as an input operand will cause the second rdcycle to stay after the mult.
Otherwise you could just write the full thing in assembly.
Anyway, this is documented and has been for a long time.
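
For illustration, here is a minimal sketch of the input-operand approach wrapped in a helper. The names cycles_after and square_timed are hypothetical and not from the report, and the non-RISC-V branch is only a placeholder so the snippet compiles elsewhere; it does not read a real counter. The data dependency on res is what keeps the second read after the multiply. The first read is held in place here only because the load of a[0] cannot move across the "memory" clobber.

```c
/* Hypothetical helper: read the cycle counter with `dep` as an input
   operand, so the asm cannot be scheduled before `dep` is computed. */
static inline unsigned long cycles_after(unsigned long dep)
{
    unsigned long c;
#if defined(__riscv)
    __asm__ volatile("rdcycle %0" : "=r"(c) : "r"(dep) : "memory");
#else
    /* Placeholder for other targets (assumption): an empty asm that
       passes `dep` through a tied register. Not a real cycle counter. */
    __asm__ volatile("" : "=r"(c) : "0"(dep) : "memory");
#endif
    return c;
}

/* Hypothetical usage: time a single multiply. */
unsigned long square_timed(const unsigned long *a, unsigned long *elapsed)
{
    unsigned long start = cycles_after(0);
    unsigned long res = a[0] * a[0];
    /* `res` is an input here, so this read must follow the multiply. */
    unsigned long end = cycles_after(res);
    *elapsed = end - start;
    return res;
}
```

The value of *elapsed is only meaningful on RISC-V, where the helper actually executes rdcycle.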
