https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87608

--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #1)
> Note the compiler can evaluate the initialization loop and then also
> evaluate the effect of static_sort1 call, so the testcase might give
> misleading results. To avoid that, pass the address of 'a' to rdtsc, or
> introduce a compiler barrier with an asm:
> 
>   asm volatile ("" :: "r"(a) : "memory");
> 
> Furthermore, note that the CPU executes the rdtsc instruction without
> waiting for all preceding computations to complete. Using lfence just before
> rdtsc will ensure that rdtsc reads the cycle counter only after all
> preceding computations are done.

Thanks for the hint.

I added the memory barrier to the code; it didn't make any appreciable
difference to the timing.

Reply via email to