https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103066

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
E.g. the builtin is often used in a loop where the user does their own atomic
load first and decides what to do based on the loaded value.
Say for
float f;

void
foo ()
{
  #pragma omp atomic
  f += 3.0f;
}
with -O2 -fopenmp we emit:
  D.2113 = &f;
  D.2115 = __atomic_load_4 (D.2113, 0);
  D.2114 = D.2115;

  <bb 3> :
  D.2112 = VIEW_CONVERT_EXPR<float>(D.2114);
  _1 = D.2112 + 3.0e+0;
  D.2116 = VIEW_CONVERT_EXPR<unsigned int>(_1);
  D.2117 = .ATOMIC_COMPARE_EXCHANGE (D.2113, D.2114, D.2116, 4, 0, 0);
  D.2118 = REALPART_EXPR <D.2117>;
  D.2119 = D.2114;
  D.2114 = D.2118;
  if (D.2118 != D.2119)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 4> :
  return;
which is essentially
void
foo ()
{
  int x = __atomic_load_4 ((int *) &f, __ATOMIC_RELAXED), y;
  float g;
  do
    {
      __builtin_memcpy (&g, &x, 4);
      g += 3.0f;
      __builtin_memcpy (&y, &g, 4);
    }
  while (!__atomic_compare_exchange_n ((int *) &f, &x, y, false,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED));
}
Can you explain how your proposed change would improve this?  It would just
slow it down and make it larger.
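
For reference, the load-then-compare-exchange pattern discussed here can be written as a small standalone helper. This is only a sketch under assumptions: the name atomic_add_float and its signature are hypothetical (nothing GCC emits or provides), it assumes float and unsigned int are both 4 bytes, and it mirrors the pointer cast used in the expansion above rather than claiming to be the exact code the compiler generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: float and unsigned int have the same size.  */
_Static_assert (sizeof (float) == sizeof (unsigned int),
                "sketch assumes 4-byte float and unsigned int");

/* Hypothetical helper: atomically add INC to *P and return the value
   that was stored.  The initial relaxed load is done once, by the caller
   of the builtin, before the loop.  */
static float
atomic_add_float (float *p, float inc)
{
  assert (p != NULL);

  unsigned int old_bits
    = __atomic_load_n ((unsigned int *) p, __ATOMIC_RELAXED);
  unsigned int new_bits;
  float g;

  do
    {
      /* Reinterpret the bits as float, compute, convert back to bits.  */
      memcpy (&g, &old_bits, sizeof g);
      g += inc;
      memcpy (&new_bits, &g, sizeof g);
      /* On failure the builtin writes the current value of *P into
         old_bits, so the next iteration needs no separate reload.  */
    }
  while (!__atomic_compare_exchange_n ((unsigned int *) p, &old_bits,
                                       new_bits, false,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED));
  return g;
}
```

With float v = 1.0f, a call atomic_add_float (&v, 3.0f) leaves 4.0f in v in the uncontended case. The point of the shape is that the only atomic load outside the compare-exchange is the one before the loop.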
