https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80844
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                  |ASSIGNED
   Last reconfirmed|                             |2017-05-23
                 CC|                             |jakub at gcc dot gnu.org
          Component|target                       |tree-optimization
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
     Ever confirmed|0                            |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Uh.  .optimized:

float sumfloat_omp(const float*) (const float * arr)
{
  unsigned long ivtmp.22;
  vector(8) float D__lsm0.19;
  const vector(8) float vect__23.18;
  const vector(8) float vect__4.16;
  float stmp_sum_19.12;
  vector(8) float vect__18.10;
  float D.2841[8];
  vector(8) float _10;
  void * _77;
  unsigned long _97;

  <bb 2> [1.00%]:
  arr_13 = arr_12(D);
  __builtin_memset (&D.2841, 0, 32);
  _10 = MEM[(float *)&D.2841];
  ivtmp.22_78 = (unsigned long) arr_13;
  _97 = ivtmp.22_78 + 4096;

...

  <bb 4> [1.00%]:
  MEM[(float *)&D.2841] = vect__23.18_58;
  vect__18.10_79 = MEM[(float *)&D.2841];
  stmp_sum_19.12_50 = [reduc_plus_expr] vect__18.10_79;
  return stmp_sum_19.12_50;

well, that explains it ;)  Coming from

  <bb 7> [99.00%]:
  # i_33 = PHI <i_25(8), 0(6)>
  # ivtmp_35 = PHI <ivtmp_28(8), 1024(6)>
  _21 = GOMP_SIMD_LANE (simduid.0_14(D));
  _1 = (long unsigned int) i_33;
  _2 = _1 * 4;
  _3 = arr_13 + _2;
  _4 = *_3;
  _22 = D.2841[_21];
  _23 = _4 + _22;
  D.2841[_21] = _23;
  i_25 = i_33 + 1;
  ivtmp_28 = ivtmp_35 - 1;
  if (ivtmp_28 != 0)
    goto <bb 8>; [98.99%]

So we perform the reduction in memory, and LIM then performs store-motion on
it, but the memset isn't inlined early enough to rewrite the decl into SSA
(CCP from GOMP_SIMD_VF is missing).  In DOM we have

  __builtin_memset (&D.2841, 0, 32);
  _10 = MEM[(float *)&D.2841];

and we do not fold that.  If OMP SIMD always zeros the vector then it could
also emit the maybe-easier-to-optimize

  WITH_SIZE_EXPR<_3, D.2841> = {};

Of course gimple_fold_builtin_memset should simply be improved to optimize a
now constant-size memset to "= {}".  I'll have a look.
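
For reference, the testcase is presumably of roughly the following shape.  This
is a minimal sketch reconstructed from the IL above (the function name, the
const float * argument, the 1024-iteration count and the +-reduction can be
read off the dump; the exact pragma and the flags, e.g. -Ofast -fopenmp-simd
-mavx, are assumptions, not taken from the bug report):

  /* Hypothetical reconstruction, not the reporter's testcase verbatim.  */
  float sumfloat_omp (const float *arr)
  {
    float sum = 0.0f;
  #pragma omp simd reduction(+:sum)
    for (int i = 0; i < 1024; i++)
      sum += arr[i];   /* lowered to a per-lane D.2841[GOMP_SIMD_LANE] += */
    return sum;
  }

If gimple_fold_builtin_memset learns to turn the constant-size memset into
"D.2841 = {};", then the following full-size load _10 = MEM[(float *)&D.2841]
should presumably fold to a constant zero vector as well, removing the memory
round-trip for the initial value of the store-motion temporary.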