https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98137
Bug ID: 98137
Summary: Could use SLP to vectorize if split_constant_offset were smarter
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

void gemm_m10_n9_k17_ldA20_ldB20_ldC10_beta0_alignedA1_alignedC1_pfsigonly(
    const double* __restrict__ A, const double* __restrict__ B,
    double* __restrict__ C, const double* A_prefetch,
    const double* B_prefetch, const double* C_prefetch)
{
  unsigned int l_m = 0;
  unsigned int l_n = 0;
  unsigned int l_k = 0;

  for ( l_n = 0; l_n < 9; l_n++ ) {
    for ( l_m = 0; l_m < 10; l_m++ ) {
      C[(l_n*10)+l_m] = 0.0;
    }
    for ( l_k = 0; l_k < 17; l_k++ ) {
      for ( l_m = 0; l_m < 10; l_m++ ) {
        C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
      }
    }
  }
}

is nicely vectorized with BB SLP when you make l_{m,n,k} signed, but when
they are unsigned as above, split_constant_offset gives up when it sees

  C + ((unsigned long)(_286 + 1) * 8)

even though we have nice range info:

  # RANGE [0, 80] NONZERO 126
  _286 = l_n_189 * 10;
  # RANGE [0, 80] NONZERO 126
  _288 = (long unsigned int) _286;
  # RANGE [0, 640] NONZERO 1008
  _289 = _288 * 8;
  # PT = null { D.2428 } (nonlocal, restrict)
  _290 = C_37(D) + _289;
     ^^ C + ((unsigned long)_286 * 8)
  # RANGE [1, 81] NONZERO 127
  _296 = _286 + 1;
  # RANGE [1, 81] NONZERO 127
  _297 = (long unsigned int) _296;
  # RANGE [8, 648] NONZERO 1016
  _298 = _297 * 8;
  # PT = { D.2428 } (nonlocal, restrict)
  _299 = C_37(D) + _298;
     ^^ C + ((unsigned long)(_286 + 1) * 8)

Giving up means data-reference group analysis does not relate the two
accesses, and so we never consider SLP vectorization of the block.
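For reference, a minimal sketch of the workaround the report alludes to: the same kernel with signed induction variables, which BB SLP does vectorize because the pointer arithmetic is then computed in a type where split_constant_offset can separate the constant offset. The shortened name gemm_signed and the dropped prefetch parameters are choices made here for illustration; they are not part of the original testcase.

```c
/* Signed-index variant of the kernel from the report.  With int rather
   than unsigned int indices, the address computation for C[(l_n*10)+l_m]
   can be decomposed by split_constant_offset into a common base plus a
   constant byte offset, so data-reference group analysis relates the
   adjacent stores and BB SLP vectorization applies (per the report).
   Behavior is otherwise identical to the unsigned version.  */
static void gemm_signed(const double* __restrict__ A,
                        const double* __restrict__ B,
                        double* __restrict__ C)
{
  int l_m, l_n, l_k;
  for (l_n = 0; l_n < 9; l_n++) {
    for (l_m = 0; l_m < 10; l_m++)
      C[(l_n*10)+l_m] = 0.0;            /* beta = 0: clear the C column */
    for (l_k = 0; l_k < 17; l_k++)
      for (l_m = 0; l_m < 10; l_m++)
        C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
  }
}
```

Comparing the two variants under -O3 with -fdump-tree-slp-details (or -fopt-info-vec) shows the difference in whether the store group is formed.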