https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96888

            Bug ID: 96888
           Summary: Missing vectorization opportunity depending on integer
                    type
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pmenon at cs dot cmu.edu
  Target Milestone: ---

The loop in the following test case isn't vectorized:

#include <cstdlib>
#include <cstdint>

// Add x to each v[i] if bit 'i' is set in LSB-encoded bits.
void Test(int8_t *__restrict v, int8_t x, const uint64_t *bits, unsigned n) {
    for (int i = 0, num_words=(n+64-1)/64; i < num_words; i++) {
        const uint64_t word = bits[i];
        for (int j = 0; j < 64; j++) {
            v[i*64+j] += x * (bool)(word & (uint64_t(1)<<j));
        }
    }
}

<source>:7:9: missed: couldn't vectorize loop
<source>:7:9: missed: not vectorized: control flow in loop.
<source>:8:27: missed: couldn't vectorize loop
<source>:9:30: missed: not vectorized: relevant stmt not supported: _10 =
word_24 >> j_34;

However, changing one line (the one constructing the mask) from an explicit
uint64_t(1) to a plain 1 (which is not correct: the shift is done in 32-bit
int, so it overflows for j >= 31), we get auto-vectorization:

#include <cstdlib>
#include <cstdint>

// Add x to each v[i] if bit 'i' is set in LSB-encoded bits.
void Test(int8_t *__restrict v, int8_t x, const uint64_t *bits, unsigned n) {
    for (int i = 0, num_words=(n+64-1)/64; i < num_words; i++) {
        const uint64_t word = bits[i];
        for (int j = 0; j < 64; j++) {
            v[i*64+j] += x * (bool)(word & (1<<j)); // CHANGE HERE
        }
    }
}

Is this a known issue? Is there a reason why the former code can't be
vectorized, or do I need to restructure the code to get the compiler to
vectorize it?
