https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96888
Bug ID: 96888
Summary: Missing vectorization opportunity depending on integer type
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pmenon at cs dot cmu.edu
Target Milestone: ---

The loop in the following test case isn't vectorized:

#include <cstdlib>
#include <cstdint>

// Add x to each v[i] if bit 'i' is set in LSB-encoded bits.
void Test(int8_t *__restrict v, int8_t x, const uint64_t *bits, unsigned n) {
  for (int i = 0, num_words = (n + 64 - 1) / 64; i < num_words; i++) {
    const uint64_t word = bits[i];
    for (int j = 0; j < 64; j++) {
      v[i * 64 + j] += x * (bool)(word & (uint64_t(1) << j));
    }
  }
}

<source>:7:9: missed: couldn't vectorize loop
<source>:7:9: missed: not vectorized: control flow in loop.
<source>:8:27: missed: couldn't vectorize loop
<source>:9:30: missed: not vectorized: relevant stmt not supported: _10 = word_24 >> j_34;

However, if the one line constructing the mask is changed from an explicit uint64_t(1) to a plain 1 (which is not correct, since shifting a 32-bit int by 32 or more is undefined behavior), the loop is auto-vectorized:

#include <cstdlib>
#include <cstdint>

// Add x to each v[i] if bit 'i' is set in LSB-encoded bits.
void Test(int8_t *__restrict v, int8_t x, const uint64_t *bits, unsigned n) {
  for (int i = 0, num_words = (n + 64 - 1) / 64; i < num_words; i++) {
    const uint64_t word = bits[i];
    for (int j = 0; j < 64; j++) {
      v[i * 64 + j] += x * (bool)(word & (1 << j));  // CHANGE HERE
    }
  }
}

Is this a known issue? Is there a reason why the former code can't be vectorized, or do I need to restructure the code to get the compiler to vectorize it?
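For reference, the diagnostics above point at the variable-count 64-bit shift (word >> j, which GCC canonicalizes from word & (1 << j)) as the unsupported statement. One possible restructuring, sketched below, is to isolate the bit test from the byte update: first expand each word into a 0/1 byte mask, then do a plain elementwise multiply-add, which is the loop shape vectorizers generally handle. This is only a sketch, not part of the original report; TestExpanded and mask are names introduced here, and whether GCC 10 actually vectorizes either loop would need to be checked (e.g. with -fopt-info-vec):

#include <cstdint>

// Hypothetical restructuring: separate bit extraction from the byte update.
void TestExpanded(int8_t *__restrict v, int8_t x, const uint64_t *bits,
                  unsigned n) {
  for (unsigned i = 0, num_words = (n + 63) / 64; i < num_words; i++) {
    const uint64_t word = bits[i];
    int8_t mask[64];
    // Materialize each bit as a 0/1 byte. The variable shift still appears
    // here, but only in a small self-contained loop with no store to v.
    for (int j = 0; j < 64; j++)
      mask[j] = (word >> j) & 1;
    // Straight elementwise multiply-add over int8_t, with no control flow
    // and no shifts, matching the semantics of the original inner loop.
    for (int j = 0; j < 64; j++)
      v[i * 64 + j] += x * mask[j];
  }
}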