https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201
--- Comment #24 from Joel Yliluoma <bisqwit at iki dot fi> ---
The simple horizontal 8-bit add seems to work nicely. Very nice work.

However, the issue in the original bug report still seems unaddressed: the code snippet quoted below no longer gets any love from the SIMD optimization unless you explicitly say “#pragma omp simd”.

    #define num_words 2
    typedef unsigned long long E;
    E bytes[num_words];

    unsigned char sum()
    {
        E b[num_words] = {};
        //#pragma omp simd
        for(unsigned n=0; n<num_words; ++n)
        {
            // Calculate the sum of all bytes in a word
            E temp = bytes[n];
            temp += (temp >> 32);
            temp += (temp >> 16);
            temp += (temp >> 8);
            // Save that number in an array
            b[n] = temp;
        }
        // Calculate sum of those sums
        unsigned char result = 0;
        //#pragma omp simd
        for(unsigned n=0; n<num_words; ++n)
            result += b[n];
        return result;
    }

Compiler Explorer link: https://godbolt.org/z/XL3cIK
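For reference, here is a sketch of the same snippet with both pragmas switched on, i.e. the workaround the report says is currently needed. It is written as C, so `= {}` becomes `= {0}` (empty braces are not valid C before C23). Two assumptions to flag: the pragmas only take effect when compiling with GCC's `-fopenmp-simd` (or `-fopenmp`), and otherwise they are ignored; and the `reduction(+:result)` clause on the second loop is an addition not in the original snippet. It declares the accumulation into `result` as a reduction, which a bare `#pragma omp simd` does not.

```c
#define num_words 2
typedef unsigned long long E;
E bytes[num_words];

unsigned char sum(void)
{
    E b[num_words] = {0};

    /* Same shift-and-add folding as the snippet in the report;
       the pragma asks the compiler to vectorize this loop. */
#pragma omp simd
    for (unsigned n = 0; n < num_words; ++n) {
        E temp = bytes[n];
        temp += (temp >> 32);
        temp += (temp >> 16);
        temp += (temp >> 8);
        b[n] = temp;
    }

    /* Accumulate the low bytes; the reduction clause (not in the
       original snippet) marks `result` as a sum reduction. */
    unsigned char result = 0;
#pragma omp simd reduction(+:result)
    for (unsigned n = 0; n < num_words; ++n)
        result += b[n];
    return result;
}
```

Whether GCC actually vectorizes each loop, with and without the pragmas, can be checked by adding `-fopt-info-vec` to the compile command alongside `-O3`.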