https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369
--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Marcus Kool from comment #8)
> Can you confirm that the code has
> return __builtin_ctzl(v);

__inline__ long find_pos32( unsigned char ch, mycharset32 set )
{
   __m256i regchx256;
   __m256i regset256;
   long v;

   regchx256 = _mm256_set1_epi8( ch );
   regset256 = _mm256_loadu_si256( (__m256i const *) set );
   v = (unsigned int) _mm256_movemask_epi8( _mm256_cmpeq_epi8(regchx256,regset256) );
   if (v != 0L)
      return (long) __builtin_ctzl( v );
   return -1;
}

> Thanks for the patch, but the required cast to unsigned int is
> counter-intuitive and it is likely that nobody will use this cast in their
> code and hence miss the optimisation. Isn't there a more elegant solution?

No.