https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100973
Bug ID: 100973
Summary: gcc does not optimise based on knowing that `_mm256_movemask_ps` returns less than 255
Product: gcc
Version: og10 (devel/omp/gcc-10)
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: denis.yaroshevskij at gmail dot com
Target Milestone: ---

Options: -O3 -std=c++20 -DNDEBUG -mavx

Code:

```
#include <immintrin.h>

int masking_should_evaporate(__m256 values) {
  int top_bits = _mm256_movemask_ps(values);
  top_bits &= 255;
  return top_bits;
}
```

Godbolt: https://gcc.godbolt.org/z/a81qPWcon

In this code, `top_bits &= 255` has no effect: `_mm256_movemask_ps` collects one sign bit per float lane, so only the low 8 bits of its result can ever be set. Clang optimises the mask away:

```
vmovmskps eax, ymm0
vzeroupper
ret
```

GCC does not. This comes from real code.
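
For anyone who wants to check the range claim, here is a minimal sketch that tries all 256 sign patterns and confirms that masking with 255 never changes the result. The helper names `movemask_of_signs` and `mask_is_noop_for_all_sign_patterns` are hypothetical and not part of the original report. The `target("avx")` attribute is a GCC/Clang extension, used here only as an assumption so the snippet also builds without `-mavx`. Running it needs a CPU with AVX.

```cpp
#include <immintrin.h>

// Hypothetical helper (not from the report): builds a vector whose lane i is
// negative exactly when bit i of sign_bits is set, then returns its movemask.
__attribute__((target("avx")))
int movemask_of_signs(unsigned sign_bits) {
    float lanes[8];
    for (int i = 0; i < 8; ++i) {
        lanes[i] = ((sign_bits >> i) & 1u) ? -1.0f : 1.0f;
    }
    // Unaligned load, so the local array needs no special alignment.
    __m256 values = _mm256_loadu_ps(lanes);
    return _mm256_movemask_ps(values);
}

// Returns true when, for every one of the 256 sign patterns, the movemask
// equals the pattern itself and is unchanged by "& 255".
__attribute__((target("avx")))
bool mask_is_noop_for_all_sign_patterns() {
    for (unsigned bits = 0; bits < 256; ++bits) {
        int m = movemask_of_signs(bits);
        if (m != (m & 255) || m != static_cast<int>(bits)) {
            return false;
        }
    }
    return true;
}
```

This only checks the sign patterns at run time. It does not show anything about what either compiler knows at compile time, which is what the report is about.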