https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120141
--- Comment #3 from Wojciech Mula <wojciech_mula at poczta dot onet.pl> ---
Thank you for looking at this issue! I'm not going to argue, but let me show
the perspective of a programmer who has written a whole lot of x86 intrinsics
and now uses the RVV ones.

> The consensus was this is just how the intrinsics work.

This is not true for intrinsics from the x86 world (I'm not sure about the
AltiVec ones). And this is why I asked why the RVV ones do not behave
similarly to the x86 functions. Consider this example:

---test.c---
#include <tuple>
#include <iostream>
#include <immintrin.h>

__m128i shift_by_zero(__m128i x) {
    return _mm_srli_epi32(x, 0);
}

__m128i add_zero(__m128i x) {
    return _mm_add_epi32(x, _mm_setzero_si128());
}

__m128i mul_zero(__m128i x) {
    return _mm_mul_epi32(x, _mm_setzero_si128());
}
---eof---

It is compiled by `gcc -std=c++20 -march=tigerlake -O3` into
(https://godbolt.org/z/PxaGcexn9):

"shift_by_zero(long long __vector(2))":
        ret
"add_zero(long long __vector(2))":
        ret
"mul_zero(long long __vector(2))":
        vpxor   xmm0, xmm0, xmm0
        ret

> The intrinsic interface is working as designed. If you want to avoid nop
> codes, then don't pass arguments that result in nop operations to the
> intrinsics interfaces.

The problem is that you don't always write intrinsics directly. In C++
programs we use templates. For example, `_mm_srli_epi32` mentioned above
accepts only a compile-time constant, thus in C++ you'd have a template like:

template <size_t K>
__m128i shift_right_epi32(__m128i x) {
    return _mm_srli_epi32(x, K);
}

If we cannot assume that a compiler will simplify `shift_right_epi32<0>`,
then the implementation of the template must be aware of that special case:

template <size_t K>
__m128i shift_right_epi32(__m128i x) {
    if constexpr (K == 0) {
        return x;
    } else if constexpr (K >= 32) {
        return _mm_setzero_si128();
    } else {
        return _mm_srli_epi32(x, K);
    }
}

It's obviously doable, but it puts more burden on the programmer's side.
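
For reference, here is a minimal, self-contained sketch of the guarded template above together with a small sanity check. It assumes an x86-64 target (SSE2 only, so `<emmintrin.h>` is enough) and a C++17 compiler for `if constexpr`. The helpers `lane0` and `check_wrapper` are hypothetical names introduced only for this illustration; they are not part of any intrinsics API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_srli_epi32, _mm_set1_epi32, _mm_cvtsi128_si32

// Guarded wrapper: the trivial cases are resolved at compile time, so no
// shift instruction is emitted for them regardless of what the compiler
// does with the intrinsic itself.
template <std::size_t K>
__m128i shift_right_epi32(__m128i x) {
    if constexpr (K == 0) {
        return x;                    // shifting by zero is the identity
    } else if constexpr (K >= 32) {
        return _mm_setzero_si128();  // every bit of each lane is shifted out
    } else {
        return _mm_srli_epi32(x, static_cast<int>(K));
    }
}

// Hypothetical helper: read the lowest 32-bit lane as an unsigned value.
inline std::uint32_t lane0(__m128i v) {
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(v));
}

// Hypothetical helper: exercises the three branches of the wrapper.
inline void check_wrapper() {
    const __m128i x = _mm_set1_epi32(0x80);
    assert(lane0(shift_right_epi32<0>(x)) == 0x80u);  // identity branch
    assert(lane0(shift_right_epi32<4>(x)) == 0x08u);  // real shift
    assert(lane0(shift_right_epi32<32>(x)) == 0u);    // all-zero branch
}
```

Calling `check_wrapper()` from `main` exercises all three branches. The sketch only checks the results; whether the unguarded version is simplified to the same code is something one has to confirm by inspecting the generated assembly.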