https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117830
--- Comment #5 from Christoph Müllner <cmuellner at gcc dot gnu.org> ---
Thank you for reporting this!

I can reproduce this issue on x86_64 (I did not test on other architectures).
I have also confirmed that the suspected change (1c4d39ada33d) causes this by
validating that reverting the change fixes the miscompare.

An initial analysis showed that we have a total of four blends in CPU2006's
h264:
* 3x build_base_gcc43-64bit.0000/block.c.213t.forwprop4
* 1x build_base_gcc43-64bit.0000/macroblock.c.213t.forwprop4

Looking closer at the dump files of forwprop, the issue becomes apparent:
In find_sad_16x16 (macroblock.c), we merge two sequences that both utilize
three of four lanes.

  _230 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 2, 3, 2, 2 }>;
  _238 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 1, 0, 1, 1 }>;
  vect__108.3193_321 = _238 - _230;
  vect__107.3192_225 = _230 + _238;
  _317 = VEC_PERM_EXPR <vect__107.3192_225, vect__108.3193_321, { 0, 5, 2, 7 }>;
  // { 0, 5, 2, 7 } could be narrowed to { 0, 5, 0, 4 }

  _263 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 3, 2, 3, 3 }>;
  _294 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 0, 1, 0, 0 }>;
  vect__109.3191_252 = _294 - _263;
  vect__104.3190_257 = _263 + _294;
  _247 = VEC_PERM_EXPR <vect__104.3190_257, vect__109.3191_252, { 0, 5, 2, 7 }>;
  // { 0, 5, 2, 7 } could be narrowed to { 0, 5, 0, 4 }

This means the check if we utilize less than half of the lanes in a sequence is
wrong. Looking into the code shows that this is indeed the case.

I already have a fix that is currently being tested.
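
For readers who want to check the lane arithmetic by hand, here is a small Python model of the two sequences from the dump. It is only a sketch and not GCC code: vec_perm, add, sub and the v2xx/r3xx names are made up for illustration, the input values are arbitrary, and counting distinct result values is just one way to read "three of four lanes". It models VEC_PERM_EXPR <a, b, sel> as picking a[i] for selector index i < n and b[i - n] otherwise.

```python
def vec_perm(a, b, sel):
    """Model VEC_PERM_EXPR <a, b, sel>: index i < n picks a[i], else b[i - n]."""
    n = len(a)
    assert len(b) == n and all(0 <= i < 2 * n for i in sel)
    return [a[i] if i < n else b[i - n] for i in sel]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


# Arbitrary sample values standing in for vect__102.3189_302 (assumption).
v = [10, 20, 40, 80]

# First sequence from the dump.
v230 = vec_perm(v, v, [2, 3, 2, 2])
v238 = vec_perm(v, v, [1, 0, 1, 1])
v321 = sub(v238, v230)
v225 = add(v230, v238)
r317 = vec_perm(v225, v321, [0, 5, 2, 7])
r317_narrowed = vec_perm(v225, v321, [0, 5, 0, 4])

# Second sequence from the dump.
v263 = vec_perm(v, v, [3, 2, 3, 3])
v294 = vec_perm(v, v, [0, 1, 0, 0])
v252 = sub(v294, v263)
v257 = add(v263, v294)
r247 = vec_perm(v257, v252, [0, 5, 2, 7])
r247_narrowed = vec_perm(v257, v252, [0, 5, 0, 4])

# The narrowed selectors produce the same vectors as the original ones.
print(r317, r317 == r317_narrowed)  # [60, -70, 60, -20] True
print(r247, r247 == r247_narrowed)  # [90, -20, 90, -70] True

# Each result carries three distinct values, so the two sequences together
# need six lanes and cannot share one four-lane vector.
print(len(set(r317)), len(set(r247)))  # 3 3
```

With these inputs each sequence still produces three distinct values after narrowing, which is consistent with the observation that neither sequence fits into half of a four-lane vector.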