https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117830

--- Comment #5 from Christoph Müllner <cmuellner at gcc dot gnu.org> ---
Thank you for reporting this!

I can reproduce this issue on x86_64 (I did not test on other architectures).
I have also confirmed that the suspected change (1c4d39ada33d) causes this
by validating that reverting the change fixes the miscompare.

An initial analysis showed that we have a total of four blends in CPU2006's
h264:
* 3x build_base_gcc43-64bit.0000/block.c.213t.forwprop4
* 1x build_base_gcc43-64bit.0000/macroblock.c.213t.forwprop4

Looking closer at the dump files of forwprop, the issue becomes apparent:
In find_sad_16x16 (macroblock.c), we merge two sequences that both utilize
three of four lanes.

  _230 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 2, 3, 2, 2 }>;
  _238 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 1, 0, 1, 1 }>;
  vect__108.3193_321 = _238 - _230;
  vect__107.3192_225 = _230 + _238;
  _317 = VEC_PERM_EXPR <vect__107.3192_225, vect__108.3193_321, { 0, 5, 2, 7 }>;
  // { 0, 5, 2, 7 } could be narrowed to { 0, 5, 0, 4 }

  _263 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 3, 2, 3, 3 }>;
  _294 = VEC_PERM_EXPR <vect__102.3189_302, vect__102.3189_302, { 0, 1, 0, 0 }>;
  vect__109.3191_252 = _294 - _263;
  vect__104.3190_257 = _263 + _294;
  _247 = VEC_PERM_EXPR <vect__104.3190_257, vect__109.3191_252, { 0, 5, 2, 7 }>;
  // { 0, 5, 2, 7 } could be narrowed to { 0, 5, 0, 4 }
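
For illustration only (this is not code from GCC), the narrowing noted in
the two comments above can be checked with a small Python model of
VEC_PERM_EXPR. The helper names vec_perm and run_sequence and the sample
input vectors are assumptions made up for this sketch; only the selectors
and the add/sub structure are taken from the dump.

```python
def vec_perm(v0, v1, sel):
    """Model of VEC_PERM_EXPR <v0, v1, sel>: each selector element
    indexes into the concatenation of v0 and v1."""
    both = list(v0) + list(v1)
    return [both[i % len(both)] for i in sel]


def run_sequence(v, sel_a, sel_b, blend):
    """Model one sequence from the dump (hypothetical helper).

    a     = VEC_PERM_EXPR <v, v, sel_a>   (e.g. _230 or _263)
    b     = VEC_PERM_EXPR <v, v, sel_b>   (e.g. _238 or _294)
    diff  = b - a                         (e.g. vect__108 or vect__109)
    total = a + b                         (e.g. vect__107 or vect__104)
    result = VEC_PERM_EXPR <total, diff, blend>
    """
    a = vec_perm(v, v, sel_a)
    b = vec_perm(v, v, sel_b)
    diff = [y - x for x, y in zip(a, b)]
    total = [x + y for x, y in zip(a, b)]
    return vec_perm(total, diff, blend)


ORIGINAL = [0, 5, 2, 7]
NARROWED = [0, 5, 0, 4]

# Selector pairs taken from the two sequences in find_sad_16x16.
SEQUENCES = [
    ([2, 3, 2, 2], [1, 0, 1, 1]),  # _230 / _238
    ([3, 2, 3, 3], [0, 1, 0, 0]),  # _263 / _294
]


def narrowing_is_equivalent(v):
    """True if both blend selectors give the same result for input v."""
    return all(
        run_sequence(v, sa, sb, ORIGINAL) == run_sequence(v, sa, sb, NARROWED)
        for sa, sb in SEQUENCES
    )


if __name__ == "__main__":
    # Arbitrary sample inputs standing in for vect__102.3189_302.
    for sample in ([3, 11, 5, 2], [10, 20, 30, 40], [-7, 0, 9, 4]):
        print(sample, narrowing_is_equivalent(sample))
```

The model only shows that both blend selectors produce the same values for
these two sequences; it says nothing about how forwprop itself counts the
lanes.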

This means that the check for whether a sequence utilizes less than half of
the lanes is wrong.
Looking into the code confirms that this is indeed the case.
I already have a fix that is currently being tested.
