https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108164
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, it's correct.

short __attribute__((noipa)) foo(short f)
{
  while (f >= -1)
    f++;
  return f;
}

int main ()
{
  if (foo (-1) != -32768)
    __builtin_abort ();
  return 0;
}

shows exactly the same vectorization (-O3 -fno-vect-cost-model --param
vect-epilogues-nomask=0).

With the testcase in the description thread2 performs some threading which
isn't performed on this testcase though and that's a trigger.
-fdbg-cnt=registered_jump_thread:3-4 triggers it (3-3 and 4-4 are broken as
well).

The difference between -fdbg-cnt=registered_jump_thread:3-3 (broken) and
-fdisable-tree-thread2 (OK) is

--- a/a-t.c.254t.optimized 2022-12-19 13:43:00.654410480 +0100
+++ b/a-t.c.254t.optimized 2022-12-19 13:43:08.818523519 +0100
@@ -125,7 +125,7 @@

   <bb 4> [local count: 118111600]:
   # RANGE [irange] short int [-INF, -2]
-  # f_34 = PHI <-32768(3), f_36(5)>
+  # f_34 = PHI <-32767(3), f_36(5)>
   # RANGE [irange] int [-2147483647, 1]
   _4 = c.3_31 + 1;
   if (_4 != 1)

this difference appears at a-t.c.196t.dom3 which follows thread2.  We enter
dom3 with

  <bb 15> [local count: 105119324]:
  # f_71 = PHI <f_87(14), _50(6), f_26(8), f_40(9), f_6(10), f_46(11), f_104(12), f_108(13)>

  <bb 16> [local count: 118111600]:
  # RANGE [irange] short int [-INF, -2]
  # f_34 = PHI <f_71(15), f_36(17)>

and the dom3 dump has things like

Optimizing block #9

LKUP STMT f.1_96 = PHI <f.1_60, 32767>
2>>> STMT f.1_96 = PHI <f.1_60, 32767>
<<<< STMT f.1_96 = PHI <f.1_60, 32767>
Optimizing statement _9 = f.1_96 + 2;
  Replaced 'f.1_96' with constant '32767'
gimple_simplified to _9 = 32769;
  Folded to: _9 = 32769;
_9 : global value re-evaluated to [irange] UNDEFINED
LKUP STMT _9 = 32769
==== ASGN _9 = 32769
Optimizing statement f_40 = (short int) _9;
  Replaced '_9' with constant '32769'
gimple_simplified to f_40 = -32767;
  Folded to: f_40 = -32767;
f_40 : global value re-evaluated to [irange] UNDEFINED
LKUP STMT f_40 = -32767

Something goes wrong here.
For example for _9 = 32769; we have [irange] unsigned short [1, 32768] as
global range and gimple_ranger::update_stmt will update that to UNDEFINED.

That bogus value comes from cprop_into_successor_phis where we have a
SSA_NAME_VALUE of -32767 recorded for f_71.  The only place I see is

0>>> COPY f_71 = -32767
0>>> COPY f_34 = -32767
LKUP STMT _4 = c.3_31 plus_expr 1
LKUP STMT _4 ne_expr 1
 Registering killing_def (path_oracle) _4
 Registering value_relation (path_oracle) (_4 > c.3_31) (root: bb9)
<<<< COPY f_34 = -32767
<<<< COPY f_71 = -32767

but as you can see we revert that again.  The value pops in again from
record_equivalences_from_phis when visiting BB 15 via

      /* If we managed to iterate through each PHI alternative without
         breaking out of the loop, then we have a PHI which may create
         a useful equivalence.  We do not need to record unwind data for
         this, since this is a true assignment and not an equivalence
         inferred from a comparison.  All uses of this ssa name are dominated
         by this assignment, so unwinding just costs time and space.  */
      if (i == gimple_phi_num_args (phi))
        {
          if (may_propagate_copy (lhs, rhs))
            set_ssa_name_value (lhs, rhs);

because just one edge is marked EDGE_EXECUTABLE (9 -> 15).  That means the
value computed in BB9 is wrong.  That matches the UNDEFINED global range
result seen above.

I _think_ what may go wrong is that we emit

  <bb 6> [local count: 94607391]:
  _50 = BIT_FIELD_REF <vect_f_27.24_52, 16, 112>;
  niters_vector_mult_vf.19_74 = bnd.18_75 << 3;
  _72 = (short int) niters_vector_mult_vf.19_74;
  tmp.20_73 = f_36 + _72;
  if (niters_vector_mult_vf.19_74 == niters.17_100)
    goto <bb 15>; [12.50%]
  else
    goto <bb 7>; [87.50%]

  <bb 7> [local count: 82781467]:
  # f_92 = PHI <tmp.20_73(6)>
  # RANGE [irange] unsigned short [0, 32767][+INF, +INF]
  f.1_98 = (unsigned short) f_92;

see how we replace the final value with something computed in signed
arithmetic.  This is also visible in my shorter testcase.

I have a patch.