https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103721
Bug ID: 103721 Summary: [12 regression] wrong code generated for loop with conditional (jump threading?) Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: s...@li-snyder.org Target Milestone: --- hi - With a recent checkout of gcc12 (20211214), on a x86_64-pc-linux-gnu host, the following source is miscompiled with -O2: -- x.cc -------------------------------------------------- bool flag(); const int* bar(); const int* f (const int* world) { const int* searchVolume = world; const int* currentVolume = nullptr; while (currentVolume != searchVolume && searchVolume) { currentVolume = searchVolume; if (flag()) searchVolume = bar(); } return (currentVolume); } ---------------------------------------------------------- This can be demonstrated with this test harness: -- test.cc ------------------------------------------------- #include <stdio.h> const int* f (const int* world); bool flag() { return true; } int ii[3] = {0}; int ipos = 0; const int *bar() { if (ipos == 2) return nullptr; return &ii[++ipos]; } int main() { const int* i = f (&ii[0]); printf ("%d\n", i - &ii[0]); return 0; } ------------------------------------------------------------ I expect this to print 2, which it does when compiled with -O0 or O1. But with -O2: $ g++ -c -O2 x.cc -o x.o $ g++ -o test test.cc x.o $ ./test 0 Here is the generated code for the function f (directives removed for readability): _Z1fPKi: .LFB0: pushq %rbx movq %rdi, %rbx testq %rdi, %rdi je .L2 call _Z4flagv testb %al, %al jne .L10 .L2: movq %rbx, %rax popq %rbx ret .L10: call _Z3barv movq %rbx, %rax popq %rbx ret As one can see, in the generated code, the loop is not executed more than once. It appears as if the fact that searchVolume can be changed within the if statement is being lost --- if the conditional is commented out, then things work as expected. Things seem to go bad by at least the threadfull1 pass. In the previous pass, 110t.mergephi2, we have: ----------------------------------------------------------------- ;; Function f (_Z1fPKi, funcdef_no=0, decl_uid=2370, cgraph_uid=1, symbol_order=0) Removing basic block 6 const int * f (const int * world) { const int * currentVolume; const int * searchVolume; bool _1; bool _2; bool _3; bool _14; const int * _16; <bb 2> [local count: 118111600]: goto <bb 8>; [100.00%] <bb 3> [local count: 955630225]: _14 = flag (); if (_14 != 0) goto <bb 4>; [33.00%] else goto <bb 8>; [67.00%] <bb 4> [local count: 315357972]: _16 = bar (); <bb 8> [local count: 1073741824]: # searchVolume_4 = PHI <_16(4), world_7(D)(2), searchVolume_4(3)> # currentVolume_5 = PHI <searchVolume_4(4), 0B(2), searchVolume_4(3)> _1 = searchVolume_4 != currentVolume_5; _2 = searchVolume_4 != 0B; _3 = _1 & _2; if (_3 != 0) goto <bb 3>; [89.00%] else goto <bb 7>; [11.00%] <bb 7> [local count: 118111600]: return currentVolume_5; } ----------------------------------------------------------------- which does seem to represent the loop correctly. But in 111t.threadfull1, this has been changed to: ------------------------------------------------------------------ ;; Function f (_Z1fPKi, funcdef_no=0, decl_uid=2370, cgraph_uid=1, symbol_order=0) Disambiguating loop 1 with multiple latches Merged latch edges of loop 1 ;; 2 loops found ;; ;; Loop 0 ;; header 0, latch 1 ;; depth 0, outer -1 ;; nodes: 0 1 2 3 4 8 9 7 ;; ;; Loop 1 ;; header 9, latch 8 ;; depth 1, outer 0 ;; nodes: 9 8 4 3 ;; 2 succs { 9 } ;; 3 succs { 4 8 } ;; 4 succs { 8 } ;; 8 succs { 9 } ;; 9 succs { 3 7 } ;; 7 succs { 1 } Jump threading proved probability of edge 9->7 too small (it is 11.0% (guessed) should be always (guessed)) SSA replacement table N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j searchVolume_12 -> { searchVolume_4 } currentVolume_17 -> { currentVolume_5 } .MEM_18 -> { .MEM_6 } _19 -> { _1 } _20 -> { _2 } _21 -> { _3 } currentVolume_22 -> { currentVolume_5 } .MEM_23 -> { .MEM_6 } Incremental SSA update started at block: 9 Number of blocks in CFG: 11 Number of blocks to update: 6 ( 55%) Merging blocks 2 and 9 Merging blocks 8 and 10 const int * f (const int * world) { const int * currentVolume; const int * searchVolume; bool _1; bool _2; bool _3; bool _14; const int * _16; bool _19; bool _20; bool _21; <bb 2> [local count: 118111600]: _1 = world_7(D) != 0B; _2 = world_7(D) != 0B; _3 = _1 & _2; if (_3 != 0) goto <bb 3>; [97.00%] else goto <bb 6>; [3.00%] <bb 3> [local count: 955630225]: _14 = flag (); if (_14 != 0) goto <bb 4>; [33.00%] else goto <bb 5>; [67.00%] <bb 4> [local count: 315357972]: _16 = bar (); <bb 5> [local count: 955630224]: # searchVolume_11 = PHI <_16(4), world_7(D)(3)> # currentVolume_8 = PHI <world_7(D)(4), world_7(D)(3)> _19 = currentVolume_8 != searchVolume_11; _20 = searchVolume_11 != 0B; _21 = _19 & _20; <bb 6> [local count: 118111600]: # currentVolume_22 = PHI <0B(2), currentVolume_8(5)> return currentVolume_22; } ------------------------------------------------------------------ which has no backwards branch.