On Fri, Apr 25, 2025 at 2:31 PM ywgrit via Gcc <gcc@gcc.gnu.org> wrote: > > I encountered one problem with loop-im pass. > I compiled the program dhry2reg which belongs to unixbench( > https://github.com/kdlucas/byte-unixbench). > > The gcc used > gcc (GCC) 12.3.0 > > The commands executed as following > make > ./Run -c -i 1 dhry2reg > > The results are shown below. > Dhrystone 2 using register variables 0.1 lps (10.0 s, 1 > samples) > > System Benchmarks Partial Index BASELINE RESULT INDEX > Dhrystone 2 using register variables 116700.0 0.1 0.0 > ======== > System Benchmarks Index Score (Partial Only) 10.0 > > Obviously, the "INDEX" is abnormal. > I wrote a demo named dhry.c based on the dhry2reg logic. > > // dhry.c > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #include <signal.h> > #include <unistd.h> > int run_index; > > > typedef struct record { > struct record *next_rec; > int i; > } record, *pointer; > > pointer global_pointer, next_global_pointer; > > > void report() { > printf("report:%d\n", run_index); > exit(0); > } > int main() { > printf("%d\n", run_index); > > global_pointer = (pointer )malloc(sizeof(struct record)); > next_global_pointer = (pointer )malloc(sizeof(struct record)); > global_pointer->next_rec = next_global_pointer; > > > signal(SIGALRM, report); > /* get the clock running */ > alarm(1); > char i[4]; > // no exit > for(run_index=0;;++run_index){ > *global_pointer->next_rec = *global_pointer; > } > } > > > gcc -O3 -fdump-tree-all -fdump-tree-all-graph dhry.c -o dhry > ./dhry > 0 > report:0 > > gcc -O3 -fdump-tree-all -fdump-tree-all-graph dhry.c -o dhry > -fno-tree-loop-im > ./dhry > 0 > report:1367490190 > > The generated gimple are shown below. > dhry.c.140t.laddress: > <bb 2> [local count: 10631108]: > run_index.1_1 = run_index; > printf ("%d\n", run_index.1_1); > _2 = malloc (16); > global_pointer = _2; > _3 = malloc (16); > next_global_pointer = _3; > MEM[(struct record *)_2].next_rec = _3; > signal (14, report); > alarm (1); > run_index = 0; > > <bb 3> [local count: 1073741824]: > global_pointer.4_4 = global_pointer; > _5 = global_pointer.4_4->next_rec; > *_5 = *global_pointer.4_4; > run_index.6_6 = run_index; > _7 = run_index.6_6 + 1; > run_index = _7; > goto <bb 3>; [100.00%] > > dhry.c.142t.lim2: > <bb 2> [local count: 10631108]: > run_index.1_1 = run_index; > printf ("%d\n", run_index.1_1); > _2 = malloc (16); > global_pointer = _2; > _3 = malloc (16); > next_global_pointer = _3; > MEM[(struct record *)_2].next_rec = _3; > signal (14, report); > alarm (1); > run_index = 0; > global_pointer.4_4 = global_pointer; > run_index_lsm.13_22 = run_index; > > <bb 3> [local count: 1073741824]: > # run_index_lsm.13_21 = PHI <run_index_lsm.13_22(2), > run_index_lsm.13_23(4)> > _5 = global_pointer.4_4->next_rec; > *_5 = *global_pointer.4_4; > run_index.6_6 = run_index_lsm.13_21; > _7 = run_index.6_6 + 1; > run_index_lsm.13_23 = _7; > > > > > In loop-im pass, store-motion insert run_index_lsm = run_index before loop > and replace all references of run_index with run_index_lsm. And the > following > code writes run_index_lsm back to run_index. > /* Materialize ordered store sequences on exits. */ > FOR_EACH_VEC_ELT (exits, i, e) > { > edge append_cond_position = NULL; > edge last_cond_fallthru = NULL; > if (i < sms.length ()) > { > gcc_assert (sms[i].first == e); > execute_sm_exit (loop, e, sms[i].second, aux_map, sm_ord, > append_cond_position, last_cond_fallthru); > sms[i].second.release (); > } > if (!unord_refs.is_empty ()) > execute_sm_exit (loop, e, unord_refs, aux_map, sm_unord, > append_cond_position, last_cond_fallthru); > /* Commit edge inserts here to preserve the order of stores > when an exit exits multiple loops. */ > gsi_commit_one_edge_insert (e, NULL); > } > > But run_index_lsm is not wrote back to run_index as there is no exit in > this loop. > so run_index will be zero after store motion is executed. > > Is inifinite loop a undefined behavior, so it is permitted if run_index == > 0? > If not, I think we should not apply store motion on loop with no exit.
GCC assumes you cannot observe the store since the loop is infinite. There's also -ffinite-loops which would inhibit such infinite loops and the C++ standard which does so by default. GCC will also assume the loop must terminate since otherwise the run_index increment will eventually overflow and trigger undefined behavior this way. So I don't think the testcase as written is a good motivating example. If it were written as while (1) { *global_pointer->next_rec = *global_pointer; } one might argue this is a QOI issue worth fixing. I'm not sure what drystone tries to measure here (load + store per second?) - it seems to me that with an optimizing compiler it would rather want to use volatile accesses here. Richard.