https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Mel Chen from comment #8)
> Sorry for using the bad example to describe the problem I am facing. Let me
> clarify my question with a more precise example.
>
> void array_mul(int N, int *C, short *A, short *B) {
>   int i, j;
>   for (i = 0; i < N; i++) {
>     C[i] = 0; // Will be transformed to __builtin_memset
>     for (j = 0; j < N; j++) {
>       C[i] += (int)A[i * N + j] * (int)B[j];
>     }
>   }
> }
>
> If I compile the case with -O2 -fno-tree-loop-distribute-patterns, the store
> operation 'C[i] = 0' can be eliminated by dead store elimination (dse3). But
> without -fno-tree-loop-distribute-patterns, it will be transformed to memset
> by loop distribution (ldist) because ldist executes before dse3. Finally the
> memset will not be eliminated.
>
> Another point is if there are other operations in the same level loop as the
> store operation, is it really beneficial to do loop distribution and then
> convert to builtin function?

Sure, it shows a cost-modeling issue, given that loop distribution usually
merges partitions which touch the same memory stream (but IIRC maybe only
for loads). But more to the point, we're failing to eliminate the dead
store, which should be apparent at least after PRE: LIM2 applies store
motion, but only PRE elides the resulting load of C[i]. Usually DCE and
DSE come in pairs, but after PRE we have DCE and CDDCE without an
accompanying DSE; the next DSE only runs after loop distribution.

This means we should eventually do

diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..be3a9becde0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -254,6 +254,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
       NEXT_PASS (pass_tsan);
+      NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
       /* Pass group that runs when 1) enabled, 2) there are loops
 	 in the function.  Make sure to run pass_fix_loops before
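To make the dead-store argument concrete, here is a hand-written C sketch. The functions array_mul_after_lim_pre, array_mul_after_dse and kernels_agree, and the constant MAX_N, are hypothetical names introduced for illustration; the two rewritten kernels only approximate at source level what LIM2 store motion plus PRE, and then a DSE pass, would leave behind, and are not GCC's actual GIMPLE output. The reasoning assumes type-based (strict) aliasing, which GCC enables at -O2, so the short loads from A and B cannot read the int stored to C[i], and nothing reads the zero before it is overwritten.

```c
#include <string.h>

/* Kernel from comment #8. */
void array_mul(int N, int *C, short *A, short *B) {
  int i, j;
  for (i = 0; i < N; i++) {
    C[i] = 0; /* ldist turns this store into __builtin_memset */
    for (j = 0; j < N; j++) {
      C[i] += (int)A[i * N + j] * (int)B[j];
    }
  }
}

/* Hypothetical source-level picture after LIM2 store motion and PRE:
   the accumulator lives in a local, PRE forwards the stored 0 instead of
   reloading C[i], and the single store of the result is sunk out of the
   inner loop. */
void array_mul_after_lim_pre(int N, int *C, short *A, short *B) {
  for (int i = 0; i < N; i++) {
    C[i] = 0;    /* dead: overwritten by C[i] = acc with no read in between */
    int acc = 0; /* PRE replaced the load of C[i] with the known value 0 */
    for (int j = 0; j < N; j++)
      acc += (int)A[i * N + j] * (int)B[j];
    C[i] = acc;  /* store emitted once on inner-loop exit */
  }
}

/* Hypothetical result once a DSE pass runs before ldist: the zero store is
   gone, so there is nothing left for ldist to turn into memset. */
void array_mul_after_dse(int N, int *C, short *A, short *B) {
  for (int i = 0; i < N; i++) {
    int acc = 0;
    for (int j = 0; j < N; j++)
      acc += (int)A[i * N + j] * (int)B[j];
    C[i] = acc;
  }
}

enum { MAX_N = 8 };

/* Returns 1 if all three variants produce identical output for an
   n-by-n input, 0 otherwise (or if n is out of range). */
int kernels_agree(int n) {
  short A[MAX_N * MAX_N], B[MAX_N];
  int c0[MAX_N], c1[MAX_N], c2[MAX_N];
  if (n < 0 || n > MAX_N)
    return 0;
  for (int k = 0; k < n * n; k++)
    A[k] = (short)(k - 5);
  for (int k = 0; k < n; k++)
    B[k] = (short)(3 * k + 1);
  /* Pre-fill outputs with non-zero bytes so that a variant which relied on
     the removed zero store would produce a mismatch. */
  memset(c0, 0x7f, sizeof c0);
  memset(c1, 0x7f, sizeof c1);
  memset(c2, 0x7f, sizeof c2);
  array_mul(n, c0, A, B);
  array_mul_after_lim_pre(n, c1, A, B);
  array_mul_after_dse(n, c2, A, B);
  return memcmp(c0, c1, sizeof c0) == 0 && memcmp(c0, c2, sizeof c0) == 0;
}
```

Calling kernels_agree(n) for any n up to MAX_N should return 1: dropping the zero store does not change the result, which is exactly why DSE is allowed to remove it once store motion and PRE have exposed it.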