https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |memory-hog
   Last reconfirmed|                            |2014-09-08
                 CC|                            |rth at gcc dot gnu.org
             Blocks|                            |47344
     Ever confirmed|0                           |1
            Summary|32-bit gcc uses excessive   |[4.8/4.9/5 Regression]
                   |memory during dead store    |32-bit gcc uses excessive
                   |elimination with -fPIC      |memory during dead store
                   |                            |elimination with -fPIC
   Target Milestone|---                         |4.8.4

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  Possibly excessive value_rtx expansion from dse.c:canon_address.

The testcase is a function with a single basic-block and 30000 stores
(the static initializer function) with the pattern

  D.94947 = (struct Z *) &Zs;
  D.94947->x1_ = &Xs1[0];
  D.94947->x2_ = 1;
  D.94947->x3_ = 1;
  temp.20397 = D.94947 + 12;
  temp.20397->x1_ = &Xs90[0];
  temp.20397->x2_ = 2;
  temp.20397->x3_ = 1;
  ...
  temp.30587 = temp.30586 + 12;
  temp.30587->x1_ = &Xs611[0];
  temp.30587->x2_ = 2;
  temp.30587->x3_ = 1;

thus groups of three stores followed by an address adjustment.  The above
is from a GCC 4.3 IL dump.  The GCC 4.9 IL dump shows

  MEM[(struct Z *)&Zs].x1_ = &Xs1;
  MEM[(struct Z *)&Zs].x2_ = 1;
  MEM[(struct Z *)&Zs].x3_ = 1;
  MEM[(struct Z *)&Zs + 12B].x1_ = &Xs90;
  MEM[(struct Z *)&Zs + 12B].x2_ = 2;
  MEM[(struct Z *)&Zs + 12B].x3_ = 1;
  MEM[(struct Z *)&Zs + 24B].x1_ = &Xs91;
  MEM[(struct Z *)&Zs + 24B].x2_ = 2;
  MEM[(struct Z *)&Zs + 24B].x3_ = 1;
  ...
  MEM[(struct Z *)&Zs + 122292B].x1_ = &Xs611;
  MEM[(struct Z *)&Zs + 122292B].x2_ = 2;
  MEM[(struct Z *)&Zs + 122292B].x3_ = 1;

which causes each store to be expanded to something like

(insn 71298 71297 71299 2 (set (reg:SI 40822)
        (const:SI (unspec:SI [
                    (symbol_ref:SI ("_ZL2Zs") [flags 0x2] <var_decl 0x7ffff5c4a098 Zs>)
                ] UNSPEC_GOTOFF))) t.C:5 -1
     (nil))
(insn 71299 71298 71300 2 (set (mem/c:SI (plus:SI (plus:SI (reg:SI 3 bx)
                    (reg:SI 40822))
                (const_int 122216 [0x1dd68])) [4 MEM[(struct Z *)&Zs + 122208B].x3_+0 S4 A64])
        (const_int 1 [0x1])) t.C:5 -1
     (nil))

I suppose "lowering" PIC addresses somewhere before RTL expansion (and
CSEing the addresses) would help here.  Lowering as in not treating them
as is_gimple_min_invariant.

With 4.3 we have a single address load for &Zs (but of course we retain the
loads of the individual stored addresses, so there are still very many PIC
addresses in this function).

Why is CSE not able to CSE the UNSPEC_GOTOFF addresses?  Does it not do so
because of the (const:SI ...) wrapping (as in, it is not considered
profitable)?  Or is it confused by the other intermediate UNSPEC_GOTOFF uses?

That said, cse1 should be able to turn the RTL into something equivalent to
what 4.3 produced.
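
For readers without the attachment, the following is a hypothetical sketch of source that could produce stores of the shape shown in the dumps. It is not the actual testcase. Only the names Z, Zs, Xs1, Xs90, Xs91 and the members x1_, x2_, x3_ come from the dumps. The member types, the constructor, and the array sizes are assumptions, chosen so that one element is 12 bytes on a 32-bit target and so that the array may need a dynamic (static initializer function) initialization.

```cpp
// Hypothetical reduction of the testcase shape; NOT the real t.C.
// Assumed: member types, the constructor, and the sizes of the Xs arrays.
struct Z {
  // Assumption: a user-provided, non-constexpr constructor, so the compiler
  // may emit a static initializer function instead of constant data.
  Z(int *x1, int x2, int x3) : x1_(x1), x2_(x2), x3_(x3) {}
  int *x1_;  // receives &XsN[0]; needs a PIC address under -fPIC
  int x2_;
  int x3_;
};  // 4 + 4 + 4 = 12 bytes on a 32-bit target, matching the +12B stride

static int Xs1[1];
static int Xs90[1];
static int Xs91[1];

// Each element contributes one group of three stores (x1_, x2_, x3_).
// The dumps run to offset 122292B, i.e. element index 122292 / 12 = 10191,
// so the real array has on the order of ten thousand elements.
static Z Zs[] = {
  Z(&Xs1[0], 1, 1),
  Z(&Xs90[0], 2, 1),
  Z(&Xs91[0], 2, 1),
  // ... the real testcase continues in the same pattern ...
};
```

Whether a given compiler version actually emits a runtime initializer for this sketch, rather than folding it into constant data, depends on the optimization level and is not something the comment above establishes.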