https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mliska at suse dot cz,
                   |                            |segher at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
The first "bisection" possibly points at r190594 which limited the work FRE
does (with the effect of leaving some things unoptimized).  More precise
bisection would be appreciated.

I see with GCC 7.1 and -O1 (recommended for machine-generated code) a use of
3.7GB of RAM.  The code contains a very large basic-block.  I do remember
compile-time/memory-hog PRs for this code style.

compile-time analysis using perf highlights:

Samples: 680K of event 'cycles:pp', Event count (approx.): 607299351135
Overhead  Command  Shared Object  Symbol
  26.58%  f951     f951           [.] refers_to_regno_p
   9.64%  f951     f951           [.] reg_overlap_mentioned_p
   7.08%  f951     f951           [.] find_hard_regno_for_1
   4.42%  f951     f951           [.] reg_used_between_p
   1.90%  f951     f951           [.] get_last_value_validate

which probably means we're doing some quadratic amount of work on use->def
chains inside the BB.  With call traces:

+   48.54%     1.49%  f951     f951           [.] try_combine
-   25.59%    25.55%  f951     f951           [.] refers_to_regno_p
   - refers_to_regno_p
      - 16.98% reg_overlap_mentioned_p
         - 16.00% reg_used_between_p
              can_combine_p
              try_combine
...
      - 4.69% refers_to_regno_p
         - 4.66% reg_overlap_mentioned_p
            - 4.36% reg_used_between_p
                 can_combine_p
                 try_combine

so it's combine (at least at -O1) and I can also imagine that's using up the
memory in its attempts to simplify & match up stuff as it uses GC memory for
all the copying that involves IIRC.

Segher?

int
reg_used_between_p (const_rtx reg, const rtx_insn *from_insn,
                    const rtx_insn *to_insn)
{
  rtx_insn *insn;

  if (from_insn == to_insn)
    return 0;

  for (insn = NEXT_INSN (from_insn); insn != to_insn; insn = NEXT_INSN (insn))
    if (NONDEBUG_INSN_P (insn)
        && (reg_overlap_mentioned_p (reg, PATTERN (insn))
            || (CALL_P (insn) && find_reg_fusage (insn, USE, reg))))
      return 1;
  return 0;
}

so that just walks the BB instead of, say, using DF uses (if available during
combine), or somehow recording "distance" between two rtx_insns to be able to
cap the amount of work done (and conservatively return true).  After all it's
going to end up combining very "distant" instructions here (remember, gigantic
basic-block).
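
To make the "cap the amount of work done" idea concrete, here is a minimal
standalone sketch.  It is not GCC code and not a proposed patch: struct
toy_insn, toy_reg_used_between_capped and the max_walk parameter are all
hypothetical names, and each toy insn reads at most one register.  It only
shows the shape of a bounded walk that gives up and conservatively answers
"used" once the two insns are too far apart.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an insn in a basic block.  This is NOT GCC's rtx_insn;
   it is a hypothetical model used only for illustration.  */
struct toy_insn
{
  struct toy_insn *next;  /* next insn in the block, or NULL at the end */
  int uses_reg;           /* register number this insn reads, or -1 */
};

/* Return true if REG is read by an insn strictly between FROM and TO.
   Unlike an uncapped walk, inspect at most MAX_WALK insns; if the cap is
   exceeded, or the chain ends before TO is reached, conservatively
   return true so the caller refuses the transformation.  */
bool
toy_reg_used_between_capped (int reg, const struct toy_insn *from,
                             const struct toy_insn *to, int max_walk)
{
  int walked = 0;

  if (from == to)
    return false;

  for (const struct toy_insn *insn = from->next; insn != to;
       insn = insn->next)
    {
      if (insn == NULL)
        return true;            /* TO not reachable: be conservative */
      if (++walked > max_walk)
        return true;            /* too distant: give up, assume used */
      if (insn->uses_reg == reg)
        return true;            /* genuine use found */
    }
  return false;
}
```

With a cap like this each query costs at most max_walk steps no matter how
large the basic block is, so the total work stays linear in the number of
queries.  The price is that combinations across distant insns are rejected
even when they would have been valid.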