> On Mon, Oct 8, 2012 at 11:04 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> >> On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen <de...@google.com> wrote:
> >> > Attached is the updated patch. Yes, if we add a VRP pass before
> >> > profile pass, this patch would be unnecessary. Should we add a VRP
> >> > pass?
> >>
> >> No, we don't want VRP in early optimizations.
> >
> > I am not quite sure about that. VRP
> > 1) makes branch prediction work better by doing jump threading early
>
> Well ... but jump threading may need basic-block duplication which may
> increase code size. Also VRP and FRE have pass ordering issues.
>
> > 2) is, after FRE, most effective tree pass on removing code by my profile
> > statistics.
>
> We also don't have DSE in early opts. I don't want to end up with the
> situation that we do everything in early opts ... we should do _less_ there
> (but eventually iterate properly when processing cycles).
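To make the kind of transform under discussion concrete, here is a tiny
made-up function (not taken from Dehao's patch) where VRP's early jump
threading would fire:

  int
  f (int n, int b)
  {
    int x;

    if (n > 10)
      x = n;   /* on this path x > 10 */
    else
      x = 0;   /* on this path x == 0 */

    /* The outcome of this test is known on both incoming paths, so VRP
       can thread both edges -- at the cost of duplicating the block
       containing the test, which is the code-size concern above.  */
    if (x > 5)
      return b + x;
    return b - x;
  }

If the threading only happens late, the early profile estimate and the
inline size estimate both still see the second branch.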
Yep, I am not quite sure about the most sane variant. Missed simple jump
threading in early opts definitely confuses both the profile estimate and the
inline size estimates, but I am also not thrilled by adding more passes to
early opts at all. Also, last time I looked into this, CCP missed a lot of CCP
opportunities, making VRP artificially look more useful.

I have a patch that somewhat improves profile updating after jump threading
(i.e. it re-does the profile for the simple cases), but jump threading is
still the most common reason for the profile becoming inconsistent after
expand.

On a related note, with -fprofile-report I can easily track how much code each
pass in the queue removed (a sketch of the invocation is in the PS below). I
was thinking about running this on Mozilla at -O1 and removing those passes
that did almost nothing. Those are mostly re-run passes, both at the GIMPLE
and the RTL level, and our pass manager is not terribly friendly for
controlling passes per-repetition. With the introduction of the -Og pass
queue, do you think introducing an -O1 pass queue for the late tree passes
(it would be quite short) is sane? What about the RTL level? I guess we can
split the queues for the RTL optimizations, too; all optimization passes prior
to register allocation are sort of optional, and I guess there are -Og
candidates among them as well. I do, however, find having the queue duplicated
three times a bit uncool, but I guess it is most compatible with the pass
manager organization.

At -O3 the most effective passes on combine.c are:

Early passes:
  cfg             -1.5474%  (because of cfg cleanup)
  early inlining  -0.4991%
  FRE             -7.9369%
  VRP             -0.9321%  (if run early; CCP does -0.2273%)
  tailr           -0.5305%

After IPA:
  copyrename      -2.2850%  (it packs the cleanups after inlining)
  forwprop        -0.5432%
  VRP             -0.9700%  (if rerun after the early passes, otherwise it is
                             about 2%)
  PRE             -2.4123%
  DOM             -0.5182%

RTL passes:
  into_cfglayout  -3.1400%  (i.e. the first cleanup_cfg)
  fwprop1         -3.0467%
  cprop           -2.7786%
  combine         -3.3346%
  IRA             -3.4912%  (i.e. the cost model prefers hard regs)
  bbro            -0.9765%

The numbers on tramp3d and the LTO cc1 binary are not that different.

Honza
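PS: per-pass removal numbers of this kind can be gathered with
-fprofile-report; the invocation is roughly (just a sketch -- combine.c of
course needs the include flags from a GCC build tree):

  gcc -O3 -fprofile-report -c combine.c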