On Thu, Nov 3, 2016 at 4:46 PM, Segher Boessenkool <seg...@kernel.crashing.org> wrote: > On Thu, Nov 03, 2016 at 09:29:07AM +0100, Richard Biener wrote: >> On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool >> <seg...@kernel.crashing.org> wrote: >> > On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote: >> >> >> I don't believe it needs a cleanup on every iteration. One cleanup at >> >> >> the end should work fine. >> >> > >> >> > But as the comment there says: >> >> > >> >> > /* Merge the duplicated blocks into predecessors, when possible. >> >> > */ >> >> > cleanup_cfg (0); >> >> > >> >> > (this is not a new comment), and without merging blocks this whole >> >> > patch does zilch. >> >> > >> >> > There is no need to create new basic blocks at all (can replace the >> >> > final branch in a block with a copy of the whole block it jumps to, >> >> > instead); and then it is painfully obvious that switching to layout >> >> > mode here is pointless, too. >> >> > >> >> > Do you want me to do that? >> >> > >> >> > Btw, this isn't quadratic anyway; it is a constant number (the maximal >> >> > length allowed of the block to copy) linear. Unless there are >> >> > unboundedly >> >> > many forwarder blocks, which there shouldn't be, but I cannot prove >> >> > that. >> >> >> >> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) >> >> that >> >> walk the whole function. So unless the number of iterations is bound >> >> with a constant I call this quadratic (in function size). >> > >> > It is bounded (with that caveat above): new candidates will be bigger than >> > the block merged into it, so there won't be more than >> > PARAM_MAX_GOTO_DUPLICATION_INSNS passes. >> > >> > But I can make it all work simpler, in non-layout mode, if you prefer. >> >> Iteration is fine but we should restrict where we look for new >> candidates. Likewise >> block merging opportunities should be easier to exploit than by >> running cfg-cleanup... > > Yes, but that was what the original code does already, so I didn't look > deeper. "It was just a simple bugfix". > >> I'm just thinking of those gigantic machine-generated state machines where we >> ATM do quite well (at least at -O1 ...) with respect to compile-time. > > Like I said, I tested it on a 2000 node VM, very artificial so almost > no code except the threading itself, and the compgotos pass takes less > than 1% time both before and after the patch. I ony tested at -O2 though.
I see. > I'll get back to it, but first I have some things that need handling during > stage 1. Fair enough. Don't consider my comments blocking the patch. Richard. > > Segher