On Thu, Nov 03, 2016 at 09:29:07AM +0100, Richard Biener wrote: > On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool > <seg...@kernel.crashing.org> wrote: > > On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote: > >> >> I don't believe it needs a cleanup on every iteration. One cleanup at > >> >> the end should work fine. > >> > > >> > But as the comment there says: > >> > > >> > /* Merge the duplicated blocks into predecessors, when possible. > >> > */ > >> > cleanup_cfg (0); > >> > > >> > (this is not a new comment), and without merging blocks this whole > >> > patch does zilch. > >> > > >> > There is no need to create new basic blocks at all (can replace the > >> > final branch in a block with a copy of the whole block it jumps to, > >> > instead); and then it is painfully obvious that switching to layout > >> > mode here is pointless, too. > >> > > >> > Do you want me to do that? > >> > > >> > Btw, this isn't quadratic anyway; it is a constant number (the maximal > >> > length allowed of the block to copy) linear. Unless there are > >> > unboundedly > >> > many forwarder blocks, which there shouldn't be, but I cannot prove that. > >> > >> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) > >> that > >> walk the whole function. So unless the number of iterations is bound > >> with a constant I call this quadratic (in function size). > > > > It is bounded (with that caveat above): new candidates will be bigger than > > the block merged into it, so there won't be more than > > PARAM_MAX_GOTO_DUPLICATION_INSNS passes. > > > > But I can make it all work simpler, in non-layout mode, if you prefer. > > Iteration is fine but we should restrict where we look for new > candidates. Likewise > block merging opportunities should be easier to exploit than by > running cfg-cleanup...
Yes, but that was what the original code does already, so I didn't look deeper. "It was just a simple bugfix". > I'm just thinking of those gigantic machine-generated state machines where we > ATM do quite well (at least at -O1 ...) with respect to compile-time. Like I said, I tested it on a 2000 node VM, very artificial so almost no code except the threading itself, and the compgotos pass takes less than 1% time both before and after the patch. I ony tested at -O2 though. I'll get back to it, but first I have some things that need handling during stage 1. Segher