On Fri, 2012-02-10 at 13:02 -0800, Andrew Pinski wrote:
> On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner
> <meiss...@linux.vnet.ibm.com> wrote:
> > I was looking at the routelookup EEMBC benchmark and it has code of the 
> > form:
> >
> >   while ( this_node->cmpbit > next_node->cmpbit )
> >    {
> >      this_node = next_node;
> >
> >      if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
> >         next_node = this_node->rlink;
> >      else
> >         next_node = this_node->llink;
> >    }
> 
> Hmm, this looks like we could do this on the tree level better as we
> have more information about this_node there.  Like we know that we
> load from this_node->cmpbit before we do either of the branches.  So
> can move both of those loads before the branch and then we get the
> ifcvt for free.
> 
> Thanks,
> Andrew Pinski
> 

I agree that there is more information for doing the transformation at
the tree level, but I'm concerned that RTL optimization would undo the
transformation prior to RTL if-conversion.  Wouldn't fwprop just combine
the loads (rtmp, ltmp below) into their uses, re-creating the original
code?  It seems like you'd have to find a way to defeat that somehow.

Thanks,
Bill

> 
> 
> >
> > This is where you have a binary tree/trie and you are iterating going down
> > either the right link or left link until you find a stopping condition.  The
> > code in ifcvt.c does not handle optimizing these cases for conditional move
> > since the load might trap, and generates code that does if-then-else with 
> > loads
> > and jumps.
> >
> > However, since the two elements are next to each other in memory, they are
> > likely in the same cache line, particularly with aligned stacks and malloc
> > returning aligned pointers.  Except in unusual circumstances where the 
> > pointer
> > is not aligned, this means it is much faster to optimize it as:
> >
> >   while ( this_node->cmpbit > next_node->cmpbit )
> >    {
> >      this_node = next_node;
> >
> >      rtmp = this_node->rlink;
> >      ltmp = this_node->llink;
> >      if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
> >         next_node = rtmp;
> >      else
> >         next_node = ltmp;
> >    }
> >
> > So I wrote some patches to do this optimization.  In ifcvt.c I added a new 
> > hook
> > that allows the backend to try and do conditional moves if the machine
> > independent code doesn't handle the special cases that the machine might 
> > have.
> >
> > Then in rs6000.c I used that hook to see if the conditional moves are 
> > adjacent,
> > and do the optimization.
> >
> > I will note that this type of code comes up quite frequently since binary 
> > trees
> > and tries are common data structure.  The file splay-tree.c in libiberty is 
> > one
> > place in the compiler tree that has conditional adjacent memory moves.
> >
> > So I would like comments on the patch before the 4.8 tree opens up.  I feel
> > even if we decide not to add the adjacent memory move patch, the hook is
> > useful, and I have some other ideas for using it for the powerpc.
> >
> > I was thinking about rewriting the rs6000 dependent parts to make it a 
> > normal
> > optimization available to all ports.  Is this something we want as a normal
> > option?  At the moment, I'm not sure it should be part of -O3 because it is
> > possible for a trap to occur if the pointer straddles a page boundary and 
> > the
> > test condition would guard against loading up the second value.  However,
> > -Ofast might be an appropriate place to do this optimization.
> >
> > At this time I don't have test cases, but I would add them for the real
> > submission.  I have bootstraped the compiler on powerpc with this option
> > enabled and it passed the bootstrap and had no regressions in make check.  I
> > will do a spec run over the weekend as well.
> >
> > In addition to libibery/splay-tree.c the following files in gcc have 
> > adjacent
> > conditional moves that this code would optimize:
> >
> >            cfg.c
> >            c-typeck.c
> >            df-scan.c
> >            fold-const.c
> >            graphds.c
> >            ira-emit.c
> >            omp-low.c
> >            rs6000.c
> >            tree-cfg.c
> >            tree-ssa-dom.c
> >            tree-ssa-loop-ivops.c
> >            tree-ssa-phiopt.c
> >            tree-ssa-uncprop.c
> >
> > 2012-02-10  Michael Meissner  <meiss...@linux.vnet.ibm.com>
> >
> >        * target.def (cmove_md_extra): New hook that is called from
> >        ifcvt.c to allow the backend to generate additional conditional
> >        moves that aren't handled by the machine independent code.  Add
> >        support to call the hook at the appropriate places.
> >        * targhooks.h (default_cmove_md_extra): Likewise.
> >        * targhooks.c (default_cmove_md_extra): Likewise.
> >        * target.h (enum ifcvt_pass): Likewise.
> >        * ifcvt.c (find_if_header): Likewise.
> >        (noce_find_if_block): Likewise.
> >        (struct noce_if_info): Likewise.
> >        (noce_process_if_block): Likewise.
> >        (cond_move_process_if_block): Likewise.
> >        (if_convert): Likewise.
> >        (rest_of_handle_if_conversion): Likewise.
> >        (rest_of_handle_if_after_combine): Likewise.
> >        (rest_of_handle_if_after_reload): Likewise.
> >        * doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise.
> >        * doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise.
> >
> >        * config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support
> >        for a new switch to optimize conditional moves where each side is
> >        a memory reference and the memory locations are adjacent.
> >        * config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise.
> >        (TARGET_CMOVE_MD_EXTRA): Likewise.
> >        (rs6000_decompose_offsettable_memref): Likewise.
> >
> > --
> > Michael Meissner, IBM
> > 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
> > meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899


Reply via email to