On Fri, 2012-02-10 at 13:02 -0800, Andrew Pinski wrote: > On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner > <meiss...@linux.vnet.ibm.com> wrote: > > I was looking at the routelookup EEMBC benchmark and it has code of the > > form: > > > > while ( this_node->cmpbit > next_node->cmpbit ) > > { > > this_node = next_node; > > > > if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) > > next_node = this_node->rlink; > > else > > next_node = this_node->llink; > > } > > Hmm, this looks like we could do this on the tree level better as we > have more information about this_node there. Like we know that we > load from this_node->cmpbit before we do either of the branches. So > can move both of those loads before the branch and then we get the > ifcvt for free. > > Thanks, > Andrew Pinski >
I agree that there is more information for doing the transformation at the tree level, but I'm concerned that RTL optimization would undo the transformation prior to RTL if-conversion. Wouldn't fwprop just combine the loads (rtmp, ltmp below) into their uses, re-creating the original code? It seems like you'd have to find a way to defeat that somehow. Thanks, Bill > > > > > > This is where you have a binary tree/trie and you are iterating going down > > either the right link or left link until you find a stopping condition. The > > code in ifcvt.c does not handle optimizing these cases for conditional move > > since the load might trap, and generates code that does if-then-else with > > loads > > and jumps. > > > > However, since the two elements are next to each other in memory, they are > > likely in the same cache line, particularly with aligned stacks and malloc > > returning aligned pointers. Except in unusual circumstances where the > > pointer > > is not aligned, this means it is much faster to optimize it as: > > > > while ( this_node->cmpbit > next_node->cmpbit ) > > { > > this_node = next_node; > > > > rtmp = this_node->rlink; > > ltmp = this_node->llink; > > if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) > > next_node = rtmp; > > else > > next_node = ltmp; > > } > > > > So I wrote some patches to do this optimization. In ifcvt.c I added a new > > hook > > that allows the backend to try and do conditional moves if the machine > > independent code doesn't handle the special cases that the machine might > > have. > > > > Then in rs6000.c I used that hook to see if the conditional moves are > > adjacent, > > and do the optimization. > > > > I will note that this type of code comes up quite frequently since binary > > trees > > and tries are common data structure. The file splay-tree.c in libiberty is > > one > > place in the compiler tree that has conditional adjacent memory moves. > > > > So I would like comments on the patch before the 4.8 tree opens up. I feel > > even if we decide not to add the adjacent memory move patch, the hook is > > useful, and I have some other ideas for using it for the powerpc. > > > > I was thinking about rewriting the rs6000 dependent parts to make it a > > normal > > optimization available to all ports. Is this something we want as a normal > > option? At the moment, I'm not sure it should be part of -O3 because it is > > possible for a trap to occur if the pointer straddles a page boundary and > > the > > test condition would guard against loading up the second value. However, > > -Ofast might be an appropriate place to do this optimization. > > > > At this time I don't have test cases, but I would add them for the real > > submission. I have bootstraped the compiler on powerpc with this option > > enabled and it passed the bootstrap and had no regressions in make check. I > > will do a spec run over the weekend as well. > > > > In addition to libibery/splay-tree.c the following files in gcc have > > adjacent > > conditional moves that this code would optimize: > > > > cfg.c > > c-typeck.c > > df-scan.c > > fold-const.c > > graphds.c > > ira-emit.c > > omp-low.c > > rs6000.c > > tree-cfg.c > > tree-ssa-dom.c > > tree-ssa-loop-ivops.c > > tree-ssa-phiopt.c > > tree-ssa-uncprop.c > > > > 2012-02-10 Michael Meissner <meiss...@linux.vnet.ibm.com> > > > > * target.def (cmove_md_extra): New hook that is called from > > ifcvt.c to allow the backend to generate additional conditional > > moves that aren't handled by the machine independent code. Add > > support to call the hook at the appropriate places. > > * targhooks.h (default_cmove_md_extra): Likewise. > > * targhooks.c (default_cmove_md_extra): Likewise. > > * target.h (enum ifcvt_pass): Likewise. > > * ifcvt.c (find_if_header): Likewise. > > (noce_find_if_block): Likewise. > > (struct noce_if_info): Likewise. > > (noce_process_if_block): Likewise. > > (cond_move_process_if_block): Likewise. > > (if_convert): Likewise. > > (rest_of_handle_if_conversion): Likewise. > > (rest_of_handle_if_after_combine): Likewise. > > (rest_of_handle_if_after_reload): Likewise. > > * doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise. > > * doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise. > > > > * config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support > > for a new switch to optimize conditional moves where each side is > > a memory reference and the memory locations are adjacent. > > * config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise. > > (TARGET_CMOVE_MD_EXTRA): Likewise. > > (rs6000_decompose_offsettable_memref): Likewise. > > > > -- > > Michael Meissner, IBM > > 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA > > meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899