On Fri, Feb 10, 2012 at 10:02 PM, Andrew Pinski <pins...@gmail.com> wrote: > On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner > <meiss...@linux.vnet.ibm.com> wrote: >> I was looking at the routelookup EEMBC benchmark and it has code of the form: >> >> while ( this_node->cmpbit > next_node->cmpbit ) >> { >> this_node = next_node; >> >> if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) >> next_node = this_node->rlink; >> else >> next_node = this_node->llink; >> } > > Hmm, this looks like we could do this on the tree level better as we > have more information about this_node there. Like we know that we > load from this_node->cmpbit before we do either of the branches. So > can move both of those loads before the branch and then we get the > ifcvt for free.
Indeed. But note that the transform is not valid as *this_node may cross a page boundary and thus either pointer load may trap if the other does not (well, unless the C standard (and thus our middle-end) would require that iff ptr->component does not trap that *ptr does not trap either - we would require a operand_equal_p (get_base_address ()) for both addresses). Joseph, can you clarify what the C standard specifies here? Thanks, Richard. > Thanks, > Andrew Pinski > > > >> >> This is where you have a binary tree/trie and you are iterating going down >> either the right link or left link until you find a stopping condition. The >> code in ifcvt.c does not handle optimizing these cases for conditional move >> since the load might trap, and generates code that does if-then-else with >> loads >> and jumps. >> >> However, since the two elements are next to each other in memory, they are >> likely in the same cache line, particularly with aligned stacks and malloc >> returning aligned pointers. Except in unusual circumstances where the >> pointer >> is not aligned, this means it is much faster to optimize it as: >> >> while ( this_node->cmpbit > next_node->cmpbit ) >> { >> this_node = next_node; >> >> rtmp = this_node->rlink; >> ltmp = this_node->llink; >> if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) >> next_node = rtmp; >> else >> next_node = ltmp; >> } >> >> So I wrote some patches to do this optimization. In ifcvt.c I added a new >> hook >> that allows the backend to try and do conditional moves if the machine >> independent code doesn't handle the special cases that the machine might >> have. >> >> Then in rs6000.c I used that hook to see if the conditional moves are >> adjacent, >> and do the optimization. >> >> I will note that this type of code comes up quite frequently since binary >> trees >> and tries are common data structure. The file splay-tree.c in libiberty is >> one >> place in the compiler tree that has conditional adjacent memory moves. >> >> So I would like comments on the patch before the 4.8 tree opens up. I feel >> even if we decide not to add the adjacent memory move patch, the hook is >> useful, and I have some other ideas for using it for the powerpc. >> >> I was thinking about rewriting the rs6000 dependent parts to make it a normal >> optimization available to all ports. Is this something we want as a normal >> option? At the moment, I'm not sure it should be part of -O3 because it is >> possible for a trap to occur if the pointer straddles a page boundary and the >> test condition would guard against loading up the second value. However, >> -Ofast might be an appropriate place to do this optimization. >> >> At this time I don't have test cases, but I would add them for the real >> submission. I have bootstraped the compiler on powerpc with this option >> enabled and it passed the bootstrap and had no regressions in make check. I >> will do a spec run over the weekend as well. >> >> In addition to libibery/splay-tree.c the following files in gcc have adjacent >> conditional moves that this code would optimize: >> >> cfg.c >> c-typeck.c >> df-scan.c >> fold-const.c >> graphds.c >> ira-emit.c >> omp-low.c >> rs6000.c >> tree-cfg.c >> tree-ssa-dom.c >> tree-ssa-loop-ivops.c >> tree-ssa-phiopt.c >> tree-ssa-uncprop.c >> >> 2012-02-10 Michael Meissner <meiss...@linux.vnet.ibm.com> >> >> * target.def (cmove_md_extra): New hook that is called from >> ifcvt.c to allow the backend to generate additional conditional >> moves that aren't handled by the machine independent code. Add >> support to call the hook at the appropriate places. >> * targhooks.h (default_cmove_md_extra): Likewise. >> * targhooks.c (default_cmove_md_extra): Likewise. >> * target.h (enum ifcvt_pass): Likewise. >> * ifcvt.c (find_if_header): Likewise. >> (noce_find_if_block): Likewise. >> (struct noce_if_info): Likewise. >> (noce_process_if_block): Likewise. >> (cond_move_process_if_block): Likewise. >> (if_convert): Likewise. >> (rest_of_handle_if_conversion): Likewise. >> (rest_of_handle_if_after_combine): Likewise. >> (rest_of_handle_if_after_reload): Likewise. >> * doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise. >> * doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise. >> >> * config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support >> for a new switch to optimize conditional moves where each side is >> a memory reference and the memory locations are adjacent. >> * config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise. >> (TARGET_CMOVE_MD_EXTRA): Likewise. >> (rs6000_decompose_offsettable_memref): Likewise. >> >> -- >> Michael Meissner, IBM >> 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA >> meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899