> On Sep 21, 2015, at 4:39 AM, Bernd Schmidt <bschm...@redhat.com> wrote: > >> On 09/18/2015 05:21 PM, Jiong Wang wrote: >> >> Current conditional compare (CCMP) support in GCC aim to optimize >> short circuit for cascade comparision, given a simple conditional >> compare candidate: >> >> if (a == 17 || a == 32) > [...] >> The problem is current implementation always expand t0 first, then >> t1. While the expand order need to consider the rtx costs, because "cmp" >> and "ccmp" may have different restrictions that the expand order will >> result in performance differences. >> >> For example on AArch64, "ccmp" only accept immediate within -31 ~ 31 >> while "cmp" accept wider range, so if we expand "a == 32" in the second >> step, then it will use "ccmp", and thus an extra "mov reg, 32" >> instruction is generated because "32" is out of the range. While if we >> expand "a == 32" first, then it's fine as "cmp" can encoding it. > > I've played with this patch a bit with an aarch64 cross compiler. First of > all - it doesn't seem to work, I get identical costs and the swapping doesn't > happen. Did you forget to include a rtx_cost patch? > > I was a little worried about whether this would be expensive for longer > sequences of conditions, but it seems like it looks only at leafs where we > have two comparisons, so that cost should be minimal. However, it's possible > there's room for improvement in code generation. I was curious and looked at > a slightly more complex testcase > > int > foo (int a) > { > if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72) > return 1; > else > return 2; > } > > and this doesn't generate a sequence of ccmps as might have been expected; we > only get pairs of comparisons merged with a bit_ior: > > D.2699 = a == 17; > D.2700 = a == 32; > D.2701 = D.2699 | D.2700; > if (D.2701 != 0) goto <D.2697>; else goto <D.2702>; > <D.2702>: > D.2703 = a == 47; > D.2704 = a == 53; > D.2705 = D.2703 | D.2704; > if (D.2705 != 0) goto <D.2697>; else goto <D.2706>; > > and the ccmp expander doesn't see the entire thing. I found that a little > surprising TBH.
This is a known issue with fold-const folding too early. Replace || with | and add some parentheses and you should get a string of ccmp's. Thanks, Andrew > > > Bernd