> On Sep 21, 2015, at 4:39 AM, Bernd Schmidt <bschm...@redhat.com> wrote:
> 
>> On 09/18/2015 05:21 PM, Jiong Wang wrote:
>> 
>> Current conditional compare (CCMP) support in GCC aims to optimize
>> short-circuit evaluation of cascaded comparisons. Given a simple
>> conditional compare candidate:
>> 
>>   if (a == 17 || a == 32)
> [...]
>> The problem is that the current implementation always expands t0 first,
>> then t1. The expansion order needs to take the rtx costs into account,
>> because "cmp" and "ccmp" may have different restrictions, so the order
>> can result in performance differences.
>> 
>> For example, on AArch64 "ccmp" only accepts immediates within -31 ~ 31
>> while "cmp" accepts a wider range, so if we expand "a == 32" in the second
>> step, it will use "ccmp", and an extra "mov reg, 32" instruction is
>> generated because "32" is out of range. If we expand "a == 32" first
>> instead, it's fine, as "cmp" can encode it.
> 
> I've played with this patch a bit with an aarch64 cross compiler. First of 
> all - it doesn't seem to work, I get identical costs and the swapping doesn't 
> happen. Did you forget to include a rtx_cost patch?
> 
> I was a little worried about whether this would be expensive for longer 
> sequences of conditions, but it seems like it looks only at leafs where we 
> have two comparisons, so that cost should be minimal. However, it's possible 
> there's room for improvement in code generation. I was curious and looked at 
> a slightly more complex testcase
> 
> int
> foo (int a)
> {
>  if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72)
>    return 1;
>  else
>    return 2;
> }
> 
> and this doesn't generate a sequence of ccmps as might have been expected; we 
> only get pairs of comparisons merged with a bit_ior:
> 
>  D.2699 = a == 17;
>  D.2700 = a == 32;
>  D.2701 = D.2699 | D.2700;
>  if (D.2701 != 0) goto <D.2697>; else goto <D.2702>;
>  <D.2702>:
>  D.2703 = a == 47;
>  D.2704 = a == 53;
>  D.2705 = D.2703 | D.2704;
>  if (D.2705 != 0) goto <D.2697>; else goto <D.2706>;
> 
> and the ccmp expander doesn't see the entire thing. I found that a little 
> surprising TBH.

This is a known issue with fold-const folding too early. Replace || with |,
add some parentheses, and you should get a string of ccmps.
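
A minimal sketch of that rewrite applied to the testcase quoted above. The
names foo_bitwise and check_equivalent are made up for illustration and are
not part of the patch. Since == binds tighter than | in C, the parentheses
do not change the meaning; they only make the intent explicit. Whether a
given compiler version and option set actually emits a ccmp chain for the
second function is an assumption to check against the generated assembly.

```c
#include <assert.h>

/* Original form from the testcase: short-circuit ||, which gets split
   into pairs of comparisons before the ccmp expander sees it.  */
int
foo (int a)
{
  if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72)
    return 1;
  else
    return 2;
}

/* Rewritten form: bitwise | with each comparison parenthesized, so the
   whole condition is a single expression.  The comparisons have no side
   effects, so dropping the short-circuit does not change the result.  */
int
foo_bitwise (int a)
{
  if ((a == 17) | (a == 32) | (a == 47) | (a == 53) | (a == 66) | (a == 72))
    return 1;
  else
    return 2;
}

/* Sanity check (illustrative only): both forms agree on a small range.  */
void
check_equivalent (void)
{
  for (int a = -100; a <= 100; a++)
    assert (foo (a) == foo_bitwise (a));
}
```

Compiling both functions with an aarch64 cross compiler at -O2 and comparing
the assembly should show whether the second form yields the longer ccmp
sequence.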

Thanks,
Andrew


> 
> 
> Bernd
