Michael Collison <[email protected]> writes:
> Richard,
>
> The problem with this approach for Aarch64 is that
> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED, which is
> normally 0, as it is based on the TARGET_SIMD flag.
Maybe I'm wrong, but that seems like a missed optimisation in itself.
Like you say, the definition is:
static unsigned HOST_WIDE_INT
aarch64_shift_truncation_mask (machine_mode mode)
{
  return
    (!SHIFT_COUNT_TRUNCATED
     || aarch64_vector_mode_supported_p (mode)
     || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
}
SHIFT_COUNT_TRUNCATED is:
#define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)
and aarch64_vector_mode_supported_p always returns false for
!TARGET_SIMD:
static bool
aarch64_vector_mode_supported_p (machine_mode mode)
{
  if (TARGET_SIMD
      && (mode == V4SImode || mode == V8HImode
          || mode == V16QImode || mode == V2DImode
          || mode == V2SImode || mode == V4HImode
          || mode == V8QImode || mode == V2SFmode
          || mode == V4SFmode || mode == V2DFmode
          || mode == V4HFmode || mode == V8HFmode
          || mode == V1DFmode))
    return true;
  return false;
}
So when does the second || condition fire?
I'm surprised the aarch64_vect_struct_mode_p part is needed, since
this hook describes the shift optabs, and AArch64 doesn't provide any
shift optabs for OI, CI or XI.
Thanks,
Richard
> -----Original Message-----
> From: Richard Sandiford [mailto:[email protected]]
> Sent: Wednesday, September 6, 2017 11:32 AM
> To: Michael Collison <[email protected]>
> Cc: Richard Biener <[email protected]>; Richard Kenner
> <[email protected]>; GCC Patches <[email protected]>; nd
> <[email protected]>; Andrew Pinski <[email protected]>
> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
>
> Michael Collison <[email protected]> writes:
>> Richard Sandiford do you have any objections to the patch as it stands?
>> It doesn't appear as if anything is going to change in the mid-end
>> anytime soon.
>
> I think one of the suggestions was to do it in expand, taking advantage of
> range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current
> FMA_EXPR handling in expand_expr_real_2.
>
> I know there was talk about cleaner approaches, but at least doing the above
> seems cleaner than doing it in the backend.  It should also be a
> nicely-contained piece of work.
>
> Thanks,
> Richard
>
>> -----Original Message-----
>> From: Richard Sandiford [mailto:[email protected]]
>> Sent: Tuesday, August 22, 2017 9:11 AM
>> To: Richard Biener <[email protected]>
>> Cc: Richard Kenner <[email protected]>; Michael Collison
>> <[email protected]>; GCC Patches <[email protected]>; nd
>> <[email protected]>; Andrew Pinski <[email protected]>
>> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
>>
>> Richard Biener <[email protected]> writes:
>>> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford
>>> <[email protected]> wrote:
>>>> Richard Biener <[email protected]> writes:
>>>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford
>>>>> <[email protected]> wrote:
>>>>>>Richard Biener <[email protected]> writes:
>>>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner
>>>>>>> <[email protected]> wrote:
>>>>>>>>> Correct. It is truncated for integer shift, but not simd shift
>>>>>>>>> instructions. We generate a pattern in the split that only
>>>>>>generates
>>>>>>>>> the integer shift instructions.
>>>>>>>>
>>>>>>>> That's unfortunate, because it would be nice to do this in
>>>>>>simplify_rtx,
>>>>>>>> since it's machine-independent, but that has to be conditioned
>>>>>>>> on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>>>>>>>
>>>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in
>>>>>>> the patterns, like for example with
>>>>>>>
>>>>>>> (define_insn ashlSI3
>>>>>>> [(set (match_operand 0 "")
>>>>>>> (ashl:SI (match_operand ... )
>>>>>>> (subreg:QI (match_operand:SI ...)))]
>>>>>>>
>>>>>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>>>>>magic
>>>>>>> optimization we expect.
>>>>>>
>>>>>>The problem with the explicit AND is that you'd end up with either
>>>>>>an AND of two constants for constant shifts, or with two separate
>>>>>>patterns, one for constant shifts and one for variable shifts.
>>>>>>(And the problem in theory with two patterns is that it reduces the
>>>>>>RA's freedom, although in practice I guess we'd always want a
>>>>>>constant shift where possible for cost reasons, and so the RA would
>>>>>>never need to replace pseudos with constants itself.)
>>>>>>
>>>>>>I think all useful instances of this optimisation will be exposed
>>>>>>by the gimple optimisers, so maybe expand could do it based on
>>>>>>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than
>>>>>>the rtx code and it does take the mode into account.
>>>>>
>>>>> Sure, that could work as well and also take into account range info.
>>>>> But we'd then need named expanders and the result would still have
>>>>> the explicit and or need to be an unspec or a different RTL operation.
>>>>
>>>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have
>>>> target-dependent rather than undefined behaviour, so it's OK for a
>>>> target to use shift codes with out-of-range values.
>>>
>>> Hmm, but that means simplify-rtx can't do anything with them because
>>> we need to preserve target dependent behavior.
>>
>> Yeah, it needs to punt. In practice that shouldn't matter much.
>>
>>> I think the RTL IL should be always well-defined and its semantics
>>> shouldn't have any target dependences (ideally, and if, then they
>>> should be well specified via extra target hooks/macros).
>>
>> That would be nice :-)  I think the problem has traditionally been that
>> shifts can be used in quite a few define_insn patterns besides those
>> for shift instructions.  So if your target defines shifts to have
>> 256-bit precision (say) then you need to make sure that every
>> define_insn with a shift rtx will honour that.
>>
>> It's more natural for target guarantees to apply to instructions than
>> to rtx codes.
>>
>>>> And
>>>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about
>>>> how the normal shift optabs behave, so I don't think we'd need new
>>>> optabs or new unspecs.
>>>>
>>>> E.g. it already works this way when expanding double-word shifts,
>>>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There
>>>> it's possible to use a shorter sequence if you know that the shift
>>>> optab truncates the count, so we can do that even if
>>>> SHIFT_COUNT_TRUNCATED isn't defined.
>>>
>>> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK
>>> applies to the instructions generated by the named shift patterns but
>>> _not_ general shift RTXen. But the generated pattern contains shift
>>> RTXen and how can we figure whether they were generated by the named
>>> expanders or by other means? Don't define_expand also serve as
>>> define_insn for things like combine?
>>
>> Yeah, you can't (and aren't supposed to try to) reverse-engineer the
>> expander from the generated instructions.  TARGET_SHIFT_TRUNCATION_MASK
>> should only be used if you're expanding a shift optab.
>>
>>> That said, from looking at the docs and your description above it
>>> seems that semantics are not fully reflected in the RTL instruction stream?
>>
>> Yeah, the semantics go from being well-defined in the optab interface
>> to being target-dependent in the rtl stream.
>>
>> Thanks,
>> Richard
>>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>>Thanks,
>>>>>>Richard