On Wed, Nov 2, 2016 at 2:43 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> Richard Biener wrote:
> On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>> > If bswap is false no byte swap is needed, so we found a native endian load
>> > and it will always perform the optimization by inserting an unaligned load.
>>
>> Yes, the general agreement is that the expander can do best and thus we
>> should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS.
>> The expander will generate the canonical best code (hopefully...).
>
> Right, but there are cases where you have to choose between unaligned or
> aligned accesses and you need to know whether the unaligned access is fast.
>
> A good example is memcpy expansion: if you have fast unaligned accesses then
> you should use them to deal with the last few bytes, but if they get expanded,
> using several aligned accesses is much faster than a single unaligned access.
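[To make Wilco's memcpy-tail point concrete, here is a rough C sketch. The
function names and the fixed 8-byte tail width are mine, not from the thread:
with fast unaligned accesses the last bytes of a block can be finished with one
overlapping unaligned word copy, while a strict-alignment target has to fall
back to several smaller (byte) accesses.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the tail-handling choice in memcpy
   expansion.  Assumes the bulk of the block (a multiple of 8 bytes)
   has already been copied and n >= 8.  */

/* Fast-unaligned-access variant: finish with ONE 8-byte access that
   overlaps the already-copied region.  memcpy on a local here models
   the single unaligned load/store the expander would emit.  */
static void copy_tail_unaligned (char *dst, const char *src, size_t n)
{
  uint64_t w;
  memcpy (&w, src + n - 8, 8);
  memcpy (dst + n - 8, &w, 8);
}

/* Strict-alignment fallback: the same tail done as several small
   (trivially aligned) byte accesses -- more instructions, but no
   unaligned access that could trap or be emulated.  */
static void copy_tail_bytewise (char *dst, const char *src, size_t n)
{
  for (size_t i = n & ~(size_t) 7; i < n; i++)
    dst[i] = src[i];
}
```

[Both variants produce the same bytes; the question the thread raises is which
one the target can execute faster, and that is exactly what the middle end
cannot currently query precisely.]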
Yes.  That's RTL expansion, at which point you of course have to look at
SLOW_UNALIGNED_ACCESS.

>> > This apparently works on all targets, and doesn't cause alignment traps or
>> > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
>> > So I'm at a loss what these macros are supposed to mean and how I can query
>> > whether a backend supports fast unaligned access for a particular mode.
>> >
>> > What I actually want to write is something like:
>> >
>> >   if (!FAST_UNALIGNED_LOAD (mode, align)) return false;
>> >
>> > And know that it only accepts unaligned accesses that are efficient on the
>> > target.  Maybe we need a new hook like this and get rid of the old one?
>>
>> No, we don't need another hook.
>>
>> Note there is another similar user in gimple-fold.c when folding small
>> memcpy/memmove to single load/store pairs (patch posted but not applied by
>> me -- I've asked for strict-align target maintainer feedback but got none).
>
> I didn't find it, do you have a link?

https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00598.html

>> Now - for bswap I'm only 99% sure that unaligned load + bswap is
>> better than piecewise loads plus manual swap.
>
> It depends on whether unaligned loads and bswap are expanded or not.  Even if
> we assume the expansion is at least as efficient as doing it explicitly
> (definitely true for modes larger than the native integer size - as we found
> out in PR77308!), if both the unaligned load and bswap are expanded it seems
> better not to make the transformation for modes up to the word size.  But
> there is no way to find out, as SLOW_UNALIGNED_ACCESS must be true whenever
> STRICT_ALIGNMENT is true.

The case I was thinking about is the availability of a bswap load operating
only on aligned memory, with the "regular" register bswap being "fake":
provided by first spilling to an aligned stack slot and then loading from
that.  Maybe a bit far-fetched.
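[For concreteness, the transformation under discussion looks roughly like the
sketch below.  The helper names are mine; `__builtin_bswap32` is the real GCC
builtin.  The bswap pass recognizes the byte-piecewise form and, on a
little-endian target, would like to replace it with one (possibly unaligned)
load plus a bswap -- which is only a win if both really are single
instructions.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The piecewise idiom the GIMPLE bswap pass matches: a big-endian
   32-bit value assembled from individual byte loads.  Always safe on
   strict-alignment targets, but four loads plus shifts and ORs.  */
static uint32_t load_be32_piecewise (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
       | ((uint32_t) p[1] << 16)
       | ((uint32_t) p[2] << 8)
       |  (uint32_t) p[3];
}

/* What the pass wants to emit on a little-endian target: one
   native-endian load (memcpy models a possibly-unaligned load)
   followed by a byte swap.  */
static uint32_t load_be32_bswap (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, 4);             /* unaligned native-endian load */
  return __builtin_bswap32 (v);  /* GCC builtin byte swap */
}
```

[Wilco's point is that when the target expands both the unaligned load and the
bswap into multi-instruction sequences, the second form can end up slower than
the first for word-size-and-smaller modes, yet the GIMPLE level has no accurate
way to detect that case.]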
>> But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS /
>> STRICT_ALIGNMENT checks from the GIMPLE side of the compiler.
>
> I sort of agree, because the purpose of these macros is unclear - the
> documentation is insufficient and out of date.  I do believe however we need
> an accurate way to find out whether a target supports fast unaligned
> accesses, as that is required to generate good target code.

I believe the target macros are solely for RTL expansion and say that it has
to avoid unaligned ops as those would trap.

Richard.

> Wilco