On Wed, Nov 2, 2016 at 2:43 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> Richard Biener wrote:
> On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>> > If bswap is false no byte swap is needed, so we found a native endian load
>> > and it will always perform the optimization by inserting an unaligned load.
>>
>> Yes, the general agreement is that the expander can do best and thus we
>> should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS.
>> The expander will generate the canonical best code (hopefully...).
>
> Right, but there are cases where you have to choose between unaligned or
> aligned accesses and you need to know whether the unaligned access is fast.
>
> A good example is memcpy expansion: if you have fast unaligned accesses then
> you should use them to deal with the last few bytes, but if they get expanded,
> using several aligned accesses is much faster than a single unaligned access.
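[To make Wilco's memcpy-tail point concrete, here is a rough C sketch. The
function names and the fixed 8-byte tail width are mine, not from the thread:
with fast unaligned accesses the last bytes of a block can be finished with one
overlapping unaligned word copy, while a strict-alignment target has to fall
back to several smaller (byte) accesses.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the tail-handling choice in memcpy
   expansion.  Assumes the bulk of the block (a multiple of 8 bytes)
   has already been copied and n >= 8.  */

/* Fast-unaligned-access variant: finish with ONE 8-byte access that
   overlaps the already-copied region.  memcpy on a local here models
   the single unaligned load/store the expander would emit.  */
static void copy_tail_unaligned (char *dst, const char *src, size_t n)
{
  uint64_t w;
  memcpy (&w, src + n - 8, 8);
  memcpy (dst + n - 8, &w, 8);
}

/* Strict-alignment fallback: the same tail done as several small
   (trivially aligned) byte accesses -- more instructions, but no
   unaligned access that could trap or be emulated.  */
static void copy_tail_bytewise (char *dst, const char *src, size_t n)
{
  for (size_t i = n & ~(size_t) 7; i < n; i++)
    dst[i] = src[i];
}
```

[Both variants produce the same bytes; the question the thread raises is which
one the target can execute faster, and that is exactly what the middle end
cannot currently query precisely.]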
Yes.  That's RTL expansion, at which point you of course have to look at
SLOW_UNALIGNED_ACCESS.

>> > This apparently works on all targets, and doesn't cause alignment traps or
>> > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
>> > So I'm at a loss what these macros are supposed to mean and how I can query
>> > whether a backend supports fast unaligned access for a particular mode.
>> >
>> > What I actually want to write is something like:
>> >
>> >   if (!FAST_UNALIGNED_LOAD (mode, align)) return false;
>> >
>> > And know that it only accepts unaligned accesses that are efficient on the
>> > target.  Maybe we need a new hook like this and get rid of the old one?
>>
>> No, we don't need another hook.
>>
>> Note there is another similar user in gimple-fold.c when folding small
>> memcpy/memmove to single load/store pairs (patch posted but not applied by
>> me -- I've asked for strict-align target maintainer feedback but got none).
>
> I didn't find it, do you have a link?

https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00598.html

>> Now - for bswap I'm only 99% sure that unaligned load + bswap is
>> better than piecewise loads plus manual swap.
>
> It depends on whether unaligned loads and bswap are expanded or not.  Even if
> we assume the expansion is at least as efficient as doing it explicitly
> (definitely true for modes larger than the native integer size - as we found
> out in PR77308!), if both the unaligned load and bswap are expanded it seems
> better not to make the transformation for modes up to the word size.  But
> there is no way to find out, as SLOW_UNALIGNED_ACCESS must be true whenever
> STRICT_ALIGNMENT is true.

The case I was thinking about is the availability of a bswap load operating
only on aligned memory, with the "regular" register bswap being "fake":
provided by first spilling to an aligned stack slot and then loading from
that.  Maybe a bit far-fetched.
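[For concreteness, the transformation under discussion looks roughly like the
sketch below.  The helper names are mine; `__builtin_bswap32` is the real GCC
builtin.  The bswap pass recognizes the byte-piecewise form and, on a
little-endian target, would like to replace it with one (possibly unaligned)
load plus a bswap -- which is only a win if both really are single
instructions.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The piecewise idiom the GIMPLE bswap pass matches: a big-endian
   32-bit value assembled from individual byte loads.  Always safe on
   strict-alignment targets, but four loads plus shifts and ORs.  */
static uint32_t load_be32_piecewise (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
       | ((uint32_t) p[1] << 16)
       | ((uint32_t) p[2] << 8)
       |  (uint32_t) p[3];
}

/* What the pass wants to emit on a little-endian target: one
   native-endian load (memcpy models a possibly-unaligned load)
   followed by a byte swap.  */
static uint32_t load_be32_bswap (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, 4);             /* unaligned native-endian load */
  return __builtin_bswap32 (v);  /* GCC builtin byte swap */
}
```

[Wilco's point is that when the target expands both the unaligned load and the
bswap into multi-instruction sequences, the second form can end up slower than
the first for word-size-and-smaller modes, yet the GIMPLE level has no accurate
way to detect that case.]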
>> But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS /
>> STRICT_ALIGNMENT checks from the GIMPLE side of the compiler.
>
> I sort of agree, because the purpose of these macros is unclear - the
> documentation is insufficient and out of date.  I do believe however we need
> an accurate way to find out whether a target supports fast unaligned
> accesses, as that is required to generate good target code.

I believe the target macros are solely for RTL expansion and say that it has
to avoid unaligned ops as those would trap.

Richard.

> Wilco