Re: Improving the code generated for vld and vst intrinsics

Julian Brown Tue, 22 Feb 2011 03:04:11 -0800

On Tue, 22 Feb 2011 09:42:15 +0000
Richard Sandiford <richard.sandif...@linaro.org> wrote:


> Julian Brown <jul...@codesourcery.com> writes:
> > Richard Sandiford <richard.sandif...@linaro.org> wrote:
> > 1. Struct (tree) types are defined via hard-wired code in the ARM
> > backend rather than in arm_neon.h. The "type mode" of those struct
> > types is overridden to be an extra-wide vector, the width of the
> > whole struct (so int32x2x2_t would be V4SImode, etc.).
> 
> FWIW, I was going to try to avoid this.  I think instead we should
> automatically use vector modes for structures like those in
> arm_neon.h, via a target hook.  It would improve the code quality for
> general code that has the same sort of small-array-of-vectors
> structure.  (In other words, this would help when using the generic
> vector extensions rather than the Neon-specific builtins.)

That sounds like a good plan, I think.

> > 2. Builtins (__builtin_neon_*) which previously used "big" integer
> > modes to pass/return values, are initialised such that they
> > directly pass/return the struct types above instead. The intrinsic
> > wrappers in arm_neon.h no longer need to use unions to pun the
> > types of arguments & return values.
> 
> Yeah, I'd wondered about that too.  However, these days, I think we
> ought to be able to generate good code for this type of union, and we
> seem to for the cases I've tried.  In the end I thought it was better
> to keep the underlying built-in function close to the rtl pattern.
> E.g. the fact that the name of the field is "val" seems more like an
> arm_neon.h detail than something that should be hard-coded into GCC.

I still think it's a good idea to get rid of the unions in this case,
or at least, replace the wide-integer modes in the unions with wide
vectors. But I'm happy with whatever works :-).

> > 3. When those builtins are expanded, they now use the extra-wide
> > vectors. The corresponding instruction patterns in neon.md also use
> > wide vectors (rather than wide integer modes).
> 
> I was also going to try defining non-power-of-two vectors.  Glad to
> hear it works!  (That was actually the main motivation for doing the
> rtl side first: to see whether it really would be OK to ask the
> vectoriser to treat these values as single vectors.)

(Caveat: that's one of the things that isn't well-tested. It works to
define those modes, at least.)

> I think we should keep the integer modes too, though, just like we
> allow DImode for double registers.  (I'm surprised to see we don't
> allow TImode for quad registers TBH -- might look into that.)
> Given that there are no architectual restrictions on mode punning,
> these integer modes are useful neutral ground.

I'm not convinced by that. Consider that:

1. There are no useful operations on the wide-integer types.

2. There is no way of representing constants in RTL for the
wide-integer types (we have an internal bug where a constant-zero
OImode value is synthesized when using NEON intrinsics, and it leads to
an ICE: see emit-rtl.c:immed_double_const -- OImode, etc. are wider
than 2 * HOST_BITS_PER_WIDE_INT, so fail the second assertion).

Unless we can show that the wide-integer modes are really needed, and
patch things up so (2) no longer holds, I'd strongly prefer to see them
disappear. (As a side note, we obviously want to avoid wide-integer or
wide-vector modes *ever* being reloaded via core registers, since there
simply aren't enough of them for that to be possible. I think that may
be one of the reasons that integer-equivalent modes for each size have
traditionally been used by the compiler?).

> >> The VMOV is a bit disappointing, and needs further investigation.
> >
> > But that should be fairly easy: we just need to expand to subreg
> > operations for vcombine, vget_high/vget_low, etc. -- which I believe
> > will work fine now (it didn't at some point in the distant past,
> > which is why we have hardwired "vmov"s all over the place).
> > Big-endian mode may need some care to be taken, as ever.
> 
> This VMOV is coming from a plain register-register SET that we fail to
> optimise away.  The pattern is:
> 
>     (set (reg NEWX) (reg OLDX))
>     (set (subreg (reg NEWX 0)) (plus (subreg (reg OLDX 0)) ...)) ;
> OLDX dead (set (subreg (reg NEWX 8)) (plus (subreg (reg NEWX 8)) ...))
>     ...
> 
> where what we really want is for the second instruction to use NEWX
> (or for NEWX not to exist at all, whichever way to prefer to look at
> it). I think it's a general subreg optimisation problem.

Hmm, not sure about that then.

> > I don't think I'm going to have time to work on this in the
> > immediate future: please feel free to use it as a base, or ignore
> > it if your approach is simpler/better :-).
> 
> Thanks.  I might well end up "borrowing" the vector-mode stuff.

Cheers,

Julian

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain

Re: Improving the code generated for vld and vst intrinsics

Reply via email to