On Tue, 22 Feb 2011 09:42:15 +0000 Richard Sandiford <richard.sandif...@linaro.org> wrote:
> Julian Brown <jul...@codesourcery.com> writes:
> > Richard Sandiford <richard.sandif...@linaro.org> wrote:
> > 1. Struct (tree) types are defined via hard-wired code in the ARM
> > backend rather than in arm_neon.h. The "type mode" of those struct
> > types is overridden to be an extra-wide vector, the width of the
> > whole struct (so int32x2x2_t would be V4SImode, etc.).
>
> FWIW, I was going to try to avoid this. I think instead we should
> automatically use vector modes for structures like those in
> arm_neon.h, via a target hook. It would improve the code quality for
> general code that has the same sort of small-array-of-vectors
> structure. (In other words, this would help when using the generic
> vector extensions rather than the Neon-specific builtins.)

That sounds like a good plan, I think.

> > 2. Builtins (__builtin_neon_*) which previously used "big" integer
> > modes to pass/return values, are initialised such that they
> > directly pass/return the struct types above instead. The intrinsic
> > wrappers in arm_neon.h no longer need to use unions to pun the
> > types of arguments & return values.
>
> Yeah, I'd wondered about that too. However, these days, I think we
> ought to be able to generate good code for this type of union, and we
> seem to for the cases I've tried. In the end I thought it was better
> to keep the underlying built-in function close to the rtl pattern.
> E.g. the fact that the name of the field is "val" seems more like an
> arm_neon.h detail than something that should be hard-coded into GCC.

I still think it's a good idea to get rid of the unions in this case,
or at least, replace the wide-integer modes in the unions with wide
vectors. But I'm happy with whatever works :-).

> > 3. When those builtins are expanded, they now use the extra-wide
> > vectors. The corresponding instruction patterns in neon.md also use
> > wide vectors (rather than wide integer modes).
>
> I was also going to try defining non-power-of-two vectors.
> Glad to hear it works! (That was actually the main motivation for
> doing the rtl side first: to see whether it really would be OK to ask
> the vectoriser to treat these values as single vectors.)

(Caveat: that's one of the things that isn't well-tested. It works to
define those modes, at least.)

> I think we should keep the integer modes too, though, just like we
> allow DImode for double registers. (I'm surprised to see we don't
> allow TImode for quad registers TBH -- might look into that.)
> Given that there are no architectural restrictions on mode punning,
> these integer modes are useful neutral ground.

I'm not convinced by that. Consider that:

1. There are no useful operations on the wide-integer types.

2. There is no way of representing constants in RTL for the
   wide-integer types (we have an internal bug where a constant-zero
   OImode value is synthesized when using NEON intrinsics, and it leads
   to an ICE: see emit-rtl.c:immed_double_const -- OImode, etc. are
   wider than 2 * HOST_BITS_PER_WIDE_INT, so fail the second
   assertion).

Unless we can show that the wide-integer modes are really needed, and
patch things up so (2) no longer holds, I'd strongly prefer to see them
disappear.

(As a side note, we obviously want to avoid wide-integer or wide-vector
modes *ever* being reloaded via core registers, since there simply
aren't enough of them for that to be possible. I think that may be one
of the reasons that integer-equivalent modes for each size have
traditionally been used by the compiler?)

> >> The VMOV is a bit disappointing, and needs further investigation.
> >
> > But that should be fairly easy: we just need to expand to subreg
> > operations for vcombine, vget_high/vget_low, etc. -- which I believe
> > will work fine now (it didn't at some point in the distant past,
> > which is why we have hardwired "vmov"s all over the place).
> > Big-endian mode may need some care to be taken, as ever.
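
To illustrate the vcombine/vget_high/vget_low point quoted above: if the low and high halves of a quad-width value are simply the two halves of the same storage, then extracting or combining them needs no data movement in principle. The following is a minimal sketch in C using the generic vector extensions rather than the NEON intrinsics. The names (v2si, v4si, get_low, get_high, combine) are made up for this note and are not the arm_neon.h definitions. It works on the in-memory layout only, so it sidesteps the big-endian register-layout question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a D-register and a Q-register value. */
typedef int32_t v2si __attribute__((vector_size(8)));
typedef int32_t v4si __attribute__((vector_size(16)));

/* A quad-width value is exactly two double-width halves. */
static_assert(sizeof(v4si) == 2 * sizeof(v2si), "quad = 2 x double");

/* Lanes 0-1: the first 8 bytes of the wide vector. */
static v2si get_low(v4si q)
{
    v2si d;
    memcpy(&d, &q, sizeof d);
    return d;
}

/* Lanes 2-3: the second 8 bytes of the wide vector. */
static v2si get_high(v4si q)
{
    v2si d;
    memcpy(&d, (const char *)&q + sizeof d, sizeof d);
    return d;
}

/* Place lo and hi side by side to form the wide vector. */
static v4si combine(v2si lo, v2si hi)
{
    v4si q;
    memcpy(&q, &lo, sizeof lo);
    memcpy((char *)&q + sizeof lo, &hi, sizeof hi);
    return q;
}
```

Whether the compiler actually reduces these to plain subreg accesses is the question under discussion; the sketch only shows the intended semantics.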
>
> This VMOV is coming from a plain register-register SET that we fail to
> optimise away. The pattern is:
>
> (set (reg NEWX) (reg OLDX))
> (set (subreg (reg NEWX 0)) (plus (subreg (reg OLDX 0)) ...)) ; OLDX dead
> (set (subreg (reg NEWX 8)) (plus (subreg (reg NEWX 8)) ...))
> ...
>
> where what we really want is for the second instruction to use NEWX
> (or for NEWX not to exist at all, whichever way you prefer to look at
> it). I think it's a general subreg optimisation problem.

Hmm, not sure about that then.

> > I don't think I'm going to have time to work on this in the
> > immediate future: please feel free to use it as a base, or ignore
> > it if your approach is simpler/better :-).
>
> Thanks. I might well end up "borrowing" the vector-mode stuff.

Cheers,

Julian

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
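
For readers without arm_neon.h to hand, here is a minimal sketch of the "small-array-of-vectors" structure from point 1 and the union punning from point 2, written with the generic vector extensions. All names (v2si, v4si, v2si_x2, pun_t, add_pairs) are hypothetical; only the field name "val" comes from the discussion. This is not the arm_neon.h code or the patch, just an illustration of the shape of the problem.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t v2si __attribute__((vector_size(8)));
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical analogue of int32x2x2_t: a struct holding two vectors. */
typedef struct { v2si val[2]; } v2si_x2;

/* The struct is as wide as one V4SI-shaped vector. */
static_assert(sizeof(v2si_x2) == sizeof(v4si), "struct spans one wide vector");

/* Union used to pun between the struct and a single wide vector. */
typedef union { v2si_x2 s; v4si wide; } pun_t;

/* Add two struct values with one wide-vector addition. */
static v2si_x2 add_pairs(v2si_x2 a, v2si_x2 b)
{
    pun_t ua = { .s = a };
    pun_t ub = { .s = b };
    pun_t ur;
    ur.wide = ua.wide + ub.wide;
    return ur.s;
}
```

If the struct's mode were a wide vector to begin with, as both approaches in the thread aim for, the union would presumably be unnecessary and add_pairs could operate on the value directly.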