On Thu, 13 Nov 2014, Alan Lawrence wrote: > Hmmm, I see. Yes, that would be another way. It has me wondering however > exactly where to draw the line, as to what permutations > TARGET_VECTORIZE_VEC_PERM_CONST_OK should allow / how much effort it should > go to. For example, should said hook return true for any combination of two > supported permutations, i.e. of the form (perm1(vec1) & mask) | > (perm2(vec2) & ~mask) ?
Well, if it can expand them then yes ;) Not sure how you get to two permutations though - you can't ask that with the current interface of TARGET_VECTORIZE_VEC_PERM_CONST_OK. Thus the second permutation has to be the identity. Also (perm1(vec) & mask) | (perm2(vec2) & ~mask) is equal to perm(vec, vec2, perm1 & mask | (perm2 + nelements) & ~mask) that is, doing a single two-vector permute with a properly combined mask. > I would probably argue not, but on the other hand, if the hook returns > false for that - will the generic vec-perm machinery consider the > possibility, i.e. individually checking both perm1 and perm2? What if > vec1==vec2, we can't really search to find if any appropriate perm1+perm2 > exist... ? I was only asking for special-casing the case where the 2nd vector is used unpermuted which would match shifts if it is zero. > Looking ahead to a proper solution, a few options struck me...any of which > would take quite a bit more fleshing out before they could be > implemented.... > (1) provide a hook, whereby targets can specify a vec_perm mask that they > do implement, and that is usable for performing > reductions-via-shift(-like-operation)s. For example, I note MIPS implements > direct reductions via a scheme that is very similar to > reductions-via-shifts, but the first step is not a shift but a different > operation - IIUC, because like many platforms they can't do a shift on such > a big vector, but can do after that. > (2) extend the various vec_perm_const hooks, to allow specifying a "don't > care" in the mask, i.e. the platform can output any result in there. > Theoretically, this strikes me as the "correct" solution, however from an > engineering perspective I can see (possibly fatal) issues: it almost seems > to invite code to come to depend on particular values of the unspecified > elements (e.g., what does constant-folding do). > (3) providing known-constant values to the vectorization hooks....this > seems far too messy, but there is a possibly-useful case even with > non-constant values, i.e. where multiple elements are known to be the same > (==> like a special form of the previous, "dont-care-which" of a subset). I don't see why we would need any of this special things. What could be useful is asking for zero/all-ones elements explicitely as this is what hardware usually allows. This would mean adding special values for the mask, like for example using -1 for "zero this element" and -2 for "put all-ones into this element". > For the moment, however: powerpc64-unknown-linux-gnu tests come back > all-clear (with previously posted vec_shr fix which David Edelsohn OK'd "if > no regressions"), and I've done a bootstrap+check-gcc on > arm-none-linux-gnueabihf too. I'll commit the first three patches this > afternoon, the fourth....soon after ;) Nice. Thanks again for working on this, Richard. > Cheers, Alan > > > On 13 November 2014 09:07, Richard Biener <rguent...@suse.de> wrote: > > > On Wed, 12 Nov 2014, Alan Lawrence wrote: > > > > > In response to https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01803.html, > > this > > > series removes the VEC_RSHIFT_EXPR, instead using a VEC_PERM_EXPR (with a > > > second argument full of constant zeroes) to represent the shift. > > > > > > I've kept the use of vec_shr optab for platforms that define it, as even > > on > > > platforms with a whole-vector-shift operation, this typically does not > > work as > > > a vec-perm on arbitrary vectors (the shift will pull in zeroes from the > > end, > > > whereas TARGET_VECTORIZE_VEC_PERM_CONST_OK and related machinery allows > > only > > > to check for a shift-like permutation that will work for two arbitrary > > > vectors). > > > > That's reasonable for the moment though I expected to use > > > > VEC_PERM <v4si, { 0, 0, 0, 0 }, { 4, 5, 0, 1 }> > > > > for the shift - thus the shifted in vector elements should map > > 1:1 from the 2nd vector. This means that the target can > > answer "yes" to vec_perm_const_ok (v4si, ...) which such > > a permute if it can shift in zeros as it then can do > > > > tem = shift-in-zeros > > tem2 = vec2 & ~<mask to clear not wanted stuff> > > perm_result = tem | tem2; > > > > that is, simply OR in the wanted parts of the 2nd vector. Of > > course the actual expansion can special-case a constant or > > zero 2nd vector. > > > > Usually targets provide a way of setting vector elements to > > all ones or zero with their permute instructions as well. > > > > Of course the above requires adjustments to all targets vec_perm_const_ok > > hooks and vec_perm_const expanders so for now asking for vec_shr > > is ok, but long-term it shouldn't be needed, even without changes > > to the vec_perm_const_ok interface. > > > > Thanks, > > Richard. > > > > > I've also changed from the endianness-dependent shift direction of the > > old > > > VEC_RSHIFT_EXPR, to an endian-neutral direction (VEC_PERM_EXPR is > > inherently > > > endian-neutral), changing the meaning of vec_shr_optab to match (as I > > did in > > > https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html). As > > previously, this > > > will break any *bigendian* platform defining vec_shr; I see MIPS and > > RS6000, > > > but candidate fixes for both of these have already been posted: > > > > > > (for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html, > > although > > > I have not been able to test this as there doesn't seem to be any working > > > MIPS/Loongson hardware in the Compile Farm; > > > > > > (for PowerPC) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01480.html, > > > testing in progress. > > > > > > ARM defines vec_shr only for little-endian; AArch64 does not yet, > > although in > > > previous patch series I both added a vec_shr and made it endian-neutral > > (I > > > will post revised patches for both of these shortly). > > > > > > Bootstrapped and check-gcc on x86-none-linux-gnu and arm-none-linux-gnu; > > > cross-tested on aarch64{,_be}-none-elf (FWIW, both with and without > > previous > > > patches adding a vec_shr pattern) > > > > > > Ok for trunk if no regressions on PowerPC? > > > > > > Thanks, Alan > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany