On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner > <meiss...@linux.vnet.ibm.com> wrote: >> This patches optimizes the PowerPC vector set operation for 64-bit doubles >> and >> longs where the elements in the vector set may have been extracted from >> another >> vector (PR target/81593): >> >> Here an an example: >> >> vector double >> test_vpasted (vector double high, vector double low) >> { >> vector double res; >> res[1] = high[1]; >> res[0] = low[0]; >> return res; >> } > > Interesting. We expand from > > <bb 2> [100.00%] [count: INV]: > _1 = BIT_FIELD_REF <high_4(D), 64, 64>; > res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>; > _2 = BIT_FIELD_REF <low_7(D), 64, 0>; > res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>; > return res_8; > > but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap > pass looks like the canonical pass for this even though it's quite awkward > to fill this in.
I was thinking about this exactly. Though for the scale use of BIT_INSERT_EXPR/BIT_FIELD_REF. I have a case where someone writes (this shows up in GCC too): a->b = c->b; a->d = c->d; a->e = c->e; a->f = c->f; a->g = c->g; a->h = c->h; Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have: _1 = BIT_FIELD_REF <a.1_3, 2, 0>; _8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>; _2 = BIT_FIELD_REF <a.1_3, 4, 2>; _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>; .... For the vector case, can't we write it as: _1 = BIT_FIELD_REF <high_4(D), 64, 64>; _2 = BIT_FIELD_REF <low_7(D), 64, 0>; res_8 = {_1, _2}; And then have some match.pd patterns (which might get complex), to rewrite that into VEC_PERM_EXPR? The reason why I ask that is because say someone who wrote: vector double test_vpasted (vector double high, vector double low) { vector double res = { high[1], low[0] }; return res; } Thanks, Andrew Pinski > > So a match.pd rule would work as well here - your ppc backend patterns > are v2df specific, right? > >> Previously it would generate: >> >> xxpermdi 12,34,34,2 >> vspltisw 2,0 >> xxlor 0,35,35 >> xxpermdi 34,34,12,0 >> xxpermdi 34,0,34,1 >> >> and with these patches, it now generates: >> >> xxpermdi 34,35,34,1 >> >> I have tested it on a little endian power8 system and a big endian power7 >> system with the usual bootstrap and make checks with no regressions. Can I >> check this into the trunk? >> >> I also built Spec 2006 with the compiler, and saw no changes in the code >> generated. This isn't surprising because it isn't something that auto >> vectorization might generate by default. >> >> [gcc] >> 2017-07-27 Michael Meissner <meiss...@linux.vnet.ibm.com> >> >> PR target/81593 >> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New >> declaration. >> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to >> emit XXPERMDI accessing either double word in either vector >> register inputs. >> * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator): >> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify >> the constraints with the removal of the -mupper-regs-* switches. >> (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs >> where either register might have come from VEC_SELECT. >> (vsx_concat_<mode>_2): Likewise. >> (vsx_concat_<mode>_3): Likewise. >> (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a >> VEC_CONCAT rather than use an UNSPEC to specify the option. >> >> [gcc/testsuite] >> 2017-07-27 Michael Meissner <meiss...@linux.vnet.ibm.com> >> >> PR target/81593 >> * gcc.target/powerpc/vsx-extract-6.c: New test. >> * gcc.target/powerpc/vsx-extract-7.c: Likewise. >> >> -- >> Michael Meissner, IBM >> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA >> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797