On Tue, Oct 27, 2015 at 01:21:51PM -0700, Richard Henderson wrote: > The problem in this PR is that a word-mode subreg is used to write to a > multi-word pseudo, under the assumption that is the correct way to insert a > value into the appropriate bits of the pseudo. > > Except that the pseudo then gets assigned to an SSE register, at which point > all of the assumptions start to fall apart. Primarily, (subreg X 0) and > (subreg X 8) do not in the end resolve to different hard registers, so an > assignment to (subreg X 0) may legitimately clobber all of X. > > There *are* ways to insert a value into an element of an SSE register, but > what > comes out of reload is indistinguishable from a normal DImode assignment. > > An ideal solution would be to use a different method than subregs for > multi-word registers. Using the same code for "view convert", insert, and > extract is going to continue to cause us problems. > > I had a look over the other major vector targets: > > * ppc, sparc, s390 already disallow the described condition. > > * arm ought not be subject to this problem because each vector register is > composed of 4 individually addressable registers. So when (subreg X N) is > simplified, we really do resolve to the desired hard register. > > * aarch64 is almost certainly vulnerable, since it deleted its > CANNOT_CHANGE_MODE_CLASS implementation in January. I haven't tried to create > a test case that fails for it, but I'm certain it's possible.
The best I've come up with so far needs some union-hackery that I'm not convinced is legal, and looks like: typedef union { double v[2]; double s __attribute__ ((vector_size (16))); } data; data reg; void set_lower (double b) { dodgy_data stack_var; double __attribute__ ((vector_size (16))) one = { 1.0, 1.0 }; stack_var.s = reg.s; stack_var.s += one; stack_var.v[0] += b; reg.s = stack_var.s; } This shows the issue going back to GCC 4.9, the code we generate for AArch64 looks like: set_lower: fmov v2.2d, 1.0e+0 adrp x0, reg ldr q1, [x0, #:lo12:reg] fadd v1.2d, v1.2d, v2.2d orr v2.16b, v1.16b, v1.16b fadd d2, d0, d1 // <----- Clobbered stack_var.v[1]. str q2, [x0, #:lo12:reg] // <----- Wrote zeroes to the top half of reg ret Reading the documentation you add below, we'd need to bring back CANNOT_CHANGE_MODE_CLASS for AArch64 and forbid changes from wide registers to 64-bit (and larger) values. Is this right? Are these workarounds intended to be temporary, or is the midend bug likely to be fixed? It isn't clear from the PR and the documentation you add how long-lived the workaround will be, and it is a shame to lose the otherwise useful optimisations enabled by a permissive CANNOT_CHANGE_MODE_CLASS. Thanks, James > > > Tested on x86_64 and committed. > > > r~ > PR rtl-opt/67609 > * config/i386/i386.c (ix86_cannot_change_mode_class): Disallow > narrowing subregs on SSE and MMX registers. > * doc/tm.texi.in (CANNOT_CHANGE_MODE_CLASS): Clarify when subregs that > appear to be sub-words of multi-register pseudos must be rejected. > * doc/tm.texi: Regenerate. > testsuite/ > * gcc.target/i386/pr67609-2.c: New test. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index d05c8f8..82fd054 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -43031,15 +43031,22 @@ ix86_cannot_change_mode_class (machine_mode from, > machine_mode to, > if (MAYBE_FLOAT_CLASS_P (regclass)) > return true; > > + /* Vector registers do not support QI or HImode loads. If we don't > + disallow a change to these modes, reload will assume it's ok to > + drop the subreg from (subreg:SI (reg:HI 100) 0). This affects > + the vec_dupv4hi pattern. > + > + Further, we cannot allow word_mode subregs of full vector modes. > + Otherwise the middle-end will assume it's ok to store to > + (subreg:DI (reg:TI 100) 0) in order to modify only the low 64 bits > + of the 128-bit register. However, after reload the subreg will > + be dropped leaving a plain DImode store. This is indistinguishable > + from a "normal" DImode move, and so we're justified to use movsd, > + which modifies the entire 128-bit register. > + > + Combining these two conditions, disallow all narrowing mode changes. */ > if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass)) > - { > - /* Vector registers do not support QI or HImode loads. If we don't > - disallow a change to these modes, reload will assume it's ok to > - drop the subreg from (subreg:SI (reg:HI 100) 0). This affects > - the vec_dupv4hi pattern. */ > - if (GET_MODE_SIZE (from) < 4) > - return true; > - } > + return GET_MODE_SIZE (to) < GET_MODE_SIZE (from); > > return false; > } > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index a8666b1..606ddb6 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -2823,8 +2823,8 @@ in the reload pass. > If defined, a C expression that returns nonzero for a @var{class} for which > a change from mode @var{from} to mode @var{to} is invalid. > > -For the example, loading 32-bit integer or floating-point objects into > -floating-point registers on the Alpha extends them to 64 bits. > +For example, loading 32-bit integer or floating-point objects into > +floating-point registers on Alpha extends them to 64 bits. > Therefore loading a 64-bit object and then storing it as a 32-bit object > does not store the low-order 32 bits, as would be the case for a normal > register. Therefore, @file{alpha.h} defines @code{CANNOT_CHANGE_MODE_CLASS} > @@ -2835,6 +2835,17 @@ as below: > (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \ > ? reg_classes_intersect_p (FLOAT_REGS, (CLASS)) : 0) > @end smallexample > + > +Even if storing from a register in mode @var{to} would be valid, > +if both @var{from} and @code{raw_reg_mode} for @var{class} are wider > +than @code{word_mode}, then we must prevent @var{to} narrowing the > +mode. This happens when the middle-end assumes that it can load > +or store pieces of an @var{N}-word pseudo, and that the pseudo will > +eventually be allocated to @var{N} @code{word_mode} hard registers. > +Failure to prevent this kind of mode change will result in the > +entire @code{raw_reg_mode} being modified instead of the partial > +value that the middle-end intended. > + > @end defmac > > @deftypefn {Target Hook} reg_class_t TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > (int, @var{reg_class_t}) > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > index 69b6cf9..93620eb 100644 > --- a/gcc/doc/tm.texi.in > +++ b/gcc/doc/tm.texi.in > @@ -2461,8 +2461,8 @@ in the reload pass. > If defined, a C expression that returns nonzero for a @var{class} for which > a change from mode @var{from} to mode @var{to} is invalid. > > -For the example, loading 32-bit integer or floating-point objects into > -floating-point registers on the Alpha extends them to 64 bits. > +For example, loading 32-bit integer or floating-point objects into > +floating-point registers on Alpha extends them to 64 bits. > Therefore loading a 64-bit object and then storing it as a 32-bit object > does not store the low-order 32 bits, as would be the case for a normal > register. Therefore, @file{alpha.h} defines @code{CANNOT_CHANGE_MODE_CLASS} > @@ -2473,6 +2473,17 @@ as below: > (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \ > ? reg_classes_intersect_p (FLOAT_REGS, (CLASS)) : 0) > @end smallexample > + > +Even if storing from a register in mode @var{to} would be valid, > +if both @var{from} and @code{raw_reg_mode} for @var{class} are wider > +than @code{word_mode}, then we must prevent @var{to} narrowing the > +mode. This happens when the middle-end assumes that it can load > +or store pieces of an @var{N}-word pseudo, and that the pseudo will > +eventually be allocated to @var{N} @code{word_mode} hard registers. > +Failure to prevent this kind of mode change will result in the > +entire @code{raw_reg_mode} being modified instead of the partial > +value that the middle-end intended. > + > @end defmac > > @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > diff --git a/gcc/testsuite/gcc.target/i386/pr67609-2.c > b/gcc/testsuite/gcc.target/i386/pr67609-2.c > new file mode 100644 > index 0000000..fece437 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr67609-2.c > @@ -0,0 +1,29 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -msse2" } */ > +/* { dg-require-effective-target sse2 } */ > + > +#include <stdlib.h> > +#include <emmintrin.h> > + > +__m128d reg = { 2.0, 4.0 }; > + > +void > +__attribute__((noinline)) > +set_lower (double b) > +{ > + double v[2]; > + _mm_store_pd(v, reg); > + v[0] = b; > + reg = _mm_load_pd(v); > +} > + > +int > +main () > +{ > + set_lower (6.0); > + > + if (reg[1] != 4.0) > + abort (); > + > + return 0; > +}