Re: [PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-14 Thread H.J. Lu
On Thu, Feb 14, 2019 at 3:12 PM H.J. Lu  wrote:
>
> On Thu, Feb 14, 2019 at 2:57 PM Uros Bizjak  wrote:
> >
> > On Thu, Feb 14, 2019 at 10:02 PM H.J. Lu  wrote:
> >
> > > > > > > gcc/
> > > > > > >
> > > > > > > PR target/89021
> > > > > > > * config/i386/i386-builtin.def: Enable MMX intrinsics with
> > > > > > > SSE/SSE2/SSSE3.
> > > > > > > * config/i386/i386.c (ix86_option_override_internal): 
> > > > > > > Don't
> > > > > > > enable MMX ISA with TARGET_MMX_WITH_SSE by default.
> > > > > > > (ix86_init_mmx_sse_builtins): Enable MMX intrinsics with
> > > > > > > SSE/SSE2/SSSE3.
> > > > > > > (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
> > > > > > > intrinsics with TARGET_MMX_WITH_SSE.
> > > > > > > * config/i386/mmintrin.h: Don't require MMX in 64-bit 
> > > > > > > mode.
> > > > > > >
> > > > >
> > > > > >
> > > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > > index a9abbe8706b..1d417e08734 100644
> > > > > > > --- a/gcc/config/i386/i386.c
> > > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > > @@ -4165,12 +4165,15 @@ ix86_option_override_internal (bool 
> > > > > > > main_args_p,
> > > > > > >opts->x_target_flags
> > > > > > > |= TARGET_SUBTARGET64_DEFAULT & ~opts_set->x_target_flags;
> > > > > > >
> > > > > > > -  /* Enable by default the SSE and MMX builtins.  Do allow 
> > > > > > > the user to
> > > > > > > -explicitly disable any of these.  In particular, 
> > > > > > > disabling SSE and
> > > > > > > -MMX for kernel code is extremely useful.  */
> > > > > > > +  /* Enable the SSE and MMX builtins by default.  Don't 
> > > > > > > enable MMX
> > > > > > > + ISA with TARGET_MMX_WITH_SSE by default.  Do allow the 
> > > > > > > user to
> > > > > > > +explicitly disable any of these.  In particular, 
> > > > > > > disabling SSE
> > > > > > > +and MMX for kernel code is extremely useful.  */
> > > > > > >if (!ix86_arch_specified)
> > > > > > >opts->x_ix86_isa_flags
> > > > > > > -   |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE | 
> > > > > > > OPTION_MASK_ISA_MMX
> > > > > > > +   |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE
> > > > > > > +| (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags)
> > > > > > > +   ? 0 : OPTION_MASK_ISA_MMX)
> > > > > > >  | TARGET_SUBTARGET64_ISA_DEFAULT)
> > > > > > >  & ~opts->x_ix86_isa_flags_explicit);
> > > > > >
> > > > > > Please split the above into two clauses, the first that sets SSE and
> > > > > > MMX by default, and the second to or with
> > > > > >
> > > > > > opts->x_ix86_isa_flags
> > > > > >  |= TARGET_SUBTARGET64_ISA_DEFAULT & 
> > > > > > ~opts->x_ix86_isa_flags_explicit
> > > > > >
> > > > >
> > > > > Like this?
> > > >
> > > > Yes, but also split the comment.
> > >
> > > I will go with
> > >
> > >  /* Enable by default the SSE and MMX builtins.  Do allow the user to
> > >  explicitly disable any of these.  In particular, disabling SSE 
> > > and
> > >  MMX for kernel code is extremely useful.  */
> > >   if (!ix86_arch_specified)
> > > {
> > >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> > >   opts->x_ix86_isa_flags
> > > |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE
> > >  | (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags)
> > > ? 0 : OPTION_MASK_ISA_MMX))
> > > & ~opts->x_ix86_isa_flags_explicit);
> > >   opts->x_ix86_isa_flags
> > > |= (TARGET_SUBTARGET64_ISA_DEFAULT
> > > & ~opts->x_ix86_isa_flags_explicit);
> > > }
> >
> > I'll commit the following patch that finally defines
> > TARGET_SUBTARGET64_ISA_DEFAULT. You could then simply clear the MMX
> > bit from x_ix86_isa_flags, like:
> >
> >   if (!ix86_arch_specified)
> > opts->x_ix86_isa_flags
> >   |= TARGET_SUBTARGET64_ISA_DEFAULT & ~opts->x_ix86_isa_flags_explicit;
> >
> >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> >   if (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags))
> > opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_MMX;
>
> I think it should be:
>
>   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
>   if (!(opts->x_ix86_isa_flags & OPTION_MASK_ISA_MMX)
I meant opts->x_ix86_isa_flags_explicit.
>   && TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags))
> opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_MMX;
>
> Thanks.
>
> --
> H.J.



-- 
H.J.
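
For illustration only, a minimal sketch of the corrected check discussed above: clear the MMX bit when MMX is emulated with SSE, but only if the user did not touch MMX explicitly. The mask values, `mmx_with_sse_p` and `adjust_mmx` are hypothetical stand-ins, and the condition behind TARGET_MMX_WITH_SSE_P (64-bit plus SSE2) is an assumption, not taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real OPTION_MASK_ISA_* constants are
   generated by GCC's option machinery.  */
#define MASK_MMX   (UINT64_C (1) << 0)
#define MASK_SSE   (UINT64_C (1) << 1)
#define MASK_SSE2  (UINT64_C (1) << 2)
#define MASK_64BIT (UINT64_C (1) << 3)

/* Assumption: MMX emulation with SSE applies to 64-bit targets that
   have SSE2.  This stands in for TARGET_MMX_WITH_SSE_P.  */
static int
mmx_with_sse_p (uint64_t isa_flags)
{
  return (isa_flags & MASK_64BIT) != 0 && (isa_flags & MASK_SSE2) != 0;
}

/* Don't enable the MMX ISA when it is emulated with SSE, unless the
   MMX bit was set or cleared explicitly by the user.  */
static uint64_t
adjust_mmx (uint64_t isa_flags, uint64_t isa_flags_explicit)
{
  assert ((isa_flags_explicit & ~(MASK_MMX | MASK_SSE | MASK_SSE2
                                  | MASK_64BIT)) == 0);
  if (!(isa_flags_explicit & MASK_MMX) && mmx_with_sse_p (isa_flags))
    isa_flags &= ~MASK_MMX;
  return isa_flags;
}
```

Testing the explicit mask rather than the current flags is what lets an explicit -mmmx survive in this sketch; with the earlier form of the check the MMX bit could never be both set and cleared by the same condition.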


[committed] Fix combiner's make_extraction (PR rtl-optimization/89354)

2019-02-14 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled on i686-linux.  make_extraction is
asked to extract 33 bits at position 0 from a DImode MEM and happily
returns a ZERO_EXTRACT in SImode, even though SImode can hold only 32 bits.
Because of that, the caller (make_field_assignment) throws away the
|= 0x1ULL on this testcase.

Fixed thusly, bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux,
preapproved by Segher on IRC, committed to trunk and 8.x branch.

I've also gathered statistics: during those bootstraps and regtests (except
for the still pending second powerpc64-linux regtest), the only time this
patch made any difference was on this newly added testcase on i686-linux.

2019-02-14  Jakub Jelinek  

PR rtl-optimization/89354
* combine.c (make_extraction): Punt if extraction_mode is narrower
than len bits.

* gcc.dg/pr89354.c: New test.

--- gcc/combine.c.jj   2019-02-05 16:38:28.0 +0100
+++ gcc/combine.c   2019-02-14 16:45:41.445096523 +0100
@@ -7830,6 +7830,10 @@ make_extraction (machine_mode mode, rtx
       && partial_subreg_p (extraction_mode, mode))
     extraction_mode = mode;
 
+  /* Punt if len is too large for extraction_mode.  */
+  if (maybe_gt (len, GET_MODE_PRECISION (extraction_mode)))
+    return NULL_RTX;
+
   if (!MEM_P (inner))
     wanted_inner_mode = wanted_inner_reg_mode;
   else
--- gcc/testsuite/gcc.dg/pr89354.c.jj   2019-02-14 17:02:26.013552853 +0100
+++ gcc/testsuite/gcc.dg/pr89354.c  2019-02-14 17:01:44.431237813 +0100
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/89354 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
+
+static unsigned long long q = 0;
+
+__attribute__((noipa)) static void
+foo (void)
+{
+  q = (q & ~0x1ULL) | 0x1ULL;
+}
+
+int
+main ()
+{
+  __asm volatile ("" : "+m" (q));
+  foo ();
+  if (q != 0x1ULL)
+    __builtin_abort ();
+  return 0;
+}

Jakub
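
As a rough sketch of the guard added to make_extraction above, outside of GCC: a bit-field extraction helper that refuses to produce a result when the requested length does not fit the mode that would hold it. `extract_field` and its `mode_bits` parameter are hypothetical and only model the idea; the real code works on RTL and machine modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Extract LEN bits starting at bit POS of VALUE into *OUT, where the
   result is to be held in a mode that is MODE_BITS wide.  Returns false
   (the analogue of returning NULL_RTX) when no extraction is made.  */
static bool
extract_field (uint64_t value, unsigned pos, unsigned len,
               unsigned mode_bits, uint64_t *out)
{
  assert (out != NULL);

  /* Reject ranges that fall outside the 64-bit source value.  */
  if (len == 0 || pos >= 64 || len > 64 - pos)
    return false;

  /* Punt if len is too large for the extraction mode.  */
  if (len > mode_bits)
    return false;

  uint64_t mask = len == 64 ? UINT64_MAX : (UINT64_C (1) << len) - 1;
  *out = (value >> pos) & mask;
  return true;
}
```

With this shape, a request for 33 bits in a 32-bit mode fails outright instead of silently yielding a 32-bit field, so the caller has to fall back to some other strategy.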


Re: [PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-14 Thread Uros Bizjak
On Fri, Feb 15, 2019 at 12:14 AM H.J. Lu  wrote:

> > > > > > > > gcc/
> > > > > > > >
> > > > > > > > PR target/89021
> > > > > > > > * config/i386/i386-builtin.def: Enable MMX intrinsics 
> > > > > > > > with
> > > > > > > > SSE/SSE2/SSSE3.
> > > > > > > > * config/i386/i386.c (ix86_option_override_internal): 
> > > > > > > > Don't
> > > > > > > > enable MMX ISA with TARGET_MMX_WITH_SSE by default.
> > > > > > > > (ix86_init_mmx_sse_builtins): Enable MMX intrinsics with
> > > > > > > > SSE/SSE2/SSSE3.
> > > > > > > > (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate 
> > > > > > > > MMX
> > > > > > > > intrinsics with TARGET_MMX_WITH_SSE.
> > > > > > > > * config/i386/mmintrin.h: Don't require MMX in 64-bit 
> > > > > > > > mode.
> > > > > > > >
> > > > > >
> > > > > > >
> > > > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > > > index a9abbe8706b..1d417e08734 100644
> > > > > > > > --- a/gcc/config/i386/i386.c
> > > > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > > > @@ -4165,12 +4165,15 @@ ix86_option_override_internal (bool 
> > > > > > > > main_args_p,
> > > > > > > >opts->x_target_flags
> > > > > > > > |= TARGET_SUBTARGET64_DEFAULT & 
> > > > > > > > ~opts_set->x_target_flags;
> > > > > > > >
> > > > > > > > -  /* Enable by default the SSE and MMX builtins.  Do allow 
> > > > > > > > the user to
> > > > > > > > -explicitly disable any of these.  In particular, 
> > > > > > > > disabling SSE and
> > > > > > > > -MMX for kernel code is extremely useful.  */
> > > > > > > > +  /* Enable the SSE and MMX builtins by default.  Don't 
> > > > > > > > enable MMX
> > > > > > > > + ISA with TARGET_MMX_WITH_SSE by default.  Do allow 
> > > > > > > > the user to
> > > > > > > > +explicitly disable any of these.  In particular, 
> > > > > > > > disabling SSE
> > > > > > > > +and MMX for kernel code is extremely useful.  */
> > > > > > > >if (!ix86_arch_specified)
> > > > > > > >opts->x_ix86_isa_flags
> > > > > > > > -   |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE | 
> > > > > > > > OPTION_MASK_ISA_MMX
> > > > > > > > +   |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE
> > > > > > > > +| (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags)
> > > > > > > > +   ? 0 : OPTION_MASK_ISA_MMX)
> > > > > > > >  | TARGET_SUBTARGET64_ISA_DEFAULT)
> > > > > > > >  & ~opts->x_ix86_isa_flags_explicit);
> > > > > > >
> > > > > > > Please split the above into two clauses, the first that sets SSE 
> > > > > > > and
> > > > > > > MMX by default, and the second to or with
> > > > > > >
> > > > > > > opts->x_ix86_isa_flags
> > > > > > >  |= TARGET_SUBTARGET64_ISA_DEFAULT & 
> > > > > > > ~opts->x_ix86_isa_flags_explicit
> > > > > > >
> > > > > >
> > > > > > Like this?
> > > > >
> > > > > Yes, but also split the comment.
> > > >
> > > > I will go with
> > > >
> > > >  /* Enable by default the SSE and MMX builtins.  Do allow the user 
> > > > to
> > > >  explicitly disable any of these.  In particular, disabling SSE 
> > > > and
> > > >  MMX for kernel code is extremely useful.  */
> > > >   if (!ix86_arch_specified)
> > > > {
> > > >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> > > >   opts->x_ix86_isa_flags
> > > > |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE
> > > >  | (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags)
> > > > ? 0 : OPTION_MASK_ISA_MMX))
> > > > & ~opts->x_ix86_isa_flags_explicit);
> > > >   opts->x_ix86_isa_flags
> > > > |= (TARGET_SUBTARGET64_ISA_DEFAULT
> > > > & ~opts->x_ix86_isa_flags_explicit);
> > > > }
> > >
> > > I'll commit the following patch that finally defines
> > > TARGET_SUBTARGET64_ISA_DEFAULT. You could then simply clear the MMX
> > > > bit from x_ix86_isa_flags, like:
> > >
> > >   if (!ix86_arch_specified)
> > > opts->x_ix86_isa_flags
> > >   |= TARGET_SUBTARGET64_ISA_DEFAULT & 
> > > ~opts->x_ix86_isa_flags_explicit;
> > >
> > >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> > >   if (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags))
> > > opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_MMX;
> >
> > I think it should be:
> >
> >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> >   if (!(opts->x_ix86_isa_flags & OPTION_MASK_ISA_MMX)
> I meant opts->x_ix86_isa_flags_explicit.
> >   && TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags))
> > opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_MMX;

Well ... I didn't test this part. OTOH, maybe this part is not needed,
MMX disabling can go *after*

  /* Turn on MMX builtins for -msse.  */
  if (TARGET_SSE_P (opts->x_ix86_isa_flags))
opts->x_ix86_isa

Re: [PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-14 Thread H.J. Lu
On Thu, Feb 14, 2019 at 3:21 PM Uros Bizjak  wrote:
>
> On Fri, Feb 15, 2019 at 12:14 AM H.J. Lu  wrote:
>
> > > > > > > > > gcc/
> > > > > > > > >
> > > > > > > > > PR target/89021
> > > > > > > > > * config/i386/i386-builtin.def: Enable MMX intrinsics 
> > > > > > > > > with
> > > > > > > > > SSE/SSE2/SSSE3.
> > > > > > > > > * config/i386/i386.c (ix86_option_override_internal): 
> > > > > > > > > Don't
> > > > > > > > > enable MMX ISA with TARGET_MMX_WITH_SSE by default.
> > > > > > > > > (ix86_init_mmx_sse_builtins): Enable MMX intrinsics 
> > > > > > > > > with
> > > > > > > > > SSE/SSE2/SSSE3.
> > > > > > > > > (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to 
> > > > > > > > > emulate MMX
> > > > > > > > > intrinsics with TARGET_MMX_WITH_SSE.
> > > > > > > > > * config/i386/mmintrin.h: Don't require MMX in 64-bit 
> > > > > > > > > mode.
> > > > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > > > > index a9abbe8706b..1d417e08734 100644
> > > > > > > > > --- a/gcc/config/i386/i386.c
> > > > > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > > > > @@ -4165,12 +4165,15 @@ ix86_option_override_internal (bool 
> > > > > > > > > main_args_p,
> > > > > > > > >opts->x_target_flags
> > > > > > > > > |= TARGET_SUBTARGET64_DEFAULT & 
> > > > > > > > > ~opts_set->x_target_flags;
> > > > > > > > >
> > > > > > > > > -  /* Enable by default the SSE and MMX builtins.  Do 
> > > > > > > > > allow the user to
> > > > > > > > > -explicitly disable any of these.  In particular, 
> > > > > > > > > disabling SSE and
> > > > > > > > > -MMX for kernel code is extremely useful.  */
> > > > > > > > > +  /* Enable the SSE and MMX builtins by default.  Don't 
> > > > > > > > > enable MMX
> > > > > > > > > + ISA with TARGET_MMX_WITH_SSE by default.  Do allow 
> > > > > > > > > the user to
> > > > > > > > > +explicitly disable any of these.  In particular, 
> > > > > > > > > disabling SSE
> > > > > > > > > +and MMX for kernel code is extremely useful.  */
> > > > > > > > >if (!ix86_arch_specified)
> > > > > > > > >opts->x_ix86_isa_flags
> > > > > > > > > -   |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE | 
> > > > > > > > > OPTION_MASK_ISA_MMX
> > > > > > > > > +   |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE
> > > > > > > > > +| (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags)
> > > > > > > > > +   ? 0 : OPTION_MASK_ISA_MMX)
> > > > > > > > >  | TARGET_SUBTARGET64_ISA_DEFAULT)
> > > > > > > > >  & ~opts->x_ix86_isa_flags_explicit);
> > > > > > > >
> > > > > > > > Please split the above into two clauses, the first that sets 
> > > > > > > > SSE and
> > > > > > > > MMX by default, and the second to or with
> > > > > > > >
> > > > > > > > opts->x_ix86_isa_flags
> > > > > > > >  |= TARGET_SUBTARGET64_ISA_DEFAULT & 
> > > > > > > > ~opts->x_ix86_isa_flags_explicit
> > > > > > > >
> > > > > > >
> > > > > > > Like this?
> > > > > >
> > > > > > Yes, but also split the comment.
> > > > >
> > > > > I will go with
> > > > >
> > > > >  /* Enable by default the SSE and MMX builtins.  Do allow the 
> > > > > user to
> > > > >  explicitly disable any of these.  In particular, disabling 
> > > > > SSE and
> > > > >  MMX for kernel code is extremely useful.  */
> > > > >   if (!ix86_arch_specified)
> > > > > {
> > > > >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> > > > >   opts->x_ix86_isa_flags
> > > > > |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE
> > > > >  | (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags)
> > > > > ? 0 : OPTION_MASK_ISA_MMX))
> > > > > & ~opts->x_ix86_isa_flags_explicit);
> > > > >   opts->x_ix86_isa_flags
> > > > > |= (TARGET_SUBTARGET64_ISA_DEFAULT
> > > > > & ~opts->x_ix86_isa_flags_explicit);
> > > > > }
> > > >
> > > > I'll commit the following patch that finally defines
> > > > TARGET_SUBTARGET64_ISA_DEFAULT. You could then simply clear the MMX
> > > > > bit from x_ix86_isa_flags, like:
> > > >
> > > >   if (!ix86_arch_specified)
> > > > opts->x_ix86_isa_flags
> > > >   |= TARGET_SUBTARGET64_ISA_DEFAULT & 
> > > > ~opts->x_ix86_isa_flags_explicit;
> > > >
> > > >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> > > >   if (TARGET_MMX_WITH_SSE_P (opts->x_ix86_isa_flags))
> > > > opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_MMX;
> > >
> > > I think it should be:
> > >
> > >   /* Don't enable MMX ISA with TARGET_MMX_WITH_SSE.  */
> > >   if (!(opts->x_ix86_isa_flags & OPTION_MASK_ISA_MMX)
> > I meant opts->x_ix86_isa_flags_explicit.
> > >   && TARGET_MMX_WITH_SSE_P (opts->x_ix86_is

[PATCH, i386]: Enable MMX, SSE and SSE2 by default in TARGET_SUBTARGET64_ISA_DEFAULT

2019-02-14 Thread Uros Bizjak
No functional changes.

2019-02-15  Uroš Bizjak  

* config/i386/i386.h (TARGET_SUBTARGET64_ISA_DEFAULT):
Enable MMX, SSE and SSE2 by default.
* config/i386/i386.c (ix86_option_override_internal): Do not
explicitly set MMX, SSE and SSE2 flags for TARGET_64BIT here.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 268907)
+++ config/i386/i386.c  (working copy)
@@ -4165,14 +4165,9 @@ ix86_option_override_internal (bool main_args_p,
       opts->x_target_flags
         |= TARGET_SUBTARGET64_DEFAULT & ~opts_set->x_target_flags;
 
-      /* Enable by default the SSE and MMX builtins.  Do allow the user to
-         explicitly disable any of these.  In particular, disabling SSE and
-         MMX for kernel code is extremely useful.  */
       if (!ix86_arch_specified)
-      opts->x_ix86_isa_flags
-        |= ((OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_MMX
-             | TARGET_SUBTARGET64_ISA_DEFAULT)
-            & ~opts->x_ix86_isa_flags_explicit);
+        opts->x_ix86_isa_flags
+          |= TARGET_SUBTARGET64_ISA_DEFAULT & ~opts->x_ix86_isa_flags_explicit;
 
       if (TARGET_RTD_P (opts->x_target_flags))
         warning (0,
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 268907)
+++ config/i386/i386.h  (working copy)
@@ -633,7 +633,9 @@ extern tree x86_mfence;
 
 /* Extra bits to force on w/ 64-bit mode.  */
 #define TARGET_SUBTARGET64_DEFAULT 0
-#define TARGET_SUBTARGET64_ISA_DEFAULT 0
+/* Enable MMX, SSE and SSE2 by default.  */
+#define TARGET_SUBTARGET64_ISA_DEFAULT \
+  (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_SSE2)
 
 /* Replace MACH-O, ifdefs by in-line tests, where possible.
    (a) Macros defined in config/i386/darwin.h  */
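
A small sketch of the pattern this patch relies on, assuming hypothetical mask values: the default ISA bits are ORed in, except for any bit the user set or cleared explicitly. `SUBTARGET64_ISA_DEFAULT` and `apply_isa_defaults` are illustrative names, not GCC's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values standing in for OPTION_MASK_ISA_*.  */
#define MASK_MMX  (UINT64_C (1) << 0)
#define MASK_SSE  (UINT64_C (1) << 1)
#define MASK_SSE2 (UINT64_C (1) << 2)

/* Enable MMX, SSE and SSE2 by default.  */
#define SUBTARGET64_ISA_DEFAULT (MASK_MMX | MASK_SSE | MASK_SSE2)

/* Turn on the default bits, but leave alone every bit recorded in
   ISA_FLAGS_EXPLICIT, so an explicit user choice always wins.  */
static uint64_t
apply_isa_defaults (uint64_t isa_flags, uint64_t isa_flags_explicit)
{
  uint64_t result
    = isa_flags | (SUBTARGET64_ISA_DEFAULT & ~isa_flags_explicit);
  /* Bits the user controlled explicitly must be unchanged.  */
  assert ((result & isa_flags_explicit) == (isa_flags & isa_flags_explicit));
  return result;
}
```

In this sketch, a bit that the user cleared explicitly stays cleared, which matches the old comment's point that disabling SSE and MMX for kernel code must remain possible.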


[PATCH wwwdocs] changes.html for "asm inline"

2019-02-14 Thread Segher Boessenkool
Hi!

I did the following patch for the GCC 8 changes.html (in the 8.3 section):

Index: htdocs/gcc-8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.98
diff -u -r1.98 changes.html
--- htdocs/gcc-8/changes.html   29 Dec 2018 21:09:09 -  1.98
+++ htdocs/gcc-8/changes.html   14 Feb 2019 23:39:34 -
@@ -1354,6 +1354,14 @@
 
 GCC 8.3 is not yet released.
 
+C family
+
+
+  Support has been added for asm inline. An asm
+  that is inline is counted as minimum length for inlining
+  decisions, irrespective of how big it looks otherwise.
+
+
 Windows
 
 


Is that okay?  Also okay for GCC 7.5?  And for GCC 9?


Segher


Re: Go patch committed: Compile thunks with -Os

2019-02-14 Thread Ian Lance Taylor
On Wed, Feb 13, 2019 at 5:21 PM Ian Lance Taylor  wrote:
>
> Nikhil Benesch noticed that changes in the GCC backend were making the
> use of defer functions that call recover less efficient.  A defer
> thunk is a generated function that looks like this (this is the entire
> function body):
>
> if !runtime.setdeferretaddr(&L) {
>     deferredFunction()
> }
> L:
>
> The idea is that the address of the label passed to setdeferretaddr is
> the address to which deferredFunction returns.  The code in canrecover
> compares the return address of the function to this saved address to
> see whether the recover function can return non-nil.  This is
> explained in marginally more detail at
> https://www.airs.com/blog/archives/376 .
>
> When the return address does not match, the canrecover code does a
> more costly check that requires unwinding the stack.  What Nikhil
> Benesch noticed is that we were always taking that fallback.
>
> It turned out that the label address passed to setdeferretaddr was not
> the label to which the deferred function would return.  And that was
> because the epilogue was being duplicated by the bb-reorder pass, and
> the label was moved to one copy of the epilogue while the deferred
> function returned to the other epilogue.
>
> Of course there is no reason to duplicate the epilogue in such a small
> function.  One easy way to disable that epilogue duplication is to
> compile the function with -Os.  That is what this patch does.  This
> patch compiles all thunks, not just defer thunks, with -Os, but since
> they are all small that does no harm.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.
>
> Ian
>
> 2019-02-13  Ian Lance Taylor  
>
> * go-gcc.cc: #include "opts.h".
> (Gcc_backend::function): Compile thunks with -Os.

This change revealed that changing function optimization attributes
can cause the compiler to switch to the options stored in
optimization_default_node.  That caused the change to
flag_strict_aliasing in go_imported_unsafe to be lost.  I fixed that
with this patch, which simply updates optimization_default_node.  I
don't know if there is a better way to do this.  For this patch
bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

2019-02-14  Ian Lance Taylor  

* go-backend.c (go_imported_unsafe): Update
optimization_default_node.
Index: go-backend.c
===
--- go-backend.c    (revision 268917)
+++ go-backend.c    (working copy)
@@ -89,6 +89,7 @@ void
 go_imported_unsafe (void)
 {
   flag_strict_aliasing = false;
+  TREE_OPTIMIZATION (optimization_default_node)->x_flag_strict_aliasing = false;
 
   /* Let the backend know that the options have changed.  */
   targetm.override_options_after_change ();
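
To make the failure mode above concrete, here is a toy model, not GCC code: the live options are restored from a saved default snapshot after a function with its own optimization attributes is compiled, so a change made only to the live options is lost. `struct compiler`, `compile_with_function_options` and `imported_unsafe` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the option state.  */
struct opt_state
{
  int optimize;
  bool optimize_size;
  bool strict_aliasing;
};

struct compiler
{
  struct opt_state current;      /* live options, like the global flags */
  struct opt_state default_node; /* snapshot, like optimization_default_node */
};

/* Compile one function with its own options (for example -Os for a
   thunk), then switch back to the saved defaults.  */
static void
compile_with_function_options (struct compiler *c, struct opt_state fn_opts)
{
  assert (c);
  c->current = fn_opts;
  /* ... code generation for the function would happen here ... */
  c->current = c->default_node;
}

/* Model of go_imported_unsafe: turn off strict aliasing.  When
   UPDATE_DEFAULT_NODE is false this models the behaviour before the
   fix, where only the live flag was changed.  */
static void
imported_unsafe (struct compiler *c, bool update_default_node)
{
  assert (c);
  c->current.strict_aliasing = false;
  if (update_default_node)
    c->default_node.strict_aliasing = false;
}
```

In this model the flag silently flips back to true after the first thunk is compiled unless the snapshot is updated as well, which is the effect the patch addresses.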


libgo patch committed: Run examples

2019-02-14 Thread Ian Lance Taylor
This patch to the libgo gotest script runs examples when appropriate
in the libgo testsuite.  An example with a "// Output:" comment is
supposed to be run, comparing the output of the example with the text
in the comment.  Up until now we were not actually doing that, so we
were in effect skipping some tests.  This changes that.  The changes
to the script are not fully general for all Go code, but should be
sufficient for the code that actually appears in libgo.  One example
had to be tweaked to match the output generated by gccgo.

This patch also cleans up some cruft in gotest, and should fix GCC PR 89168.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268904)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-c2fc3b83d832725accd4fa5874a5b5ca02dd90dc
+4a6f2bb2c8d3f00966f001a5b03c57cb4a278265
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/example_test.go
===
--- libgo/go/runtime/example_test.go    (revision 268369)
+++ libgo/go/runtime/example_test.go    (working copy)
@@ -31,7 +31,7 @@ func ExampleFrames() {
// To keep this example's output stable
// even if there are changes in the testing package,
// stop unwinding when we leave package runtime.
-   if !strings.Contains(frame.File, "runtime/") {
+   if !strings.Contains(frame.File, "runtime/") && !strings.Contains(frame.File, "/test/") {
break
}
fmt.Printf("- more:%v | %s\n", more, frame.Function)
@@ -47,8 +47,8 @@ func ExampleFrames() {
a()
// Output:
// - more:true | runtime.Callers
-   // - more:true | runtime_test.ExampleFrames.func1
-   // - more:true | runtime_test.ExampleFrames.func2
-   // - more:true | runtime_test.ExampleFrames.func3
+   // - more:true | runtime_test.ExampleFrames..func1
+   // - more:true | runtime_test.ExampleFrames..func2
+   // - more:true | runtime_test.ExampleFrames..func3
// - more:true | runtime_test.ExampleFrames
 }
Index: libgo/testsuite/gotest
===
--- libgo/testsuite/gotest  (revision 268369)
+++ libgo/testsuite/gotest  (working copy)
@@ -289,12 +289,6 @@ x)
;;
 esac
 
-# Some tests expect the _obj directory created by the gc Makefiles.
-mkdir _obj
-
-# Some tests expect the _test directory created by the gc Makefiles.
-mkdir _test
-
 case "x$gofiles" in
 x)
for f in `ls *_test.go`; do
@@ -404,14 +398,6 @@ x)
;;
 esac
 
-# Run any commands given in sources, like
-#   // gotest: $GC foo.go
-# to build any test-only dependencies.
-holdGC="$GC"
-GC="$GC -g -c -I ."
-sed -n 's/^\/\/ gotest: //p' $gofiles | sh
-GC="$holdGC"
-
 case "x$pkgfiles" in
 x)
pkgbasefiles=`ls *.go | grep -v _test.go 2>/dev/null`
@@ -514,26 +500,29 @@ localname() {
 #
 symtogo() {
   result=""
-  for tp in $*
-  do
+  for tp in $*; do
 s=$(echo "$tp" | sed -e 's/\.\.z2f/%/g' | sed -e 's/.*%//')
-# screen out methods (X.Y.Z)
+# Screen out methods (X.Y.Z).
 if ! expr "$s" : '^[^.]*\.[^.]*$' >/dev/null 2>&1; then
   continue
 fi
-echo "$s"
+tname=$(testname $s)
+# Skip TestMain.
+if test x$tname = xTestMain; then
+  continue
+fi
+# Check that the function is defined in a test file,
+# not an ordinary non-test file.
+if grep "^func $tname(" $gofiles $xgofiles >/dev/null 2>&1; then
+  echo "$s"
+fi
   done
 }
 
 {
-   text="T"
-
# On systems using PPC64 ELF ABI v1 function symbols show up
-   # as descriptors in the data section.  We assume that $goarch
-   # distinguishes v1 (ppc64) from v2 (ppc64le).
-   if test "$goos" != "aix" && test "$goarch" = "ppc64"; then
-   text="[TD]"
-   fi
+   # as descriptors in the data section.
+   text="[TD]"
 
# test functions are named TestFoo
# the grep -v eliminates methods and other special names
@@ -575,13 +564,10 @@ symtogo() {
# test array
echo
echo 'var tests = []testing.InternalTest {'
-   for i in $tests
-   do
+   for i in $tests; do
n=$(testname $i)
-   if test "$n" != "TestMain"; then
-   j=$(localname $i)
-   echo '  {"'$n'", '$j'},'
-   fi
+   j=$(localname $i)
+   echo '  {"'$n'", '$j'},'
done
echo '}'
 
@@ -589,8 +575,7 @@ symtogo() {
# The comment makes the multiline declaration
# gofmt-safe even when there are

Re: [PATCH] Fix ICE with optimize("Ofast") due to option handling (PR other/89342)

2019-02-14 Thread Joseph Myers
On Thu, 14 Feb 2019, Jakub Jelinek wrote:

> Hi!
> 
> We ICE on the following testcase, because while we save optimize,
> and optimize_{size,debug} vars during option saving/restoring, we don't save
> optimize_fast, and because of that end up with optimize 0 optimize_fast 1
> which the option handling code ICEs on - 
>   if (fast)
> gcc_assert (level == 3);
> in maybe_default_option.  Fixed thusly, just treat optimize_fast like
> the other flags, bootstrapped/regtested on x86_64-linux and i686-linux, ok
> for trunk/8.3?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com
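
A toy model of the bug described in the quoted text, under the assumption that saving and restoring simply copies a fixed set of fields: if optimize_fast is not among them, a restore can leave optimize at 0 while optimize_fast is still 1. `struct opt_levels`, `restore_levels` and `consistent_p` are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>

struct opt_levels
{
  int optimize;
  bool optimize_size;
  bool optimize_debug;
  bool optimize_fast;
};

/* Restore SAVED into LIVE.  When RESTORE_FAST is false, optimize_fast
   is left untouched, which models the behaviour before the fix.  */
static void
restore_levels (struct opt_levels *live, const struct opt_levels *saved,
                bool restore_fast)
{
  assert (live && saved);
  live->optimize = saved->optimize;
  live->optimize_size = saved->optimize_size;
  live->optimize_debug = saved->optimize_debug;
  if (restore_fast)
    live->optimize_fast = saved->optimize_fast;
}

/* The invariant behind the gcc_assert quoted above: fast implies
   optimization level 3.  */
static bool
consistent_p (const struct opt_levels *o)
{
  return !o->optimize_fast || o->optimize == 3;
}
```

Treating optimize_fast like the other flags keeps the restored state consistent, so the assertion in the option handling code is no longer reachable through this path in the model.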


Re: [C PATCH] Reject weak nested functions (PR c/89340)

2019-02-14 Thread Joseph Myers
On Fri, 15 Feb 2019, Jakub Jelinek wrote:

> Hi!
> 
> We ICE on the following testcase, because C nested functions are turned into
> !TREE_PUBLIC ones very soon, and the IPA code asserts that DECL_WEAK
> functions are either TREE_PUBLIC or DECL_EXTERNAL.
> As we reject static __attribute__((weak)) void foo () {}, I think we should
> reject weak nested functions too; they don't make much sense either, as
> they are TU local as well.
> 
> The following patch fixes that.  The other effect of the patch is that the
> leaf attribute is warned about and ignored on the nested function.  We
> similarly warn about and ignore the leaf attribute on other TU local
> functions, since we see the function body and can analyze everything in it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Go patch committed: Harmonize types referenced by both C and Go

2019-02-14 Thread Ian Lance Taylor
This patch to the Go frontend and libgo by Nikhil Benesch harmonizes
types referenced by both C and Go.  Compiling with LTO revealed a
number of cases in the runtime and standard library where C and Go
disagreed about the type of an object or function (or where Go and
code generated by the compiler disagreed).  In all cases the
underlying representation was the same (e.g., uintptr vs. void*), so
this wasn't causing actual problems, but it did result in a number of
annoying warnings when compiling with LTO.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
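
As a hedged illustration of the kind of harmonization described above, in C rather than Go: both sides agree on an integer type wide enough for an address, and the caller converts explicitly instead of letting one side declare `void *` and the other an integer. `address_as_uintptr` and `return_address_matches` are hypothetical helpers, not functions from libgo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert an address to the integer type both sides agree on.  This
   mirrors the explicit unsafe cast added around the call result.  */
static uintptr_t
address_as_uintptr (const void *p)
{
  return (uintptr_t) p;
}

/* A callee declared with the harmonized integer type on both sides,
   rather than void * on one side and an integer on the other.  */
static bool
return_address_matches (uintptr_t retaddr, uintptr_t saved)
{
  return retaddr == saved;
}
```

The values passed around are the same either way on common targets; the point of the sketch is only that the declared types now agree, which is what LTO type checking compares.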
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268922)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-4a6f2bb2c8d3f00966f001a5b03c57cb4a278265
+03e28273a4fcb114f5204d52ed107591404002f4
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc    (revision 268891)
+++ gcc/go/gofrontend/expressions.cc    (working copy)
@@ -1344,7 +1344,7 @@ Func_descriptor_expression::make_func_de
   if (Func_descriptor_expression::descriptor_type != NULL)
 return;
   Type* uintptr_type = Type::lookup_integer_type("uintptr");
-  Type* struct_type = Type::make_builtin_struct_type(1, "code", uintptr_type);
+  Type* struct_type = Type::make_builtin_struct_type(1, "fn", uintptr_type);
   Func_descriptor_expression::descriptor_type =
 Type::make_builtin_named_type("functionDescriptor", struct_type);
 }
@@ -3874,7 +3874,9 @@ Unsafe_type_conversion_expression::do_ge
  || et->integer_type() != NULL
   || et->is_nil_type());
   else if (et->is_unsafe_pointer_type())
-go_assert(t->points_to() != NULL);
+go_assert(t->points_to() != NULL
+ || (t->integer_type() != NULL
+ && t->integer_type() == Type::lookup_integer_type("uintptr")->real_type()));
   else if (t->interface_type() != NULL)
 {
   bool empty_iface = t->interface_type()->is_empty();
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 268891)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -4513,13 +4513,13 @@ Build_recover_thunks::can_recover_arg(Lo
 builtin_return_address =
   Gogo::declare_builtin_rf_address("__builtin_return_address");
 
+  Type* uintptr_type = Type::lookup_integer_type("uintptr");
   static Named_object* can_recover;
   if (can_recover == NULL)
 {
   const Location bloc = Linemap::predeclared_location();
   Typed_identifier_list* param_types = new Typed_identifier_list();
-  Type* voidptr_type = Type::make_pointer_type(Type::make_void_type());
-  param_types->push_back(Typed_identifier("a", voidptr_type, bloc));
+  param_types->push_back(Typed_identifier("a", uintptr_type, bloc));
   Type* boolean_type = Type::lookup_bool_type();
   Typed_identifier_list* results = new Typed_identifier_list();
   results->push_back(Typed_identifier("", boolean_type, bloc));
@@ -4539,6 +4539,7 @@ Build_recover_thunks::can_recover_arg(Lo
   args->push_back(zexpr);
 
   Expression* call = Expression::make_call(fn, args, false, location);
+  call = Expression::make_unsafe_cast(uintptr_type, call, location);
 
   args = new Expression_list();
   args->push_back(call);
Index: gcc/go/gofrontend/runtime.cc
===
--- gcc/go/gofrontend/runtime.cc    (revision 268369)
+++ gcc/go/gofrontend/runtime.cc    (working copy)
@@ -60,8 +60,6 @@ enum Runtime_function_type
   RFT_IFACE,
   // Go type interface{}, C type struct __go_empty_interface.
   RFT_EFACE,
-  // Go type func(unsafe.Pointer), C type void (*) (void *).
-  RFT_FUNC_PTR,
   // Pointer to Go type descriptor.
   RFT_TYPE,
   // [2]string.
@@ -176,15 +174,6 @@ runtime_function_type(Runtime_function_t
  t = Type::make_empty_interface_type(bloc);
  break;
 
-   case RFT_FUNC_PTR:
- {
-   Typed_identifier_list* param_types = new Typed_identifier_list();
-   Type* ptrtype = runtime_function_type(RFT_POINTER);
-   param_types->push_back(Typed_identifier("", ptrtype, bloc));
-   t = Type::make_function_type(NULL, param_types, NULL, bloc);
- }
- break;
-
case RFT_TYPE:
  t = Type::make_type_descriptor_ptr_type();
  break;
@@ -265,7 +254,6 @@ convert_to_runtime_function_type(Runtime
 case RFT_COMPLEX128:
 case RFT_STRING:
 case RFT_POINTER:
-case RFT_FUNC_PTR:
   {
Type* t = runtime_function_type(bft);
if (!Type::are_identical(t, e->type(), true, NULL))
Index: gcc/go/gofrontend/runtime.def
==

Re: [PATCH] Fix up and improve allow_blank_lines testsuite handling (PR other/69006, PR testsuite/88920)

2019-02-14 Thread Mike Stump
On Feb 13, 2019, at 1:09 AM, Jakub Jelinek  wrote:
> 
> ok for trunk?

Ok.


Re: [PATCH] Fix up and improve allow_blank_lines testsuite handling (PR other/69006, PR testsuite/88920, take 2)

2019-02-14 Thread Mike Stump
On Feb 13, 2019, at 5:37 AM, Jakub Jelinek  wrote:
> Here is an updated patch that documents it.  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

Ok.


Re: [PATCH] Add testcases for multiple -fsanitize=, -fno-sanitize= or -fno-sanitize-recover= options (take 2)

2019-02-14 Thread Mike Stump
On Feb 14, 2019, at 6:15 AM, Jakub Jelinek  wrote:
> Ah, yes, UNRESOLVED doesn't show up visible when running tests by hand,
> rather than doing test_summary.  Here is an updated patch that adds the
> needed dg-skip-if directives.  Ok for trunk?

Ok.


[PATCH] i386: Insert ENDBR for NOTE_INSN_DELETED_LABEL only if needed

2019-02-14 Thread H.J. Lu
NOTE_INSN_DELETED_LABEL is used to mark what used to be a 'code_label',
but was not used for other purposes than taking its address and was
transformed to mark that no code jumps to it.  NOTE_INSN_DELETED_LABEL
is generated only in 3 places:

1. When delete_insn sees an unused label which is an explicit label in
the input source code or its address is taken, it turns the label into
a NOTE_INSN_DELETED_LABEL note.
2. When rtl_tidy_fallthru_edge deletes a tablejump, it turns the
tablejump into a NOTE_INSN_DELETED_LABEL note.
3. ix86_init_large_pic_reg creates a NOTE_INSN_DELETED_LABEL note, .L2,
to initialize large model PIC register:

.L2:
        movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %r11
        leaq    .L2(%rip), %rax
        movabsq $val@GOT, %rdx
        addq    %r11, %rax

Of these, ENDBR is needed only when the label address is taken.
rest_of_insert_endbranch has

  if ((LABEL_P (insn) && LABEL_PRESERVE_P (insn))
  || (NOTE_P (insn)
  && NOTE_KIND (insn) == NOTE_INSN_DELETED_LABEL))
/* TODO.  Check /s bit also.  */
{
  cet_eb = gen_nop_endbr ();
  emit_insn_after (cet_eb, insn);
  continue;
}

For NOTE_INSN_DELETED_LABEL, we should check forced_labels to see
if its address is taken.  Also ix86_init_large_pic_reg shouldn't set
LABEL_PRESERVE_P (in_struct) since NOTE_INSN_DELETED_LABEL is sufficient
to keep the label.
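
As a rough illustration of the rule described above (not the GCC code
itself), the decision can be modelled with toy types.  Kind, Insn,
needs_endbr and the integer label ids are hypothetical stand-ins for the
RTL insn, LABEL_PRESERVE_P and forced_labels; only the shape of the
check is meant to match:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-ins for RTL insn kinds; these are NOT the real GCC types.
enum class Kind { Label, DeletedLabelNote, Other };

struct Insn {
  int id;         // hypothetical label number, non-negative in this model
  Kind kind;
  bool preserve;  // models LABEL_PRESERVE_P
};

// Returns true when an ENDBR should be emitted after the insn:
//  - a real label needs one only if it is marked as preserved;
//  - a deleted-label note needs one only if its address is taken,
//    which this model represents as membership in forced_labels.
bool needs_endbr(const Insn& insn, const std::vector<int>& forced_labels) {
  assert(insn.id >= 0);
  if (insn.kind == Kind::Label)
    return insn.preserve;
  if (insn.kind == Kind::DeletedLabelNote)
    return std::find(forced_labels.begin(), forced_labels.end(), insn.id)
           != forced_labels.end();
  return false;
}
```

In this model a deleted-label note such as the .L2 created for the large
PIC register is not in forced_labels, so no ENDBR is emitted for it.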

gcc/

PR target/89355
* config/i386/i386.c (rest_of_insert_endbranch): Check
forced_labels to see if the address of NOTE_INSN_DELETED_LABEL
is taken.
(ix86_init_large_pic_reg): Don't set LABEL_PRESERVE_P.

gcc/testsuite/

PR target/89355
* gcc.target/i386/cet-label-3.c: New test.
* gcc.target/i386/cet-label-4.c: Likewise.
* gcc.target/i386/cet-label-5.c: Likewise.
---
 gcc/config/i386/i386.c  |  9 +---
 gcc/testsuite/gcc.target/i386/cet-label-3.c | 23 +
 gcc/testsuite/gcc.target/i386/cet-label-4.c | 12 +++
 gcc/testsuite/gcc.target/i386/cet-label-5.c | 13 
 4 files changed, 54 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/cet-label-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/cet-label-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/cet-label-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fd05873ba39..ed53fbea9ae 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2734,10 +2734,14 @@ rest_of_insert_endbranch (void)
  continue;
}
 
+ /* NB: Add an ENDBR for NOTE_INSN_DELETED_LABEL only if its
+address is taken.  */
  if ((LABEL_P (insn) && LABEL_PRESERVE_P (insn))
  || (NOTE_P (insn)
- && NOTE_KIND (insn) == NOTE_INSN_DELETED_LABEL))
-   /* TODO.  Check /s bit also.  */
+ && NOTE_KIND (insn) == NOTE_INSN_DELETED_LABEL
+ && vec_safe_contains
+  (forced_labels,
+   static_cast (insn
{
  cet_eb = gen_nop_endbr ();
  emit_insn_after (cet_eb, insn);
@@ -7002,7 +7006,6 @@ ix86_init_large_pic_reg (unsigned int tmp_regno)
   gcc_assert (Pmode == DImode);
   label = gen_label_rtx ();
   emit_label (label);
-  LABEL_PRESERVE_P (label) = 1;
   tmp_reg = gen_rtx_REG (Pmode, tmp_regno);
   gcc_assert (REGNO (pic_offset_table_rtx) != tmp_regno);
   emit_insn (gen_set_rip_rex64 (pic_offset_table_rtx,
diff --git a/gcc/testsuite/gcc.target/i386/cet-label-3.c 
b/gcc/testsuite/gcc.target/i386/cet-label-3.c
new file mode 100644
index 000..9f427a866f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cet-label-3.c
@@ -0,0 +1,23 @@
+/* PR target/89355  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcf-protection" } */
+/* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } } } */
+int
+test (int* val)
+{
+  int status = 99;
+
+  if (!val)
+{
+  status = 22;
+  goto end;
+}
+
+  extern int x;
+  *val = x;
+
+  status = 0;
+end:
+  return status;
+}
diff --git a/gcc/testsuite/gcc.target/i386/cet-label-4.c 
b/gcc/testsuite/gcc.target/i386/cet-label-4.c
new file mode 100644
index 000..d743d2bf202
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cet-label-4.c
@@ -0,0 +1,12 @@
+/* PR target/89355  */
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -fcf-protection -fPIC -mcmodel=large" } */
+/* { dg-final { scan-assembler-times "endbr64" 1 } } */
+
+extern int val;
+
+int
+test (void)
+{
+  return val;
+}
diff --git a/gcc/testsuite/gcc.target/i386/cet-label-5.c 
b/gcc/testsuite/gcc.target/i386/cet-label-5.c
new file mode 100644
index 000..4d5ca816598
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cet-label-5.c
@@ -0,0 +1,13 @@

Re: [PATCH] Fix tree-loop-distribution.c ICE with -ftrapv (PR tree-optimization/89278)

2019-02-14 Thread Richard Biener
On February 14, 2019 11:52:17 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>The following testcase ICEs, because we try to gimplify a complex
>expression
>that with -ftrapv wants to emit multiple bbs.  Fixed by using
>rewrite_to_non_trapping_overflow.  Bootstrapped/regtested on
>x86_64-linux
>and i686-linux, ok for trunk and 8.3?

OK. 

Richard. 

>2019-02-14  Richard Biener  
>   Jakub Jelinek  
>
>   PR tree-optimization/89278
>   * tree-loop-distribution.c: Include tree-eh.h.
>   (generate_memset_builtin, generate_memcpy_builtin): Call
>   rewrite_to_non_trapping_overflow on builtin->size before passing it
>   to force_gimple_operand_gsi.
>
>   * gcc.dg/pr89278.c: New test.
>
>--- gcc/tree-loop-distribution.c.jj2019-01-10 11:43:02.255576992 +0100
>+++ gcc/tree-loop-distribution.c   2019-02-14 12:17:24.403403131 +0100
>@@ -114,6 +114,7 @@ along with GCC; see the file COPYING3.
> #include "tree-scalar-evolution.h"
> #include "params.h"
> #include "tree-vectorizer.h"
>+#include "tree-eh.h"
> 
> 
> #define MAX_DATAREFS_NUM \
>@@ -996,7 +997,7 @@ generate_memset_builtin (struct loop *lo
>   /* The new statements will be placed before LOOP.  */
>   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
> 
>-  nb_bytes = builtin->size;
>+  nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
>  nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
>  false, GSI_CONTINUE_LINKING);
>   mem = builtin->dst_base;
>@@ -1048,7 +1049,7 @@ generate_memcpy_builtin (struct loop *lo
>   /* The new statements will be placed before LOOP.  */
>   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
> 
>-  nb_bytes = builtin->size;
>+  nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
>  nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
>  false, GSI_CONTINUE_LINKING);
>   dest = builtin->dst_base;
>--- gcc/testsuite/gcc.dg/pr89278.c.jj  2019-02-14 12:18:38.778173413
>+0100
>+++ gcc/testsuite/gcc.dg/pr89278.c 2019-02-14 12:18:19.065499344 +0100
>@@ -0,0 +1,23 @@
>+/* PR tree-optimization/89278 */
>+/* { dg-do compile } */
>+/* { dg-options "-O1 -ftrapv -ftree-loop-distribute-patterns --param
>max-loop-header-insns=2" } */
>+
>+void
>+foo (int *w, int x, int y, int z)
>+{
>+  while (x < y + z)
>+{
>+  w[x] = 0;
>+  ++x;
>+}
>+}
>+
>+void
>+bar (int *__restrict u, int *__restrict w, int x, int y, int z)
>+{
>+  while (x < y + z)
>+{
>+  w[x] = u[x];
>+  ++x;
>+}
>+}
>
>   Jakub



Re: [PATCH] Avoid assuming valid_constant_size_p argument is a constant expression (PR 89294)

2019-02-14 Thread Eric Botcazou
> The attached patch removes the assumption introduced earlier today
> in my fix for bug 87996 that the valid_constant_size_p argument is
> a constant expression.  I couldn't come up with a C/C++ test case
> where this isn't true but apparently it can happen in Ada which I
> inadvertently didn't build.

Can we do something here?  Our internal testers have been down for 3 days 
because of this blunder...

-- 
Eric Botcazou


Re: [PATCH] Fix tree-loop-distribution.c ICE with -ftrapv (PR tree-optimization/89278)

2019-02-14 Thread Bin.Cheng
On Fri, Feb 15, 2019 at 6:52 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs, because we try to gimplify a complex expression
> that with -ftrapv wants to emit multiple bbs.  Fixed by using
> rewrite_to_non_trapping_overflow.  Bootstrapped/regtested on x86_64-linux
So with what condition we can safely rewrite trapping operations into
non trapping one?  Does the rewrite nullify -ftrapv which requires
trap behavior?

Thanks,
bin
> and i686-linux, ok for trunk and 8.3?
>
> 2019-02-14  Richard Biener  
> Jakub Jelinek  
>
> PR tree-optimization/89278
> * tree-loop-distribution.c: Include tree-eh.h.
> (generate_memset_builtin, generate_memcpy_builtin): Call
> rewrite_to_non_trapping_overflow on builtin->size before passing it
> to force_gimple_operand_gsi.
>
> * gcc.dg/pr89278.c: New test.
>
> --- gcc/tree-loop-distribution.c.jj 2019-01-10 11:43:02.255576992 +0100
> +++ gcc/tree-loop-distribution.c2019-02-14 12:17:24.403403131 +0100
> @@ -114,6 +114,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-scalar-evolution.h"
>  #include "params.h"
>  #include "tree-vectorizer.h"
> +#include "tree-eh.h"
>
>
>  #define MAX_DATAREFS_NUM \
> @@ -996,7 +997,7 @@ generate_memset_builtin (struct loop *lo
>/* The new statements will be placed before LOOP.  */
>gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
>
> -  nb_bytes = builtin->size;
> +  nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
>nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
>false, GSI_CONTINUE_LINKING);
>mem = builtin->dst_base;
> @@ -1048,7 +1049,7 @@ generate_memcpy_builtin (struct loop *lo
>/* The new statements will be placed before LOOP.  */
>gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
>
> -  nb_bytes = builtin->size;
> +  nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
>nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
>false, GSI_CONTINUE_LINKING);
>dest = builtin->dst_base;
> --- gcc/testsuite/gcc.dg/pr89278.c.jj   2019-02-14 12:18:38.778173413 +0100
> +++ gcc/testsuite/gcc.dg/pr89278.c  2019-02-14 12:18:19.065499344 +0100
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/89278 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -ftrapv -ftree-loop-distribute-patterns --param 
> max-loop-header-insns=2" } */
> +
> +void
> +foo (int *w, int x, int y, int z)
> +{
> +  while (x < y + z)
> +{
> +  w[x] = 0;
> +  ++x;
> +}
> +}
> +
> +void
> +bar (int *__restrict u, int *__restrict w, int x, int y, int z)
> +{
> +  while (x < y + z)
> +{
> +  w[x] = u[x];
> +  ++x;
> +}
> +}
>
> Jakub


Re: [PATCH] Fix tree-loop-distribution.c ICE with -ftrapv (PR tree-optimization/89278)

2019-02-14 Thread Jakub Jelinek
On Fri, Feb 15, 2019 at 03:25:33PM +0800, Bin.Cheng wrote:
> So with what condition we can safely rewrite trapping operations into
> non trapping one?  Does the rewrite nullify -ftrapv which requires
> trap behavior?

For the particular expression?  Yes, otherwise no.

-ftrapv should be either replaced with -fsanitize=signed-integer-overflow
-fsanitize-undefined-trap-on-error, or at least implemented that way in the
middle-end (perhaps with a separate ifn, so that we can pattern recognize it
during expansion and use library calls where the inline call is not small
enough).  We haven't done that yet though.

Jakub


Re: [PATCH] Fix tree-loop-distribution.c ICE with -ftrapv (PR tree-optimization/89278)

2019-02-14 Thread Jakub Jelinek
On Fri, Feb 15, 2019 at 08:33:44AM +0100, Jakub Jelinek wrote:
> On Fri, Feb 15, 2019 at 03:25:33PM +0800, Bin.Cheng wrote:
> > So with what condition we can safely rewrite trapping operations into
> > non trapping one?  Does the rewrite nullify -ftrapv which requires
> > trap behavior?
> 
> For the particular expression?  Yes, otherwise no.
> 
> -ftrapv should be either replaced with -fsanitize=signed-integer-overflow
> -fsanitize-undefined-trap-on-error, or at least implemented that way in the
> middle-end (perhaps with a separate ifn, so that we can pattern recognize it
> during expansion and use library calls where the inline call is not small
> enough).  We haven't done that yet though.

To clarify, the current -ftrapv implementation doesn't guarantee you get
traps on overflow, it will happily optimize computations away at any time
during GIMPLE optimizations, or turn stuff into unsigned computations etc.
(not just through this rewrite function, but many other ways).
For -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error
there are no guarantees either, but we try hard not to optimize those away,
we have TYPE_OVERFLOW_SANITIZED checks that punt certain optimizations in
fold-const.c/match.pd and early (right after going into ssa form) we turn
the arithmetics into ifns, which are optimized away only if we can prove
there will be no overflow.  On the other side, it can hinder other
optimizations (a lot).  And possibly overflowing computations introduced
during later optimizations are not sanitized.
The question is what -ftrapv users want, plus right now they have a choice,
catch perhaps less UB with more optimization opportunities (-ftrapv)
or catch more and optimize less (UBSan).
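
Since neither option guarantees a trap, code that must notice an
overflow can test for it explicitly.  A minimal sketch, assuming GCC or
Clang (__builtin_add_overflow is a compiler extension, not standard
C++); checked_add is a hypothetical helper name:

```cpp
#include <cassert>
#include <climits>

// Adds a and b.  Returns true and stores the sum in *result when the
// addition does not overflow; returns false when it would overflow.
// Relies on the GCC/Clang extension __builtin_add_overflow.
bool checked_add(int a, int b, int* result) {
  assert(result != nullptr);
  return !__builtin_add_overflow(a, b, result);
}
```

For example, checked_add (INT_MAX, 1, &r) returns false regardless of
whether -ftrapv or the sanitizer is in use.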

Jakub


Re: [PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-14 Thread Uros Bizjak
On Thu, Feb 14, 2019 at 12:03 AM H.J. Lu  wrote:

> > > > > Allow MMX intrinsic emulation with SSE/SSE2/SSSE3.  Don't enable MMX 
> > > > > ISA
> > > > > by default with TARGET_MMX_WITH_SSE.
> > > > >
> > > > > For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 64-bit
> > > > > > mode since MMX intrinsics can be emulated with SSE.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR target/89021
> > > > > * config/i386/i386-builtin.def: Enable MMX intrinsics with
> > > > > SSE/SSE2/SSSE3.
> > > > > * config/i386/i386.c (ix86_option_override_internal): Don't
> > > > > enable MMX ISA with TARGET_MMX_WITH_SSE by default.
> > > > > (bdesc_tm): Enable MMX intrinsics with SSE/SSE2/SSSE3.
> > > > > (ix86_init_mmx_sse_builtins): Likewise.
> > > > > (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
> > > > > intrinsics with TARGET_MMX_WITH_SSE.
> > > > > * config/i386/mmintrin.h: Don't require MMX in 64-bit mode.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR target/89021
> > > > > * gcc.target/i386/pr82483-1.c: Error only on ia32.
> > > > > * gcc.target/i386/pr82483-2.c: Likewise.
> > > > > ---
> > > > >  gcc/config/i386/i386-builtin.def  | 126 
> > > > > +++---
> > > > >  gcc/config/i386/i386.c|  62 +++
> > > > >  gcc/config/i386/mmintrin.h|  10 +-
> > > > >  gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
> > > > >  gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
> > > > >  5 files changed, 118 insertions(+), 84 deletions(-)
> > > > >
> > >
> > > > > @@ -30810,13 +30815,13 @@ static const struct builtin_description 
> > > > > bdesc_##kind[] =  \
> > > > > we're lazy.  Add casts to make them fit.  */
> > > > >  static const struct builtin_description bdesc_tm[] =
> > > > >  {
> > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_WM64", 
> > > > > (enum ix86_builtins) BUILT_IN_TM_STORE_M64, UNKNOWN, 
> > > > > VOID_FTYPE_PV2SI_V2SI },
> > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_WaRM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_WaWM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RM64", 
> > > > > (enum ix86_builtins) BUILT_IN_TM_LOAD_M64, UNKNOWN, V2SI_FTYPE_PCV2SI 
> > > > > },
> > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_RaRM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_RaWM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_RfWM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_LOAD_RFW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_WM64", (enum ix86_builtins) BUILT_IN_TM_STORE_M64, 
> > > > > UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_WaRM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_WaWM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_RM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_M64, 
> > > > > UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_RaRM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_RaWM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
> > > > > "__builtin__ITM_RfWM64", (enum ix86_builtins) 
> > > > > BUILT_IN_TM_LOAD_RFW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > >
> > > > Please explain why you need the above change.
> > >
> > > Reverted.
> >
> > Actually, I don't know if this change is needed or not, since I don't
> > know ITM that well. We still have __m64 (V2SI) loads and stores; they
> > are handled by a different (SSE2) instruction, but the access is still
> > there. Does ITM care if the access is atomic or not? I don't k

Re: [Patch, aarch64] Issue warning/error for mixing functions with/without aarch64_vector_pcs attribute

2019-02-14 Thread Richard Sandiford
Steve Ellcey  writes:
> Szabolcs pointed out that my SIMD ABI patches that implement the
> aarch64_vector_pcs attribute do not generate a warning or error
> when being mixed with functions that do not have the attribute because
> the 'affects_type_identity' field was false in the attribute table.
>
> This patch fixes that.  I thought I could just set it to true but it
> turned out I also had to implement TARGET_COMP_TYPE_ATTRIBUTES as well.

Yeah, looks like the flag just controls whether the attribute should
be printed as part of the type, to give sensible error messages.

> This patch does that and adds a test case to check for the error
> when assigning a function with the attribute to a pointer type without
> the attribute.
>
> The test checks for an error because the testsuite adds -pedantic-
> errors to the compile line.  Without this you would just get a warning,
> but that is consistent with any mixing of different function types in a
> function pointer assignment.
>
> Tested with a bootstrap build and test run on aarch64.  OK for checkin?
>
> Steve Ellcey
> sell...@marvell.com
>
>
> 2018-02-13  Steve Ellcey  
>
>   * config/aarch64/aarch64.c (aarch64_attribute_table): Change
>   affects_type_identity to true for aarch64_vector_pcs.
>   (aarch64_comp_type_attributes): New function.
>   (TARGET_COMP_TYPE_ATTRIBUTES): New macro.
>
> 2018-02-13  Steve Ellcey  
>
>   * gcc.target/aarch64/pcs_attribute.c: New test.

OK, thanks.

Richard


[PATCH] Document LWG 2735 status and add test

2019-02-14 Thread Jonathan Wakely

This DR was already resolved for GCC 7.1 by the implementation of DR
2192, but we didn't have an explicit test for the behaviour that 2735
guarantees.

* doc/xml/manual/intro.xml: Document LWG 2735 status.
* include/bits/std_abs.h: Add comment about LWG 2735.
* testsuite/26_numerics/headers/cstdlib/dr2735.cc: New test.

Tested powerpc64le-linux, committed to trunk.


commit 32a397016958aad920289c895f4b65602d7aa46d
Author: Jonathan Wakely 
Date:   Thu Feb 14 08:54:00 2019 +

Document LWG 2735 status and add test

This DR was already resolved for GCC 7.1 by the implementation of DR
2192, but we didn't have an explicit test for the behaviour that 2735
guarantees.

* doc/xml/manual/intro.xml: Document LWG 2735 status.
* include/bits/std_abs.h: Add comment about LWG 2735.
* testsuite/26_numerics/headers/cstdlib/dr2735.cc: New test.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml 
b/libstdc++-v3/doc/xml/manual/intro.xml
index 28210cb0862..71050a0cebc 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1134,6 +1134,17 @@ requirements of the license of GCC.
 Define the value_compare typedef.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#2735">2735:
+   std::abs(short),
+std::abs(signed char) and others should return
+int instead of double in order to be
+compatible with C++98 and C
+   
+
+Resolved by the changes for
+  2192.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#2770">2770:
tuple_size specialization is not
 SFINAE compatible and breaks decomposition declarations
diff --git a/libstdc++-v3/include/bits/std_abs.h 
b/libstdc++-v3/include/bits/std_abs.h
index 60a65423c38..8430010b432 100644
--- a/libstdc++-v3/include/bits/std_abs.h
+++ b/libstdc++-v3/include/bits/std_abs.h
@@ -64,6 +64,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // _GLIBCXX_RESOLVE_LIB_DEFECTS
 // 2192. Validity and return type of std::abs(0u) is unclear
 // 2294. <cstdlib> should declare abs(double)
+// 2735. std::abs(short), std::abs(signed char) and others should return int
 
 #ifndef __CORRECT_ISO_CPP_MATH_H_PROTO
   inline _GLIBCXX_CONSTEXPR double
diff --git a/libstdc++-v3/testsuite/26_numerics/headers/cstdlib/dr2735.cc 
b/libstdc++-v3/testsuite/26_numerics/headers/cstdlib/dr2735.cc
new file mode 100644
index 000..2a542011fa6
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/headers/cstdlib/dr2735.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do compile }
+
+// NB: Don't include any other headers in this file.
+// LWG 2735. std::abs(short), std::abs(signed char) and others should return
+// int instead of double in order to be compatible with C++98 and C
+#include <cstdlib>
+
+template<typename T> struct is_int { };
+template<> struct is_int<int> { typedef int type; };
+
+template<typename T>
+typename is_int<T>::type
+do_check(T t)
+{
+  return T(0);
+}
+
+template<typename T>
+void check()
+{
+  do_check(std::abs(T(0)));
+}
+
+void test()
+{
+  check();
+  check();
+  check();
+  check();
+  check();
+}
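
The property that LWG 2735 guarantees can also be expressed as a
compile-time check.  This is only a sketch and not the committed test:
it uses <type_traits>, which the test above deliberately avoids, and
abs_returns_int is a hypothetical helper name.  It assumes the usual
case where short and the char types promote to int:

```cpp
#include <cstdlib>
#include <type_traits>

// True when std::abs applied to a value of type T yields int, i.e. the
// argument was promoted to int rather than converted to double.
template <typename T>
constexpr bool abs_returns_int() {
  return std::is_same<decltype(std::abs(T(0))), int>::value;
}
```

A static_assert on abs_returns_int<short>() then fails to compile if an
implementation picks a floating-point overload for narrow integer types.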


Re: [PATCH] Fix ICE in strlen () > 0 folding (PR tree-optimization/89314)

2019-02-14 Thread Richard Biener
On Wed, Feb 13, 2019 at 12:14 AM Jakub Jelinek  wrote:
>
> Hi!
>
> fold_binary_loc verifies that strlen argument is a pointer, but doesn't
> verify what the pointee is.
> The following patch just always converts it to the right pointer type
> (const char *) and dereferences only that.
> Another option would be to punt if the pointee (TYPE_MAIN_VARIANT) is not
> char_type_node, but then e.g. unsigned_char_type_node or
> signed_char_type_node (or maybe char8_t) wouldn't be that bad.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2019-02-12  Jakub Jelinek  
>
> PR tree-optimization/89314
> * fold-const.c (fold_binary_loc): Cast strlen argument to
> const char * before dereferencing it.  Formatting fixes.
>
> * gcc.dg/pr89314.c: New test.
>
> --- gcc/fold-const.c.jj 2019-02-11 18:04:18.0 +0100
> +++ gcc/fold-const.c2019-02-12 21:11:21.491388038 +0100
> @@ -10740,20 +10740,24 @@ fold_binary_loc (location_t loc, enum tr
> strlen(ptr) != 0   =>  *ptr != 0
>  Other cases should reduce to one of these two (or a constant)
>  due to the return value of strlen being unsigned.  */
> -  if (TREE_CODE (arg0) == CALL_EXPR
> - && integer_zerop (arg1))
> +  if (TREE_CODE (arg0) == CALL_EXPR && integer_zerop (arg1))
> {
>   tree fndecl = get_callee_fndecl (arg0);
>
>   if (fndecl
>   && fndecl_built_in_p (fndecl, BUILT_IN_STRLEN)
>   && call_expr_nargs (arg0) == 1
> - && TREE_CODE (TREE_TYPE (CALL_EXPR_ARG (arg0, 0))) == 
> POINTER_TYPE)
> + && (TREE_CODE (TREE_TYPE (CALL_EXPR_ARG (arg0, 0)))
> + == POINTER_TYPE))
> {
> - tree iref = build_fold_indirect_ref_loc (loc,
> -  CALL_EXPR_ARG (arg0, 0));
> + tree ptrtype
> +   = build_pointer_type (build_qualified_type (char_type_node,
> +   TYPE_QUAL_CONST));
> + tree ptr = fold_convert_loc (loc, ptrtype,
> +  CALL_EXPR_ARG (arg0, 0));
> + tree iref = build_fold_indirect_ref_loc (loc, ptr);
>   return fold_build2_loc (loc, code, type, iref,
> - build_int_cst (TREE_TYPE (iref), 0));
> + build_int_cst (TREE_TYPE (iref), 0));
> }
> }
>
> --- gcc/testsuite/gcc.dg/pr89314.c.jj   2019-02-12 21:15:11.624589045 +0100
> +++ gcc/testsuite/gcc.dg/pr89314.c  2019-02-12 21:14:49.138960233 +0100
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/89314 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wbuiltin-declaration-mismatch -Wextra" } */
> +
> +extern __SIZE_TYPE__ strlen (const float *);   /* { dg-warning "mismatch in 
> argument 1 type of built-in function" } */
> +void bar (void);
> +
> +void
> +foo (float *s)
> +{
> +  if (strlen (s) > 0)
> +bar ();
> +}
>
> Jakub
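
The fold discussed above rests on a simple source-level equivalence:
a string has non-zero length exactly when its first byte is non-zero.
A small sketch with hypothetical function names, for ordinary
NUL-terminated char strings only:

```cpp
#include <cassert>
#include <cstring>

// Unfolded form: calls strlen and compares the result with zero.
bool nonempty_strlen(const char* p) {
  assert(p != nullptr);
  return std::strlen(p) > 0;
}

// Folded form: reads only the first byte through a const char pointer,
// which is what "strlen (ptr) > 0  =>  *ptr != 0" amounts to.
bool nonempty_first_byte(const char* p) {
  assert(p != nullptr);
  return *p != 0;
}
```

The patch makes sure the dereference in the folded form always goes
through a const char * even when a mismatched strlen declaration gives
the argument some other pointer type.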


Re: [PATCH] Call free_dominance_info when transformed in DCE (PR rtl-optimization/89242).

2019-02-14 Thread Richard Biener
On Wed, Feb 13, 2019 at 6:56 AM Martin Liška  wrote:
>
> Hi.
>
> The patch is very similar to r236460 where we should release dominance info
> when the CFG is modified.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-02-12  Martin Liska  
>
> PR rtl-optimization/89242
> * dce.c (delete_unmarked_insns): Call free_dominance_info when we
> process a transformation.
>
> gcc/testsuite/ChangeLog:
>
> 2019-02-12  Martin Liska  
>
> PR rtl-optimization/89242
> * g++.dg/pr89242.C: New test.
> ---
>  gcc/dce.c  |  1 +
>  gcc/testsuite/g++.dg/pr89242.C | 15 +++
>  2 files changed, 16 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/pr89242.C
>
>


Re: PR87689, PowerPC64 ELFv2 function parameter passing violation

2019-02-14 Thread Richard Biener
On Wed, Feb 13, 2019 at 7:59 AM Alan Modra  wrote:
>
> Covers for a generic fortran bug.  The effect is that we'll needlessly
> waste 64 bytes of stack space on some calls, but I don't see any
> simple and fully correct patch in generic code.  Bootstrapped and
> regression tested powerpc64le-linux.  OK mainline and branches?

This looks very wrong to me ;)  It won't work when compiling with -flto
for example.

The frontend needs to be properly fixed.

Richard.

> PR target/87689
> * config/rs6000/rs6000.c (rs6000_function_parms_need_stack): Cope
> with fortran function decls that lack all args.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 31256a4da8d..288b7606b5e 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -12325,6 +12325,13 @@ rs6000_function_parms_need_stack (tree fun, bool 
> incoming)
>if ((!incoming && !prototype_p (fntype)) || stdarg_p (fntype))
>  return true;
>
> +  /* FIXME: Fortran arg lists can contain hidden parms, fooling
> + prototype_p into saying the function is prototyped when in fact
> + the number and type of args is unknown.  See PR 87689.  */
> +  if (!incoming && (strcmp (lang_hooks.name, "GNU F77") == 0
> +   || lang_GNU_Fortran ()))
> +return true;
> +
>INIT_CUMULATIVE_INCOMING_ARGS (args_so_far_v, fntype, NULL_RTX);
>args_so_far = pack_cumulative_args (&args_so_far_v);
>
>
> --
> Alan Modra
> Australia Development Lab, IBM


Re: [PATCH] PR rtl-optimization/88308 Update LABEL_NUSES in move_insn_for_shrink_wrap

2019-02-14 Thread Richard Biener
On Thu, Feb 14, 2019 at 12:08 AM Aaron Sawdey  wrote:
>
> I've tracked pr/88308 down to move_insn_for_shrink_wrap(). This function
> moves an insn from one BB to another by copying it and deleting the old
> one. Unfortunately this breaks the LABEL_NUSES count on referenced labels,
> because deleting the old instruction decrements the count and nothing in
> this function increments it.
>
> It just happens that on rs6000 with -m64, force_const_mem() gets called on
> the address and that sets LABEL_PRESERVE_P on the label, which prevents it
> from being deleted. For whatever reason this doesn't happen in a -m32
> compilation, and the label and its associated jump table data are deleted.
> This later causes the ICE when the dwarf code tries to look at the label.
>
> Segher and I came up with 3 possible solutions to this:
>
> 1) Don't let move_insn_for_shrink_wrap try to move insns with label_ref
>    in them.
> 2) Call mark_jump_label() on the copied instruction to fix up the ref counts.
> 3) Make the function actually move the insn instead of copying/deleting it.
>
> It seemed like option 2 was the best thing for stage 4 as it is not
> inhibiting anything and is just doing a fixup of the ref count.
>
> OK for trunk after regtesting on ppc64be (32/64) and x86_64?

OK.

> Thanks!
>Aaron
>
>
> 2019-02-13  Aaron Sawdey  
>
> * shrink-wrap.c (move_insn_for_shrink_wrap): Fix LABEL_NUSES counts
> on copied instruction.
>
>
> Index: gcc/shrink-wrap.c
> ===
> --- gcc/shrink-wrap.c   (revision 268783)
> +++ gcc/shrink-wrap.c   (working copy)
> @@ -414,7 +414,12 @@
>dead_debug_insert_temp (debug, DF_REF_REGNO (def), insn,
>   DEBUG_TEMP_BEFORE_WITH_VALUE);
>
> -  emit_insn_after (PATTERN (insn), bb_note (bb));
> +  rtx_insn *insn_copy = emit_insn_after (PATTERN (insn), bb_note (bb));
> +  /* Update the LABEL_NUSES count on any referenced labels. The ideal
> + solution here would be to actually move the instruction instead
> + of copying/deleting it as this loses some notations on the
> + insn.  */
> +  mark_jump_label (PATTERN (insn), insn_copy, 0);
>delete_insn (insn);
>return true;
>  }
>
>
> --
> Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
>
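
The reference-count problem described in this message can be shown with
a toy model.  Label, Insn and the three functions below are hypothetical
stand-ins rather than GCC's RTL types; copy_with_fixup plays the role
that the mark_jump_label call plays in the patch:

```cpp
#include <cassert>

// Toy label with a use count, loosely modelling LABEL_NUSES.
struct Label {
  int nuses = 0;
  bool deleted = false;
};

// Toy insn that may reference one label.
struct Insn {
  Label* ref;
};

// Deleting an insn drops its reference; the label goes away at zero uses.
void delete_insn(Insn& insn) {
  if (insn.ref == nullptr)
    return;
  Label* label = insn.ref;
  insn.ref = nullptr;
  assert(label->nuses > 0);
  if (--label->nuses == 0)
    label->deleted = true;
}

// Copy that forgets to count the new reference (the bug).
Insn copy_without_fixup(const Insn& insn) { return Insn{insn.ref}; }

// Copy that counts the new reference (the fix).
Insn copy_with_fixup(const Insn& insn) {
  Insn copy{insn.ref};
  if (copy.ref != nullptr)
    ++copy.ref->nuses;
  return copy;
}
```

With copy_without_fixup, deleting the original insn takes the count to
zero and the label is dropped while the copy still refers to it; with
copy_with_fixup the count stays at one.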


[PATCH] PR middle-end/89303 add testcase for std::enable_shared_from_this

2019-02-14 Thread Jonathan Wakely

* testsuite/20_util/enable_shared_from_this/89303.cc: New test.

Tested x86_64-linux, committed to trunk.

commit 1a2917b994921926d37c609d386a2cc32ed65735
Author: Jonathan Wakely 
Date:   Thu Feb 14 09:16:04 2019 +

PR middle-end/89303 add testcase for std::enable_shared_from_this

* testsuite/20_util/enable_shared_from_this/89303.cc: New test.

diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/89303.cc 
b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/89303.cc
new file mode 100644
index 000..3b23332c35e
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/89303.cc
@@ -0,0 +1,39 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-O1" }
+// { dg-do run { target c++11 } }
+
+// PR middle-end/89303
+
+#include <memory>
+
+class blob final: public std::enable_shared_from_this<blob>
+{
+  int* data;
+
+public:
+  blob() { data = new int; }
+  ~blob() { delete data; }
+};
+
+int
+main()
+{
+  std::shared_ptr<blob> tg = std::make_shared<blob>();
+  return tg->shared_from_this().use_count() - 2;
+}
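
For readers wondering why the test returns use_count() - 2: a minimal standalone sketch, using a hypothetical Widget type rather than the blob class from the test, shows the two owners that exist while the temporary returned by shared_from_this() is alive.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type for illustration; the committed test uses "blob".
struct Widget : std::enable_shared_from_this<Widget> {};

long owners_during_shared_from_this() {
  // make_shared creates the first owner.
  std::shared_ptr<Widget> first = std::make_shared<Widget>();
  // shared_from_this() returns a second, temporary owner of the same
  // object, so use_count() observes two owners at this point.
  return first->shared_from_this().use_count();
}
```

In single-threaded code this returns 2, which is why subtracting 2 in main gives exit status 0 when everything is compiled correctly.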


Re: Go patch committed: Compile thunks with -Os

2019-02-14 Thread Richard Biener
On Thu, Feb 14, 2019 at 2:21 AM Ian Lance Taylor  wrote:
>
> Nikhil Benesch noticed that changes in the GCC backend were making the
> use of defer functions that call recover less efficient.  A defer
> thunk is a generated function that looks like this (this is the entire
> function body):
>
> if !runtime.setdeferretaddr(&L) {
> deferredFunction()
> }
> L:
>
> The idea is that the address of the label passed to setdeferretaddr is
> the address to which deferredFunction returns.  The code in canrecover
> compares the return address of the function to this saved address to
> see whether the recover function can return non-nil.  This is
> explained in marginally more detail at
> https://www.airs.com/blog/archives/376 .
>
> When the return address does not match, the canrecover code does a
> more costly check that requires unwinding the stack.  What Nikhil
> Benesch noticed is that we were always taking that fallback.
>
> It turned out that the label address passed to setdeferretaddr was not
> the label to which the deferred function would return.  And that was
> because the epilogue was being duplicated by the bb-reorder pass, and
> the label was moved to one copy of the epilogue while the deferred
> function returned to the other epilogue.
>
> Of course there is no reason to duplicate the epilogue in such a small
> function.  One easy way to disable that epilogue duplication is to
> compile the function with -Os.  That is what this patch does.  This
> patch compiles all thunks, not just defer thunks, with -Os, but since
> they are all small that does no harm.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

I think an easier way would have been to mark it with the cold attribute?

> Ian
>
> 2019-02-13  Ian Lance Taylor  
>
> * go-gcc.cc: #include "opts.h".
> (Gcc_backend::function): Compile thunks with -Os.
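
As a rough C++ analogue of the two approaches discussed here (compiling the thunk with -Os versus marking it cold), GCC offers per-function attributes. This is only a sketch: the function names are made up, the Go frontend applies its setting internally rather than through source attributes, and whether cold would avoid the epilogue duplication is the open question raised above, not something this example demonstrates.

```cpp
#include <cassert>

// Hypothetical small function compiled as if -Os were given for it alone
// (GCC "optimize" function attribute; other compilers may ignore it).
__attribute__((optimize("Os"))) int small_thunk_os(int x) {
  return x + 1;
}

// Hypothetical small function marked as unlikely to be executed, which
// GCC documents as optimizing the function for size rather than speed.
__attribute__((cold)) int small_thunk_cold(int x) {
  return x + 1;
}
```

Both variants compute the same result; only the optimization strategy GCC applies to the function body differs.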


Re: [PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-14 Thread Richard Biener
On Thu, Feb 14, 2019 at 9:16 AM Uros Bizjak  wrote:
>
> On Thu, Feb 14, 2019 at 12:03 AM H.J. Lu  wrote:
>
> > > > > > Allow MMX intrinsic emulation with SSE/SSE2/SSSE3.  Don't enable 
> > > > > > MMX ISA
> > > > > > by default with TARGET_MMX_WITH_SSE.
> > > > > >
> > > > > > For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 
> > > > > > 64-bit
> > > > > > > mode since MMX intrinsics can be emulated with SSE.
> > > > > >
> > > > > > gcc/
> > > > > >
> > > > > > PR target/89021
> > > > > > * config/i386/i386-builtin.def: Enable MMX intrinsics with
> > > > > > SSE/SSE2/SSSE3.
> > > > > > * config/i386/i386.c (ix86_option_override_internal): Don't
> > > > > > enable MMX ISA with TARGET_MMX_WITH_SSE by default.
> > > > > > (bdesc_tm): Enable MMX intrinsics with SSE/SSE2/SSSE3.
> > > > > > (ix86_init_mmx_sse_builtins): Likewise.
> > > > > > (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
> > > > > > intrinsics with TARGET_MMX_WITH_SSE.
> > > > > > * config/i386/mmintrin.h: Don't require MMX in 64-bit mode.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > >
> > > > > > PR target/89021
> > > > > > * gcc.target/i386/pr82483-1.c: Error only on ia32.
> > > > > > * gcc.target/i386/pr82483-2.c: Likewise.
> > > > > > ---
> > > > > >  gcc/config/i386/i386-builtin.def  | 126 
> > > > > > +++---
> > > > > >  gcc/config/i386/i386.c|  62 +++
> > > > > >  gcc/config/i386/mmintrin.h|  10 +-
> > > > > >  gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
> > > > > >  gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
> > > > > >  5 files changed, 118 insertions(+), 84 deletions(-)
> > > > > >
> > > >
> > > > > > @@ -30810,13 +30815,13 @@ static const struct builtin_description 
> > > > > > bdesc_##kind[] =  \
> > > > > > we're lazy.  Add casts to make them fit.  */
> > > > > >  static const struct builtin_description bdesc_tm[] =
> > > > > >  {
> > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > "__builtin__ITM_WM64", (enum ix86_builtins) BUILT_IN_TM_STORE_M64, 
> > > > > > UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > "__builtin__ITM_WaRM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > "__builtin__ITM_WaWM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > "__builtin__ITM_RM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_M64, 
> > > > > > UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > "__builtin__ITM_RaRM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > "__builtin__ITM_RaWM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > "__builtin__ITM_RfWM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_LOAD_RFW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > CODE_FOR_nothing, "__builtin__ITM_WM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_STORE_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > CODE_FOR_nothing, "__builtin__ITM_WaRM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > CODE_FOR_nothing, "__builtin__ITM_WaWM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > CODE_FOR_nothing, "__builtin__ITM_RM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_LOAD_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > CODE_FOR_nothing, "__builtin__ITM_RaRM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > CODE_FOR_nothing, "__builtin__ITM_RaWM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > CODE_FOR_nothing, "__builtin__ITM_RfWM64", (enum ix86_builtins) 
> > > > > > BUILT_IN_TM_LOAD_RFW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > >
> > > > > Please explain why you need the above change.
> > > >
> > > > Reverted.
> > >
> > > Actually, I don't know if this change i

Re: Provide __start_minfo/__stop_minfo for linkers that don't (PR d/87864)

2019-02-14 Thread Rainer Orth
Hi Iain,

> On Tue, 29 Jan 2019 at 13:24, Rainer Orth  
> wrote:
>>
>> Solaris ld only gained support for section bracketing in Solaris 11.4.
>> Fortunately, in gdc it is only used for the minfo section, so it's easy
>> to provide a workaround by adding two additional startup files
>> drt{begin,end}.o which define __start_minfo and __stop_minfo.
>>
>> This patch does just that.
>>
>> I've raised a couple of questions in the PR already:
>>
>> * I've introduced a new -dstartfiles option which triggers the use of
>>   libgphobos.spec even with -nophoboslib.  Since it's effectively
>>   internal to the build system, I'm not currently documenting it.
>>
>> * I'm reading the spec file addition from a file: keeping it in a make
>>   variable would be extremely messy due to the necessary quoting.
>>
>> * I've chosen to use -Wc instead of -Xcompiler throughout: it's way
>>   shorter when more options need to be passed and it can take several
>>   comma-separated options at once.
>>
>> * libdruntime/gcc/drtstuff.c needs a copyright notice unless one wants
>>   to keep it in the public domain (also plausible).  Effectively
>>   something for Iain to decide.
>>
>> Bootstrapped without regressions on i386-pc-solaris2.11 (Solaris 11.3),
>> no regressions compared to Solaris 11.4 test results.
>>
>> Rainer
>>
>> --
>> -
>> Rainer Orth, Center for Biotechnology, Bielefeld University
>>
>>
>> 2018-11-20  Rainer Orth  
>>
>> libphobos:
>> PR d/87864
>> * configure.ac [!DCFG_MINFO_BRACKETING] (DRTSTUFF_SPEC): New 
>> variable.
>> Substitute it.
>> * libdruntime/m4/druntime/os.m4 (DRUNTIME_OS_MINFO_BRACKETING):
>> New automake conditional.
>> * configure: Regenerate.
>> * libdruntime/gcc/drtstuff.c: New file.
>> * libdruntime/Makefile.am [!DRUNTIME_OS_MINFO_BRACKETING]
>> (DRTSTUFF, toolexeclib_DATA): New variables.
>> (gcc/drtbegin.lo, gcc/drtend.lo): New rules.
>> (libgdruntime_la_LDFLAGS): Add -dstartfiles -Bgcc -B../src.
>> (libgdruntime_la_DEPENDENCIES): New variable.
>> * src/Makefile.am (libgphobos_la_LDFLAGS): Add -dstartfiles
>> -B../libdruntime/gcc.
>> * libdruntime/Makefile.in, src/Makefile.in: Regenerate.
>> * Makefile.in, testsuite/Makefile.in: Regenerate.
>> * libdruntime/rt/sections_elf_shared.d (Minfo_Bracketing): Don't
>> assert.
>> * src/drtstuff.spec: New file.
>> * src/libgphobos.spec.in (DRTSTUFF_SPEC): Substitute.
>> (*lib): Only pass SPEC_PHOBOS_DEPS without -debuglib, -defaultlib,
>> -nophoboslib.
>> * testsuite/testsuite_flags.in <--gdcldflags> (GDCLDFLAGS): Add
>> -B${BUILD_DIR}/libdruntime/gcc.
>>
>> * libdruntime/Makefile.am (unittest_static_LDFLAGS): Use -Wc
>> instead of -Xcompiler.
>> (libgdruntime_t_la_LDFLAGS): Likewise.
>> (unittest_LDFLAGS): Likewise.
>> * src/Makefile.am (unittest_static_LDFLAGS): Likewise.
>> (libgphobos_t_la_LDFLAGS): Likewise.
>> (unittest_LDFLAGS): Likewise.
>>
>> gcc/d:
>> PR d/87864
>> * lang.opt (dstartfiles): New option.
>> * d-spec.cc (need_spec): New variable.
>> (lang_specific_driver) : Enable need_spec.
>> (lang_specific_pre_link): Also load libgphobos.spec if need_spec.
>>
>> gcc/testsuite:
>> PR d/87864
>> * lib/gdc.exp (gdc_link_flags): Add path to drtbegin.o/drtend.o if
>> present.
>>
>
> I'd say go for it.  I see that there's a tab that found its way into
> lib/gdc.exp, and there's a copyright notice that needs fixing up.

That tab is due both to the GCC convention (GCS, actually) of using tabs
instead of 8 spaces, unlike D, and to Emacs' tcl mode, which follows it.
I've now fixed it up to be consistent with the rest of gdc.exp.

For the drtstuff.c copyright notice, I've taken GPLv3+runtime exception,
just like the libgcc/crtstuff.c one where this snippet effectively comes
from.  Since this file is gdc-only, I guess that's ok?

I'm running an i686-pc-linux-gnu bootstrap right now where this patch
should be a no-op, just to make sure again that it doesn't break
anything.  Unless you see some error or there's a problem with the
choice of license, I'm going to check it in afterwards.

> I'd make another change after this, and move / remove the
> rt/sections_*.d modules to gcc/sections/*.d, as those modules mirrored
> from upstream are all very specific to the dmd compiler itself, and I
> don't think will be able to use sections_osx or sections_win32
> verbatim in the same way as sections_elf_shared.

Certainly makes sense.  ldc has its own sections_ldc.d, probably for
similar reasons.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University



Re: [PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-14 Thread Uros Bizjak
On Thu, Feb 14, 2019 at 10:44 AM Richard Biener
 wrote:
>
> On Thu, Feb 14, 2019 at 9:16 AM Uros Bizjak  wrote:
> >
> > On Thu, Feb 14, 2019 at 12:03 AM H.J. Lu  wrote:
> >
> > > > > > > Allow MMX intrinsic emulation with SSE/SSE2/SSSE3.  Don't enable 
> > > > > > > MMX ISA
> > > > > > > by default with TARGET_MMX_WITH_SSE.
> > > > > > >
> > > > > > > For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 
> > > > > > > 64-bit
> > > > > > > > mode since MMX intrinsics can be emulated with SSE.
> > > > > > >
> > > > > > > gcc/
> > > > > > >
> > > > > > > PR target/89021
> > > > > > > * config/i386/i386-builtin.def: Enable MMX intrinsics with
> > > > > > > SSE/SSE2/SSSE3.
> > > > > > > * config/i386/i386.c (ix86_option_override_internal): 
> > > > > > > Don't
> > > > > > > enable MMX ISA with TARGET_MMX_WITH_SSE by default.
> > > > > > > (bdesc_tm): Enable MMX intrinsics with SSE/SSE2/SSSE3.
> > > > > > > (ix86_init_mmx_sse_builtins): Likewise.
> > > > > > > (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
> > > > > > > intrinsics with TARGET_MMX_WITH_SSE.
> > > > > > > * config/i386/mmintrin.h: Don't require MMX in 64-bit 
> > > > > > > mode.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > >
> > > > > > > PR target/89021
> > > > > > > * gcc.target/i386/pr82483-1.c: Error only on ia32.
> > > > > > > * gcc.target/i386/pr82483-2.c: Likewise.
> > > > > > > ---
> > > > > > >  gcc/config/i386/i386-builtin.def  | 126 
> > > > > > > +++---
> > > > > > >  gcc/config/i386/i386.c|  62 +++
> > > > > > >  gcc/config/i386/mmintrin.h|  10 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
> > > > > > >  5 files changed, 118 insertions(+), 84 deletions(-)
> > > > > > >
> > > > >
> > > > > > > @@ -30810,13 +30815,13 @@ static const struct builtin_description 
> > > > > > > bdesc_##kind[] =  \
> > > > > > > we're lazy.  Add casts to make them fit.  */
> > > > > > >  static const struct builtin_description bdesc_tm[] =
> > > > > > >  {
> > > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > > "__builtin__ITM_WM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_STORE_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > > "__builtin__ITM_WaRM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > > "__builtin__ITM_WaWM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > > "__builtin__ITM_RM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_M64, 
> > > > > > > UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > > "__builtin__ITM_RaRM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > > "__builtin__ITM_RaWM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > > -  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> > > > > > > "__builtin__ITM_RfWM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_LOAD_RFW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > > CODE_FOR_nothing, "__builtin__ITM_WM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_STORE_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > > CODE_FOR_nothing, "__builtin__ITM_WaRM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > > CODE_FOR_nothing, "__builtin__ITM_WaWM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
> > > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > > CODE_FOR_nothing, "__builtin__ITM_RM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_LOAD_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > > CODE_FOR_nothing, "__builtin__ITM_RaRM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > > CODE_FOR_nothing, "__builtin__ITM_RaWM64", (enum ix86_builtins) 
> > > > > > > BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
> > > > > > > +  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, 
> > > > > > > CODE_FOR_nothing

[PR fortran/89348, patch] Fortran Command Options documentation fixes

2019-02-14 Thread Mark Eggleston
The enabling of -fdec-include is missing from the list of options enabled
by -fdec.  When rendered as a PDF, some lines are too long in the list of
options controlling the Fortran dialect and in the list of options to
request or suppress errors and warnings.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89348

Patch and change log attached to PR.

OK for trunk? I do not have commit rights.

--
https://www.codethink.co.uk/privacy.html



Re: [PATCH] Construct ipa_reduced_postorder always for overwritable (PR ipa/89009).

2019-02-14 Thread Jan Hubicka
> Hi.
> 
> This is patch candidate I created and tested. It's not adding
> filtering based on opt_for_fn which I would defer to the next
> stage1.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin

> From d036f75a880bc91f67a5473767b35ba2f8a4ffe3 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 11 Feb 2019 16:47:06 +0100
> Subject: [PATCH] Reduce SCCs in IPA postorder.
> 
> gcc/ChangeLog:
> 
> 2019-02-13  Martin Liska  
> 
>   * ipa-cp.c (build_toporder_info): Use
>   ignore_edge_if_not_available as edge filter.
>   * ipa-inline.c (inline_small_functions): Likewise.
>   * ipa-pure-const.c (ignore_edge_for_pure_const):
>   Move to ipa-utils.h and rename to ignore_edge_if_not_available.
>   (propagate_pure_const): Use ignore_edge_if_not_available
>   as edge filter.
>   * ipa-reference.c (ignore_edge_p): Make SCCs more fine
>   based on availability and ECF_LEAF attribute.
>   * ipa-utils.c (searchc): Refactor code.
>   * ipa-utils.h (ignore_edge_if_not_available): New.

OK, I think it is safe to wait for stage1 - it is a bit fragile to
propagate across a different graph than the one the postorder is computed
on (as manifested by the bug), but it should be safe if the SCCs are simply
bigger than they should be.

Next stage1 we should also teach the callback to ignore edges of calls
that are not being optimized.
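
To make the effect of an edge filter on the SCCs concrete, here is a small sketch using Tarjan's algorithm on a toy call graph.  Graph, IgnoreEdge and count_sccs are hypothetical names for illustration; this is not the code in ipa-utils.c, only the same idea of dropping edges before components are formed.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Toy call graph: node i calls every node listed in callees[i].
struct Graph {
  std::vector<std::vector<int>> callees;
};

// Hypothetical stand-in for the ignore_edge callback: return true to drop
// the edge caller -> callee before components are formed.
using IgnoreEdge = std::function<bool(int caller, int callee)>;

// Count strongly connected components with Tarjan's algorithm, skipping
// every edge the callback rejects.  An empty callback keeps all edges.
int count_sccs(const Graph &g, const IgnoreEdge &ignore) {
  const int n = static_cast<int>(g.callees.size());
  std::vector<int> index(n, -1), low(n, 0), stack;
  std::vector<bool> on_stack(n, false);
  int next_index = 0, components = 0;

  std::function<void(int)> visit = [&](int v) {
    index[v] = low[v] = next_index++;
    stack.push_back(v);
    on_stack[v] = true;
    for (int w : g.callees[v]) {
      if (ignore && ignore(v, w))
        continue;                       // filtered edge: not followed
      if (index[w] < 0) {
        visit(w);
        low[v] = std::min(low[v], low[w]);
      } else if (on_stack[w]) {
        low[v] = std::min(low[v], index[w]);
      }
    }
    if (low[v] == index[v]) {           // v is the root of a component
      ++components;
      int w;
      do {
        w = stack.back();
        stack.pop_back();
        on_stack[w] = false;
      } while (w != v);
    }
  };

  for (int v = 0; v < n; ++v)
    if (index[v] < 0)
      visit(v);
  return components;
}
```

For a three-node cycle 0 -> 1 -> 2 -> 0, keeping every edge gives one component, while ignoring the edges into node 0 (as if that callee were interposable) splits it into three.  Filtering can only make components smaller, which is why components that are too big are merely conservative.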

Honza
> ---
>  gcc/ipa-cp.c |  3 ++-
>  gcc/ipa-inline.c |  2 +-
>  gcc/ipa-pure-const.c | 13 +
>  gcc/ipa-reference.c  | 13 ++---
>  gcc/ipa-utils.c  |  3 +--
>  gcc/ipa-utils.h  | 10 ++
>  6 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 442d5c63eff..2253b0cef63 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -815,7 +815,8 @@ build_toporder_info (struct ipa_topo_info *topo)
>topo->stack = XCNEWVEC (struct cgraph_node *, symtab->cgraph_count);
>  
>gcc_checking_assert (topo->stack_top == 0);
> -  topo->nnodes = ipa_reduced_postorder (topo->order, true, NULL);
> +  topo->nnodes = ipa_reduced_postorder (topo->order, true,
> + ignore_edge_if_not_available);
>  }
>  
>  /* Free information about strongly connected components and the arrays in
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 360c3de3289..c7e68a73706 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -1778,7 +1778,7 @@ inline_small_functions (void)
>   metrics.  */
>  
>max_count = profile_count::uninitialized ();
> -  ipa_reduced_postorder (order, true, NULL);
> +  ipa_reduced_postorder (order, true, ignore_edge_if_not_available);
>free (order);
>  
>FOR_EACH_DEFINED_FUNCTION (node)
> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
> index a8a3956d2d5..e61d279289e 100644
> --- a/gcc/ipa-pure-const.c
> +++ b/gcc/ipa-pure-const.c
> @@ -1395,17 +1395,6 @@ cdtor_p (cgraph_node *n, void *)
>return false;
>  }
>  
> -/* We only propagate across edges with non-interposable callee.  */
> -
> -static bool
> -ignore_edge_for_pure_const (struct cgraph_edge *e)
> -{
> -  enum availability avail;
> -  e->callee->function_or_virtual_thunk_symbol (&avail, e->caller);
> -  return (avail <= AVAIL_INTERPOSABLE);
> -}
> -
> -
>  /* Produce transitive closure over the callgraph and compute pure/const
> attributes.  */
>  
> @@ -1423,7 +1412,7 @@ propagate_pure_const (void)
>bool has_cdtor;
>  
>order_pos = ipa_reduced_postorder (order, true,
> -  ignore_edge_for_pure_const);
> +  ignore_edge_if_not_available);
>if (dump_file)
>  {
>cgraph_node::dump_cgraph (dump_file);
> diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
> index d1759a374bc..16cc4cf44f9 100644
> --- a/gcc/ipa-reference.c
> +++ b/gcc/ipa-reference.c
> @@ -677,14 +677,21 @@ get_read_write_all_from_node (struct cgraph_node *node,
>}
>  }
>  
> -/* Skip edges from and to nodes without ipa_reference enables.  This leave
> -   them out of strongy connected coponents and makes them easyto skip in the
> +/* Skip edges from and to nodes without ipa_reference enabled.
> +   Ignore not available symbols.  This leave
> +   them out of strongly connected components and makes them easy to skip in the
> propagation loop bellow.  */
>  
>  static bool
>  ignore_edge_p (cgraph_edge *e)
>  {
> -  return (!opt_for_fn (e->caller->decl, flag_ipa_reference)
> +  enum availability avail;
> +  e->callee->function_or_virtual_thunk_symbol (&avail, e->caller);
> +
> +  return (avail < AVAIL_INTERPOSABLE
> +   || (avail == AVAIL_INTERPOSABLE
> +   && !(flags_from_decl_or_type (e->callee->decl) & ECF_LEAF))
> +   || !opt_for_fn (e->caller->decl, flag_ipa_reference)
>|| !opt_for_fn (e->callee->function_symbol ()->decl,
> flag_ipa_reference));
>  }
> diff --git a/gc

Re: V2 [PATCH] driver: Also prune joined switches with negation

2019-02-14 Thread Jakub Jelinek
On Wed, Feb 13, 2019 at 06:27:51PM -0800, H.J. Lu wrote:
> --- a/gcc/doc/options.texi
> +++ b/gcc/doc/options.texi
> @@ -227,7 +227,10 @@ options, their @code{Negative} properties should form a circular chain.
>  For example, if options @option{-@var{a}}, @option{-@var{b}} and
>  @option{-@var{c}} are mutually exclusive, their respective @code{Negative}
>  properties should be @samp{Negative(@var{b})}, @samp{Negative(@var{c})}
> -and @samp{Negative(@var{a})}.
> +and @samp{Negative(@var{a})}.  @code{Negative} can be used together
> +with @code{Joined} if there is no @code{RejectNegative} property.
> +@code{Negative} is ignored if there is @code{Joined} without
> +@code{RejectNegative}.

I think this doesn't describe what is implemented.

Something like:
 the option name with the leading ``-'' removed.  This chain action will
 propagate through the @code{Negative} property of the option to be
-turned off.
+turned off.  The driver will prune options, removing those that are
+turned off by some later option.  This pruning is not done for options
+with @code{Joined} or @code{JoinedOrMissing} properties, unless the
+options have either @code{RejectNegative} property or the @code{Negative}
+property mentions an option other than itself.

 As a consequence, if you have a group of mutually-exclusive
 options, their @code{Negative} properties should form a circular chain.

?

Otherwise LGTM, but Joseph is the options machinery maintainer, so I'll
defer to him here.
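
The pruning rule in the wording suggested above can be read as a small predicate.  The sketch below is a literal reading of that sentence only; OptionProps and driver_may_prune are hypothetical names and this is not the driver's actual code.

```cpp
#include <cassert>

// Hypothetical summary of the option properties the wording mentions.
struct OptionProps {
  bool joined;                // has Joined or JoinedOrMissing
  bool reject_negative;       // has RejectNegative
  bool negative_names_other;  // Negative(...) names a different option
};

// Literal reading of the proposed text: options are pruned when a later
// option turns them off, except Joined/JoinedOrMissing options, which are
// pruned only with RejectNegative or a Negative naming another option.
bool driver_may_prune(const OptionProps &p) {
  if (!p.joined)
    return true;
  return p.reject_negative || p.negative_names_other;
}
```

Under this reading, a plain Joined option without RejectNegative and without a Negative property pointing elsewhere is never pruned, so repeated uses of it all reach the compiler.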

Jakub


[PATCH] Add testcases for multiple -fsanitize=, -fno-sanitize= or -fno-sanitize-recover= options

2019-02-14 Thread Jakub Jelinek
Hi!

The following patch adds testcase coverage to make sure
-f{,no-}sanitize{,-recover}= options are all passed to the compiler backend
from the driver.

All these tests were broken by the earlier option handling patch from H.J.:
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00492.html
and as nothing in the testsuite revealed the patch broke this, I think we
want to cover this in the testsuite.

Tested on x86_64-linux with
make check-gcc check-c++-all RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp=opts*'
with current trunk (all tests PASS) and with trunk patched with the above
patch (all tests FAIL).  Ok for trunk?
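
Why every one of these options has to reach the compiler can be seen from a toy model of how they accumulate.  This is a sketch under stated assumptions: the bit names and helper functions are hypothetical, and "undefined" is reduced to just the two checks the new tests look at (float-divide-by-zero is not part of -fsanitize=undefined).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bits for the three checks the new tests exercise.
enum : unsigned { kShift = 1u << 0, kIntDiv = 1u << 1, kFloatDiv = 1u << 2 };

unsigned mask_for(const std::string &name) {
  if (name == "shift") return kShift;
  if (name == "integer-divide-by-zero") return kIntDiv;
  if (name == "float-divide-by-zero") return kFloatDiv;
  if (name == "undefined") return kShift | kIntDiv;  // toy subset of the group
  return 0;
}

// Apply the options left to right: -fsanitize= adds checks and
// -fno-sanitize= removes them, so dropping any one changes the result.
unsigned enabled_checks(const std::vector<std::string> &args) {
  const std::string on = "-fsanitize=", off = "-fno-sanitize=";
  unsigned enabled = 0;
  for (const std::string &a : args) {
    if (a.compare(0, on.size(), on) == 0)
      enabled |= mask_for(a.substr(on.size()));
    else if (a.compare(0, off.size(), off) == 0)
      enabled &= ~mask_for(a.substr(off.size()));
  }
  return enabled;
}
```

With the option strings from opts-2.c, for example, the model ends up with the integer and float division checks enabled and the shift check disabled, matching the two divrem handlers and the absent shift handler that the test scans for.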

2019-02-14  Jakub Jelinek  

* c-c++-common/ubsan/opts-1.c: New test.
* c-c++-common/ubsan/opts-2.c: New test.
* c-c++-common/ubsan/opts-3.c: New test.
* c-c++-common/ubsan/opts-4.c: New test.

--- gcc/testsuite/c-c++-common/ubsan/opts-1.c.jj   2019-02-14 11:31:33.144895232 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-1.c   2019-02-14 11:33:23.049077585 +0100
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fsanitize=shift -fsanitize=float-divide-by-zero -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_shift_out_of_bounds" 1 "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+float
+baz (float x, float y)
+{
+  return x / y;
+}
--- gcc/testsuite/c-c++-common/ubsan/opts-2.c.jj   2019-02-14 11:33:29.806965829 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-2.c   2019-02-14 11:34:03.169414166 +0100
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize=shift -fsanitize=float-divide-by-zero -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__ubsan_handle_shift_out_of_bounds" "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+float
+baz (float x, float y)
+{
+  return x / y;
+}
--- gcc/testsuite/c-c++-common/ubsan/opts-3.c.jj   2019-02-14 11:34:10.538292322 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-3.c   2019-02-14 11:34:35.512879358 +0100
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize=shift -fno-sanitize=float-divide-by-zero -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__ubsan_handle_shift_out_of_bounds" "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+float
+baz (float x, float y)
+{
+  return x / y;
+}
--- gcc/testsuite/c-c++-common/ubsan/opts-4.c.jj   2019-02-14 11:40:35.771922337 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-4.c   2019-02-14 11:40:29.220030674 +0100
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=integer-divide-by-zero -fno-sanitize-recover=shift -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow_abort" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_shift_out_of_bounds_abort" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_type_mismatch_v1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__ubsan_handle_type_mismatch_v1_abort" "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+enum E { E0, E1, E2, E3 };
+
+enum E
+baz (enum E *x)
+{
+  return *x;
+}

Jakub


Re: GCC 8 backports

2019-02-14 Thread Martin Liška
On 11/20/18 11:58 AM, Martin Liška wrote:
> On 10/3/18 11:23 AM, Martin Liška wrote:
>> On 9/25/18 8:48 AM, Martin Liška wrote:
>>> Hi.
>>>
>>> One more tested patch.
>>>
>>> Martin
>>>
>>
>> One more tested patch.
>>
>> Martin
>>
> 
> Hi.
> 
> One more tested patch that I'm going to install.
> 
> Martin
> 

Hi.

Another 2 patches that I've just tested and will install.

Martin
From 65c43b0a4fb210485ad01d5f55573bfc0f17441d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 11 Feb 2019 08:13:03 +
Subject: [PATCH 1/2] Backport r268762

gcc/ChangeLog:

2019-02-11  Martin Liska  

	PR ipa/89009
	* ipa-cp.c (build_toporder_info): Remove usage of a param.
	* ipa-inline.c (inline_small_functions): Likewise.
	* ipa-pure-const.c (propagate_pure_const): Likewise.
	(propagate_nothrow): Likewise.
	* ipa-reference.c (propagate): Likewise.
	* ipa-utils.c (struct searchc_env): Remove unused field.
	(searchc): Always search across AVAIL_INTERPOSABLE.
	(ipa_reduced_postorder): Always allow AVAIL_INTERPOSABLE as
	the only called IPA pure const can properly not propagate
	across interposable boundary.
	* ipa-utils.h (ipa_reduced_postorder): Remove param.

gcc/testsuite/ChangeLog:

2019-02-11  Martin Liska  

	PR ipa/89009
	* g++.dg/ipa/pr89009.C: New test.
---
 gcc/ipa-cp.c   |  2 +-
 gcc/ipa-inline.c   |  2 +-
 gcc/ipa-pure-const.c   |  4 ++--
 gcc/ipa-reference.c|  2 +-
 gcc/ipa-utils.c|  9 +++--
 gcc/ipa-utils.h|  2 +-
 gcc/testsuite/g++.dg/ipa/pr89009.C | 12 
 7 files changed, 21 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr89009.C

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index e868b9c2623..5bd4df0ecb7 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -820,7 +820,7 @@ build_toporder_info (struct ipa_topo_info *topo)
   topo->stack = XCNEWVEC (struct cgraph_node *, symtab->cgraph_count);
 
   gcc_checking_assert (topo->stack_top == 0);
-  topo->nnodes = ipa_reduced_postorder (topo->order, true, true, NULL);
+  topo->nnodes = ipa_reduced_postorder (topo->order, true, NULL);
 }
 
 /* Free information about strongly connected components and the arrays in
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 996b04cb81d..bde7ecfd0d5 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1759,7 +1759,7 @@ inline_small_functions (void)
  metrics.  */
 
   max_count = profile_count::uninitialized ();
-  ipa_reduced_postorder (order, true, true, NULL);
+  ipa_reduced_postorder (order, true, NULL);
   free (order);
 
   FOR_EACH_DEFINED_FUNCTION (node)
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index a80b6845633..d36d1ba9b73 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1443,7 +1443,7 @@ propagate_pure_const (void)
   bool remove_p = false;
   bool has_cdtor;
 
-  order_pos = ipa_reduced_postorder (order, true, false,
+  order_pos = ipa_reduced_postorder (order, true,
  ignore_edge_for_pure_const);
   if (dump_file)
 {
@@ -1773,7 +1773,7 @@ propagate_nothrow (void)
   int i;
   struct ipa_dfs_info * w_info;
 
-  order_pos = ipa_reduced_postorder (order, true, false,
+  order_pos = ipa_reduced_postorder (order, true,
  ignore_edge_for_nothrow);
   if (dump_file)
 {
diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index 6490c03f8d0..b9db61697d1 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -728,7 +728,7 @@ propagate (void)
  the global information.  All the nodes within a cycle will have
  the same info so we collapse cycles first.  Then we can do the
  propagation in one pass from the leaves to the roots.  */
-  order_pos = ipa_reduced_postorder (order, true, true, ignore_edge_p);
+  order_pos = ipa_reduced_postorder (order, true, ignore_edge_p);
   if (dump_file)
 ipa_print_order (dump_file, "reduced", order, order_pos);
 
diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
index a271bb822cb..106d3079391 100644
--- a/gcc/ipa-utils.c
+++ b/gcc/ipa-utils.c
@@ -63,7 +63,6 @@ struct searchc_env {
   int order_pos;
   splay_tree nodes_marked_new;
   bool reduce;
-  bool allow_overwritable;
   int count;
 };
 
@@ -105,7 +104,7 @@ searchc (struct searchc_env* env, struct cgraph_node *v,
 
   if (w->aux
 	  && (avail > AVAIL_INTERPOSABLE
-	  || (env->allow_overwritable && avail == AVAIL_INTERPOSABLE)))
+	  || avail == AVAIL_INTERPOSABLE))
 	{
 	  w_info = (struct ipa_dfs_info *) w->aux;
 	  if (w_info->new_node)
@@ -162,7 +161,7 @@ searchc (struct searchc_env* env, struct cgraph_node *v,
 
 int
 ipa_reduced_postorder (struct cgraph_node **order,
-		   bool reduce, bool allow_overwritable,
+		   bool reduce,
 		   bool (*ignore_edge) (struct cgraph_edge *))
 {
   struct cgraph_node *node;
@@ -175,15 +174,13 @@ ipa_reduced_postorder (struct cgraph_node **order,
   env.nodes_marked_new = splay_tree_new (splay_tree_compare_ints, 0, 0);
   env.count = 1;
   env.reduce 

[PATCH 07/40] i386: Emulate MMX mmx_pmaddwd with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX pmaddwd with SSE.  Only SSE register source operand is
allowed.
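
For readers unfamiliar with the operation, the RTL in this pattern can be modelled in scalar C as below.  This is only an illustrative sketch: model_pmaddwd is a hypothetical name and is not part of the patch.

```c
#include <stdint.h>

/* Scalar model of pmaddwd on a 64-bit (V4HI) operand: multiply the four
   signed 16-bit lanes pairwise, then add adjacent products into two
   32-bit lanes.  model_pmaddwd is a hypothetical helper, not GCC code.  */
static void
model_pmaddwd (const int16_t a[4], const int16_t b[4], int32_t out[2])
{
  for (int i = 0; i < 2; i++)
    {
      /* Add as unsigned so the one overflowing case (all four inputs
	 equal to -32768) wraps instead of invoking undefined behavior.  */
      uint32_t even = (uint32_t) ((int32_t) a[2 * i] * b[2 * i]);
      uint32_t odd = (uint32_t) ((int32_t) a[2 * i + 1] * b[2 * i + 1]);
      out[i] = (int32_t) (even + odd);
    }
}
```

The even/odd split corresponds to the two vec_select parallels (0, 2) and (1, 3) in the pattern.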

PR target/89021
* config/i386/mmx.md (mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.
(*mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.  Add SSE support.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 58054b7e0c7..23c10dffc38 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -823,20 +823,20 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_pmaddwd"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
 (plus:V2SI
  (mult:V2SI
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0")
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
(parallel [(const_int 0) (const_int 2)])))
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0) (const_int 2)]
  (mult:V2SI
(sign_extend:V2SI
@@ -845,10 +845,15 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmaddwd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmaddwd\t{%2, %0|%0, %2}
+   pmaddwd\t{%2, %0|%0, %2}
+   vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmulhrwv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 09/40] i386: Emulate MMX <any_logic><mode>3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX <any_logic><mode>3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (any_logic:<code><mode>3): New.
(any_logic:*mmx_<code><mode>3): Also allow TARGET_MMX_WITH_SSE.
Add SSE support.
---
 gcc/config/i386/mmx.md | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 4738d6b428e..9e7798d4b47 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1080,15 +1080,28 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
+(define_expand "<code><mode>3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (any_logic:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
 (define_insn "*mmx_<code><mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (any_logic:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<logic>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "@
+   p<logic>\t{%2, %0|%0, %2}
+   p<logic>\t{%2, %0|%0, %2}
+   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 14/40] i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.

PR target/89021
* config/i386/sse.md (sse_cvtps2pi): Add SSE emulation.
(sse_cvttps2pi): Likewise.
---
 gcc/config/i386/sse.md | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c8e0133560a..083f9ef0f44 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4574,26 +4574,32 @@
(set_attr "mode" "V4SF")])
 
 (define_insn "sse_cvtps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
(vec_select:V2SI
- (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+ (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")]
   UNSPEC_FIX_NOTRUNC)
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvtps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvtps2pi\t{%1, %0|%0, %q1}
+   %vcvtps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
(set_attr "mode" "DI")])
 
 (define_insn "sse_cvttps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
(vec_select:V2SI
- (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm"))
+ (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm"))
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvttps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvttps2pi\t{%1, %0|%0, %q1}
+   %vcvttps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
(set_attr "prefix_rep" "0")
(set_attr "mode" "SF")])
 
-- 
2.20.1



[PATCH 02/40] i386: Emulate MMX packsswb/packssdw/packuswb with SSE2

2019-02-14 Thread H.J. Lu
Emulate MMX packsswb/packssdw/packuswb with SSE packsswb/packssdw/packuswb,
then move bits 64:95 to bits 32:63 of the SSE register.  Only SSE register
source operand is allowed.
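
The reason for the extra move can be seen in a scalar model.  The sketch below is illustrative only; the model_* names are hypothetical, and the fix-up step is simplified to a single dword copy, whereas ix86_move_vector_high_sse_to_mmx emits a V4SI vec_select whose upper two result dwords are don't-care.

```c
#include <stdint.h>
#include <string.h>

/* Saturate a signed 16-bit value to the signed 8-bit range.  */
static int8_t
sat_s8 (int16_t v)
{
  if (v > INT8_MAX)
    return INT8_MAX;
  if (v < INT8_MIN)
    return INT8_MIN;
  return (int8_t) v;
}

/* Model of SSE2 packsswb on 128-bit operands: the eight words of DST
   become bytes 0..7 and the eight words of SRC become bytes 8..15.  */
static void
model_sse_packsswb (const int16_t dst[8], const int16_t src[8],
		    int8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[i] = sat_s8 (dst[i]);
      out[i + 8] = sat_s8 (src[i]);
    }
}

/* Model of the fix-up step: copy bits 64:95 (dword 2) over bits 32:63
   (dword 1).  Everything above bit 63 is don't-care for an MMX value.  */
static void
model_move_high_to_mmx (int8_t reg[16])
{
  memcpy (reg + 4, reg + 8, 4);
}

/* MMX packsswb emulated with the two steps above.  A and B are the
   64-bit MMX operands held in the low halves of SSE registers.  */
static void
model_mmx_packsswb_via_sse (const int16_t a[4], const int16_t b[4],
			    int8_t out[8])
{
  int16_t wide_a[8] = { 0 }, wide_b[8] = { 0 };
  int8_t wide_out[16];

  memcpy (wide_a, a, 4 * sizeof (int16_t));
  memcpy (wide_b, b, 4 * sizeof (int16_t));
  model_sse_packsswb (wide_a, wide_b, wide_out);
  model_move_high_to_mmx (wide_out);
  memcpy (out, wide_out, 8);
}
```

After the SSE pack, the bytes produced from the second operand sit at bits 64:95, which is why they have to be moved down next to the bytes produced from the first operand.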

2019-02-08  H.J. Lu  
Uros Bizjak  

PR target/89021
* config/i386/i386-protos.h (ix86_move_vector_high_sse_to_mmx):
New prototype.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.c (ix86_move_vector_high_sse_to_mmx): New
function.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.md (mmx_isa): New.
(enabled): Also check mmx_isa.
* config/i386/mmx.md (any_s_truncate): New code iterator.
(s_trunsuffix): New code attr.
(mmx_packsswb): Removed.
(mmx_packssdw): Likewise.
(mmx_packuswb): Likewise.
(mmx_pack<s_trunsuffix>swb): New define_insn_and_split to emulate
MMX packsswb/packuswb with SSE2.
(mmx_packssdw): Likewise.
---
 gcc/config/i386/i386-protos.h |  3 ++
 gcc/config/i386/i386.c| 54 
 gcc/config/i386/i386.md   | 13 +++
 gcc/config/i386/mmx.md| 67 +++
 4 files changed, 107 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 27f5cc13abf..a53b48438ec 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -202,6 +202,9 @@ extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, rtx, rtx);
 
 extern rtx ix86_split_stack_guard (void);
 
+extern void ix86_move_vector_high_sse_to_mmx (rtx);
+extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif /* TREE_CODE  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 83d3117f46d..c6325224c9d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20226,6 +20226,60 @@ ix86_expand_vector_move_misalign (machine_mode mode, rtx operands[])
 gcc_unreachable ();
 }
 
+/* Move bits 64:95 to bits 32:63.  */
+
+void
+ix86_move_vector_high_sse_to_mmx (rtx op)
+{
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (2),
+ GEN_INT (0), GEN_INT (0)));
+  rtx dest = lowpart_subreg (V4SImode, op, GET_MODE (op));
+  op = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  rtx insn = gen_rtx_SET (dest, op);
+  emit_insn (insn);
+}
+
+/* Split MMX pack with signed/unsigned saturation with SSE/SSE2.  */
+
+void
+ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+
+  machine_mode dmode = GET_MODE (op0);
+  machine_mode smode = GET_MODE (op1);
+  machine_mode inner_dmode = GET_MODE_INNER (dmode);
+  machine_mode inner_smode = GET_MODE_INNER (smode);
+
+  /* Get the corresponding SSE mode for destination.  */
+  int nunits = 16 / GET_MODE_SIZE (inner_dmode);
+  machine_mode sse_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+   nunits).require ();
+  machine_mode sse_half_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+nunits / 2).require ();
+
+  /* Get the corresponding SSE mode for source.  */
+  nunits = 16 / GET_MODE_SIZE (inner_smode);
+  machine_mode sse_smode = mode_for_vector (GET_MODE_INNER (smode),
+   nunits).require ();
+
+  /* Generate SSE pack with signed/unsigned saturation.  */
+  rtx dest = lowpart_subreg (sse_dmode, op0, GET_MODE (op0));
+  op1 = lowpart_subreg (sse_smode, op1, GET_MODE (op1));
+  op2 = lowpart_subreg (sse_smode, op2, GET_MODE (op2));
+
+  op1 = gen_rtx_fmt_e (code, sse_half_dmode, op1);
+  op2 = gen_rtx_fmt_e (code, sse_half_dmode, op2);
+  rtx insn = gen_rtx_SET (dest, gen_rtx_VEC_CONCAT (sse_dmode,
+   op1, op2));
+  emit_insn (insn);
+
+  ix86_move_vector_high_sse_to_mmx (op0);
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 40ed93dc804..e1727676deb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -792,6 +792,10 @@
avx512vl,noavx512vl,x64_avx512dq,x64_avx512bw"
   (const_string "base"))
 
+;; Define instruction set of MMX instructions
+(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
+  (const_string "base"))
+
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "x64") (symbol_ref "TARGET_64BIT")
 (eq_attr "isa" "x64_sse2")
@@ -830,6 +834,15 @@
 (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
 (eq_attr "isa" "avx512vl") (symbol_ref "TARGET_AVX512VL")
 (eq_attr "isa" "noavx512vl") (symbol_ref "!TARGET_AVX512VL")
+
+(eq_a

[PATCH 13/40] i386: Emulate MMX pshufw with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX pshufw with SSE.  Only SSE register source operand is allowed.
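
As a rough illustration of why pshuflw can stand in for pshufw, the two instructions can be modelled as follows.  This is a sketch with hypothetical model_* names, not code from the patch; the mask construction mirrors the shifts by 0, 2, 4 and 6 in mmx_pshufw_1.

```c
#include <stdint.h>

/* Build the 8-bit immediate from four 2-bit word selectors, as the
   mmx_pshufw_1 output code does with operands 2..5.  */
static unsigned
model_pshufw_mask (unsigned s0, unsigned s1, unsigned s2, unsigned s3)
{
  return (s0 << 0) | (s1 << 2) | (s2 << 4) | (s3 << 6);
}

/* MMX pshufw: each of the four result words picks a source word.  */
static void
model_pshufw (const uint16_t src[4], unsigned imm8, uint16_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = src[(imm8 >> (2 * i)) & 3];
}

/* SSE2 pshuflw: the same shuffle on the low four words of a 128-bit
   register; the high four words are copied unchanged.  */
static void
model_pshuflw (const uint16_t src[8], unsigned imm8, uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = src[(imm8 >> (2 * i)) & 3];
  for (int i = 4; i < 8; i++)
    out[i] = src[i];
}
```

Because the low 64 bits of the pshuflw result depend only on the low 64 bits of the source, the MMX value held in the low half of an SSE register is shuffled exactly as pshufw would shuffle it.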

PR target/89021
* config/i386/mmx.md (mmx_pshufw): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(mmx_pshufw_1): Add SSE emulation.
(*vec_dupv4hi): Changed to define_insn_and_split and also allow
TARGET_MMX_WITH_SSE to support SSE emulation.
---
 gcc/config/i386/mmx.md | 79 ++
 1 file changed, 64 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 441a08d22b7..497af2d74b7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1339,7 +1339,8 @@
   [(match_operand:V4HI 0 "register_operand")
(match_operand:V4HI 1 "nonimmediate_operand")
(match_operand:SI 2 "const_int_operand")]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen_mmx_pshufw_1 (operands[0], operands[1],
@@ -1351,14 +1352,15 @@
 })
 
 (define_insn "mmx_pshufw_1"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv")
 (vec_select:V4HI
-  (match_operand:V4HI 1 "nonimmediate_operand" "ym")
+  (match_operand:V4HI 1 "nonimmediate_operand" "ym,Yv")
   (parallel [(match_operand 2 "const_0_to_3_operand")
  (match_operand 3 "const_0_to_3_operand")
  (match_operand 4 "const_0_to_3_operand")
  (match_operand 5 "const_0_to_3_operand")])))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   int mask = 0;
   mask |= INTVAL (operands[2]) << 0;
@@ -1367,11 +1369,20 @@
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
 
-  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
+  switch (which_alternative)
+{
+case 0:
+  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
+case 1:
+  return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
+default:
+  gcc_unreachable ();
+}
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_insn "mmx_pswapdv2si2"
   [(set (match_operand:V2SI 0 "register_operand" "=y")
@@ -1384,16 +1395,54 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv4hi"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv4hi"
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv,Yw")
(vec_duplicate:V4HI
  (truncate:HI
-   (match_operand:SI 1 "register_operand" "0"]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pshufw\t{$0, %0, %0|%0, %0, 0}"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (match_operand:SI 1 "register_operand" "0,Yv,r"]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   pshufw\t{$0, %0, %0|%0, %0, 0}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  rtx op;
+  operands[0] = lowpart_subreg (V8HImode, operands[0],
+   GET_MODE (operands[0]));
+  if (TARGET_AVX2)
+{
+  operands[1] = lowpart_subreg (HImode, operands[1],
+   GET_MODE (operands[1]));
+  op = gen_rtx_VEC_DUPLICATE (V8HImode, operands[1]);
+}
+  else
+{
+  operands[1] = lowpart_subreg (V8HImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (4),
+ GEN_INT (5),
+ GEN_INT (6),
+ GEN_INT (7)));
+
+  op = gen_rtx_VEC_SELECT (V8HImode, operands[1], mask);
+}
+  rtx insn = gen_rtx_SET (operands[0], op);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64,x64_avx")
+   (set_attr "type" "mmxcvt,sselog1,ssemov")
+   (set_attr "length_immediate" "1,1,0")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn_and_split "*vec_dupv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv,Yw")
-- 
2.20.1



[PATCH 35/40] i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE

2019-02-14 Thread H.J. Lu
PR target/89021
* config/i386/mmx.md (MMXMODE:mov<mode>): Also allow
TARGET_MMX_WITH_SSE.
(MMXMODE:*mov<mode>_internal): Likewise.
(MMXMODE:movmisalign<mode>): Likewise.
---
 gcc/config/i386/mmx.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index a618a620eb1..81ee6250051 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -70,7 +70,7 @@
 (define_expand "mov<mode>"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (<MODE>mode, operands);
   DONE;
@@ -81,7 +81,7 @@
 "=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
(match_operand:MMXMODE 1 "nonimm_or_0_operand"
 "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
-  "TARGET_MMX
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -207,7 +207,7 @@
 (define_expand "movmisalign<mode>"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (<MODE>mode, operands);
   DONE;
-- 
2.20.1



[PATCH 10/40] i386: Emulate MMX mmx_andnot<mode>3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_andnot<mode>3 with SSE.  Only SSE register source operand
is allowed.
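
The operand order of pandn is easy to get backwards, so here is a one-line scalar model of the RTL (and (not op1) op2).  model_pandn is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Scalar model of pandn on a 64-bit value: the FIRST operand is the
   one that gets inverted, matching (and (not op1) op2) in the pattern.  */
static uint64_t
model_pandn (uint64_t op1, uint64_t op2)
{
  return ~op1 & op2;
}
```

The operation is not commutative, which is consistent with operand 1 carrying no "%" modifier in this pattern.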

PR target/89021
* config/i386/mmx.md (mmx_andnot<mode>3): Also allow
TARGET_MMX_WITH_SSE.  Add SSE support.
---
 gcc/config/i386/mmx.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 9e7798d4b47..2a9972e79d9 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1063,14 +1063,18 @@
 ;
 
 (define_insn "mmx_andnot<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
(and:MMXMODEI
- (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0"))
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pandn\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0,0,Yv"))
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   pandn\t{%2, %0|%0, %2}
+   pandn\t{%2, %0|%0, %2}
+   vpandn\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_<code><mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand")
-- 
2.20.1



[PATCH 17/40] i386: Emulate MMX mmx_pinsrw with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_pinsrw with SSE.  Only SSE register source operand is
allowed.
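
The expander and the insn pass the lane number around in two forms: the expander turns the index into a one-hot vec_merge mask (1 << index), and the insn turns it back with exact_log2 and rejects anything that is not a valid lane.  A minimal sketch of that round trip, using hypothetical model_* names and a stand-in for GCC's exact_log2:

```c
#include <stdint.h>

/* Stand-in for GCC's exact_log2: log2 of X if X is a power of two,
   otherwise -1.  */
static int
model_exact_log2 (unsigned x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* What the expander does to operand 3: lane index -> one-hot mask.  */
static unsigned
model_pinsrw_expand_mask (unsigned index)
{
  return 1u << index;
}

/* What the insn does: recover the lane from the mask, reject masks
   that do not name exactly one of the four V4HI lanes, then insert.
   Returns 1 on success and 0 if the mask is rejected.  */
static int
model_pinsrw (uint16_t v[4], uint16_t val, unsigned merge_mask)
{
  /* The cast makes -1 a huge value, so one comparison covers both
     "not a power of two" and "lane out of range".  */
  unsigned lane = (unsigned) model_exact_log2 (merge_mask);
  if (lane >= 4)
    return 0;
  v[lane] = val;
  return 1;
}
```

The unsigned comparison is the same trick as the insn condition, which compares (unsigned) exact_log2 (...) against GET_MODE_NUNITS (V4HImode).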

PR target/89021
* config/i386/mmx.md (mmx_pinsrw): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(*mmx_pinsrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 99208d4a4de..b9f7c89cd55 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1296,32 +1296,45 @@
 (match_operand:SI 2 "nonimmediate_operand"))
  (match_operand:V4HI 1 "register_operand")
   (match_operand:SI 3 "const_0_to_3_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   operands[2] = gen_lowpart (HImode, operands[2]);
   operands[3] = GEN_INT (1 << INTVAL (operands[3]));
 })
 
 (define_insn "*mmx_pinsrw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
 (vec_merge:V4HI
   (vec_duplicate:V4HI
-(match_operand:HI 2 "nonimmediate_operand" "rm"))
- (match_operand:V4HI 1 "register_operand" "0")
+(match_operand:HI 2 "nonimmediate_operand" "rm,rm,rm"))
+ (match_operand:V4HI 1 "register_operand" "0,0,Yv")
   (match_operand:SI 3 "const_int_operand")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ((unsigned) exact_log2 (INTVAL (operands[3]))
< GET_MODE_NUNITS (V4HImode))"
 {
   operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3])));
-  if (MEM_P (operands[2]))
-return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+{
+  if (MEM_P (operands[2]))
+   return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  else
+   return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+}
   else
-return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+{
+  if (MEM_P (operands[2]))
+   return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  else
+   return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+}
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcvt,sselog,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_pextrw"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1



[PATCH 16/40] i386: Emulate MMX mmx_pextrw with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_pextrw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pextrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 497af2d74b7..99208d4a4de 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1324,16 +1324,18 @@
(set_attr "mode" "DI")])
 
 (define_insn "mmx_pextrw"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
 (zero_extend:SI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "y")
-   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n")]]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "mmxcvt")
+   (match_operand:V4HI 1 "register_operand" "y,Yv")
+   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "%vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,sselog1")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_pshufw"
   [(match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 04/40] i386: Emulate MMX plusminus/sat_plusminus with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX plusminus/sat_plusminus with SSE.  Only SSE register source
operand is allowed.
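
For reference, the difference between the plusminus and sat_plusminus code iterators can be modelled per lane as below.  This is a sketch for byte lanes only, with hypothetical model_* names; it is not code from the patch.

```c
#include <stdint.h>

/* plusminus (paddb): the result wraps modulo 256.  */
static uint8_t
model_paddb_lane (uint8_t a, uint8_t b)
{
  return (uint8_t) (a + b);
}

/* sat_plusminus, signed (paddsb): the result clamps to [-128, 127].  */
static int8_t
model_paddsb_lane (int8_t a, int8_t b)
{
  int sum = a + b;
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* sat_plusminus, unsigned (paddusb): the result clamps to [0, 255].  */
static uint8_t
model_paddusb_lane (uint8_t a, uint8_t b)
{
  unsigned sum = (unsigned) a + b;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* sat_plusminus, unsigned (psubusb): the result clamps at zero.  */
static uint8_t
model_psubusb_lane (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}
```

The SSE forms compute the same per-lane results on the low 64 bits, so the emulation needs no fix-up step after the instruction.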

PR target/89021
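* config/i386/mmx.md (MMXMODEI8): Require TARGET_SSE2 for V1DI.
(plusminus:mmx_<plusminus_insn><mode>3): Check
TARGET_MMX_WITH_SSE.
(sat_plusminus:mmx_<plusminus_insn><mode>3): Likewise.
(<plusminus_insn><mode>3): New.
(*mmx_<plusminus_insn><mode>3): Add SSE emulation.
(*mmx_<plusminus_insn><mode>3): Likewise.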
---
 gcc/config/i386/mmx.md | 51 --
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8ae24439e8d..b6277789091 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -45,7 +45,7 @@
 
 ;; 8 byte integral modes handled by MMX (and by extension, SSE)
 (define_mode_iterator MMXMODEI [V8QI V4HI V2SI])
-(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI V1DI])
+(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")])
 
 ;; All 8-byte vector modes handled by MMX
 (define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
@@ -665,37 +665,54 @@
(plusminus:MMXMODEI8
  (match_operand:MMXMODEI8 1 "nonimmediate_operand")
  (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
-  "TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode)"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (plusminus:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*mmx_<plusminus_insn><mode>3"
-  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,Yv")
 (plusminus:MMXMODEI8
- (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode))
+ (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0,0,Yv")
+ (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_<plusminus_insn><mode>3"
   [(set (match_operand:MMXMODE12 0 "register_operand")
(sat_plusminus:MMXMODE12
  (match_operand:MMXMODE12 1 "nonimmediate_operand")
  (match_operand:MMXMODE12 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*mmx_<plusminus_insn><mode>3"
-  [(set (match_operand:MMXMODE12 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE12 0 "register_operand" "=y,x,Yv")
 (sat_plusminus:MMXMODE12
- (match_operand:MMXMODE12 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODE12 1 "nonimmediate_operand" "0,0,Yv")
+ (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "@
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_mulv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 31/40] i386: Emulate MMX pshufb with SSE version

2019-02-14 Thread H.J. Lu
Emulate MMX version of pshufb with SSE version by masking out bit 3 of
each shuffle control byte.  Only SSE register source operand is allowed.
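
The masking matters because the MMX form indexes with the low 3 bits of each control byte while the SSE form uses the low 4 bits, so a control byte such as 0x0f would otherwise select from the upper, undefined half of the SSE register.  A scalar sketch of both forms and of the emulation follows; the model_* names are hypothetical and the code is illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* MMX pshufb: 8 lanes.  Bit 7 of the control byte zeroes the lane,
   otherwise the low 3 bits select a source byte.  */
static void
model_pshufb_mmx (const uint8_t src[8], const uint8_t ctl[8],
		  uint8_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 7];
}

/* SSE pshufb: 16 lanes.  Same rule, but the low 4 bits select.  */
static void
model_pshufb_sse (const uint8_t src[16], const uint8_t ctl[16],
		  uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 15];
}

/* Emulation: clear bit 3 of every control byte (AND with 0xf7), run
   the SSE form, and keep the low 8 result bytes.  */
static void
model_pshufb_mmx_via_sse (const uint8_t src[8], const uint8_t ctl[8],
			  uint8_t out[8])
{
  uint8_t wide_src[16], wide_ctl[16], wide_out[16];

  /* Fill the upper halves with junk to show that it cannot leak into
     the low 8 result bytes.  */
  memset (wide_src, 0xaa, sizeof wide_src);
  memset (wide_ctl, 0x0f, sizeof wide_ctl);
  memcpy (wide_src, src, 8);
  memcpy (wide_ctl, ctl, 8);

  for (int i = 0; i < 16; i++)
    wide_ctl[i] &= 0xf7;

  model_pshufb_sse (wide_src, wide_ctl, wide_out);
  memcpy (out, wide_out, 8);
}
```

The 0xf7 constant keeps bit 7, so the zeroing behavior is preserved, and it matches the 0xf7f7f7f7 V4SI constant built in the splitter.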

PR target/89021
* config/i386/sse.md (ssse3_pshufbv8qi3): Renamed to ...
(ssse3_pshufbv8qi3_mmx): This.
(ssse3_pshufbv8qi3): New.
(ssse3_pshufbv8qi3_sse): Likewise.
---
 gcc/config/i386/sse.md | 56 --
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index cc7dbe79fa7..a92505c54a1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15722,18 +15722,70 @@
(set_attr "btver2_decode" "vector")
(set_attr "mode" "")])
 
-(define_insn "ssse3_pshufbv8qi3"
+(define_expand "ssse3_pshufbv8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand")
+ (match_operand:V8QI 2 "nonimmediate_operand")]
+UNSPEC_PSHUFB))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  rtx op2 = force_reg (V8QImode, operands[2]);
+  emit_insn (gen_ssse3_pshufbv8qi3_sse (operands[0], operands[1],
+   op2));
+  DONE;
+}
+})
+
+(define_insn "ssse3_pshufbv8qi3_mmx"
   [(set (match_operand:V8QI 0 "register_operand" "=y")
(unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
  (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
 UNSPEC_PSHUFB))]
-  "TARGET_SSSE3"
+  "TARGET_SSSE3 && !TARGET_MMX_WITH_SSE"
   "pshufb\t{%2, %0|%0, %2}";
   [(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
(set_attr "mode" "DI")])
 
+(define_insn_and_split "ssse3_pshufbv8qi3_sse"
+  [(set (match_operand:V8QI 0 "register_operand" "=x,Yv")
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,Yv")
+ (match_operand:V8QI 2 "register_operand" "x,Yv")]
+UNSPEC_PSHUFB))
+   (clobber (match_scratch:V4SI 3 "=x,Yv"))]
+  "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
+  "#"
+  "reload_completed"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 3)
+   (and:V4SI (match_dup 3) (match_dup 2)))
+   (set (match_dup 0)
+   (unspec:V16QI [(match_dup 1) (match_dup 4)] UNSPEC_PSHUFB))]
+{
+  /* Emulate MMX version of pshufb with SSE version by masking out the
+ bit 3 of the shuffle control byte.  */
+  operands[0] = lowpart_subreg (V16QImode, operands[0],
+   GET_MODE (operands[0]));
+  operands[1] = lowpart_subreg (V16QImode, operands[1],
+   GET_MODE (operands[1]));
+  operands[2] = lowpart_subreg (V4SImode, operands[2],
+   GET_MODE (operands[2]));
+  operands[4] = lowpart_subreg (V16QImode, operands[3],
+   GET_MODE (operands[3]));
+  rtvec par = gen_rtvec (4, GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7));
+  rtx vec_const = gen_rtx_CONST_VECTOR (V4SImode, par);
+  operands[5] = force_const_mem (V4SImode, vec_const);
+}
+  [(set_attr "mmx_isa" "x64_noavx,x64_avx")
+   (set_attr "type" "sselog1")
+   (set_attr "mode" "TI,TI")])
+
 (define_insn "_psign3"
   [(set (match_operand:VI124_AVX2 0 "register_operand" "=x,x")
(unspec:VI124_AVX2
-- 
2.20.1



[PATCH 06/40] i386: Emulate MMX smulv4hi3_highpart with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX smulv4hi3_highpart with SSE.  Only SSE register source
operand is allowed.
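
The highpart pattern (truncate (lshiftrt (mult (sign_extend ...) (sign_extend ...)) 16)) can be read as the scalar model below.  This is a sketch only; model_pmulhw is a hypothetical name, and the final cast relies on the usual modular narrowing that GCC documents for signed conversions.

```c
#include <stdint.h>

/* Scalar model of pmulhw on a V4HI operand: widen each signed 16-bit
   lane to 32 bits, multiply, and keep bits 16..31 of the product.  */
static void
model_pmulhw (const int16_t a[4], const int16_t b[4], int16_t out[4])
{
  for (int i = 0; i < 4; i++)
    {
      int32_t prod = (int32_t) a[i] * (int32_t) b[i];
      /* Shift the unsigned bit pattern and keep 16 bits, mirroring
	 lshiftrt followed by truncate in the RTL.  */
      out[i] = (int16_t) (uint16_t) ((uint32_t) prod >> 16);
    }
}
```

Since each lane is computed independently, running the 128-bit pmulhw on a value held in the low half of an SSE register yields the same low 64 bits.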

PR target/89021
* config/i386/mmx.md (mmx_smulv4hi3_highpart): Also allow
TARGET_MMX_WITH_SSE.
(*mmx_smulv4hi3_highpart): Also allow TARGET_MMX_WITH_SSE. Add
SSE support.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8ec7632912b..58054b7e0c7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -752,23 +752,28 @@
  (sign_extend:V4SI
(match_operand:V4HI 2 "nonimmediate_operand")))
(const_int 16]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_smulv4hi3_highpart"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
(const_int 16]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmulhw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmulhw\t{%2, %0|%0, %2}
+   pmulhw\t{%2, %0|%0, %2}
+   vpmulhw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_umulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 01/40] i386: Allow MMX register modes in SSE registers

2019-02-14 Thread H.J. Lu
In 64-bit mode, SSE2 can be used to emulate MMX instructions without
3DNOW.  We can use SSE2 to support MMX register modes.

PR target/89021
* config/i386/i386.c (ix86_set_reg_reg_cost): Add support for
TARGET_MMX_WITH_SSE with VALID_MMX_REG_MODE.
(ix86_vector_mode_supported_p): Likewise.
* config/i386/i386.h (TARGET_MMX_WITH_SSE): New.
(TARGET_MMX_WITH_SSE_P): Likewise.
---
 gcc/config/i386/i386.c | 5 +++--
 gcc/config/i386/i386.h | 5 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4efb6ae0e44..83d3117f46d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40495,7 +40495,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
  || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
  || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
  || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
- || (TARGET_MMX && VALID_MMX_REG_MODE (mode)))
+ || ((TARGET_MMX || TARGET_MMX_WITH_SSE)
+ && VALID_MMX_REG_MODE (mode)))
units = GET_MODE_SIZE (mode);
 }
 
@@ -44321,7 +44322,7 @@ ix86_vector_mode_supported_p (machine_mode mode)
 return true;
   if (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
 return true;
-  if (TARGET_MMX && VALID_MMX_REG_MODE (mode))
+  if ((TARGET_MMX ||TARGET_MMX_WITH_SSE) && VALID_MMX_REG_MODE (mode))
 return true;
   if (TARGET_3DNOW && VALID_MMX_REG_MODE_3DNOW (mode))
 return true;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 83b025e0cf5..db814d9ed17 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -201,6 +201,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_16BIT   TARGET_CODE16
 #define TARGET_16BIT_P(x)  TARGET_CODE16_P(x)
 
+#define TARGET_MMX_WITH_SSE \
+  (TARGET_64BIT && TARGET_SSE2)
+#define TARGET_MMX_WITH_SSE_P(x) \
+  (TARGET_64BIT_P (x) && TARGET_SSE2_P (x))
+
 #include "config/vxworks-dummy.h"
 
 #include "config/i386/i386-opts.h"
-- 
2.20.1



[PATCH 20/40] i386: Emulate MMX mmx_umulv4hi3_highpart with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_umulv4hi3_highpart with SSE.  Only SSE register source
operand is allowed.
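
The pattern touched by this patch computes, per 16-bit lane, the upper half of an unsigned 32-bit product. A minimal scalar sketch in C of that lane semantics is given here for orientation; the function names are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one V4HI lane of the pattern:
   truncate (lshiftrt (mult (zero_extend a) (zero_extend b)) 16).
   The casts widen before multiplying so the product cannot overflow.  */
uint16_t
umul_highpart_lane (uint16_t a, uint16_t b)
{
  uint32_t wide = (uint32_t) a * (uint32_t) b;
  return (uint16_t) (wide >> 16);
}

/* Apply the lane model to all four 16-bit elements of a V4HI value.  */
void
umulv4hi3_highpart_model (uint16_t dst[4], const uint16_t a[4],
                          const uint16_t b[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = umul_highpart_lane (a[i], b[i]);
}

/* Small self-check used as a usage example.  */
void
umul_highpart_check (void)
{
  assert (umul_highpart_lane (0xffff, 0xffff) == 0xfffe);
  assert (umul_highpart_lane (0x8000, 2) == 1);
}
```

Because each lane is independent, the same result is obtained whether the four lanes sit in an MMX register or in the low 64 bits of an XMM register, which is what makes the SSE alternatives a drop-in replacement.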

PR target/89021
* config/i386/mmx.md (mmx_umulv4hi3_highpart): Also check
TARGET_MMX and TARGET_MMX_WITH_SSE.
(*mmx_umulv4hi3_highpart): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 9ff0db9c2ed..1fdd09242af 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -785,24 +785,30 @@
  (zero_extend:V4SI
(match_operand:V4HI 2 "nonimmediate_operand")))
(const_int 16))))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_umulv4hi3_highpart"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(mult:V4SI
  (zero_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (zero_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_int 16))))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmulhuw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "@
+   pmulhuw\t{%2, %0|%0, %2}
+   pmulhuw\t{%2, %0|%0, %2}
+   vpmulhuw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmaddwd"
   [(set (match_operand:V2SI 0 "register_operand")
-- 
2.20.1



[PATCH 15/40] i386: Emulate MMX sse_cvtpi2ps with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX sse_cvtpi2ps with SSE2 cvtdq2ps, preserving upper 64 bits of
destination XMM register.  Only SSE register source operand is allowed.
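
The two steps described above (convert with cvtdq2ps, then merge so that the upper 64 bits of the destination survive) can be modelled lane by lane in plain C. This is only a sketch: the `v4sf` and `v4si` struct types and the function names are hypothetical stand-ins for XMM register contents, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of an XMM register viewed as 4 floats or 4 ints.  */
typedef struct { float f[4]; } v4sf;
typedef struct { int32_t i[4]; } v4si;

/* Step 1: cvtdq2ps converts all four 32-bit integer lanes.  When the
   register carries a V2SI value, lanes 2 and 3 hold don't-care data.  */
v4sf
cvtdq2ps_model (v4si src)
{
  v4sf r;
  for (int k = 0; k < 4; k++)
    r.f[k] = (float) src.i[k];
  return r;
}

/* Step 2: keep lanes 2 and 3 of DEST and take lanes 0 and 1 from the
   converted temporary, matching vec_merge with mask (const_int 3).  */
v4sf
cvtpi2ps_emulated (v4sf dest, v4si src)
{
  v4sf tmp = cvtdq2ps_model (src);
  v4sf res = dest;
  res.f[0] = tmp.f[0];
  res.f[1] = tmp.f[1];
  return res;
}

/* Small self-check used as a usage example.  */
void
cvtpi2ps_check (void)
{
  v4sf d = {{1.0f, 2.0f, 3.0f, 4.0f}};
  v4si s = {{10, -20, 999, 999}};
  v4sf r = cvtpi2ps_emulated (d, s);
  assert (r.f[0] == 10.0f && r.f[1] == -20.0f);
  assert (r.f[2] == 3.0f && r.f[3] == 4.0f);
}
```

The don't-care values in the upper source lanes never reach the result, which is why the split needs a scratch register rather than converting straight into the destination.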

PR target/89021
* config/i386/mmx.md (sse_cvtpi2ps): Renamed to ...
(*mmx_cvtpi2ps): This.  Disabled for TARGET_MMX_WITH_SSE.
(sse_cvtpi2ps): New.
(mmx_cvtpi2ps_sse): Likewise.
---
 gcc/config/i386/sse.md | 77 --
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 083f9ef0f44..b1bab15af41 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4561,14 +4561,87 @@
 ;;
 ;
 
-(define_insn "sse_cvtpi2ps"
+(define_expand "sse_cvtpi2ps"
+  [(set (match_operand:V4SF 0 "register_operand")
+   (vec_merge:V4SF
+ (vec_duplicate:V4SF
+   (float:V2SF (match_operand:V2SI 2 "nonimmediate_operand")))
+ (match_operand:V4SF 1 "register_operand")
+ (const_int 3)))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  rtx op2 = force_reg (V2SImode, operands[2]);
+  emit_insn (gen_mmx_cvtpi2ps_sse (operands[0], operands[1], op2));
+  DONE;
+}
+})
+
+(define_insn_and_split "mmx_cvtpi2ps_sse"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,Yv")
+   (vec_merge:V4SF
+ (vec_duplicate:V4SF
+   (float:V2SF (match_operand:V2SI 2 "register_operand" "x,Yv")))
+ (match_operand:V4SF 1 "register_operand" "0,Yv")
+ (const_int 3)))
+   (clobber (match_scratch:V4SF 3 "=x,Yv"))]
+  "TARGET_MMX_WITH_SSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op2 = lowpart_subreg (V4SImode, operands[2],
+   GET_MODE (operands[2]));
+  /* Generate SSE2 cvtdq2ps.  */
+  rtx insn = gen_floatv4siv4sf2 (operands[3], op2);
+  emit_insn (insn);
+
+  /* Merge operands[3] with operands[0].  */
+  rtx mask, op1;
+  if (TARGET_AVX)
+{
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (1),
+ GEN_INT (6), GEN_INT (7)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[3], operands[1]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+}
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP3 to OP0.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (4), GEN_INT (5)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[0], operands[3]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+  emit_insn (insn);
+
+  /* Swap bits 0:63 with bits 64:127.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (1)));
+  rtx dest = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+}
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "*mmx_cvtpi2ps"
   [(set (match_operand:V4SF 0 "register_operand" "=x")
(vec_merge:V4SF
  (vec_duplicate:V4SF
(float:V2SF (match_operand:V2SI 2 "nonimmediate_operand" "ym")))
  (match_operand:V4SF 1 "register_operand" "0")
  (const_int 3)))]
-  "TARGET_SSE"
+  "TARGET_SSE && !TARGET_MMX_WITH_SSE"
   "cvtpi2ps\t{%2, %0|%0, %2}"
   [(set_attr "type" "ssecvt")
(set_attr "mode" "V4SF")])
-- 
2.20.1



[PATCH 05/40] i386: Emulate MMX mulv4hi3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mulv4hi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mulv4hi3): New.
(*mmx_mulv4hi3): Also allow TARGET_MMX_WITH_SSE.  Add SSE
support.
---
 gcc/config/i386/mmx.md | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b6277789091..8ec7632912b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -721,14 +721,26 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
+(define_expand "mulv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand")
+  (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
+
 (define_insn "*mmx_mulv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
-(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0")
-  (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmullw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
+  (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmullw\t{%2, %0|%0, %2}
+   pmullw\t{%2, %0|%0, %2}
+   vpmullw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_smulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 03/40] i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX

2019-02-14 Thread H.J. Lu
Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX.  For MMX punpckhXX,
move bits 64:127 to bits 0:63 in SSE register.  Only SSE register source
operand is allowed.
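
As a rough illustration of the punpckhXX case, the following C sketch models the byte variant: an SSE punpcklbw over the low 8 bytes of both inputs produces all 16 interleaved bytes, and the MMX punpckhbw result is the upper half of that, moved down to bits 0:63. The function names and the 16-byte array representation of an XMM register are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of SSE punpcklbw: interleave the low 8 bytes of A and B
   into a 16-byte result.  */
void
sse_punpcklbw_model (uint8_t dst[16], const uint8_t a[16],
                     const uint8_t b[16])
{
  uint8_t tmp[16];
  for (int i = 0; i < 8; i++)
    {
      tmp[2 * i] = a[i];
      tmp[2 * i + 1] = b[i];
    }
  memcpy (dst, tmp, sizeof tmp);
}

/* Model of the emulated MMX punpckhbw: the MMX operands live in the
   low 8 bytes of A and B.  Interleave them, then move bits 64:127 of
   the result to bits 0:63.  */
void
mmx_punpckhbw_emulated (uint8_t dst[16], const uint8_t a[16],
                        const uint8_t b[16])
{
  sse_punpcklbw_model (dst, a, b);
  memcpy (dst, dst + 8, 8);	/* The two halves do not overlap.  */
}

/* Small self-check used as a usage example.  */
void
punpckhbw_check (void)
{
  uint8_t a[16] = {0, 1, 2, 3, 4, 5, 6, 7};
  uint8_t b[16] = {10, 11, 12, 13, 14, 15, 16, 17};
  uint8_t d[16];
  mmx_punpckhbw_emulated (d, a, b);
  assert (d[0] == 4 && d[1] == 14);
  assert (d[6] == 7 && d[7] == 17);
}
```

The low-part variants need only the first step, since the wanted bytes are already in bits 0:63.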

PR target/89021
* config/i386/i386-protos.h (ix86_split_mmx_punpck): New
prototype.
* config/i386/i386.c (ix86_split_mmx_punpck): New function.
* config/i386/mmx.md (mmx_punpckhbw): Changed to
define_insn_and_split to support SSE emulation.
(mmx_punpcklbw): Likewise.
(mmx_punpckhwd): Likewise.
(mmx_punpcklwd): Likewise.
(mmx_punpckhdq): Likewise.
(mmx_punpckldq): Likewise.
---
 gcc/config/i386/i386-protos.h |   1 +
 gcc/config/i386/i386.c|  77 +++
 gcc/config/i386/mmx.md| 138 ++
 3 files changed, 168 insertions(+), 48 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a53b48438ec..37581837a32 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -204,6 +204,7 @@ extern rtx ix86_split_stack_guard (void);
 
 extern void ix86_move_vector_high_sse_to_mmx (rtx);
 extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+extern void ix86_split_mmx_punpck (rtx[], bool);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c6325224c9d..dce4038685e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20280,6 +20280,83 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   ix86_move_vector_high_sse_to_mmx (op0);
 }
 
+/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
+
+void
+ix86_split_mmx_punpck (rtx operands[], bool high_p)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  machine_mode mode = GET_MODE (op0);
+  rtx mask;
+  /* The corresponding SSE mode.  */
+  machine_mode sse_mode, double_sse_mode;
+
+  switch (mode)
+{
+case E_V8QImode:
+  sse_mode = V16QImode;
+  double_sse_mode = V32QImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (16,
+ GEN_INT (0), GEN_INT (16),
+ GEN_INT (1), GEN_INT (17),
+ GEN_INT (2), GEN_INT (18),
+ GEN_INT (3), GEN_INT (19),
+ GEN_INT (4), GEN_INT (20),
+ GEN_INT (5), GEN_INT (21),
+ GEN_INT (6), GEN_INT (22),
+ GEN_INT (7), GEN_INT (23)));
+  break;
+
+case E_V4HImode:
+  sse_mode = V8HImode;
+  double_sse_mode = V16HImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0), GEN_INT (8),
+ GEN_INT (1), GEN_INT (9),
+ GEN_INT (2), GEN_INT (10),
+ GEN_INT (3), GEN_INT (11)));
+  break;
+
+case E_V2SImode:
+  sse_mode = V4SImode;
+  double_sse_mode = V8SImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4,
+ GEN_INT (0), GEN_INT (4),
+ GEN_INT (1), GEN_INT (5)));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  /* Generate SSE punpcklXX.  */
+  rtx dest = lowpart_subreg (sse_mode, op0, GET_MODE (op0));
+  op1 = lowpart_subreg (sse_mode, op1, GET_MODE (op1));
+  op2 = lowpart_subreg (sse_mode, op2, GET_MODE (op2));
+
+  op1 = gen_rtx_VEC_CONCAT (double_sse_mode, op1, op2);
+  op2 = gen_rtx_VEC_SELECT (sse_mode, op1, mask);
+  rtx insn = gen_rtx_SET (dest, op2);
+  emit_insn (insn);
+
+  if (high_p)
+{
+  /* Move bits 64:127 to bits 0:63.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (0)));
+  dest = lowpart_subreg (V4SImode, dest, GET_MODE (dest));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+  emit_insn (insn);
+}
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index ca9cf20f8e3..8ae24439e8d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1064,87 +1064,129 @@
(set_attr "type" "mmxshft,sselog,sselog")
(set_attr "mode" "DI,TI,TI")])
 
-(define_insn "mmx_punpckhbw"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+(define_insn_and_split "mmx_punpckhbw"
+  [(set (match_operand:V8QI 0 "register_opera

[PATCH 11/40] i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_eq<mode>3): Also allow
TARGET_MMX_WITH_SSE.
(*mmx_eq<mode>3): Also allow TARGET_MMX_WITH_SSE.  Add SSE
support.
(mmx_gt<mode>3): Likewise.
---
 gcc/config/i386/mmx.md | 39 ---
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2a9972e79d9..132ce7af802 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1033,28 +1033,37 @@
 (eq:MMXMODEI
  (match_operand:MMXMODEI 1 "nonimmediate_operand")
  (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (EQ, <MODE>mode, operands);")
 
 (define_insn "*mmx_eq<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (eq:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
-  "pcmpeq<mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
+  "@
+   pcmpeq<mmxvecsize>\t{%2, %0|%0, %2}
+   pcmpeq<mmxvecsize>\t{%2, %0|%0, %2}
+   vpcmpeq<mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_gt<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (gt:MMXMODEI
- (match_operand:MMXMODEI 1 "register_operand" "0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pcmpgt<mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "register_operand" "0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   pcmpgt<mmxvecsize>\t{%2, %0|%0, %2}
+   pcmpgt<mmxvecsize>\t{%2, %0|%0, %2}
+   vpcmpgt<mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 00/40] V5: Emulate MMX intrinsics with SSE

2019-02-14 Thread H.J. Lu
On x86-64, since __m64 is returned and passed in XMM registers, we can
emulate MMX intrinsics with SSE instructions. To support it, we added

 #define TARGET_MMX_WITH_SSE \
  (TARGET_64BIT && TARGET_SSE2 && !TARGET_3DNOW)

SSE emulation is disabled for 3DNOW since 3DNOW patterns haven't been
updated with SSE emulation.

;; Define instruction set of MMX instructions
(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
  (const_string "base"))

 (eq_attr "mmx_isa" "native")
   (symbol_ref "!TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64")
   (symbol_ref "TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64_avx")
   (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
 (eq_attr "mmx_isa" "x64_noavx")
   (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")

We added SSE emulation to MMX patterns and disabled MMX alternatives with
TARGET_MMX_WITH_SSE.

Most MMX instructions have equivalent SSE versions, and the results of some
SSE versions need to be reshuffled into the right order for MMX.  There are
a couple of tricky cases:

1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand and handle unmapped bits 64:127 at memory address by
adjusting source and mask operands together with memory address.

2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.

3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear bit 3 (the fourth index bit) in each byte of the
shuffle control mask.

4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of destination XMM register.
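
The pshufb case can be illustrated with a scalar C model. This is a sketch under stated assumptions: the function names are hypothetical, XMM registers are represented as 16-byte arrays, and the 0xf7 mask is one way to express "clear the fourth index bit of each control byte"; it is not taken from the patch text shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of SSE pshufb: bit 7 of a control byte zeroes the lane,
   otherwise the low 4 bits select one of 16 source bytes.  */
void
sse_pshufb_model (uint8_t dst[16], const uint8_t src[16],
                  const uint8_t ctl[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
}

/* Model of MMX pshufb: same rule, but only the low 3 bits index
   into 8 source bytes.  */
void
mmx_pshufb_model (uint8_t dst[8], const uint8_t src[8],
                  const uint8_t ctl[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x07];
}

/* Emulation: widen to 16 bytes, clear index bit 3 in every control
   byte so the SSE form can never select bytes 8..15, then keep the
   low 8 bytes of the result.  */
void
mmx_pshufb_emulated (uint8_t dst[8], const uint8_t src[8],
                     const uint8_t ctl[8])
{
  uint8_t wsrc[16], wctl[16] = {0}, wdst[16];
  memcpy (wsrc, src, 8);
  memset (wsrc + 8, 0xee, 8);	/* Don't-care upper half.  */
  for (int i = 0; i < 8; i++)
    wctl[i] = ctl[i] & 0xf7;
  sse_pshufb_model (wdst, wsrc, wctl);
  memcpy (dst, wdst, 8);
}

/* Small self-check: the emulation agrees with the MMX model.  */
void
pshufb_check (void)
{
  const uint8_t src[8] = {10, 11, 12, 13, 14, 15, 16, 17};
  const uint8_t ctl[8] = {0x0f, 0x08, 0x80, 0x07, 0, 1, 2, 0x8f};
  uint8_t want[8], got[8];
  mmx_pshufb_model (want, src, ctl);
  mmx_pshufb_emulated (got, src, ctl);
  assert (memcmp (want, got, 8) == 0);
}
```

Without the masking step, a control byte such as 0x0f would pick up a byte from the unrelated upper half of the XMM register instead of source byte 7.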

Tests are also added to check each SSE emulation of MMX intrinsics.

There are no regressions on i686 and x86-64.  For x86-64, GCC is also
tested with

--with-arch=native --with-cpu=native

on AVX2 and AVX512F machines.

H.J. Lu (40):
  i386: Allow MMX register modes in SSE registers
  i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
  i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
  i386: Emulate MMX plusminus/sat_plusminus with SSE
  i386: Emulate MMX mulv4hi3 with SSE
  i386: Emulate MMX smulv4hi3_highpart with SSE
  i386: Emulate MMX mmx_pmaddwd with SSE
  i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
  i386: Emulate MMX <any_logic><mode>3 with SSE
  i386: Emulate MMX mmx_andnot<mode>3 with SSE
  i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
  i386: Emulate MMX vec_dupv2si with SSE
  i386: Emulate MMX pshufw with SSE
  i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
  i386: Emulate MMX sse_cvtpi2ps with SSE
  i386: Emulate MMX mmx_pextrw with SSE
  i386: Emulate MMX mmx_pinsrw with SSE
  i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
  i386: Emulate MMX mmx_pmovmskb with SSE
  i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
  i386: Emulate MMX maskmovq with SSE2 maskmovdqu
  i386: Emulate MMX mmx_uavgv8qi3 with SSE
  i386: Emulate MMX mmx_uavgv4hi3 with SSE
  i386: Emulate MMX mmx_psadbw with SSE
  i386: Emulate MMX movntq with SSE2 movntidi
  i386: Emulate MMX umulv1siv1di3 with SSE2
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
  i386: Emulate MMX ssse3_pmaddubsw with SSE
  i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
  i386: Emulate MMX pshufb with SSE version
  i386: Emulate MMX ssse3_psign<mode>3 with SSE
  i386: Emulate MMX ssse3_palignrdi with SSE
  i386: Emulate MMX abs<mode>2 with SSE
  i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
  i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
  i386: Allow MMX intrinsic emulation with SSE
  i386: Enable TM MMX intrinsics with SSE2
  i386: Add tests for MMX intrinsic emulations with SSE
  i386: Also enable SSSE3 __m64 tests in 64-bit mode

 gcc/config/i386/constraints.md|   6 +
 gcc/config/i386/i386-builtin.def  | 126 +--
 gcc/config/i386/i386-c.c  |   2 +
 gcc/config/i386/i386-protos.h |   4 +
 gcc/config/i386/i386.c| 206 +++-
 gcc/config/i386/i386.h|   5 +
 gcc/config/i386/i386.md   |  13 +
 gcc/config/i386/mmintrin.h|  10 +-
 gcc/config/i386/mmx.md| 915 --
 gcc/config/i386/sse.md| 360 +--
 gcc/config/i386/xmmintrin.h   |  61 ++
 gcc/testsuite/gcc.target/i386/mmx-vals.h  |  77 ++
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c   |  42 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c   |  39 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c   |  41 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c   |  30 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c   |  35 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c   |  39 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c

[PATCH 25/40] i386: Emulate MMX movntq with SSE2 movntidi

2019-02-14 Thread H.J. Lu
Emulate MMX movntq with SSE2 movntidi.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (sse_movntq): Add SSE2 emulation.
---
 gcc/config/i386/mmx.md | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 0c08aebb071..274e895f51e 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -214,12 +214,16 @@
 })
 
 (define_insn "sse_movntq"
-  [(set (match_operand:DI 0 "memory_operand" "=m")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "y")]
+  [(set (match_operand:DI 0 "memory_operand" "=m,m")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "y,r")]
   UNSPEC_MOVNTQ))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "movntq\t{%1, %0|%0, %1}"
-  [(set_attr "type" "mmxmov")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   movntq\t{%1, %0|%0, %1}
+   movnti\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxmov,ssemov")
(set_attr "mode" "DI")])
 
 ;
-- 
2.20.1



[PATCH 08/40] i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE.  Only SSE register
source operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_ashr<mode>3): Changed to define_expand.
Disallow TARGET_MMX_WITH_SSE.
(mmx_<shift_insn><mode>3): Likewise.
(ashr<mode>3): New.
(*ashr<mode>3): Likewise.
(<shift_insn><mode>3): Likewise.
(*<shift_insn><mode>3): Likewise.
---
 gcc/config/i386/mmx.md | 68 --
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 23c10dffc38..4738d6b428e 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -958,33 +958,69 @@
   [(set_attr "type" "mmxadd")
(set_attr "mode" "DI")])
 
-(define_insn "mmx_ashr<mode>3"
-  [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
+(define_expand "mmx_ashr<mode>3"
+  [(set (match_operand:MMXMODE24 0 "register_operand")
 (ashiftrt:MMXMODE24
- (match_operand:MMXMODE24 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "psra<mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE24 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE")
+
+(define_expand "ashr<mode>3"
+  [(set (match_operand:MMXMODE24 0 "register_operand")
+(ashiftrt:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
+
+(define_insn "*ashr<mode>3"
+  [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
+(ashiftrt:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   psra<mmxvecsize>\t{%2, %0|%0, %2}
+   psra<mmxvecsize>\t{%2, %0|%0, %2}
+   vpsra<mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
-(define_insn "mmx_<shift_insn><mode>3"
-  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
+(define_expand "mmx_<shift_insn><mode>3"
+  [(set (match_operand:MMXMODE248 0 "register_operand")
 (any_lshift:MMXMODE248
- (match_operand:MMXMODE248 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "p<vshift><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE248 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE")
+
+(define_expand "<shift_insn><mode>3"
+  [(set (match_operand:MMXMODE248 0 "register_operand")
+(any_lshift:MMXMODE248
+ (match_operand:MMXMODE248 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
+
+(define_insn "*<shift_insn><mode>3"
+  [(set (match_operand:MMXMODE248 0 "register_operand" "=y,x,Yv")
+(any_lshift:MMXMODE248
+ (match_operand:MMXMODE248 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   p<vshift><mmxvecsize>\t{%2, %0|%0, %2}
+   p<vshift><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<vshift><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 12/40] i386: Emulate MMX vec_dupv2si with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX vec_dupv2si with SSE.  Add the "Yw" constraint to allow
broadcast from integer register for AVX512BW with TARGET_AVX512VL.
Only SSE register source operand is allowed.

PR target/89021
* config/i386/constraints.md (Yw): New constraint.
* config/i386/mmx.md (*vec_dupv2si): Changed to
define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
support SSE emulation.
---
 gcc/config/i386/constraints.md |  6 ++
 gcc/config/i386/mmx.md | 24 +---
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 16075b4acf3..c546b20d9dc 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -110,6 +110,8 @@
 ;;  v  any EVEX encodable SSE register for AVX512VL target,
 ;; otherwise any SSE register
 ;;  h  EVEX encodable SSE register with number factor of four
+;;  w  any EVEX encodable SSE register for AVX512BW with TARGET_AVX512VL
+;; target.
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -146,6 +148,10 @@
  "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
  "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.")
 
+(define_register_constraint "Yw"
+ "TARGET_AVX512BW && TARGET_AVX512VL ? ALL_SSE_REGS : NO_REGS"
+ "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512BW with TARGET_AVX512VL target.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  f  FLAGS_REG
 ;;  g  GOT memory operand.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 132ce7af802..441a08d22b7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1395,14 +1395,24 @@
(set_attr "length_immediate" "1")
(set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv2si"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2si"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv,Yw")
(vec_duplicate:V2SI
- (match_operand:SI 1 "register_operand" "0")))]
-  "TARGET_MMX"
-  "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+ (match_operand:SI 1 "register_operand" "0,0,Yv,r")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   punpckldq\t%0, %0
+   #
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (vec_duplicate:V4SI (match_dup 1)))]
+  "operands[0] = lowpart_subreg (V4SImode, operands[0],
+GET_MODE (operands[0]));"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx,x64_avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI,TI")])
 
 (define_insn "*mmx_concatv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,y")
-- 
2.20.1



[PATCH 40/40] i386: Also enable SSSE3 __m64 tests in 64-bit mode

2019-02-14 Thread H.J. Lu
Since we now emulate MMX intrinsics with SSE in 64-bit mode, we can
enable SSSE3 __m64 tests even when AVX is enabled.

PR target/89021
* gcc.target/i386/ssse3-pabsb.c: Also enable __m64 check in
64-bit mode.
* gcc.target/i386/ssse3-pabsd.c: Likewise.
* gcc.target/i386/ssse3-pabsw.c: Likewise.
* gcc.target/i386/ssse3-palignr.c: Likewise.
* gcc.target/i386/ssse3-phaddd.c: Likewise.
* gcc.target/i386/ssse3-phaddsw.c: Likewise.
* gcc.target/i386/ssse3-phaddw.c: Likewise.
* gcc.target/i386/ssse3-phsubd.c: Likewise.
* gcc.target/i386/ssse3-phsubsw.c: Likewise.
* gcc.target/i386/ssse3-phsubw.c: Likewise.
* gcc.target/i386/ssse3-pmaddubsw.c: Likewise.
* gcc.target/i386/ssse3-pmulhrsw.c: Likewise.
* gcc.target/i386/ssse3-pshufb.c: Likewise.
* gcc.target/i386/ssse3-psignb.c: Likewise.
* gcc.target/i386/ssse3-psignd.c: Likewise.
* gcc.target/i386/ssse3-psignw.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/ssse3-pabsb.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pabsd.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pabsw.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-palignr.c   | 6 +++---
 gcc/testsuite/gcc.target/i386/ssse3-phaddd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phaddsw.c   | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phaddw.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubsw.c   | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubw.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pmaddubsw.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pmulhrsw.c  | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pshufb.c| 6 +++---
 gcc/testsuite/gcc.target/i386/ssse3-psignb.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-psignd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-psignw.c| 4 ++--
 16 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c b/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
index 7caa1b6c3a6..eef4ccae222 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
@@ -15,7 +15,7 @@
 #include "ssse3-vals.h"
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsb (int *i1, int *r)
@@ -63,7 +63,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result(&vals[i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
   /* Run the 64-bit tests */
   ssse3_test_pabsb (&vals[i + 0], &r[0]);
   ssse3_test_pabsb (&vals[i + 2], &r[2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c b/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
index 3a73cf01170..60043bad4a4 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
@@ -16,7 +16,7 @@
 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsd (int *i1, int *r)
@@ -62,7 +62,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result(&vals[i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
   /* Run the 64-bit tests */
   ssse3_test_pabsd (&vals[i + 0], &r[0]);
   ssse3_test_pabsd (&vals[i + 2], &r[2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c b/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
index 67e4721b8e6..dd0caa9783f 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
@@ -16,7 +16,7 @@
 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsw (int *i1, int *r)
@@ -64,7 +64,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result (&vals[i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
   /* Run the 64-bit tests */
   ssse3_test_pabsw (&vals[i + 0], &r[0]);
   ssse3_test_pabsw (&vals[i + 2], &r[2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-palignr.c b/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
index dbee9bee4aa..f266f7805b8 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_palignr (int *i1, int *i2, unsigned int imm, int *r)
@@ -214,7 +214,7 @@ compute_correct_result_128 (int *i1, int *i2, unsigned int imm, int *r)
   bout[i] = buf[imm + i];
 }
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 static void
 compute_correct_result_64 (int *i1, int *i2, unsigned int imm, int *r)
 {
@@ -256,7 +256,7 @@ TEST (void)
   for (i = 0; i < 256; i

[PATCH 23/40] i386: Emulate MMX mmx_uavgv4hi3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_uavgv4hi3 with SSE.  Only SSE register source operand is
allowed.
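
As a reading aid (not part of the patch), here is a scalar C sketch of
what the pattern's RTL computes for each 16-bit lane: zero-extend both
inputs, add them plus a rounding bias of 1, shift right by one and
truncate.  The function names are made up for illustration.

```c
#include <stdint.h>

/* One pavgw lane: widen to 32 bits, add with a rounding bias of 1,
   shift right by one, then truncate back to 16 bits.  */
uint16_t
avg_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a + (uint32_t) b + 1) >> 1);
}

/* Apply the lane model to all four elements of a V4HI value.  */
void
model_pavgw_v4hi (const uint16_t a[4], const uint16_t b[4], uint16_t r[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = avg_u16 (a[i], b[i]);
}
```

Each lane is independent, so running the 128-bit instruction on a
register whose low 64 bits hold the V4HI value leaves the correct
result in the low 64 bits.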

PR target/89021
* config/i386/mmx.md (mmx_uavgv4hi3): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(*mmx_uavgv4hi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b0009afc35d..e1432edcd3d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1753,27 +1753,33 @@
  (const_vector:V4SI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (PLUS, V4HImode, operands);")
 
 (define_insn "*mmx_uavgv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (plus:V4SI
(zero_extend:V4SI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
(zero_extend:V4SI
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_vector:V4SI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (PLUS, V4HImode, operands)"
-  "pavgw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "@
+   pavgw\t{%2, %0|%0, %2}
+   pavgw\t{%2, %0|%0, %2}
+   vpavgw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
   [(set (match_operand:V1DI 0 "register_operand" "=y")
-- 
2.20.1



[PATCH 30/40] i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX ssse3_pmulhrswv4hi3 with SSE.  Only SSE register source
operand is allowed.
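
For readers unfamiliar with the instruction, the RTL in this pattern
(sign-extend, multiply, shift right by 14, add 1, shift right by 1,
truncate) can be sketched per lane in plain C as below.  This is only
an illustrative model; the function name is hypothetical.

```c
#include <stdint.h>

/* One pmulhrsw lane, following the pattern's RTL step by step.  The
   shifts are done on an unsigned value to mirror lshiftrt; only the
   low 16 bits of the final value are kept.  */
int16_t
mulhrs_i16 (int16_t a, int16_t b)
{
  uint32_t prod = (uint32_t) ((int32_t) a * (int32_t) b);
  uint32_t t = ((prod >> 14) + 1) >> 1;
  return (int16_t) (uint16_t) t;
}
```

With Q15 fixed-point inputs this is a rounded multiply: 0.5 * 0.5
(0x4000 * 0x4000) gives 0.25 (0x2000).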

PR target/89021
* config/i386/sse.md (*ssse3_pmulhrswv4hi3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a7d0889f3e1..cc7dbe79fa7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15677,25 +15677,31 @@
(set_attr "mode" "")])
 
 (define_insn "*ssse3_pmulhrswv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
(const_int 14))
  (match_operand:V4HI 3 "const1_operand"))
(const_int 1]
-  "TARGET_SSSE3 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "pmulhrsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseimul")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && TARGET_SSSE3
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   pmulhrsw\t{%2, %0|%0, %2}
+   pmulhrsw\t{%2, %0|%0, %2}
+   vpmulhrsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseimul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "_pshufb3"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
-- 
2.20.1



[PATCH 24/40] i386: Emulate MMX mmx_psadbw with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_psadbw with SSE.  Only SSE register source operand is
allowed.
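
The pattern itself is an unspec, so as a reminder of what psadbw
computes on a 64-bit operand, here is a small C model (hypothetical
helper name, not code from the patch): the sum of absolute differences
of the eight unsigned bytes, returned in an otherwise zero 64-bit
value.

```c
#include <stdint.h>

/* 64-bit psadbw: sum of |a[i] - b[i]| over eight unsigned bytes.
   The sum is at most 8 * 255 = 2040, so it fits in the low 16 bits
   and the remaining bits of the result are zero.  */
uint64_t
model_psadbw_v8qi (const uint8_t a[8], const uint8_t b[8])
{
  uint64_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum += (uint64_t) (a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
  return sum;
}
```

The 128-bit form computes one such sum per 64-bit half, so the low
half of its result depends only on the low eight bytes of each source.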

PR target/89021
* config/i386/mmx.md (mmx_psadbw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e1432edcd3d..0c08aebb071 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1782,14 +1782,19 @@
(set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
-(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
+  [(set (match_operand:V1DI 0 "register_operand" "=y,x,Yv")
+(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")]
 UNSPEC_PSADBW))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "psadbw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   psadbw\t{%2, %0|%0, %2}
+   psadbw\t{%2, %0|%0, %2}
+   vpsadbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn_and_split "mmx_pmovmskb"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1



[PATCH 22/40] i386: Emulate MMX mmx_uavgv8qi3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_uavgv8qi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_uavgv8qi3): Also check TARGET_MMX
and TARGET_MMX_WITH_SSE.
(*mmx_uavgv8qi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 1fdd09242af..b0009afc35d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1698,42 +1698,47 @@
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "TARGET_SSE || TARGET_3DNOW"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (PLUS, V8QImode, operands);")
 
 (define_insn "*mmx_uavgv8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
(truncate:V8QI
  (lshiftrt:V8HI
(plus:V8HI
  (plus:V8HI
(zero_extend:V8HI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yv"))
(zero_extend:V8HI
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_vector:V8HI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "(TARGET_SSE || TARGET_3DNOW)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (PLUS, V8QImode, operands)"
 {
   /* These two instructions have the same operation, but their encoding
  is different.  Prefer the one that is de facto standard.  */
-  if (TARGET_SSE || TARGET_3DNOW_A)
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+return "vpavgb\t{%2, %1, %0|%0, %1, %2}";
+  else if (TARGET_SSE || TARGET_3DNOW_A)
 return "pavgb\t{%2, %0|%0, %2}";
   else
 return "pavgusb\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "mmxshft")
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
(set (attr "prefix_extra")
  (if_then_else
(not (ior (match_test "TARGET_SSE")
 (match_test "TARGET_3DNOW_A")))
(const_string "1")
(const_string "*")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_uavgv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 34/40] i386: Emulate MMX abs2 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX abs2 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/sse.md (abs2): Add SSE emulation.
---
 gcc/config/i386/sse.md | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1a0549c66fb..91e46fcfba4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15991,16 +15991,19 @@
 })
 
 (define_insn "abs2"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yv")
(abs:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym")))]
-  "TARGET_SSSE3"
-  "pabs\t{%1, %0|%0, %1}";
-  [(set_attr "type" "sselog1")
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pabs\t{%1, %0|%0, %1}
+   %vpabs\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "sselog1")
(set_attr "prefix_rep" "0")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 36/40] i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE

2019-02-14 Thread H.J. Lu
PR target/89021
* config/i386/i386.c (ix86_expand_vector_init_duplicate): Set
mmx_ok to true if TARGET_MMX_WITH_SSE is true.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_init): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
* config/i386/mmx.md (*vec_dupv2sf): Changed to
define_insn_and_split to support SSE emulation.
(*vec_extractv2sf_0): Likewise.
(*vec_extractv2sf_1): Likewise.
(*vec_extractv2si_0): Likewise.
(*vec_extractv2si_1): Likewise.
(*vec_extractv2si_zext_mem): Likewise.
(vec_setv2sf): Also allow TARGET_MMX_WITH_SSE.
(vec_extractv2sf_1 splitter): Likewise.
(vec_extractv2sfsf): Likewise.
(vec_setv2si): Likewise.
(vec_extractv2si_1 splitter): Likewise.
(vec_extractv2sisi): Likewise.
(vec_setv4hi): Likewise.
(vec_extractv4hihi): Likewise.
(vec_setv8qi): Likewise.
(vec_extractv8qiqi): Likewise.
---
 gcc/config/i386/i386.c |  8 +
 gcc/config/i386/mmx.md | 69 +++---
 2 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dce4038685e..a9abbe8706b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42625,6 +42625,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 {
   bool ok;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
@@ -42784,6 +42785,7 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
   bool use_vector_set = false;
   rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DImode:
@@ -42977,6 +42979,7 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
   XVECEXP (const_vec, 0, one_var) = CONST0_RTX (GET_MODE_INNER (mode));
   const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (const_vec, 0));
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DFmode:
@@ -43362,6 +43365,7 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
   machine_mode quarter_mode = VOIDmode;
   int n, i;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -43561,6 +43565,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
   int i;
   rtx x;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
+
   /* Handle first initialization from vector elts.  */
   if (n_elts != XVECLEN (vals, 0))
 {
@@ -43660,6 +43666,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
   machine_mode mmode = VOIDmode;
   rtx (*gen_blendm) (rtx, rtx, rtx, rtx);
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -44015,6 +44022,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
   bool use_vec_extr = false;
   rtx tmp;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 81ee6250051..867d87ce644 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -555,14 +555,23 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
-(define_insn "*vec_dupv2sf"
-  [(set (match_operand:V2SF 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2sf"
+  [(set (match_operand:V2SF 0 "register_operand" "=y,x,Yv")
(vec_duplicate:V2SF
- (match_operand:SF 1 "register_operand" "0")))]
-  "TARGET_MMX"
-  "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+ (match_operand:SF 1 "register_operand" "0,0,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   punpckldq\t%0, %0
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (vec_duplicate:V4SF (match_dup 1)))]
+  "operands[0] = lowpart_subreg (V4SFmode, operands[0],
+GET_MODE (operands[0]));"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*mmx_concatv2sf"
   [(set (match_operand:V2SF 0 "register_operand" "=y,y")
@@ -580,7 +589,7 @@
   [(match_operand:V2SF 0 "register_operand")
(match_operand:SF 1 "register_operand")
(match_operand 2 "const_int_operand")]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_set (false, operands[0], operands[1],
  INTVAL (operands[2]));
@@ -594,11 +603,13 @@
(vec_select:SF
  (match_operand:V2SF 1 "nonimmediate_operand" " xm,x,ym,y,m,m")
  (parallel [(const_int 0)])))]
-  "TARGET_MMX && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && !(MEM_

[PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-14 Thread H.J. Lu
Allow MMX intrinsic emulation with SSE/SSE2/SSSE3.  Don't enable MMX ISA
by default with TARGET_MMX_WITH_SSE.

For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 64-bit
mode since MMX intrinsics can be emulated with SSE.

gcc/

PR target/89021
* config/i386/i386-builtin.def: Enable MMX intrinsics with
SSE/SSE2/SSSE3.
* config/i386/i386.c (ix86_option_override_internal): Don't
enable MMX ISA with TARGET_MMX_WITH_SSE by default.
(ix86_init_mmx_sse_builtins): Enable MMX intrinsics with
SSE/SSE2/SSSE3.
(ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
intrinsics with TARGET_MMX_WITH_SSE.
* config/i386/mmintrin.h: Don't require MMX in 64-bit mode.

gcc/testsuite/

PR target/89021
* gcc.target/i386/pr82483-1.c: Error only on ia32.
* gcc.target/i386/pr82483-2.c: Likewise.
---
 gcc/config/i386/i386-builtin.def  | 126 +++---
 gcc/config/i386/i386.c|  46 ++--
 gcc/config/i386/mmintrin.h|  10 +-
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 5 files changed, 110 insertions(+), 76 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 88005f4687f..10a9d631f29 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -100,7 +100,7 @@ BDESC (0, 0, CODE_FOR_fnstsw, "__builtin_ia32_fnstsw", IX86_BUILTIN_FNSTSW, UNKN
 BDESC (0, 0, CODE_FOR_fnclex, "__builtin_ia32_fnclex", IX86_BUILTIN_FNCLEX, UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_emms, "__builtin_ia32_emms", IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
+BDESC (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_mmx_emms, "__builtin_ia32_emms", IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* 3DNow! */
 BDESC (OPTION_MASK_ISA_3DNOW, 0, CODE_FOR_mmx_femms, "__builtin_ia32_femms", IX86_BUILTIN_FEMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
@@ -442,68 +442,68 @@ BDESC (0, 0, CODE_FOR_rotrqi3, "__builtin_ia32_rorqi", IX86_BUILTIN_RORQI, UNKNO
 BDESC (0, 0, CODE_FOR_rotrhi3, "__builtin_ia32_rorhi", IX86_BUILTIN_RORHI, UNKNOWN, (int) UINT16_FTYPE_UINT16_INT)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv8qi3, "__builtin_ia32_paddb", IX86_BUILTIN_PADDB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv4hi3, "__builtin_ia32_paddw", IX86_BUILTIN_PADDW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv2si3, "__builtin_ia32_paddd", IX86_BUILTIN_PADDD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv8qi3, "__builtin_ia32_psubb", IX86_BUILTIN_PSUBB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv4hi3, "__builtin_ia32_psubw", IX86_BUILTIN_PSUBW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv2si3, "__builtin_ia32_psubd", IX86_BUILTIN_PSUBD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv8qi3, "__builtin_ia32_paddsb", IX86_BUILTIN_PADDSB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv4hi3, "__builtin_ia32_paddsw", IX86_BUILTIN_PADDSW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv8qi3, "__builtin_ia32_psubsb", IX86_BUILTIN_PSUBSB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv4hi3, "__builtin_ia32_psubsw", IX86_BUILTIN_PSUBSW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv8qi3, "__builtin_ia32_paddusb", IX86_BUILTIN_PADDUSB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv4hi3, "__builtin_ia32_paddusw", IX86_BUILTIN_PADDUSW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv8qi3, "__builtin_ia32_psubusb", IX86_BUILTIN_PSUBUSB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv4hi3, "__builtin_ia32_psubusw", IX86_BUILTIN_PSUBUSW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_mulv4hi3, "__builtin_ia32_pmullw", IX86_BUILTIN_PMULLW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_smulv4hi3_highpart, "__builtin_ia32_pmulhw", IX86_BUILTIN_PMULHW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_andv2si3, "__builtin_ia32_pand", IX86_BUILTIN_PAND, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_andnotv2si3, "__builtin_ia32_pandn", IX86_BUILTIN_PANDN, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_iorv2si3, "__builtin_ia32_por", IX86_BUILTIN_POR, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC 

[PATCH 18/40] i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_v4hi3): Also check TARGET_MMX
and TARGET_MMX_WITH_SSE.
(mmx_v8qi3): Likewise.
(smaxmin:v4hi3): New.
(umaxmin:v8qi3): Likewise.
(smaxmin:*mmx_v4hi3): Add SSE emulation.
(umaxmin:*mmx_v8qi3): Likewise.
---
 gcc/config/i386/mmx.md | 60 +++---
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b9f7c89cd55..dcc1bd1becf 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -925,38 +925,66 @@
 (smaxmin:V4HI
  (match_operand:V4HI 1 "nonimmediate_operand")
  (match_operand:V4HI 2 "nonimmediate_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
+
+(define_expand "v4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(smaxmin:V4HI
+ (match_operand:V4HI 1 "nonimmediate_operand")
+ (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
 
 (define_insn "*mmx_v4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
 (smaxmin:V4HI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0")
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V4HImode, operands)"
-  "pw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pw\t{%2, %0|%0, %2}
+   pw\t{%2, %0|%0, %2}
+   vpw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_v8qi3"
   [(set (match_operand:V8QI 0 "register_operand")
 (umaxmin:V8QI
  (match_operand:V8QI 1 "nonimmediate_operand")
  (match_operand:V8QI 2 "nonimmediate_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
+
+(define_expand "v8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+(umaxmin:V8QI
+ (match_operand:V8QI 1 "nonimmediate_operand")
+ (match_operand:V8QI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
 
 (define_insn "*mmx_v8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
 (umaxmin:V8QI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V8QImode, operands)"
-  "pb\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pb\t{%2, %0|%0, %2}
+   pb\t{%2, %0|%0, %2}
+   vpb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_ashr3"
   [(set (match_operand:MMXMODE24 0 "register_operand")
-- 
2.20.1



[PATCH 33/40] i386: Emulate MMX ssse3_palignrdi with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX version of palignrq with SSE version by concatenating 2
64-bit MMX operands into a single 128-bit SSE operand, followed by
SSE psrldq.  Only SSE register source operand is allowed.
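
The concatenate-then-shift scheme described above can be modelled in
portable C roughly as follows.  This is a sketch for illustration only:
the function name is hypothetical, and it takes the shift as a byte
count, whereas operand 3 of the pattern is expressed in bits.

```c
#include <stdint.h>

/* Model of 64-bit palignr: form the 128-bit value HI:LO (HI is the
   destination operand, LO the source operand), shift it right by
   NBYTES bytes and return the low 64 bits.  */
uint64_t
model_palignr_di (uint64_t hi, uint64_t lo, unsigned int nbytes)
{
  if (nbytes == 0)
    return lo;
  if (nbytes < 8)
    return (lo >> (8 * nbytes)) | (hi << (64 - 8 * nbytes));
  if (nbytes == 8)
    return hi;
  if (nbytes < 16)
    return hi >> (8 * (nbytes - 8));
  /* Everything has been shifted out.  */
  return 0;
}
```

The explicit cases avoid shifting a 64-bit value by 64 or more, which
is undefined in C; the hardware byte shift simply produces zero once
the count reaches 16.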

PR target/89021
* config/i386/sse.md (ssse3_palignrdi): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 58 ++
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f235fe36a2d..1a0549c66fb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15873,23 +15873,61 @@
(set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "")])
 
-(define_insn "ssse3_palignrdi"
-  [(set (match_operand:DI 0 "register_operand" "=y")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "0")
-   (match_operand:DI 2 "nonimmediate_operand" "ym")
-   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n")]
+(define_insn_and_split "ssse3_palignrdi"
+  [(set (match_operand:DI 0 "register_operand" "=y,x,Yv")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "0,0,Yv")
+   (match_operand:DI 2 "nonimmediate_operand" "ym,x,Yv")
+   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n,n")]
   UNSPEC_PALIGNR))]
-  "TARGET_SSSE3"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
 {
-  operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
-  return "palignr\t{%3, %2, %0|%0, %2, %3}";
+  switch (which_alternative)
+{
+case 0:
+  operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
+  return "palignr\t{%3, %2, %0|%0, %2, %3}";
+case 1:
+case 2:
+  return "#";
+default:
+  gcc_unreachable ();
+}
 }
-  [(set_attr "type" "sseishft")
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (lshiftrt:V1TI (match_dup 0) (match_dup 3)))]
+{
+  /* Emulate MMX palignrdi with SSE psrldq.  */
+  rtx op0 = lowpart_subreg (V2DImode, operands[0],
+   GET_MODE (operands[0]));
+  rtx insn;
+  if (TARGET_AVX)
+insn = gen_vec_concatv2di (op0, operands[2], operands[1]);
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP1 to OP0.  */
+  insn = gen_vec_concatv2di (op0, operands[1], operands[2]);
+  emit_insn (insn);
+  /* Swap bits 0:63 with bits 64:127.  */
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2),
+ GEN_INT (3),
+ GEN_INT (0),
+ GEN_INT (1)));
+  rtx op1 = lowpart_subreg (V4SImode, op0, GET_MODE (op0));
+  rtx op2 = gen_rtx_VEC_SELECT (V4SImode, op1, mask);
+  insn = gen_rtx_SET (op1, op2);
+}
+  emit_insn (insn);
+  operands[0] = lowpart_subreg (V1TImode, op0, GET_MODE (op0));
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseishft")
(set_attr "atom_unit" "sishuf")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 ;; Mode iterator to handle singularity w/ absence of V2DI and V4DI
 ;; modes for abs instruction on pre AVX-512 targets.
-- 
2.20.1



[PATCH 19/40] i386: Emulate MMX mmx_pmovmskb with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX mmx_pmovmskb with SSE by zero-extending result of SSE pmovmskb
from QImode to SImode.  Only SSE register source operand is allowed.
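
A scalar C sketch of why the zero-extension is needed (illustration
only, with hypothetical function names).  It assumes the upper eight
bytes of the SSE register may hold arbitrary data, so only the low
eight mask bits are meaningful.

```c
#include <stdint.h>

/* 128-bit pmovmskb: bit I of the result is the top bit of byte I.  */
uint32_t
model_pmovmskb_128 (const uint8_t v[16])
{
  uint32_t mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (uint32_t) (v[i] >> 7) << i;
  return mask;
}

/* Emulated 64-bit form: keep only the low 8 mask bits, which is the
   QImode to SImode zero-extension done by the split.  */
uint32_t
model_mmx_pmovmskb (const uint8_t v[16])
{
  return (uint8_t) model_pmovmskb_128 (v);
}
```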

PR target/89021
* config/i386/mmx.md (mmx_pmovmskb): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/mmx.md | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index dcc1bd1becf..9ff0db9c2ed 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1774,14 +1774,30 @@
   [(set_attr "type" "mmxshft")
(set_attr "mode" "DI")])
 
-(define_insn "mmx_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y")]
+(define_insn_and_split "mmx_pmovmskb"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x")]
   UNSPEC_MOVMSK))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pmovmskb\t{%1, %0|%0, %1}"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   pmovmskb\t{%1, %0|%0, %1}
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
+   (set (match_dup 0)
+   (zero_extend:SI (match_dup 2)))]
+{
+  /* Generate SSE pmovmskb and zero-extend from QImode to SImode.  */
+  operands[1] = lowpart_subreg (V16QImode, operands[1],
+   GET_MODE (operands[1]));
+  operands[2] = lowpart_subreg (QImode, operands[0],
+   GET_MODE (operands[0]));
+}
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,ssemov")
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_maskmovq"
   [(set (match_operand:V8QI 0 "memory_operand")
-- 
2.20.1



[PATCH 29/40] i386: Emulate MMX ssse3_pmaddubsw with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX ssse3_pmaddubsw with SSE.  Only SSE register source operand
is allowed.
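
As a reading aid for the RTL below, a plain C model of the 64-bit
operation (hypothetical names, not part of the patch): each result
lane is the signed-saturated sum of two products of an unsigned byte
from operand 1 and a signed byte from operand 2.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range (the ss_plus in the RTL).  */
int16_t
sat_i16 (int32_t x)
{
  if (x > INT16_MAX)
    return INT16_MAX;
  if (x < INT16_MIN)
    return INT16_MIN;
  return (int16_t) x;
}

/* 64-bit pmaddubsw: A holds unsigned bytes, B holds signed bytes.
   Lane I combines the even and odd byte pairs 2*I and 2*I + 1.  */
void
model_pmaddubsw_v8qi (const uint8_t a[8], const int8_t b[8], int16_t r[4])
{
  for (int i = 0; i < 4; i++)
    {
      int32_t even = (int32_t) a[2 * i] * (int32_t) b[2 * i];
      int32_t odd = (int32_t) a[2 * i + 1] * (int32_t) b[2 * i + 1];
      r[i] = sat_i16 (even + odd);
    }
}
```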

PR target/89021
* config/i386/sse.md (ssse3_pmaddubsw): Add SSE emulation.
---
 gcc/config/i386/sse.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index af6a305d63e..a7d0889f3e1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15569,17 +15569,17 @@
(set_attr "mode" "TI")])
 
 (define_insn "ssse3_pmaddubsw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(ss_plus:V4HI
  (mult:V4HI
(zero_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 1 "register_operand" "0")
+   (match_operand:V8QI 1 "register_operand" "0,0,Yv")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)])))
(sign_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 2 "nonimmediate_operand" "ym")
+   (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)]
  (mult:V4HI
@@ -15591,13 +15591,17 @@
  (vec_select:V4QI (match_dup 2)
(parallel [(const_int 1) (const_int 3)
   (const_int 5) (const_int 7)]))]
-  "TARGET_SSSE3"
-  "pmaddubsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pmaddubsw\t{%2, %0|%0, %2}
+   pmaddubsw\t{%2, %0|%0, %2}
+   vpmaddubsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "simul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_mode_iterator PMULHRSW
   [V4HI V8HI (V16HI "TARGET_AVX2")])
-- 
2.20.1



[PATCH 27/40] i386: Emulate MMX ssse3_phwv4hi3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX ssse3_phwv4hi3 with SSE by moving bits
64:95 to bits 32:63 in SSE register.  Only SSE register source operand
is allowed.
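
To illustrate the bit move, here is a C model of the horizontal-add
case only (an assumption: the pattern is taken to cover phaddw; the
names are hypothetical).  The 128-bit instruction puts the sums from
the first operand in lanes 0-3 and those from the second operand in
lanes 4-7, but only the low four input lanes of each operand are
valid, so the useful results sit in lanes 0-1 and 4-5.

```c
#include <stdint.h>

/* 128-bit phaddw: lanes 0-3 are pairwise sums of A, lanes 4-7 are
   pairwise sums of B.  Sums wrap around modulo 2^16.  */
void
model_phaddw_128 (const int16_t a[8], const int16_t b[8], int16_t r[8])
{
  for (int i = 0; i < 4; i++)
    {
      r[i] = (int16_t) (uint16_t) (a[2 * i] + a[2 * i + 1]);
      r[4 + i] = (int16_t) (uint16_t) (b[2 * i] + b[2 * i + 1]);
    }
}

/* Emulated 64-bit form: lanes 4-7 of the inputs are don't-care.
   After the 128-bit operation, copy bits 64:95 (lanes 4 and 5) to
   bits 32:63 (lanes 2 and 3) so the low 64 bits hold the result.  */
void
model_phaddw_v4hi (const int16_t a[8], const int16_t b[8], int16_t r[8])
{
  model_phaddw_128 (a, b, r);
  r[2] = r[4];
  r[3] = r[5];
}
```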

PR target/89021
* config/i386/sse.md (ssse3_phwv4hi3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b1bab15af41..97cbd250dd4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15257,13 +15257,13 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phwv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phwv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(vec_concat:V4HI
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "0")
+   (match_operand:V4HI 1 "register_operand" "0,0,Yv")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 1) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
@@ -15272,19 +15272,37 @@
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
  (vec_select:HI (match_dup 2) (parallel [(const_int 2)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))]
-  "TARGET_SSSE3"
-  "phw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   phw\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = lowpart_subreg (V8HImode, operands[0],
+   GET_MODE (operands[0]));
+  rtx op1 = lowpart_subreg (V8HImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx op2 = lowpart_subreg (V8HImode, operands[2],
+   GET_MODE (operands[2]));
+  emit_insn (gen_ssse3_phwv8hi3 (op0, op1, op2));
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_phdv8si3"
   [(set (match_operand:V8SI 0 "register_operand" "=x")
-- 
2.20.1



[PATCH 28/40] i386: Emulate MMX ssse3_phdv2si3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX ssse3_phdv2si3 with SSE by moving bits
64:95 to bits 32:63 in SSE register.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_phdv2si3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 97cbd250dd4..af6a305d63e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15381,26 +15381,44 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phdv2si3"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phdv2si3"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
(vec_concat:V2SI
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 1 "register_operand" "0")
+ (match_operand:V2SI 1 "register_operand" "0,0,Yv")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 1) (parallel [(const_int 1)])))
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,x,Yv")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 2) (parallel [(const_int 1)])]
-  "TARGET_SSSE3"
-  "phd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   phd\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = lowpart_subreg (V4SImode, operands[0],
+   GET_MODE (operands[0]));
+  rtx op1 = lowpart_subreg (V4SImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx op2 = lowpart_subreg (V4SImode, operands[2],
+   GET_MODE (operands[2]));
+  emit_insn (gen_ssse3_phdv4si3 (op0, op1, op2));
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_pmaddubsw256"
   [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-- 
2.20.1



[PATCH 21/40] i386: Emulate MMX maskmovq with SSE2 maskmovdqu

2019-02-14 Thread H.J. Lu
Emulate MMX maskmovq with SSE2 maskmovdqu for TARGET_MMX_WITH_SSE by
zero-extending source and mask operands to 128 bits.  Handle unmapped
bits 64:127 at memory address by adjusting source and mask operands
together with memory address.
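
The address adjustment can be modelled in plain C as below. This is a sketch under stated assumptions, not the intrinsic itself: `masked_store16` and `emulated_maskmovq` are hypothetical names, memory is modelled as a byte array whose index 0 is assumed to be 16-byte aligned, and the byte shifts are done with `memcpy` instead of `pslldq`. It shows that moving the address back by the (clamped) misalignment and shifting source and mask by the same number of bytes stores exactly the same bytes, while the 16-byte window never extends past the end of the original 8-byte window when the misalignment is above 8, and stays inside one aligned 16-byte block otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a byte-masked 16-byte store: byte i is written only if
   the most significant bit of mask[i] is set.  */
static void masked_store16 (uint8_t *mem, size_t pos,
			    const uint8_t src[16], const uint8_t mask[16])
{
  for (size_t i = 0; i < 16; i++)
    if (mask[i] & 0x80)
      mem[pos + i] = src[i];
}

/* Model of the emulated 8-byte masked store at mem[pos].  */
static void emulated_maskmovq (uint8_t *mem, size_t pos,
			       const uint8_t src[8], const uint8_t mask[8])
{
  /* Zero-extend source and mask to 16 bytes.  */
  uint8_t src128[16] = { 0 };
  uint8_t mask128[16] = { 0 };

  /* Misalignment of the address, clamped to 8 bytes.  */
  size_t offset = pos & 0xf;
  if (offset > 8)
    offset = 8;
  assert (offset <= 8);		/* Keeps offset + 8 <= 16 below.  */

  /* Move the address back and shift source and mask up by the same
     number of bytes, so every stored byte lands where it did before.  */
  pos -= offset;
  memcpy (src128 + offset, src, 8);
  memcpy (mask128 + offset, mask, 8);

  masked_store16 (mem, pos, src128, mask128);
}
```

For example, a store at index 19 (misalignment 3) becomes a 16-byte masked store at index 16 with the data in bytes 3 to 10, and a store at index 45 (misalignment 13) becomes one at index 37 with the data in bytes 8 to 15.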

PR target/89021
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__MMX_WITH_SSE__ for TARGET_MMX_WITH_SSE.
* config/i386/xmmintrin.h: Emulate MMX maskmovq with SSE2
maskmovdqu for __MMX_WITH_SSE__.
---
 gcc/config/i386/i386-c.c|  2 ++
 gcc/config/i386/xmmintrin.h | 61 +
 2 files changed, 63 insertions(+)

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 5e7e46fcebe..213e1b56c6b 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -548,6 +548,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__CLDEMOTE__");
   if (isa_flag2 & OPTION_MASK_ISA_PTWRITE)
 def_or_undef (parse_in, "__PTWRITE__");
+  if (TARGET_MMX_WITH_SSE)
+def_or_undef (parse_in, "__MMX_WITH_SSE__");
   if (TARGET_IAMCU)
 {
   def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index 58284378514..a915f6c87d7 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1165,7 +1165,68 @@ _m_pshufw (__m64 __A, int const __N)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_maskmove_si64 (__m64 __A, __m64 __N, char *__P)
 {
+#ifdef __MMX_WITH_SSE__
+  /* Emulate MMX maskmovq with SSE2 maskmovdqu and handle unmapped bits
+ 64:127 at address __P.  */
+  typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+  typedef char __v16qi __attribute__ ((__vector_size__ (16)));
+  /* Zero-extend __A and __N to 128 bits.  */
+  __v2di __A128 = __extension__ (__v2di) { ((__v1di) __A)[0], 0 };
+  __v2di __N128 = __extension__ (__v2di) { ((__v1di) __N)[0], 0 };
+
+  /* Check the alignment of __P.  */
+  __SIZE_TYPE__ offset = ((__SIZE_TYPE__) __P) & 0xf;
+  if (offset)
+{
+  /* If the misalignment of __P > 8, subtract __P by 8 bytes.
+Otherwise, subtract __P by the misalignment.  */
+  if (offset > 8)
+   offset = 8;
+  __P = (char *) (((__SIZE_TYPE__) __P) - offset);
+
+  /* Shift __A128 and __N128 to the left by the adjustment.  */
+  switch (offset)
+   {
+   case 1:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 8);
+ break;
+   case 2:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 2 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 2 * 8);
+ break;
+   case 3:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 3 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 3 * 8);
+ break;
+   case 4:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 4 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 4 * 8);
+ break;
+   case 5:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 5 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 5 * 8);
+ break;
+   case 6:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 6 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 6 * 8);
+ break;
+   case 7:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 7 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 7 * 8);
+ break;
+   case 8:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 8 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 8 * 8);
+ break;
+   default:
+ break;
+   }
+}
+  __builtin_ia32_maskmovdqu ((__v16qi)__A128, (__v16qi)__N128, __P);
+#else
   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
+#endif
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
-- 
2.20.1



[PATCH 26/40] i386: Emulate MMX umulv1siv1di3 with SSE2

2019-02-14 Thread H.J. Lu
Emulate MMX umulv1siv1di3 with SSE2.  Only SSE register source operand
is allowed.
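
As a rough illustration of why no fixup of the result is needed here, the operation can be written as a scalar model in C. The function name `pmuludq64` is hypothetical. Both the MMX and the SSE2 forms multiply the low 32 bits of each 64-bit source as unsigned values and produce a 64-bit product, so the low 64 bits of the SSE2 result already hold the MMX result.

```c
#include <stdint.h>

/* Scalar model of the 64-bit operation: only the low 32 bits of each
   operand take part, and the full 64-bit unsigned product is returned.  */
static uint64_t pmuludq64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (uint32_t) a * (uint64_t) (uint32_t) b;
}
```

The upper 32 bits of each input are ignored, which is what the vec_select of element 0 in the pattern expresses.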

PR target/89021
* config/i386/mmx.md (sse2_umulv1siv1di3): Add SSE emulation
support.
(*sse2_umulv1siv1di3): Add SSE2 emulation.
---
 gcc/config/i386/mmx.md | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 274e895f51e..a618a620eb1 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -911,24 +911,30 @@
(vec_select:V1SI
  (match_operand:V2SI 2 "nonimmediate_operand")
  (parallel [(const_int 0)])]
-  "TARGET_SSE2"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE2"
   "ix86_fixup_binary_operands_no_copy (MULT, V2SImode, operands);")
 
 (define_insn "*sse2_umulv1siv1di3"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
+  [(set (match_operand:V1DI 0 "register_operand" "=y,x,Yv")
 (mult:V1DI
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 1 "nonimmediate_operand" "%0")
+ (match_operand:V2SI 1 "nonimmediate_operand" "%0,0,Yv")
  (parallel [(const_int 0)])))
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,x,Yv")
  (parallel [(const_int 0)])]
-  "TARGET_SSE2 && ix86_binary_operator_ok (MULT, V2SImode, operands)"
-  "pmuludq\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && TARGET_SSE2
+   && ix86_binary_operator_ok (MULT, V2SImode, operands)"
+  "@
+   pmuludq\t{%2, %0|%0, %2}
+   pmuludq\t{%2, %0|%0, %2}
+   vpmuludq\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_v4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 32/40] i386: Emulate MMX ssse3_psign3 with SSE

2019-02-14 Thread H.J. Lu
Emulate MMX ssse3_psign3 with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_psign3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a92505c54a1..f235fe36a2d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15804,17 +15804,21 @@
(set_attr "mode" "")])
 
 (define_insn "ssse3_psign3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
(unspec:MMXMODEI
- [(match_operand:MMXMODEI 1 "register_operand" "0")
-  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")]
+ [(match_operand:MMXMODEI 1 "register_operand" "0,0,Yv")
+  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")]
  UNSPEC_PSIGN))]
-  "TARGET_SSSE3"
-  "psign\t{%2, %0|%0, %2}";
-  [(set_attr "type" "sselog1")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   psign\t{%2, %0|%0, %2}
+   psign\t{%2, %0|%0, %2}
+   vpsign\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "_palignr_mask"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=v")
-- 
2.20.1



[PATCH 38/40] i386: Enable TM MMX intrinsics with SSE2

2019-02-14 Thread H.J. Lu
This patch enables TM MMX intrinsics with SSE2 when MMX is disabled.

PR target/89021
* config/i386/i386.c (bdesc_tm): Enable MMX intrinsics with
SSE2.
---
 gcc/config/i386/i386.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1d417e08734..20219983462 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31075,13 +31075,13 @@ static const struct builtin_description 
bdesc_##kind[] =  \
we're lazy.  Add casts to make them fit.  */
 static const struct builtin_description bdesc_tm[] =
 {
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_WM64", (enum 
ix86_builtins) BUILT_IN_TM_STORE_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_WaRM64", (enum 
ix86_builtins) BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_WaWM64", (enum 
ix86_builtins) BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RaRM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RaWM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RfWM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RFW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_WM64", (enum ix86_builtins) BUILT_IN_TM_STORE_M64, UNKNOWN, 
VOID_FTYPE_PV2SI_V2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_WaRM64", (enum ix86_builtins) BUILT_IN_TM_STORE_WAR_M64, 
UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_WaWM64", (enum ix86_builtins) BUILT_IN_TM_STORE_WAW_M64, 
UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_M64, UNKNOWN, 
V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RaRM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_RAR_M64, 
UNKNOWN, V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RaWM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_RAW_M64, 
UNKNOWN, V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RfWM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_RFW_M64, 
UNKNOWN, V2SI_FTYPE_PCV2SI },
 
   { OPTION_MASK_ISA_SSE, 0, CODE_FOR_nothing, "__builtin__ITM_WM128", (enum 
ix86_builtins) BUILT_IN_TM_STORE_M128, UNKNOWN, VOID_FTYPE_PV4SF_V4SF },
   { OPTION_MASK_ISA_SSE, 0, CODE_FOR_nothing, "__builtin__ITM_WaRM128", (enum 
ix86_builtins) BUILT_IN_TM_STORE_WAR_M128, UNKNOWN, VOID_FTYPE_PV4SF_V4SF },
@@ -31099,7 +31099,7 @@ static const struct builtin_description bdesc_tm[] =
   { OPTION_MASK_ISA_AVX, 0, CODE_FOR_nothing, "__builtin__ITM_RaWM256", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RAW_M256, UNKNOWN, V8SF_FTYPE_PCV8SF },
   { OPTION_MASK_ISA_AVX, 0, CODE_FOR_nothing, "__builtin__ITM_RfWM256", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RFW_M256, UNKNOWN, V8SF_FTYPE_PCV8SF },
 
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_LM64", (enum 
ix86_builtins) BUILT_IN_TM_LOG_M64, UNKNOWN, VOID_FTYPE_PCVOID },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_LM64", (enum ix86_builtins) BUILT_IN_TM_LOG_M64, UNKNOWN, 
VOID_FTYPE_PCVOID },
   { OPTION_MASK_ISA_SSE, 0, CODE_FOR_nothing, "__builtin__ITM_LM128", (enum 
ix86_builtins) BUILT_IN_TM_LOG_M128, UNKNOWN, VOID_FTYPE_PCVOID },
   { OPTION_MASK_ISA_AVX, 0, CODE_FOR_nothing, "__builtin__ITM_LM256", (enum 
ix86_builtins) BUILT_IN_TM_LOG_M256, UNKNOWN, VOID_FTYPE_PCVOID },
 };
-- 
2.20.1



Re: PR87689, PowerPC64 ELFv2 function parameter passing violation

2019-02-14 Thread Alan Modra
On Thu, Feb 14, 2019 at 10:32:50AM +0100, Richard Biener wrote:
> On Wed, Feb 13, 2019 at 7:59 AM Alan Modra  wrote:
> >
> > Covers for a generic fortran bug.  The effect is that we'll needlessly
> > waste 64 bytes of stack space on some calls, but I don't see any
> > simple and fully correct patch in generic code.  Bootstrapped and
> > regression tested powerpc64le-linux.  OK mainline and branches?
> 
> This looks very wrong to me ;)  It won't work when compiling with -flto
> for example.

Blah.  Nothing looks right to me. :)  That patch was really me giving
up on the bug (and hoping I'd found a suitable hack that could be
applied to gcc-8 and gcc-7).

> The frontend needs to be properly fixed.

You'll notice I didn't assign myself to the bug..

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PR fortran/89348, patch] Fortran Command Options documentation fixes

2019-02-14 Thread Thomas König
Mark,

> Patch and change log attached to PR.

Could you please submit this the normal way, with the ChangeLog in the text and 
the patch as an attachment?

Regards, Thomas


[PR fortran/89348, patch] Fortran Command Options documentation fixes

2019-02-14 Thread Mark Eggleston
Enabling of -fdec-include is missing from list of options enabled by 
-fdec. When rendered as a PDF some lines are too long in the list of 
options controlling Fortran dialect and in the list of options to 
request or suppress errors and warnings.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89348

gcc/fortran/ChangeLog:

2019-02-14  Mark Eggleston 

    PR fortran/89348
    * invoke.texi: Lines too long for PDF in option lists. Add
    -fdec-include to list of options enabled by -fdec.

OK for trunk? I do not have commit access.

regards, Mark


--
https://www.codethink.co.uk/privacy.html

From 876b91cfa9aaed30a621fc80d98a9d2820170ad7 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Tue, 12 Feb 2019 15:52:47 +
Subject: [PATCH] Fortran Command Options documentation fixes.

Enabling of -fdec-include missing from list of options enabled by -fdec.
Some lines were too long in the list of options controlling Fortran dialect
and in the list of options to request or suppress errors and warnings.
---
 gcc/fortran/invoke.texi | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 80804993522..27ad0721a7d 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -144,14 +144,15 @@ by type.  Explanations are in the following sections.
 @item Error and Warning Options
 @xref{Error and Warning Options,,Options to request or suppress errors
 and warnings}.
-@gccoptlist{-Waliasing -Wall -Wampersand -Wargument-mismatch -Warray-bounds
--Wc-binding-type -Wcharacter-truncation -Wconversion @gol
+@gccoptlist{-Waliasing -Wall -Wampersand -Wargument-mismatch @gol 
+-Warray-bounds -Wc-binding-type -Wcharacter-truncation -Wconversion @gol
 -Wdo-subscript -Wfunction-elimination -Wimplicit-interface @gol
--Wimplicit-procedure -Wintrinsic-shadow -Wuse-without-only -Wintrinsics-std @gol
--Wline-truncation -Wno-align-commons -Wno-tabs -Wreal-q-constant @gol
--Wsurprising -Wunderflow -Wunused-parameter -Wrealloc-lhs @gol
--Wrealloc-lhs-all -Wfrontend-loop-interchange -Wtarget-lifetime @gol
--fmax-errors=@var{n} -fsyntax-only -pedantic -pedantic-errors @gol
+-Wimplicit-procedure -Wintrinsic-shadow -Wuse-without-only @gol
+-Wintrinsics-std -Wline-truncation -Wno-align-commons -Wno-tabs @gol
+-Wreal-q-constant -Wsurprising -Wunderflow -Wunused-parameter @gol
+-Wrealloc-lhs -Wrealloc-lhs-all -Wfrontend-loop-interchange @gol
+-Wtarget-lifetime -fmax-errors=@var{n} -fsyntax-only -pedantic @gol
+-pedantic-errors
 }
 
 @item Debugging Options
@@ -183,15 +184,15 @@ and warnings}.
 @gccoptlist{-faggressive-function-elimination -fblas-matmul-limit=@var{n} @gol
 -fbounds-check -fcheck-array-temporaries @gol
 -fcheck=@var{} @gol
--fcoarray=@var{} -fexternal-blas -ff2c
+-fcoarray=@var{} -fexternal-blas -ff2c @gol
 -ffrontend-loop-interchange @gol
 -ffrontend-optimize @gol
 -finit-character=@var{n} -finit-integer=@var{n} -finit-local-zero @gol
 -finit-derived @gol
--finit-logical=@var{}
+-finit-logical=@var{} @gol
 -finit-real=@var{} @gol
 -finline-matmul-limit=@var{n} @gol
--fmax-array-constructor=@var{n} -fmax-stack-var-size=@var{n}
+-fmax-array-constructor=@var{n} -fmax-stack-var-size=@var{n} @gol
 -fno-align-commons @gol
 -fno-automatic -fno-protect-parens -fno-underscoring @gol
 -fsecond-underscore -fpack-derived -frealloc-lhs -frecursive @gol
@@ -251,6 +252,7 @@ full documentation.
 Other flags enabled by this switch are:
 @option{-fdollar-ok} @option{-fcray-pointer} @option{-fdec-structure}
 @option{-fdec-intrinsic-ints} @option{-fdec-static} @option{-fdec-math}
+@option{-fdec-include}
 
 If @option{-fd-lines-as-code}/@option{-fd-lines-as-comments} are unset, then
 @option{-fdec} also sets @option{-fd-lines-as-comments}.
-- 
2.11.0



Re: [PATCH] Add testcases for multiple -fsanitize=, -fno-sanitize= or -fno-sanitize-recover= options

2019-02-14 Thread H.J. Lu
On Thu, Feb 14, 2019 at 3:09 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch adds testcase coverage to make sure
> -f{,no-}sanitize{,-recover}= options are all passed to the compiler backend
> from the driver.
>
> All these tests were broken by the earlier option handling patch from H.J.:
> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00492.html
> and as nothing in the testsuite revealed the patch broke this, I think we
> want to cover this in the testsuite.
>
> Tested on x86_64-linux with
> make check-gcc check-c++-all RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
> ubsan.exp=opts*'
> with current trunk (all tests PASS) and with trunk patched with the above
> patch (all tests FAIL).  Ok for trunk?
>
> 2019-02-14  Jakub Jelinek  
>
> * c-c++-common/ubsan/opts-1.c: New test.
> * c-c++-common/ubsan/opts-2.c: New test.
> * c-c++-common/ubsan/opts-3.c: New test.
> * c-c++-common/ubsan/opts-4.c: New test.

I got

UNRESOLVED: c-c++-common/ubsan/opts-1.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow" 2
UNRESOLVED: c-c++-common/ubsan/opts-1.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_shift_out_of_bounds" 1
UNRESOLVED: c-c++-common/ubsan/opts-2.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow" 2
UNRESOLVED: c-c++-common/ubsan/opts-2.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-not
optimized "__ubsan_handle_shift_out_of_bounds"
UNRESOLVED: c-c++-common/ubsan/opts-3.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow" 1
UNRESOLVED: c-c++-common/ubsan/opts-3.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-not
optimized "__ubsan_handle_shift_out_of_bounds"
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow_abort" 1
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_shift_out_of_bounds_abort" 1
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_type_mismatch_v1" 1
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-not
optimized "__ubsan_handle_type_mismatch_v1_abort"
UNRESOLVED: c-c++-common/ubsan/opts-1.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow" 2
UNRESOLVED: c-c++-common/ubsan/opts-1.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_shift_out_of_bounds" 1
UNRESOLVED: c-c++-common/ubsan/opts-2.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow" 2
UNRESOLVED: c-c++-common/ubsan/opts-2.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-not
optimized "__ubsan_handle_shift_out_of_bounds"
UNRESOLVED: c-c++-common/ubsan/opts-3.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow" 1
UNRESOLVED: c-c++-common/ubsan/opts-3.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-not
optimized "__ubsan_handle_shift_out_of_bounds"
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_divrem_overflow_abort" 1
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_shift_out_of_bounds_abort" 1
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
optimized "__ubsan_handle_type_mismatch_v1" 1
UNRESOLVED: c-c++-common/ubsan/opts-4.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-not
optimized "__ubsan_handle_type_mismatch_v1_abort"

since  -flto suppresses -fdump-tree-optimized.

H.J.
> --- gcc/testsuite/c-c++-common/ubsan/opts-1.c.jj2019-02-14 
> 11:31:33.144895232 +0100
> +++ gcc/testsuite/c-c++-common/ubsan/opts-1.c   2019-02-14 11:33:23.049077585 
> +0100
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined -fsanitize=shift 
> -fsanitize=float-divide-by-zero -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow" 2 
> "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "__ubsan_handle_shift_out_of_bounds" 1 
> "optimized" } } */
> +
> +int
> +foo (int x, int y)
> +{
> +  return x / y;
> +}
> +
> +int
> +bar (int x, int y)
> +{
> +  return x << y;
> +}
> +
> +float
> +baz (float x, float

Re: V2 [PATCH] driver: Also prune joined switches with negation

2019-02-14 Thread H.J. Lu
On Thu, Feb 14, 2019 at 12:03:30PM +0100, Jakub Jelinek wrote:
> On Wed, Feb 13, 2019 at 06:27:51PM -0800, H.J. Lu wrote:
> > --- a/gcc/doc/options.texi
> > +++ b/gcc/doc/options.texi
> > @@ -227,7 +227,10 @@ options, their @code{Negative} properties should form 
> > a circular chain.
> >  For example, if options @option{-@var{a}}, @option{-@var{b}} and
> >  @option{-@var{c}} are mutually exclusive, their respective @code{Negative}
> >  properties should be @samp{Negative(@var{b})}, @samp{Negative(@var{c})}
> > -and @samp{Negative(@var{a})}.
> > +and @samp{Negative(@var{a})}.  @code{Negative} can be used together
> > +with @code{Joined} if there is no @code{RejectNegative} property.
> > +@code{Negative} is ignored if there is @code{Joined} without
> > +@code{RejectNegative}.
> 
> I think this doesn't describe what is implemented.
> 
> Something like:
>  the option name with the leading ``-'' removed.  This chain action will
>  propagate through the @code{Negative} property of the option to be
> -turned off.
> +turned off.  The driver will prune options, removing those that are
> +turned off by some later option.  This pruning is not done for options
> +with @code{Joined} or @code{JoinedOrMissing} properties, unless the
> +options have either @code{RejectNegative} property or the @code{Negative}
> +property mentions an option other than itself.
> 
>  As a consequence, if you have a group of mutually-exclusive
>  options, their @code{Negative} properties should form a circular chain.
> 
> ?
> 
> Otherwise LGTM, but Joseph is the options machinery maintainer, so I'll
> defer to him here.
> 

Here is the updated patch with a "-march=native -march=knl" testcase.

Thanks.

H.J.
---
When -march=native is passed through host_detect_local_cpu to the backend,
it overrides all command-line options after it.  That means

$ gcc -march=native -march=skylake-avx512

is treated as

$ gcc -march=skylake-avx512 -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=skylake-avx512 to override previous -march=native on command-line.
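
The intended effect of the pruning can be sketched with a small standalone C function. This is not the prune_options code in opts-common.c. The name `prune_joined` and its interface are hypothetical, and matching by string prefix is a simplification of how the driver identifies options. It only illustrates the rule that, for a joined switch such as -march=, every earlier occurrence is dropped and the last one wins.

```c
#include <stddef.h>
#include <string.h>

/* Remove every option that starts with PREFIX except the last one,
   keeping the relative order of everything else.  Returns the new
   number of options in ARGV.  */
static size_t prune_joined (const char **argv, size_t argc,
			    const char *prefix)
{
  size_t plen = strlen (prefix);
  size_t last = argc;		/* Index of last match, ARGC if none.  */
  size_t out = 0;

  for (size_t i = 0; i < argc; i++)
    if (strncmp (argv[i], prefix, plen) == 0)
      last = i;

  for (size_t i = 0; i < argc; i++)
    {
      int match = strncmp (argv[i], prefix, plen) == 0;
      if (!match || i == last)
	argv[out++] = argv[i];
    }

  return out;
}
```

Applied to "-O2 -march=native -g -march=skylake-avx512" with the prefix "-march=", this model keeps "-O2 -g -march=skylake-avx512", so -march=native never reaches the backend.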

gcc/

PR driver/69471
* opts-common.c (prune_options): Also prune joined switches
with Negative and RejectNegative.
* config/i386/i386.opt (march=): Add Negative(march=).
(mtune=): Add Negative(mtune=).
* doc/options.texi: Document Negative used together with Joined
and RejectNegative.

gcc/testsuite/

PR driver/69471
* gcc.dg/pr69471-1.c: New test.
* gcc.dg/pr69471-2.c: Likewise.
* gcc.target/i386/pr69471-3.c: Likewise.
---
 gcc/config/i386/i386.opt  |  4 ++--
 gcc/doc/options.texi  |  6 +-
 gcc/opts-common.c | 11 ---
 gcc/testsuite/gcc.dg/pr69471-1.c  |  9 +
 gcc/testsuite/gcc.dg/pr69471-2.c  |  8 
 gcc/testsuite/gcc.target/i386/pr69471-3.c | 11 +++
 6 files changed, 43 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr69471-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr69471-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr69471-3.c

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9b93241f790..b7998ee7363 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -253,7 +253,7 @@ EnumValue
 Enum(ix86_align_data) String(cacheline) Value(ix86_align_data_type_cacheline)
 
 march=
-Target RejectNegative Joined Var(ix86_arch_string)
+Target RejectNegative Negative(march=) Joined Var(ix86_arch_string)
 Generate code for given CPU.
 
 masm=
@@ -510,7 +510,7 @@ Target Report Mask(TLS_DIRECT_SEG_REFS)
 Use direct references against %gs when accessing tls data.
 
 mtune=
-Target RejectNegative Joined Var(ix86_tune_string)
+Target RejectNegative Negative(mtune=) Joined Var(ix86_tune_string)
 Schedule code for given CPU.
 
 mtune-ctrl=
diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi
index 0081243acab..1c83d241488 100644
--- a/gcc/doc/options.texi
+++ b/gcc/doc/options.texi
@@ -220,7 +220,11 @@ property is used.
 The option will turn off another option @var{othername}, which is
 the option name with the leading ``-'' removed.  This chain action will
 propagate through the @code{Negative} property of the option to be
-turned off.
+turned off.  The driver will prune options, removing those that are
+turned off by some later option.  This pruning is not done for options
+with @code{Joined} or @code{JoinedOrMissing} properties, unless the
+options have either @code{RejectNegative} property or the @code{Negative}
+property mentions an option other than itself.
 
 As a consequence, if you have a group of mutually-exclusive
 options, their @code{Negative} properties should form a circular chain.
diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index ee8898b22ec..edbb3ac9b6d 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -1015,7 +1015,9 @@ prune_options (struct cl_decoded_option **decoded_options,
  

Fix PR72715 "ICE in gfc_trans_omp_do, at fortran/trans-openmp.c:3164"

2019-02-14 Thread Thomas Schwinge
Hi!

PR72715 "ICE in gfc_trans_omp_do, at fortran/trans-openmp.c:3164" is the
OpenACC variant of the OpenMP PR60127 "ICE with OpenMP and DO CONCURRENT"
(trunk r210331) changes.

On Mon, 29 Aug 2016 14:33:07 -0700, Cesar Philippidis  
wrote:
> It looks like the fortran FE has some preliminary support for do
> concurrent loops, however it was not well tested, nor is do concurrent
> supported by the OpenACC spec.

(Tobias Burnus has written up in some further PRs and emails what the
issues are.)

> This patch teaches the fortran FE to
> error when an acc loop directive is applied to a do concurrent loop.
> 
> The reason why the existing do concurrent wasn't detected earlier is
> because the only tests that utilized do concurrent loops contained other
> expected failures, therefore the FE never successfully left the resolver
> stage. And this ICE occurred as the loop was being translated into gimple.
> 
> There's one other questionable use of EXEC_DO_CONCURRENT that involves
> the OpenACC cache directive. I've decided to leave gfc_exec_oacc_cache
> alone for the time being because the OpenACC spec does not explicitly
> define what it means by 'loop'. Then again, the user isn't required to
> explicitly mark acc loops inside acc kernels regions, so perhaps it
> would be better to leave the end user with more flexibility. On the
> other hand, it's debatable whether do concurrent loops should even be
> permitted inside acc offloaded regions.
> 
> I've applied this patch to gomp-4_0-branch. Is this OK for trunk, gcc-6
> and gcc-5?

Thanks.  As attached committed to trunk in r268875, and will backport to
release branches later.


Grüße
 Thomas


From dac1fbf62c5293c4b6b2788a6d9677c73088df5d Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Thu, 14 Feb 2019 13:44:19 +
Subject: [PATCH] Fix PR72715 "ICE in gfc_trans_omp_do, at
 fortran/trans-openmp.c:3164"

The OpenACC 'resolve_oacc_nested_loops' function duplicates most code of the
OpenMP 'resolve_omp_do', but didn't include the PR60127 "ICE with OpenMP and DO
CONCURRENT" (trunk r210331) changes.  (Probably the two functions should be
unified?)

The Fortran DO CONCURRENT construct is a way to tell the compiler that loop
iterations don't have any interdependencies -- which is information that would
very well be suitable for OpenACC/OpenMP loops.  There are some "details"
however, see the discussion/references in PR60127, so for the time being, make
this a compile-time error instead of an ICE.

	gcc/fortran/
	* openmp.c (resolve_oacc_nested_loops): Error on do concurrent
	loops.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-3-2.f95: Error on do concurrent loops.
	* gfortran.dg/goacc/loop-3.f95: Likewise.
	* gfortran.dg/goacc/pr72715.f90: New test.

Reviewed-by: Thomas Schwinge 

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@268875 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/fortran/ChangeLog| 6 ++
 gcc/fortran/openmp.c | 8 +++-
 gcc/testsuite/ChangeLog  | 7 +++
 gcc/testsuite/gfortran.dg/goacc/loop-3-2.f95 | 4 ++--
 gcc/testsuite/gfortran.dg/goacc/loop-3.f95   | 4 ++--
 gcc/testsuite/gfortran.dg/goacc/pr72715.f90  | 6 ++
 6 files changed, 30 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr72715.f90

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index c573f77410c..71cef4f1884 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,9 @@
+2019-02-14  Cesar Philippidis  
+
+	PR fortran/72715
+	* openmp.c (resolve_oacc_nested_loops): Error on do concurrent
+	loops.
+
 2019-02-13  Martin Liska  
 
 	PR fortran/88649
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 15c5842dea4..8651afaee4f 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5760,7 +5760,13 @@ resolve_oacc_nested_loops (gfc_code *code, gfc_code* do_code, int collapse,
 		 "at %L", &do_code->loc);
 	  break;
 	}
-  gcc_assert (do_code->op == EXEC_DO || do_code->op == EXEC_DO_CONCURRENT);
+  if (do_code->op == EXEC_DO_CONCURRENT)
+	{
+	  gfc_error ("!$ACC LOOP cannot be a DO CONCURRENT loop at %L",
+		 &do_code->loc);
+	  break;
+	}
+  gcc_assert (do_code->op == EXEC_DO);
   if (do_code->ext.iterator->var->ts.type != BT_INTEGER)
 	gfc_error ("!$ACC LOOP iteration variable must be of type integer at %L",
 		   &do_code->loc);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index def998a14ee..6a649831c94 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2019-02-14  Cesar Philippidis  
+
+	PR fortran/72715
+	* gfortran.dg/goacc/loop-3-2.f95: Error on do concurrent loops.
+	* gfortran.dg/goacc/loop-3.f95: Likewise.
+	* gfortran.dg/goacc/pr72715.f90: New test.
+
 2019-02-14  Martin Liska  
 
 	PR rtl-optimization/89242
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-3-2.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-3-2.f95
index 9be74a85919..c091084a4f5 100644
--- a/g

Re: GCC 8 backports

2019-02-14 Thread Martin Liška
On 2/14/19 12:23 PM, Martin Liška wrote:
> On 11/20/18 11:58 AM, Martin Liška wrote:
>> On 10/3/18 11:23 AM, Martin Liška wrote:
>>> On 9/25/18 8:48 AM, Martin Liška wrote:
>>>> Hi.
>>>>
>>>> One more tested patch.
>>>>
>>>> Martin
>>>>
>>>
>>> One more tested patch.
>>>
>>> Martin
>>>
>>
>> Hi.
>>
>> One another tested patch that I'm going to install.
>>
>> Martin
>>
> 
> Hi.
> 
> Another 2 patches that I've just tested and will install.
> 
> Martin
> 

One more patch.

Martin
From a434c00b2d9a540152ea149244fa8df97f64def4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 14 Feb 2019 14:25:48 +0100
Subject: [PATCH] Backport r268873

gcc/ChangeLog:

2019-02-14  Martin Liska  

	PR rtl-optimization/89242
	* dce.c (delete_unmarked_insns): Call free_dominance_info when we
	process a transformation.

gcc/testsuite/ChangeLog:

2019-02-14  Martin Liska  

	PR rtl-optimization/89242
	* g++.dg/pr89242.C: New test.
---
 gcc/dce.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/dce.c b/gcc/dce.c
index 2ef4dd7bd3b..ce2edc43efb 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -642,7 +642,10 @@ delete_unmarked_insns (void)
 
   /* Deleted a pure or const call.  */
   if (must_clean)
-delete_unreachable_blocks ();
+{
+  delete_unreachable_blocks ();
+  free_dominance_info (CDI_DOMINATORS);
+}
 }
 
 
-- 
2.20.1



Re: [PATCH 08/40] i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE

2019-02-14 Thread Uros Bizjak
On Thu, Feb 14, 2019 at 1:30 PM H.J. Lu  wrote:
>
> Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE.  Only SSE register
> source operand is allowed.
>
> PR target/89021
> * config/i386/mmx.md (mmx_ashr<mode>3): Changed to define_expand.
> Disallow TARGET_MMX_WITH_SSE.
> (mmx_<shift_insn><mode>3): Likewise.
> (ashr<mode>3): New.
> (*ashr<mode>3): Likewise.
> (<shift_insn><mode>3): Likewise.
> (*<shift_insn><mode>3): Likewise.

Please add "|| TARGET_MMX_WITH_SSE" with new constraints to
mmx_*3 insn instead and don't introduce unnecessary mmx_*
expander.

Uros.
> ---
>  gcc/config/i386/mmx.md | 68 --
>  1 file changed, 52 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 23c10dffc38..4738d6b428e 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -958,33 +958,69 @@
>[(set_attr "type" "mmxadd")
> (set_attr "mode" "DI")])
>
> -(define_insn "mmx_ashr3"
> -  [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
> +(define_expand "mmx_ashr3"
> +  [(set (match_operand:MMXMODE24 0 "register_operand")
>  (ashiftrt:MMXMODE24
> - (match_operand:MMXMODE24 1 "register_operand" "0")
> - (match_operand:DI 2 "nonmemory_operand" "yN")))]
> -  "TARGET_MMX"
> -  "psra\t{%2, %0|%0, %2}"
> -  [(set_attr "type" "mmxshft")
> + (match_operand:MMXMODE24 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX || TARGET_MMX_WITH_SSE")
> +
> +(define_expand "ashr3"
> +  [(set (match_operand:MMXMODE24 0 "register_operand")
> +(ashiftrt:MMXMODE24
> + (match_operand:MMXMODE24 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX_WITH_SSE")
> +
> +(define_insn "*ashr3"
> +  [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
> +(ashiftrt:MMXMODE24
> + (match_operand:MMXMODE24 1 "register_operand" "0,0,Yv")
> + (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
> +  "TARGET_MMX || TARGET_MMX_WITH_SSE"
> +  "@
> +   psra\t{%2, %0|%0, %2}
> +   psra\t{%2, %0|%0, %2}
> +   vpsra\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> +   (set_attr "type" "mmxshft,sseishft,sseishft")
> (set (attr "length_immediate")
>   (if_then_else (match_operand 2 "const_int_operand")
> (const_string "1")
> (const_string "0")))
> -   (set_attr "mode" "DI")])
> +   (set_attr "mode" "DI,TI,TI")])
>
> -(define_insn "mmx_3"
> -  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
> +(define_expand "mmx_3"
> +  [(set (match_operand:MMXMODE248 0 "register_operand")
>  (any_lshift:MMXMODE248
> - (match_operand:MMXMODE248 1 "register_operand" "0")
> - (match_operand:DI 2 "nonmemory_operand" "yN")))]
> -  "TARGET_MMX"
> -  "p\t{%2, %0|%0, %2}"
> -  [(set_attr "type" "mmxshft")
> + (match_operand:MMXMODE248 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX || TARGET_MMX_WITH_SSE")
> +
> +(define_expand "3"
> +  [(set (match_operand:MMXMODE248 0 "register_operand")
> +(any_lshift:MMXMODE248
> + (match_operand:MMXMODE248 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX_WITH_SSE")
> +
> +(define_insn "*3"
> +  [(set (match_operand:MMXMODE248 0 "register_operand" "=y,x,Yv")
> +(any_lshift:MMXMODE248
> + (match_operand:MMXMODE248 1 "register_operand" "0,0,Yv")
> + (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
> +  "TARGET_MMX || TARGET_MMX_WITH_SSE"
> +  "@
> +   p\t{%2, %0|%0, %2}
> +   p\t{%2, %0|%0, %2}
> +   vp\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> +   (set_attr "type" "mmxshft,sseishft,sseishft")
> (set (attr "length_immediate")
>   (if_then_else (match_operand 2 "const_int_operand")
> (const_string "1")
> (const_string "0")))
> -   (set_attr "mode" "DI")])
> +   (set_attr "mode" "DI,TI,TI")])
>
>  ;
>  ;;
> --
> 2.20.1
>


[PATCH] Enforce LWG DR 2566 requirement for container adaptors

2019-02-14 Thread Jonathan Wakely

Although there is no good use for stack> or similar
types with a mismatched value_type, it's possible somebody is doing that
and getting away with it currently. This patch only enforces the new
requirement for C++17 and later. During stage 1 we should consider
enforcing it for C++11 and C++14.

* doc/xml/manual/intro.xml: Document LWG 2566 status.
* include/bits/stl_queue.h (queue, priority_queue): Add static
assertions to enforce LWG 2566 requirement on value_type.
* include/bits/stl_stack.h (stack): Likewise.

Tested powerpc64le-linux, committed to trunk.
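
As a rough illustration of the requirement being enforced, the check boils
down to comparing the adaptor's element type with the value_type of the
underlying container.  The helper name adaptor_value_type_ok below is
hypothetical and is not part of libstdc++; it only mirrors the is_same
condition used in the static assertions of this patch.

```cpp
#include <cassert>
#include <deque>
#include <type_traits>
#include <vector>

// Hypothetical helper (not a libstdc++ name): true when the element type T
// requested for an adaptor matches the value_type of its container.
template<typename T, typename Sequence>
constexpr bool adaptor_value_type_ok()
{ return std::is_same<T, typename Sequence::value_type>::value; }

// A matching pair satisfies the LWG 2566 requirement.
static_assert(adaptor_value_type_ok<int, std::deque<int>>(),
	      "value_type matches the underlying container");

// A mismatched pair is what the new static assertions reject in C++17.
static_assert(!adaptor_value_type_ok<int, std::vector<long>>(),
	      "value_type differs from the underlying container");
```

With the patch applied, a declaration such as std::stack<int, std::deque<long>>
would be expected to fail to compile in C++17 mode for the same reason.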


commit 69e8bd2a9cbae0b3e19abcd36615fc9e662db947
Author: Jonathan Wakely 
Date:   Thu Feb 14 09:58:17 2019 +

Enforce LWG DR 2566 requirement for container adaptors

Although there is no good use for stack> or similar
types with a mismatched value_type, it's possible somebody is doing that
and getting away with it currently. This patch only enforces the new
requirement for C++17 and later. During stage 1 we should consider
enforcing it for C++11 and C++14.

* doc/xml/manual/intro.xml: Document LWG 2566 status.
* include/bits/stl_queue.h (queue, priority_queue): Add static
assertions to enforce LWG 2566 requirement on value_type.
* include/bits/stl_stack.h (stack): Likewise.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 71050a0cebc..2a3231f4eb4 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1120,11 +1120,18 @@ requirements of the license of GCC.
 ill-formed.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#2537">2537:
+   Requirements on the first template parameter of container 
adaptors
+   
+
+Add static assertions to enforce the requirement.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#2583">2583:
There is no way to supply an allocator for 
basic_string(str, pos)

 
-Add new constructor
+Add new constructor.
 
 
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#2684">2684:
diff --git a/libstdc++-v3/include/bits/stl_queue.h b/libstdc++-v3/include/bits/stl_queue.h
index 6d092c9bbfe..1eb56810edc 100644
--- a/libstdc++-v3/include/bits/stl_queue.h
+++ b/libstdc++-v3/include/bits/stl_queue.h
@@ -118,7 +118,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       template<typename _Alloc>
         using _Uses = typename
           enable_if<uses_allocator<_Sequence, _Alloc>::value>::type;
-#endif
+
+#if __cplusplus >= 201703L
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2566. Requirements on the first template parameter of container
+  // adaptors
+  static_assert(is_same<_Tp, typename _Sequence::value_type>::value,
+ "value_type must be the same as the underlying container");
+#endif // C++17
+#endif // C++11
 
 public:
   typedef typename _Sequence::value_type   value_type;
@@ -451,17 +459,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       template<typename _Alloc>
         using _Uses = typename
           enable_if<uses_allocator<_Sequence, _Alloc>::value>::type;
-#endif
+
+#if __cplusplus >= 201703L
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2566. Requirements on the first template parameter of container
+  // adaptors
+  static_assert(is_same<_Tp, typename _Sequence::value_type>::value,
+ "value_type must be the same as the underlying container");
+#endif // C++17
+#endif // C++11
 
 public:
   typedef typename _Sequence::value_type   value_type;
-  typedef typename _Sequence::reference reference;
-  typedef typename _Sequence::const_reference const_reference;
-  typedef typename _Sequence::size_type size_type;
-  typedef  _Sequence   container_type;
+  typedef typename _Sequence::referencereference;
+  typedef typename _Sequence::const_reference  const_reference;
+  typedef typename _Sequence::size_typesize_type;
+  typedef  _Sequence   container_type;
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // DR 2684. priority_queue lacking comparator typedef
-  typedef _Compare value_compare;
+  typedef _Compare value_compare;
 
 protected:
   //  See queue::c for notes on these names.
diff --git a/libstdc++-v3/include/bits/stl_stack.h b/libstdc++-v3/include/bits/stl_stack.h
index e8443a78a05..28faab2e871 100644
--- a/libstdc++-v3/include/bits/stl_stack.h
+++ b/libstdc++-v3/include/bits/stl_stack.h
@@ -120,7 +120,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       template<typename _Alloc>
         using _Uses = typename
           enable_if<uses_allocator<_Sequence, _Alloc>::value>::type;
-#endif
+
+#if __cplusplus >= 201703L
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2566. Requirements on the first template parameter of container
+  // adaptors
+  static_assert(is_same<_Tp, typename _Sequence::value_type>::value,

[PATCH] LWG 2537 fix priority_queue constructors to establish invariant

2019-02-14 Thread Jonathan Wakely

This change is safe to make now (in stage 4), because the constructors
are currently incorrect and unusable (unless the supplied container
already contains a heap, in which case the new make_heap calls are
redundant but harmless).

* doc/xml/manual/intro.xml: Document LWG 2537 status.
* include/bits/stl_queue.h
(priority_queue(const Compare&, const Container&, const Alloc&))
(priority_queue(const Compare&, Container&&, const Alloc&)): Call
make_heap.
* testsuite/23_containers/priority_queue/dr2537.cc: New test.

Tested powerpc64le-linux, committed to trunk.
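
A minimal sketch of what the fix guarantees, modelled on the new dr2537.cc
test: a derived type exposes the protected members so that the heap
invariant can be inspected after using the allocator-extended constructors.
The names InspectableQueue and allocator_ctor_establishes_heap are made up
for this illustration.  With a library that implements LWG 2537 the function
should return true; the input is sorted ascending, so it is not already a
max-heap.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical wrapper: exposes the protected members 'c' and 'comp' so the
// heap invariant can be checked from outside.
struct InspectableQueue : std::priority_queue<int>
{
  using std::priority_queue<int>::priority_queue;

  bool holds_heap() const
  { return std::is_heap(c.begin(), c.end(), comp); }
};

// Build queues through the (compare, container, allocator) constructors and
// report whether both end up holding a valid heap with the maximum on top.
inline bool allocator_ctor_establishes_heap()
{
  const std::less<int> cmp{};
  const std::vector<int> data{ 2, 3, 5, 7, 11, 13, 17, 19, 23 };
  const std::allocator<int> alloc{};

  InspectableQueue by_copy(cmp, data, alloc);

  std::vector<int> scratch = data;
  InspectableQueue by_move(cmp, std::move(scratch), alloc);

  return by_copy.holds_heap() && by_move.holds_heap()
	 && by_copy.top() == 23 && by_move.top() == 23;
}
```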

commit 00eb831064ca6c6cfff48b2f3f8b1217b0c5d527
Author: Jonathan Wakely 
Date:   Thu Feb 14 10:52:56 2019 +

LWG 2537 fix priority_queue constructors to establish invariant

This change is safe to make now (in stage 4), because the constructors
are currently incorrect and unusable (unless the supplied container
already contains a heap, in which case the new make_heap calls are
redundant but harmless).

* doc/xml/manual/intro.xml: Document LWG 2537 status.
* include/bits/stl_queue.h
(priority_queue(const Compare&, const Container&, const Alloc&))
(priority_queue(const Compare&, Container&&, const Alloc&)): Call
make_heap.
* testsuite/23_containers/priority_queue/dr2537.cc: New test.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 2a3231f4eb4..656e32b00aa 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1127,6 +1127,14 @@ requirements of the license of GCC.
 Add static assertions to enforce the requirement.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#2566">2566:
+   Constructors for priority_queue taking allocators
+should call make_heap
+   
+
+Call make_heap.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#2583">2583:
There is no way to supply an allocator for 
basic_string(str, pos)

diff --git a/libstdc++-v3/include/bits/stl_queue.h b/libstdc++-v3/include/bits/stl_queue.h
index 1eb56810edc..dd1d5d9727a 100644
--- a/libstdc++-v3/include/bits/stl_queue.h
+++ b/libstdc++-v3/include/bits/stl_queue.h
@@ -520,14 +520,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
priority_queue(const _Compare& __x, const _Alloc& __a)
: c(__a), comp(__x) { }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2537. Constructors [...] taking allocators should call make_heap
       template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
priority_queue(const _Compare& __x, const _Sequence& __c,
   const _Alloc& __a)
-   : c(__c, __a), comp(__x) { }
+   : c(__c, __a), comp(__x)
+   { std::make_heap(c.begin(), c.end(), comp); }
 
       template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
priority_queue(const _Compare& __x, _Sequence&& __c, const _Alloc& __a)
-   : c(std::move(__c), __a), comp(__x) { }
+   : c(std::move(__c), __a), comp(__x)
+   { std::make_heap(c.begin(), c.end(), comp); }
 
       template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
priority_queue(const priority_queue& __q, const _Alloc& __a)
diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/dr2537.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/dr2537.cc
new file mode 100644
index 000..ecb51780ee5
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/priority_queue/dr2537.cc
@@ -0,0 +1,50 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <queue>
+#include <testsuite_hooks.h>
+
+struct Q : std::priority_queue<int>
+{
+  using priority_queue::priority_queue;
+
+  bool is_heap() const
+  { return std::is_heap(c.begin(), c.end()); }
+};
+
+void
+test01()
+{
+  const Q::value_compare cmp;
+  const Q::container_type c{ 2, 3, 5, 7, 11, 13, 17, 19, 23 };
+  const Q::container_type::allocator_type a;
+
+  Q q1(cmp, c, a);
+  VERIFY( q1.is_heap() );
+
+  auto c2 = c;
+  Q q2(cmp, std::move(c2), a);
+  VERIFY( q2.is_heap() );
+}
+
+int
+main()
+{
+  test01();
+}


[PATCH] Add std::timespec and std::timespec_get for C++17

2019-02-14 Thread Jonathan Wakely

* configure.ac: Check for C11 timespec_get function.
* crossconfig.m4 (freebsd, linux, gnu, cygwin, solaris, netbsd)
(openbsd): Likewise.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_global/ctime (timespec, timespec_get): Add to namespace
std for C++17 and up.

Tested powerpc64le-linux (and bootstrapped on aarch64-linux-gnu cross
compiler), committed to trunk.
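
For reference, a small usage sketch of the names this patch adds to
namespace std.  It assumes a C++17 compiler and a C library that provides
the C11 timespec_get function (which is what the new configure check
detects); the wrapper name read_utc_time is made up for this example.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical wrapper: fills TS with the current calendar time and reports
// success.  timespec_get returns its base argument on success, 0 on failure.
inline bool read_utc_time(std::timespec& ts)
{
  return std::timespec_get(&ts, TIME_UTC) == TIME_UTC;
}
```

On targets where the function is missing, _GLIBCXX_HAVE_TIMESPEC_GET is not
defined and the std:: names are not declared, so portable code would still
need its own fallback.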


commit 9c433c582ce9ca7a12a61a1e674ca7b3885318d8
Author: Jonathan Wakely 
Date:   Thu Feb 14 11:06:46 2019 +

Add std::timespec and std::timespec_get for C++17

* configure.ac: Check for C11 timespec_get function.
* crossconfig.m4 (freebsd, linux, gnu, cygwin, solaris, netbsd)
(openbsd): Likewise.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_global/ctime (timespec, timespec_get): Add to namespace
std for C++17 and up.

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index ad5b4117cfd..6c98f270441 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -265,6 +265,9 @@ if $GLIBCXX_IS_NATIVE; then
   AC_CHECK_FUNCS(aligned_alloc posix_memalign memalign _aligned_malloc)
   AC_CHECK_FUNCS(_wfopen)
 
+  # C11 functions for C++17 library
+  AC_CHECK_FUNCS(timespec_get)
+
   # For iconv support.
   AM_ICONV
 
diff --git a/libstdc++-v3/crossconfig.m4 b/libstdc++-v3/crossconfig.m4
index 3de40dc138b..4a303008053 100644
--- a/libstdc++-v3/crossconfig.m4
+++ b/libstdc++-v3/crossconfig.m4
@@ -135,6 +135,7 @@ case "${host}" in
 fi
 AC_CHECK_FUNCS(__cxa_thread_atexit)
 AC_CHECK_FUNCS(aligned_alloc posix_memalign memalign _aligned_malloc)
+AC_CHECK_FUNCS(timespec_get)
 ;;
 
   *-fuchsia*)
@@ -194,6 +195,7 @@ case "${host}" in
 GCC_CHECK_TLS
 AC_CHECK_FUNCS(__cxa_thread_atexit_impl)
 AC_CHECK_FUNCS(aligned_alloc posix_memalign memalign _aligned_malloc)
+AC_CHECK_FUNCS(timespec_get)
 AM_ICONV
 ;;
   *-mingw32*)
@@ -221,6 +223,7 @@ case "${host}" in
   AC_DEFINE(HAVE_ISNANL)
 fi
 AC_CHECK_FUNCS(aligned_alloc posix_memalign memalign _aligned_malloc)
+AC_CHECK_FUNCS(timespec_get)
 ;;
   *-qnx6.1* | *-qnx6.2*)
 SECTION_FLAGS='-ffunction-sections -fdata-sections'
diff --git a/libstdc++-v3/include/c_global/ctime b/libstdc++-v3/include/c_global/ctime
index cdd3d8d7171..685c821b577 100644
--- a/libstdc++-v3/include/c_global/ctime
+++ b/libstdc++-v3/include/c_global/ctime
@@ -72,4 +72,13 @@ namespace std
   using ::strftime;
 } // namespace
 
+#if __cplusplus >= 201703L && defined(_GLIBCXX_HAVE_TIMESPEC_GET)
+#undef timespec_get
+namespace std
+{
+  using ::timespec;
+  using ::timespec_get;
+} // namespace std
+#endif
+
 #endif


Re: [PATCH 15/40] i386: Emulate MMX sse_cvtpi2ps with SSE

2019-02-14 Thread Uros Bizjak
On Thu, Feb 14, 2019 at 1:30 PM H.J. Lu  wrote:
>
> Emulate MMX sse_cvtpi2ps with SSE2 cvtdq2ps, preserving upper 64 bits of
> destination XMM register.  Only SSE register source operand is allowed.
>
> PR target/89021
> * config/i386/mmx.md (sse_cvtpi2ps): Renamed to ...
> (*mmx_cvtpi2ps): This.  Disabled for TARGET_MMX_WITH_SSE.
> (sse_cvtpi2ps): New.
> (mmx_cvtpi2ps_sse): Likewise.

Now you can merge both instructions together using:

(clobber (match_scratch:V4SF 3 "=X,x,Yv"))

Please note "X" for the original case where scratch is not needed.

Uros.
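
To make the intended semantics concrete, here is a scalar C++ model of the
emulation described in the patch: convert all four integer lanes as
cvtdq2ps would, then merge so that only the low two lanes are taken from the
converted value and the upper 64 bits of the destination are preserved.
This is only a sketch of the data flow, not GCC code; the type aliases and
function names are invented for the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4SF = std::array<float, 4>;
using V4SI = std::array<std::int32_t, 4>;

// Scalar model of SSE2 cvtdq2ps: convert every 32-bit integer lane.
inline V4SF model_cvtdq2ps(const V4SI& src)
{
  V4SF out{};
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = static_cast<float>(src[i]);
  return out;
}

// Scalar model of the emulated cvtpi2ps: lanes 0-1 come from the converted
// scratch value, lanes 2-3 are the untouched upper half of the destination.
inline V4SF model_cvtpi2ps_emulated(const V4SF& dest, const V4SI& src)
{
  const V4SF scratch = model_cvtdq2ps(src);
  V4SF out = dest;
  out[0] = scratch[0];
  out[1] = scratch[1];
  return out;
}
```

Lanes 2-3 of the source register do not affect the result, which matches
the AVX path of the split selecting elements 0, 1, 6, 7 of the concatenation.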

> ---
>  gcc/config/i386/sse.md | 77 --
>  1 file changed, 75 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 083f9ef0f44..b1bab15af41 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4561,14 +4561,87 @@
>  ;;
>  ;
>
> -(define_insn "sse_cvtpi2ps"
> +(define_expand "sse_cvtpi2ps"
> +  [(set (match_operand:V4SF 0 "register_operand")
> +   (vec_merge:V4SF
> + (vec_duplicate:V4SF
> +   (float:V2SF (match_operand:V2SI 2 "nonimmediate_operand")))
> + (match_operand:V4SF 1 "register_operand")
> + (const_int 3)))]
> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
> +{
> +  if (TARGET_MMX_WITH_SSE)
> +{
> +  rtx op2 = force_reg (V2SImode, operands[2]);
> +  emit_insn (gen_mmx_cvtpi2ps_sse (operands[0], operands[1], op2));
> +  DONE;
> +}
> +})
> +
> +(define_insn_and_split "mmx_cvtpi2ps_sse"
> +  [(set (match_operand:V4SF 0 "register_operand" "=x,Yv")
> +   (vec_merge:V4SF
> + (vec_duplicate:V4SF
> +   (float:V2SF (match_operand:V2SI 2 "register_operand" "x,Yv")))
> + (match_operand:V4SF 1 "register_operand" "0,Yv")
> + (const_int 3)))
> +   (clobber (match_scratch:V4SF 3 "=x,Yv"))]
> +  "TARGET_MMX_WITH_SSE"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx op2 = lowpart_subreg (V4SImode, operands[2],
> +   GET_MODE (operands[2]));
> +  /* Generate SSE2 cvtdq2ps.  */
> +  rtx insn = gen_floatv4siv4sf2 (operands[3], op2);
> +  emit_insn (insn);
> +
> +  /* Merge operands[3] with operands[0].  */
> +  rtx mask, op1;
> +  if (TARGET_AVX)
> +{
> +  mask = gen_rtx_PARALLEL (VOIDmode,
> +  gen_rtvec (4, GEN_INT (0), GEN_INT (1),
> + GEN_INT (6), GEN_INT (7)));
> +  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[3], operands[1]);
> +  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
> +  insn = gen_rtx_SET (operands[0], op2);
> +}
> +  else
> +{
> +  /* NB: SSE can only concatenate OP0 and OP3 to OP0.  */
> +  mask = gen_rtx_PARALLEL (VOIDmode,
> +  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
> + GEN_INT (4), GEN_INT (5)));
> +  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[0], operands[3]);
> +  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
> +  insn = gen_rtx_SET (operands[0], op2);
> +  emit_insn (insn);
> +
> +  /* Swap bits 0:63 with bits 64:127.  */
> +  mask = gen_rtx_PARALLEL (VOIDmode,
> +  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
> + GEN_INT (0), GEN_INT (1)));
> +  rtx dest = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> +  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
> +  insn = gen_rtx_SET (dest, op1);
> +}
> +  emit_insn (insn);
> +  DONE;
> +}
> +  [(set_attr "isa" "noavx,avx")
> +   (set_attr "type" "ssecvt")
> +   (set_attr "mode" "V4SF")])
> +
> +(define_insn "*mmx_cvtpi2ps"
>[(set (match_operand:V4SF 0 "register_operand" "=x")
> (vec_merge:V4SF
>   (vec_duplicate:V4SF
> (float:V2SF (match_operand:V2SI 2 "nonimmediate_operand" "ym")))
>   (match_operand:V4SF 1 "register_operand" "0")
>   (const_int 3)))]
> -  "TARGET_SSE"
> +  "TARGET_SSE && !TARGET_MMX_WITH_SSE"
>"cvtpi2ps\t{%2, %0|%0, %2}"
>[(set_attr "type" "ssecvt")
> (set_attr "mode" "V4SF")])
> --
> 2.20.1
>


[PATCH] Add testcases for multiple -fsanitize=, -fno-sanitize= or -fno-sanitize-recover= options (take 2)

2019-02-14 Thread Jakub Jelinek
On Thu, Feb 14, 2019 at 05:48:29AM -0800, H.J. Lu wrote:
> I got
> 
> UNRESOLVED: c-c++-common/ubsan/opts-1.c   -O2 -flto
> -fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-times
> optimized "__ubsan_handle_divrem_overflow" 2

Ah, yes, UNRESOLVED results aren't visible when running the tests by hand
rather than via test_summary.  Here is an updated patch that adds the
needed dg-skip-if directives.  Ok for trunk?

2019-02-14  Jakub Jelinek  

* c-c++-common/ubsan/opts-1.c: New test.
* c-c++-common/ubsan/opts-2.c: New test.
* c-c++-common/ubsan/opts-3.c: New test.
* c-c++-common/ubsan/opts-4.c: New test.

--- gcc/testsuite/c-c++-common/ubsan/opts-1.c.jj	2019-02-14 11:31:33.144895232 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-1.c   2019-02-14 11:33:23.049077585 +0100
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fsanitize=shift -fsanitize=float-divide-by-zero -fdump-tree-optimized" } */
+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_shift_out_of_bounds" 1 "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+float
+baz (float x, float y)
+{
+  return x / y;
+}
--- gcc/testsuite/c-c++-common/ubsan/opts-2.c.jj	2019-02-14 11:33:29.806965829 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-2.c   2019-02-14 11:34:03.169414166 +0100
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize=shift -fsanitize=float-divide-by-zero -fdump-tree-optimized" } */
+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__ubsan_handle_shift_out_of_bounds" "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+float
+baz (float x, float y)
+{
+  return x / y;
+}
--- gcc/testsuite/c-c++-common/ubsan/opts-3.c.jj	2019-02-14 11:34:10.538292322 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-3.c   2019-02-14 11:34:35.512879358 +0100
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize=shift -fno-sanitize=float-divide-by-zero -fdump-tree-optimized" } */
+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__ubsan_handle_shift_out_of_bounds" "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+float
+baz (float x, float y)
+{
+  return x / y;
+}
--- gcc/testsuite/c-c++-common/ubsan/opts-4.c.jj	2019-02-14 11:40:35.771922337 +0100
+++ gcc/testsuite/c-c++-common/ubsan/opts-4.c   2019-02-14 11:40:29.220030674 +0100
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=integer-divide-by-zero -fno-sanitize-recover=shift -fdump-tree-optimized" } */
+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_divrem_overflow_abort" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_shift_out_of_bounds_abort" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_type_mismatch_v1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__ubsan_handle_type_mismatch_v1_abort" "optimized" } } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+int
+bar (int x, int y)
+{
+  return x << y;
+}
+
+enum E { E0, E1, E2, E3 };
+
+enum E
+baz (enum E *x)
+{
+  return *x;
+}


Jakub
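
The behaviour that opts-1.c through opts-3.c above check can be pictured as
a bitmask that each option updates in command-line order.  The sketch below
is a toy model only: the flag names, the lookup table and apply_options are
invented for illustration and do not reflect GCC's actual option-handling
code.  It assumes, as the tests' expected counts imply, that "undefined"
covers shift and integer division but not float-divide-by-zero.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flag bits; the real sanitizer set is much larger.
enum : std::uint32_t {
  SAN_SHIFT        = 1u << 0,
  SAN_INT_DIVIDE   = 1u << 1,
  SAN_FLOAT_DIVIDE = 1u << 2,
  // Assumption: the "undefined" group excludes float-divide-by-zero.
  SAN_UNDEFINED    = SAN_SHIFT | SAN_INT_DIVIDE
};

// Map an option argument to its bits; unknown names map to no bits.
inline std::uint32_t lookup(const std::string& name)
{
  if (name == "undefined") return SAN_UNDEFINED;
  if (name == "shift") return SAN_SHIFT;
  if (name == "integer-divide-by-zero") return SAN_INT_DIVIDE;
  if (name == "float-divide-by-zero") return SAN_FLOAT_DIVIDE;
  return 0;
}

// Each pair is (enable, name): true models -fsanitize=NAME, false models
// -fno-sanitize=NAME.  Options are applied left to right.
inline std::uint32_t
apply_options(const std::vector<std::pair<bool, std::string>>& opts)
{
  std::uint32_t mask = 0;
  for (const auto& opt : opts)
    {
      const std::uint32_t bits = lookup(opt.second);
      mask = opt.first ? (mask | bits) : (mask & ~bits);
    }
  return mask;
}
```

Under this model the opts-3.c command line leaves only integer division
instrumented, which is consistent with the single expected
__ubsan_handle_divrem_overflow call in that test.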


Re: [PATCH 25/40] i386: Emulate MMX movntq with SSE2 movntidi

2019-02-14 Thread Uros Bizjak
On Thu, Feb 14, 2019 at 1:30 PM H.J. Lu  wrote:
>
> Emulate MMX movntq with SSE2 movntidi.  Only SSE register source operand
> is allowed.

There is no SSE register source operand. Probably "Only register
source operand is allowed."

Uros.

>
> PR target/89021
> * config/i386/mmx.md (sse_movntq): Add SSE2 emulation.
> ---
>  gcc/config/i386/mmx.md | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 0c08aebb071..274e895f51e 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -214,12 +214,16 @@
>  })
>
>  (define_insn "sse_movntq"
> -  [(set (match_operand:DI 0 "memory_operand" "=m")
> -   (unspec:DI [(match_operand:DI 1 "register_operand" "y")]
> +  [(set (match_operand:DI 0 "memory_operand" "=m,m")
> +   (unspec:DI [(match_operand:DI 1 "register_operand" "y,r")]
>UNSPEC_MOVNTQ))]
> -  "TARGET_SSE || TARGET_3DNOW_A"
> -  "movntq\t{%1, %0|%0, %1}"
> -  [(set_attr "type" "mmxmov")
> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> +   && (TARGET_SSE || TARGET_3DNOW_A)"
> +  "@
> +   movntq\t{%1, %0|%0, %1}
> +   movnti\t{%1, %0|%0, %1}"
> +  [(set_attr "mmx_isa" "native,x64")
> +   (set_attr "type" "mmxmov,ssemov")
> (set_attr "mode" "DI")])
>
>  ;
> --
> 2.20.1
>


Re: [PATCH 31/40] i386: Emulate MMX pshufb with SSE version

2019-02-14 Thread Uros Bizjak
On Thu, Feb 14, 2019 at 1:30 PM H.J. Lu  wrote:
>
> Emulate MMX version of pshufb with SSE version by masking out the bit 3
> of the shuffle control byte.  Only SSE register source operand is allowed.
>
> PR target/89021
> * config/i386/sse.md (ssse3_pshufbv8qi3): Renamed to ...
> (ssse3_pshufbv8qi3_mmx): This.
> (ssse3_pshufbv8qi3): New.
> (ssse3_pshufbv8qi3_sse): Likewise.

These insns can also be merged together using

 (clobber (match_scratch:V4SI 3 "=X,x,Yv"))

Uros.
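
As a sanity check of the masking trick, here is a scalar C++ model of both
pshufb forms.  In the 64-bit form the low three bits of each control byte
select the source byte; in the 128-bit form the low four bits do; in both,
bit 7 zeroes the result byte.  Clearing bit 3 (mask 0xf7) therefore keeps
the 128-bit form from reaching into the upper eight bytes.  All names below
are invented for this sketch; it models the semantics only and is not GCC
code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8QI = std::array<std::uint8_t, 8>;
using V16QI = std::array<std::uint8_t, 16>;

// Scalar model of the 64-bit (MMX) pshufb.
inline V8QI model_pshufb_mmx(const V8QI& a, const V8QI& ctrl)
{
  V8QI r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (ctrl[i] & 0x80) ? std::uint8_t{0} : a[ctrl[i] & 0x07];
  return r;
}

// Scalar model of the 128-bit (SSE) pshufb.
inline V16QI model_pshufb_sse(const V16QI& a, const V16QI& ctrl)
{
  V16QI r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (ctrl[i] & 0x80) ? std::uint8_t{0} : a[ctrl[i] & 0x0f];
  return r;
}

// Emulation: widen both operands, mask bit 3 out of every control byte,
// run the 128-bit shuffle and keep the low eight result bytes.
inline V8QI model_pshufb_emulated(const V8QI& a, const V8QI& ctrl)
{
  V16QI wide_a{};
  V16QI wide_ctrl{};
  for (std::size_t i = 0; i < 8; ++i)
    {
      wide_a[i] = a[i];
      wide_ctrl[i] = static_cast<std::uint8_t>(ctrl[i] & 0xf7);
      // Fill the unused upper half with a marker value to show that it
      // never leaks into the result.
      wide_a[i + 8] = 0xee;
    }
  const V16QI wide = model_pshufb_sse(wide_a, wide_ctrl);
  V8QI r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = wide[i];
  return r;
}
```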
> ---
>  gcc/config/i386/sse.md | 56 --
>  1 file changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index cc7dbe79fa7..a92505c54a1 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -15722,18 +15722,70 @@
> (set_attr "btver2_decode" "vector")
> (set_attr "mode" "")])
>
> -(define_insn "ssse3_pshufbv8qi3"
> +(define_expand "ssse3_pshufbv8qi3"
> +  [(set (match_operand:V8QI 0 "register_operand")
> +   (unspec:V8QI [(match_operand:V8QI 1 "register_operand")
> + (match_operand:V8QI 2 "nonimmediate_operand")]
> +UNSPEC_PSHUFB))]
> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
> +{
> +  if (TARGET_MMX_WITH_SSE)
> +{
> +  rtx op2 = force_reg (V8QImode, operands[2]);
> +  emit_insn (gen_ssse3_pshufbv8qi3_sse (operands[0], operands[1],
> +   op2));
> +  DONE;
> +}
> +})
> +
> +(define_insn "ssse3_pshufbv8qi3_mmx"
>[(set (match_operand:V8QI 0 "register_operand" "=y")
> (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
>   (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
>  UNSPEC_PSHUFB))]
> -  "TARGET_SSSE3"
> +  "TARGET_SSSE3 && !TARGET_MMX_WITH_SSE"
>"pshufb\t{%2, %0|%0, %2}";
>[(set_attr "type" "sselog1")
> (set_attr "prefix_extra" "1")
> (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
> (set_attr "mode" "DI")])
>
> +(define_insn_and_split "ssse3_pshufbv8qi3_sse"
> +  [(set (match_operand:V8QI 0 "register_operand" "=x,Yv")
> +   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,Yv")
> + (match_operand:V8QI 2 "register_operand" "x,Yv")]
> +UNSPEC_PSHUFB))
> +   (clobber (match_scratch:V4SI 3 "=x,Yv"))]
> +  "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
> +  "#"
> +  "reload_completed"
> +  [(set (match_dup 3) (match_dup 5))
> +   (set (match_dup 3)
> +   (and:V4SI (match_dup 3) (match_dup 2)))
> +   (set (match_dup 0)
> +   (unspec:V16QI [(match_dup 1) (match_dup 4)] UNSPEC_PSHUFB))]
> +{
> +  /* Emulate MMX version of pshufb with SSE version by masking out the
> + bit 3 of the shuffle control byte.  */
> +  operands[0] = lowpart_subreg (V16QImode, operands[0],
> +   GET_MODE (operands[0]));
> +  operands[1] = lowpart_subreg (V16QImode, operands[1],
> +   GET_MODE (operands[1]));
> +  operands[2] = lowpart_subreg (V4SImode, operands[2],
> +   GET_MODE (operands[2]));
> +  operands[4] = lowpart_subreg (V16QImode, operands[3],
> +   GET_MODE (operands[3]));
> +  rtvec par = gen_rtvec (4, GEN_INT (0xf7f7f7f7),
> +GEN_INT (0xf7f7f7f7),
> +GEN_INT (0xf7f7f7f7),
> +GEN_INT (0xf7f7f7f7));
> +  rtx vec_const = gen_rtx_CONST_VECTOR (V4SImode, par);
> +  operands[5] = force_const_mem (V4SImode, vec_const);
> +}
> +  [(set_attr "mmx_isa" "x64_noavx,x64_avx")
> +   (set_attr "type" "sselog1")
> +   (set_attr "mode" "TI,TI")])
> +
>  (define_insn "_psign3"
>[(set (match_operand:VI124_AVX2 0 "register_operand" "=x,x")
> (unspec:VI124_AVX2
> --
> 2.20.1
>


Re: Move -Wmaybe-uninitialized to -Wextra

2019-02-14 Thread Tom Tromey
> "Marc" == Marc Glisse  writes:

>> Lastly, in the case of uninitialized variables, the usual solution
>> of initializing them is trivial and always safe (some coding styles
>> even require it).

Marc> Here it shows that we don't work with the same type of code at all. If
Marc> I am using a boost::optional, i.e. a class with a buffer and a boolean
Marc> that says if the buffer is initialized, how do I initialize the
Marc> (private) buffer? Or should boost itself zero out the buffer whenever
Marc> the boolean is set to false?

This is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80635 (I know you
know, but maybe others on the thread don't).

I think in this specific case (std::optional and similar classes), GCC
should provide a way for the class to indicate that
-Wmaybe-uninitialized should not apply to the payload.

>> A shared definition of a false positive should be one of the very
>> first steps to coming closer to a consensus.  Real world (as opposed
>> to anecdotal) data on the actual rates of false positives
>> and negatives vs true positives would be also most helpful, as would
>> some consensus of the severity of the bugs the true positives
>> expose, as well as some objective measure of the ease of
>> suppression.  There probably are others but these would be a start.

Marc> This data is going to be super hard to get. Most projects have been
Marc> compiling for years and tweaking their code to avoid some warnings. We
Marc> do not get to see the code that people originally write, we can only
Marc> see what they commit.

gdb has gone through this over the years -- it turns on many warnings
and sometimes false positives show up.  Most of the time there's a
comment; for -Wmaybe-uninitialized, grep for "init.*gcc" in the source.
Unfortunately the comment isn't standardized, but I only get ~20 hits
for this in gdb, so it isn't really so bad in practice.
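
A minimal sketch of the kind of suppression meant here. The functions are hypothetical, not taken from gdb, and whether GCC actually warns on this exact shape depends on version and optimization level; the point is only the "initialize and leave a greppable comment" convention.

```cpp
// Hypothetical lookup that writes *out only on success.
bool lookup(int key, int* out) {
  if (key < 0)
    return false;
  *out = key * 2;
  return true;
}

int twice_or_zero(int key) {
  int value = 0;  /* init to silence gcc -Wmaybe-uninitialized */
  const bool found = lookup(key, &value);
  // value is only read when lookup reported success, so the initializer is
  // there for the compiler's benefit rather than for correctness.
  return found ? value : 0;
}
```

The comment text is chosen so that the "init.*gcc" grep would find it.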

Tom


Re: [Patch] [arm] Fix 88714, Arm LDRD/STRD peepholes

2019-02-14 Thread Kyrill Tkachov



On 2/11/19 2:35 PM, Matthew Malcomson wrote:

On 10/02/19 09:42, Christophe Lyon wrote:
>
> Both this simple patch and the previous one fix all the ICEs I reported, thanks.

>
> Of course, the scan-assembler failures remain to be fixed.
>

In the testcase I failed to account for targets that don't support arm
mode or targets that do not support the ldrd/strd instructions.

This patch accounts for both of these by adding some
dg-require-effective-target lines to the testcase.

This patch also adds a new effective-target procedure to check a target
supports arm ldrd/strd.
This check uses the 'r' constraint to ensure SP is not used so that it
will work for thumb mode code generation as well as arm mode.

Tested by running this testcase with cross compilers using "-march=armv5t",
"-mcpu=cortex-m3", "-mcpu=arm7tdmi", "-mcpu=cortex-a9 -march=armv5t" for
both arm-none-eabi and arm-none-linux-gnueabihf.
Also ran this testcase with `make check` natively.

Ok for trunk?


Ok.

Thanks,

Kyrill



gcc/testsuite/ChangeLog:

2019-02-11  Matthew Malcomson 

    * gcc.dg/rtl/arm/ldrd-peepholes.c: Restrict testcase.
    * lib/target-supports.exp: Add procedure to check for ldrd.



diff --git a/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c b/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
index 4c3949c0963b8482545df670c31db2d9ec0f26b3..cbb64a770f5d796250601cafe481d7c2ea13f2eb 100644
--- a/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
+++ b/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
@@ -1,4 +1,6 @@
  /* { dg-do compile { target arm*-*-* } } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
  /* { dg-skip-if "Ensure only targetting arm with TARGET_LDRD" { *-*-* } { "-mthumb" } { "" } } */
  /* { dg-options "-O3 -marm -fdump-rtl-peephole2" } */

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a0b4b99067f9ae225bde3b6bc719e89e1ea8e0e1..16dd018e8020fdf8e104690fed6a4e8919aa4aa1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4918,6 +4918,27 @@ proc check_effective_target_arm_prefer_ldrd_strd { } {
  }  "-O2 -mthumb" ]
  }

+# Return true if LDRD/STRD instructions are available on this target.
+proc check_effective_target_arm_ldrd_strd_ok { } {
+    if { ![check_effective_target_arm32] } {
+  return 0;
+    }
+
+    return [check_no_compiler_messages arm_ldrd_strd_ok object {
+  int main(void)
+  {
+    __UINT64_TYPE__ a = 1, b = 10;
+    __UINT64_TYPE__ *c = &b;
+    // `a` will be in a valid register since it's a DImode quantity.
+    asm ("ldrd %0, %1"
+ : "=r" (a)
+ : "m" (c));
+    return a == 10;
+  }
+    }]
+}
+
  # Return 1 if this is a PowerPC target supporting -meabi.

  proc check_effective_target_powerpc_eabi_ok { } {



[PATCH] i386: Check -mmanual-endbr in pass_insert_endbranch::gate

2019-02-14 Thread H.J. Lu
When -mmanual-endbr is used with -fcf-protection, only functions marked
with cf_check attribute should be instrumented with ENDBR.  We should
skip rest_of_insert_endbranch on functions without cf_check attribute.
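
To illustrate the intended behaviour, a minimal sketch, assuming an x86 GCC that knows the cf_check attribute. The DEMO_CF_CHECK macro and the function names are hypothetical; the guard only makes the snippet compile on compilers or targets that lack the attribute.

```cpp
// Use the attribute only where the compiler reports support for it.
#if defined(__has_attribute)
#  if __has_attribute(cf_check)
#    define DEMO_CF_CHECK __attribute__((cf_check))
#  endif
#endif
#ifndef DEMO_CF_CHECK
#  define DEMO_CF_CHECK
#endif

// Marked cf_check: with -fcf-protection -mmanual-endbr this is the only
// function here that is expected to get an ENDBR at its entry.
DEMO_CF_CHECK int indirect_target(int x) { return x + 1; }

// Not marked: under -mmanual-endbr the pass gate should now reject it, so
// no ENDBR is expected.
int direct_only(int x) { return x * 2; }

// Indirect call site that makes the marker relevant.
int call_through_pointer(int (*fn)(int), int x) { return fn(x); }
```

Compiling with "-O2 -fcf-protection -mmanual-endbr" and looking for endbr in the assembly is the same check the new test performs with scan-assembler-not.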

OK for trunk?

Thanks.

H.J.
---
gcc/

PR target/89353
* config/i386/i386.c (rest_of_insert_endbranch): Move the
-mmanual-endbr and cf_check attribute check to ...
(pass_insert_endbranch::gate): Here.

gcc/testsuite/

PR target/89353
* gcc.target/i386/cf_check-6.c: New test.
---
 gcc/config/i386/i386.c | 10 +-
 gcc/testsuite/gcc.target/i386/cf_check-6.c | 22 ++
 2 files changed, 27 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/cf_check-6.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fd05873ba39..a99ca23fffa 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2640,9 +2640,6 @@ rest_of_insert_endbranch (void)
 
   if (!lookup_attribute ("nocf_check",
 TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
-  && (!flag_manual_endbr
- || lookup_attribute ("cf_check",
-  DECL_ATTRIBUTES (cfun->decl)))
   && !cgraph_node::get (cfun->decl)->only_called_directly_p ())
 {
   /* Queue ENDBR insertion to x86_function_profiler.  */
@@ -2773,9 +2770,12 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *)
+  virtual bool gate (function *fun)
 {
-  return ((flag_cf_protection & CF_BRANCH));
+  return ((flag_cf_protection & CF_BRANCH)
+ && (!flag_manual_endbr
+ || lookup_attribute ("cf_check",
+  DECL_ATTRIBUTES (fun->decl))));
 }
 
   virtual unsigned int execute (function *)
diff --git a/gcc/testsuite/gcc.target/i386/cf_check-6.c b/gcc/testsuite/gcc.target/i386/cf_check-6.c
new file mode 100644
index 000..292b964238d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cf_check-6.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcf-protection -mmanual-endbr" } */
+/* { dg-final { scan-assembler-not {\mendbr} } } */
+
+int
+bar (int* val)
+{
+  int status = 99;
+
+  if((val == 0))
+{
+  status = 22;
+  goto end;
+}
+
+  extern int x;
+  *val = x;
+
+  status = 0;
+end:
+  return status;
+}
-- 
2.20.1



[PATCH] DR 2586 fix value category in uses-allocator checks

2019-02-14 Thread Jonathan Wakely

Because uses-allocator construction is invariably done with a const
lvalue the __uses_alloc helper should use a const lvalue for the
is_constructible checks. Otherwise, it can detect that the type can be
constructed from an rvalue, and then an error happens when a const
lvalue is passed to the constructor instead.

Prior to LWG DR 2586 scoped_allocator_adaptor incorrectly used an rvalue
type in the is_constructible check and then used a non-const lvalue for
the actual construction. The other components using uses-allocator
construction (tuple and polymorphic_allocator) have always done so with
a const lvalue allocator, although the use of __use_alloc in our
implementation meant they behaved the same as scoped_allocator_adaptor
and incorrectly used rvalues for the is_constructible checks.

In C++20 the P0591R4 changes mean that all uses-allocator construction
is defined in terms of the new uses_allocator_construction_args
functions, which always use a const lvalue allocator.

The changes in this patch ensure that the __use_alloc helper correctly
matches the requirements in the standard, consistently using a const
lvalue allocator for the is_constructible checks and the actual
constructor arguments.
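
A minimal sketch of the mismatch, using a hypothetical type (RvalueOnly is not the type from the LWG 2586 discussion or from the new tests). It only shows how the two is_constructible checks can disagree; it does not reproduce the __uses_alloc code.

```cpp
#include <memory>
#include <type_traits>

using Alloc = std::allocator<int>;

// Hypothetical type whose allocator-extended constructor only accepts an
// rvalue allocator.
struct RvalueOnly {
  using allocator_type = Alloc;
  RvalueOnly(int, Alloc&&) {}
};

// The old-style check asks about an rvalue allocator and says "yes" ...
constexpr bool checked_with_rvalue =
    std::is_constructible<RvalueOnly, int, Alloc>::value;

// ... but the construction passes a const lvalue allocator, which cannot
// bind to Alloc&&, so checking with const Alloc& gives the honest answer.
constexpr bool checked_with_const_lvalue =
    std::is_constructible<RvalueOnly, int, const Alloc&>::value;

static_assert(checked_with_rvalue, "rvalue check accepts the type");
static_assert(!checked_with_const_lvalue, "const lvalue check rejects it");
```

With the check done on const Alloc&, a type like this is reported as not constructible with the allocator, instead of being selected and then failing at the actual constructor call.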

* doc/xml/manual/intro.xml: Document LWG 2586 status.
* include/bits/uses_allocator.h (__uses_alloc): Use const lvalue
allocator type in is_constructible checks.
* testsuite/20_util/scoped_allocator/69293_neg.cc: Adjust dg-error.
* testsuite/20_util/scoped_allocator/dr2586.cc: New test.
* testsuite/20_util/tuple/cons/allocators.cc: Add test using
problematic type from LWG 2586 discussion.
* testsuite/20_util/uses_allocator/69293_neg.cc: Adjust dg-error.
* testsuite/20_util/uses_allocator/cons_neg.cc: Likewise.

Tested powerpc64le-linux, committed to trunk.


commit 4e3200491f4cde4ae884a28bb11ece782ca17997
Author: Jonathan Wakely 
Date:   Thu Feb 14 11:17:22 2019 +

DR 2586 fix value category in uses-allocator checks

Because uses-allocator construction is invariably done with a const
lvalue the __uses_alloc helper should use a const lvalue for the
is_constructible checks. Otherwise, it can detect that the type can be
constructed from an rvalue, and then an error happens when a const
lvalue is passed to the constructor instead.

Prior to LWG DR 2586 scoped_allocator_adaptor incorrectly used an rvalue
type in the is_constructible check and then used a non-const lvalue for
the actual construction. The other components using uses-allocator
construction (tuple and polymorphic_allocator) have always done so with
a const lvalue allocator, although the use of __use_alloc in our
implementation meant they behaved the same as scoped_allocator_adaptor
and incorrectly used rvalues for the is_constructible checks.

In C++20 the P0591R4 changes mean that all uses-allocator construction
is defined in terms of the new uses_allocator_construction_args
functions, which always use a const lvalue allocator.

The changes in this patch ensure that the __use_alloc helper correctly
matches the requirements in the standard, consistently using a const
lvalue allocator for the is_constructible checks and the actual
constructor arguments.

* doc/xml/manual/intro.xml: Document LWG 2586 status.
* include/bits/uses_allocator.h (__uses_alloc): Use const lvalue
allocator type in is_constructible checks.
* testsuite/20_util/scoped_allocator/69293_neg.cc: Adjust dg-error.
* testsuite/20_util/scoped_allocator/dr2586.cc: New test.
* testsuite/20_util/tuple/cons/allocators.cc: Add test using
problematic type from LWG 2586 discussion.
* testsuite/20_util/uses_allocator/69293_neg.cc: Adjust dg-error.
* testsuite/20_util/uses_allocator/cons_neg.cc: Likewise.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 656e32b00aa..9761b82fd65 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1142,6 +1142,14 @@ requirements of the license of GCC.
 Add new constructor.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#2586">2586:
+   Wrong value category used in 
scoped_allocator_adaptor::construct()
+   
+
+Change internal helper for uses-allocator construction
+  to always check using const lvalue allocators.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#2684">2684:
priority_queue lacking comparator typedef

diff --git a/libstdc++-v3/include/bits/uses_allocator.h b/libstdc++-v3/include/bits/uses_allocator.h
index a118f695535..015828bee18 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -87,14 +87,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct __uses

[PATCH] Update libstdc++ documentation for implementation status

2019-02-14 Thread Jonathan Wakely

* doc/xml/manual/status_cxx2017.xml: Add P0063R3 to status table.
* doc/html/*: Regenerate.

Committed to trunk.

I've also updated the LibstdcxxTodo wiki page:
https://gcc.gnu.org/wiki/LibstdcxxTodo?action=diff&rev2=107&rev1=100


commit 5f7cebfc9de5a7c07c447d06b610002964065730
Author: Jonathan Wakely 
Date:   Thu Feb 14 15:10:28 2019 +

Update libstdc++ documentation for implementation status

* doc/xml/manual/status_cxx2017.xml: Add P0063R3 to status table.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index c9913a9e3a7..bb82e34bba7 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -90,6 +90,17 @@ Feature-testing recommendations for C++.
   __cpp_lib_uncaught_exceptions >= 201411
 
 
+
+   C++17 should refer to C11 instead of C99 
+  
+   http://www.w3.org/1999/xlink"; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0063r3.html";>
+   P0063R3
+   
+  
+   9.1 
+  
+
+
 
Variant: a type-safe union for C++17 
   


Re: [committed] Fix set_uids_in_ptset (PR middle-end/89303)

2019-02-14 Thread Rainer Orth
Hi Jakub,

> The following testcase is miscompiled on x86_64-linux (-m32 and -m64) at
> -O1, as a pointer has two vars in points-to set, the first one is escaped
> heap var and the second one is escaped non-heap var, and in the end the last
> var that sets vars_contains_escaped won and overwrote
> vars_contains_escaped_heap rather than oring into it.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> preapproved by Richard on IRC, committed to trunk.
> Will test 8.x backport tonight and commit to 8.3 if that succeeds.
>
> 2019-02-13  Jakub Jelinek  
>
>   PR middle-end/89303
>   * tree-ssa-structalias.c (set_uids_in_ptset): Or in vi->is_heap_var
>   into pt->vars_contains_escaped_heap instead of setting
>   pt->vars_contains_escaped_heap to it.
>
> 2019-02-13  Jonathan Wakely  
>   Jakub Jelinek  
>
>   PR middle-end/89303
>   * g++.dg/torture/pr89303.C: New test.

the new testcase FAILs on Solaris:

+FAIL: g++.dg/torture/pr89303.C   -O0  (test for excess errors)
+FAIL: g++.dg/torture/pr89303.C   -O1  (test for excess errors)
+FAIL: g++.dg/torture/pr89303.C   -O2  (test for excess errors)
+FAIL: g++.dg/torture/pr89303.C   -O2 -flto  (test for excess errors)
+FAIL: g++.dg/torture/pr89303.C   -O2 -flto -flto-partition=none  (test for excess errors)
+FAIL: g++.dg/torture/pr89303.C   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
+FAIL: g++.dg/torture/pr89303.C   -O3 -g  (test for excess errors)
+FAIL: g++.dg/torture/pr89303.C   -Os  (test for excess errors)

Excess errors:
ld: warning: symbol 'typeinfo for std::bad_weak_ptr' has differing sizes:
(file /var/tmp//ccB1o8Ya.o value=0x8; file /var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/./libstdc++-v3/src/.libs/libstdc++.so value=0xc);
/var/tmp//ccB1o8Ya.o definition taken

I suspect the class can just be renamed in pr89303.C to avoid the
conflict with include/bits/shared_ptr_base.h?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix excess warnings from -Wtype-limits with location wrappers (PR c++/88680)

2019-02-14 Thread Jason Merrill

On 2/6/19 9:23 PM, David Malcolm wrote:

PR c++/88680 reports excess warnings from -Wtype-limits after the C++
FE's use of location wrappers was extended in r267272 for cases such as:

   const unsigned n = 8;
   static_assert (n >= 0 && n % 2 == 0, "");

t.C:3:18: warning: comparison of unsigned expression >= 0 is always true
   [-Wtype-limits]
 3 | static_assert (n >= 0 && n % 2 == 0, "");
   |~~^~~~

The root cause is that the location wrapper around "n" breaks the
suppression of the warning for the "if OP0 is a constant that is >= 0"
case.

This patch fixes it by calling fold_for_warn on OP0, extracting the
constant.


Is there a reason not to do this for OP1 as well?

Jason


Re: PR87689, PowerPC64 ELFv2 function parameter passing violation

2019-02-14 Thread Segher Boessenkool
On Thu, Feb 14, 2019 at 10:32:50AM +0100, Richard Biener wrote:
> On Wed, Feb 13, 2019 at 7:59 AM Alan Modra  wrote:
> >
> > Covers for a generic fortran bug.  The effect is that we'll needlessly
> > waste 64 bytes of stack space on some calls, but I don't see any
> > simple and fully correct patch in generic code.  Bootstrapped and
> > regression tested powerpc64le-linux.  OK mainline and branches?
> 
> This looks very wrong to me ;)  It won't work when compiling with -flto
> for example.

Yeah, that is a show-stopper.

> The frontend needs to be properly fixed.

Sure, but until that happens our target suffers while it seems to work for
everyone else.  This won't be the first or last time a target needs an ugly
workaround, and this one is in target code even ;-)


Segher

