[PATCH] Fortran : ICE in gfc_conv_scalarized_array_ref PR53298

2020-07-20 Thread Mark Eggleston

Please find attached a fix for PR53298.

This appears to be a very simple fix.  However, since it involves
structures I'm unfamiliar with, I think it needs checking.


When the gfc_ref structure is created for a (1:) substring it has an
expression for start and a NULL for the end.  For (:5) the start
expression is NULL, so a new expression structure is created and
assigned to start.  If the end expression is missing, no new expression
structure is created.  I would have expected an expression for the
missing end to be created, but I don't know how it should be populated.


When the gfc_ref structure is processed in gfc_walk_array_ref two gfc_ss 
structures are created, one for the start expression and one for the end 
expression.  For (1:) the end expression is NULL and will lead to the 
ICE.  I added a check for the end expression being NULL and if so a 
gfc_ss structure is not created for substring end.  I did not expect the 
change to work but it did.
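
As a rough illustration of the guard described above (a sketch only: the
struct and function names below are simplified, hypothetical stand-ins and
not the real gfc_expr, gfc_ss or gfc_get_scalar_ss), an entry for the
substring end is added to the chain only when an end expression exists:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for gfc_expr and gfc_ss; not the real GCC types.  */
struct expr { int value; };
struct ss { struct expr *expr; struct ss *next; };

/* Prepend a scalar entry for EXPR to the chain NEXT.  */
static struct ss *
get_scalar_ss (struct ss *next, struct expr *expr)
{
  struct ss *s = malloc (sizeof *s);
  if (s == NULL)
    abort ();
  s->expr = expr;
  s->next = next;
  return s;
}

/* Add entries for a substring reference; END is NULL for (1:).  */
static struct ss *
walk_substring (struct ss *chain, struct expr *start, struct expr *end)
{
  assert (start != NULL);
  chain = get_scalar_ss (chain, start);
  if (end != NULL)		/* The kind of guard the patch adds.  */
    chain = get_scalar_ss (chain, end);
  return chain;
}

/* Count the entries in CHAIN.  */
static int
chain_length (const struct ss *chain)
{
  int n = 0;
  for (; chain != NULL; chain = chain->next)
    n++;
  return n;
}
```

With both bounds present the chain gains two entries; with a missing end
it gains one, so nothing later dereferences a NULL expression.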


The patch has been tested on x86_64 using make check-fortran.

If OK I can commit to master and backport.

Fortran  : ICE in gfc_conv_scalarized_array_ref PR53298

When an array of characters is an argument to a subroutine and
is accessed using (:)(1:) an ICE occurs.  The upper bound of the
substring does not have an expression and as such should not have
a Scalarization State structure added to the Scalarization State
chain.

2020-07-20  Mark Eggleston 

gcc/fortran/

    PR fortran/53298
    * trans-array.c (gfc_walk_array_ref): If ref->u.ss.end is set
    call gfc_get_scalar_ss.

2020-07-20  Mark Eggleston 

gcc/testsuite/

    PR fortran/53298
    * gfortran.dg/pr53298.f90: New test.


From aa1537ffa55d2c85c1f61df3301182627ca35ca3 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Fri, 17 Jul 2020 14:22:48 +0100
Subject: [PATCH] Fortran  : ICE in gfc_conv_scalarized_array_ref PR53298

When an array of characters is an argument to a subroutine and
is accessed using (:)(1:) an ICE occurs.  The upper bound of the
substring does not have an expression and as such should not have
a Scalarization State structure added to the Scalarization State
chain.

2020-07-20  Mark Eggleston  

gcc/fortran/

	PR fortran/53298
	* trans-array.c (gfc_walk_array_ref): If ref->u.ss.end is set
	call gfc_get_scalar_ss.

2020-07-20  Mark Eggleston  

gcc/testsuite/

	PR fortran/53298
	* gfortran.dg/pr53298.f90: New test.
---
 gcc/fortran/trans-array.c |  3 ++-
 gcc/testsuite/gfortran.dg/pr53298.f90 | 14 ++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr53298.f90

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 54e1107c711..8f93b43bafb 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -10800,7 +10800,8 @@ gfc_walk_array_ref (gfc_ss * ss, gfc_expr * expr, gfc_ref * ref)
       if (ref->type == REF_SUBSTRING)
 	{
 	  ss = gfc_get_scalar_ss (ss, ref->u.ss.start);
-	  ss = gfc_get_scalar_ss (ss, ref->u.ss.end);
+	  if (ref->u.ss.end)
+	    ss = gfc_get_scalar_ss (ss, ref->u.ss.end);
 	}
 
       /* We're only interested in array sections from now on.  */
diff --git a/gcc/testsuite/gfortran.dg/pr53298.f90 b/gcc/testsuite/gfortran.dg/pr53298.f90
new file mode 100644
index 00000000000..998f88df926
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr53298.f90
@@ -0,0 +1,14 @@
+! { dg-do run }
+
+program test
+  character(len=5) :: str(3)
+  str = ["abcde", "12345", "ABCDE" ]
+  call f(str(:))
+contains
+  subroutine f(x)
+    character(len=*) :: x(:)
+    write(*,*) x(:)(1:)
+  end subroutine f
+end program test
+
+! { dg-output "abcde12345ABCDE" }
-- 
2.11.0



Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-20 Thread Richard Biener via Gcc-patches
On Fri, Jul 17, 2020 at 1:30 PM Andrew Stubbs  wrote:
>
> On 17/07/2020 07:20, Thomas Schwinge wrote:
> >> --- a/gcc/config/gcn/mkoffload.c
> >> +++ b/gcc/config/gcn/mkoffload.c
> >> @@ -33,31 +33,53 @@
> >>   #include 
> >>   #include "collect-utils.h"
> >>   #include "gomp-constants.h"
> >> +#include "simple-object.h"
> >> +#include "elf.h"
> >> +
> >> +/* These probably won't be in elf.h for a while.  */
> >> +#ifndef EM_AMDGPU
> >
> > Nope, it already is.  ;-P
> [...]
> > I've got the canary 'EM_AMDGPU', but not the other '#define's.
>
> Oops, I've committed this patch to fix it.
>
> The code now assumes only that the relocations are defined as a block.
> The rest are undefined and redefined individually.
>
> This matches the usage we've previously had in gcn-run.c and
> plugin-gcn.c (actually, those can be cleaned up now, but that's another
> patch).
>
> >> -/* Files to unlink.  */
> >> -static const char *gcn_s1_name;
> >> -static const char *gcn_s2_name;
> >> -static const char *gcn_o_name;
> >> -static const char *gcn_cfile_name;
> >>   static const char *gcn_dumpbase;
> >> +static struct obstack files_to_cleanup;
> >
> > (Good idea; should do similar in the other 'mkoffload's.)
>
> Actually, I think the original code was more readable, but now we have a
> (potentially) variable number of files to clean up I needed something
> more general.
>
> >> +uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX803;  // Default GPU architecture.
> >
> > For easier later maintenance, shouldn't this be a '#define' (or similar)
> > done next to where the GCC back end defines its default?
>
> I thought of this, but I don't think there's actually a problem. The
> default is defined via OPTION_DEFAULT_SPECS, so there ought to be an
> explicit option passed to mkoffload at all times. If that's not the case
> I think the problem lies elsewhere.
>
> In practice, a mismatch here will not fail silently, so we'll know to
> fix it.
>
> The way simple_object is supposed to work is to clone (or merge) the ELF
> headers from an existing binary. Unfortunately, the way mkoffload is
> currently coded we don't have any to clone from until too late. We could
> separate the assemble and link steps, but I chose not to go that way at
> this time.

You could defer the debug copying to after mkoffload assembled the
offloaded code and use those objects as template.  Maybe that's
what you refer to with separating assemble and link steps.

Richard.

> Andrew


Re: [PATCH] fold-const: Handle bitfields in native_encode_initializer [PR93121]

2020-07-20 Thread Richard Biener
On Sat, 18 Jul 2020, Jakub Jelinek wrote:

> Hi!
> 
> When working on __builtin_bit_cast that needs to handle bitfields too,
> I've made the following change to handle at least some bitfields in
> native_encode_initializer (those that have integral representative).
> 
> Bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux, ok for trunk?

OK.

Thanks,
Richard.
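
For readers following the partial-byte handling in the quoted patch below,
here is a minimal sketch of the little-endian masking step.  It assumes
8-bit bytes, and the helper names are hypothetical (they do not exist in
fold-const.c); it only shows how bits belonging to neighbouring fields are
preserved when a bit-field does not start or end on a byte boundary.

```c
#include <assert.h>
#include <limits.h>

/* Merge the leading byte of a bit-field that starts BPOS bits into the
   byte: the low BPOS bits come from OLD (earlier fields), the rest from
   NEW_BITS.  Little-endian bit numbering is assumed.  */
static unsigned char
merge_leading_partial_byte (unsigned char old, unsigned char new_bits,
			    unsigned bpos)
{
  assert (bpos > 0 && bpos < CHAR_BIT);
  unsigned mask = (1u << bpos) - 1;	/* Bits owned by earlier fields.  */
  return (unsigned char) ((new_bits & ~mask) | (old & mask));
}

/* Merge the trailing byte of a bit-field that ends EPOS bits into the
   byte: the low EPOS bits come from NEW_BITS, the rest from OLD.  */
static unsigned char
merge_trailing_partial_byte (unsigned char old, unsigned char new_bits,
			     unsigned epos)
{
  assert (epos > 0 && epos < CHAR_BIT);
  unsigned mask = (1u << epos) - 1;	/* Bits owned by this field.  */
  return (unsigned char) ((new_bits & mask) | (old & ~mask));
}
```

The big-endian branches in the patch use the mirrored masks; they are left
out here to keep the sketch short.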

> 2020-07-18  Jakub Jelinek  
> 
>   PR libstdc++/93121
>   * fold-const.c (native_encode_initializer): Handle bit-fields.
> 
>   * gcc.dg/tree-ssa/pr93121-1.c: New test.
> 
> --- gcc/fold-const.c.jj   2020-06-24 10:39:48.995469213 +0200
> +++ gcc/fold-const.c  2020-07-17 13:51:53.181392890 +0200
> @@ -8047,6 +8047,7 @@ native_encode_initializer (tree init, un
> tree field = ce->index;
> tree val = ce->value;
> HOST_WIDE_INT pos, fieldsize;
> +   unsigned HOST_WIDE_INT bpos = 0, epos = 0;
>  
> if (field == NULL_TREE)
>   return 0;
> @@ -8066,15 +8067,122 @@ native_encode_initializer (tree init, un
> if (fieldsize == 0)
>   continue;
>  
> +   if (DECL_BIT_FIELD (field))
> + {
> +   if (!tree_fits_uhwi_p (DECL_FIELD_BIT_OFFSET (field)))
> + return 0;
> +   fieldsize = TYPE_PRECISION (TREE_TYPE (field));
> +   bpos = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
> +   if (bpos % BITS_PER_UNIT)
> + bpos %= BITS_PER_UNIT;
> +   else
> + bpos = 0;
> +   fieldsize += bpos;
> +   epos = fieldsize % BITS_PER_UNIT;
> +   fieldsize += BITS_PER_UNIT - 1;
> +   fieldsize /= BITS_PER_UNIT;
> + }
> +
> if (off != -1 && pos + fieldsize <= off)
>   continue;
>  
> -   if (DECL_BIT_FIELD (field))
> - return 0;
> -
> if (val == NULL_TREE)
>   continue;
>  
> +   if (DECL_BIT_FIELD (field))
> + {
> +   /* FIXME: Handle PDP endian.  */
> +   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
> + return 0;
> +
> +   tree repr = DECL_BIT_FIELD_REPRESENTATIVE (field);
> +   if (repr == NULL_TREE
> +   || TREE_CODE (val) != INTEGER_CST
> +   || !INTEGRAL_TYPE_P (TREE_TYPE (repr)))
> + return 0;
> +
> +   HOST_WIDE_INT rpos = int_byte_position (repr);
> +   if (rpos > pos)
> + return 0;
> +   wide_int w = wi::to_wide (val,
> + TYPE_PRECISION (TREE_TYPE (repr)));
> +   int diff = (TYPE_PRECISION (TREE_TYPE (repr))
> +   - TYPE_PRECISION (TREE_TYPE (field)));
> +   HOST_WIDE_INT bitoff = (pos - rpos) * BITS_PER_UNIT + bpos;
> +   if (!BYTES_BIG_ENDIAN)
> + w = wi::lshift (w, bitoff);
> +   else
> + w = wi::lshift (w, diff - bitoff);
> +   val = wide_int_to_tree (TREE_TYPE (repr), w);
> +
> +   unsigned char buf[MAX_BITSIZE_MODE_ANY_INT
> + / BITS_PER_UNIT + 1];
> +   int l = native_encode_int (val, buf, sizeof buf, 0);
> +   if (l * BITS_PER_UNIT != TYPE_PRECISION (TREE_TYPE (repr)))
> + return 0;
> +
> +   if (ptr == NULL)
> + continue;
> +
> +   /* If the bitfield does not start at byte boundary, handle
> +  the partial byte at the start.  */
> +   if (bpos
> +   && (off == -1 || (pos >= off && len >= 1)))
> + {
> +   if (!BYTES_BIG_ENDIAN)
> + {
> +   int mask = (1 << bpos) - 1;
> +   buf[pos - rpos] &= ~mask;
> +   buf[pos - rpos] |= ptr[pos - o] & mask;
> + }
> +   else
> + {
> +   int mask = (1 << (BITS_PER_UNIT - bpos)) - 1;
> +   buf[pos - rpos] &= mask;
> +   buf[pos - rpos] |= ptr[pos - o] & ~mask;
> + }
> + }
> +   /* If the bitfield does not end at byte boundary, handle
> +  the partial byte at the end.  */
> +   if (epos
> +   && (off == -1
> +   || pos + fieldsize <= (HOST_WIDE_INT) off + len))
> + {
> +   if (!BYTES_BIG_ENDIAN)
> + {
> +   int mask = (1 << epos) - 1;
> +   buf[pos - rpos + fieldsize - 1] &= mask;
> +   buf[pos - rpos + fieldsize - 1]
> + |= ptr[pos + fieldsize - 1 - o] & ~mask;
> + }
> + 

Re: [patch] gcc/testsuite: Scale down long-running tree-prof.exp tests on slow targets

2020-07-20 Thread Richard Biener via Gcc-patches
On Mon, Jul 20, 2020 at 6:48 AM Sandra Loosemore
 wrote:
>
> I was looking at some timeout failures in nios2-linux-gnu test results
> and found several tree-prof.exp tests were doing what appears to be an
> excessive number of iterations (350 million?).  Even though this is
> hardware and not a simulator, I thought it would be reasonable to tell
> the test harness to treat it like a simulator, and make these tests pay
> attention to the is_simulator flag to run fewer iterations, as a number
> of other test cases do to scale down long-running tests.
>
> I somewhat randomly chose to reduce the counts by a factor of 100 so the
> longest-running one takes just over 10 seconds on this target.  But, the
> original numbers seem pretty random to me as well.  Is there actually
> any benefit to running more iterations even on a fast target?

I think at least parts of tree-prof.exp exercise sample-based profiling,
which might require more iterations.  For example, cold_partition_label.c
was changed by

commit f63ba78ce6d50bf627dd18018179eb03bf89716f
Author: Andi Kleen 
Date:   Thu Jul 14 02:14:56 2016 +

Some fixes for profile test cases for autofdo

This fixes some basic issues with the profile test cases with autofdo.

- Disable checking for value transformations that autofdo does not
  support.
- Disable checking for fixed hit counts which autofdo does not support
- Enable dumping of afdo log file and check right log file.
- Increase run time of test cases to 1M iterations because autofdo needs
  a few samples to make sense of a program. The test case don't run
  noticeable slower with that.

There are still failures unfortunately, especially the indirect call
transformations do not trigger because autofdo thinks they are not hot.
This can be addressed later.

so the change to a larger number of iterations was intended.  Maybe
we can arrange to pass -DFOR_AUTOFDO_TESTING for the
autofdo compiles and gate the larger number of iterations on that
(most targets do not support autofdo and do not run that mode)?
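
Something along these lines is what I have in mind; this is only a sketch.
FOR_AUTOFDO_TESTING is the macro name proposed above, the 1M figure comes
from the commit message quoted above, and the smaller count and the
workload function are illustrative placeholders rather than values from
any existing test:

```c
#include <assert.h>

/* Use the large count only when the harness asks for autofdo testing.  */
#ifdef FOR_AUTOFDO_TESTING
# define ITERATIONS 1000000L	/* Sample-based profiling needs samples.  */
#else
# define ITERATIONS 10000L	/* Placeholder for everything else.  */
#endif

/* Hypothetical workload: counts the odd values below ITERATIONS_ARG.  */
static long
run_workload (long iterations_arg)
{
  assert (iterations_arg >= 0);
  volatile long sum = 0;
  for (long i = 0; i < iterations_arg; i++)
    sum += i & 1;
  return sum;
}
```

A test would then call run_workload (ITERATIONS), so only the autofdo
compiles pay for the longer run.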

Richard.

> -Sandra


Re: [PATCH] gimple-fold: Handle bitfields in fold_const_aggregate_ref_1 [PR93121]

2020-07-20 Thread Richard Biener
On Sat, 18 Jul 2020, Jakub Jelinek wrote:

> Hi!
> 
> When working on __builtin_bit_cast that needs to handle bitfields too,
> I've made the following change to handle at least some bitfields in
> fold_const_aggregate_ref_1 (those that have integral representative).
> It already handles some, but only those that start and end at byte
> boundaries.
> 
> Bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux, ok for trunk?

OK.

Richard.

> 2020-07-18  Jakub Jelinek  
> 
>   PR libstdc++/93121
>   * gimple-fold.c (fold_const_aggregate_ref_1): For COMPONENT_REF
>   of a bitfield not aligned on byte boundaries try to
>   fold_ctor_reference DECL_BIT_FIELD_REPRESENTATIVE if any and
>   adjust it depending on endianity.
> 
>   * gcc.dg/tree-ssa/pr93121-2.c: New test.
> 
> --- gcc/gimple-fold.c.jj  2020-07-13 19:09:33.218871556 +0200
> +++ gcc/gimple-fold.c 2020-07-17 19:17:59.694537680 +0200
> @@ -7189,8 +7189,64 @@ fold_const_aggregate_ref_1 (tree t, tree
>if (maybe_lt (offset, 0))
>   return NULL_TREE;
>  
> -  return fold_ctor_reference (TREE_TYPE (t), ctor, offset, size,
> -   base);
> +  tem = fold_ctor_reference (TREE_TYPE (t), ctor, offset, size, base);
> +  if (tem)
> + return tem;
> +
> +  /* For bit field reads try to read the representative and
> +  adjust.  */
> +  if (TREE_CODE (t) == COMPONENT_REF
> +   && DECL_BIT_FIELD (TREE_OPERAND (t, 1))
> +   && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)))
> + {
> +   HOST_WIDE_INT csize, coffset;
> +   tree field = TREE_OPERAND (t, 1);
> +   tree repr = DECL_BIT_FIELD_REPRESENTATIVE (field);
> +   if (INTEGRAL_TYPE_P (TREE_TYPE (repr))
> +   && size.is_constant (&csize)
> +   && offset.is_constant (&coffset)
> +   && (coffset % BITS_PER_UNIT != 0
> +   || csize % BITS_PER_UNIT != 0)
> +   && !reverse
> +   && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN)
> + {
> +   poly_int64 bitoffset;
> +   poly_uint64 field_offset, repr_offset;
> +   if (poly_int_tree_p (DECL_FIELD_OFFSET (field), &field_offset)
> +   && poly_int_tree_p (DECL_FIELD_OFFSET (repr), &repr_offset))
> + bitoffset = (field_offset - repr_offset) * BITS_PER_UNIT;
> +   else
> + bitoffset = 0;
> +   bitoffset += (tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field))
> + - tree_to_uhwi (DECL_FIELD_BIT_OFFSET (repr)));
> +   HOST_WIDE_INT bitoff;
> +   int diff = (TYPE_PRECISION (TREE_TYPE (repr))
> +   - TYPE_PRECISION (TREE_TYPE (field)));
> +   if (bitoffset.is_constant (&bitoff)
> +   && bitoff >= 0
> +   && bitoff <= diff)
> + {
> +   offset -= bitoff;
> +   size = tree_to_uhwi (DECL_SIZE (repr));
> +
> +   tem = fold_ctor_reference (TREE_TYPE (repr), ctor, offset,
> +  size, base);
> +   if (tem && TREE_CODE (tem) == INTEGER_CST)
> + {
> +   if (!BYTES_BIG_ENDIAN)
> + tem = wide_int_to_tree (TREE_TYPE (field),
> + wi::lrshift (wi::to_wide (tem),
> +  bitoff));
> +   else
> + tem = wide_int_to_tree (TREE_TYPE (field),
> + wi::lrshift (wi::to_wide (tem),
> +  diff - bitoff));
> +   return tem;
> + }
> + }
> + }
> + }
> +  break;
>  
>  case REALPART_EXPR:
>  case IMAGPART_EXPR:
> --- gcc/testsuite/gcc.dg/tree-ssa/pr93121-2.c.jj  2020-07-17 19:47:31.842426096 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr93121-2.c 2020-07-17 19:48:24.551649910 +0200
> @@ -0,0 +1,22 @@
> +/* PR libstdc++/93121 */
> +/* { dg-do compile { target { ilp32 || lp64 } } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +union U { int a[3]; struct S { int d; int a : 3; int b : 24; int c : 5; int e; } b; };
> +const union U u = { .a = { 0x7efa3412, 0x5a876543, 0x1eeffeed } };
> +int a, b, c;
> +
> +void
> +foo ()
> +{
> +  a = u.b.a;
> +  b = u.b.b;
> +  c = u.b.c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "a = 3;" 1 "optimized" { target le } } } */
> +/* { dg-final { scan-tree-dump-times "b = 5303464;" 1 "optimized" { target le } } } */
> +/* { dg-final { scan-tree-dump-times "c = 11;" 1 "optimized" { target le } } } */
> +/* { dg-final { scan-tree-dump-times "a = 2;" 1 "optimized" { target be } } } */
> +/* { dg-final { scan-tree-dump-times "b = -2868438;" 1 "optimized" { target be } } } */
> +/* { dg-final { scan-tree-dump-times "c = 3;" 1 "optimized" { target be } } } */
> 
> 
>  
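
The little-endian expectations in the quoted test can be reproduced by
hand.  The sketch below is not the GCC implementation (which works on
wide_int and trees); it assumes 32-bit int, little-endian bit-field layout
and a hypothetical helper name, and simply shifts the representative word
right by the field's bit offset and sign-extends the result.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a signed bit-field of WIDTH bits that starts BITOFF bits above
   the least significant bit of REPR (little-endian layout assumed).  */
static int32_t
extract_signed_field (uint32_t repr, unsigned bitoff, unsigned width)
{
  assert (width >= 1 && width < 32 && bitoff + width <= 32);
  uint32_t field = (repr >> bitoff) & ((UINT32_C (1) << width) - 1);
  uint32_t sign = UINT32_C (1) << (width - 1);
  /* Sign-extend: flip the sign bit, then subtract its weight.  */
  return (int32_t) (field ^ sign) - (int32_t) sign;
}
```

Applied to the second word of the initializer, 0x5a876543, with the
offsets and widths of a, b and c (0/3, 3/24 and 27/5), this gives 3,
5303464 and 11, the values the "target le" scans look for.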

Re: [wwwdocs PATCH v2] projects/tree-ssa: add a big banner to tree-browser.html

2020-07-20 Thread Richard Sandiford
Hu Jiangping  writes:
> Hi,
>
> Different from Patch v1 which removed the page, I add a big banner to
> say that the page is no longer up-to-date, just as Richard and Gerald
> recommended. OK for push?

Thanks, pushed to wwwdocs.  Sorry for the slow response.

Richard


Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-07-20 Thread Hongtao Liu via Gcc-patches
Corrected the PR number in the ChangeLog; it is PR96243.

On Mon, Jul 20, 2020 at 1:46 PM Hongtao Liu  wrote:
>
> Hi:
>   For rtx like (eq:HI (V8SI 90) (V8SI 91)), cse will take it as a
> boolean value and try to do some optimization. But it is not true for
> vector compare, also other places in rtl passes hold the same
> assumption.
>
> Bootstrap is ok, regression test is ok for i386 backend.
>
> 2020-07-20  Hongtao Liu  
>
> gcc/
> PR target/96243
> * config/i386/i386-expand.c (ix86_expand_sse_cmp): Refine for
> maskcmp.
> (ix86_expand_mask_vec_cmp): Change prototype.
> * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): Change
> prototype.
> * config/i386/i386.c (ix86_print_operand): Remove operand
> modifier 'I'.
> * config/i386/sse.md
> (*_cmp3,
> *_cmp3,
> *_ucmp3,
> *_ucmp3,
> avx512f_maskcmp3): Deleted.
>
> gcc/testsuite
> * gcc.target/i386/pr92865-1.c: Adjust testcase.
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-20 Thread Andrew Stubbs

On 20/07/2020 08:35, Richard Biener wrote:

>> The way simple_object is supposed to work is to clone (or merge) the ELF
>> headers from an existing binary. Unfortunately, the way mkoffload is
>> currently coded we don't have any to clone from until too late. We could
>> separate the assemble and link steps, but I chose not to go that way at
>> this time.
>
> You could defer the debug copying to after mkoffload assembled the
> offloaded code and use those objects as template.  Maybe that's
> what you refer to with separating assemble and link steps.


That's exactly it.

We'd still have to solve the relocation and symbol issues though, so for 
now we may as well solve both problems in one place.


Does it even make sense to add support for those steps to simple_object?

The relocation translation is highly target-specific, of course. The 
symbol weakening is necessary because the early debug info contains 
references to symbols that will not exist in the offloaded code (leading 
to either link failure or bloat).


Andrew


Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-20 Thread Richard Biener via Gcc-patches
On Mon, Jul 20, 2020 at 10:40 AM Andrew Stubbs  wrote:
>
> On 20/07/2020 08:35, Richard Biener wrote:
> >> The way simple_object is supposed to work is to clone (or merge) the ELF
> >> headers from an existing binary. Unfortunately, the way mkoffload is
> >> currently coded we don't have any to clone from until too late. We could
> >> separate the assemble and link steps, but I chose not to go that way at
> >> this time.
> >
> > You could defer the debug copying to after mkoffload assembled the
> > offloaded code and use those objects as template.  Maybe that's
> > what you refer to with separating assemble and link steps.
>
> That's exactly it.
>
> We'd still have to solve the relocation and symbol issues though, so for
> now we may as well solve both problems in one place.
>
> Does it even make sense to add support for those steps to simple_object?
>
> The relocation translation is highly target-specific, of course. The
> symbol weakening is necessary because the early debug info contains
> references to symbols that will not exist in the offloaded code (leading
> to either link failure or bloat).

The odd thing is that the early debug should _not_ contain any relocations
to data nor extra symbols.  Only relocations within the debug info
should be there.

Richard.

>
> Andrew


Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2020-07-20 Thread Frederik Harwath
Thomas Schwinge  writes:

Hi Thomas,

>> Can I include the patch in OG10?
>
> Unless Julian/Kwok speak up soon: OK, thanks.

This has been delayed a bit by my vacation, but I have now committed
the patch.

> May want to remove "libgomp" from the first line of the commit log --
> this commit doesn't relate to libgomp specifically.
>
> (Ideally, we'd also test 'serial' construct in addition to 'kernels',
> 'parallel', but we can add that later.  I anyway have a WIP patch
> waiting, adding more 'serial' construct testing, for a different reason,
> so I'll include it there.)

I forgot to remove "libgomp" from the commit message, sorry, but
I have included the test cases for the "serial construct".

Best regards,
Frederik

From 7c10ae450b95495dda362cb66770bb78b546592e Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jul 2020 11:24:21 +0200
Subject: [PATCH] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan
 loop" error message

According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf.  section "Changes
from Version 2.0 to 2.5").  A loop is "orphaned" if it is not
lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.

This commit fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs.
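
The intended check can be sketched as follows.  This is a simplified
illustration only: the enum, struct and function names are hypothetical
stand-ins for gfc_code, the EXEC_OACC_* codes and the context chain used
in openmp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical construct kinds; the front end uses EXEC_OACC_* codes.  */
enum construct { C_LOOP, C_PARALLEL, C_SERIAL, C_KERNELS, C_OTHER };

/* One entry of the enclosing-construct chain, innermost first.  */
struct ctx { enum construct kind; const struct ctx *previous; };

/* "parallel", "serial" and "kernels" are the compute constructs.  */
static bool
is_compute_construct (enum construct kind)
{
  return kind == C_PARALLEL || kind == C_SERIAL || kind == C_KERNELS;
}

/* A loop is orphaned if, after skipping any enclosing loops, there is
   no enclosing compute construct.  */
static bool
loop_is_orphaned (const struct ctx *enclosing)
{
  const struct ctx *c = enclosing;
  while (c != NULL && c->kind == C_LOOP)
    c = c->previous;
  return c == NULL || !is_compute_construct (c->kind);
}
```

With "kernels" included in the compute-construct test, a gang loop nested
in a "kernels" region is no longer reported as orphaned.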

2020-07-20  Frederik Harwath  

gcc/fortran/

	* openmp.c (oacc_is_parallel_or_serial): Removed function.
	(oacc_is_kernels): New function.
	(oacc_is_compute_construct): New function.
	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
	instead of "oacc_is_parallel_or_serial" for checking that a
	loop is not orphaned.

gcc/testsuite/

	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
	verifying that the "gang reduction on an orphan loop" error message
	is not emitted for non-orphaned loops.

	* c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++.
---
 gcc/fortran/ChangeLog |   9 ++
 gcc/fortran/openmp.c  |  13 ++-
 gcc/testsuite/ChangeLog   |   7 ++
 .../c-c++-common/goacc/orphan-reductions-2.c  | 103 ++
 .../gfortran.dg/goacc/orphan-reductions-2.f90 |  87 +++
 5 files changed, 216 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index e86279cb647..5a1f81c286e 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,12 @@
+2020-07-20  Frederik Harwath  
+
+	* openmp.c (oacc_is_parallel_or_serial): Removed function.
+	(oacc_is_kernels): New function.
+	(oacc_is_compute_construct): New function.
+	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
+	instead of "oacc_is_parallel_or_serial" for checking that a
+	loop is not orphaned.
+
 2020-07-08  Harald Anlauf  
 
 	Backported from master:
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ab68e9f2173..706933c869a 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5927,9 +5927,16 @@ oacc_is_serial (gfc_code *code)
 }
 
 static bool
-oacc_is_parallel_or_serial (gfc_code *code)
+oacc_is_kernels (gfc_code *code)
 {
-  return oacc_is_parallel (code) || oacc_is_serial (code);
+  return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
+}
+
+static bool
+oacc_is_compute_construct (gfc_code *code)
+{
+  return oacc_is_parallel (code) || oacc_is_serial (code)
+    || oacc_is_kernels (code);
 }
 
 static gfc_statement
@@ -6223,7 +6230,7 @@ resolve_oacc_loop_blocks (gfc_code *code)
   for (c = omp_current_ctx; c; c = c->previous)
 	if (!oacc_is_loop (c->code))
 	  break;
-  if (c == NULL || !oacc_is_parallel_or_serial (c->code))
+  if (c == NULL || !oacc_is_compute_construct (c->code))
 	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
 }
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 59e6c93b07a..fa1937a4ea2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2020-07-20  Frederik Harwath  
+
+	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
+	verifying that the "gang reduction on an orphan loop" error message
+	is not emitted for non-orphaned loops.
+	* c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++.
+
 2020-07-12  Jakub Jelinek  
 
 	Backported from master:
diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductio

Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-20 Thread Andrew Stubbs

On 20/07/2020 11:01, Richard Biener wrote:

> On Mon, Jul 20, 2020 at 10:40 AM Andrew Stubbs  wrote:
>>
>> On 20/07/2020 08:35, Richard Biener wrote:
>>>> The way simple_object is supposed to work is to clone (or merge) the ELF
>>>> headers from an existing binary. Unfortunately, the way mkoffload is
>>>> currently coded we don't have any to clone from until too late. We could
>>>> separate the assemble and link steps, but I chose not to go that way at
>>>> this time.
>>>
>>> You could defer the debug copying to after mkoffload assembled the
>>> offloaded code and use those objects as template.  Maybe that's
>>> what you refer to with separating assemble and link steps.
>>
>> That's exactly it.
>>
>> We'd still have to solve the relocation and symbol issues though, so for
>> now we may as well solve both problems in one place.
>>
>> Does it even make sense to add support for those steps to simple_object?
>>
>> The relocation translation is highly target-specific, of course. The
>> symbol weakening is necessary because the early debug info contains
>> references to symbols that will not exist in the offloaded code (leading
>> to either link failure or bloat).
>
> The odd thing is that the early debug should _not_ contain any relocations
> to data nor extra symbols.  Only relocations within the debug info
> should be there.


The problem with the relocations themselves is only that the x86_64 
relocation codes don't match the amdgcn codes, making them gibberish to 
the linker.


As for the symbol references, I observed it trying to link in symbols 
that only exist in the host libgomp. My assumption is that the generated 
early debug was intended for the LTO use-case, in which the output 
program will contain everything, and that this is merely the first 
offload target to support debug in this way. If it's not supposed to 
have those references at all then I'm out of my depth.


I can't say for sure whether the debug info generated with my patch is 
broken or misleading (because I'm clueless), but weakening the symbols 
works for the test cases and debug sessions that I tried.


Andrew


[PATCH] jit: Fix random truncation of testsuite output

2020-07-20 Thread Alex Coplan
Hello,

This patch fixes a bug in jit.exp which causes the DejaGnu output of the
libgccjit testsuite to be nondeterministically truncated. This bug was
copied from DejaGnu's own implementation of the host_execute function.
See the upstream bug report [0] where the maintainers point out that the
regex patterns in host_execute should (but don't currently) explicitly
match newlines to avoid relying on DejaGnu not reading more than one
line of the output (which is not guaranteed).
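
The underlying idea, independent of Tcl/expect, is to act only on complete
lines.  A minimal sketch of that idea follows; the function name is
hypothetical, and it merely mirrors the effect of requiring "\r\n" at the
end of each pattern rather than reproducing how expect buffers its input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the first complete line in BUF, including its
   "\r\n" terminator, or 0 if the terminator has not arrived yet.  */
static size_t
complete_line_length (const char *buf)
{
  assert (buf != NULL);
  const char *eol = strstr (buf, "\r\n");
  return eol != NULL ? (size_t) (eol - buf) + 2 : 0;
}
```

A reader built this way waits for more input when it returns 0, instead of
reporting a half-received "PASSED:" line as if it were whole.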

To reproduce the bug, run:

$ make check-jit RUNTESTFLAGS="jit.exp=test-arith-overflow.c"
$ grep -v iteration testsuite/jit/jit.sum

and you should see some lines that have been truncated (I see the word
iteration partially or fully truncated). Alternatively, simply run the
testsuite twice (saving a copy of testsuite/jit/jit.sum from the first
run) and diff the two jit.sum files to observe the random truncations to
the output.

This patch should make it easier to test jit patches in the future,
since it makes it possible to reliably compare the output of two jit.sum
files (as with the other tests in GCC).

Testing:
 * Ran the testsuite before and after the patch, observing that the only
   differences in jit.sum were in test-threads.c (nondeterministic test)
   and where the truncated output from the first run was no longer
   truncated.
 * Ran the testsuite twice after the patch, observing that the only
   differences in jit.sum between the two runs were in test-threads.c.

OK for master?

Thanks,
Alex

---

2020-07-20  Alex Coplan  

gcc/testsuite/ChangeLog:

* jit.dg/jit.exp (fixed_host_execute): Fix regex patterns to
always explicitly match newlines.


[0] : https://debbugs.gnu.org/cgi/bugreport.cgi?bug=42399
diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
index 2f54681713b..2d8c884b6b8 100644
--- a/gcc/testsuite/jit.dg/jit.exp
+++ b/gcc/testsuite/jit.dg/jit.exp
@@ -202,37 +202,37 @@ proc fixed_host_execute {args} {
set timetol 0
exp_continue
}
-   -re "^$prefix\tNOTE:${text}*" {
+   -re "^$prefix\tNOTE:\[^\r\n\]+\r\n" {
regsub "\[\n\r\t\]*NOTE: $text\r\n" $expect_out(0,string) "" output
-   set output [string range $output 6 end]
+   set output [string range $output 6 end-2]
verbose "$output" 2
set timetol 0
exp_continue
}
-   -re "^$prefix\tPASSED:${text}*" {
+   -re "^$prefix\tPASSED:\[^\r\n\]+\r\n" {
regsub "\[\n\r\t\]*PASSED: $text\r\n" $expect_out(0,string) "" output
-   set output [string range $output 8 end]
+   set output [string range $output 8 end-2]
pass "$output"
set timetol 0
exp_continue
}
-   -re "^$prefix\tFAILED:${text}*" {
+   -re "^$prefix\tFAILED:\[^\r\n\]+\r\n" {
regsub "\[\n\r\t\]*FAILED: $text\r\n" $expect_out(0,string) "" output
-   set output [string range $output 8 end]
+   set output [string range $output 8 end-2]
fail "$output"
set timetol 0
exp_continue
}
-   -re "^$prefix\tUNTESTED:${text}*" {
+   -re "^$prefix\tUNTESTED:\[^\r\n\]+\r\n" {
regsub "\[\n\r\t\]*TESTED: $text\r\n" $expect_out(0,string) "" output
-   set output [string range $output 8 end]
+   set output [string range $output 8 end-2]
untested "$output"
set timetol 0
exp_continue
}
-   -re "^$prefix\tUNRESOLVED:${text}*" {
+   -re "^$prefix\tUNRESOLVED:\[^\r\n\]+\r\n" {
regsub "\[\n\r\t\]*UNRESOLVED: $text\r\n" $expect_out(0,string) "" output
-   set output [string range $output 8 end]
+   set output [string range $output 8 end-2]
unresolved "$output"
set timetol 0
exp_continue


Re: [PATCH] Add TARGET_LOWER_LOCAL_DECL_ALIGNMENT [PR95237]

2020-07-20 Thread Richard Biener via Gcc-patches
On Sat, Jul 18, 2020 at 7:57 AM Sunil Pandey  wrote:
>
> On Fri, Jul 17, 2020 at 1:22 AM Richard Biener
>  wrote:
> >
> > On Fri, Jul 17, 2020 at 7:15 AM Sunil Pandey  wrote:
> > >
> > > Any comment on revised patch? At least,  in finish_decl, decl global 
> > > attributes are populated.
> >
> > +static void
> > +ix86_lower_local_decl_alignment (tree decl)
> > +{
> > +  unsigned new_align = LOCAL_DECL_ALIGNMENT (decl);
> >
> > please use the macro-expanded call here since we want to amend
> > ix86_local_alignment to _not_ return a lower alignment when
> > called as LOCAL_DECL_ALIGNMENT (by adding a new parameter
> > to ix86_local_alignment).  Can you also amend the patch in this
> > way?
> >
> > +  if (new_align < DECL_ALIGN (decl))
> > +SET_DECL_ALIGN (decl, new_align);
> >
> > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > index 81bd2ee94f0..1ae99e30ed1 100644
> > --- a/gcc/c/c-decl.c
> > +++ b/gcc/c/c-decl.c
> > @@ -5601,6 +5601,8 @@ finish_decl (tree decl, location_t init_loc, tree init,
> >  }
> >
> >invoke_plugin_callbacks (PLUGIN_FINISH_DECL, decl);
> > +  /* Lower local decl alignment.  */
> > +  lower_decl_alignment (decl);
> >  }
> >
> > should come before plugin hook invocation, likewise for the cp_finish_decl 
> > case.
> >
> > +/* Lower DECL alignment.  */
> > +
> > +void
> > +lower_decl_alignment (tree decl)
> > +{
> > +  if (VAR_P (decl)
> > +  && !is_global_var (decl)
> > +  && !DECL_HARD_REGISTER (decl))
> > +targetm.lower_local_decl_alignment (decl);
> > +}
> >
> > please avoid this function, its name sounds too generic and it's not worth
> > adding a public API for two calls.
> >
> > Altogether this should avoid the x86 issue, leaving left-overs (your
> > identified inliner case) as missed optimization [for the linux kernel which
> > apparently decided that -mpreferred-stack-boundary=2 is a good ABI to use].
> >
> > Richard.
> >
> >
> Revised patch attached.

@@ -16776,7 +16783,7 @@ ix86_data_alignment (tree type, unsigned int align, bool opt)

 unsigned int
 ix86_local_alignment (tree exp, machine_mode mode,
- unsigned int align)
+ unsigned int align, bool setalign)
 {
   tree type, decl;

@@ -16801,6 +16808,10 @@ ix86_local_alignment (tree exp, machine_mode mode,
   && (!decl || !DECL_USER_ALIGN (decl)))
 align = 32;

+  /* Lower decl alignment.  */
+  if (setalign && align < DECL_ALIGN (decl))
+SET_DECL_ALIGN (decl, align);
+
   /* If TYPE is NULL, we are allocating a stack slot for caller-save
  register in MODE.  We will return the largest alignment of XF
  and DF.  */

Sorry for not being clear - the parameter should indicate whether an
alignment lower than the natural alignment is OK to return, thus something
like

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 31757b044c8..19703cbceb9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -16641,7 +16641,7 @@ ix86_data_alignment (tree type, unsigned int align, bool opt)

 unsigned int
 ix86_local_alignment (tree exp, machine_mode mode,
- unsigned int align)
+ unsigned int align, bool may_lower)
 {
   tree type, decl;

@@ -16658,7 +16658,8 @@ ix86_local_alignment (tree exp, machine_mode mode,

   /* Don't do dynamic stack realignment for long long objects with
  -mpreferred-stack-boundary=2.  */
-  if (!TARGET_64BIT
+  if (may_lower
+  && !TARGET_64BIT
   && align == 64
   && ix86_preferred_stack_boundary < 64
   && (mode == DImode || (type && TYPE_MODE (type) == DImode))

I also believe that spill_slot_alignment () should be able to get the
lower alignment for long long but not get_stack_local_alignment () (both
use STACK_SLOT_ALIGNMENT).  Some uses of STACK_SLOT_ALIGNMENT also look
fishy with respect to mem attributes and alignment.

Otherwise the patch looks reasonable to salvage a misguided optimization for
a non-standard ABI.  Whether it is sufficient to make the people using that
ABI happy is of course another question.  I'd rather see them stop using it ...

That said, I'm hesitant to be the only one OKing this ugliness but I'd
immediately OK a patch removing the questionable hunk from
ix86_local_alignment ;)

Jakub, Jeff - any opinion?

Richard.

> > > On Tue, Jul 14, 2020 at 8:37 AM Sunil Pandey  wrote:
> > >>
> > >> On Sat, Jul 4, 2020 at 9:11 AM Richard Biener
> > >>  wrote:
> > >> >
> > >> > On July 3, 2020 11:16:46 PM GMT+02:00, Jason Merrill 
> > >> >  wrote:
> > >> > >On 6/29/20 5:00 AM, Richard Biener wrote:
> > >> > >> On Fri, Jun 26, 2020 at 10:11 PM H.J. Lu  
> > >> > >> wrote:
> > >> > >>>
> > >> > >>> On Thu, Jun 25, 2020 at 1:10 AM Richard Biener
> > >> > >>>  wrote:
> > >> > 
> > >> >  On Thu, Jun 25, 2020 at 2:53 AM Sunil Pandey 
> > >> > >wrote:
> > >> > >
> > >> > > On Wed, Jun 24, 2020 at 12:30 AM Richard Biener
> > >> > >  wrote:
> > >> > >>
> > >> > >> On Tue, Jun 23, 2020 at 5:31 PM Sunil K Pandey via G

Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-20 Thread Richard Biener via Gcc-patches
On Mon, Jul 20, 2020 at 1:04 PM Andrew Stubbs  wrote:
>
> On 20/07/2020 11:01, Richard Biener wrote:
> > On Mon, Jul 20, 2020 at 10:40 AM Andrew Stubbs  
> > wrote:
> >>
> >> On 20/07/2020 08:35, Richard Biener wrote:
>  The way simple_object is supposed to work is to clone (or merge) the ELF
>  headers from an existing binary. Unfortunately, the way mkoffload is
>  currently coded we don't have any to clone from until too late. We could
>  separate the assemble and link steps, but I chose not to go that way at
>  this time.
> >>>
> >>> You could defer the debug copying to after mkoffload assembled the
> >>> offloaded code and use those objects as template.  Maybe that's
> >>> what you refer to with separating assemble and link steps.
> >>
> >> That's exactly it.
> >>
> >> We'd still have to solve the relocation and symbol issues though, so for
> >> now we may as well solve both problems in one place.
> >>
> >> Does it even make sense to add support for those steps to simple_object?
> >>
> >> The relocation translation is highly target-specific, of course. The
> >> symbol weakening is necessary because the early debug info contains
> >> references to symbols that will not exist in the offloaded code (leading
> >> to either link failure or bloat).
> >
> > The odd thing is that the early debug should _not_ contain any relocations
> > to data nor extra symbols.  Only relocations within the debug info
> > should be there.
>
> The problem with the relocations themselves is only that the x86_64
> relocation codes don't match the amdgcn codes, making them gibberish to
> the linker.
>
> As for the symbol references, I observed it trying to link in symbols
> that only exist in the host libgomp.

It shouldn't do that.

> My assumption is that the generated
> early debug was intended for the LTO use-case, in which the output
> program will contain everything, and that this is merely the first
> offload target to support debug in this way. If it's not supposed to
> have those references at all then I'm out of my depth.

Even with LTO we cannot say for sure a symbol will survive optimization
so the early debug has to be self-contained and linkable without any
prerequisites.  Otherwise you hit a latent bug.

That said, there will still be some data relocations to/from debug sections,
mainly .debug_info referencing .debug_str + offset but maybe also some
others.  I would have hoped there are "common" ELF relocation types
with the same relocation types across targets for this ... but oh well.

> I can't say for sure whether the debug info generated with my patch is
> broken or misleading (because I'm clueless), but weakening the symbols
> works for the test cases and debug sessions that I tried.

As said, it shouldn't be necessary and just hides an existing bug.

Richard.

> Andrew


preprocessor: line-map tidying

2020-07-20 Thread Nathan Sidwell via Gcc-patches
I found the linemap logic dealing with running out of column numbers 
confusing.  There's no need for completely separate code blocks there, 
as we can rely on the masking operations working all the way down to 
zero bits.  The two binary searches for linemap lookups could do with 
modernization of placing the var decls at their initialization point. 
(These two searches work in opposite directions, and while lower_bound 
would work there, the caching got in the way and I decided to be 
conservative.)
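
The point about the masking working "all the way down to zero bits" can be checked in isolation.  What follows is only a minimal sketch, not the libcpp code: next_start_location is a hypothetical helper that takes the highest location and the range bits as plain unsigned values and applies the same add-then-mask sequence as the patch.

```c
#include <assert.h>

/* Hypothetical stand-alone helper (not part of libcpp): return the
   smallest value greater than HIGHEST_LOCATION whose low RANGE_BITS
   bits are all zero.  With RANGE_BITS == 0 the added term is 0 and
   the mask is all ones, so the result is HIGHEST_LOCATION + 1.  */
static unsigned int
next_start_location (unsigned int highest_location, unsigned int range_bits)
{
  /* Keep the shift well defined for a 32-bit unsigned int.  */
  assert (range_bits < 32);

  unsigned int start = highest_location + 1;
  start += (1u << range_bits) - 1;     /* Round up ...  */
  start &= ~((1u << range_bits) - 1);  /* ... then clear the low bits.  */
  return start;
}
```

With 5 range bits a highest location of 100 gives 128, and with 0 range bits it gives 101, so the former else branch falls out of the same two statements.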


libcpp/
* line-map.c (linemap_add): Simplify column overflow calculation.
Add comment about range and column bit init.
(linemap_ordinary_map_lookup): Refactor for RAII.
(linemap_macro_map_lookup): Likewise.

pushed
--
Nathan Sidwell : Facebook
diff --git i/libcpp/line-map.c w/libcpp/line-map.c
index 8a390d0857b..a8d52861dee 100644
--- i/libcpp/line-map.c
+++ w/libcpp/line-map.c
@@ -462,17 +462,12 @@ linemap_add (line_maps *set, enum lc_reason reason,
 {
   /* Generate a start_location above the current highest_location.
  If possible, make the low range bits be zero.  */
-  location_t start_location;
-  if (set->highest_location < LINE_MAP_MAX_LOCATION_WITH_COLS)
-{
-  start_location = set->highest_location + (1 << set->default_range_bits);
-  if (set->default_range_bits)
-	start_location &= ~((1 << set->default_range_bits) - 1);
-  linemap_assert (0 == (start_location
-			& ((1 << set->default_range_bits) - 1)));
-}
-  else
-start_location = set->highest_location + 1;
+  location_t start_location = set->highest_location + 1;
+  unsigned range_bits = 0;
+  if (start_location < LINE_MAP_MAX_LOCATION_WITH_COLS)
+range_bits = set->default_range_bits;
+  start_location += (1 << range_bits) - 1;
+  start_location &=  ~((1 << range_bits) - 1);
 
   linemap_assert (!LINEMAPS_ORDINARY_USED (set)
 		  || (start_location
@@ -537,8 +532,9 @@ linemap_add (line_maps *set, enum lc_reason reason,
   map->to_file = to_file;
   map->to_line = to_line;
   LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
-  map->m_column_and_range_bits = 0;
-  map->m_range_bits = 0;
+  /* Do not store range_bits here.  That's readjusted in
+ linemap_line_start.  */
+  map->m_range_bits = map->m_column_and_range_bits = 0;
   set->highest_location = start_location;
   set->highest_line = start_location;
   set->max_column_hint = 0;
@@ -954,19 +950,16 @@ linemap_lookup (const line_maps *set, location_t line)
 static const line_map_ordinary *
 linemap_ordinary_map_lookup (const line_maps *set, location_t line)
 {
-  unsigned int md, mn, mx;
-  const line_map_ordinary *cached, *result;
-
   if (IS_ADHOC_LOC (line))
 line = get_location_from_adhoc_loc (set, line);
 
   if (set ==  NULL || line < RESERVED_LOCATION_COUNT)
 return NULL;
 
-  mn = LINEMAPS_ORDINARY_CACHE (set);
-  mx = LINEMAPS_ORDINARY_USED (set);
+  unsigned mn = LINEMAPS_ORDINARY_CACHE (set);
+  unsigned mx = LINEMAPS_ORDINARY_USED (set);
 
-  cached = LINEMAPS_ORDINARY_MAP_AT (set, mn);
+  const line_map_ordinary *cached = LINEMAPS_ORDINARY_MAP_AT (set, mn);
   /* We should get a segfault if no line_maps have been added yet.  */
   if (line >= MAP_START_LOCATION (cached))
 {
@@ -981,7 +974,7 @@ linemap_ordinary_map_lookup (const line_maps *set, location_t line)
 
   while (mx - mn > 1)
 {
-  md = (mn + mx) / 2;
+  unsigned md = (mn + mx) / 2;
   if (MAP_START_LOCATION (LINEMAPS_ORDINARY_MAP_AT (set, md)) > line)
 	mx = md;
   else
@@ -989,7 +982,7 @@ linemap_ordinary_map_lookup (const line_maps *set, location_t line)
 }
 
   LINEMAPS_ORDINARY_CACHE (set) = mn;
-  result = LINEMAPS_ORDINARY_MAP_AT (set, mn);
+  const line_map_ordinary *result = LINEMAPS_ORDINARY_MAP_AT (set, mn);
   linemap_assert (line >= MAP_START_LOCATION (result));
   return result;
 }
@@ -1002,21 +995,18 @@ linemap_ordinary_map_lookup (const line_maps *set, location_t line)
 static const line_map_macro *
 linemap_macro_map_lookup (const line_maps *set, location_t line)
 {
-  unsigned int md, mn, mx;
-  const struct line_map_macro *cached, *result;
-
   if (IS_ADHOC_LOC (line))
 line = get_location_from_adhoc_loc (set, line);
 
   linemap_assert (line >= LINEMAPS_MACRO_LOWEST_LOCATION (set));
 
-  if (set ==  NULL)
+  if (set == NULL)
 return NULL;
 
-  mn = LINEMAPS_MACRO_CACHE (set);
-  mx = LINEMAPS_MACRO_USED (set);
-  cached = LINEMAPS_MACRO_MAP_AT (set, mn);
-  
+  unsigned mn = LINEMAPS_MACRO_CACHE (set);
+  unsigned mx = LINEMAPS_MACRO_USED (set);
+  const struct line_map_macro *cached = LINEMAPS_MACRO_MAP_AT (set, mn);
+
   if (line >= MAP_START_LOCATION (cached))
 {
   if (mn == 0 || line < MAP_START_LOCATION (&cached[-1]))
@@ -1027,7 +1017,7 @@ linemap_macro_map_lookup (const line_maps *set, location_t line)
 
   while (mn < mx)
 {
-  md = (mx + mn) / 2;
+  unsigned md = (mx + mn) / 2;
   if (MAP_START_LOCATION (LINEMA

Re: [PATCH 3/4] libstdc++: Add floating-point std::to_chars implementation

2020-07-20 Thread Jonathan Wakely via Gcc-patches

On 19/07/20 23:37 -0400, Patrick Palka via Libstdc++ wrote:

On Fri, 17 Jul 2020, Patrick Palka wrote:


On Fri, 17 Jul 2020, Patrick Palka wrote:

> On Wed, 15 Jul 2020, Patrick Palka wrote:
>
> > On Tue, 14 Jul 2020, Patrick Palka wrote:
> >
> > > This implements the floating-point std::to_chars overloads for float,
> > > double and long double.  We use the Ryu library to compute the shortest
> > > round-trippable fixed and scientific forms of a number for float, double
> > > and long double.  We also use Ryu for performing fixed and scientific
> > > formatting of float and double. For formatting long double with an
> > > explicit precision argument we use a printf fallback.  Hexadecimal
> > > formatting for float, double and long double is implemented from
> > > scratch.
> > >
> > > The supported long double binary formats are float64 (same as double),
> > > float80 (x86 extended precision), float128 and ibm128.
> > >
> > > Much of the complexity of the implementation is in computing the exact
> > > output length before handing it off to Ryu (which doesn't do bounds
> > > checking).  In some cases it's hard to compute the output length before
> > > the fact, so in these cases we instead compute an upper bound on the
> > > output length and use a sufficiently-sized intermediate buffer (if the
> > > output range is smaller than the upper bound).
> > >
> > > Another source of complexity is in the general-with-precision formatting
> > > mode, where we need to do zero-trimming of the string returned by Ryu, and
> > > where we also take care to avoid having to format the string a second
> > > time when the general formatting mode resolves to fixed.
> > >
> > > Tested on x86_64-pc-linux-gnu, aarch64-unknown-linux-gnu,
> > > s390x-ibm-linux-gnu, and powerpc64-unknown-linux-gnu.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >  * acinclude.m4 (libtool_VERSION): Bump to 6:29:0.
> > >  * config/abi/pre/gnu.ver: Add new exports.
> > >  * configure: Regenerate.
> > >  * include/std/charconv (to_chars): Declare the floating-point
> > >  overloads for float, double and long double.
> > >  * src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
> > >  * src/c++17/Makefile.in: Regenerate.
> > >  * src/c++17/floating_to_chars.cc: New file.
> > >  * testsuite/20_util/to_chars/long_double.cc: New test.
> > >  * testsuite/util/testsuite_abi.cc: Add new symbol version.
> >
> > Here is v2 of this patch, which fixes a build failure on i386 due to
> > __int128 being unavailable, by refactoring the long double binary format
> > selection to avoid referring to __int128 when it doesn't exist.  The
> > patch also makes the hex formatting for 80-bit long double use uint64_t
> > instead of __int128 since the mantissa has exactly 64 bits in this case.
>
> Here's v3 which just makes some minor stylistic adjustments, and most
> notably replaces the use of _GLIBCXX_DEBUG with _GLIBCXX_ASSERTIONS
> since we just want to enable __glibcxx_assert and not all of debug mode.

Here's v4, which should now correctly support using <charconv> with
-mlong-double-64 on targets with a large default long double type.
This is done by defining the long double to_chars overloads as inline
wrappers around the double overloads within <charconv> whenever
__DBL_MANT_DIG__ equals __LDBL_MANT_DIG__.




-- >8 --

Subject: [PATCH 3/4] libstdc++: Add floating-point std::to_chars
 implementation

This implements the floating-point std::to_chars overloads for float,
double and long double.  We use the Ryu library to compute the shortest
round-trippable fixed and scientific forms of a number for float, double
and long double.  We also use Ryu for performing explicit-precision
fixed and scientific formatting of float and double. For
explicit-precision formatting of long double we fall back to using
printf.  Hexadecimal formatting for float, double and long double is
implemented from scratch.

The supported long double binary formats are binary64, binary80 (x86
80-bit extended precision), binary128 and ibm128.

Much of the complexity of the implementation is in computing the exact
output length before handing it off to Ryu (which doesn't do bounds
checking).  In some cases it's hard to compute the output length
beforehand, so in these cases we instead compute an upper bound on the
output length and use a sufficiently-sized intermediate buffer if
necessary.

Another source of complexity is in the general-with-precision formatting
mode, where we need to do zero-trimming of the string returned by Ryu,
and where we also take care to avoid having to format the string a
second time when the general formatting mode resolves to fixed.

This implementation is non-conforming in a couple of ways:

1. For the shortest hexadecimal formatting, we currently follow the
   Microsoft implementation's approach of being consistent with the
   output of printf's '%a' specifier at the expense of sometimes not
   printing the shortest representation.  For example, the shortest hex
   form of 1.08p+0 is 2.

Re: [PATCH 3/4] libstdc++: Add floating-point std::to_chars implementation

2020-07-20 Thread Patrick Palka via Gcc-patches
On Mon, 20 Jul 2020, Jonathan Wakely wrote:

> On 19/07/20 23:37 -0400, Patrick Palka via Libstdc++ wrote:
> > On Fri, 17 Jul 2020, Patrick Palka wrote:
> > 
> > > On Fri, 17 Jul 2020, Patrick Palka wrote:
> > > 
> > > > On Wed, 15 Jul 2020, Patrick Palka wrote:
> > > >
> > > > > On Tue, 14 Jul 2020, Patrick Palka wrote:
> > > > >
> > > > > > This implements the floating-point std::to_chars overloads for
> > > float,
> > > > > > double and long double.  We use the Ryu library to compute the
> > > shortest
> > > > > > round-trippable fixed and scientific forms of a number for float,
> > > double
> > > > > > and long double.  We also use Ryu for performing fixed and
> > > scientific
> > > > > > formatting of float and double. For formatting long double with an
> > > > > > explicit precision argument we use a printf fallback.  Hexadecimal
> > > > > > formatting for float, double and long double is implemented from
> > > > > > scratch.
> > > > > >
> > > > > > The supported long double binary formats are float64 (same as
> > > double),
> > > > > > float80 (x86 extended precision), float128 and ibm128.
> > > > > >
> > > > > > Much of the complexity of the implementation is in computing the
> > > exact
> > > > > > output length before handing it off to Ryu (which doesn't do bounds
> > > > > > checking).  In some cases it's hard to compute the output length
> > > before
> > > > > > the fact, so in these cases we instead compute an upper bound on the
> > > > > > output length and use a sufficiently-sized intermediate buffer (if
> > > the
> > > > > > output range is smaller than the upper bound).
> > > > > >
> > > > > > Another source of complexity is in the general-with-precision
> > > formatting
> > > > > > mode, where we need to do zero-trimming of the string returned by
> > > Ryu, and
> > > > > > where we also take care to avoid having to format the string a
> > > second
> > > > > > time when the general formatting mode resolves to fixed.
> > > > > >
> > > > > > Tested on x86_64-pc-linux-gnu, aarch64-unknown-linux-gnu,
> > > > > > s390x-ibm-linux-gnu, and powerpc64-unknown-linux-gnu.
> > > > > >
> > > > > > libstdc++-v3/ChangeLog:
> > > > > >
> > > > > > * acinclude.m4 (libtool_VERSION): Bump to 6:29:0.
> > > > > > * config/abi/pre/gnu.ver: Add new exports.
> > > > > > * configure: Regenerate.
> > > > > > * include/std/charconv (to_chars): Declare the floating-point
> > > > > > overloads for float, double and long double.
> > > > > > * src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
> > > > > > * src/c++17/Makefile.in: Regenerate.
> > > > > > * src/c++17/floating_to_chars.cc: New file.
> > > > > > * testsuite/20_util/to_chars/long_double.cc: New test.
> > > > > > * testsuite/util/testsuite_abi.cc: Add new symbol version.
> > > > >
> > > > > Here is v2 of this patch, which fixes a build failure on i386 due to
> > > > > __int128 being unavailable, by refactoring the long double binary
> > > format
> > > > > selection to avoid referring to __int128 when it doesn't exist.  The
> > > > > patch also makes the hex formatting for 80-bit long double use
> > > uint64_t
> > > > > instead of __int128 since the mantissa has exactly 64 bits in this
> > > case.
> > > >
> > > > Here's v3 which just makes some minor stylistic adjustments, and most
> > > > notably replaces the use of _GLIBCXX_DEBUG with _GLIBCXX_ASSERTIONS
> > > > since we just want to enable __glibcxx_assert and not all of debug mode.
> > > 
> > > Here's v4, which should now correctly support using <charconv> with
> > > -mlong-double-64 on targets with a large default long double type.
> > > This is done by defining the long double to_chars overloads as inline
> > > wrappers around the double overloads within <charconv> whenever
> > > __DBL_MANT_DIG__ equals __LDBL_MANT_DIG__.
> > 
> > > 
> > > -- >8 --
> > > 
> > > Subject: [PATCH 3/4] libstdc++: Add floating-point std::to_chars
> > >  implementation
> > > 
> > > This implements the floating-point std::to_chars overloads for float,
> > > double and long double.  We use the Ryu library to compute the shortest
> > > round-trippable fixed and scientific forms of a number for float, double
> > > and long double.  We also use Ryu for performing explicit-precision
> > > fixed and scientific formatting of float and double. For
> > > explicit-precision formatting of long double we fall back to using
> > > printf.  Hexadecimal formatting for float, double and long double is
> > > implemented from scratch.
> > > 
> > > The supported long double binary formats are binary64, binary80 (x86
> > > 80-bit extended precision), binary128 and ibm128.
> > > 
> > > Much of the complexity of the implementation is in computing the exact
> > > output length before handing it off to Ryu (which doesn't do bounds
> > > checking).  In some cases it's hard to compute the output length
> > > beforehand, so in these cases we instead compute an upper bound on the
> > > output length and use a sufficiently-sized 

[PATCH] middle-end: Fold popcount(x&4) to (x>>2)&1 and friends.

2020-07-20 Thread Roger Sayle

This patch complements one from June 12th which is still awaiting
review: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547937.html

This patch optimizes popcount and parity of an argument known to have
at most a single bit set, to be that single bit.  Hence, popcount(x&8)
is simplified to (x>>3)&1.   This generalizes the existing optimization
of popcount(x&1) being simplified to x&1, which is moved with this
patch to avoid a duplicate pattern warning in match.pd.
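
As an illustration only (this is not the match.pd pattern nor the new test case), the identity for the single-bit mask 8 can be checked exhaustively over 16-bit inputs with the GCC builtins.  The helper names below are made up for this sketch.

```c
#include <assert.h>

/* popcount and parity of a value in which only bit 3 can be set.  */
static int
popcount_masked (unsigned int x)
{
  return __builtin_popcount (x & 8u);
}

static int
parity_masked (unsigned int x)
{
  return __builtin_parity (x & 8u);
}

/* The form the simplification is described as producing.  */
static int
shifted_bit (unsigned int x)
{
  return (int) ((x >> 3) & 1u);
}

/* Return 1 if both builtins agree with the shifted form for every
   16-bit input, 0 otherwise.  */
static int
check_equivalence (void)
{
  for (unsigned int x = 0; x <= 0xffffu; x++)
    if (popcount_masked (x) != shifted_bit (x)
	|| parity_masked (x) != shifted_bit (x))
      return 0;
  return 1;
}
```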

This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check" with no new failures.  If this is approved after
(or at the same time) as the patch above, I'm happy to resolve the
conflicts and retest before committing.


2020-07-20  Roger Sayle  

gcc/ChangeLog
* match.pd (popcount(x) -> x>>C): New simplification.

gcc/testsuite
* gcc.dg/fold-popcount-5.c: New test.
* gcc.dg/fold-parity-5.c: Likewise.


Ok for mainline?
Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/match.pd b/gcc/match.pd
index c6ae7a7..0b3b626 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5966,11 +5966,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* POPCOUNT simplifications.  */
 (for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
   BUILT_IN_POPCOUNTIMAX)
-  /* popcount(X&1) is nop_expr(X&1).  */
-  (simplify
-(popcount @0)
-(if (tree_nonzero_bits (@0) == 1)
-  (convert @0)))
   /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
   (simplify
 (plus (popcount:s @0) (popcount:s @1))
@@ -5983,6 +5978,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cmp (popcount @0) integer_zerop)
   (rep @0 { build_zero_cst (TREE_TYPE (@0)); }
 
+/* popcount(X&C1) is (X>>C2)&1 when C1 == 1<

Re: [PATCH] nvptx: Add support for subword compare-and-swap

2020-07-20 Thread Kwok Cheung Yeung

On 01/07/2020 3:28 pm, Tom de Vries wrote:

So, I think gcc needs a copy of (some of) the
gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target
sync_char_short.

However, since this patch only adds partial support, we cannot enable
sync_char_short for nvptx yet.  So, if you stick to partial support, you
should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which
ideally could be an include of a generic test-case that is active for
sync_char_short only, with mention that it can be removed once
sync_char_short is enabled for nvptx).



I have added gcc.target/nvptx/sync.c, which is a version of ia64-sync-3.c 
extended to test chars and shorts too. I kept the original int and long tests 
because sync_int_long isn't indicated as being supported on nvptx either.



I looked at the implementation, and it looks ok to me, though I think we
need to make explicit in a comment what the assumptions are:
- that we have read and write access to the entire word, and
- that the word is not volatile.



I've added some extra comments in the implementation. Like I said previously, 
the loop accounts for the larger word being volatile.
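
For readers following along, the general technique being discussed can be sketched as below.  This is not the code in config/nvptx/atomic.c; cas_u8_via_word is a hypothetical name, and the sketch assumes the two points raised above (the whole enclosing 32-bit word is readable and writable) plus a byte order reported by __BYTE_ORDER__.  It also reads a byte object through a uint32_t pointer, which relies on the builtins acting as barriers rather than on strict-aliasing rules.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: emulate a 1-byte compare-and-swap with a
   4-byte one on the aligned word containing *PTR.  Returns the byte
   value observed before the operation, like
   __sync_val_compare_and_swap.  */
static uint8_t
cas_u8_via_word (uint8_t *ptr, uint8_t expected, uint8_t desired)
{
  uintptr_t addr = (uintptr_t) ptr;
  uint32_t *wordptr = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned int byte = (unsigned int) (addr & 3);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  unsigned int shift = (3 - byte) * 8;
#else
  unsigned int shift = byte * 8;
#endif
  uint32_t mask = (uint32_t) 0xff << shift;

  for (;;)
    {
      uint32_t old_word = __atomic_load_n (wordptr, __ATOMIC_SEQ_CST);
      uint8_t old_byte = (uint8_t) ((old_word & mask) >> shift);

      /* The target byte itself does not match: fail like a real CAS.  */
      if (old_byte != expected)
	return old_byte;

      uint32_t new_word = (old_word & ~mask) | ((uint32_t) desired << shift);
      if (__sync_bool_compare_and_swap (wordptr, old_word, new_word))
	return expected;

      /* The word changed between the load and the swap, possibly in a
	 neighbouring byte only.  Reload and try again.  */
    }
}
```

The retry loop is what lets unrelated bytes in the same word change concurrently without making the subword operation fail spuriously.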



As for the oacc test-case, you could add the __int128 bit, perhaps along
the lines of how things are done in
libgomp/testsuite/libgomp.c++/target-8.C ?



I've added an extra test for __int128 types in my libgomp testcase that runs if 
128-bit types are supported.


I've tested that there are no regressions with the patch on standalone nvptx, 
and that the new reduction-16.c testcase passes with both nvptx and AMD GCN 
offloading.


Is this version okay for master and og10?

Thanks

Kwok
commit 4661232905d55a4bc1354cb717b2e5d950d215af
Author: Kwok Cheung Yeung 
Date:   Thu Jul 16 12:00:24 2020 -0700

nvptx: Add support for subword compare-and-swap

This adds support for __sync_val_compare_and_swap and
__sync_bool_compare_and_swap for 1-byte and 2-byte long
values, which are not natively supported on nvptx.

2020-07-16  Kwok Cheung Yeung  

libgcc/
* config/nvptx/atomic.c: New.
* config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.

gcc/testsuite/
* gcc.target/nvptx/sync.c: New.

libgomp/
* testsuite/libgomp.c-c++-common/reduction-16.c: New.

diff --git a/gcc/testsuite/gcc.target/nvptx/sync.c 
b/gcc/testsuite/gcc.target/nvptx/sync.c
new file mode 100644
index 000..a573824
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sync.c
@@ -0,0 +1,143 @@
+/* { dg-do run } */
+
+/* Test basic functionality of the intrinsics.  */
+
+/* This is a copy of gcc.dg/ia64-sync-2.c, extended to test 8-bit and 16-bit
+   values as well.  */
+
+/* Ideally this test should require sync_char_short and sync_int_long, but we
+   only support a subset at the moment.  */
+
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+static char AC[4];
+static char init_qi[4] = { -30,-30,-50,-50 };
+static char test_qi[4] = { -115,-115,25,25 };
+
+static void
+do_qi (void)
+{
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -30)
+abort ();
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -115)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 0)
+abort ();
+
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != -50)
+abort ();
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != 25)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+abort ();
+}
+
+static short AS[4];
+static short init_hi[4] = { -30,-30,-50,-50 };
+static short test_hi[4] = { -115,-115,25,25 };
+
+static void
+do_hi (void)
+{
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -30)
+abort ();
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -115)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 0)
+abort ();
+
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != -50)
+abort ();
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != 25)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+abort ();
+}
+
+static int AI[4];
+static int init_si[4] = { -30,-30,-50,-50 };
+static int test_si[4] = { -115,-115,25,25 };
+
+static void
+do_si (void)
+{
+  if (__sync_val_compare_and_swap(AI+0, -30, -115) != -30)
+abort ();
+  if (__sync_val_compare_and_swap(AI+0, -30, -115) != -115)
+abort ();
+  if (__sync_bool_compare_and_swap(AI+1, -30, -115) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AI+1, -30, -115) != 0)
+abort ();
+
+  if (__sync_val_compare_and_swap(AI+2, AI

Re: [PATCH] non-power-of-2 group size can be vectorized for 2-element vectors case (PR96208)

2020-07-20 Thread Richard Biener via Gcc-patches
On Wed, Jul 15, 2020 at 5:40 PM Dmitrij Pochepko
 wrote:
>
> Hi,
>
> here is an enhancement to gcc, which allows load/store groups with size being 
> non-power-of-2 to be vectorized.
> Current implementation is using interleaving permutations to transform 
> load/store groups. That is where the power-of-2 requirement comes from.
> For N-element vectors the simplest approach would be to use N single-element 
> insertions for any required vector permutation.
> And for 2-element vectors it is a reasonable amount of insertions.
> Using this approach allows vectorization for cases, which were not supported 
> before.
>
> bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-linux-gnu.

I believe a more general fix revolves around making SLP discovery not
fail on the non-grouped load *k.  Quoting the testcase:

typedef struct {
double m1, m2, m3, m4, m5;
} the_struct_t;

double bar1 (the_struct_t*);

double foo (double* k, unsigned int n, the_struct_t* the_struct)
{
unsigned int u;
the_struct_t result;
for (u=0; u < n; u++, k--) {
   result.m1 += (*k)*the_struct[u].m1;
   result.m2 += (*k)*the_struct[u].m2;
   result.m3 += (*k)*the_struct[u].m3;
   result.m4 += (*k)*the_struct[u].m4;
}
return bar1 (&result);
}

here *k could be accepted because it is the same in every
SLP lane.  Implementation-wise I think we'd handle a DR
group with a single element just fine here, we just have to be
careful (at this point) to not overeagerly make them so.

I've played with similar changes in this area already but some
refactoring could make things nicer.  So it's still on my TODO
to make the above SLP vectorized.

Richard.

> Thanks,
> Dmitrij


Re: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved registers with CMSE

2020-07-20 Thread Andre Vieira (lists)



On 08/07/2020 09:04, Andre Simoes Dias Vieira wrote:


On 07/07/2020 13:43, Christophe Lyon wrote:

Hi,


On Mon, 6 Jul 2020 at 16:31, Andre Vieira (lists)
 wrote:


On 30/06/2020 14:50, Andre Vieira (lists) wrote:

On 29/06/2020 11:15, Christophe Lyon wrote:

On Mon, 29 Jun 2020 at 10:56, Andre Vieira (lists)
 wrote:

On 23/06/2020 21:52, Christophe Lyon wrote:

On Tue, 23 Jun 2020 at 15:28, Andre Vieira (lists)
 wrote:

On 23/06/2020 13:10, Kyrylo Tkachov wrote:

-Original Message-
From: Andre Vieira (lists) 
Sent: 22 June 2020 09:52
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov 
Subject: [PATCH][GCC][Arm] PR target/95646: Do not clobber
callee saved
registers with CMSE

Hi,

As reported in bugzilla, when the -mcmse option is used while compiling
for size (-Os) with a thumb-1 target the generated code will clear the
registers r7-r10. These however are callee saved and should be preserved
across ABI boundaries. The reason this happens is because these
registers are made "fixed" when optimising for size with Thumb-1 in a
way to make sure they are not used, as pushing and popping hi-registers
requires extra moves to and from LO_REGS.

To fix this, this patch uses 'callee_saved_reg_p', which accounts for
this optimisation, instead of 'call_used_or_fixed_reg_p'. Be aware of
the definition of 'callee_saved_reg_p': it still takes call-used
registers into account, which aren't callee-saved in my opinion, so the
name is rather a misnomer. It works to our advantage here, though, as it
does exactly what we need.

Regression tested on arm-none-eabi.

Is this OK for trunk? (Will eventually backport to previous
versions if
stable.)

Ok.
Thanks,
Kyrill

As I was getting ready to push this I noticed I didn't add any skip-ifs
to prevent this failing with specific target options. So here's a new
version with those.

Still OK?


Hi,

This is not sufficient to skip arm-linux-gnueabi* configs built with
non-default cpu/fpu.

For instance, with arm-linux-gnueabihf --with-cpu=cortex-a9
--with-fpu=neon-fp16 --with-float=hard
I see:
FAIL: gcc.target/arm/pr95646.c (test for excess errors)
Excess errors:
cc1: error: ARMv8-M Security Extensions incompatible with selected FPU
cc1: error: target CPU does not support ARM mode

and the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os

Resending as I don't think my earlier one made it to the lists
(sorry if
you are receiving this double!)

I'm not following this, before I go off and try to reproduce it, what do
you mean by 'the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os'?
These are the options you are seeing in the log file? Surely they should
override the default options? Only thing I can think of is this might
need an extra -mfloat-abi=soft to make sure it overrides the default
float-abi.  Could you give that a try?

No it doesn't make a difference alone.

I also had to add:
-mfpu=auto (that clears the above warning)
-mthumb otherwise we now get cc1: error: target CPU does not support
ARM mode

Looks like some effective-target machinery is needed

So I had a look at this. I was pretty sure that -mfloat-abi=soft
overrode -mfpu=<>, which by and large it does, in that no FP instructions
will be generated, but the error you see only checks for the right
number of FP registers; it doesn't check whether
'TARGET_HARD_FLOAT' is set or not. I'll fix this too and use the
check-effective-target for armv8-m.base for this test as it is indeed
a better approach than my bag of skip-ifs. I'm testing it locally to
make sure my changes don't break anything.

Cheers,
Andre

Hi,

Sorry for the delay. So I changed the test to use the effective-target
machinery as you suggested and I also made sure that you don't get the
"ARMv8-M Security Extensions incompatible with selected FPU" when
-mfloat-abi=soft.
Further changed 'asm' to '__asm__' to avoid failures with '-std=' options.


Regression tested on arm-none-eabi.
@Christophe: could you test this for your configuration, shouldn't fail
anymore!


Indeed with your patch I don't see any failure with pr95646.c

Note that it is still unsupported with arm-eabi when running the tests
with -mcpu=cortex-mXX
because the compiler complains that -mcpu=cortex-mXX conflicts with
-march=armv8-m.base,
thus the effective-target test fails.

BTW, is that warning useful/practical? Wouldn't it be more convenient
if the last -mcpu/-march
on the command line was the only one taken into account? (I had a
similar issue when
running tests (libstdc++) getting -march=armv8-m.main+fp from their
multilib environment
and forcing -mcpu=cortex-m33 because it also means '+dsp' and produces
a warning;
I had to use -mcpu=cortex-m33 -march=armv8-m.main+fp+dsp to
work around this)

Yeah I've been annoyed by that before, also in the context of testing
multilibs.


Even though I can see how it can be a useful warning though, if you 
are using these in build-systems and you accidentally introduce a new 
(incompatible) -mcpu/-march alongside the old one. Though it see

Re: [PATCH] middle-end: Fold popcount(x&4) to (x>>2)&1 and friends.

2020-07-20 Thread Richard Biener via Gcc-patches
On Mon, Jul 20, 2020 at 3:06 PM Roger Sayle  wrote:
>
>
> This patch complements one from June 12th which is still awaiting
> review: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547937.html
>
> This patch optimizes popcount and parity of an argument known to have
> at most a single bit set, to be that single bit.  Hence, popcount(x&8)
> is simplified to (x>>3)&1.   This generalizes the existing optimization
> of popcount(x&1) being simplified to x&1, which is moved with this
> patch to avoid a duplicate pattern warning in match.pd.
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.  If this is approved after
> (or at the same time) as the patch above, I'm happy to resolve the
> conflicts and retest before committing.

Given you know the constant bit position of the possibly nonzero
bit you can elide the conversion to unsigned for all but the case
of a possibly negative input (IIRC GCC doesn't yet take advantage
of negative right shift undefinedness - but maybe sanitizers complain).
Also the shift amount doesn't need to be in the same type as
the shifted amount so using either size_int() or integer_type_node
for that argument should reduce INTEGER_CST waste.
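
As a side note for readers, the identity behind the fold is easy to check by hand. The sketch below is not part of the patch; the function name is made up for illustration, and it uses only the GCC builtins __builtin_popcount and __builtin_parity on an unsigned argument, so the question of shifting negative values does not arise.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical check, not from the patch: return 1 if both popcount and
   parity of (x & (1u << bit)) equal ((x >> bit) & 1).  */
static int
single_bit_fold_holds (unsigned x, unsigned bit)
{
  assert (bit < sizeof (unsigned) * CHAR_BIT);

  unsigned masked = x & (1u << bit);	/* At most one bit can be set.  */
  unsigned folded = (x >> bit) & 1u;	/* The proposed replacement.  */

  return __builtin_popcount (masked) == (int) folded
	 && __builtin_parity (masked) == (int) folded;
}
```

With bit == 3 this is exactly the popcount(x&8) -> (x>>3)&1 case from the patch description.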

Any reason you are not tackling IFN_POPCOUNT/PARITY?
You could use

(for pfun (POPCOUNT PARITY)
 ...

and automagically get all builtins and the IFN.

Thanks,
Richard.



>
> 2020-07-20  Roger Sayle  
>
> gcc/ChangeLog
> * match.pd (popcount(x) -> x>>C): New simplification.
>
> gcc/testsuite
> * gcc.dg/fold-popcount-5.c: New test.
> * gcc.dg/fold-parity-5.c: Likewise.
>
>
> Ok for mainline?
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


[PATCH] remove write-only array in rev_post_order_and_mark_dfs_back_seme

2020-07-20 Thread Richard Biener
This removes a write-only array in
rev_post_order_and_mark_dfs_back_seme.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

2020-07-20  Richard Biener  

* cfganal.c (rev_post_order_and_mark_dfs_back_seme): Remove
write-only post array.
---
 gcc/cfganal.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index b057c90368b..1c91c1e6bf5 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -1181,13 +1181,12 @@ rev_post_order_and_mark_dfs_back_seme (struct function 
*fn, edge entry,
  a need to re-allocate.  */
   auto_vec stack (2 * n_basic_blocks_for_fn (fn));
 
-  int *pre = XNEWVEC (int, 2 * last_basic_block_for_fn (fn));
-  int *post = pre + last_basic_block_for_fn (fn);
+  int *pre = XNEWVEC (int, last_basic_block_for_fn (fn));
 
   /* BB flag to track nodes that have been visited.  */
   auto_bb_flag visited (fn);
-  /* BB flag to track which nodes have post[] assigned to avoid
- zeroing post.  */
+  /* BB flag to track which nodes have postorder visiting completed.  Used
+ for backedge marking.  */
   auto_bb_flag post_assigned (fn);
 
   /* Push the first edge on to the stack.  */
@@ -1235,7 +1234,6 @@ rev_post_order_and_mark_dfs_back_seme (struct function 
*fn, edge entry,
{
  /* There are no successors for the DEST node so assign
 its reverse completion number.  */
- post[dest->index] = rev_post_order_num;
  dest->flags |= post_assigned;
  rev_post_order[rev_post_order_num] = dest->index;
  rev_post_order_num++;
@@ -1244,16 +1242,15 @@ rev_post_order_and_mark_dfs_back_seme (struct function 
*fn, edge entry,
   else
{
  if (dest->flags & visited
+ && !(dest->flags & post_assigned)
  && src != entry->src
- && pre[src->index] >= pre[dest->index]
- && !(dest->flags & post_assigned))
+ && pre[src->index] >= pre[dest->index])
e->flags |= EDGE_DFS_BACK;
 
  if (idx != 0 && stack[idx - 1]->src != src)
{
  /* There are no more successors for the SRC node
 so assign its reverse completion number.  */
- post[src->index] = rev_post_order_num;
  src->flags |= post_assigned;
  rev_post_order[rev_post_order_num] = src->index;
  rev_post_order_num++;
-- 
2.26.2
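
For readers unfamiliar with the technique, the role of the post_assigned flag can be illustrated with a much smaller, recursive DFS. This is a sketch only, not the cfganal.c code: the struct, the adjacency-matrix representation and all names are assumptions made for the example. An edge is marked as a back edge when its destination has been visited but its postorder visit has not completed yet; no array of postorder numbers is needed for that decision.

```c
#include <assert.h>

#define MAX_NODES 8

/* Hypothetical toy graph, not GCC's CFG representation.  */
struct dfs_state
{
  int n;					/* Number of nodes.  */
  unsigned char edge[MAX_NODES][MAX_NODES];	/* edge[s][d] != 0: s -> d.  */
  unsigned char back[MAX_NODES][MAX_NODES];	/* Marked back edges.  */
  unsigned char visited[MAX_NODES];		/* Preorder visit started.  */
  unsigned char finished[MAX_NODES];		/* Postorder visit completed.  */
  int rev_post_order[MAX_NODES];		/* Filled in completion order.  */
  int rev_post_order_num;
};

static void
dfs_visit (struct dfs_state *s, int src)
{
  assert (src >= 0 && src < s->n);
  s->visited[src] = 1;

  for (int dest = 0; dest < s->n; dest++)
    {
      if (!s->edge[src][dest])
	continue;
      if (!s->visited[dest])
	dfs_visit (s, dest);
      else if (!s->finished[dest])
	/* DEST is still on the DFS stack, so SRC -> DEST closes a cycle.  */
	s->back[src][dest] = 1;
    }

  /* No more successors for SRC: record its completion.  As in the
     function being patched, the array is filled backwards, so RPO is
     obtained by reading it from the last element to the first.  */
  s->finished[src] = 1;
  s->rev_post_order[s->rev_post_order_num++] = src;
}
```

The finished[] flag plays the part of post_assigned in the patch; the real function is iterative and additionally compares preorder numbers.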


[PATCH] [RFC] Sort region RPO according to SCC membership

2020-07-20 Thread Richard Biener
This produces a more optimal RPO order for iteration processing
by making sure that SCC members are processed before blocks reachable
from SCC exits.  This avoids iterating blocks unrelated to the current
iteration for RPO VN.

Overall reduction in the number of visited blocks isn't spectacular
for bootstrap (~1%) but single cases see up to a 10% reduction.

There's cleanup on the plate for followups which is merging the SCC
DFS walk with the RPO computing one as well as always requesting
iteration for the single caller.  The same function can be used
to optimize var-tracking iteration order (but that would already
benefit from "simpler" SCC finding and only toplevel sorting).
The sort comparator is a bit ugly and it is O(scc depth) which
makes the sorting itself approach quadraticness.  As said
var-tracking would benefit from simpler separation of the
function into independent regions which this also provides.

The DFS based SCC discovery could also replace our dominator
based loop finding with appropriate marking (or representing)
of irreducible regions.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2020-07-20  Richard Biener  

* cfganal.c (cmp_rpo_for_iteration): New callback for qsort.
(tag_header): New helper.
(rev_post_order_and_mark_dfs_back_seme): Compute SCC
membership and sort RPO when requesting an iteration
optimized order.
---
 gcc/cfganal.c | 191 ++
 1 file changed, 191 insertions(+)

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index 5c85ebe3c1a..1c91c1e6bf5 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -1060,6 +1060,108 @@ pre_and_rev_post_order_compute (int *pre_order, int 
*rev_post_order,
   return pre_order_num;
 }
 
+
+struct crfi_data { int *scc; int *bb_to_rpo; unsigned is_header; };
+
+/* qsort callback sorting a RPO array of block indices in a way
+   bringing SCC members next to each other.  */
+
+static int
+cmp_rpo_for_iteration (const void *a_, const void *b_, void *data_)
+{
+  int a = *(const int *)a_;
+  int b = *(const int *)b_;
+  const crfi_data *data = (const crfi_data *)data_;
+
+  /* Fast case.  */
+  if (data->scc[a] == data->scc[b]
+  || data->scc[a] == b
+  || data->scc[b] == a)
+return data->bb_to_rpo[a] - data->bb_to_rpo[b];
+
+  /* Need to find a common loop containing A and B.  */
+  unsigned depth_a
+= (BASIC_BLOCK_FOR_FN (cfun, a)->flags & data->is_header) ? 1 : 0;
+  unsigned depth_b
+= (BASIC_BLOCK_FOR_FN (cfun, b)->flags & data->is_header) ? 1 : 0;
+  int header_of_a = depth_a ? a : data->scc[a];
+  int header_of_b = depth_b ? b : data->scc[b];
+  int tem = a;
+  while (1)
+{
+  /* A nested in B.  */
+  if (data->scc[tem] == header_of_b)
+   return data->bb_to_rpo[tem] - data->bb_to_rpo[b];
+  tem = data->scc[tem];
+  if (tem == -1)
+   break;
+  depth_a++;
+}
+  tem = b;
+  while (1)
+{
+  /* B nested in A.  */
+  if (data->scc[tem] == header_of_a)
+   return data->bb_to_rpo[a] - data->bb_to_rpo[tem];
+  tem = data->scc[tem];
+  if (tem == -1)
+   break;
+  depth_b++;
+}
+
+  /* The loops of A and B are siblings, find the parents at the same
+ loop depth.  */
+  a = header_of_a;
+  b = header_of_b;
+  while (depth_a > depth_b)
+{
+  depth_a--;
+  a = data->scc[a];
+}
+  while (depth_b > depth_a)
+{
+  depth_b--;
+  b = data->scc[b];
+}
+  gcc_assert (a != b);
+  /* All of the above could be O(1) with something like loop->superloops[]. */
+
+  while (data->scc[a] != data->scc[b])
+{
+  a = data->scc[a];
+  b = data->scc[b];
+  gcc_assert (a != -1 && b != -1);
+}
+
+  return data->bb_to_rpo[a] - data->bb_to_rpo[b];
+}
+
+/* Helper for the SCC finding in rev_post_order_and_mark_dfs_back_seme.  */
+
+static void
+tag_header (int b, int h, int *scc, int *dfs)
+{
+  if (h == -1 || b == h)
+return;
+  int cur1 = b;
+  int cur2 = h;
+  while (scc[cur1] != -1)
+{
+  int ih = scc[cur1];
+  if (ih == cur2)
+   return;
+  if (dfs[ih] < dfs[cur2])
+   {
+ scc[cur1] = cur2;
+ cur1 = cur2;
+ cur2 = ih;
+   }
+  else
+   cur1 = ih;
+}
+  scc[cur1] = cur2;
+}
+
 /* Unlike pre_and_rev_post_order_compute we fill rev_post_order backwards
so iterating in RPO order needs to start with rev_post_order[n - 1]
going to rev_post_order[0].  If FOR_ITERATION is true then try to
@@ -1165,6 +1267,95 @@ rev_post_order_and_mark_dfs_back_seme (struct function 
*fn, edge entry,
 BASIC_BLOCK_FOR_FN (fn, rev_post_order[i])->flags
   &= ~(post_assigned|visited);
 
+  if (for_iteration)
+{
+  auto_vec stack (n_basic_blocks_for_fn (fn) + 1);
+  auto_bb_flag is_header (fn);
+  int dfsnum = 1;
+  int *dfs = XNEWVEC (int, 2 * last_basic_block_for_fn (fn));
+  int *scc = dfs + last_basic_block_for_fn (fn);
+
+  basic_block dest = entry-

Re: [PATCH] target: fix default value checking of x_str_align_functions in aarch64.c

2020-07-20 Thread Richard Sandiford
Hu Jiangping  writes:
> Hi,
>
> This patch deals with the -falign-X=0 options. According to the man pages,
> if zero is specified, a machine-dependent default value should be used.
> But in fact, zero itself was used internally, which is inconsistent.
>
> Tested on aarch64-linux cross compiler.  Is that OK?
>
> BTW, similar problems exist in other target sources.
> I can submit them all in another patch if needed,
> but I can test on i386 target only.

Sorry for the slow response on this.  Like you say, it seems to be
a pretty pervasive problem.  In fact I couldn't see anywhere that
actually treated -falign-foo=0 as anything other than -falign-foo=1.

Technically using an alignment of one for zero is within what the
documentation allows, but not in a useful way.  The documentation
also isn't clear about whether:

  -falign-loops=0:8

(“align to whatever you think is best, but don't skip more than 8 bytes”)
is supposed to be valid.  The implication is that it's OK, but in practice
it doesn't work.

If there isn't anywhere that handles zero in the way that the documentation
implies (i.e. with -falign-loops=0 being equivalent to -falign-loops)
then maybe we should instead change the documentation to match the
actual behaviour.

If instead this is a regression from previous compilers, then I guess
we should fix it.  But I think it would be good to have a helper function
that tests whether the default should be used for a given x_flag_align_foo
and x_str_align_foo pair.  That way we could reuse it in other targets
and would have only one place to update.  (For example, we might decide
to use parse_and_check_align_values rather than strcmp.)
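
To make the suggestion concrete for readers of the archive, such a helper might look roughly like the sketch below. It is purely illustrative: the name align_default_wanted_p is hypothetical, it takes plain values rather than a gcc_options pointer, and it simply mirrors the condition used in the quoted patch (flag set with no string, or a string equal to "0").

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not an existing GCC function: return true if the
   target default should be used for an -falign-foo option, given the
   values of the corresponding x_flag_align_foo and x_str_align_foo.  */
static bool
align_default_wanted_p (int flag_align, const char *str_align)
{
  /* Plain -falign-foo: the flag is set but no value was given.  */
  if (str_align == NULL)
    return flag_align != 0;

  /* -falign-foo=0: documented as "use a machine-dependent default".  */
  return strcmp (str_align, "0") == 0;
}

/* Small self-check showing the intended behaviour.  */
static void
align_default_selftest (void)
{
  assert (align_default_wanted_p (1, NULL));
  assert (!align_default_wanted_p (0, NULL));
  assert (align_default_wanted_p (1, "0"));
  assert (!align_default_wanted_p (1, "16"));
}
```

Note that a plain strcmp treats "0:8" as an explicit value, which is one reason a shared helper built on proper parsing could be preferable.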

Don't know whether anyone else has any thoughts about the best fix.

Thanks, and sorry again for the slow reply.

Richard

>
> Regards!
> Hujp
>
> ---
>  gcc/config/aarch64/aarch64.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 17dbe673978..697ac676f4d 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -14221,11 +14221,14 @@ aarch64_override_options_after_change_1 (struct 
> gcc_options *opts)
>   alignment to what the target wants.  */
>if (!opts->x_optimize_size)
>  {
> -  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
> +  if ((opts->x_flag_align_loops && !opts->x_str_align_loops)
> +|| (opts->x_str_align_loops && strcmp(opts->x_str_align_loops, "0") 
> == 0))
>   opts->x_str_align_loops = aarch64_tune_params.loop_align;
> -  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
> +  if ((opts->x_flag_align_jumps && !opts->x_str_align_jumps)
> +|| (opts->x_str_align_jumps && strcmp(opts->x_str_align_jumps, "0") 
> == 0))
>   opts->x_str_align_jumps = aarch64_tune_params.jump_align;
> -  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
> +  if ((opts->x_flag_align_functions && !opts->x_str_align_functions)
> +|| (opts->x_str_align_functions && 
> strcmp(opts->x_str_align_functions, "0") == 0))
>   opts->x_str_align_functions = aarch64_tune_params.function_align;
>  }


Re: [PATCH 3/4] libstdc++: Add floating-point std::to_chars implementation

2020-07-20 Thread Jonathan Wakely via Gcc-patches

On 20/07/20 08:53 -0400, Patrick Palka via Libstdc++ wrote:

On Mon, 20 Jul 2020, Jonathan Wakely wrote:


On 19/07/20 23:37 -0400, Patrick Palka via Libstdc++ wrote:
> On Fri, 17 Jul 2020, Patrick Palka wrote:
>
> > On Fri, 17 Jul 2020, Patrick Palka wrote:
> >
> > > On Wed, 15 Jul 2020, Patrick Palka wrote:
> > >
> > > > On Tue, 14 Jul 2020, Patrick Palka wrote:
> > > >
> > > > > This implements the floating-point std::to_chars overloads for
> > float,
> > > > > double and long double.  We use the Ryu library to compute the
> > shortest
> > > > > round-trippable fixed and scientific forms of a number for float,
> > double
> > > > > and long double.  We also use Ryu for performing fixed and
> > scientific
> > > > > formatting of float and double. For formatting long double with an
> > > > > explicit precision argument we use a printf fallback.  Hexadecimal
> > > > > formatting for float, double and long double is implemented from
> > > > > scratch.
> > > > >
> > > > > The supported long double binary formats are float64 (same as
> > double),
> > > > > float80 (x86 extended precision), float128 and ibm128.
> > > > >
> > > > > Much of the complexity of the implementation is in computing the
> > exact
> > > > > output length before handing it off to Ryu (which doesn't do bounds
> > > > > checking).  In some cases it's hard to compute the output length
> > before
> > > > > the fact, so in these cases we instead compute an upper bound on the
> > > > > output length and use a sufficiently-sized intermediate buffer (if
> > the
> > > > > output range is smaller than the upper bound).
> > > > >
> > > > > Another source of complexity is in the general-with-precision
> > formatting
> > > > > mode, where we need to do zero-trimming of the string returned by
> > Ryu, and
> > > > > where we also take care to avoid having to format the string a
> > second
> > > > > time when the general formatting mode resolves to fixed.
> > > > >
> > > > > Tested on x86_64-pc-linux-gnu, aarch64-unknown-linux-gnu,
> > > > > s390x-ibm-linux-gnu, and powerpc64-unknown-linux-gnu.
> > > > >
> > > > > libstdc++-v3/ChangeLog:
> > > > >
> > > > >* acinclude.m4 (libtool_VERSION): Bump to 6:29:0.
> > > > >* config/abi/pre/gnu.ver: Add new exports.
> > > > >* configure: Regenerate.
> > > > >* include/std/charconv (to_chars): Declare the floating-point
> > > > >overloads for float, double and long double.
> > > > >* src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
> > > > >* src/c++17/Makefile.in: Regenerate.
> > > > >* src/c++17/floating_to_chars.cc: New file.
> > > > >* testsuite/20_util/to_chars/long_double.cc: New test.
> > > > >* testsuite/util/testsuite_abi.cc: Add new symbol version.
> > > >
> > > > Here is v2 of this patch, which fixes a build failure on i386 due to
> > > > __int128 being unavailable, by refactoring the long double binary
> > format
> > > > selection to avoid referring to __int128 when it doesn't exist.  The
> > > > patch also makes the hex formatting for 80-bit long double use
> > uint64_t
> > > > instead of __int128 since the mantissa has exactly 64 bits in this
> > case.
> > >
> > > Here's v3 which just makes some minor stylistic adjustments, and most
> > > notably replaces the use of _GLIBCXX_DEBUG with _GLIBCXX_ASSERTIONS
> > > since we just want to enable __glibcxx_assert and not all of debug mode.
> >
> > Here's v4, which should now correctly support using  with
> > -mlong-double-64 on targets with a large default long double type.
> > This is done by defining the long double to_chars overloads as inline
> > wrappers around the double overloads within  whenever
> > __DBL_MANT_DIG__ equals __LDBL_MANT_DIG__.
>
> >
> > -- >8 --
> >
> > Subject: [PATCH 3/4] libstdc++: Add floating-point std::to_chars
> >  implementation
> >
> > This implements the floating-point std::to_chars overloads for float,
> > double and long double.  We use the Ryu library to compute the shortest
> > round-trippable fixed and scientific forms of a number for float, double
> > and long double.  We also use Ryu for performing explicit-precision
> > fixed and scientific formatting of float and double. For
> > explicit-precision formatting of long double we fall back to using
> > printf.  Hexadecimal formatting for float, double and long double is
> > implemented from scratch.
> >
> > The supported long double binary formats are binary64, binary80 (x86
> > 80-bit extended precision), binary128 and ibm128.
> >
> > Much of the complexity of the implementation is in computing the exact
> > output length before handing it off to Ryu (which doesn't do bounds
> > checking).  In some cases it's hard to compute the output length
> > beforehand, so in these cases we instead compute an upper bound on the
> > output length and use a sufficiently-sized intermediate buffer if
> > necessary.
> >
> > Another source of complexity is in the general-with-precision formatting
>

Re: [PATCH] middle-end: Fold popcount(x&4) to (x>>2)&1 and friends.

2020-07-20 Thread Florian Weimer via Gcc-patches
* Richard Biener via Gcc-patches:

> Given you know the constant bit position of the possibly nonzero
> bit you can elide the conversion to unsigned for all but the case
> of a possibly negative input (IIRC GCC doesn't yet take advantage
> of negative right shift undefinedness - but maybe sanitizers complain).

It's not undefined: “Signed '>>' acts on negative numbers by sign
extension.”
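
For context: ISO C leaves the result of right-shifting a negative signed value implementation-defined, and the sentence quoted above is how GCC defines it. A minimal sketch of what that means in practice, assuming a GCC-compatible compiler (the function name is made up for the example):

```c
#include <assert.h>
#include <limits.h>

/* Illustration only: with GCC, '>>' on a negative int shifts in copies
   of the sign bit, i.e. it rounds towards negative infinity.  */
static int
arithmetic_shift_right (int value, unsigned amount)
{
  assert (amount < sizeof (int) * CHAR_BIT);
  return value >> amount;
}
```

Under that definition arithmetic_shift_right (-8, 1) is -4 and arithmetic_shift_right (-1, 5) stays -1.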

Thanks,
Florian



Re: pragma-eof.c

2020-07-20 Thread Nathan Sidwell

On 7/18/20 5:11 PM, Jakub Jelinek wrote:

On Sat, Jul 18, 2020 at 05:04:56PM -0400, David Edelsohn via Gcc-patches wrote:

H-P,

After your patch to the testsuite, the cpp/pragma-eof.c testcase is
failing on all targets.  Would you please investigate and fix?


That is because the dg-error directive had line number of the last line
in the file (without EOL) and the added line changed that.

I've committed following, tested on x86_64-linux, committed to trunk as
obvious.
thanks jakub!



--
Nathan Sidwell


Re: [PATCH] jit: Fix random truncation of testsuite output

2020-07-20 Thread David Malcolm via Gcc-patches
On Mon, 2020-07-20 at 12:10 +0100, Alex Coplan wrote:
> Hello,
> 
> This patch fixes a bug in jit.exp which causes the DejaGnu output of the
> libgccjit testsuite to be nondeterministically truncated. This bug was
> copied from DejaGnu's own implementation of the host_execute function.
> See the upstream bug report [0] where the maintainers point out that the
> regex patterns in host_execute should (but don't currently) explicitly
> match newlines to avoid relying on DejaGnu not reading more than one
> line of the output (which is not guaranteed).
> 
> To reproduce the bug, run:
> 
> $ make check-jit RUNTESTFLAGS="jit.exp=test-arith-overflow.c"
> $ grep -v iteration testsuite/jit/jit.sum
> 
> and you should see some lines that have been truncated (I see the word
> iteration partially or fully truncated). Alternatively, simply run the
> testsuite twice (saving a copy of testsuite/jit/jit.sum from the first
> run) and diff the two jit.sum files to observe the random truncations to
> the output.
> 
> This patch should make it easier to test jit patches in the future,
> since it makes it possible to reliably compare the output of two jit.sum
> files (as with the other tests in GCC).
> 
> Testing:
>  * Ran the testsuite before and after the patch, observing that the only
>    differences in jit.sum were in test-threads.c (nondeterministic test)
>    and where the truncated output from the first run was no longer
>    truncated.
>  * Ran the testsuite twice after the patch, observing that the only
>    differences in jit.sum between the two runs were in test-threads.c.
> 
> OK for master?
> 
> Thanks,
> Alex
> 
> ---
> 
> 2020-07-20  Alex Coplan  
> 
> gcc/testsuite/ChangeLog:
> 
>   * jit.dg/jit.exp (fixed_host_execute): Fix regex patterns to
>   always explicitly match newlines.
> 
> 
> [0] : https://debbugs.gnu.org/cgi/bugreport.cgi?bug=42399
Thanks for chasing this up. This looks a lot like the issues being
tracked in PR jit/69435, so please add that to the ChangeLog when
committing.

OK for master.
Dave



Re: [PATCH 3/4] libstdc++: Add floating-point std::to_chars implementation

2020-07-20 Thread Patrick Palka via Gcc-patches
On Mon, 20 Jul 2020, Jonathan Wakely wrote:

> On 20/07/20 08:53 -0400, Patrick Palka via Libstdc++ wrote:
> > On Mon, 20 Jul 2020, Jonathan Wakely wrote:
> > 
> > > On 19/07/20 23:37 -0400, Patrick Palka via Libstdc++ wrote:
> > > > On Fri, 17 Jul 2020, Patrick Palka wrote:
> > > >
> > > > > On Fri, 17 Jul 2020, Patrick Palka wrote:
> > > > >
> > > > > > On Wed, 15 Jul 2020, Patrick Palka wrote:
> > > > > >
> > > > > > > On Tue, 14 Jul 2020, Patrick Palka wrote:
> > > > > > >
> > > > > > > > This implements the floating-point std::to_chars overloads for
> > > > > float,
> > > > > > > > double and long double.  We use the Ryu library to compute the
> > > > > shortest
> > > > > > > > round-trippable fixed and scientific forms of a number for
> > > float,
> > > > > double
> > > > > > > > and long double.  We also use Ryu for performing fixed and
> > > > > scientific
> > > > > > > > formatting of float and double. For formatting long double with
> > > an
> > > > > > > > explicit precision argument we use a printf fallback.
> > > Hexadecimal
> > > > > > > > formatting for float, double and long double is implemented from
> > > > > > > > scratch.
> > > > > > > >
> > > > > > > > The supported long double binary formats are float64 (same as
> > > > > double),
> > > > > > > > float80 (x86 extended precision), float128 and ibm128.
> > > > > > > >
> > > > > > > > Much of the complexity of the implementation is in computing the
> > > > > exact
> > > > > > > > output length before handing it off to Ryu (which doesn't do
> > > bounds
> > > > > > > > checking).  In some cases it's hard to compute the output length
> > > > > before
> > > > > > > > the fact, so in these cases we instead compute an upper bound on
> > > the
> > > > > > > > output length and use a sufficiently-sized intermediate buffer
> > > (if
> > > > > the
> > > > > > > > output range is smaller than the upper bound).
> > > > > > > >
> > > > > > > > Another source of complexity is in the general-with-precision
> > > > > formatting
> > > > > > > > mode, where we need to do zero-trimming of the string returned
> > > by
> > > > > Ryu, and
> > > > > > > > where we also take care to avoid having to format the string a
> > > > > second
> > > > > > > > time when the general formatting mode resolves to fixed.
> > > > > > > >
> > > > > > > > Tested on x86_64-pc-linux-gnu, aarch64-unknown-linux-gnu,
> > > > > > > > s390x-ibm-linux-gnu, and powerpc64-unknown-linux-gnu.
> > > > > > > >
> > > > > > > > libstdc++-v3/ChangeLog:
> > > > > > > >
> > > > > > > > * acinclude.m4 (libtool_VERSION): Bump to 6:29:0.
> > > > > > > > * config/abi/pre/gnu.ver: Add new exports.
> > > > > > > > * configure: Regenerate.
> > > > > > > > * include/std/charconv (to_chars): Declare the floating-point
> > > > > > > > overloads for float, double and long double.
> > > > > > > > * src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
> > > > > > > > * src/c++17/Makefile.in: Regenerate.
> > > > > > > > * src/c++17/floating_to_chars.cc: New file.
> > > > > > > > * testsuite/20_util/to_chars/long_double.cc: New test.
> > > > > > > > * testsuite/util/testsuite_abi.cc: Add new symbol version.
> > > > > > >
> > > > > > > Here is v2 of this patch, which fixes a build failure on i386 due to
> > > > > > > __int128 being unavailable, by refactoring the long double binary format
> > > > > > > selection to avoid referring to __int128 when it doesn't exist.  The
> > > > > > > patch also makes the hex formatting for 80-bit long double use uint64_t
> > > > > > > instead of __int128 since the mantissa has exactly 64 bits in this case.
> > > > > >
> > > > > > Here's v3 which just makes some minor stylistic adjustments, and most
> > > > > > notably replaces the use of _GLIBCXX_DEBUG with _GLIBCXX_ASSERTIONS
> > > > > > since we just want to enable __glibcxx_assert and not all of debug mode.
> > > > >
> > > > > Here's v4, which should now correctly support using <charconv> with
> > > > > -mlong-double-64 on targets with a large default long double type.
> > > > > This is done by defining the long double to_chars overloads as inline
> > > > > wrappers around the double overloads within <charconv> whenever
> > > > > __DBL_MANT_DIG__ equals __LDBL_MANT_DIG__.
> > > >
> > > > >
> > > > > -- >8 --
> > > > >
> > > > > Subject: [PATCH 3/4] libstdc++: Add floating-point std::to_chars
> > > > >  implementation
> > > > >
> > > > > This implements the floating-point std::to_chars overloads for float,
> > > > > double and long double.  We use the Ryu library to compute the shortest
> > > > > round-trippable fixed and scientific forms of a number for float, double
> > > > > and long double.  We also use Ryu for performing explicit-precision
> > > > > fixed and scientific formatting of float and double. For
> > > > > explicit-precision formattin

Re: [gcc r11-2209] testsuite: fix goacc/finalize-1.f "original" regex for 32 bits.

2020-07-20 Thread Thomas Schwinge
Hi David, Jakub, Tobias, gfortran/OMP developers!

This is about how an OpenACC (but also OpenMP 'target', I suppose) data
clause for Fortran 'allocatable' with array descriptor appears in the
'original' dump.


On 2020-07-18T16:02:12+, David Edelsohn  wrote:
> https://gcc.gnu.org/g:60c1baebbaa62eb588ec4ab263de3b88283fdbee
>
> commit r11-2209-g60c1baebbaa62eb588ec4ab263de3b88283fdbee
> Author: David Edelsohn 
> Date:   Fri Jul 17 19:38:35 2020 -0400
>
> testsuite: fix goacc/finalize-1.f "original" regex for 32 bits.

(No patch submission email for that one?)


> The "bias" portion of the regex for "original" expects
>
> bias: (integer(kind=) parm.0.data - (integer(kind=)) del_f_p.data
>
> (or cpo_f_p.data)
>
> on 32 bit platforms, the dump file can show (signed int) instead of
> (integer(kind=8)... .
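
To make the effect of the relaxed pattern concrete, here is a minimal sketch in C using POSIX <regex.h>.  It assumes that POSIX extended regexes treat these two simple patterns the same way as the Tcl regexes DejaGnu uses; the helper name matches() and the sample strings are made up for illustration and are not part of the testsuite.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The cast pattern before the commit: only "(integer(kind=N))".  */
static const char old_cast[] = "\\(integer\\(kind=.\\)\\)";

/* The cast pattern after the commit: anything containing "int"
   between the parentheses.  */
static const char new_cast[] = "\\(.*int.*\\)";

/* Return 1 if TEXT contains a match for the POSIX extended regex
   PATTERN, 0 otherwise.  (Illustrative helper, not a DejaGnu API.)  */
static int matches(const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert(rc == 0);              /* both patterns above are valid EREs */
  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```

With these definitions the old pattern accepts "(integer(kind=8))" but not "(signed int)", while the new one accepts both.  Note that ".*int.*" is deliberately loose and would also accept other casts that merely contain "int".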

Hmm, interesting.  What system/configuration is that on?  And, I'm not
yet understanding how that'd apply to 32-bit systems specifically?

One step back, gfortran/OMP developers (Jakub, Tobias?): is this
expected/alright?  Isn't there something generally wrong with how the
'bias' gets calculated, or am I now confused?  Given the testcase's:

INTEGER (1), DIMENSION (:), ALLOCATABLE :: del_f_p

... etc., why is the 'bias' calculation for OpenACC
'delete(del_f_p(2:5))' displaying/calculated as a non-pointer integer
type:

map(alloc:[...] [pointer assign, bias: (integer(kind=4)) parm.0.data - 
(integer(kind=4)) del_f_p.data])

..., or '(signed int) del_f_p.data' etc. as David ran into?  Shouldn't
that refer to 'del_f_p.data' etc. via a pointer type, as we've got in:

map(alloc:(integer(kind=1)[0:] * restrict) del_f_p.data

..., for example?


Grüße
 Thomas


> This patch adjusts the regex to allow any content
> containing the word int between the parentheses.
>
> 2020-07-18  David Edelsohn  
>
> gcc/testsuite/ChangeLog
>
> * gfortran.dg/goacc/finalize-1.f: Adjust regex for 32 bits.
>
> Diff:
> ---
>  gcc/testsuite/gfortran.dg/goacc/finalize-1.f | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gfortran.dg/goacc/finalize-1.f 
> b/gcc/testsuite/gfortran.dg/goacc/finalize-1.f
> index 266ead35192..a7788580819 100644
> --- a/gcc/testsuite/gfortran.dg/goacc/finalize-1.f
> +++ b/gcc/testsuite/gfortran.dg/goacc/finalize-1.f
> @@ -20,7 +20,7 @@
>  ! { dg-final { scan-tree-dump-times "(?n)#pragma omp target 
> oacc_enter_exit_data map\\(delete:del_f \\\[len: \[0-9\]+\\\]\\) finalize$" 1 
> "gimple" } }
>
>  !$ACC EXIT DATA FINALIZE DELETE (del_f_p(2:5))
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc exit data 
> map\\(release:\\*\\(c_char \\*\\) parm\\.0\\.data \\\[len: \[^\\\]\]+\\\]\\) 
> map\\(to:del_f_p \\\[pointer set, len: \[0-9\]+\\\]\\) 
> map\\(alloc:\\(integer\\(kind=1\\)\\\[0:\\\] \\* restrict\\) del_f_p\\.data 
> \\\[pointer assign, bias: \\(integer\\(kind=.\\)\\) parm\\.0\\.data - 
> \\(integer\\(kind=.\\)\\) del_f_p\\.data\\\]\\) finalize;$" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc exit data 
> map\\(release:\\*\\(c_char \\*\\) parm\\.0\\.data \\\[len: \[^\\\]\]+\\\]\\) 
> map\\(to:del_f_p \\\[pointer set, len: \[0-9\]+\\\]\\) 
> map\\(alloc:\\(integer\\(kind=1\\)\\\[0:\\\] \\* restrict\\) del_f_p\\.data 
> \\\[pointer assign, bias: \\(.*int.*\\) parm\\.0\\.data - \\(.*int.*\\) 
> del_f_p\\.data\\\]\\) finalize;$" 1 "original" } }
>  ! { dg-final { scan-tree-dump-times "(?n)#pragma omp target 
> oacc_enter_exit_data map\\(delete:MEM\\\[\\(c_char \\*\\)\[^\\\]\]+\\\] 
> \\\[len: \[^\\\]\]+\\\]\\) map\\(to:del_f_p \\\[pointer set, len: 
> \[0-9\]+\\\]\\) map\\(alloc:del_f_p\\.data \\\[pointer assign, bias: 
> \[^\\\]\]+\\\]\\) finalize$" 1 "gimple" } }
>
>  !$ACC EXIT DATA COPYOUT (cpo_r)
> @@ -32,6 +32,6 @@
>  ! { dg-final { scan-tree-dump-times "(?n)#pragma omp target 
> oacc_enter_exit_data map\\(force_from:cpo_f \\\[len: \[0-9\]+\\\]\\) 
> finalize$" 1 "gimple" } }
>
>  !$ACC EXIT DATA COPYOUT (cpo_f_p(4:10)) FINALIZE
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc exit data 
> map\\(from:\\*\\(c_char \\*\\) parm\\.1\\.data \\\[len: \[^\\\]\]+\\\]\\) 
> map\\(to:cpo_f_p \\\[pointer set, len: \[0-9\]+\\\]\\) 
> map\\(alloc:\\(integer\\(kind=1\\)\\\[0:\\\] \\* restrict\\) cpo_f_p\\.data 
> \\\[pointer assign, bias: \\(integer\\(kind=.\\)\\) parm\\.1\\.data - 
> \\(integer\\(kind=.\\)\\) cpo_f_p\\.data\\\]\\) finalize;$" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc exit data 
> map\\(from:\\*\\(c_char \\*\\) parm\\.1\\.data \\\[len: \[^\\\]\]+\\\]\\) 
> map\\(to:cpo_f_p \\\[pointer set, len: \[0-9\]+\\\]\\) 
> map\\(alloc:\\(integer\\(kind=1\\)\\\[0:\\\] \\* restrict\\) cpo_f_p\\.data 
> \\\[pointer assign, bias: \\(.*int.*\\) parm\\.1\\.data - \\(.*int.*\\) 
> cpo_f_p\\.data\\\]\\) finalize;$" 1 "original" } }
>  ! { dg-final { scan-tree-dump-times "(?n)#pragma omp target 
> oacc_enter_exit_data map\\(force_from:MEM\\\[\\(c_cha

Re: [PATCH] rs6000: Define movsf_from_si2 to extract high part SF element from DImode[PR89310]

2020-07-20 Thread Segher Boessenkool
On Mon, Jul 13, 2020 at 02:30:28PM +0800, luoxhu wrote:
> For extracting high part element from DImode register like:
> 
> {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
> 
> split it before reload with "and mask" to avoid generating shift right
> 32 bit then shift left 32 bit.  This pattern also exists in PR42475 and
> PR67741, etc.
> 
> srdi 3,3,32
> sldi 9,3,32
> mtvsrd 1,9
> xscvspdpn 1,1
> 
> =>
> 
> rldicr 3,3,0,31
> mtvsrd 1,3
> xscvspdpn 1,1
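
The equivalence the split relies on can be sketched in plain C rather than rs6000 RTL.  This is only an illustration under the assumption that "rldicr 3,3,0,31" keeps the upper 32 bits and clears the lower 32; the function names are made up and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* What the srdi/sldi pair computes: shift right by 32, then back
   left by 32.  */
static uint64_t high_part_by_shifts(uint64_t x)
{
  return (x >> 32) << 32;
}

/* What a single "and mask" computes: keep the upper 32 bits and
   clear the lower 32.  */
static uint64_t high_part_by_mask(uint64_t x)
{
  return x & UINT64_C(0xFFFFFFFF00000000);
}

/* Abort if the two forms disagree for X.  */
static void check_same_high_part(uint64_t x)
{
  assert(high_part_by_shifts(x) == high_part_by_mask(x));
}
```

Both functions return the same value for every 64-bit input, which is why one masking instruction can stand in for the two shifts.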

>   * config/rs6000/rs6000.md (movsf_from_si2): New
>   define_insn_and_split.

(That fits on one line).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr89310.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */

> +/* { dg-final { scan-assembler-not {\msrdi\M} } } */
> +/* { dg-final { scan-assembler-not {\msldi\M} } } */
> +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 } } */

I'm not sure that works on older cpus?  Please test there, and add
-mdejagnu-cpu=power8 to the dg-options if needed.  Also test on BE please.

Okay for trunk with those last details taken care of.  Thank you!


Segher


Re: [PATCH PR95696] regrename creates overlapping register allocations for vliw

2020-07-20 Thread Richard Sandiford
Hi,

Zhongyunde  writes:
> Hi,
>
> On most targets, two insns issued together may not both modify the same
> register. So a register is not really unused if another insn in the same
> VLIW bundle sets it.
>
> For example, insn 73 starts with insn:TI, so it will be issued together
> with the following insns until a new insn starts with insn:TI, here insn 243.
>
> The regrename pass knows that mode V2VF in insn 73 needs two successive
> registers, i.e. v2 and v3; here is the dump snippet before regrename:
>
>
>
> ==
>
> (insn:TI 73 76 71 4 (set (reg/v:V2VF 37 v2 [orig:180 _62 ] [180])
>
> (unspec:V2VF [
>
> (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
> (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
> ] UNSPEC_HFSQMAG_32X32)) "../test_modify.c":57 710 {hfsqmag_v2vf}
>
>  (expr_list:REG_DEAD (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
> (expr_list:REG_UNUSED (reg:VHF 38 v3)
>
> (expr_list:REG_STAGE (const_int 2 [0x2])
>
> (expr_list:REG_CYCLE (const_int 2 [0x2])
>
> (expr_list:REG_UNITS (const_int 256 [0x100])
>
> (nil)))
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 43 v8 [orig:265 MEM[(const vfloat32x16 
> *)Src_base_134] ] [265])
>
> (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 MEM[(const 
> vfloat32x16 *)Src_base_134]+0 S64 A512])) "../test_modify.c":56 450 
> {movvhf_internal}
>
>  (expr_list:REG_STAGE (const_int 1 [0x1])
>
> (expr_list:REG_CYCLE (const_int 2 [0x2])
>
> (nil
>
>
>
> (insn:TI 243 …
>
>
>
> Then, in regrename, insn 71 is transformed into the following code using
> register v3, and there is a conflict between insn 73 and insn 71 (as
> both of them set the v3 register).
>
>
>
> Register v2 (2): 73 [SVEC_REGS]
>
> Register v8 (1): 71 [VEC_ALL_REGS]
>
>
>
> 
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 38 v3 [orig:265 MEM[(const vfloat32x16 
> *)Src_base_134] ] [265])
>
> (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 MEM[(const 
> vfloat32x16 *)Src_base_134]+0 S64 A512])) "../test_modify.c":56 450 
> {movvhf_internal}
>
>  (expr_list:REG_STAGE (const_int 1 [0x1])
>
> (expr_list:REG_CYCLE (const_int 2 [0x2])
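
The conflict described above boils down to two writes in the same bundle covering a common hard register.  A minimal sketch of that overlap test in C follows; struct reg_write and writes_overlap() are hypothetical names for illustration and are not how regrename.c represents chains.  The register numbers are the ones from the dumps (v2 = 37, v3 = 38, v8 = 43).

```c
#include <assert.h>

/* A write to NREGS consecutive hard registers starting at REGNO.
   (Illustrative type, not a regrename.c structure.)  */
struct reg_write
{
  unsigned int regno;
  unsigned int nregs;
};

/* Return 1 if A and B write at least one common hard register,
   i.e. if the ranges [regno, regno + nregs) intersect.  */
static int writes_overlap(struct reg_write a, struct reg_write b)
{
  assert(a.nregs > 0 && b.nregs > 0);
  return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}
```

Insn 73 writes V2VF at register 37 (two registers, v2 and v3).  Before renaming, insn 71 writes register 43 and there is no overlap; after renaming to register 38 the two writes overlap.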

Do you need this for correctness, or is it “just” an optimisation?

The reason for asking is that regrename runs quite a long time after
the first scheduling pass, with things like register allocation
happening in between.  The :TI markers on the instructions are
therefore going to be very stale by the time that regrename runs.

If it's just a heuristic then it might be OK to use them anyway,
but it would be worth having a comment to say what's going on.

One alternative would be to run the DFA over the instructions
to recompute the VLIW bundles, but that isn't simple.  It also isn't
necessarily a better heuristic, since the second scheduling pass is
likely to change things anyway.

There's also the problem that :TI markers are used on non-VLIW
targets too, whereas the problem is really specific to VLIW targets.

Unfortunately, I can't think of any good counter-suggestions
at the moment…

Thanks,
Richard

>
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index c38173a77..e54794413 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -1614,12 +1614,26 @@ record_out_operands (rtx_insn *insn, bool 
> earlyclobber, insn_rr_info *insn_info)
>cur_operand = NULL;
>  }
>  
> +/* Get the first real insn of next vliw in current BB.  */
> +static rtx_insn *
> +get_next_vliw_first_insn (rtx_insn *cur_insn, basic_block bb)
> +{
> +  rtx_insn *insn = next_real_insn (cur_insn);
> +
> +  for (; insn && insn != BB_END (bb); insn = next_real_insn (insn))
> +if (GET_MODE (insn) == TImode)
> +  return insn;
> +
> +  return cur_insn;
> +}
> +
>  /* Build def/use chain.  */
>  
>  static bool
>  build_def_use (basic_block bb)
>  {
>rtx_insn *insn;
> +  rtx_insn *vliw_start_insn = NULL;
>unsigned HOST_WIDE_INT untracked_operands;
>  
>fail_current_block = false;
> @@ -1663,6 +1677,9 @@ build_def_use (basic_block bb)
>to be marked unrenamable or even cause us to abort the entire
>basic block.  */
>  
> +   if (GET_MODE (insn) == TImode)
> + vliw_start_insn = insn;
> +
> extract_constrain_insn (insn);
> preprocess_constraints (insn);
> const operand_alternative *op_alt = which_op_alt ();
> @@ -1858,17 +1875,26 @@ build_def_use (basic_block bb)
> scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_access,
>   OP_INOUT);
>  
> -   /* Step 7: Close chains for registers that were never
> -  really used here.  */
> -   for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
> - if (REG_NOTE_KIND (note) == REG_UNUSED)
> -   {
> -

Re: gcc.dg/Wno-frame-address.c: Skip for cris and mmix.

2020-07-20 Thread Mike Stump via Gcc-patches
On Jul 18, 2020, at 8:19 PM, Hans-Peter Nilsson  wrote:
> 
> Long-standing FAIL remedied; committed.  Maybe better to list
> the targets that *do* support arbitrary frame access?

Yes, for test cases that fall way too low in portability it would be better to
list the platforms the test is expected to work on, instead of having everyone
else join a long list in the exceptions clause.

Another way to handle this is to expose the ability to do such a thing in the 
first place and expose that via a .h or a preprocessing condition, then the 
test itself will only test on machines that claim it should work in the first 
place.

In this case, and with this feature specifically, it isn't portable and 
reliable enough.

> gcc/testsuite:
>   * gcc.dg/Wno-frame-address.c: Skip for cris and mmix.
> 
> --- gcc/gcc/testsuite/gcc.dg/Wno-frame-address.c.orig Mon Jan 13 22:30:47 2020
> +++ gcc/gcc/testsuite/gcc.dg/Wno-frame-address.c  Sun Jul 19 05:05:49 2020
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-skip-if "Cannot access arbitrary stack frames" { arm*-*-* amdgpu-*-* 
> avr-*-* hppa*-*-* ia64-*-* visium-*-* csky-*-* msp430-*-* } } */
> +/* { dg-skip-if "Cannot access arbitrary stack frames" { arm*-*-* amdgpu-*-* 
> avr-*-* hppa*-*-* ia64-*-* visium-*-* csky-*-* msp430-*-* cris-*-* mmix-*-* } 
> } */
> /* { dg-options "-Werror" } */
> /* { dg-additional-options "-mbackchain" { target { s390*-*-* } } } */


[PATCH 1/4] testsuite: Filter unaligned pointer value warning

2020-07-20 Thread Dimitar Dimitrov
Targets which pack structures by default will not get warnings about
unaligned access to structure members.

gcc/testsuite/ChangeLog:

* c-c++-common/Waddress-of-packed-member-1.c: Filter dg-warning
for targets who pack by default.
* c-c++-common/Waddress-of-packed-member-2.c: Ditto.
* c-c++-common/pr51628-13.c: Ditto.
* c-c++-common/pr51628-15.c: Ditto.
* c-c++-common/pr51628-16.c: Ditto.
* c-c++-common/pr51628-26.c: Ditto.
* c-c++-common/pr51628-27.c: Ditto.
* c-c++-common/pr51628-28.c: Ditto.
* c-c++-common/pr51628-29.c: Ditto.
* c-c++-common/pr51628-3.c: Ditto.
* c-c++-common/pr51628-30.c: Ditto.
* c-c++-common/pr51628-31.c: Ditto.
* c-c++-common/pr51628-32.c: Ditto.
* c-c++-common/pr51628-33.c: Ditto.
* c-c++-common/pr51628-35.c: Ditto.
* c-c++-common/pr51628-4.c: Ditto.
* c-c++-common/pr51628-5.c: Ditto.
* c-c++-common/pr51628-6.c: Ditto.
* c-c++-common/pr51628-8.c: Ditto.
* c-c++-common/pr51628-9.c: Ditto.
* c-c++-common/pr88664-2.c: Ditto.
* gcc.dg/pr51628-17.c: Ditto.
* gcc.dg/pr51628-19.c: Ditto.
* gcc.dg/pr51628-20.c: Ditto.
* gcc.dg/pr51628-21.c: Ditto.
* gcc.dg/pr51628-22.c: Ditto.
* gcc.dg/pr51628-24.c: Ditto.
* gcc.dg/pr51628-25.c: Ditto.
* gcc.dg/pr51628-34.c: Ditto.
* gcc.dg/pr88928.c: Ditto.

Signed-off-by: Dimitar Dimitrov 
---
 .../Waddress-of-packed-member-1.c | 48 +--
 .../Waddress-of-packed-member-2.c | 36 +++---
 gcc/testsuite/c-c++-common/pr51628-13.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-15.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-16.c   |  4 +-
 gcc/testsuite/c-c++-common/pr51628-26.c   |  6 +--
 gcc/testsuite/c-c++-common/pr51628-27.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-28.c   | 10 ++--
 gcc/testsuite/c-c++-common/pr51628-29.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-3.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-30.c   |  4 +-
 gcc/testsuite/c-c++-common/pr51628-31.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-32.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-33.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-35.c   |  4 +-
 gcc/testsuite/c-c++-common/pr51628-4.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-5.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-6.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-8.c| 14 +++---
 gcc/testsuite/c-c++-common/pr51628-9.c| 14 +++---
 gcc/testsuite/c-c++-common/pr88664-2.c|  4 +-
 gcc/testsuite/gcc.dg/pr51628-17.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-19.c |  6 +--
 gcc/testsuite/gcc.dg/pr51628-20.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-21.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-22.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-24.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-25.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-34.c |  8 ++--
 gcc/testsuite/gcc.dg/pr88928.c|  2 +-
 30 files changed, 117 insertions(+), 117 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c 
b/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c
index afad603dfa2..95a376664da 100644
--- a/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c
+++ b/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c
@@ -52,28 +52,28 @@ void foo (void)
   f0 = *&__real__ t0.f;/* { dg-bogus "may result in an unaligned 
pointer value" } */
   f0 = *&__imag__ t0.f;/* { dg-bogus "may result in an unaligned 
pointer value" } */
   i1 = (&t0.c, (int*) 0);  /* { dg-bogus "may result in an unaligned 
pointer value" } */
-  t2 = (struct t**) t10; /* { dg-warning "may result in an unaligned 
pointer value" } */
-  t2 = (struct t**) t100;/* { dg-warning "may result in an unaligned 
pointer value" } */
-  t2 = (struct t**) t1;  /* { dg-warning "may result in an unaligned 
pointer value" } */
-  t2 = (struct t**) bar();   /* { dg-warning "may result in an unaligned 
pointer value" } */
-  t2 = (struct t**) baz();   /* { dg-warning "may result in an unaligned 
pointer value" } */
-  t2 = (struct t**) bazz();  /* { dg-warning "may result in an unaligned 
pointer value" } */
-  i1 = &t0.b;/* { dg-warning "may result in an unaligned 
pointer value" } */
-  i1 = &t1->b;   /* { dg-warning "may result in an unaligned 
pointer value" } */
-  i1 = &t10[0].b;/* { dg-warning "may result in an unaligned 
pointer value" } */
-  i1 = t0.d; /* { dg-warning "may result in an unaligned 
pointer value" } */
-  i1 = t1->d;/* { dg-warning "may result in an unaligned 
pointer value" } */
-  i1 = t10[0].d; /* { dg-warning "may result in an

[PATCH 0/4] testsuite: Add markers for default_packed targets

2020-07-20 Thread Dimitar Dimitrov
Hi,

I'm sending a few minor testsuite updates to add markers for targets using
packed structures by default. Of those targets, I tested AVR and PRU. I don't
have a setup to test cris and m32c.

I also tested x86_64 to ensure there are neither dropped nor newly failing 
tests.

Regards,
Dimitar

Dimitar Dimitrov (4):
  testsuite: Filter unaligned pointer value warning
  testsuite: Add expected warning for packed attribute
  testsuite: Relax pattern to include "packed" targets
  testsuite: Add default_packed filters

 .../Waddress-of-packed-member-1.c | 48 +--
 .../Waddress-of-packed-member-2.c | 37 +++---
 gcc/testsuite/c-c++-common/Wattributes.c  |  2 +-
 gcc/testsuite/c-c++-common/attr-copy.c|  1 +
 .../c-c++-common/builtin-has-attribute-4.c|  2 +-
 gcc/testsuite/c-c++-common/pr51628-13.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-15.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-16.c   |  4 +-
 gcc/testsuite/c-c++-common/pr51628-26.c   |  6 +--
 gcc/testsuite/c-c++-common/pr51628-27.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-28.c   | 10 ++--
 gcc/testsuite/c-c++-common/pr51628-29.c   |  3 +-
 gcc/testsuite/c-c++-common/pr51628-3.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-30.c   |  5 +-
 gcc/testsuite/c-c++-common/pr51628-31.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-32.c   |  3 +-
 gcc/testsuite/c-c++-common/pr51628-33.c   |  2 +-
 gcc/testsuite/c-c++-common/pr51628-35.c   |  4 +-
 gcc/testsuite/c-c++-common/pr51628-4.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-5.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-6.c| 12 ++---
 gcc/testsuite/c-c++-common/pr51628-8.c| 14 +++---
 gcc/testsuite/c-c++-common/pr51628-9.c| 14 +++---
 gcc/testsuite/c-c++-common/pr88664-2.c|  4 +-
 gcc/testsuite/gcc.dg/Wattributes-6.c  |  2 +-
 gcc/testsuite/gcc.dg/attr-copy-4.c|  4 +-
 gcc/testsuite/gcc.dg/attr-copy-8.c| 25 ++
 gcc/testsuite/gcc.dg/c11-align-9.c|  4 +-
 gcc/testsuite/gcc.dg/pr51628-17.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-19.c |  6 +--
 gcc/testsuite/gcc.dg/pr51628-20.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-21.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-22.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-24.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-25.c |  2 +-
 gcc/testsuite/gcc.dg/pr51628-34.c |  8 ++--
 gcc/testsuite/gcc.dg/pr53037-1.c  |  4 +-
 gcc/testsuite/gcc.dg/pr88928.c|  2 +-
 38 files changed, 157 insertions(+), 125 deletions(-)

-- 
2.20.1



[PATCH 3/4] testsuite: Relax pattern to include "packed" targets

2020-07-20 Thread Dimitar Dimitrov
The actual warning message depends on the default alignment of the
target. With this update the test correctly passes on AVR and PRU
targets.
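
As a rough illustration of why the number in the message varies, the sketch below compares the alignment of the same struct with and without the GNU packed attribute.  Using the attribute to imitate a default_packed target on an ordinary host is an assumption made for the example; the type and function names are hypothetical and not taken from pr53037-1.c.

```c
#include <assert.h>
#include <stddef.h>

/* Same members as 'struct foo5', laid out with the host's defaults.  */
struct foo5_natural { int i1; int x; };

/* Same members, packed explicitly to imitate a target that packs
   structures by default (GNU extension).  */
struct foo5_packed { int i1; int x; } __attribute__((packed));

/* The N that would appear in "alignment N of 'struct foo5'".  */
static size_t natural_alignment(void) { return _Alignof(struct foo5_natural); }
static size_t packed_alignment(void) { return _Alignof(struct foo5_packed); }

/* Return 1 if ALIGNMENT is below REQUESTED, which must be a power
   of two, as in warn_if_not_aligned(16).  */
static int is_less_than_requested(size_t alignment, size_t requested)
{
  assert(requested != 0 && (requested & (requested - 1)) == 0);
  return alignment < requested;
}
```

On a typical host the first struct has the alignment of int while the packed one has alignment 1, so a pattern that hard-codes "alignment 4" only matches the former.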

gcc/testsuite/ChangeLog:

* gcc.dg/pr53037-1.c: Relax warning pattern.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/pr53037-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr53037-1.c b/gcc/testsuite/gcc.dg/pr53037-1.c
index 3ea5ae6a34e..b4e9049c746 100644
--- a/gcc/testsuite/gcc.dg/pr53037-1.c
+++ b/gcc/testsuite/gcc.dg/pr53037-1.c
@@ -40,7 +40,7 @@ struct foo5
 {
   int i1;
   int x __attribute__((warn_if_not_aligned(16))); /* { dg-warning "'x' offset 
4 in 'struct foo5' isn't aligned to 16" } */
-}; /* { dg-warning "alignment 4 of 'struct foo5' is less than 16" } */
+}; /* { dg-warning "alignment .* of 'struct foo5' is less than 16" } */
 
 struct foo6
 {
@@ -73,7 +73,7 @@ union bar3
 {
   int i1;
   int x __attribute__((warn_if_not_aligned(16))); 
-}; /* { dg-warning "alignment 4 of 'union bar3' is less than 16" } */
+}; /* { dg-warning "alignment .* of 'union bar3' is less than 16" } */
 
 union bar4
 {
-- 
2.20.1



[PATCH 4/4] testsuite: Add default_packed filters

2020-07-20 Thread Dimitar Dimitrov
Fix test cases' assumption that the target has alignment constraints.

gcc/testsuite/ChangeLog:

* gcc.dg/attr-copy-4.c: Unpacked may still have alignment of 1
on targets with default_packed.
* gcc.dg/c11-align-9.c: Remove AVR target filter and replace
with default_packed filter.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/attr-copy-4.c | 1 +
 gcc/testsuite/gcc.dg/c11-align-9.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/attr-copy-4.c 
b/gcc/testsuite/gcc.dg/attr-copy-4.c
index 796724bb950..01fae3f78d4 100644
--- a/gcc/testsuite/gcc.dg/attr-copy-4.c
+++ b/gcc/testsuite/gcc.dg/attr-copy-4.c
@@ -32,6 +32,7 @@ extern const struct PackedA packed;
 
 struct Unpacked { int i; char c; };
 Assert (__alignof (struct Unpacked) > 1);
+/* { dg-error "size of array .* is negative" "" { target default_packed } .-1 
} */
 
 /* Verify that copying the packed attribute to the declaration
of an object is ignored with a warning.  (There should be
diff --git a/gcc/testsuite/gcc.dg/c11-align-9.c 
b/gcc/testsuite/gcc.dg/c11-align-9.c
index 3c9cf55756e..6a0d4248f1b 100644
--- a/gcc/testsuite/gcc.dg/c11-align-9.c
+++ b/gcc/testsuite/gcc.dg/c11-align-9.c
@@ -2,8 +2,8 @@
are at least some alignment constraints), case of compound literals.  */
 /* { dg-do compile } */
 /* { dg-options "-std=c11 -pedantic-errors" } */
-/* { dg-skip-if "no alignment constraints" { "avr-*-*" } } */
 
 #include 
 
-max_align_t *p = &(_Alignas (_Alignof (char)) max_align_t) { 1 }; /* { 
dg-error "reduce alignment" } */
+max_align_t *p = &(_Alignas (_Alignof (char)) max_align_t) { 1 };
+/* { dg-error "reduce alignment" "" { target { ! default_packed } } .-1 } */
-- 
2.20.1



[PATCH 2/4] testsuite: Add expected warning for packed attribute

2020-07-20 Thread Dimitar Dimitrov
Targets which pack structures by default get warnings for packed structure
attributes. This is expected, so add markers in the test cases.

gcc/testsuite/ChangeLog:

* c-c++-common/Waddress-of-packed-member-2.c: Add dg-warning for
ignored attribute if target is default_packed.
* c-c++-common/Wattributes.c: Ditto.
* c-c++-common/attr-copy.c: Ditto.
* c-c++-common/builtin-has-attribute-4.c: Ditto.
* c-c++-common/pr51628-29.c: Ditto.
* c-c++-common/pr51628-30.c: Ditto.
* c-c++-common/pr51628-32.c: Ditto.
* gcc.dg/Wattributes-6.c: Ditto.
* gcc.dg/attr-copy-4.c: Ditto.
* gcc.dg/attr-copy-8.c: Ditto.

Signed-off-by: Dimitar Dimitrov 
---
 .../Waddress-of-packed-member-2.c |  1 +
 gcc/testsuite/c-c++-common/Wattributes.c  |  2 +-
 gcc/testsuite/c-c++-common/attr-copy.c|  1 +
 .../c-c++-common/builtin-has-attribute-4.c|  2 +-
 gcc/testsuite/c-c++-common/pr51628-29.c   |  1 +
 gcc/testsuite/c-c++-common/pr51628-30.c   |  1 +
 gcc/testsuite/c-c++-common/pr51628-32.c   |  1 +
 gcc/testsuite/gcc.dg/Wattributes-6.c  |  2 +-
 gcc/testsuite/gcc.dg/attr-copy-4.c|  3 ++-
 gcc/testsuite/gcc.dg/attr-copy-8.c| 25 +++
 10 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/Waddress-of-packed-member-2.c 
b/gcc/testsuite/c-c++-common/Waddress-of-packed-member-2.c
index 5dbcb89ffbc..802dd8156cb 100644
--- a/gcc/testsuite/c-c++-common/Waddress-of-packed-member-2.c
+++ b/gcc/testsuite/c-c++-common/Waddress-of-packed-member-2.c
@@ -15,6 +15,7 @@ struct s {
 struct t {
   char c;
   struct r p __attribute__((packed));
+  /* { dg-warning "attribute ignored" "" { target default_packed } .-1 } */
   struct r u;
 };
 
diff --git a/gcc/testsuite/c-c++-common/Wattributes.c 
b/gcc/testsuite/c-c++-common/Wattributes.c
index 3f176a04660..4ad90441b4d 100644
--- a/gcc/testsuite/c-c++-common/Wattributes.c
+++ b/gcc/testsuite/c-c++-common/Wattributes.c
@@ -21,7 +21,7 @@ PackedAligned { int i; };
 struct ATTR ((aligned (2)))
 AlignedMemberPacked
 {
-  int ATTR ((packed)) i;
+  int ATTR ((packed)) i; // { dg-warning "attribute ignored" "" { target 
default_packed } }
 };
 
 struct ATTR ((packed))
diff --git a/gcc/testsuite/c-c++-common/attr-copy.c 
b/gcc/testsuite/c-c++-common/attr-copy.c
index 284088a8b97..f0db0fd1a27 100644
--- a/gcc/testsuite/c-c++-common/attr-copy.c
+++ b/gcc/testsuite/c-c++-common/attr-copy.c
@@ -21,6 +21,7 @@ struct C
 {
   char c;
   ATTR (copy ((bar (), ((struct A *)(0))[0]))) int i;
+  /* { dg-warning "attribute ignored" "" { target default_packed } .-1 } */
 };
 
 /* Verify the attribute has been copied.  */
diff --git a/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c 
b/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
index ec3127794b5..3a960aae2ff 100644
--- a/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
+++ b/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
@@ -130,7 +130,7 @@ struct PackedMember
   char c;
   short s;
   int i;
-  ATTR (packed) int a[2];
+  ATTR (packed) int a[2]; /* { dg-warning "attribute ignored" "" { target 
default_packed } } */
 } gpak[2];
 
 void test_packed (struct PackedMember *p)
diff --git a/gcc/testsuite/c-c++-common/pr51628-29.c 
b/gcc/testsuite/c-c++-common/pr51628-29.c
index a3e77455b6b..1ad9a7d2d9f 100644
--- a/gcc/testsuite/c-c++-common/pr51628-29.c
+++ b/gcc/testsuite/c-c++-common/pr51628-29.c
@@ -5,6 +5,7 @@
 struct A { int i; };
 struct B { struct A a; };
 struct C { struct B b __attribute__ ((packed)); };
+/* { dg-warning "attribute ignored" "" { target default_packed } .-1 } */
 
 extern struct C *p;
 
diff --git a/gcc/testsuite/c-c++-common/pr51628-30.c 
b/gcc/testsuite/c-c++-common/pr51628-30.c
index b31e73ec036..387fc71db13 100644
--- a/gcc/testsuite/c-c++-common/pr51628-30.c
+++ b/gcc/testsuite/c-c++-common/pr51628-30.c
@@ -5,6 +5,7 @@
 struct A { __complex int i; };
 struct B { struct A a; };
 struct C { struct B b __attribute__ ((packed)); };
+/* { dg-warning "attribute ignored" "" { target default_packed } .-1 } */
 
 extern struct C *p;
 
diff --git a/gcc/testsuite/c-c++-common/pr51628-32.c 
b/gcc/testsuite/c-c++-common/pr51628-32.c
index 52f5e543ab7..908c0b8cbf4 100644
--- a/gcc/testsuite/c-c++-common/pr51628-32.c
+++ b/gcc/testsuite/c-c++-common/pr51628-32.c
@@ -11,6 +11,7 @@ struct B
 {
char c;
__attribute ((packed)) struct A ar[4];
+   /* { dg-warning "attribute ignored" "" { target default_packed } .-1 } */
 };
 
 struct B b;
diff --git a/gcc/testsuite/gcc.dg/Wattributes-6.c 
b/gcc/testsuite/gcc.dg/Wattributes-6.c
index d3dd22d85b9..4ba59bf2806 100644
--- a/gcc/testsuite/gcc.dg/Wattributes-6.c
+++ b/gcc/testsuite/gcc.dg/Wattributes-6.c
@@ -21,7 +21,7 @@ PackedAligned { int i; };
 struct ATTR ((aligned (2)))
 AlignedMemberPacked
 {
-  int ATTR ((packed)) i;
+  int ATTR ((packed)) i; // { dg-warning "attribute ignored" "" { target 

Re: [PATCH 7/7 v2] rs6000/testsuite: Vector with length test cases

2020-07-20 Thread Segher Boessenkool
Hi!

On Fri, Jul 10, 2020 at 06:07:16PM +0800, Kewen.Lin wrote:
> +/* { dg-do compile { target { powerpc*-*-* } && { lp64 && 
> powerpc_p9vector_ok } } } */

Everything in gcc.target/powerpc/ requires powerpc*-*-* automatically
(is never run on other targets).

> +/* { dg-final { scan-assembler-times {\mlxv\M|\mlxvx\M} 20 } } */

You can write {\mlxvx?\M} if you think that is better.  Each option has
its own downsides and upsides here ;-)

> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-4.c
> @@ -0,0 +1,10 @@
> +/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } } */

Testing for powerpc64*-*-* is always wrong (it doesn't matter what the
*default* target is: it is usual to run the tests with RUNTESTFLAGS
{-m32,-m64} for example).

Random example from my bash history:
  make check-gcc-c RUNTESTFLAGS="--target_board=unix'{-m64,-m32}' 
powerpc.exp=volatile-mem.c"
but my usual is
  make -k -j60 check RUNTESTFLAGS="--target_board=unix'{-m64,-m32}'"

Other than that this looks fine.  Please make sure to test it on an older
machine as well (you cannot really test on a BE p9, but ideally you would
do that as well ;-) )

So, okay for trunk if all patches that are required for these tests have
been committed.  Thanks!


Segher


Re: [PATCH 1/4] testsuite: Filter unaligned pointer value warning

2020-07-20 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> Targets which pack structures by default will not get warnings about
> unaligned access to structure members.
>
> gcc/testsuite/ChangeLog:
>
>   * c-c++-common/Waddress-of-packed-member-1.c: Filter dg-warning
>   for targets who pack by default.
>   * c-c++-common/Waddress-of-packed-member-2.c: Ditto.
>   * c-c++-common/pr51628-13.c: Ditto.
>   * c-c++-common/pr51628-15.c: Ditto.
>   * c-c++-common/pr51628-16.c: Ditto.
>   * c-c++-common/pr51628-26.c: Ditto.
>   * c-c++-common/pr51628-27.c: Ditto.
>   * c-c++-common/pr51628-28.c: Ditto.
>   * c-c++-common/pr51628-29.c: Ditto.
>   * c-c++-common/pr51628-3.c: Ditto.
>   * c-c++-common/pr51628-30.c: Ditto.
>   * c-c++-common/pr51628-31.c: Ditto.
>   * c-c++-common/pr51628-32.c: Ditto.
>   * c-c++-common/pr51628-33.c: Ditto.
>   * c-c++-common/pr51628-35.c: Ditto.
>   * c-c++-common/pr51628-4.c: Ditto.
>   * c-c++-common/pr51628-5.c: Ditto.
>   * c-c++-common/pr51628-6.c: Ditto.
>   * c-c++-common/pr51628-8.c: Ditto.
>   * c-c++-common/pr51628-9.c: Ditto.
>   * c-c++-common/pr88664-2.c: Ditto.
>   * gcc.dg/pr51628-17.c: Ditto.
>   * gcc.dg/pr51628-19.c: Ditto.
>   * gcc.dg/pr51628-20.c: Ditto.
>   * gcc.dg/pr51628-21.c: Ditto.
>   * gcc.dg/pr51628-22.c: Ditto.
>   * gcc.dg/pr51628-24.c: Ditto.
>   * gcc.dg/pr51628-25.c: Ditto.
>   * gcc.dg/pr51628-34.c: Ditto.
>   * gcc.dg/pr88928.c: Ditto.
>
> Signed-off-by: Dimitar Dimitrov 

OK, thanks.

For avoidance of doubt, it's not likely that people will remember
to add this target selector to new tests, so keeping the testsuite
clean will be an ongoing problem for people who test the affected
targets.  Adding missing selectors would qualify as obvious though.
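
For readers who do not work on such targets, a small sketch of the layout in question. It assumes the GCC/Clang __attribute__((packed)) extension and uses a made-up struct; on a target that packs by default, an ordinary struct is laid out the same way without the attribute, which is why the unaligned-pointer warning is not issued there.

```cpp
#include <cassert>
#include <cstddef>

// An explicitly packed struct: no padding is inserted before `i`,
// so `i` can sit at an address that is not a multiple of alignof(int).
struct __attribute__((packed)) packed_pair
{
  char c;
  int i;
};

static void check_packed_layout()
{
  assert(offsetof(packed_pair, i) == 1);
  assert(sizeof(packed_pair) == 1 + sizeof(int));
  assert(alignof(packed_pair) == 1);
}
```

Taking the address of the int member (for example `int *p = &obj.i;`) is the kind of expression -Waddress-of-packed-member is meant to flag.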

Richard

> ---
>  .../Waddress-of-packed-member-1.c | 48 +--
>  .../Waddress-of-packed-member-2.c | 36 +++---
>  gcc/testsuite/c-c++-common/pr51628-13.c   |  2 +-
>  gcc/testsuite/c-c++-common/pr51628-15.c   |  2 +-
>  gcc/testsuite/c-c++-common/pr51628-16.c   |  4 +-
>  gcc/testsuite/c-c++-common/pr51628-26.c   |  6 +--
>  gcc/testsuite/c-c++-common/pr51628-27.c   |  2 +-
>  gcc/testsuite/c-c++-common/pr51628-28.c   | 10 ++--
>  gcc/testsuite/c-c++-common/pr51628-29.c   |  2 +-
>  gcc/testsuite/c-c++-common/pr51628-3.c| 12 ++---
>  gcc/testsuite/c-c++-common/pr51628-30.c   |  4 +-
>  gcc/testsuite/c-c++-common/pr51628-31.c   |  2 +-
>  gcc/testsuite/c-c++-common/pr51628-32.c   |  2 +-
>  gcc/testsuite/c-c++-common/pr51628-33.c   |  2 +-
>  gcc/testsuite/c-c++-common/pr51628-35.c   |  4 +-
>  gcc/testsuite/c-c++-common/pr51628-4.c| 12 ++---
>  gcc/testsuite/c-c++-common/pr51628-5.c| 12 ++---
>  gcc/testsuite/c-c++-common/pr51628-6.c| 12 ++---
>  gcc/testsuite/c-c++-common/pr51628-8.c| 14 +++---
>  gcc/testsuite/c-c++-common/pr51628-9.c| 14 +++---
>  gcc/testsuite/c-c++-common/pr88664-2.c|  4 +-
>  gcc/testsuite/gcc.dg/pr51628-17.c |  2 +-
>  gcc/testsuite/gcc.dg/pr51628-19.c |  6 +--
>  gcc/testsuite/gcc.dg/pr51628-20.c |  2 +-
>  gcc/testsuite/gcc.dg/pr51628-21.c |  2 +-
>  gcc/testsuite/gcc.dg/pr51628-22.c |  2 +-
>  gcc/testsuite/gcc.dg/pr51628-24.c |  2 +-
>  gcc/testsuite/gcc.dg/pr51628-25.c |  2 +-
>  gcc/testsuite/gcc.dg/pr51628-34.c |  8 ++--
>  gcc/testsuite/gcc.dg/pr88928.c|  2 +-
>  30 files changed, 117 insertions(+), 117 deletions(-)
>
> diff --git a/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c 
> b/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c
> index afad603dfa2..95a376664da 100644
> --- a/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c
> +++ b/gcc/testsuite/c-c++-common/Waddress-of-packed-member-1.c
> @@ -52,28 +52,28 @@ void foo (void)
>f0 = *&__real__ t0.f;/* { dg-bogus "may result in an unaligned 
> pointer value" } */
>f0 = *&__imag__ t0.f;/* { dg-bogus "may result in an unaligned 
> pointer value" } */
>i1 = (&t0.c, (int*) 0);  /* { dg-bogus "may result in an unaligned 
> pointer value" } */
> -  t2 = (struct t**) t10; /* { dg-warning "may result in an unaligned 
> pointer value" } */
> -  t2 = (struct t**) t100;/* { dg-warning "may result in an unaligned 
> pointer value" } */
> -  t2 = (struct t**) t1;  /* { dg-warning "may result in an unaligned 
> pointer value" } */
> -  t2 = (struct t**) bar();   /* { dg-warning "may result in an unaligned 
> pointer value" } */
> -  t2 = (struct t**) baz();   /* { dg-warning "may result in an unaligned 
> pointer value" } */
> -  t2 = (struct t**) bazz();  /* { dg-warning "may result in an unaligned 
> pointer value" } */
> -  i1 = &t0.b;/* { dg-warning "may result in an unaligned 
> pointer value" } *

Re: [PATCH 2/4] testsuite: Add expected warning for packed attribute

2020-07-20 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> Targets which pack structures by default get warnings for packed structure
> attributes. This is expected, so add markers in the test cases.
>
> gcc/testsuite/ChangeLog:
>
>   * c-c++-common/Waddress-of-packed-member-2.c: Add dg-warning for
>   ignored attribute if target is default_packed.
>   * c-c++-common/Wattributes.c: Ditto.
>   * c-c++-common/attr-copy.c: Ditto.
>   * c-c++-common/builtin-has-attribute-4.c: Ditto.
>   * c-c++-common/pr51628-29.c: Ditto.
>   * c-c++-common/pr51628-30.c: Ditto.
>   * c-c++-common/pr51628-32.c: Ditto.
>   * gcc.dg/Wattributes-6.c: Ditto.
>   * gcc.dg/attr-copy-4.c: Ditto.
>   * gcc.dg/attr-copy-8.c: Ditto.

OK, thanks.  I wondered whether we should handle this in prune.exp,
but there's no precedent that I can see for doing that based on
target selectors.  The number of affected tests is also pretty small,
so it might not have been worth it anyway.

Thanks,
Richard


Re: [PATCH 3/4] testsuite: Relax pattern to include "packed" targets

2020-07-20 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> The actual warning message depends on the default alignment of the
> target. With this update the test correctly passes on AVR and PRU
> targets.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/pr53037-1.c: Relax warning pattern.
>
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/testsuite/gcc.dg/pr53037-1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/pr53037-1.c 
> b/gcc/testsuite/gcc.dg/pr53037-1.c
> index 3ea5ae6a34e..b4e9049c746 100644
> --- a/gcc/testsuite/gcc.dg/pr53037-1.c
> +++ b/gcc/testsuite/gcc.dg/pr53037-1.c
> @@ -40,7 +40,7 @@ struct foo5
>  {
>int i1;
>int x __attribute__((warn_if_not_aligned(16))); /* { dg-warning "'x' 
> offset 4 in 'struct foo5' isn't aligned to 16" } */
> -}; /* { dg-warning "alignment 4 of 'struct foo5' is less than 16" } */
> +}; /* { dg-warning "alignment .* of 'struct foo5' is less than 16" } */
>  
>  struct foo6
>  {
> @@ -73,7 +73,7 @@ union bar3
>  {
>int i1;
>int x __attribute__((warn_if_not_aligned(16))); 
> -}; /* { dg-warning "alignment 4 of 'union bar3' is less than 16" } */
> +}; /* { dg-warning "alignment .* of 'union bar3' is less than 16" } */

Better to use [0-9]+, and change the quoting to {…} rather than "…"
so that there's no need to add backslashes for the [ and ].
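
A quick sketch of the difference between the two patterns, using std::regex rather than the Tcl regexps DejaGnu uses. The "alignment 1" and "<unknown>" messages are hypothetical inputs chosen for the example, not diagnostics quoted from any target.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` matches anywhere in `text`.
static bool matches(const std::string& text, const std::string& pattern)
{
  return std::regex_search(text, std::regex(pattern));
}

static void check_alignment_patterns()
{
  const std::string loose  = "alignment .* of 'struct foo5' is less than 16";
  const std::string strict = "alignment [0-9]+ of 'struct foo5' is less than 16";

  // Any numeric alignment is accepted by the stricter pattern.
  assert(matches("alignment 4 of 'struct foo5' is less than 16", strict));
  assert(matches("alignment 1 of 'struct foo5' is less than 16", strict));

  // A malformed message is only accepted by the loose pattern.
  const std::string bogus = "alignment <unknown> of 'struct foo5' is less than 16";
  assert(matches(bogus, loose));
  assert(!matches(bogus, strict));
}
```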

OK with that change, thanks.

Richard


Re: [PATCH 4/4] testsuite: Add default_packed filters

2020-07-20 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> Fix test case assumptions that the target has alignment constraints.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/attr-copy-4.c: Unpacked may still have alignment of 1
>   on targets with default_packed.
>   * gcc.dg/c11-align-9.c: Remove AVR target filter and replace
>   with default_packed filter.

OK, thanks.

Richard

>
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/testsuite/gcc.dg/attr-copy-4.c | 1 +
>  gcc/testsuite/gcc.dg/c11-align-9.c | 4 ++--
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/attr-copy-4.c 
> b/gcc/testsuite/gcc.dg/attr-copy-4.c
> index 796724bb950..01fae3f78d4 100644
> --- a/gcc/testsuite/gcc.dg/attr-copy-4.c
> +++ b/gcc/testsuite/gcc.dg/attr-copy-4.c
> @@ -32,6 +32,7 @@ extern const struct PackedA packed;
>  
>  struct Unpacked { int i; char c; };
>  Assert (__alignof (struct Unpacked) > 1);
> +/* { dg-error "size of array .* is negative" "" { target default_packed } 
> .-1 } */
>  
>  /* Verify that copying the packed attribute to the declaration
> of an object is ignored with a warning.  (There should be
> diff --git a/gcc/testsuite/gcc.dg/c11-align-9.c 
> b/gcc/testsuite/gcc.dg/c11-align-9.c
> index 3c9cf55756e..6a0d4248f1b 100644
> --- a/gcc/testsuite/gcc.dg/c11-align-9.c
> +++ b/gcc/testsuite/gcc.dg/c11-align-9.c
> @@ -2,8 +2,8 @@
> are at least some alignment constraints), case of compound literals.  */
>  /* { dg-do compile } */
>  /* { dg-options "-std=c11 -pedantic-errors" } */
> -/* { dg-skip-if "no alignment constraints" { "avr-*-*" } } */
>  
>  #include 
>  
> -max_align_t *p = &(_Alignas (_Alignof (char)) max_align_t) { 1 }; /* { 
> dg-error "reduce alignment" } */
> +max_align_t *p = &(_Alignas (_Alignof (char)) max_align_t) { 1 };
> +/* { dg-error "reduce alignment" "" { target { ! default_packed } } .-1 } */


[Patch] OpenMP: Fix tmp-var handling with tree-nested.c [PR93553]

2020-07-20 Thread Tobias Burnus

This is about a PARM_DECL of a procedure whose
internal/nested procedure uses this inside an
omp parallel. This leads to the code:

D.3940 = x;
(*D.3940)[D.3924] = …;

And the temporary variable "D.3940" introduced for
the nesting was not recorded as DECL for OpenMP,
leading to the ICE in scan_omp_1_op reported in
the PR.

This patch adds those temporary variables as PRIVATE
to the clause – fixing the ICE.

OK for the trunk?

Tobias

PS: For other variables, that's done with
gimplify.c which adds it to the splay_tree, hence,
this issue only occurs if the variable is added later
– as here for tree-nested.c.
The new code is also used for some C OpenMP/OpenACC testcases
in the testsuite, but seemingly it works either way.
Interestingly, I do not see any 'private' in the dump for
the new testcase – seemingly, it gets optimized away.

PPS: Thanks to Jakub for some suggestions!

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP: Fix tmp-var handling with tree-nested.c [PR93553]

gcc/ChangeLog:

	PR fortran/93553
	* tree-nested.c (omp_new_clauses): New global var.
	(convert_nonlocal_reference_op): Add init_tmp_var/save_tmp_var
	vars to it.
	(convert_nonlocal_omp_clauses, convert_local_omp_clauses): Add
	those as 'private' to the OpenMP clause.

libgomp/ChangeLog:

	PR fortran/93553
	* testsuite/libgomp.fortran/pr93553.f90: New test.

 gcc/tree-nested.c | 37 +--
 libgomp/testsuite/libgomp.fortran/pr93553.f90 | 21 +++
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 4dc5533be84..8c4dadc9f21 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -111,6 +111,7 @@ struct nesting_info
   char static_chain_added;
 };
 
+static tree omp_new_clauses;
 
 /* Iterate over the nesting tree, starting with ROOT, depth first.  */
 
@@ -1068,6 +1069,11 @@ convert_nonlocal_reference_op (tree *tp, int *walk_subtrees, void *data)
 	if (use_pointer_in_frame (t))
 	  {
 		x = init_tmp_var (info, x, &wi->gsi);
+		tree c = build_omp_clause (DECL_SOURCE_LOCATION (x),
+	   OMP_CLAUSE_PRIVATE);
+		OMP_CLAUSE_DECL (c) = x;
+		OMP_CLAUSE_CHAIN (c) = omp_new_clauses;
+		omp_new_clauses = c;
 		x = build_simple_mem_ref_notrap (x);
 	  }
 	  }
@@ -1078,6 +1084,11 @@ convert_nonlocal_reference_op (tree *tp, int *walk_subtrees, void *data)
 	  x = save_tmp_var (info, x, &wi->gsi);
 	else
 	  x = init_tmp_var (info, x, &wi->gsi);
+	tree c = build_omp_clause (DECL_SOURCE_LOCATION (x),
+   OMP_CLAUSE_PRIVATE);
+	OMP_CLAUSE_DECL (c) = x;
+	OMP_CLAUSE_CHAIN (c) = omp_new_clauses;
+	omp_new_clauses = c;
 	  }
 
 	*tp = x;
@@ -1186,15 +1197,18 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 {
   struct nesting_info *const info = (struct nesting_info *) wi->info;
   bool need_chain = false, need_stmts = false;
-  tree clause, decl, *pdecl;
+  tree clause, last_clause, decl, *pdecl;
   int dummy;
   bitmap new_suppress;
 
+  tree saved_omp_new_clauses = omp_new_clauses;
+  omp_new_clauses = NULL_TREE;
   new_suppress = BITMAP_GGC_ALLOC ();
   bitmap_copy (new_suppress, info->suppress_expansion);
 
   for (clause = *pclauses; clause ; clause = OMP_CLAUSE_CHAIN (clause))
 {
+  last_clause = clause;
   pdecl = NULL;
   switch (OMP_CLAUSE_CODE (clause))
 	{
@@ -1450,6 +1464,14 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  break;
 	}
 
+
+  if (omp_new_clauses)
+{
+  gcc_assert (*pclauses);
+  OMP_CLAUSE_CHAIN (last_clause) = omp_new_clauses;
+}
+  omp_new_clauses = saved_omp_new_clauses;
+
   return need_chain;
 }
 
@@ -1919,15 +1941,19 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 {
   struct nesting_info *const info = (struct nesting_info *) wi->info;
   bool need_frame = false, need_stmts = false;
-  tree clause, decl, *pdecl;
+  tree clause, last_clause, decl, *pdecl;
   int dummy;
   bitmap new_suppress;
 
+  tree saved_omp_new_clauses = omp_new_clauses;
+  omp_new_clauses = NULL_TREE;
+
   new_suppress = BITMAP_GGC_ALLOC ();
   bitmap_copy (new_suppress, info->suppress_expansion);
 
   for (clause = *pclauses; clause ; clause = OMP_CLAUSE_CHAIN (clause))
 {
+  last_clause = clause;
   pdecl = NULL;
   switch (OMP_CLAUSE_CODE (clause))
 	{
@@ -2193,6 +2219,13 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  break;
 	}
 
+  if (omp_new_clauses)
+{
+  gcc_assert (*pclauses);
+  OMP_CLAUSE_CHAIN (last_clause) = omp_new_clauses;
+}
+  omp_new_clauses = saved_omp_new_clauses;
+
   return need_frame;
 }
 
diff --git a/libgomp/testsuite/libgomp.fortran/pr93553.f90 b/libgomp/testsuite/libgomp.fortran/pr93553.f90
new file mode 100644
index 000..5d6f10fe

[committed] i386: Use lock prefixed insn instead of MFENCE [PR95750]

2020-07-20 Thread Uros Bizjak via Gcc-patches
Currently, __atomic_thread_fence(seq_cst) on x86 and x86-64 generates
an mfence instruction. A dummy atomic instruction (a lock-prefixed instruction
or xchg with a memory operand) would provide the same sequential consistency
guarantees while being more efficient on most current CPUs. The mfence
instruction additionally orders non-temporal stores, which are not relevant
for atomic operations and are not ordered by seq_cst atomic operations anyway.
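
For reference, a minimal C++ sketch of the two operations this affects, written with std::atomic instead of the __atomic builtins. The function names are made up for the example. Which instruction the compiler actually emits depends on the compiler version and tuning options, so compile with -S to check rather than relying on this sketch.

```cpp
#include <atomic>
#include <cassert>

static std::atomic<int> x{0};

// Counterpart of __atomic_thread_fence (__ATOMIC_SEQ_CST).
static void full_fence()
{
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Counterpart of __atomic_store_n (&x, -1, __ATOMIC_SEQ_CST).
static void store_minus_one()
{
  x.store(-1, std::memory_order_seq_cst);
}

static void check_fence_and_store()
{
  full_fence();
  store_minus_one();
  assert(x.load(std::memory_order_seq_cst) == -1);
}
```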

2020-07-20  Uroš Bizjak  

gcc/ChangeLog:
PR target/95750
* config/i386/i386.h (TARGET_AVOID_MFENCE):
Rename from TARGET_USE_XCHG_FOR_ATOMIC_STORE.
* config/i386/sync.md (mfence_sse2): Disable for TARGET_AVOID_MFENCE.
(mfence_nosse): Enable also for TARGET_AVOID_MFENCE. Emit stack
referred memory in word_mode.
(mem_thread_fence): Do not generate mfence_sse2 pattern when
TARGET_AVOID_MFENCE is true.
(atomic_store): Update for rename.
* config/i386/x86-tune.def (X86_TUNE_AVOID_MFENCE):
Rename from X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE.

gcc/testsuite/ChangeLog:
PR target/95750
* gcc.target/i386/pr95750.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index f4a8f1391fa..114967e49a3 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -598,8 +598,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_BMI]
 #define TARGET_ONE_IF_CONV_INSN \
ix86_tune_features[X86_TUNE_ONE_IF_CONV_INSN]
-#define TARGET_USE_XCHG_FOR_ATOMIC_STORE \
-   ix86_tune_features[X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE]
+#define TARGET_AVOID_MFENCE ix86_tune_features[X86_TUNE_AVOID_MFENCE]
 #define TARGET_EMIT_VZEROUPPER \
ix86_tune_features[X86_TUNE_EMIT_VZEROUPPER]
 #define TARGET_EXPAND_ABS \
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index e22109039c1..c6827037abf 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -89,7 +89,8 @@
 (define_insn "mfence_sse2"
   [(set (match_operand:BLK 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))]
-  "TARGET_64BIT || TARGET_SSE2"
+  "(TARGET_64BIT || TARGET_SSE2)
+   && !TARGET_AVOID_MFENCE"
   "mfence"
   [(set_attr "type" "sse")
(set_attr "length_address" "0")
@@ -100,8 +101,14 @@
   [(set (match_operand:BLK 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))
(clobber (reg:CC FLAGS_REG))]
-  "!(TARGET_64BIT || TARGET_SSE2)"
-  "lock{%;} or{l}\t{$0, (%%esp)|DWORD PTR [esp], 0}"
+  "!(TARGET_64BIT || TARGET_SSE2)
+   || TARGET_AVOID_MFENCE"
+{
+  rtx mem = gen_rtx_MEM (word_mode, stack_pointer_rtx);
+
+  output_asm_insn ("lock{%;} or%z0\t{$0, %0|%0, 0}", &mem);
+  return "";
+}
   [(set_attr "memory" "unknown")])
 
 (define_expand "mem_thread_fence"
@@ -117,7 +124,8 @@
   rtx (*mfence_insn)(rtx);
   rtx mem;
 
-  if (TARGET_64BIT || TARGET_SSE2)
+  if ((TARGET_64BIT || TARGET_SSE2)
+ && !TARGET_AVOID_MFENCE)
mfence_insn = gen_mfence_sse2;
   else
mfence_insn = gen_mfence_nosse;
@@ -306,11 +314,10 @@
 {
   operands[1] = force_reg (mode, operands[1]);
 
-  /* For seq-cst stores, use XCHG when we lack MFENCE
-or when target prefers XCHG.  */
+  /* For seq-cst stores, use XCHG when we lack MFENCE.  */
   if (is_mm_seq_cst (model)
  && (!(TARGET_64BIT || TARGET_SSE2)
- || TARGET_USE_XCHG_FOR_ATOMIC_STORE))
+ || TARGET_AVOID_MFENCE))
{
  emit_insn (gen_atomic_exchange (gen_reg_rtx (mode),
operands[0], operands[1],
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 1776aba2d17..6eff8256897 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -313,8 +313,8 @@ DEF_TUNE (X86_TUNE_ONE_IF_CONV_INSN, "one_if_conv_insn",
  m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_CORE_ALL | m_GOLDMONT
  | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC)
 
-/* X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE: Use xchg instead of mov+mfence.  */
-DEF_TUNE (X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE, "use_xchg_for_atomic_store",
+/* X86_TUNE_AVOID_MFENCE: Use lock prefixed instructions instead of mfence.  */
+DEF_TUNE (X86_TUNE_AVOID_MFENCE, "avoid_mfence",
 m_CORE_ALL | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_EXPAND_ABS: This enables a new abs pattern by
diff --git a/gcc/testsuite/gcc.target/i386/pr95750.c 
b/gcc/testsuite/gcc.target/i386/pr95750.c
new file mode 100644
index 000..c47108fb796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95750.c
@@ -0,0 +1,19 @@
+/* PR target/95750 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=core2" } */
+
+void
+foo (void)
+{
+  __atomic_thread_fence (__ATOMIC_SEQ_CST);
+}
+
+int x;
+
+void
+bar (void)
+{
+  __atomic_store_n (&x, -1, __ATOMIC_SEQ_CST);
+}
+
+/* { dg-final { scan-assembler-not "mfence" } } */


[committed] correct memcmp expansion of constant representations containing embedded nuls (PR 95189)

2020-07-20 Thread Martin Sebor via Gcc-patches

I have committed this change in r11-2231 after Jeff approved it
off list last Thursday.

On 6/30/20 6:23 PM, Martin Sebor wrote:

An enhancement to GCC 10 to improve the expansion of strncmp
calls with strings with embedded nuls introduced a regression
in similar calls to memcmp.  A review of the changes that led
up to the regression exposed a number of questionable choices
that likely conspired to cause the bug.

For example, the name of the function with both the strncmp
enhancement as well as the memcmp bug is
inline_expand_builtin_string_cmp().  It's easy to assume that
the function handles calls to strcmp and strncmp but not also
memcmp.

Another similar example is the name of the second c_getstr()
argument -- strlen -- that doesn't actually store the length
of the retrieved string but rather its size in bytes
(including any embedded nuls, but excluding those appended
implicitly to zero out the remainder of an array the string
is stored in, up to the array's size).

Yet another example of a dubious choice is string_constant()
returning the empty string (i.e., STRING_CST with size 1) for
zero initializers of constants of any type (as opposed to one
of the same size as the constant object).

Besides fixing the memcmp bug the attached patch (hopefully)
also rectifies some of the otherwise more or less benign
mistakes that precipitated it, mostly by clarifying comments
and changing misleading names of functions, their arguments,
or local variables.

A happy consequence of the fixes is that they improve codegen
for calls to memcpy with constants whose representation includes
embedded nuls.
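
To make the distinction concrete, a small sketch (the arrays are made up for illustration and are not from the patch or its tests). It shows the size-versus-length difference mentioned above and why memcmp must not be expanded as if it were strncmp.

```cpp
#include <cassert>
#include <cstring>

// Two constants that are equal as C strings but differ after the first nul.
static const char a[] = "a\0b";
static const char b[] = "a\0c";

static void check_embedded_nuls()
{
  // Size in bytes (embedded plus terminating nul included) vs. string length.
  assert(sizeof a == 4);
  assert(std::strlen(a) == 1);

  // strncmp stops at the first nul; memcmp compares all the bytes.
  assert(std::strncmp(a, b, sizeof a) == 0);
  assert(std::memcmp(a, b, sizeof a) < 0);
}
```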

Tested on x86_64-linux.

Martin





[committed] remove stray text from option description (PR 96249)

2020-07-20 Thread Martin Sebor via Gcc-patches

I have committed this trivial change below in r11-2229.

Martin

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index a09f15d..2b1aca1 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1248,7 +1248,6 @@ C ObjC C++ LTO ObjC++ Var(warn_vla_limit) Warning 
Joined Host_Wide_Int ByteSize

 -Wvla-larger-than= Warn on unbounded uses of variable-length
 arrays, and on bounded uses of variable-length arrays whose bound can be
 larger than  bytes.
- bytes.

 Wno-vla-larger-than
 C ObjC C++ LTO ObjC++ 
Alias(Wvla-larger-than=,18446744073709551615EiB,none) Warning


Re: [PATCH PR96195] aarch64: ICE during GIMPLE pass:vect

2020-07-20 Thread Richard Sandiford
Sorry for the slow reply.

"yangyang (ET)"  writes:
> Hi, 
>
>   This is a simple fix for PR96195.
>
>   For the test case, GCC generates the following gimple statement in 
> pass_vect:
>
> vect__21.16_58 = zp.simdclone.2 (vect_vec_iv_.15_56);
>
>   The mode of vect__21.16_58 is VNx2SI while the mode of zp.simdclone.2 
> (vect_vec_iv_.15_56) is V4SI, resulting in the crash.
>
>   In vectorizable_simd_clone_call, type compatibility is handled based on 
> the number of elements and the type compatibility of elements, which is not 
> enough. 
>   This patch adds VIEW_CONVERT_EXPRs if the argument types and return 
> type of the simd clone function differ from the vectype of the stmt.
>
>   Added one testcase for this. Bootstrapped and tested on both aarch64 and 
> x86 Linux platforms; no new regressions observed.

I agree this looks correct as far as the target-independent interface
goes.  However, the underlying problem is that we haven't yet added
support for SVE omp simd functions.  What should happen for the testcase
is that we assume both SVE and Advanced SIMD versions of zp exist and
call the SVE version instead of the Advanced SIMD version.

There again, for little-endian -msve-vector-bits=128 there should
be no overhead with using the Advanced SIMD version, and big-endian
-msve-vector-bits=128 is equivalent to -msve-vector-bits=scalable.

Things would get more interesting for:

  #pragma omp declare simd simdlen(8)
  int
  zp (int);

and -msve-vector-bits=256, but again, we don't yet support simdlen(8)
for Advanced SIMD.

So all in all, I agree this is the right fix.  Pushed to master with
a minor whitespace fixup for:

> + gassign *new_stmt
> +   = gimple_build_assign (make_ssa_name (atype),
> +vec_oprnd0);

…the indentation on this line.

Thanks,
Richard


[committed] libstdc++: Avoid overflow in istream::get(streambuf&) [LWG 3464]

2020-07-20 Thread Jonathan Wakely via Gcc-patches
Similar to the recent changes to basic_istream::ignore, this change
ensures that _M_gcount doesn't overflow when extracting characters and
inserting them into another streambuf.

The solution used here is to use unsigned long long for the count. We
assume that the number of characters extracted won't exceed the maximum
value for that type, but even if it does we avoid any undefined
behaviour.
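
As a rough illustration of the affected overload and of the clamping idea, here is a small sketch. The saturate helper is a hypothetical stand-in written for this example; it is not the libstdc++ code, which does the equivalent inline.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Clamp a wide unsigned count to the range of std::streamsize
// (hypothetical helper, for illustration only).
static std::streamsize saturate(unsigned long long n)
{
  const unsigned long long max = std::numeric_limits<std::streamsize>::max();
  return static_cast<std::streamsize>(n <= max ? n : max);
}

static void check_get_into_streambuf()
{
  std::istringstream in("hello\nworld");
  std::stringbuf out;

  in.get(out);  // copies up to, but not including, the '\n' delimiter
  assert(in.gcount() == 5);
  assert(out.str() == "hello");

  assert(saturate(5) == 5);
  assert(saturate(std::numeric_limits<unsigned long long>::max())
         == std::numeric_limits<std::streamsize>::max());
}
```

Counting in an unsigned type keeps the increment well defined even in the extreme case, and the clamp happens once after the loop.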

libstdc++-v3/ChangeLog:

* include/bits/istream.tcc
(basic_istream::get(__streambuf_type&, char_type)): Use unsigned
long long for counter and check if it would overflow _M_gcount.
* testsuite/27_io/basic_istream/get/char/lwg3464.cc: New test.
* testsuite/27_io/basic_istream/get/wchar_t/lwg3464.cc: New test.

Tested powerpc64le-linux, i686-linux. Committed to trunk.


commit 4d1c5b4957db2cb07f1053b7b87767275497d52e
Author: Jonathan Wakely 
Date:   Mon Jul 20 20:06:46 2020

libstdc++: Avoid overflow in istream::get(streambuf&) [LWG 3464]

Similar to the recent changes to basic_istream::ignore, this change
ensures that _M_gcount doesn't overflow when extracting characters and
inserting them into another streambuf.

The solution used here is to use unsigned long long for the count. We
assume that the number of characters extracted won't exceed the maximum
value for that type, but even if it does we avoid any undefined
behaviour.

libstdc++-v3/ChangeLog:

* include/bits/istream.tcc
(basic_istream::get(__streambuf_type&, char_type)): Use unsigned
long long for counter and check if it would overflow _M_gcount.
* testsuite/27_io/basic_istream/get/char/lwg3464.cc: New test.
* testsuite/27_io/basic_istream/get/wchar_t/lwg3464.cc: New test.

diff --git a/libstdc++-v3/include/bits/istream.tcc 
b/libstdc++-v3/include/bits/istream.tcc
index 5983e51873f..0289867c50b 100644
--- a/libstdc++-v3/include/bits/istream.tcc
+++ b/libstdc++-v3/include/bits/istream.tcc
@@ -375,17 +375,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __streambuf_type* __this_sb = this->rdbuf();
  int_type __c = __this_sb->sgetc();
  char_type __c2 = traits_type::to_char_type(__c);
+ unsigned long long __gcount = 0;
 
  while (!traits_type::eq_int_type(__c, __eof)
 && !traits_type::eq_int_type(__c, __idelim)
 && !traits_type::eq_int_type(__sb.sputc(__c2), __eof))
{
- ++_M_gcount;
+ ++__gcount;
  __c = __this_sb->snextc();
  __c2 = traits_type::to_char_type(__c);
}
  if (traits_type::eq_int_type(__c, __eof))
__err |= ios_base::eofbit;
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // 3464. istream::gcount() can overflow
+ if (__gcount <= __gnu_cxx::__numeric_traits<streamsize>::__max)
+   _M_gcount = __gcount;
+ else
+   _M_gcount = __gnu_cxx::__numeric_traits<streamsize>::__max;
}
  __catch(__cxxabiv1::__forced_unwind&)
{
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/get/char/lwg3464.cc 
b/libstdc++-v3/testsuite/27_io/basic_istream/get/char/lwg3464.cc
new file mode 100644
index 000..6123ca5b713
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/get/char/lwg3464.cc
@@ -0,0 +1,91 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target { ! lp64 } } }
+
+#include 
+#include 
+#include 
+#include 
+
+typedef char C;
+
+struct buff : std::basic_streambuf<C>
+{
+  typedef std::streamsize streamsize;
+  typedef std::numeric_limits<streamsize> limits;
+
+  buff() : count(0), buf() { }
+
+  int_type underflow()
+  {
+// Number of characters left until we overflow the counter
+const streamsize headroom = limits::max() - count;
+
+if (headroom == 0)
+  return traits_type::eof();
+
+if (bufsz < headroom)
+  count += bufsz;
+else
+  count = limits::max();
+
+this->setg(buf + 1, buf + 1, buf + bufsz);
+
+return buf[0];
+  }
+
+  int_type overflow(int_type c)
+  {
+if (traits_type::eq_int_type(c , traits_type::eof()))
+  return c;
+th

Re: [Patch] OpenMP: Fix tmp-var handling with tree-nested.c [PR93553]

2020-07-20 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 20, 2020 at 08:35:56PM +0200, Tobias Burnus wrote:
> gcc/ChangeLog:
> 
>   PR fortran/93553
>   * tree-nested.c (omp_new_clauses): New global var.
>   (convert_nonlocal_reference_op): Add init_tmp_var/init_tmp_var
>   vars to it.
>   (convert_nonlocal_omp_clauses, convert_local_omp_clauses): Add
>   those as 'private' to the OpenMP clause.

> --- a/gcc/tree-nested.c
> +++ b/gcc/tree-nested.c
> @@ -111,6 +111,7 @@ struct nesting_info
>char static_chain_added;
>  };
>  
> +static tree omp_new_clauses;

I don't like this global variable.
Can you please instead stick it into struct nesting_info and make sure it is
cleared where it is allocated?

Jakub



[PATCH] c++: abbreviated function template friend matching [PR96106]

2020-07-20 Thread Patrick Palka via Gcc-patches
In the below testcase, duplicate_decls wasn't merging the tsubsted
friend declaration for 'void add(auto)' with its definition, because
reduce_template_parm_level (during tsubst_friend_function) lost the
DECL_VIRTUAL_P flag on the invented 'auto' template parameter, which
made template_heads_equivalent_p deem the two template heads as not
equivalent in C++20 mode.

This patch makes reduce_template_parm_level carry over the
DECL_VIRTUAL_P from the original TEMPLATE_PARM_DECL.

Passes 'make check-c++' and the cmcstl2 testsuite.  Does this look OK
for trunk after a full bootstrap and regtest?

gcc/cp/ChangeLog:

PR c++/96106
* pt.c (reduce_template_parm_level): Carry over DECL_VIRTUAL_P
from the original TEMPLATE_PARM_DECL.

gcc/testsuite/ChangeLog:

PR c++/96106
* g++.dg/concepts/abbrev7.C: New test.
---
 gcc/cp/pt.c |  1 +
 gcc/testsuite/g++.dg/concepts/abbrev7.C | 14 ++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/concepts/abbrev7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index defc2a9abd8..3f7b89141b6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -4453,6 +4453,7 @@ reduce_template_parm_level (tree index, tree type, int 
levels, tree args,
  type);
   TREE_CONSTANT (decl) = TREE_CONSTANT (orig_decl);
   TREE_READONLY (decl) = TREE_READONLY (orig_decl);
+  DECL_VIRTUAL_P (decl) = DECL_VIRTUAL_P (orig_decl);
   DECL_ARTIFICIAL (decl) = 1;
   SET_DECL_TEMPLATE_PARM_P (decl);
 
diff --git a/gcc/testsuite/g++.dg/concepts/abbrev7.C 
b/gcc/testsuite/g++.dg/concepts/abbrev7.C
new file mode 100644
index 000..443c1b7871b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/abbrev7.C
@@ -0,0 +1,14 @@
+// PR c++/96106
+// { dg-do compile { target concepts } }
+
+template
+struct number {
+  friend void add(auto);
+};
+
+void add(auto) { }
+
+void foo() {
+  number n;
+  add(n); // { dg-bogus "ambiguous" }
+}
-- 
2.28.0.rc1



Re: [Patch] OpenMP: Fix tmp-var handling with tree-nested.c [PR93553]

2020-07-20 Thread Tobias Burnus

On 7/20/20 9:12 PM, Jakub Jelinek wrote:

I don't like this global variable.
Can you please instead stick it into struct nesting_info and make sure it is
cleared where it is allocated?


Done. The existing code uses
   struct nesting_info *info = XCNEW (struct nesting_info);
in create_nesting_tree; hence, the clearing is already done.

Tobias

OpenMP: Fix tmp-var handling with tree-nested.c [PR93553]

gcc/ChangeLog:

	PR fortran/93553
	* tree-nested.c (struct nesting_info): Add omp_new_clauses.
	(convert_nonlocal_reference_op): Add init_tmp_var/save_tmp_var
	vars to omp_new_clauses.
	(convert_nonlocal_omp_clauses, convert_local_omp_clauses): Add
	those as 'private' to the OpenMP clause.

libgomp/ChangeLog:

	PR fortran/93553
	* testsuite/libgomp.fortran/pr93553.f90: New test.

 gcc/tree-nested.c | 35 +++
 libgomp/testsuite/libgomp.fortran/pr93553.f90 | 21 
 2 files changed, 56 insertions(+)

diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 4dc5533be84..aad4c8094b0 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -103,6 +103,7 @@ struct nesting_info
   tree chain_field;
   tree chain_decl;
   tree nl_goto_field;
+  tree omp_new_clauses;
 
   bool thunk_p;
   bool any_parm_remapped;
@@ -1068,6 +1069,11 @@ convert_nonlocal_reference_op (tree *tp, int *walk_subtrees, void *data)
 	if (use_pointer_in_frame (t))
 	  {
 		x = init_tmp_var (info, x, &wi->gsi);
+		tree c = build_omp_clause (DECL_SOURCE_LOCATION (x),
+	   OMP_CLAUSE_PRIVATE);
+		OMP_CLAUSE_DECL (c) = x;
+		OMP_CLAUSE_CHAIN (c) = info->omp_new_clauses;
+		info->omp_new_clauses = c;
 		x = build_simple_mem_ref_notrap (x);
 	  }
 	  }
@@ -1078,6 +1084,11 @@ convert_nonlocal_reference_op (tree *tp, int *walk_subtrees, void *data)
 	  x = save_tmp_var (info, x, &wi->gsi);
 	else
 	  x = init_tmp_var (info, x, &wi->gsi);
+	tree c = build_omp_clause (DECL_SOURCE_LOCATION (x),
+   OMP_CLAUSE_PRIVATE);
+	OMP_CLAUSE_DECL (c) = x;
+	OMP_CLAUSE_CHAIN (c) = info->omp_new_clauses;
+	info->omp_new_clauses = c;
 	  }
 
 	*tp = x;
@@ -1450,6 +1461,18 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  break;
 	}
 
+  struct nesting_info *n;
+  FOR_EACH_NEST_INFO (n, info)
+if (n->omp_new_clauses)
+  {
+	tree last_clause = n->omp_new_clauses;
+	while (OMP_CLAUSE_CHAIN (last_clause))
+	  last_clause = OMP_CLAUSE_CHAIN (last_clause);
+	OMP_CLAUSE_CHAIN (last_clause) = *pclauses;
+	*pclauses = n->omp_new_clauses;
+	n->omp_new_clauses = NULL_TREE;
+  }
+
   return need_chain;
 }
 
@@ -2193,6 +2216,18 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  break;
 	}
 
+  struct nesting_info *n;
+  FOR_EACH_NEST_INFO (n, info)
+if (n->omp_new_clauses)
+  {
+	tree last_clause = n->omp_new_clauses;
+	while (OMP_CLAUSE_CHAIN (last_clause))
+	  last_clause = OMP_CLAUSE_CHAIN (last_clause);
+	OMP_CLAUSE_CHAIN (last_clause) = *pclauses;
+	*pclauses = n->omp_new_clauses;
+	n->omp_new_clauses = NULL_TREE;
+  }
+
   return need_frame;
 }
 
diff --git a/libgomp/testsuite/libgomp.fortran/pr93553.f90 b/libgomp/testsuite/libgomp.fortran/pr93553.f90
new file mode 100644
index 000..5d6f10febed
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/pr93553.f90
@@ -0,0 +1,21 @@
+program p
+   implicit none
+   integer :: x(8) = 0
+   call sub(x)
+end
+subroutine sub(x)
+   implicit none
+   integer i
+   integer :: x(8)
+   integer :: c(8) = [(11*i, i=1,8)]
+   call s
+   if (any (x /= c)) stop 1
+contains
+   subroutine s
+  integer :: i
+  !$omp parallel do reduction(+:x)
+  do i = 1, 8
+ x(i) = c(i)
+  end do
+   end
+end
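
The clause handling in this patch comes down to two singly linked list operations: each new temporary is pushed onto the front of a pending chain, and the pending chain is later spliced in front of the construct's existing clause chain. The following is a minimal standalone sketch of that pattern. The `clause` struct and the helper names `push_pending` and `splice_pending` are hypothetical stand-ins for GCC's tree nodes and the `OMP_CLAUSE_CHAIN` accessor; this only illustrates the list manipulation and is not the tree-nested.c code itself.

```cpp
#include <cassert>

// Hypothetical stand-in for an OpenMP clause tree node.
struct clause
{
  int id;         // stands in for OMP_CLAUSE_DECL
  clause *chain;  // stands in for OMP_CLAUSE_CHAIN
};

// Push C onto the front of the pending chain and return the new head.
clause *push_pending (clause *pending, clause *c)
{
  c->chain = pending;
  return c;
}

// Splice the whole pending chain in front of *PCLAUSES, then clear it so
// the same clauses are not attached to another construct later.
void splice_pending (clause **pclauses, clause **ppending)
{
  if (!*ppending)
    return;
  clause *last = *ppending;
  while (last->chain)
    last = last->chain;
  last->chain = *pclauses;
  *pclauses = *ppending;
  *ppending = nullptr;
}
```

Resetting the pending chain after the splice mirrors the `n->omp_new_clauses = NULL_TREE` assignment in the hunks above.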


[pushed] c++: Aggregate CTAD and string constants.

2020-07-20 Thread Jason Merrill via Gcc-patches
In CWG discussion, it was suggested that deduction from a string literal
should be to reference-to-const, so that we deduce 'char' rather than 'const
char' for T.
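
The effect of deducing through a reference-to-const can be seen with ordinary function template deduction, which is used here only as an analogy for the aggregate CTAD case. In this sketch the two helper templates are hypothetical names, not anything in GCC: one takes `T (&)[N]` and the other `const T (&)[N]`, and each reports whether `T` was deduced as plain `char`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: deduction against a plain reference to array.
template <class T, std::size_t N>
constexpr bool deduces_char (T (&)[N])
{
  return std::is_same<T, char>::value;
}

// Hypothetical helper: deduction against a reference to const array.
template <class T, std::size_t N>
constexpr bool deduces_char_via_const_ref (const T (&)[N])
{
  return std::is_same<T, char>::value;
}

// A string literal has type 'const char[N]', so the first form deduces
// T = const char while the second deduces T = char.
static_assert (!deduces_char ("foo"), "T is const char here");
static_assert (deduces_char_via_const_ref ("foo"), "T is char here");
```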

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* pt.c (collect_ctor_idx_types): Add 'const' when deducing from
a string constant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr7.C: New test.
---
 gcc/cp/pt.c|  9 +++--
 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr7.C | 14 ++
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index defc2a9abd8..5f43e9c5c69 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28357,8 +28357,13 @@ collect_ctor_idx_types (tree ctor, tree list, tree elt = NULL_TREE)
   if (TREE_CODE (ftype) == ARRAY_TYPE
  && (BRACE_ENCLOSED_INITIALIZER_P (val)
  || TREE_CODE (val) == STRING_CST))
-   ftype = (cp_build_reference_type
-(ftype, BRACE_ENCLOSED_INITIALIZER_P (val)));
+   {
+ if (TREE_CODE (val) == STRING_CST)
+   ftype = cp_build_qualified_type
+ (ftype, cp_type_quals (ftype) | TYPE_QUAL_CONST);
+ ftype = (cp_build_reference_type
+  (ftype, BRACE_ENCLOSED_INITIALIZER_P (val)));
+   }
   list = tree_cons (arg, ftype, list);
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr7.C b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr7.C
new file mode 100644
index 000..3505a8c97db
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr7.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+
+template 
+struct A
+{
+  T ar[N];
+};
+
+A a = { "foo" };
+
+template struct same;
+template struct same {};
+same s;
+

base-commit: 87891d5eafe8d1de90b9d9b056eca81c508d1c77
-- 
2.18.1



[pushed] c++: Allow subobject references in C++20.

2020-07-20 Thread Jason Merrill via Gcc-patches
The last new thing allowed by P1907R1: subobject addresses as template
arguments.  The ABI group has discussed mangling for this; there has been
some talk of a compressed subobject mangling, but it hasn't been finalized,
so for now I'm using normal expression mangling.  In order for two array
subobject references to compare as equal template arguments, the offsets
need to have the same type, so I convert them to always be the same type,
currently ptrdiff_t.  Base class conversions are represented as a cast to
reference type, only if necessary to resolve an ambiguity.

This patch also updates the value of __cpp_nontype_template_args, since
the paper is fully implemented.
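
As a rough illustration of what the paper permits, the sketch below passes the addresses of a member and of an array element as template arguments. It is guarded by the feature-test macro this patch updates, so it still compiles on compilers that do not yet advertise the feature. The names `S`, `Ptr` and `check_subobject_args` are invented for this example.

```cpp
#include <cassert>

struct S { int a; int b[2]; };
S s;  // static storage duration, as required for address template arguments

template <int *P>
struct Ptr
{
  static int *get () { return P; }
};

// Returns 1 if subobject addresses work as template arguments, 0 if the
// compiler does not advertise P1907R1 support, and -1 on a mismatch.
int check_subobject_args ()
{
#if defined(__cpp_nontype_template_args) \
    && __cpp_nontype_template_args >= 201911L
  bool ok = Ptr<&s.a>::get () == &s.a
	    && Ptr<&s.b[1]>::get () == &s.b[1];
  return ok ? 1 : -1;
#else
  return 0;
#endif
}
```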

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* mangle.c (write_base_ref): New.
(write_expression): Use it for base field COMPONENT_REFs.
* pt.c (invalid_tparm_referent_p): Canonicalize the type
of array offsets.  Allow subobjects.

gcc/c-family/ChangeLog:

* c-cppbuiltin.c (c_cpp_builtins): Update
__cpp_nontype_template_args for C++20.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype2.C: No error in C++20.
* g++.dg/template/nontype25.C: No error in C++20.
* g++.dg/template/nontype8.C: No error in C++20.
* g++.dg/cpp2a/nontype-subob1.C: New test.
* g++.dg/cpp2a/nontype-subob2.C: New test.
* g++.dg/cpp1z/nontype3.C: Now C++17-only.
* g++.dg/cpp2a/feat-cxx2a.C: Adjust expected value.
---
 gcc/c-family/c-cppbuiltin.c |  4 +-
 gcc/cp/mangle.c | 56 +
 gcc/cp/pt.c | 24 +++--
 gcc/testsuite/g++.dg/cpp1z/nontype2.C   |  2 +-
 gcc/testsuite/g++.dg/cpp1z/nontype3.C   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C |  4 +-
 gcc/testsuite/g++.dg/cpp2a/nontype-subob1.C | 25 +
 gcc/testsuite/g++.dg/cpp2a/nontype-subob2.C | 13 +
 gcc/testsuite/g++.dg/template/nontype25.C   |  6 +--
 gcc/testsuite/g++.dg/template/nontype8.C|  4 +-
 10 files changed, 126 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-subob1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-subob2.C

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 83f52fdf5d8..74ecca8de8e 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -967,7 +967,8 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_enumerator_attributes=201411L");
  cpp_define (pfile, "__cpp_nested_namespace_definitions=201411L");
  cpp_define (pfile, "__cpp_fold_expressions=201603L");
- cpp_define (pfile, "__cpp_nontype_template_args=201411L");
+ if (cxx_dialect <= cxx17)
+   cpp_define (pfile, "__cpp_nontype_template_args=201411L");
  cpp_define (pfile, "__cpp_range_based_for=201603L");
  if (cxx_dialect <= cxx17)
cpp_define (pfile, "__cpp_constexpr=201603L");
@@ -998,6 +999,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_consteval=201811L");
  cpp_define (pfile, "__cpp_constinit=201907L");
  cpp_define (pfile, "__cpp_deduction_guides=201907L");
+ cpp_define (pfile, "__cpp_nontype_template_args=201911L");
  cpp_define (pfile, "__cpp_nontype_template_parameter_class=201806L");
  cpp_define (pfile, "__cpp_impl_destroying_delete=201806L");
  cpp_define (pfile, "__cpp_constexpr_dynamic_alloc=201907L");
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index ab2d8ecf2f2..43ff2e84db5 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2850,6 +2850,60 @@ write_member_name (tree member)
 write_expression (member);
 }
 
+/* EXPR is a base COMPONENT_REF; write the minimized base conversion path for
+   converting to BASE, or just the conversion of EXPR if BASE is null.
+
+   "Given a fully explicit base path P := C_n -> ... -> C_0, the minimized base
+   path Min(P) is defined as follows: let C_i be the last element for which the
+   conversion to C_0 is unambiguous; if that element is C_n, the minimized path
+   is C_n -> C_0; otherwise, the minimized path is Min(C_n -> ... -> C_i) ->
+   C_0."
+
+   We mangle the conversion to C_i if it's different from C_n.  */
+
+static bool
+write_base_ref (tree expr, tree base = NULL_TREE)
+{
+  if (TREE_CODE (expr) != COMPONENT_REF)
+return false;
+
+  tree field = TREE_OPERAND (expr, 1);
+
+  if (TREE_CODE (field) != FIELD_DECL || !DECL_FIELD_IS_BASE (field))
+return false;
+
+  tree object = TREE_OPERAND (expr, 0);
+
+  tree binfo = NULL_TREE;
+  if (base)
+{
+  tree cur = TREE_TYPE (object);
+  binfo = lookup_base (cur, base, ba_unique, NULL, tf_none);
+}
+  else
+/* We're at the end of the base conversion chain, so it can't be
+   ambiguous.  */
+base = TREE_TYPE (field);
+
+  if (binfo == error_mark_node)
+{
+  /* cur->base is ambiguous, 

Re: [PATCH v2] sparc/sparc64: use crtendS.o for default-pie executables [PR96190]

2020-07-20 Thread Sergei Trofimovich via Gcc-patches
On Fri, 17 Jul 2020 10:19:41 +0200
Eric Botcazou  wrote:

> > Oh! Sent out v3 with tweaked description as
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550168.html  
> 
> Thanks.
> 
> > I don't have a push access to gcc tree. Should I request one via
> > https://sourceware.org/cgi-bin/pdw/ps_form.cgi ?  
> 
> Sure, you can put me (ebotca...@libertysurf.fr) as sponsor if need be.

Got access and pushed as https://gcc.gnu.org/g:87891d5eafe8
Will cherry-pick it to gcc-10 as well after 10.2 release.

Thank you!

-- 

  Sergei


[pushed] c++: Pseudo-destructor ends object lifetime.

2020-07-20 Thread Jason Merrill via Gcc-patches
P0593R6 is mostly about a new object model whereby malloc and the like are
treated as implicitly starting the lifetime of whatever trivial types are
necessary to give the program well-defined semantics; that seems only
relevant to TBAA, and is not implemented here.

The paper also specifies that a pseudo-destructor call (a destructor call
for a non-class type) ends the lifetime of the object like a destructor call
for an object of class type, even though it doesn't call a destructor; this
patch implements that change.

The paper was voted as a DR, so I'm applying this change to all standard
levels.  Like class end-of-life clobbers, it is controlled by
 -flifetime-dse.
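
A small sketch of what the new semantics mean for user code: after a pseudo-destructor call the scalar object's lifetime has ended, so its old value must not be read, but the storage can be reused by starting a new object in it. The function name below is made up for the example, and the placement-new step is one well-defined way to continue, not something the patch itself does.

```cpp
#include <cassert>
#include <new>

int reuse_after_pseudo_dtor ()
{
  typedef int T;
  T t = 42;
  t.~T ();           // pseudo-destructor call: ends the lifetime of t
  // Reading t at this point would be undefined behaviour, and with
  // -flifetime-dse the compiler may treat the old value as dead.
  ::new (&t) T (7);  // start the lifetime of a new int in the same storage
  return t;          // reads the new object
}
```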

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* semantics.c (finish_call_expr): Use build_trivial_dtor_call for
pseudo-destructor.

gcc/testsuite/ChangeLog:

* g++.dg/opt/flifetime-dse7.C: New test.
---
 gcc/cp/pt.c   |  3 +--
 gcc/cp/semantics.c| 21 ++---
 gcc/testsuite/g++.dg/opt/flifetime-dse7.C | 16 
 3 files changed, 31 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/flifetime-dse7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cfe5dcd59cf..f9e80e5a1c3 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -26729,8 +26729,7 @@ type_dependent_expression_p (tree expression)
 return true;
 
   /* Some expression forms are never type-dependent.  */
-  if (TREE_CODE (expression) == PSEUDO_DTOR_EXPR
-  || TREE_CODE (expression) == SIZEOF_EXPR
+  if (TREE_CODE (expression) == SIZEOF_EXPR
   || TREE_CODE (expression) == ALIGNOF_EXPR
   || TREE_CODE (expression) == AT_ENCODE_EXPR
   || TREE_CODE (expression) == NOEXCEPT_EXPR
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 4a3ef3d2839..3096fe83433 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2707,12 +2707,16 @@ finish_call_expr (tree fn, vec **args, bool disallow_virtual,
 {
   if (!vec_safe_is_empty (*args))
error ("arguments to destructor are not allowed");
-  /* Mark the pseudo-destructor call as having side-effects so
-that we do not issue warnings about its use.  */
-  result = build1 (NOP_EXPR,
-  void_type_node,
-  TREE_OPERAND (fn, 0));
-  TREE_SIDE_EFFECTS (result) = 1;
+  /* C++20/DR: If the postfix-expression names a pseudo-destructor (in
+which case the postfix-expression is a possibly-parenthesized class
+member access), the function call destroys the object of scalar type
+denoted by the object expression of the class member access.  */
+  tree ob = TREE_OPERAND (fn, 0);
+  if (obvalue_p (ob))
+   result = build_trivial_dtor_call (ob);
+  else
+   /* No location to clobber.  */
+   result = convert_to_void (ob, ICV_STATEMENT, complain);
 }
   else if (CLASS_TYPE_P (TREE_TYPE (fn)))
 /* If the "function" is really an object of class type, it might
@@ -2845,7 +2849,10 @@ finish_pseudo_destructor_expr (tree object, tree scope, tree destructor,
}
 }
 
-  return build3_loc (loc, PSEUDO_DTOR_EXPR, void_type_node, object,
+  tree type = (type_dependent_expression_p (object)
+  ? NULL_TREE : void_type_node);
+
+  return build3_loc (loc, PSEUDO_DTOR_EXPR, type, object,
 scope, destructor);
 }
 
diff --git a/gcc/testsuite/g++.dg/opt/flifetime-dse7.C b/gcc/testsuite/g++.dg/opt/flifetime-dse7.C
new file mode 100644
index 000..4fe1eb062f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/flifetime-dse7.C
@@ -0,0 +1,16 @@
+// { dg-options "-O3 -flifetime-dse" }
+// { dg-do run }
+
+template 
+void f()
+{
+  T t = 42;
+  t.~T();
+  if (t == 42) __builtin_abort();
+}
+
+int main()
+{
+  f();
+}
+

base-commit: 812798917c59e95405a71b31ab37bd78c0f43f79
-- 
2.18.1



[PATCH] libgomp: Add helper functions for memory handling.

2020-07-20 Thread y2s1982 via Gcc-patches
This patch adds a few helper functions that aim to improve the readability of
fetching addresses, sizes, and values. It also proposes a syntax for
querying this information through the callback functions, similar to
that of the LLVM implementation. A query string has either two or three
fields, with '_' as the delimiter between fields. The first field, as
currently defined in the enum, is either gompd_query_address or
gompd_query_size: the first handles address or offset queries while the
second handles the size of the variable/member. The second field refers to
the variable type, and the optional third field refers to the data type of
the member of the variable.
This code is incomplete: in particular, it currently lacks CUDA support,
as well as segment handling, and inlining of the functions.
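
To make the proposed query syntax concrete, here is a minimal sketch of building such a string. It reuses the X-macro pattern from libgompd.h to generate both the enum and a matching table of names. The function `make_query`, the `query_names` table, and the example type names used with it are assumptions made for illustration; the field order (query type first) is inferred from the description above, and none of this is the code in ompd-helper.c.

```cpp
#include <cassert>
#include <string>

#define FOREACH_QUERYTYPE(TYPE) \
  TYPE (gompd_query_address)    \
  TYPE (gompd_query_size)

enum query_type
{
#define GENERATE_ENUM(ENUM) ENUM,
  FOREACH_QUERYTYPE (GENERATE_ENUM)
#undef GENERATE_ENUM
};

// Hypothetical name table generated from the same list as the enum, so the
// two cannot drift apart.
static const char *const query_names[] = {
#define GENERATE_STRING(ENUM) #ENUM,
  FOREACH_QUERYTYPE (GENERATE_STRING)
#undef GENERATE_STRING
};

// Join the fields with '_'.  The member field is optional.
std::string make_query (query_type q, const std::string &variable_type,
			const std::string &member_type = "")
{
  std::string s = std::string (query_names[q]) + "_" + variable_type;
  if (!member_type.empty ())
    s += "_" + member_type;
  return s;
}
```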

2020-07-20  Tony Sim  

libgomp/ChangeLog:

* Makefile.am (libgompd_la_SOURCES): Add ompd-helper.c.
* Makefile.in: Regenerate.
* libgompd.h (FOREACH_QUERYTYPE): Add macro for query types.
(query_type): Add enum for query types.
(gompd_getQueryStringSize): Add declaration.
(gompd_getQueryString): Add declaration.
(gompd_getAddress): Add declaration.
(gompd_getSize): Add declaration.
(gompd_getValue): Add declaration.
(gompd_getVariableAddress): Add declaration.
(gompd_getVariableSize): Add declaration.
(gompd_getVariableValue): Add declaration.
(gompd_getMemberAddress): Add declaration.
(gompd_getMemberSize): Add declaration.
(gompd_getMemberValue): Add declaration.
* ompd-helper.c: New file.

---
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   5 +-
 libgomp/libgompd.h|  53 ++
 libgomp/ompd-helper.c | 228 ++
 4 files changed, 285 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/ompd-helper.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index fe0a92122ea..d126bc655fc 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c
 
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-helper.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index f74d5f3ac8e..fdd488ca98e 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 libgompd_la_LIBADD =
-am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo
+am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo ompd-helper.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -593,7 +593,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c $(am__append_4)
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-helper.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -817,6 +817,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-profiling.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-helper.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
index 495995e00d3..69bf3573d3c 100644
--- a/libgomp/libgompd.h
+++ b/libgomp/libgompd.h
@@ -30,12 +30,18 @@
 #define LIBGOMPD_H 1
 
 #include "omp-tools.h"
+#include 
 
 #define ompd_stringify(x) ompd_str2(x)
 #define ompd_str2(x) #x
 
 #define OMPD_VERSION 201811
 
+#define FOREACH_QUERYTYPE(TYPE)\
+   TYPE (gompd_query_address)\
+   TYPE (gompd_query_size)\
+
+
 extern ompd_callbacks_t gompd_callbacks;
 
 typedef struct _ompd_aspace_handle {
@@ -47,4 +53,51 @@ typedef struct _ompd_aspace_handle {
   ompd_size_t ref_count;
 } ompd_address_space_handle_t;
 
+typedef enum gompd_query_type {
+#define GENERATE_ENUM(ENUM) ENUM,
+  FOREACH_QUERYTYPE (GENERATE_ENUM)
+#undef GENERATE_ENUM
+} query_type;
+
+ompd_rc_t gompd_getQueryStringSize (size_t *, query_type, const char*,
+   const char *);
+
+ompd_rc_t gompd_getQueryString (char **, query_type, const char*, const char *);
+
+ompd_rc_t gompd_getAddress (ompd_

[committed] libstdc++: Add std::from_chars for floating-point types

2020-07-20 Thread Jonathan Wakely via Gcc-patches
This adds the missing std::from_chars overloads for floating-point
types, as required for C++17 conformance.

The implementation is a hack and not intended to be used in the long
term. Rather than parsing the string directly, this determines the
initial portion of the string that matches the pattern determined by the
chars_format parameter, then creates a NTBS to be parsed by strtod (or
strtold or strtof).

Because creating a NTBS requires allocating memory, but std::from_chars
is noexcept, we need to be careful to minimise allocation. Even after
being careful, allocation failure is still possible, and so a
non-conforming std::errc::not_enough_memory error code might be returned.

Because strtod et al depend on the current locale, but std::from_chars
does not, we change the current thread's locale to "C" using newlocale
and uselocale before calling strtod, and restore it afterwards.

Because strtod doesn't have the equivalent of a std::chars_format
parameter, it has to examine the input to determine the format in use,
even though the std::from_chars code has already parsed it once (or
twice for large input strings!)

By replacing the use of strtod we could avoid allocation, avoid changing
locale, and use optimised code paths specific to each std::chars_format
case. We would also get more portable behaviour, rather than depending
on the presence of uselocale, and on any bugs or quirks of the target
libc's strtod. Replacing strtod is a project for a later date.
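
The approach described above can be sketched roughly as follows, assuming a glibc-style target where newlocale, uselocale and freelocale are declared in <locale.h>. The `parse_result` struct and the function `parse_in_c_locale` are names invented for this example. Unlike the real implementation, the sketch does not pre-scan the input against a chars_format pattern, and it uses std::string for the NTBS copy, which may throw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <locale.h>
#include <string>

struct parse_result
{
  double value;
  std::size_t consumed;  // number of characters strtod accepted
  bool ok;
};

parse_result parse_in_c_locale (const char *first, const char *last)
{
  parse_result r = { 0.0, 0, false };

  // strtod needs a null-terminated byte string, so copy the range.
  std::string buf (first, last);

  // Switch only the calling thread to the "C" locale.
  locale_t c_loc = newlocale (LC_ALL_MASK, "C", (locale_t) 0);
  if (c_loc == (locale_t) 0)
    return r;
  locale_t old = uselocale (c_loc);

  char *end = 0;
  r.value = std::strtod (buf.c_str (), &end);

  // Restore the previous locale before releasing the temporary one.
  uselocale (old);
  freelocale (c_loc);

  r.consumed = static_cast<std::size_t> (end - buf.c_str ());
  r.ok = r.consumed != 0;
  return r;
}
```

Note that strtod on its own accepts input that std::from_chars must reject, such as leading whitespace, which is one reason the real code inspects the input first.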

libstdc++-v3/ChangeLog:

* acinclude.m4 (libtool_VERSION): Bump version.
* config.h.in: Regenerate.
* config/abi/pre/gnu.ver: Add GLIBCXX_3.4.29 version and new
exports.
* config/os/gnu-linux/ldbl-extra.ver: Add _GLIBCXX_LDBL_3.4.29
version and new export.
* configure: Regenerate.
* configure.ac: Check for  and uselocale.
* crossconfig.m4: Add macro or checks for uselocale.
* include/std/charconv (from_chars): Declare overloads for
float, double, and long double.
* src/c++17/Makefile.am: Add new file.
* src/c++17/Makefile.in: Regenerate.
* src/c++17/floating_from_chars.cc: New file.
(from_chars): Define for float, double, and long double.
* testsuite/20_util/from_chars/1_c++20_neg.cc: Prune extra
diagnostics caused by new overloads.
* testsuite/20_util/from_chars/1_neg.cc: Likewise.
* testsuite/20_util/from_chars/2.cc: Check leading '+'.
* testsuite/20_util/from_chars/4.cc: New test.
* testsuite/20_util/from_chars/5.cc: New test.
* testsuite/util/testsuite_abi.cc: Add new symbol versions.

Tested x86_64-linux, powerpc64le-linux. Committed to trunk.


commit 932fbc868ad429167a3d4d5625aa9d6dc0b4506b
Author: Jonathan Wakely 
Date:   Mon Jul 20 23:49:27 2020

libstdc++: Add std::from_chars for floating-point types


Re: [patch] gcc/testsuite: Scale down long-running tree-prof.exp tests on slow targets

2020-07-20 Thread Sandra Loosemore

On 7/20/20 2:15 AM, Richard Biener wrote:


I think at least parts of tree-prof.exp exercises sample-based profiling
which might require more iterations.  For example cold_partition_label.c
was changed by

commit f63ba78ce6d50bf627dd18018179eb03bf89716f
Author: Andi Kleen 
Date:   Thu Jul 14 02:14:56 2016 +

 Some fixes for profile test cases for autofdo

 This fixes some basic issues with the profile test cases with autofdo.

 - Disable checking for value transformations that autofdo does not
   support.
 - Disable checking for fixed hit counts which autofdo does not support
 - Enable dumping of afdo log file and check right log file.
 - Increase run time of test cases to 1M iterations because autofdo needs
   a few samples to make sense of a program. The test cases don't run
   noticeably slower with that.

 There are still failures unfortunately, especially the indirect call
 transformations do not trigger because autofdo thinks they are not hot.
 This can be addressed later.

so the change to a larger number of iterations was intended.  Maybe
we can arrange to pass -DFOR_AUTOFDO_TESTING for the
autofdo compiles and gate the larger number of iterations on that
(most targets do not support autofdo and do not run that mode)?


Something like the attached updated patch?  Unfortunately I'm not set up 
to test that this actually works on an autofdo target, maybe somebody 
else could give it a try?


-Sandra
commit 2608a0ae0e81f039f354ec4c0c2fb0c3dbb8ea08
Author: Sandra Loosemore 
Date:   Mon Jul 20 16:02:53 2020 -0700

Scale down long-running tree-prof.exp tests for non-FDO testing.

2020-07-20  Sandra Loosemore  

	gcc/testsuite/
	* lib/profopt.exp (auto-profopt-execute): Pass -DFOR_AUTOFDO_TESTING
	on command line for both compiles.
	* gcc.dg/tree-prof/cold_partition_label.c: Scale down for
	non-FDO testing.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: Likewise.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: Likewise.
	* gcc.dg/tree-prof/indir-call-prof-topn.c: Likewise.
	* gcc.dg/tree-prof/section-attr-1.c: Likewise.
	* gcc.dg/tree-prof/section-attr-2.c: Likewise.
	* gcc.dg/tree-prof/section-attr-3.c: Likewise.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 7e8dc55..0928855 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,16 @@
+2020-07-20  Sandra Loosemore  
+
+	* lib/profopt.exp (auto-profopt-execute): Pass -DFOR_AUTOFDO_TESTING
+	on command line for both compiles.
+	* gcc.dg/tree-prof/cold_partition_label.c: Scale down for
+	non-FDO testing.
+	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: Likewise.
+	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: Likewise.
+	* gcc.dg/tree-prof/indir-call-prof-topn.c: Likewise.
+	* gcc.dg/tree-prof/section-attr-1.c: Likewise.
+	* gcc.dg/tree-prof/section-attr-2.c: Likewise.
+	* gcc.dg/tree-prof/section-attr-3.c: Likewise.
+
 2020-07-19  H.J. Lu  
 
 	PR target/95973
diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 450308d..511b610 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -3,6 +3,12 @@
 /* { dg-require-effective-target freorder } */
 /* { dg-options "-O2 -freorder-blocks-and-partition -save-temps -fdump-tree-optimized" } */
 
+#ifdef FOR_AUTOFDO_TESTING
+#define MAXITER 100
+#else
+#define MAXITER 1
+#endif
+
 #define SIZE 1
 
 const char *sarr[SIZE];
@@ -32,7 +38,7 @@ main (int argc, char *argv[])
   int i;
   buf_hot =  "hello";
   buf_cold = "world";
-  for (i = 0; i < 100; i++)
+  for (i = 0; i < MAXITER; i++)
 foo (argc);
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
index a13b08c..b57d30f 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
@@ -3,6 +3,12 @@
 /* { dg-require-profiling "-fprofile-generate" } */
 /* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate" } */
 
+#ifdef FOR_AUTOFDO_TESTING
+#define MAXITER 35000
+#else
+#define MAXITER 350
+#endif
+
 #include 
 
 typedef int (*fptr) (int);
@@ -22,7 +28,7 @@ main()
 
   x = one (3);
 
-  for (i = 0; i < 35000; i++)
+  for (i = 0; i < MAXITER; i++)
 {
   x = (*p) (3);
   p = table[x];
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
index 9b996fc..6b5ae93 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
@@ -3,6 +3,12 @@
 /* { dg-require-profiling "-fprofile-generate" } */
 /* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa

mmix: support -fstack-usage

2020-07-20 Thread Hans-Peter Nilsson
MMIX has two stacks; the regular one using register $254 as a
convention and the register-stack, pushed and popped by call
instructions (usually).  The decision to only report the stack usage
of the regular stack (and not of the register stack) may be updated,
perhaps the sum is better.  This initial decision is helped a little
bit by the order of passes: the size of the register-stack is
calculated only later (in the machine-dependent reorg pass), long
after finalization of the stack-usage info (in the prologue/epilogue
pass).  No regressions for mmix-knuth-mmixware (but a whole lot more
PASSes), committed.

gcc:
* config/mmix/mmix.c (mmix_expand_prologue): Calculate the total
allocated size and set current_function_static_stack_size, if
flag_stack_usage_info.
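
The bookkeeping added by the patch amounts to summing every stack-pointer adjustment the prologue emits. A simplified sketch, with an invented function name and a single fixed chunk limit instead of the several limits the real prologue uses between register saves:

```cpp
#include <cassert>
#include <vector>

// Allocate SPACE bytes in chunks of at most MAX_CHUNK, recording each
// stack-pointer adjustment, and return the total that was allocated.
long allocate_in_chunks (long space, long max_chunk,
			 std::vector<long> &adjustments)
{
  long total = 0;
  while (space > 0)
    {
      long chunk = space > max_chunk ? max_chunk : space;
      adjustments.push_back (-chunk);  // stands in for mmix_emit_sp_add
      total += chunk;
      space -= chunk;
    }
  return total;
}
```

The returned total corresponds to the value the patch stores in current_function_static_stack_size, which covers the regular stack only.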

--- gcc/gcc/config/mmix/mmix.c.orig Mon Jan 13 22:30:46 2020
+++ gcc/gcc/config/mmix/mmix.c  Mon Jul 20 05:44:05 2020
@@ -1987,6 +1987,7 @@ mmix_expand_prologue (void)
+ crtl->args.pretend_args_size
+ locals_size + 7) & ~7;
   HOST_WIDE_INT offset = -8;
+  HOST_WIDE_INT total_allocated_stack_space = 0;

   /* Add room needed to save global non-register-stack registers.  */
   for (regno = 255;
@@ -2036,6 +2037,8 @@ mmix_expand_prologue (void)
? (256 - 8) : stack_space_to_allocate;

  mmix_emit_sp_add (-stack_chunk);
+ total_allocated_stack_space += stack_chunk;
+
  offset += stack_chunk;
  stack_space_to_allocate -= stack_chunk;
}
@@ -2064,6 +2067,7 @@ mmix_expand_prologue (void)
? (256 - 8 - 8) : stack_space_to_allocate;

  mmix_emit_sp_add (-stack_chunk);
+ total_allocated_stack_space += stack_chunk;

  offset += stack_chunk;
  stack_space_to_allocate -= stack_chunk;
@@ -2099,6 +2103,7 @@ mmix_expand_prologue (void)
? (256 - 8 - 8) : stack_space_to_allocate;

  mmix_emit_sp_add (-stack_chunk);
+ total_allocated_stack_space += stack_chunk;

  offset += stack_chunk;
  stack_space_to_allocate -= stack_chunk;
@@ -2143,6 +2148,7 @@ mmix_expand_prologue (void)
? (256 - 8 - 8) : stack_space_to_allocate;

  mmix_emit_sp_add (-stack_chunk);
+ total_allocated_stack_space += stack_chunk;

  offset += stack_chunk;
  stack_space_to_allocate -= stack_chunk;
@@ -2193,6 +2199,8 @@ mmix_expand_prologue (void)
 ? (256 - offset - 8) : stack_space_to_allocate);

mmix_emit_sp_add (-stack_chunk);
+   total_allocated_stack_space += stack_chunk;
+
offset += stack_chunk;
stack_space_to_allocate -= stack_chunk;
  }
@@ -2210,6 +2218,14 @@ mmix_expand_prologue (void)
  wasn't allocated above.  */
   if (stack_space_to_allocate)
 mmix_emit_sp_add (-stack_space_to_allocate);
+  total_allocated_stack_space += stack_space_to_allocate;
+
+  /* Let's assume that reporting the usage of the regular stack on its
+ own, is more useful than either not supporting -fstack-usage or
+ reporting the sum of the usages of the regular stack and the
+ register stack.  */
+  if (flag_stack_usage_info)
+current_function_static_stack_size = total_allocated_stack_space;
 }

 /* Expands the function epilogue into RTX.  */


[PATCH] c++: Fixing the wording of () aggregate-init [PR92812]

2020-07-20 Thread Marek Polacek via Gcc-patches
P1975R0 tweaks the static_cast wording: it says that "An expression e can be
explicitly converted to a type T if [...] T is an aggregate type having a first
element x and there is an implicit conversion sequence from e to the type of
x."  This already works for classes, e.g.:

  struct Aggr { int x; int y; };
  Aggr a = static_cast<Aggr>(1);

albeit I noticed a -Wmissing-field-initializers warning which is unlikely to be
helpful in this context, as there's nothing like static_cast<Aggr>(1, 2)
to quash that warning.
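
For reference, a small sketch of the class case described above. It is guarded by the __cpp_aggregate_paren_init feature-test macro and falls back to brace initialization on older compilers. The helper name `make_aggr` is made up for this example.

```cpp
#include <cassert>

struct Aggr { int x; int y; };

Aggr make_aggr (int v)
{
#if defined(__cpp_aggregate_paren_init) \
    && __cpp_aggregate_paren_init >= 201902L
  // C++20: initializes the first element from v and value-initializes
  // the remaining ones.
  return static_cast<Aggr> (v);
#else
  // Pre-C++20 equivalent.
  Aggr a = { v };
  return a;
#endif
}
```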

However, the proposal also mentions "If T is ``array of unknown bound of U'',
this direct-initialization defines the type of the expression as U[1]" which
suggests that this should work for arrays (they're aggregates too, after all).
Ville, can you confirm that these

  int (&&r)[3] = static_cast<int[3]>(42);
  int (&&r2)[1] = static_cast<int[]>(42);

are supposed to work now?  There's no {} variant to check.  Thanks.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/92812
* typeck.c (build_static_cast_1): Add warning_sentinel for
-Wmissing-field-initializers.

gcc/testsuite/ChangeLog:

PR c++/92812
* g++.dg/cpp2a/paren-init27.C: New test.
---
 gcc/cp/typeck.c   |  3 +++
 gcc/testsuite/g++.dg/cpp2a/paren-init27.C | 24 +++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init27.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 589e014f855..062751a6379 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -7472,6 +7472,9 @@ build_static_cast_1 (location_t loc, tree type, tree expr, bool c_cast_p,
  static_cast of the form static_cast<T>(e) if the declaration T
  t(e);" is well-formed, for some invented temporary variable
  t.  */
+
+  /* In C++20, we don't want to warn about static_cast<Aggr>(1).  */
+  warning_sentinel w (warn_missing_field_initializers);
   result = perform_direct_initialization_if_possible (type, expr,
  c_cast_p, complain);
   if (result)
diff --git a/gcc/testsuite/g++.dg/cpp2a/paren-init27.C b/gcc/testsuite/g++.dg/cpp2a/paren-init27.C
new file mode 100644
index 000..a856c7fd7be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/paren-init27.C
@@ -0,0 +1,24 @@
+// PR c++/92812
+// P1975R0
+// { dg-do run { target c++20 } }
+// { dg-options "-Wall -Wextra" }
+
+struct Aggr { int x; int y; };
+struct Base { int i; Base(int i_) : i{i_} { } };
+struct BaseAggr : Base { };
+struct X { };
+struct AggrSDM { static X x; int i; int j; };
+
+int
+main ()
+{
+  Aggr a = static_cast<Aggr>(42);
+  if (a.x != 42 || a.y != 0)
+    __builtin_abort ();
+  BaseAggr b = static_cast<BaseAggr>(42);
+  if (b.i != 42)
+    __builtin_abort ();
+  AggrSDM s = static_cast<AggrSDM>(42);
+  if (s.i != 42 || s.j != 0)
+    __builtin_abort ();
+}

base-commit: 932fbc868ad429167a3d4d5625aa9d6dc0b4506b
-- 
2.26.2



gcc.dg/cdce3.c: Update matched line-number.

2020-07-20 Thread Hans-Peter Nilsson
I missed updating the line-number when adding that dg-skip-if.
Committed as obvious.

* gcc.dg/cdce3.c: Update matched line-number.

diff --git a/gcc/testsuite/gcc.dg/cdce3.c b/gcc/testsuite/gcc.dg/cdce3.c
index 71aea9b..601ddf0 100644
--- a/gcc/testsuite/gcc.dg/cdce3.c
+++ b/gcc/testsuite/gcc.dg/cdce3.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
 /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details 
-fdump-tree-optimized" } */
-/* { dg-final { scan-tree-dump "cdce3.c:10: \[^\n\r]* function call is 
shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "cdce3.c:11: \[^\n\r]* function call is 
shrink-wrapped into error conditions\." "cdce" } } */
 /* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" 
"optimized" } } */
 /* { dg-skip-if "doesn't have a sqrtf insn" { mmix-*-* } } */



gcc.dg/independent-cloneids-1.c: Skip for mmix.

2020-07-20 Thread Hans-Peter Nilsson
Regular ELF label definitions for this test-case, matched by the
regexps, e.g.:
 /* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]0:} 1 } } */
typically look like this:
bar_constprop.0:

For MMIX, they look like this:
bar_constprop::0IS @

I think it's better to just skip the test for MMIX than to uglify the
matching regexps further, since the test is IIUC general enough that
nothing in the target port can reasonably make a difference: it
passes for all targets or fails for all targets.
Committed.
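
For readers who want to see the mismatch concretely, here is a minimal
sketch (not part of the commit) that applies the same pattern to both label
styles.  It assumes std::regex in its default ECMAScript mode as a stand-in
for the Tcl regexps DejaGnu uses, so the Tcl-only "(?n)" prefix is dropped;
the function name is made up for illustration.

```cpp
#include <regex>
#include <string>

// Returns true if `line` looks like a clone label definition accepted by the
// test's pattern.  The pattern is copied from the scan-assembler-times
// directive quoted above, minus the Tcl-only "(?n)" prefix (an assumption
// made so that std::regex can be used as a stand-in).
bool matches_clone_label(const std::string &line)
{
    static const std::regex pattern(R"(^_*bar[.$_]constprop[.$_]0:)");
    return std::regex_search(line, pattern);
}
```

With this, the ELF-style line "bar_constprop.0:" is accepted, while the MMIX
line "bar_constprop::0IS @" is rejected because ':' is not in the [.$_] set.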

gcc/testsuite:
* gcc.dg/independent-cloneids-1.c: Skip for mmix.

--- gcc/gcc/testsuite/gcc.dg/independent-cloneids-1.c.orig  Tue Jul 21 
02:13:46 2020
+++ gcc/gcc/testsuite/gcc.dg/independent-cloneids-1.c   Tue Jul 21 02:15:27 2020
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+/* { dg-skip-if "Odd label definition syntax" { mmix-*-* } } */

 extern int printf (const char *, ...);



Re: [PATCH 7/7 v2] rs6000/testsuite: Vector with length test cases

2020-07-20 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2020/7/21 12:58 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jul 10, 2020 at 06:07:16PM +0800, Kewen.Lin wrote:
>> +/* { dg-do compile { target { powerpc*-*-* } && { lp64 && 
>> powerpc_p9vector_ok } } } */
> 
> Everything in gcc.target/powerpc/ requires powerpc*-*-* automatically
> (is never run on other targets).

Done.

> 
>> +/* { dg-final { scan-assembler-times {\mlxv\M|\mlxvx\M} 20 } } */
> 
> You can write {\mlxvx?\M} if you think that is better.  Each option has
> its own downsides and upsides here ;-)

It looks shorter, done.
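
As a quick sanity check that the two spellings count the same instructions,
here is a small sketch.  It assumes std::regex (ECMAScript mode), where \b
plays roughly the role of Tcl's \m and \M word anchors; the helper name is
hypothetical and the sample assembly in the note below is made up.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts non-overlapping matches of `pattern` in `text`, similar in spirit to
// what scan-assembler-times does with the assembler output.
std::ptrdiff_t count_matches(const std::string &text, const std::string &pattern)
{
    const std::regex re(pattern);
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```

On a sample such as "lxv 0,0(3)\nlxvx 1,3,4\nlxvl 2,3,5\n", both
R"(\blxv\b|\blxvx\b)" and R"(\blxvx?\b)" give 2; the lxvl line is not
counted by either form because of the trailing word boundary.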

> 
>> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-4.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } } */
> 
> Testing for powerpc64*-*-* is always wrong (it doesn't matter what the
> *default* target is: it is usual to run the tests with RUNTESTFLAGS
> {-m32,-m64}, for example).

ah, thanks for the correction!  I think lp64 is already enough to ensure
it's 64-bit on power, powerpc64*-*-* removed.

> 
> Random example from my bash history:
>   make check-gcc-c RUNTESTFLAGS="--target_board=unix'{-m64,-m32}' 
> powerpc.exp=volatile-mem.c"
> but my usual is
>   make -k -j60 check RUNTESTFLAGS="--target_board=unix'{-m64,-m32}'"
> 
> Other than that this looks fine.  Please make sure to test it on an older
> machine as well (you cannot really test on a BE p9, but ideally you would
> do that as well ;-) )

Thanks for the reminder.  I tested it on P7 BE and got some unsupported
cases, as expected.  I also checked v1 on P9 BE (AIX); the result looked fine.

> 
> So, okay for trunk if all patches that are required for these tests have
> been committed.  Thanks!

Thanks!

BR,
Kewen


[PATCH v2] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-20 Thread Jojo
gcc/ChangeLog:

* genemit.c (main): Print 'split line'.
* Makefile.in (insn-emit.c): Define split count and file

---
 gcc/Makefile.in | 10 ++
 gcc/genemit.c   | 86 +++--
 2 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2ba76656dbf..f805050a119 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1253,6 +1253,13 @@ ANALYZER_OBJS = \
 # We put the *-match.o and insn-*.o files first so that a parallel make
 # will build them sooner, because they are large and otherwise tend to be
 # the last objects to finish building.
+
+insn-generated-split-num = 15
+
+insn-emit-split-c = $(foreach o, $(shell for i in 
{1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c)
+insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
+$(insn-emit-split-c): insn-emit.c
+
 OBJS = \
gimple-match.o \
generic-match.o \
@@ -1260,6 +1267,7 @@ OBJS = \
insn-automata.o \
insn-dfatab.o \
insn-emit.o \
+   $(insn-emit-split-obj) \
insn-extract.o \
insn-latencytab.o \
insn-modes.o \
@@ -2367,6 +2375,8 @@ $(simple_generated_c:insn-%.c=s-%): s-%: 
build/gen%$(build_exeext)
$(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
  $(filter insn-conditions.md,$^) > tmp-$*.c
$(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
+   -csplit insn-$*.c /i\ am\ split\ line/ -k -s 
{$(insn-generated-split-num)} -f insn-$* -b "%d.c"
+   -( [ ! -s insn-$*0.c ] && for i in {1..$(insn-generated-split-num)}; do 
touch insn-$*$$i.c; done && echo "" > insn-$*.c)
$(STAMP) s-$*
 
 # gencheck doesn't read the machine description, and the file produced
diff --git a/gcc/genemit.c b/gcc/genemit.c
index 84d07d388ee..fd60cdeeb96 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -847,6 +847,45 @@ handle_overloaded_gen (overloaded_name *oname)
 }
 }
 
+#define printf_include() \
+  printf ("/* Generated automatically by the program `genemit'\n\
+from the machine description file `md'.  */\n\n"); \
+  printf ("#define IN_TARGET_CODE 1\n"); \
+  printf ("#include \"config.h\"\n"); \
+  printf ("#include \"system.h\"\n"); \
+  printf ("#include \"coretypes.h\"\n"); \
+  printf ("#include \"backend.h\"\n"); \
+  printf ("#include \"predict.h\"\n"); \
+  printf ("#include \"tree.h\"\n"); \
+  printf ("#include \"rtl.h\"\n"); \
+  printf ("#include \"alias.h\"\n"); \
+  printf ("#include \"varasm.h\"\n"); \
+  printf ("#include \"stor-layout.h\"\n"); \
+  printf ("#include \"calls.h\"\n"); \
+  printf ("#include \"memmodel.h\"\n"); \
+  printf ("#include \"tm_p.h\"\n"); \
+  printf ("#include \"flags.h\"\n"); \
+  printf ("#include \"insn-config.h\"\n"); \
+  printf ("#include \"expmed.h\"\n"); \
+  printf ("#include \"dojump.h\"\n"); \
+  printf ("#include \"explow.h\"\n"); \
+  printf ("#include \"emit-rtl.h\"\n"); \
+  printf ("#include \"stmt.h\"\n"); \
+  printf ("#include \"expr.h\"\n"); \
+  printf ("#include \"insn-codes.h\"\n"); \
+  printf ("#include \"optabs.h\"\n"); \
+  printf ("#include \"dfp.h\"\n"); \
+  printf ("#include \"output.h\"\n"); \
+  printf ("#include \"recog.h\"\n"); \
+  printf ("#include \"df.h\"\n"); \
+  printf ("#include \"resource.h\"\n"); \
+  printf ("#include \"reload.h\"\n"); \
+  printf ("#include \"diagnostic-core.h\"\n"); \
+  printf ("#include \"regs.h\"\n"); \
+  printf ("#include \"tm-constrs.h\"\n"); \
+  printf ("#include \"ggc.h\"\n"); \
+  printf ("#include \"target.h\"\n\n"); \
+
 int
 main (int argc, const char **argv)
 {
@@ -862,49 +901,19 @@ main (int argc, const char **argv)
   /* Assign sequential codes to all entries in the machine description
  in parallel with the tables in insn-output.c.  */
 
-  printf ("/* Generated automatically by the program `genemit'\n\
-from the machine description file `md'.  */\n\n");
-
-  printf ("#define IN_TARGET_CODE 1\n");
-  printf ("#include \"config.h\"\n");
-  printf ("#include \"system.h\"\n");
-  printf ("#include \"coretypes.h\"\n");
-  printf ("#include \"backend.h\"\n");
-  printf ("#include \"predict.h\"\n");
-  printf ("#include \"tree.h\"\n");
-  printf ("#include \"rtl.h\"\n");
-  printf ("#include \"alias.h\"\n");
-  printf ("#include \"varasm.h\"\n");
-  printf ("#include \"stor-layout.h\"\n");
-  printf ("#include \"calls.h\"\n");
-  printf ("#include \"memmodel.h\"\n");
-  printf ("#include \"tm_p.h\"\n");
-  printf ("#include \"flags.h\"\n");
-  printf ("#include \"insn-config.h\"\n");
-  printf ("#include \"expmed.h\"\n");
-  printf ("#include \"dojump.h\"\n");
-  printf ("#include \"explow.h\"\n");
-  printf ("#include \"emit-rtl.h\"\n");
-  printf ("#include \"stmt.h\"\n");
-  printf ("#include \"expr.h\"\n");
-  printf ("#include \"insn-codes.h\"\n");
-  printf ("#include \"optabs.h\"\n");
-  printf ("#include \"dfp.h\"\n");
-  printf ("#include \"output.h\"\n");
-  printf ("#include \"recog.h\"\n");
-  prin

Re: [PATCH] rs6000: Define movsf_from_si2 to extract high part SF element from DImode[PR89310]

2020-07-20 Thread luoxhu via Gcc-patches

On 2020/7/20 23:31, Segher Boessenkool wrote:

On Mon, Jul 13, 2020 at 02:30:28PM +0800, luoxhu wrote:

For extracting high part element from DImode register like:

{%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}

split it before reload with "and mask" to avoid generating shift right
32 bit then shift left 32 bit.  This pattern also exists in PR42475 and
PR67741, etc.

srdi 3,3,32
sldi 9,3,32
mtvsrd 1,9
xscvspdpn 1,1

=>

rldicr 3,3,0,31
mtvsrd 1,3
xscvspdpn 1,1
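
The equivalence behind this transformation can be sketched in plain C++
(illustration only, not code from the patch; the function names are made
up).  Shifting right by 32 and then left by 32 clears the low 32 bits,
which is exactly what a single AND with a high-word mask does, and that
mask is what "rldicr 3,3,0,31" applies.

```cpp
#include <cstdint>

// What the srdi/sldi pair computes: drop the low 32 bits by shifting them
// out and shifting zeros back in.
constexpr std::uint64_t high_word_by_shifts(std::uint64_t x)
{
    return (x >> 32) << 32;
}

// What the single rldicr computes: keep the high 32 bits with one mask.
constexpr std::uint64_t high_word_by_mask(std::uint64_t x)
{
    return x & 0xFFFFFFFF00000000ULL;
}

static_assert(high_word_by_shifts(0x123456789ABCDEF0ULL)
              == high_word_by_mask(0x123456789ABCDEF0ULL),
              "both forms keep only the high 32 bits");
```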



* config/rs6000/rs6000.md (movsf_from_si2): New
define_insn_and_split.


(That fits on one line).


--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr89310.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */



+/* { dg-final { scan-assembler-not {\msrdi\M} } } */
+/* { dg-final { scan-assembler-not {\msldi\M} } } */
+/* { dg-final { scan-assembler-times {\mrldicr\M} 1 } } */


I'm not sure that works on older cpus?  Please test there, and add
-mdejagnu-cpu=power8 to the dg-options if needed.  Also test on BE please.

Okay for trunk with those last details taken care of.  Thank you!


Thanks for the reminder.  Addressed the comments and committed in r11-2245.

Xionghu


[PATCH v2] Add --ld-path= to specify an arbitrary executable as the linker

2020-07-20 Thread Fangrui Song via Gcc-patches
If the value does not contain any path component separator (e.g. a
slash), the linker will be searched for using COMPILER_PATH followed by
PATH. Otherwise, it is either an absolute path or a path relative to the
current working directory.

--ld-path= complements and overrides -fuse-ld={bfd,gold,lld}. If in the
future we want to make different linker option decisions, we can let
-fuse-ld= represent the linker flavor and --ld-path= the linker path.
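
The lookup rule described above can be sketched as follows.  This is only
an illustration with made-up names; the patch itself decides with
lbasename (), which may also treat other separators as path components on
some hosts, whereas this sketch assumes '/' is the only separator.

```cpp
#include <string>

// How a --ld-path= value would be resolved under the rule described above.
enum class LinkerLookup
{
    SearchPaths, // bare name: search COMPILER_PATH, then PATH
    UseAsGiven   // contains a separator: absolute or relative to the cwd
};

// Assumption: '/' is the only path component separator on the host.
LinkerLookup classify_ld_path(const std::string &value)
{
    return value.find('/') == std::string::npos ? LinkerLookup::SearchPaths
                                                : LinkerLookup::UseAsGiven;
}
```

So a value like "ld.lld" would be searched for, while "/usr/bin/ld.gold" or
"./ld" would be used as written.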

PR driver/93645
* common.opt (--ld-path=): Add --ld-path=
* opts.c (common_handle_option): Handle OPT__ld_path_.
* gcc.c (driver_handle_option): Likewise.
* collect2.c (main): Likewise.
* doc/invoke.texi: Document --ld-path=.

---
Changes in v2:
* Renamed -fld-path= to --ld-path= (clang 12.0.0 new option).
  The option does not affect code generation and is not a language feature,
  -f* is not suitable. Additionally, clang has other similar --*-path=
  options, e.g. --cuda-path=.
---
 gcc/collect2.c  | 63 +++--
 gcc/common.opt  |  4 +++
 gcc/doc/invoke.texi |  9 +++
 gcc/gcc.c   |  2 +-
 gcc/opts.c  |  1 +
 5 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index f8a5ce45994..caa1b96ab52 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -844,6 +844,7 @@ main (int argc, char **argv)
   const char **ld1;
   bool use_plugin = false;
   bool use_collect_ld = false;
+  const char *ld_path = NULL;
 
   /* The kinds of symbols we will have to consider when scanning the
  outcome of a first pass link.  This is ALL to start with, then might
@@ -961,12 +962,21 @@ main (int argc, char **argv)
if (selected_linker == USE_DEFAULT_LD)
  selected_linker = USE_PLUGIN_LD;
  }
-   else if (strcmp (argv[i], "-fuse-ld=bfd") == 0)
- selected_linker = USE_BFD_LD;
-   else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
- selected_linker = USE_GOLD_LD;
-   else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
- selected_linker = USE_LLD_LD;
+   else if (strncmp (argv[i], "-fuse-ld=", 9) == 0
+&& selected_linker != USE_LD_MAX)
+ {
+   if (strcmp (argv[i] + 9, "bfd") == 0)
+ selected_linker = USE_BFD_LD;
+   else if (strcmp (argv[i] + 9, "gold") == 0)
+ selected_linker = USE_GOLD_LD;
+   else if (strcmp (argv[i] + 9, "lld") == 0)
+ selected_linker = USE_LLD_LD;
+ }
+   else if (strncmp (argv[i], "--ld-path=", 10) == 0)
+ {
+   ld_path = argv[i] + 10;
+   selected_linker = USE_LD_MAX;
+ }
else if (strncmp (argv[i], "-o", 2) == 0)
  {
/* Parse the output filename if it's given so that we can make
@@ -1117,14 +1127,34 @@ main (int argc, char **argv)
   ld_file_name = find_a_file (&cpath, collect_ld_suffix, X_OK);
   use_collect_ld = ld_file_name != 0;
 }
-  /* Search the compiler directories for `ld'.  We have protection against
- recursive calls in find_a_file.  */
-  if (ld_file_name == 0)
-ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], X_OK);
-  /* Search the ordinary system bin directories
- for `ld' (if native linking) or `TARGET-ld' (if cross).  */
-  if (ld_file_name == 0)
-ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], 
X_OK);
+  if (selected_linker == USE_LD_MAX)
+{
+  /* If --ld-path= does not contain a path component separator, search for
+the command using cpath, then using path.  Otherwise find the linker
+relative to the current working directory.  */
+  if (lbasename (ld_path) == ld_path)
+   {
+ ld_file_name = find_a_file (&cpath, ld_path, X_OK);
+ if (ld_file_name == 0)
+   ld_file_name = find_a_file (&path, ld_path, X_OK);
+   }
+  else if (file_exists (ld_path))
+   {
+ ld_file_name = ld_path;
+   }
+}
+  else
+{
+  /* Search the compiler directories for `ld'.  We have protection against
+recursive calls in find_a_file.  */
+  if (ld_file_name == 0)
+   ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], X_OK);
+  /* Search the ordinary system bin directories
+for `ld' (if native linking) or `TARGET-ld' (if cross).  */
+  if (ld_file_name == 0)
+   ld_file_name =
+ find_a_file (&path, full_ld_suffixes[selected_linker], X_OK);
+}
 
 #ifdef REAL_NM_FILE_NAME
   nm_file_name = find_a_file (&path, REAL_NM_FILE_NAME, X_OK);
@@ -1461,6 +1491,11 @@ main (int argc, char **argv)
  ld2--;
 #endif
}
+ else if (strncmp (arg, "--ld-path=", 10) == 0)
+   {
+ ld1--;
+ ld2--;
+   }
  else if (strncmp (arg, "--sysroot=", 10) == 0)
target_system_root = arg + 10;
 

Re: [PATCH] Add TARGET_LOWER_LOCAL_DECL_ALIGNMENT [PR95237]

2020-07-20 Thread Sunil Pandey via Gcc-patches
On Mon, Jul 20, 2020 at 5:06 AM Richard Biener
 wrote:
>
> On Sat, Jul 18, 2020 at 7:57 AM Sunil Pandey  wrote:
> >
> > On Fri, Jul 17, 2020 at 1:22 AM Richard Biener
> >  wrote:
> > >
> > > On Fri, Jul 17, 2020 at 7:15 AM Sunil Pandey  wrote:
> > > >
> > > > Any comment on revised patch? At least,  in finish_decl, decl global 
> > > > attributes are populated.
> > >
> > > +static void
> > > +ix86_lower_local_decl_alignment (tree decl)
> > > +{
> > > +  unsigned new_align = LOCAL_DECL_ALIGNMENT (decl);
> > >
> > > please use the macro-expanded call here since we want to amend
> > > ix86_local_alignment to _not_ return a lower alignment when
> > > called as LOCAL_DECL_ALIGNMENT (by adding a new parameter
> > > to ix86_local_alignment).  Can you also amend the patch in this
> > > way?
> > >
> > > +  if (new_align < DECL_ALIGN (decl))
> > > +SET_DECL_ALIGN (decl, new_align);
> > >
> > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > index 81bd2ee94f0..1ae99e30ed1 100644
> > > --- a/gcc/c/c-decl.c
> > > +++ b/gcc/c/c-decl.c
> > > @@ -5601,6 +5601,8 @@ finish_decl (tree decl, location_t init_loc, tree 
> > > init,
> > >  }
> > >
> > >invoke_plugin_callbacks (PLUGIN_FINISH_DECL, decl);
> > > +  /* Lower local decl alignment.  */
> > > +  lower_decl_alignment (decl);
> > >  }
> > >
> > > should come before plugin hook invocation, likewise for the 
> > > cp_finish_decl case.
> > >
> > > +/* Lower DECL alignment.  */
> > > +
> > > +void
> > > +lower_decl_alignment (tree decl)
> > > +{
> > > +  if (VAR_P (decl)
> > > +  && !is_global_var (decl)
> > > +  && !DECL_HARD_REGISTER (decl))
> > > +targetm.lower_local_decl_alignment (decl);
> > > +}
> > >
> > > please avoid this function; its name sounds too generic and it's not worth
> > > adding a public API for two calls.
> > >
> > > Altogether this should avoid the x86 issue, leaving left-overs (your
> > > identified inliner case) as missed optimizations [for the Linux kernel,
> > > which apparently decided that -mpreferred-stack-boundary=2 is a good ABI
> > > to use].
> > >
> > > Richard.
> > >
> > >
> > Revised patch attached.
>
> @@ -16776,7 +16783,7 @@ ix86_data_alignment (tree type, unsigned int
> align, bool opt)
>
>  unsigned int
>  ix86_local_alignment (tree exp, machine_mode mode,
> - unsigned int align)
> + unsigned int align, bool setalign)
>  {
>tree type, decl;
>
> @@ -16801,6 +16808,10 @@ ix86_local_alignment (tree exp, machine_mode mode,
>&& (!decl || !DECL_USER_ALIGN (decl)))
>  align = 32;
>
> +  /* Lower decl alignment.  */
> +  if (setalign && align < DECL_ALIGN (decl))
> +SET_DECL_ALIGN (decl, align);
> +
>/* If TYPE is NULL, we are allocating a stack slot for caller-save
>   register in MODE.  We will return the largest alignment of XF
>   and DF.  */
>
> Sorry for not being clear - the parameter should indicate whether an
> alignment lower than the natural alignment is OK to return, thus something
> like
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 31757b044c8..19703cbceb9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -16641,7 +16641,7 @@ ix86_data_alignment (tree type, unsigned int
> align, bool opt)
>
>  unsigned int
>  ix86_local_alignment (tree exp, machine_mode mode,
> - unsigned int align)
> + unsigned int align, bool may_lower)
>  {
>tree type, decl;
>
> @@ -16658,7 +16658,8 @@ ix86_local_alignment (tree exp, machine_mode mode,
>
>/* Don't do dynamic stack realignment for long long objects with
>   -mpreferred-stack-boundary=2.  */
> -  if (!TARGET_64BIT
> +  if (may_lower
> +  && !TARGET_64BIT
>&& align == 64
>&& ix86_preferred_stack_boundary < 64
>&& (mode == DImode || (type && TYPE_MODE (type) == DImode))
>
> I also believe that spill_slot_alignment () should be able to get the
> lower alignment for long long but not get_stack_local_alignment () (both
> use STACK_SLOT_ALIGNMENT).  Some uses of STACK_SLOT_ALIGNMENT also look
> fishy with respect to mem attributes and alignment.
>
> Otherwise the patch looks reasonable to salvage a misguided optimization for
> a non-standard ABI.  Whether it is sufficient to make the people using that
> ABI happy is of course another question.  I'd rather see them stop using it ...
>
> That said, I'm hesitant to be the only one OKing this ugliness, but I'd
> immediately OK a patch removing the questionable hunk from
> ix86_local_alignment ;)
>
> Jakub, Jeff - any opinion?
>
> Richard.
>

Revised patch attached.

> > > > On Tue, Jul 14, 2020 at 8:37 AM Sunil Pandey  wrote:
> > > >>
> > > >> On Sat, Jul 4, 2020 at 9:11 AM Richard Biener
> > > >>  wrote:
> > > >> >
> > > >> > On July 3, 2020 11:16:46 PM GMT+02:00, Jason Merrill 
> > > >> >  wrote:
> > > >> > >On 6/29/20 5:00 AM, Richard Biener wrote:
> > > >> > >> On Fri, Jun 26, 2020 at 10:11 PM H.J. Lu  
> > > >> > >> w

[PATCH] vect: Support vector with length cost modeling

2020-07-20 Thread Kewen.Lin via Gcc-patches
Hi,

This patch adds cost modeling for vector with length.  It mainly
follows what we generate for vector with length in the functions
vect_set_loop_controls_directly and vect_gen_len in the worst case.

For Power, the length is expected to be in bits 0-7 (the high bits),
so we have to model the cost of shifting bits.  So that other targets
do not pay for this, I used a target hook to describe the extra cost;
I'm not sure whether this is the right way.
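
To make the extra shift concrete, here is a rough sketch of the length
preparation being costed.  It assumes a 64-bit register and IBM bit
numbering, where bits 0-7 are the most significant byte; the function name
is made up and this is not code from the patch.

```cpp
#include <cstdint>

// Assumption: the length operand lives in a 64-bit register and must end up
// in bits 0-7 in IBM numbering, i.e. the most significant byte.  That takes
// one left shift by 56, which is the extra operation the hook accounts for.
constexpr std::uint64_t encode_vector_length(std::uint8_t length_in_bytes)
{
    return static_cast<std::uint64_t>(length_in_bytes) << 56;
}
```

A target that takes the length in the low bits would need no such shift,
which matches the hook's default value of zero.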

Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit
param vect-partial-vector-usage=1.

Any comments/suggestions are highly appreciated!

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.c (TARGET_VECTORIZE_EXTRA_LENGTH_COST): New
macro.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_VECTORIZE_EXTRA_LENGTH_COST): New target hook.
* target.def (extra_length_cost): Likewise.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
modeling for vector with length.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5a4f07d5810..1c5f02796f5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1668,6 +1668,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_DOLOOP_COST_FOR_ADDRESS
 #define TARGET_DOLOOP_COST_FOR_ADDRESS 10
 
+#undef TARGET_VECTORIZE_EXTRA_LENGTH_COST
+#define TARGET_VECTORIZE_EXTRA_LENGTH_COST 1
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 6e7d9dc54a9..ef37b5c1d6d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6106,6 +6106,14 @@ This hook should complete calculations of the cost of 
vectorizing a loop or basi
 This hook should release @var{data} and any related data structures allocated 
by TARGET_VECTORIZE_INIT_COST.  The default releases the accumulator.
 @end deftypefn
 
+@deftypevr {Target Hook} unsigned TARGET_VECTORIZE_EXTRA_LENGTH_COST
+For loop vectorization using length-based partial vectors, some targets
+need extra operations for length preparation, like one shift operation is
+required on Power to make length be encoded in bits 0-7.  This hook is to
+provide a way for this kind of extra cost.
+The default value is zero.
+@end deftypevr
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_GATHER (const_tree 
@var{mem_vectype}, const_tree @var{index_type}, int @var{scale})
 Target builtin that implements vector gather operation.  @var{mem_vectype}
 is the vector type of the load and @var{index_type} is scalar type of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3be984bbd5c..ae9a513f529 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4191,6 +4191,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_VECTORIZE_DESTROY_COST_DATA
 
+@hook TARGET_VECTORIZE_EXTRA_LENGTH_COST
+
 @hook TARGET_VECTORIZE_BUILTIN_GATHER
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
diff --git a/gcc/target.def b/gcc/target.def
index 07059a87caf..3134be7ea7b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2058,6 +2058,16 @@ DEFHOOK
  (void *data),
  default_destroy_cost_data)
 
+/* For target-specific cost on length-based vectorization.  */
+DEFHOOKPOD
+(extra_length_cost,
+ "For loop vectorization using length-based partial vectors, some targets\n\
+need extra operations for length preparation, like one shift operation is\n\
+required on Power to make length be encoded in bits 0-7.  This hook is to\n\
+provide a way for this kind of extra cost.\n\
+The default value is zero.",
+ unsigned, 0)
+
 HOOK_VECTOR_END (vectorize)
 
 #undef HOOK_PREFIX
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e933441b922..294a445afac 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3652,7 +3652,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
  TODO: Build an expression that represents peel_iters for prologue and
  epilogue to be used in a run-time test.  */
 
-  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
 {
   peel_iters_prologue = 0;
   peel_iters_epilogue = 0;
@@ -3663,45 +3663,149 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
  peel_iters_epilogue += 1;
  stmt_info_for_cost *si;
  int j;
- FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-   j, si)
+ FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), j,
+   si)
(void) add_stmt_cost (loop_vinfo, target_cost_data, si->count,
  si->kind, si->stmt_info, si->vectype,
  si->misalign, vect_epilogue);
}
 
-  /* Calculate how many masks we need to generate.  */
-  unsigned int num_masks = 0;
-  rgroup_controls *r

Re: [committed] libstdc++: Add std::from_chars for floating-point types

2020-07-20 Thread Florian Weimer via Gcc-patches
* Jonathan Wakely via Libstdc:

> By replacing the use of strtod we could avoid allocation, avoid changing
> locale, and use optimised code paths specific to each std::chars_format
> case. We would also get more portable behaviour, rather than depending
> on the presence of uselocale, and on any bugs or quirks of the target
> libc's strtod. Replacing strtod is a project for a later date.

glibc already has strtod_l (since glibc 2.1, undocumented, but declared
in ).

What seems to be missing is a function that takes an explicit buffer
length.  A static reference to the C locale object would be helpful as
well, I assume.
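
To illustrate why the missing length parameter matters, here is a rough
sketch of what a caller has to do today with a buffer that is not
NUL-terminated.  The helper name is hypothetical, and it uses plain
std::strtod (which follows the current locale) rather than the
glibc-specific strtod_l.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Parses a double from the first `len` bytes of `buf`.  Because strtod only
// accepts NUL-terminated input, the bytes are copied into a std::string
// first, which is the allocation an explicit-length variant would avoid.
// Returns false if no conversion could be performed.
bool parse_double_n(const char *buf, std::size_t len, double &out)
{
    const std::string copy(buf, len);
    char *end = nullptr;
    const double value = std::strtod(copy.c_str(), &end);
    if (end == copy.c_str())
        return false;
    out = value;
    return true;
}
```

For example, parsing the first two bytes of "12345" yields 12, whereas
calling strtod directly on the buffer would consume all five digits.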

Maybe this is sufficiently clean that we can export this for libstdc++'s
use?  Without repeating the libio mess?

Thanks,
Florian



[committed] testsuite: Add signal checking for signal related testcase in analyzer.

2020-07-20 Thread Kito Cheng
 - Verified on RISC-V and x86.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/signal-1.c: Add dg-require-effective-target
signal.
* gcc.dg/analyzer/signal-2.c: Ditto.
* gcc.dg/analyzer/signal-3.c: Ditto.
* gcc.dg/analyzer/signal-4a.c: Ditto.
* gcc.dg/analyzer/signal-4b.c: Ditto.
* gcc.dg/analyzer/signal-5.c: Ditto.
* gcc.dg/analyzer/signal-6.c: Ditto.
* gcc.dg/analyzer/signal-exit.c: Ditto.
---
 gcc/testsuite/gcc.dg/analyzer/signal-1.c| 1 +
 gcc/testsuite/gcc.dg/analyzer/signal-2.c| 1 +
 gcc/testsuite/gcc.dg/analyzer/signal-3.c| 1 +
 gcc/testsuite/gcc.dg/analyzer/signal-4a.c   | 1 +
 gcc/testsuite/gcc.dg/analyzer/signal-4b.c   | 1 +
 gcc/testsuite/gcc.dg/analyzer/signal-5.c| 1 +
 gcc/testsuite/gcc.dg/analyzer/signal-6.c| 1 +
 gcc/testsuite/gcc.dg/analyzer/signal-exit.c | 1 +
 8 files changed, 8 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-1.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-1.c
index 4dcbcc0fc6bd..43f911ba648b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-1.c
@@ -1,6 +1,7 @@
 /* Example of a bad call within a signal handler.
'handler' calls 'custom_logger' which calls 'fprintf', and 'fprintf' is
not allowed from a signal handler.  */
+/* { dg-require-effective-target signal } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-2.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-2.c
index a56acb060ec8..d047c677c419 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-2.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-2.c
@@ -1,6 +1,7 @@
 /* Example of a bad call within a signal handler.
'handler' calls 'custom_logger' which calls 'fprintf', and 'fprintf' is
not allowed from a signal handler.  */
+/* { dg-require-effective-target signal } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-3.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-3.c
index 5b307771..f5072b52f08b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-3.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-3.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target signal } */
 #include 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-4a.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-4a.c
index 4b68b6d045b9..4ee6f0e7d0e0 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-4a.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-4a.c
@@ -2,6 +2,7 @@
 
 /* { dg-options "-fanalyzer -fdiagnostics-show-line-numbers 
-fdiagnostics-path-format=inline-events -fdiagnostics-show-caret" } */
 /* { dg-enable-nn-line-numbers "" } */
+/* { dg-require-effective-target signal } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-4b.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-4b.c
index 38d402473574..cb1e7e475ae3 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-4b.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-4b.c
@@ -2,6 +2,7 @@
 
 /* { dg-options "-fanalyzer -fdiagnostics-show-line-numbers 
-fdiagnostics-path-format=inline-events -fdiagnostics-show-caret" } */
 /* { dg-enable-nn-line-numbers "" } */
+/* { dg-require-effective-target signal } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-5.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-5.c
index 4e464fffda54..81ac812ebbd3 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-5.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-5.c
@@ -1,4 +1,5 @@
 /* Example of other bad calls within a signal handler.  */
+/* { dg-require-effective-target signal } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-6.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-6.c
index f51845167f5c..ea2290c4296a 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-6.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-6.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target signal } */
 #include 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/signal-exit.c 
b/gcc/testsuite/gcc.dg/analyzer/signal-exit.c
index a567124c7d4d..41a819b838c8 100644
--- a/gcc/testsuite/gcc.dg/analyzer/signal-exit.c
+++ b/gcc/testsuite/gcc.dg/analyzer/signal-exit.c
@@ -1,6 +1,7 @@
 /* Example of a bad call within a signal handler with replacement
alternative.  'handler' calls 'exit', and 'exit' is not allowed
from a signal handler.  But '_exit' is allowed.  */
+/* { dg-require-effective-target signal } */
 
 #include 
 #include 
-- 
2.27.0



Re: [PATCH v2] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-20 Thread Bin.Cheng via Gcc-patches
On Tue, Jul 21, 2020 at 11:14 AM Jojo  wrote:
>
> gcc/ChangeLog:
>
> * genemit.c (main): Print 'split line'.
> * Makefile.in (insn-emit.c): Define split count and file
>

Thanks for working on this; the following comments are based on the
assumption that the approach is feasible after your investigation.

It's great to speed up compilation; do you have any numbers showing
how much this can achieve on a typical machine with reasonable
parallelism?

> ---
>  gcc/Makefile.in | 10 ++
>  gcc/genemit.c   | 86 +++--
>  2 files changed, 58 insertions(+), 38 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 2ba76656dbf..f805050a119 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1253,6 +1253,13 @@ ANALYZER_OBJS = \
>  # We put the *-match.o and insn-*.o files first so that a parallel make
>  # will build them sooner, because they are large and otherwise tend to be
>  # the last objects to finish building.
> +
> +insn-generated-split-num = 15
The hardcoded number 15 looks strange here; how do we know it's enough?
Or, one step further, can we use some kind of general "*" match for
writing the make rules here?
> +
> +insn-emit-split-c = $(foreach o, $(shell for i in 
> {1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c)
> +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
> +$(insn-emit-split-c): insn-emit.c
> +
>  OBJS = \
> gimple-match.o \
> generic-match.o \
> @@ -1260,6 +1267,7 @@ OBJS = \
> insn-automata.o \
> insn-dfatab.o \
> insn-emit.o \
> +   $(insn-emit-split-obj) \
> insn-extract.o \
> insn-latencytab.o \
> insn-modes.o \
> @@ -2367,6 +2375,8 @@ $(simple_generated_c:insn-%.c=s-%): s-%: 
> build/gen%$(build_exeext)
> $(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
>   $(filter insn-conditions.md,$^) > tmp-$*.c
> $(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
> +   -csplit insn-$*.c /i\ am\ split\ line/ -k -s 
> {$(insn-generated-split-num)} -f insn-$* -b "%d.c"
> +   -( [ ! -s insn-$*0.c ] && for i in {1..$(insn-generated-split-num)}; 
> do touch insn-$*$$i.c; done && echo "" > insn-$*.c)
I'm not sure whether this is the first time that csplit/coreutils is
used.  Should we mention it at
https://gcc.gnu.org/install/prerequisites.html, or even check for it in
configure?
> $(STAMP) s-$*
>
>  # gencheck doesn't read the machine description, and the file produced
> diff --git a/gcc/genemit.c b/gcc/genemit.c
> index 84d07d388ee..fd60cdeeb96 100644
> --- a/gcc/genemit.c
> +++ b/gcc/genemit.c
> @@ -847,6 +847,45 @@ handle_overloaded_gen (overloaded_name *oname)
>  }
>  }
>
> +#define printf_include() \
> +  printf ("/* Generated automatically by the program `genemit'\n\
> +from the machine description file `md'.  */\n\n"); \
> +  printf ("#define IN_TARGET_CODE 1\n"); \
> +  printf ("#include \"config.h\"\n"); \
> +  printf ("#include \"system.h\"\n"); \
> +  printf ("#include \"coretypes.h\"\n"); \
> +  printf ("#include \"backend.h\"\n"); \
> +  printf ("#include \"predict.h\"\n"); \
> +  printf ("#include \"tree.h\"\n"); \
> +  printf ("#include \"rtl.h\"\n"); \
> +  printf ("#include \"alias.h\"\n"); \
> +  printf ("#include \"varasm.h\"\n"); \
> +  printf ("#include \"stor-layout.h\"\n"); \
> +  printf ("#include \"calls.h\"\n"); \
> +  printf ("#include \"memmodel.h\"\n"); \
> +  printf ("#include \"tm_p.h\"\n"); \
> +  printf ("#include \"flags.h\"\n"); \
> +  printf ("#include \"insn-config.h\"\n"); \
> +  printf ("#include \"expmed.h\"\n"); \
> +  printf ("#include \"dojump.h\"\n"); \
> +  printf ("#include \"explow.h\"\n"); \
> +  printf ("#include \"emit-rtl.h\"\n"); \
> +  printf ("#include \"stmt.h\"\n"); \
> +  printf ("#include \"expr.h\"\n"); \
> +  printf ("#include \"insn-codes.h\"\n"); \
> +  printf ("#include \"optabs.h\"\n"); \
> +  printf ("#include \"dfp.h\"\n"); \
> +  printf ("#include \"output.h\"\n"); \
> +  printf ("#include \"recog.h\"\n"); \
> +  printf ("#include \"df.h\"\n"); \
> +  printf ("#include \"resource.h\"\n"); \
> +  printf ("#include \"reload.h\"\n"); \
> +  printf ("#include \"diagnostic-core.h\"\n"); \
> +  printf ("#include \"regs.h\"\n"); \
> +  printf ("#include \"tm-constrs.h\"\n"); \
> +  printf ("#include \"ggc.h\"\n"); \
> +  printf ("#include \"target.h\"\n\n"); \

Can you use the do {} while (0) style for defining this code-block macro?
The trailing '\' on the last line is also strange here.
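
For reference, a minimal sketch of the do {} while (0) style I have in
mind.  The names APPEND_HEADER and append_header_if are hypothetical and
not part of the patch, and the sketch appends to a caller-supplied buffer
instead of calling printf so that it is easy to check in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for printf_include (): append two header lines
   to BUF.  The do { } while (0) wrapper makes the macro expand to a
   single statement, so the call site supplies the semicolon and the
   macro nests safely under if/else.  Note that the last line of the
   definition carries no trailing backslash.  */
#define APPEND_HEADER(buf)                              \
  do                                                    \
    {                                                   \
      strcat ((buf), "#define IN_TARGET_CODE 1\n");     \
      strcat ((buf), "#include \"config.h\"\n");        \
    }                                                   \
  while (0)

/* Append the header to BUF only when ENABLED is nonzero.  BUF must be a
   NUL-terminated string with room for the appended text.  Returns 1 if
   the header was appended, 0 otherwise.  With a bare multi-statement
   macro the 'else' below would not compile, because the 'if' would end
   after the macro's first statement.  */
int
append_header_if (int enabled, char *buf)
{
  assert (buf != NULL);
  if (enabled)
    APPEND_HEADER (buf);
  else
    return 0;
  return 1;
}
```

A caller would pass a zero-initialized buffer, e.g. char buf[128] = "",
and read the generated text back from it afterwards.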
> +
>  int
>  main (int argc, const char **argv)
>  {
> @@ -862,49 +901,19 @@ main (int argc, const char **argv)
>/* Assign sequential codes to all entries in the machine description
>   in parallel with the tables in insn-output.c.  */
>
> -  printf ("/* Generated automatically by the program `genemit'\n\
> -from the machine description file `md'.  */\n\n");
> -
> -  printf ("#define IN_TARGET_CODE 1\n");
> -  printf ("#include \"config.h\"\n");
> -  pri