RE: [PATCH][GCC][ARM][AArch64] Testsuite framework changes and execution tests [Patch (8/8)]

2017-10-26 Thread Tamar Christina
Hi James,

> > b3e4a2d7f5b0 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -1684,6 +1684,17 @@ ARM target supports executing instructions from
> > ARMv8.2 with the FP16  extension.  Some multilibs may be incompatible
> with these options.
> >  Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
> >
> > +@item arm_v8_2a_dotprod_neon_ok
> > +@anchor{arm_v8_2a_dotprod_neon_ok}
> > +ARM target supports options to generate instructions from ARMv8.2
> > +with
> 
> Armv8.2-A?

Nothing else in this documentation refers to the architectures as -a; the only
usages I see are ARMv8.2 and ARMv8.1.  I'm happy to change it, but wanted to
point out that this is not how the rest of the documentation is written.

> 
> > +the Dot Product extension. Some multilibs may be incompatible with
> > +these options.
> > +
> > +@item arm_v8_2a_dotprod_neon_hw
> > +ARM target supports executing instructions from ARMv8.2 with the Dot
> 
> Likewise.
> 
> > +Product extension. Some multilibs may be incompatible with these
> options.
> > +Implies arm_v8_2a_dotprod_neon_ok.
> > +
> >  @item arm_prefer_ldrd_strd
> >  ARM target prefers @code{LDRD} and @code{STRD} instructions over
> > @code{LDM} and @code{STM} instructions.
> > @@ -2290,6 +2301,11 @@ supported by the target; see the
> > @ref{arm_v8_2a_fp16_neon_ok,,arm_v8_2a_fp16_neon_ok} effective
> target
> > keyword.
> >
> > +@item arm_v8_2a_dotprod_neon
> > +Add options for ARMv8.2 with Adv.SIMD Dot Product support, if this is
> > +supported by the target; see the @ref{arm_v8_2a_dotprod_neon_ok}
> > +effective target keyword.
> > +
> 
> Likewise.
> 
> >  @item bind_pic_locally
> >  Add the target-specific flags needed to enable functions to bind
> > locally when using pic/PIC passes in the testsuite.
> 
> > diff --git a/gcc/testsuite/lib/target-supports.exp
> > b/gcc/testsuite/lib/target-supports.exp
> > index
> >
> 57f646ce2df5bcd5619870403242e73f6e91ff77..2877f08393ac0de1ff3b3258a56
> d
> > ff1ab1852413 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -4311,6 +4311,48 @@ proc
> check_effective_target_arm_v8_2a_fp16_neon_ok { } {
> >
>   check_effective_target_arm_v8_2a_fp16_neon_ok_nocache]
> >  }
> >
> >  # Return 1 if the target supports executing ARMv8 NEON instructions, 0
> >  # otherwise.
> >
> > @@ -4448,6 +4490,42 @@ proc
> check_effective_target_arm_v8_2a_fp16_neon_hw { } {
> >  } [add_options_for_arm_v8_2a_fp16_neon ""]]  }
> >
> > +# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.2
> > +# with the Dot Product extension, 0 otherwise.  The test is valid for ARM and for
> > +# AArch64.
> > +
> > +proc check_effective_target_arm_v8_2a_dotprod_neon_hw { } {
> > +if { ![check_effective_target_arm_v8_2a_dotprod_neon_ok] } {
> > +return 0;
> > +}
> > +return [check_runtime arm_v8_2a_dotprod_neon_hw_available {
> > +#include "arm_neon.h"
> > +int
> > +main (void)
> > +{
> > +
> > + uint32x2_t results = {0,0};
> > + uint8x8_t a = {1,1,1,1,2,2,2,2};
> > + uint8x8_t b = {2,2,2,2,3,3,3,3};
> > +
> > +  #ifdef __ARM_ARCH_ISA_A64
> > +  asm ("udot %0.2s, %1.8b, %2.8b"
> > +   : "=w"(results)
> > +   : "w"(a), "w"(b)
> > +   : /* No clobbers.  */);
> > +
> > + #elif __ARM_ARCH >= 8
> 
> I don't think this does anything, should it just be else?
> 
> > +  asm ("vudot.u8 %P0, %P1, %P2"
> > +   : "=w"(results)
> > +   : "w"(a), "w"(b)
> > +   : /* No clobbers.  */);
> > +  #endif
> > +
> > +  return (results[0] == 8 && results[1] == 24) ? 1 : 0;
> > +}
> > +} [add_options_for_arm_v8_2a_dotprod_neon ""]] }
> > +
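
For reference, a scalar model of what that runtime check expects (a hypothetical
standalone helper, not part of the patch): each 32-bit result lane accumulates
four byte-wise products, so lane 0 is 1*2 summed four times = 8 and lane 1 is
2*3 summed four times = 24.

#include <stdio.h>

int
main (void)
{
  unsigned char a[8] = {1, 1, 1, 1, 2, 2, 2, 2};
  unsigned char b[8] = {2, 2, 2, 2, 3, 3, 3, 3};
  unsigned int results[2] = {0, 0};

  /* Each 32-bit lane of the dot product accumulates four byte products.  */
  for (int lane = 0; lane < 2; lane++)
    for (int i = 0; i < 4; i++)
      results[lane] += a[lane * 4 + i] * b[lane * 4 + i];

  printf ("%u %u\n", results[0], results[1]);  /* prints "8 24" */
  return 0;
}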



Re: [PATCH] Document --coverage and fork-like functions (PR gcov-profile/82457).

2017-10-26 Thread Martin Liška
On 10/20/2017 06:03 AM, Sandra Loosemore wrote:
> On 10/19/2017 12:26 PM, Eric Gallager wrote:
>> On 10/19/17, Martin Liška  wrote:
>>> Hi.
>>>
>>> As discussed in the PR, we should be more precise in our documentation.
>>> The patch does that.
>>>
>>> Ready for trunk?
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-10-19  Martin Liska  
>>>
>>> PR gcov-profile/82457
>>> * doc/invoke.texi: Document that one needs a non-strict ISO mode
>>> for fork-like functions to be properly instrumented.
>>> ---
>>>   gcc/doc/invoke.texi | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>>
>>>
>>
>> The wording is kinda unclear because the modes in the parentheses are
>> all strict ISO modes, but the part before the parentheses says
>> NON-strict... I think you either need an additional "not" inside the
>> parentheses, or to change all the instances of -std=c* to -std=gnu*.
> 
> The wording in the patch doesn't make sense to me, either.  If I understand 
> the issue correctly, the intent is probably to say something like
> 
> Unless a strict ISO C dialect option is in effect,
> @code{fork} calls are detected and correctly handled without double counting.
> 
> ??

Hi Sandra.

Thank you for the feedback; I'm sending the version you suggested.  Is it fine
to install the patch?

Thanks,
Martin

> 
> -Sandra
> 
> 

From 4395e7172b8af4628f3c294e16294d5136a5e51a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 19 Oct 2017 12:18:45 +0200
Subject: [PATCH] Document --coverage and fork-like functions (PR
 gcov-profile/82457).

gcc/ChangeLog:

2017-10-19  Martin Liska  

	PR gcov-profile/82457
	* doc/invoke.texi: Document that one needs a non-strict ISO mode
	for fork-like functions to be properly instrumented.
---
 gcc/doc/invoke.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71b2445f70f..f5dbf866adc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10864,7 +10864,8 @@ information.  This may be repeated any number of times.  You can run
 concurrent instances of your program, and provided that the file system
 supports locking, the data files will be correctly updated.  Also
 @code{fork} calls are detected and correctly handled (double counting
-will not happen).
+will not happen). Unless a strict ISO C dialect option is in effect,
+@code{fork} calls are detected and correctly handled without double counting.
 
 @item
 For profile-directed optimizations, compile the source files again with
-- 
2.14.2
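
For context, a minimal sketch of the behaviour being documented (the example is
illustrative and not part of the patch): with --coverage, GCC recognises fork
as a built-in and routes it through its profiling wrapper so counts are not
doubled, but under a strict ISO mode such as -std=c99 the call is not treated
as the known built-in and is therefore not instrumented; a GNU dialect such as
-std=gnu99 is needed.

/* fork-test.c: build with "gcc --coverage -std=gnu99 fork-test.c".  */
#include <unistd.h>
#include <sys/wait.h>

int
main (void)
{
  if (fork () == 0)
    _exit (0);   /* child: its counters are handled by the fork wrapper */
  wait (NULL);   /* parent */
  return 0;
}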



Re: Add scatter/gather costs

2017-10-26 Thread Jan Hubicka
> Hi Honza, 
> 
> > +  /* VGATHERDPD is 23 uops and throughput is 9, VGATHERDPD is 35 uops,
> > + throughput 12.  Approx 9 uops do not depend on vector size and every
> > load
> > + is 7 uops.  */
> > +  18, 8,   /* Gather load static, per_elt.  */
> > +  18, 10,  /* Gather store static, per_elt.  */
> 
> Can you please help on how you arrived at 18 for the load/store static cost 
> (based on throughput)?
> Per_elt is 8  i.e. (latency of load ) 4 * 2 (reg-reg move ) ?

From the number of uops it seemed that gather is roughly 9+7*n where n is the
number of entries.  A reg-reg move is 2, so 18 is 9*2.  I think we need to
account for the fact that the CPU is indeed doing n independent load operations
(so it does not save anything compared to scalar code) and a bit more.  Load
cost is set to 6 (perhaps it should be 8 for integer and more for FP?), so I
went for 8 to make it a bit more expensive.
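
For illustration, the shape of the proposed cost model as a standalone sketch
(the helper name is hypothetical; these are not the GCC cost hooks themselves):
an n-element gather is charged a fixed setup cost plus a per-element cost,
which is what the "static, per_elt" pairs above encode.

#include <stdio.h>

static int
gather_cost (int static_cost, int per_elt_cost, int nelts)
{
  return static_cost + per_elt_cost * nelts;
}

int
main (void)
{
  /* Proposed skylake-like gather load values: static 18, per element 8,
     so a 4-element gather is costed at 18 + 8 * 4 = 50.  */
  printf ("%d\n", gather_cost (18, 8, 4));
  return 0;
}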

I plan to experiment with the values incrementally so any suggestions are 
welcome.
Honza
>  
> 
> >32,  /* size of l1 cache.  */
> >512, /* size of l2 cache.  */
> >64,  /* size of prefetch block.  */
> > @@ -1539,6 +1574,8 @@ const struct processor_costs btver1_cost
> >in 32,64,128,256 and 512-bit */
> >{10, 10, 12, 24, 48},/* cost of unaligned stores.  */
> >14, 14,  /* SSE->integer and integer->SSE
> > moves */
> > +  10, 10,  /* Gather load static, per_elt.  */
> > +  10, 10,  /* Gather store static, per_elt.  */
> >32,  /* size of l1 cache.  */
> >512, /* size of l2 cache.  */
> >64,  /* size of prefetch block */
> > @@ -1624,6 +1661,8 @@ const struct processor_costs btver2_cost
> >in 32,64,128,256 and 512-bit */
> >{10, 10, 12, 24, 48},/* cost of unaligned stores.  */
> >14, 14,  /* SSE->integer and integer->SSE
> > moves */
> > +  10, 10,  /* Gather load static, per_elt.  */
> > +  10, 10,  /* Gather store static, per_elt.  */
> >32,  /* size of l1 cache.  */
> >2048,/* size of l2 cache.  */
> >64,  /* size of prefetch block */
> > @@ -1708,6 +1747,8 @@ struct processor_costs pentium4_cost = {
> >in 32,64,128,256 and 512-bit */
> >{32, 32, 32, 64, 128},   /* cost of unaligned stores.  */
> >20, 12,  /* SSE->integer and integer->SSE
> > moves */
> > +  16, 16,  /* Gather load static, per_elt.  */
> > +  16, 16,  /* Gather store static, per_elt.  */
> >8,   /* size of l1 cache.  */
> >256, /* size of l2 cache.  */
> >64,  /* size of prefetch block */
> > @@ -1795,6 +1836,8 @@ struct processor_costs nocona_cost = {
> >in 32,64,128,256 and 512-bit */
> >{24, 24, 24, 48, 96},/* cost of unaligned stores.  */
> >20, 12,  /* SSE->integer and integer->SSE
> > moves */
> > +  12, 12,  /* Gather load static, per_elt.  */
> > +  12, 12,  /* Gather store static, per_elt.  */
> >8,   /* size of l1 cache.  */
> >1024,/* size of l2 cache.  */
> >64,  /* size of prefetch block */
> > @@ -1880,6 +1923,8 @@ struct processor_costs atom_cost = {
> >in 32,64,128,256 and 512-bit */
> >{16, 16, 16, 32, 64},/* cost of unaligned stores.  */
> >8, 6,/* SSE->integer and integer->SSE
> > moves */
> > +  8, 8,/* Gather load static, per_elt. 
> >  */
> > +  8, 8,/* Gather store static, 
> > per_elt.  */
> >32,  /* size of l1 cache.  */
> >256, /* size of l2 cache.  */
> >64,  /* size of prefetch block */
> > @@ -1965,6 +2010,8 @@ struct processor_costs slm_cost = {
> >in 32,64,128,256 and 512-bit */
> >{16, 16, 16, 32, 64},/* cost of unaligned stores.  */
> >8, 6,/* SSE->integer and integer-

Re: [PATCH][GCC][ARM][AArch64] Testsuite framework changes and execution tests [Patch (8/8)]

2017-10-26 Thread James Greenhalgh
On Thu, Oct 26, 2017 at 08:10:28AM +0100, Tamar Christina wrote:
> Hi James,
> 
> > > b3e4a2d7f5b0 100644
> > > --- a/gcc/doc/sourcebuild.texi
> > > +++ b/gcc/doc/sourcebuild.texi
> > > @@ -1684,6 +1684,17 @@ ARM target supports executing instructions from
> > > ARMv8.2 with the FP16  extension.  Some multilibs may be incompatible
> > with these options.
> > >  Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
> > >
> > > +@item arm_v8_2a_dotprod_neon_ok
> > > +@anchor{arm_v8_2a_dotprod_neon_ok}
> > > +ARM target supports options to generate instructions from ARMv8.2
> > > +with
> > 
> > Armv8.2-A?
> 
> Nothing else in this documentation refers to the architectures as -a; the
> only usages I see are ARMv8.2 and ARMv8.1.  I'm happy to change it, but wanted
> to point out that this is not how the rest of the documentation is written.

OK, if it fits the current style I don't mind whether you make this
change or drop it.

It would be nice to update the rest of the documentation to be accurate,
but that doesn't need to happen for this patch to be OK by me.

James



[PATCH 2/7] GCOV: introduce usage of terminal colors.

2017-10-26 Thread marxin
I consider using colors in the context of gcov very useful.  Here's an example
for tramp3d:
https://pste.eu/p/Tl2D.html
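
To illustrate what the new color-macros.h provides, here is a small standalone
demo (the macro definitions are copied from the patch; the demo program itself
is only an illustration): the SGR_* macros compose an ANSI escape sequence
around a colour specification.

#include <stdio.h>

#define COLOR_SEPARATOR ";"
#define COLOR_BOLD      "01"
#define COLOR_FG_RED    "31"
#define SGR_START       "\33["
#define SGR_END         "m\33[K"
#define SGR_SEQ(str)    SGR_START str SGR_END
#define SGR_RESET       SGR_SEQ ("")

int
main (void)
{
  /* Prints "error" in bold red on an ANSI-capable terminal.  */
  printf (SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_RED) "error"
          SGR_RESET "\n");
  return 0;
}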

gcc/ChangeLog:

2017-10-23  Martin Liska  

* color-macros.h: New file.
* diagnostic-color.c: Factor out color-related macros to
color-macros.h.
* doc/gcov.texi: Document -k option.
* gcov.c (INCLUDE_STRING): Include string.h.
(print_usage): Add -k option.
(process_args): Parse it.
(pad_count_string): New function.
(output_line_beginning): Likewise.
(DEFAULT_LINE_START): New macro.
(output_lines): Support color output.
---
 gcc/color-macros.h |  50 +++
 gcc/diagnostic-color.c |  27 +
 gcc/doc/gcov.texi  |   9 +
 gcc/gcov.c | 105 +
 4 files changed, 148 insertions(+), 43 deletions(-)
 create mode 100644 gcc/color-macros.h

diff --git a/gcc/color-macros.h b/gcc/color-macros.h
new file mode 100644
index 000..b23cae49df7
--- /dev/null
+++ b/gcc/color-macros.h
@@ -0,0 +1,50 @@
+/* Terminal color manipulation macros.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef GCC_COLOR_MACROS_H
+#define GCC_COLOR_MACROS_H
+
+#define COLOR_SEPARATOR";"
+#define COLOR_NONE "00"
+#define COLOR_BOLD "01"
+#define COLOR_UNDERSCORE   "04"
+#define COLOR_BLINK"05"
+#define COLOR_REVERSE  "07"
+#define COLOR_FG_BLACK "30"
+#define COLOR_FG_RED   "31"
+#define COLOR_FG_GREEN "32"
+#define COLOR_FG_YELLOW"33"
+#define COLOR_FG_BLUE  "34"
+#define COLOR_FG_MAGENTA   "35"
+#define COLOR_FG_CYAN  "36"
+#define COLOR_FG_WHITE "37"
+#define COLOR_BG_BLACK "40"
+#define COLOR_BG_RED   "41"
+#define COLOR_BG_GREEN "42"
+#define COLOR_BG_YELLOW"43"
+#define COLOR_BG_BLUE  "44"
+#define COLOR_BG_MAGENTA   "45"
+#define COLOR_BG_CYAN  "46"
+#define COLOR_BG_WHITE "47"
+#define SGR_START  "\33["
+#define SGR_END"m\33[K"
+#define SGR_SEQ(str)   SGR_START str SGR_END
+#define SGR_RESET  SGR_SEQ("")
+
+#endif  /* GCC_COLOR_MACROS_H */
diff --git a/gcc/diagnostic-color.c b/gcc/diagnostic-color.c
index b8cf6f2c045..0ee4bb287e0 100644
--- a/gcc/diagnostic-color.c
+++ b/gcc/diagnostic-color.c
@@ -81,32 +81,7 @@
   It would be impractical for GCC to become a full-fledged
   terminal program linked against ncurses or the like, so it will
   not detect terminfo(5) capabilities.  */
-#define COLOR_SEPARATOR";"
-#define COLOR_NONE "00"
-#define COLOR_BOLD "01"
-#define COLOR_UNDERSCORE   "04"
-#define COLOR_BLINK"05"
-#define COLOR_REVERSE  "07"
-#define COLOR_FG_BLACK "30"
-#define COLOR_FG_RED   "31"
-#define COLOR_FG_GREEN "32"
-#define COLOR_FG_YELLOW"33"
-#define COLOR_FG_BLUE  "34"
-#define COLOR_FG_MAGENTA   "35"
-#define COLOR_FG_CYAN  "36"
-#define COLOR_FG_WHITE "37"
-#define COLOR_BG_BLACK "40"
-#define COLOR_BG_RED   "41"
-#define COLOR_BG_GREEN "42"
-#define COLOR_BG_YELLOW"43"
-#define COLOR_BG_BLUE  "44"
-#define COLOR_BG_MAGENTA   "45"
-#define COLOR_BG_CYAN  "46"
-#define COLOR_BG_WHITE "47"
-#define SGR_START  "\33["
-#define SGR_END"m\33[K"
-#define SGR_SEQ(str)   SGR_START str SGR_END
-#define SGR_RESET  SGR_SEQ("")
+#include "color-macros.h"
 
 
 /* The context and logic for choosing default --color screen attributes
diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index c527b89f13b..2aa7166e35d 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -125,6 +125,7 @@ gcov [@option{-v}|@option{--version}] 
[@option{-h}|@option{--help}]
  [@option{-d}|@option{--display-progress}]
  [@option{-f}|@option{--function-summaries}]
  [@option{-i}|@option{--intermediate-format}]
+ [@option{-k}|@option{--use-colors}]
  [@option{-l}|@option{--long-file-names}]
  [@option{-m}|@option{--demangled-names}]
  [@option{-n}|@option{--no-output}]
@@ -2

[PATCH 7/7] GCOV: std::vector refactoring III

2017-10-26 Thread marxin
gcc/ChangeLog:

2017-10-26  Martin Liska  

* gcov.c (struct name_map): Do not use typedef.
Define operator== and operator<.
(name_search): Remove.
(name_sort): Remove.
(main): Do not allocate names.
(process_file): Add vertical space.
(generate_results): Use std::find.
(release_structures): Do not release memory.
(find_source): Use std::find.
---
 gcc/gcov.c | 132 ++---
 1 file changed, 57 insertions(+), 75 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index 7f6268c6460..865deaaafae 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -302,11 +302,37 @@ source_info::source_info (): name (NULL), file_time (), 
lines (),
 {
 }
 
-typedef struct name_map
+struct name_map
 {
-  char *name;  /* Source file name */
+  name_map ()
+  {
+  }
+
+  name_map (char *_name, unsigned _src): name (_name), src (_src)
+  {
+  }
+
+  bool operator== (const name_map &rhs) const
+  {
+#if HAVE_DOS_BASED_FILE_SYSTEM
+return strcasecmp (this->name, rhs.name) == 0;
+#else
+return strcmp (this->name, rhs.name) == 0;
+#endif
+  }
+
+  bool operator< (const name_map &rhs) const
+  {
+#if HAVE_DOS_BASED_FILE_SYSTEM
+return strcasecmp (this->name, rhs.name) < 0;
+#else
+return strcmp (this->name, rhs.name) < 0;
+#endif
+  }
+
+  const char *name;  /* Source file name */
   unsigned src;  /* Source file */
-} name_map_t;
+};
 
 /* Holds a list of function basic block graphs.  */
 
@@ -316,9 +342,8 @@ static function_t **fn_end = &functions;
 /* Vector of source files.  */
 static vector<source_info> sources;
 
-static name_map_t *names;   /* Mapping of file names to sources */
-static unsigned n_names;/* Number of names */
-static unsigned a_names;/* Allocated names */
+/* Mapping of file names to sources */
+static vector<name_map> names;
 
 /* This holds data summary information.  */
 
@@ -447,8 +472,6 @@ static void print_version (void) ATTRIBUTE_NORETURN;
 static void process_file (const char *);
 static void generate_results (const char *);
 static void create_file_names (const char *);
-static int name_search (const void *, const void *);
-static int name_sort (const void *, const void *);
 static char *canonicalize_name (const char *);
 static unsigned find_source (const char *);
 static function_t *read_graph_file (void);
@@ -679,9 +702,6 @@ main (int argc, char **argv)
   /* Handle response files.  */
   expandargv (&argc, &argv);
 
-  a_names = 10;
-  names = XNEWVEC (name_map_t, a_names);
-
   argno = process_args (argc, argv);
   if (optind == argc)
 print_usage (true);
@@ -950,7 +970,7 @@ process_file (const char *file_name)
  unsigned line = fn->line;
  unsigned block_no;
  function_t *probe, **prev;
- 
+
  /* Now insert it into the source file's list of
 functions. Normally functions will be encountered in
 ascending order, so a simple scan is quick.  Note we're
@@ -1047,12 +1067,15 @@ generate_results (const char *file_name)
}
 }
 
+  name_map needle;
+
   if (file_name)
 {
-  name_map_t *name_map = (name_map_t *)bsearch
-   (file_name, names, n_names, sizeof (*names), name_search);
-  if (name_map)
-   file_name = sources[name_map->src].coverage.name;
+  needle.name = file_name;
+  vector<name_map>::iterator it = std::find (names.begin (), names.end (),
+needle);
+  if (it != names.end ())
+   file_name = sources[it->src].coverage.name;
   else
file_name = canonicalize_name (file_name);
 }
@@ -1095,13 +1118,8 @@ generate_results (const char *file_name)
 static void
 release_structures (void)
 {
-  unsigned ix;
   function_t *fn;
 
-  for (ix = n_names; ix--;)
-free (names[ix].name);
-  free (names);
-
   while ((fn = functions))
 {
   functions = fn->next;
@@ -1177,77 +1195,42 @@ create_file_names (const char *file_name)
   return;
 }
 
-/* A is a string and B is a pointer to name_map_t.  Compare for file
-   name orderability.  */
-
-static int
-name_search (const void *a_, const void *b_)
-{
-  const char *a = (const char *)a_;
-  const name_map_t *b = (const name_map_t *)b_;
-
-#if HAVE_DOS_BASED_FILE_SYSTEM
-  return strcasecmp (a, b->name);
-#else
-  return strcmp (a, b->name);
-#endif
-}
-
-/* A and B are a pointer to name_map_t.  Compare for file name
-   orderability.  */
-
-static int
-name_sort (const void *a_, const void *b_)
-{
-  const name_map_t *a = (const name_map_t *)a_;
-  return name_search (a->name, b_);
-}
-
 /* Find or create a source file structure for FILE_NAME. Copies
FILE_NAME on creation */
 
 static unsigned
 find_source (const char *file_name)
 {
-  name_map_t *name_map;
   char *canon;
   unsigned idx;
   struct stat status;
 
   if (!file_name)
 file_name = "";
-  name_map = (name_map_t *)bsearch
-(file_name, names, n_names, sizeof (*names), name_search);
-  if (name_map)
-

[PATCH 4/7] GCOV: add -j argument (human readable format).

2017-10-26 Thread marxin
A human readable format is quite useful in my opinion.  Here's an example:

-:1:unsigned
   14.00K:2:loop (unsigned n, int value)
-:3:{
   21.00M:4:  for (unsigned i = 0; i < n - 1; i++)
-:5:  {
   20.99M:6:value += i;
-:7:  }
-:8:
   14.00K:9:  return value;
-:   10:}
-:   11:
1:   12:int main(int argc)
-:   13:{
1:   14:  unsigned sum = 0;
7.00K:   15:  for (unsigned i = 0; i < 7 * 1000; i++)
-:   16:  {
7.00K:   17:sum += loop (1000, sum);
7.00K:   18:sum += loop (2000, sum);
-:   19:  }
-:   20:
1:   21:  return 0;
-:   22:}

The question is: do we want to do it by default, or is a new option fine?
Note that all external tools using gcov should use the intermediate format,
which is obviously unchanged.
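
For illustration, the kind of formatting -j / --human-numbers produces, as a
standalone sketch (the real implementation is gcov.c's format_count; this
helper only mirrors the idea):

#include <stdio.h>

static const char *
human_count (double v, char *buf, size_t len)
{
  static const char *units[] = { "", "K", "M", "G", "T" };
  unsigned i = 0;

  while (v >= 1000.0 && i < 4)
    {
      v /= 1000.0;
      i++;
    }
  snprintf (buf, len, "%.2f%s", v, units[i]);
  return buf;
}

int
main (void)
{
  char buf[32];
  /* 21000000 executions are printed as "21.00M", matching the sample above.  */
  printf ("%s\n", human_count (21000000.0, buf, sizeof buf));
  return 0;
}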

gcc/ChangeLog:

2017-10-23  Martin Liska  

* doc/gcov.texi: Document new option.
* gcov.c (print_usage): Likewise print it.
(process_args): Support the argument.
(format_count): New function.
(format_gcov): Use the function.

gcc/testsuite/ChangeLog:

2017-10-23  Martin Liska  

* g++.dg/gcov/loop.C: New test.
* lib/gcov.exp: Support human readable format for counts.
---
 gcc/doc/gcov.texi|  5 +
 gcc/gcov.c   | 43 ++--
 gcc/testsuite/g++.dg/gcov/loop.C | 27 +
 gcc/testsuite/lib/gcov.exp   |  2 +-
 4 files changed, 74 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/loop.C

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 4029ccb0a93..9d955827706 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -125,6 +125,7 @@ gcov [@option{-v}|@option{--version}] 
[@option{-h}|@option{--help}]
  [@option{-d}|@option{--display-progress}]
  [@option{-f}|@option{--function-summaries}]
  [@option{-i}|@option{--intermediate-format}]
+ [@option{-j}|@option{--human-numbers}]
  [@option{-k}|@option{--use-colors}]
  [@option{-l}|@option{--long-file-names}]
  [@option{-m}|@option{--demangled-names}]
@@ -186,6 +187,10 @@ be used by @command{lcov} or other tools. The output is a 
single
 The format of the intermediate @file{.gcov} file is plain text with
 one entry per line
 
+@item -j
+@itemx --human-numbers
+Write counts in human readable format (like 24.64K).
+
 @smallexample
 file:@var{source_file_name}
 function:@var{line_number},@var{execution_count},@var{function_name}
diff --git a/gcc/gcov.c b/gcc/gcov.c
index f9334f96eb3..8ba63f002d8 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -393,6 +393,10 @@ static int flag_use_colors = 0;
 
 static int flag_all_blocks = 0;
 
+/* Output human readable numbers.  */
+
+static int flag_human_readable_numbers = 0;
+
 /* Output summary info for each function.  */
 
 static int flag_function_summary = 0;
@@ -710,6 +714,7 @@ print_usage (int error_p)
   fnotice (file, "  -f, --function-summariesOutput summaries for each 
function\n");
   fnotice (file, "  -h, --help  Print this help, then 
exit\n");
   fnotice (file, "  -i, --intermediate-format   Output .gcov file in 
intermediate text format\n");
+  fnotice (file, "  -j, --human-numbers Output human readable 
numbers\n");
   fnotice (file, "  -k, --use-colorsEmit colored output\n");
   fnotice (file, "  -l, --long-file-names   Use long output file names 
for included\n\
 source files\n");
@@ -752,6 +757,7 @@ static const struct option options[] =
   { "branch-probabilities", no_argument,   NULL, 'b' },
   { "branch-counts",no_argument,   NULL, 'c' },
   { "intermediate-format",  no_argument,   NULL, 'i' },
+  { "human-numbers",   no_argument,   NULL, 'j' },
   { "no-output",no_argument,   NULL, 'n' },
   { "long-file-names",  no_argument,   NULL, 'l' },
   { "function-summaries",   no_argument,   NULL, 'f' },
@@ -775,7 +781,7 @@ process_args (int argc, char **argv)
 {
   int opt;
 
-  const char *opts = "abcdfhiklmno:prs:uvwx";
+  const char *opts = "abcdfhijklmno:prs:uvwx";
   while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
 {
   switch (opt)
@@ -798,6 +804,9 @@ process_args (int argc, char **argv)
case 'l':
  flag_long_names = 1;
  break;
+   case 'j':
+ flag_human_readable_numbers = 1;
+ break;
case 'k':
  flag_use_colors = 1;
  break;
@@ -1938,6 +1947,36 @@ add_branch_counts (coverage_t *coverage, const arc_t 
*arc)
 }
 }
 
+/* Format COUNT; if flag_human_readable_numbers is set, return it in human
+   readable format.  */
+
+static char const *
+format_count (gcov_type count)
+{
+  static char buffer[64];
+  float v = count;
+  const char *units[] = {"", "K", "M", "G", "T", "P", "E", "Z"};
+
+

[PATCH 3/7] GCOV: add support for lines with an unexecuted lines.

2017-10-26 Thread marxin
It's possible for a line of code to have non-zero coverage and yet contain
unexecuted blocks, and I hope adding a notification for that can be useful.
LLVM also does this:

-:0:Source:ternary.c
-:0:Graph:ternary.gcno
-:0:Data:ternary.gcda
-:0:Runs:1
-:0:Programs:1
-:1:int b, c, d, e;
-:2:
1:3:int main()
-:4:{
   1*:5:int a = b < 1 ? (c < 3 ? d : c) : e;
1:6:return a;
-:7:}
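
The source behind that listing (reconstructed from the lines shown above) makes
the marker easy to see: b, c, d and e are zero-initialised globals, so only one
arm of each conditional on line 5 actually runs, hence the 1* annotation.

int b, c, d, e;

int main()
{
  int a = b < 1 ? (c < 3 ? d : c) : e;
  return a;
}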

It's also implemented for intermediate format, as well as color output
supports that.

gcc/ChangeLog:

2017-10-23  Martin Liska  

* doc/gcov.texi: Document that.
* gcov.c (add_line_counts): Mark lines with a non-executed
statement.
(output_line_beginning): Handle such lines.
(output_lines): Pass new argument.
(output_intermediate_file): Print it in intermediate format.

gcc/testsuite/ChangeLog:

2017-10-23  Martin Liska  

* g++.dg/gcov/ternary.C: New test.
* g++.dg/gcov/gcov-threads-1.C (main): Update expected line
count.
* lib/gcov.exp: Support new format for intermediate file format.
---
 gcc/doc/gcov.texi  | 13 +---
 gcc/gcov.c | 50 ++
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C |  4 +--
 gcc/testsuite/g++.dg/gcov/ternary.C| 12 +++
 gcc/testsuite/lib/gcov.exp |  2 +-
 5 files changed, 46 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/ternary.C

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 2aa7166e35d..4029ccb0a93 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -189,7 +189,7 @@ one entry per line
 @smallexample
 file:@var{source_file_name}
 function:@var{line_number},@var{execution_count},@var{function_name}
-lcount:@var{line number},@var{execution_count}
+lcount:@var{line number},@var{execution_count},@var{has_unexecuted_statement}
 branch:@var{line_number},@var{branch_coverage_type}
 
 Where the @var{branch_coverage_type} is
@@ -208,11 +208,11 @@ Here is a sample when @option{-i} is used in conjunction 
with @option{-b} option
 file:array.cc
 function:11,1,_Z3sumRKSt6vectorIPiSaIS0_EE
 function:22,1,main
-lcount:11,1
-lcount:12,1
-lcount:14,1
+lcount:11,1,0
+lcount:12,1,0
+lcount:14,1,0
 branch:14,taken
-lcount:26,1
+lcount:26,1,0
 branch:28,nottaken
 @end smallexample
 
@@ -341,6 +341,9 @@ used in a compilation unit.  Such functions are marked with 
@samp{-}
 even though they contain a code.  Use @option{-fkeep-inline-functions} and
 @option{-fkeep-static-functions} in order to properly
 record @var{execution_count} of such functions.
+Executed lines having a statement with zero @var{execution_count} end with
+@samp{*} character and are colored with magenta color with @option{-k}
+option.
 
 Some lines of information at the start have @var{line_number} of zero.
 These preamble lines are of the form
diff --git a/gcc/gcov.c b/gcc/gcov.c
index e53bcf0fd88..f9334f96eb3 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -256,6 +256,7 @@ typedef struct line_info
  Used in all-blocks mode.  */
   unsigned exists : 1;
   unsigned unexceptional : 1;
+  unsigned has_unexecuted_block : 1;
 } line_t;
 
 bool
@@ -850,28 +851,7 @@ process_args (int argc, char **argv)
 /* Output the result in intermediate format used by 'lcov'.
 
 The intermediate format contains a single file named 'foo.cc.gcov',
-with no source code included. A sample output is
-
-file:foo.cc
-function:5,1,_Z3foov
-function:13,1,main
-function:19,1,_GLOBAL__sub_I__Z3foov
-function:19,1,_Z41__static_initialization_and_destruction_0ii
-lcount:5,1
-lcount:7,9
-lcount:9,8
-lcount:11,1
-file:/.../iostream
-lcount:74,1
-file:/.../basic_ios.h
-file:/.../ostream
-file:/.../ios_base.h
-function:157,0,_ZStorSt12_Ios_IostateS_
-lcount:157,0
-file:/.../char_traits.h
-function:258,0,_ZNSt11char_traitsIcE6lengthEPKc
-lcount:258,0
-...
+with no source code included.
 
 The default gcov outputs multiple files: 'foo.cc.gcov',
 'iostream.gcov', 'ios_base.h.gcov', etc. with source code
@@ -901,8 +881,8 @@ output_intermediate_file (FILE *gcov_file, source_t *src)
 {
   arc_t *arc;
   if (line->exists)
-   fprintf (gcov_file, "lcount:%u,%s\n", line_num,
-format_gcov (line->count, 0, -1));
+   fprintf (gcov_file, "lcount:%u,%s,%d\n", line_num,
+format_gcov (line->count, 0, -1), line->has_unexecuted_block);
   if (flag_branches)
for (arc = line->branches; arc; arc = arc->line_next)
   {
@@ -2289,7 +2269,11 @@ add_line_counts (coverage_t *coverage, function_t *fn)
}
  line->exists = 1;
  if (!block->exceptional)
-   line->unexceptional = 1;
+   {
+ line->unexceptional = 1;
+ if (block->count == 0)
+

[PATCH 1/7] GCOV: document behavior of -fkeep-{static,inline}-functions (PR gcov-profile/82633).

2017-10-26 Thread marxin
gcc/ChangeLog:

2017-10-23  Martin Liska  

PR gcov-profile/82633
* doc/gcov.texi: Document -fkeep-{static,inline}-functions and
their interaction with GCOV infrastructure.
* configure.ac: Add -fkeep-{inline,static}-functions to
coverage_flags.
* configure: Regenerate.
---
 gcc/configure | 4 ++--
 gcc/configure.ac  | 4 ++--
 gcc/doc/gcov.texi | 5 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index aa5937df84c..7f9d740e93c 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7321,10 +7321,10 @@ fi
 if test "${enable_coverage+set}" = set; then :
   enableval=$enable_coverage; case "${enableval}" in
   yes|noopt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0 
-fkeep-inline-functions -fkeep-static-functions"
 ;;
   opt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2 
-fkeep-inline-functions -fkeep-static-functions"
 ;;
   no)
 # a.k.a. --disable-coverage
diff --git a/gcc/configure.ac b/gcc/configure.ac
index d905d0d980a..46b4a80b9a1 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -728,10 +728,10 @@ AC_ARG_ENABLE(coverage,
 default is noopt])],
 [case "${enableval}" in
   yes|noopt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0 
-fkeep-inline-functions -fkeep-static-functions"
 ;;
   opt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2 
-fkeep-inline-functions -fkeep-static-functions"
 ;;
   no)
 # a.k.a. --disable-coverage
diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 706aa6cf0b0..c527b89f13b 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -327,6 +327,11 @@ non-exceptional paths or only exceptional paths such as 
C++ exception
 handlers, respectively. Given @samp{-a} option, unexecuted blocks are
 marked @samp{$} or @samp{%}, depending on whether a basic block
 is reachable via non-exceptional or exceptional paths.
+Note that GCC can perform function removal for functions obviously not
+used in a compilation unit.  Such functions are marked with @samp{-}
+even though they contain a code.  Use @option{-fkeep-inline-functions} and
+@option{-fkeep-static-functions} in order to properly
+record @var{execution_count} of such functions.
 
 Some lines of information at the start have @var{line_number} of zero.
 These preamble lines are of the form
-- 
2.14.2




[PATCH 5/7] GCOV: std::vector refactoring.

2017-10-26 Thread marxin
gcc/ChangeLog:

2017-10-26  Martin Liska  

* gcov.c (struct source_info): Remove typedef.
(source_info::source_info): Add proper ctor.
(accumulate_line_counts): Use the struct, not its typedef.
(output_gcov_file): Likewise.
(output_lines): Likewise.
(main): Do not allocate an array.
(output_intermediate_file): Use size of vector container.
(process_file): Resize the vector.
(generate_results): Do not preallocate, use newly added vector
lines.
(release_structures): Do not release sources.
(find_source): Use vector methods.
(add_line_counts): Do not use typedef.
---
 gcc/gcov.c | 89 +++---
 1 file changed, 39 insertions(+), 50 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index 8ba63f002d8..e2d33edb984 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -272,22 +272,29 @@ line_t::has_block (block_t *needle)
 /* Describes a file mentioned in the block graph.  Contains an array
of line info.  */
 
-typedef struct source_info
+struct source_info
 {
+  /* Default constructor.  */
+  source_info ();
+
   /* Canonical name of source file.  */
   char *name;
   time_t file_time;
 
-  /* Array of line information.  */
-  line_t *lines;
-  unsigned num_lines;
+  /* Vector of line information.  */
+  vector<line_t> lines;
 
   coverage_t coverage;
 
   /* Functions in this source file.  These are in ascending line
  number order.  */
   function_t *functions;
-} source_t;
+};
+
+source_info::source_info (): name (NULL), file_time (), lines (),
+  coverage (), functions (NULL)
+{
+}
 
 typedef struct name_map
 {
@@ -300,9 +307,8 @@ typedef struct name_map
 static function_t *functions;
 static function_t **fn_end = &functions;
 
-static source_t *sources;   /* Array of source files  */
-static unsigned n_sources;  /* Number of sources */
-static unsigned a_sources;  /* Allocated sources */
+/* Vector of source files.  */
+static vector<source_info> sources;
 
 static name_map_t *names;   /* Mapping of file names to sources */
 static unsigned n_names;/* Number of names */
@@ -448,10 +454,10 @@ static void add_line_counts (coverage_t *, function_t *);
 static void executed_summary (unsigned, unsigned);
 static void function_summary (const coverage_t *, const char *);
 static const char *format_gcov (gcov_type, gcov_type, int);
-static void accumulate_line_counts (source_t *);
-static void output_gcov_file (const char *, source_t *);
+static void accumulate_line_counts (source_info *);
+static void output_gcov_file (const char *, source_info *);
 static int output_branch_count (FILE *, int, const arc_t *);
-static void output_lines (FILE *, const source_t *);
+static void output_lines (FILE *, const source_info *);
 static char *make_gcov_file_name (const char *, const char *);
 static char *mangle_name (const char *, char *);
 static void release_structures (void);
@@ -668,8 +674,6 @@ main (int argc, char **argv)
 
   a_names = 10;
   names = XNEWVEC (name_map_t, a_names);
-  a_sources = 10;
-  sources = XNEWVEC (source_t, a_sources);
 
   argno = process_args (argc, argv);
   if (optind == argc)
@@ -868,7 +872,7 @@ included. Instead the intermediate format here outputs only 
a single
 file 'foo.cc.gcov' similar to the above example. */
 
 static void
-output_intermediate_file (FILE *gcov_file, source_t *src)
+output_intermediate_file (FILE *gcov_file, source_info *src)
 {
   unsigned line_num;/* current line number.  */
   const line_t *line;   /* current line info ptr.  */
@@ -885,7 +889,7 @@ output_intermediate_file (FILE *gcov_file, source_t *src)
 }
 
   for (line_num = 1, line = &src->lines[line_num];
-   line_num < src->num_lines;
+   line_num < src->lines.size ();
line_num++, line++)
 {
   arc_t *arc;
@@ -967,8 +971,8 @@ process_file (const char *file_name)
{
  unsigned last_line
= block->locations[i].lines.back () + 1;
- if (last_line > sources[s].num_lines)
-   sources[s].num_lines = last_line;
+ if (last_line > sources[s].lines.size ())
+   sources[s].lines.resize (last_line);
}
}
}
@@ -987,7 +991,7 @@ process_file (const char *file_name)
 }
 
 static void
-output_gcov_file (const char *file_name, source_t *src)
+output_gcov_file (const char *file_name, source_info *src)
 {
   char *gcov_file_name = make_gcov_file_name (file_name, src->coverage.name);
 
@@ -1020,14 +1024,8 @@ output_gcov_file (const char *file_name, source_t *src)
 static void
 generate_results (const char *file_name)
 {
-  unsigned ix;
-  source_t *src;
   function_t *fn;
 
-  for (ix = n_sources, src = sources; ix--; src++)
-if (src->num_lines)
-  src->lines = XCNEWVEC (line_t, src->num_lines);
-
   for (fn = functions; fn; fn = fn->next)
 {
   coverage_t covera

[PATCH 0/7] GCOV: another set of improvements

2017-10-26 Thread marxin
Hi.

As I've recently spent some time in gcov, I decided to enhance the tool a bit.
My main target is PR48463, which will remove the assumption that multiple
functions can't start on the same line.  I've got a pending patch that will be
sent soon.

Thanks for review,
Martin

marxin (7):
  GCOV: document behavior of -fkeep-{static,inline}-functions (PR
gcov-profile/82633).
  GCOV: introduce usage of terminal colors.
  GCOV: add support for lines with an unexecuted lines.
  GCOV: add -j argument (human readable format).
  GCOV: std::vector refactoring.
  GCOV: Vector refactoring II
  GCOV: std::vector refactoring III

 gcc/color-macros.h |  50 +++
 gcc/configure  |   4 +-
 gcc/configure.ac   |   4 +-
 gcc/diagnostic-color.c |  27 +-
 gcc/doc/gcov.texi  |  32 +-
 gcc/gcov.c | 552 -
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C |   4 +-
 gcc/testsuite/g++.dg/gcov/loop.C   |  27 ++
 gcc/testsuite/g++.dg/gcov/ternary.C|  12 +
 gcc/testsuite/lib/gcov.exp |   4 +-
 10 files changed, 438 insertions(+), 278 deletions(-)
 create mode 100644 gcc/color-macros.h
 create mode 100644 gcc/testsuite/g++.dg/gcov/loop.C
 create mode 100644 gcc/testsuite/g++.dg/gcov/ternary.C

-- 
2.14.2



[PATCH 6/7] GCOV: Vector refactoring II

2017-10-26 Thread marxin
gcc/ChangeLog:

2017-10-26  Martin Liska  

* gcov.c (struct line_info): Remove its typedef.
(line_info::line_info): Add proper ctor.
(line_info::has_block): Do not use a typedef.
(struct source_info): Do not use typedef.
(circuit): Likewise.
(get_cycles_count): Likewise.
(output_intermediate_file): Iterate via vector iterator.
(add_line_counts): Use std::vector methods.
(accumulate_line_counts): Likewise.
(output_lines): Likewise.
---
 gcc/gcov.c | 149 ++---
 1 file changed, 73 insertions(+), 76 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index e2d33edb984..7f6268c6460 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -108,9 +108,6 @@ typedef struct arc_info
   /* Loop making arc.  */
   unsigned int cycle : 1;
 
-  /* Next branch on line.  */
-  struct arc_info *line_next;
-
   /* Links to next arc on src and dst lists.  */
   struct arc_info *succ_next;
   struct arc_info *pred_next;
@@ -245,28 +242,37 @@ typedef struct coverage_info
 /* Describes a single line of source. Contains a chain of basic blocks
with code on it.  */
 
-typedef struct line_info
+struct line_info
 {
+  /* Default constructor.  */
+  line_info ();
+
   /* Return true when NEEDLE is one of basic blocks the line belongs to.  */
   bool has_block (block_t *needle);
 
-  gcov_type count;/* execution count */
-  arc_t *branches;/* branches from blocks that end on this line.  */
-  block_t *blocks;/* blocks which start on this line.
- Used in all-blocks mode.  */
+  /* Execution count.  */
+  gcov_type count;
+
+  /* Branches from blocks that end on this line.  */
+  vector<arc_t *> branches;
+
+  /* blocks which start on this line.  Used in all-blocks mode.  */
+  vector<block_t *> blocks;
+
   unsigned exists : 1;
   unsigned unexceptional : 1;
   unsigned has_unexecuted_block : 1;
-} line_t;
+};
 
-bool
-line_t::has_block (block_t *needle)
+line_info::line_info (): count (0), branches (), blocks (), exists (false),
+  unexceptional (0), has_unexecuted_block (0)
 {
-  for (block_t *n = blocks; n; n = n->chain)
-if (n == needle)
-  return true;
+}
 
-  return false;
+bool
+line_info::has_block (block_t *needle)
+{
+  return std::find (blocks.begin (), blocks.end (), needle) != blocks.end ();
 }
 
 /* Describes a file mentioned in the block graph.  Contains an array
@@ -282,7 +288,7 @@ struct source_info
   time_t file_time;
 
   /* Vector of line information.  */
-  vector<line_t> lines;
+  vector<line_info> lines;
 
   coverage_t coverage;
 
@@ -573,7 +579,7 @@ unblock (const block_t *u, block_vector_t &blocked,
 static loop_type
 circuit (block_t *v, arc_vector_t &path, block_t *start,
block_vector_t &blocked, vector<block_vector_t> &block_lists,
-line_t &linfo, int64_t &count)
+line_info &linfo, int64_t &count)
 {
   loop_type result = NO_LOOP;
 
@@ -622,7 +628,7 @@ circuit (block_t *v, arc_vector_t &path, block_t *start,
contains a negative loop, then perform the same function once again.  */
 
 static gcov_type
-get_cycles_count (line_t &linfo, bool handle_negative_cycles = true)
+get_cycles_count (line_info &linfo, bool handle_negative_cycles = true)
 {
   /* Note that this algorithm works even if blocks aren't in sorted order.
  Each iteration of the circuit detection is completely independent
@@ -632,12 +638,13 @@ get_cycles_count (line_t &linfo, bool 
handle_negative_cycles = true)
 
   loop_type result = NO_LOOP;
   gcov_type count = 0;
-  for (block_t *block = linfo.blocks; block; block = block->chain)
+  for (vector<block_t *>::iterator it = linfo.blocks.begin ();
+   it != linfo.blocks.end (); it++)
 {
   arc_vector_t path;
   block_vector_t blocked;
  vector<block_vector_t> block_lists;
-  result |= circuit (block, path, block, blocked, block_lists, linfo,
+  result |= circuit (*it, path, *it, blocked, block_lists, linfo,
 count);
 }
 
@@ -875,7 +882,7 @@ static void
 output_intermediate_file (FILE *gcov_file, source_info *src)
 {
   unsigned line_num;/* current line number.  */
-  const line_t *line;   /* current line info ptr.  */
+  const line_info *line;   /* current line info ptr.  */
   function_t *fn;   /* current function info ptr. */
 
   fprintf (gcov_file, "file:%s\n", src->name);/* source file name */
@@ -892,29 +899,29 @@ output_intermediate_file (FILE *gcov_file, source_info 
*src)
line_num < src->lines.size ();
line_num++, line++)
 {
-  arc_t *arc;
   if (line->exists)
fprintf (gcov_file, "lcount:%u,%s,%d\n", line_num,
 format_gcov (line->count, 0, -1), line->has_unexecuted_block);
   if (flag_branches)
-   for (arc = line->branches; arc; arc = arc->line_next)
-  {
-if (!arc->is_unconditional && !arc->is_call_non_return)
-  {
-const char *branch_type;
-/* branch:,
-  

[PATCH] Fix DWARF5 .debug_loclist handling with hot/cold partitioning (PR debug/82718)

2017-10-26 Thread Jakub Jelinek
Hi!

The code in output_loc_list for DWARF5 relies on the accuracy of
dw_loc_list_node's section field, in particular that nodes with labels in
the hot subsection have one section (label) and nodes with labels in the
cold subsection have another one.  But that is actually not the case, so we
end up with a cross-section symbol difference which the assembler refuses
to assemble.

The following patch fixes that by making sure that the section recorded in the
nodes is accurate.  We already have loc_list->last_before_switch, which points
to the last node in the first partition, or is NULL if either no section
switch was seen (if !crtl->has_bb_partition) or the first node is after the
section switch (if crtl->has_bb_partition); the patch uses it for the regions
that need to be split between the two.  The patch has lots of reindentation,
so I'm also including the diff -upbd output of the dwarf2out.c changes.
The first if ensures that secname is correct for the first node, and the
other changes update it in the loop after processing the node equal to
last_before_switch (or, if doing range_across_switch, after emitting the
first partition's entry and before emitting the second partition's entry).
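
A hypothetical reproducer shape (a sketch only, not the committed
gcc.dg/debug/dwarf2/pr82718.c test): a local whose location list spans the
hot/cold boundary, compiled with something like
-O2 -g -gdwarf-5 -freorder-blocks-and-partition.

int cold_path (int);

int
foo (int x)
{
  int v = x * 3;                   /* v's location list starts in the hot part.  */
  if (__builtin_expect (x < 0, 0))
    v = cold_path (v);             /* this lands in the cold partition.  */
  return v + 1;
}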

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

diff -upbd here, patch afterwards:
--- gcc/dwarf2out.c.jj  2017-10-23 22:39:27.0 +0200
+++ gcc/dwarf2out.c 2017-10-25 21:01:13.237929750 +0200
@@ -16333,21 +16333,31 @@ dw_loc_list (var_loc_list *loc_list, tre
  This means we have to special case the last node, and generate
  a range of [last location start, end of function label].  */
 
+  if (cfun && crtl->has_bb_partition)
+{
+  bool save_in_cold_section_p = in_cold_section_p;
+  in_cold_section_p = first_function_block_is_cold;
+  if (loc_list->last_before_switch == NULL)
+   in_cold_section_p = !in_cold_section_p;
+  secname = secname_for_decl (decl);
+  in_cold_section_p = save_in_cold_section_p;
+}
+  else
   secname = secname_for_decl (decl);
 
   for (node = loc_list->first; node; node = node->next)
+{
+  bool range_across_switch = false;
 if (GET_CODE (node->loc) == EXPR_LIST
|| NOTE_VAR_LOCATION_LOC (node->loc) != NULL_RTX)
   {
if (GET_CODE (node->loc) == EXPR_LIST)
  {
+ descr = NULL;
/* This requires DW_OP_{,bit_}piece, which is not usable
   inside DWARF expressions.  */
-   if (want_address != 2)
- continue;
+ if (want_address == 2)
descr = dw_sra_loc_expr (decl, node->loc);
-   if (descr == NULL)
- continue;
  }
else
  {
@@ -16357,7 +16367,6 @@ dw_loc_list (var_loc_list *loc_list, tre
  }
if (descr)
  {
-   bool range_across_switch = false;
/* If section switch happens in between node->label
   and node->next->label (or end of function) and
   we can't emit it as a single entry list,
@@ -16393,6 +16402,18 @@ dw_loc_list (var_loc_list *loc_list, tre
&& strcmp (node->label, endname) == 0)
  (*listp)->force = true;
listp = &(*listp)->dw_loc_next;
+   }
+   }
+
+  if (cfun
+ && crtl->has_bb_partition
+ && node == loc_list->last_before_switch)
+   {
+ bool save_in_cold_section_p = in_cold_section_p;
+ in_cold_section_p = !first_function_block_is_cold;
+ secname = secname_for_decl (decl);
+ in_cold_section_p = save_in_cold_section_p;
+   }
 
if (range_across_switch)
  {
@@ -16412,13 +16433,11 @@ dw_loc_list (var_loc_list *loc_list, tre
  endname = node->next->label;
else
  endname = cfun->fde->dw_fde_second_end;
-   *listp = new_loc_list (descr,
-  cfun->fde->dw_fde_second_begin,
+ *listp = new_loc_list (descr, cfun->fde->dw_fde_second_begin,
   endname, secname);
listp = &(*listp)->dw_loc_next;
  }
  }
-  }
 
   /* Try to avoid the overhead of a location list emitting a location
  expression instead, but only if we didn't have more than one



2017-10-26  Jakub Jelinek  

PR debug/82718
* dwarf2out.c (dw_loc_list): If crtl->has_bb_partition, temporarily
set in_cold_section_p to the partition containing loc_list->first.
When seeing loc_list->last_before_switch node, update secname and
perform range_across_switch second partition handling only after that.

* gcc.dg/debug/dwarf2/pr82718.c: New test.

--- gcc/dwarf2out.c.jj  2017-10-23 22:39:27.0 +0200
+++ gcc/dwarf2out.c 2017-10-25 21:01:13.237929750 +0200
@@ -16333,92 +16333,111 @@ dw_loc_list (var_loc_list *loc_list, tre
  This means we have to special case the last node, and generate
  a range of [last location start, end of function 

Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Richard Sandiford
Martin Sebor  writes:
> On 10/25/2017 03:31 PM, Richard Sandiford wrote:
>> Martin Sebor  writes:
>>> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
 +#if NUM_POLY_INT_COEFFS == 1
 +extern inline __attribute__ ((__gnu_inline__)) poly_int64
 +tree_to_poly_int64 (const_tree t)
>>>
>>> I'm curious about the extern inline and __gnu_inline__ here and
>>> not in poly_int_tree_p below.  Am I correct in assuming that
>>> the combination is a holdover from the days when GCC was compiled
>>> using a C compiler, and that the way to write the same definition
>>> in C++ 98 is simply:
>>>
>>>inline poly_int64
>>>tree_to_poly_int64 (const_tree t)
>>>
 +{
 +  gcc_assert (tree_fits_poly_int64_p (t));
 +  return TREE_INT_CST_LOW (t);
 +}
>>>
>>> If yes, I would suggest to use the C++ form (and at some point,
>>> changing the existing uses of the GCC/C idiom to the C++ form
>>> as well).
>>>
>>> Otherwise, if something requires the use of the C form I would
>>> suggest to add a brief comment explaining it.
>>
>> You probably saw that this is based on tree_to_[su]hwi.  AIUI the
>> differences from plain C++ inline are that:
>>
>> a) with __gnu_inline__, an out-of-line definition must still exist.
>>That fits this use case well, because the inline is conditional on
>>the #ifdef and tree.c has an out-of-line definition either way.
>>If we used normal inline, we'd need to add extra #ifs to tree.c
>>as well, to avoid multiple definitions.
>>
>> b) __gnu_inline__ has the strength of __always_inline__, but without the
>>correctness implications if inlining is impossible for any reason.
>>I did try normal inline first, but it wasn't strong enough.  The
>>compiler ended up measurably faster if I copied the tree_to_[su]hwi
>>approach.
>
> Thanks for the clarification.  I'm not sure I fully understand
> it but I'm happy to take your word for it that's necessary.  I
> would just recommend adding a brief comment to this effect since
> it isn't obvious.
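
A minimal standalone sketch of the extern inline + __gnu_inline__ pattern under
discussion, assuming a header/source split (illustrative only, not the actual
tree.h/tree.c code): the header copy is used purely for inlining, while the
plain definition in the .c file provides the single out-of-line symbol.

/* my-header.h (hypothetical) */
extern inline __attribute__ ((__gnu_inline__)) int
twice (int x)
{
  return 2 * x;   /* callers that see this body can inline it */
}

/* my-impl.c (hypothetical): the out-of-line definition still exists,
   so calls that are not inlined resolve against this symbol.  */
int
twice (int x)
{
  return 2 * x;
}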
>
 +
 +inline bool
 +poly_int_tree_p (const_tree t, poly_int64_pod *value)
 +{
>>> ...
>>
>> [This one is unconditionally inline.]
>>
  /* The tree and const_tree overload templates.   */
  namespace wi
  {
 +  class unextended_tree
 +  {
 +  private:
 +const_tree m_t;
 +
 +  public:
 +unextended_tree () {}
>>>
>>> Defining no-op ctors is quite dangerous and error-prone.  I suggest
>>> to instead default initialize the member(s):
>>>
>>>unextended_tree (): m_t () {}
>>>
>>> Ditto everywhere else, such as in:
>>
>> This is really performance-senesitive code though, so I don't think
>> we want to add any unnecessary initialisation.  Primitive types are
>> uninitalised by default too, and the point of this class is to
>> provide an integer-like interface.
>
> I understand the performance concern (more on that below), but
> to clarify the usability issues,  I don't think the analogy with
> primitive types is quite fitting here: int() evaluates to zero,
> as do the values of i and a[0] and a[1] after an object of type
> S is constructed using its default ctor, i.e., S ():
>
>struct S {
>  int i;
>  int a[2];
>
>  S (): i (), a () { }
>};

Sure, I realise that.  I meant that:

  int x;

doesn't initialise x to zero.  So it's a question of which case is the
most motivating one: using "x ()" to initialise x to 0 in a constructor
or "int x;" to declare a variable of type x, uninitialised.  I think the
latter use case is much more common (at least in GCC).  Rearranging
things, I said later:

>> In your other message you used the example of explicit default
>> initialisation, such as:
>>
>> class foo
>> {
>>   foo () : x () {}
>>   unextended_tree x;
>> };
>>
>> But I think we should strongly discourage that kind of thing.
>> If someone wants to initialise x to a particular value, like
>> integer_zero_node, then it would be better to do it explicitly.
>> If they don't care what the initial value is, then for these
>> integer-mimicing classes, uninitialised is as good as anything
>> else. :-)

What I meant was: if you want to initialise "i" to 1 in your example,
you'd have to write "i (1)".  Being able to write "i ()" instead of
"i (0)" saves one character but I don't think it adds much clarity.
Explicitly initialising something only seems worthwhile if you say
what you're initialising it to.

> With the new (and some existing) classes that's not so, and it
> makes them harder and more error-prone to use (I just recently
> learned this the hard way about offset_int and the debugging
> experience is still fresh in my memory).

Sorry about the bad experience.  But that kind of thing cuts
both ways.  If I write:

poly_int64
foo (void)
{
  poly_int64 x;
  x += 2;
  return x;
}

then I get a warning about x being used uninitialised, without
having had to run anything.  If we add default initialisation
then this becomes something that has to be debugge

[PING#4, Makefile] improve libsubdir variable transmission to sub-makes

2017-10-26 Thread Olivier Hainque
Hello,

ping #4, please.

https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00017.html

Thanks much in advance,

Olivier


> On 04 Oct 2017, at 09:16, Olivier Hainque  wrote:
> 
> Hello,
> 
> Ping #3 for https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00017.html
> 
> please.
> 
> Took the liberty to cc a maintainer.
> 
> Thanks much in advance!
> 
> With Kind Regards,
> 
> Olivier
> 
>> On Sep 25, 2017, at 08:49 , Olivier Hainque  wrote:
>> 
>> Hello,
>> 
>> Ping #2 for https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00017.html
>> 
>>> Makefile.in:
>>> ...
>>> export libsubdir
>>> 
>>> This is not working well on cygwin environments where environment
>>> variable names are translated to uppercase (so sub-makes evaluating
>>> the variable with the lowercase name don't get the value).
>> 
>>> 2017-09-01  Jerome Lambourg  
>>> 
>>> * Makefile.in (FLAGS_TO_PASS): Add libsubdir.
>> 
>> Thanks much in advance!
>> 
>> With Kind Regards,
>> 
>> Olivier
>> 
> 



[PATCH 8/N][RFC] GCOV: support multiple functions per a line

2017-10-26 Thread Martin Liška
Hi.

As mentioned in the cover letter, this patch was the main motivation for the
whole series.
Currently we have a list of lines (source_info::lines) per source file.  That's
changed in the patch; now each function has:
map> source_lines;
i.e. separate lines for the function for each source file the function lives in.
When a group of functions starts on a line, we first print a summary and then
each individual function:

-:1:template
-:2:class Foo
-:3:{
-:4:  public:
3:5:  Foo()
-:6:  {
3:7:b = 123;
3:8:  }
--
Foo::Foo():
1:5:  Foo()
-:6:  {
1:7:b = 123;
1:8:  }
--
Foo::Foo():
2:5:  Foo()
-:6:  {
2:7:b = 123;
2:8:  }
--
-:9:
   1*:   10:  void test() { if (!b) __builtin_abort (); b = 111; }
--
Foo::test():
   1*:   10:  void test() { if (!b) __builtin_abort (); b = 111; }
1:   10-block  0
%:   10-block  1
--
Foo::test():
#:   10:  void test() { if (!b) __builtin_abort (); b = 111; }
%:   10-block  0
%:   10-block  1
--
-:   11:
-:   12:  private:
-:   13:  int b;
-:   14:};
-:   15:
-:   16:template class Foo;
-:   17:template class Foo;
-:   18:
1:   19:int main()
-:   20:{
1:   21:  Foo xx;
1:   21-block  0
1:   22:  Foo yy;
1:   22-block  0
1:   23:  Foo zz;
1:   23-block  0
1:   24:  xx.test();
1:   24-block  0
-:   25:
1:   26:  return 0;
1:   26-block  0
-:   27:}

It's also reflected in the intermediate format, where lines are repeated.
Currently no summary is done.

The patch is a work in progress: tests are missing, the documentation should be
improved significantly, and the changelog has to be written.

However, I would like to get feedback before I finish it.

Thanks,
Martin

From 98b91fc78f4ac70e0bc626ef08afb1f999eb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 26 Oct 2017 10:39:40 +0200
Subject: [PATCH] GCOV: support multiple functions per a line

---
 gcc/coverage.c  |   1 +
 gcc/gcov-dump.c |   3 +
 gcc/gcov.c  | 635 +---
 3 files changed, 379 insertions(+), 260 deletions(-)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index 8a56a677f15..c84cc634bb1 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -663,6 +663,7 @@ coverage_begin_function (unsigned lineno_checksum, unsigned cfg_checksum)
   gcov_write_unsigned (cfg_checksum);
   gcov_write_string (IDENTIFIER_POINTER
 		 (DECL_ASSEMBLER_NAME (current_function_decl)));
+  gcov_write_unsigned (DECL_ARTIFICIAL (current_function_decl));
   gcov_write_filename (xloc.file);
   gcov_write_unsigned (xloc.line);
   gcov_write_length (offset);
diff --git a/gcc/gcov-dump.c b/gcc/gcov-dump.c
index d24e72ac4a1..08524123f89 100644
--- a/gcc/gcov-dump.c
+++ b/gcc/gcov-dump.c
@@ -308,9 +308,12 @@ tag_function (const char *filename ATTRIBUTE_UNUSED,
 	  
 	  name = gcov_read_string ();
 	  printf (", `%s'", name ? name : "NULL");
+	  unsigned artificial = gcov_read_unsigned ();
 	  name = gcov_read_string ();
 	  printf (" %s", name ? name : "NULL");
 	  printf (":%u", gcov_read_unsigned ());
+	  if (artificial)
+	printf (", artificial");
 	}
 }
 }
diff --git a/gcc/gcov.c b/gcc/gcov.c
index 865deaaafae..e250cd831a2 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -34,6 +34,8 @@ along with Gcov; see the file COPYING3.  If not see
 #define INCLUDE_ALGORITHM
 #define INCLUDE_VECTOR
 #define INCLUDE_STRING
+#define INCLUDE_MAP
+#define INCLUDE_SET
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -183,6 +185,42 @@ block_info::block_info (): succ (NULL), pred (NULL), num_succ (0), num_pred (0),
   cycle.arc = NULL;
 }
 
+/* Describes a single line of source. Contains a chain of basic blocks
+   with code on it.  */
+
+struct line_info
+{
+  /* Default constructor.  */
+  line_info ();
+
+  /* Return true when NEEDLE is one of basic blocks the line belongs to.  */
+  bool has_block (block_t *needle);
+
+  /* Execution count.  */
+  gcov_type count;
+
+  /* Branches from blocks that end on this line.  */
+  vector<arc_t *> branches;
+
+  /* blocks which start on this line.  Used in all-blocks mode.  */
+  vector<block_t *> blocks;
+
+  unsigned exists : 1;
+  unsigned unexceptional : 1;
+  unsigned has_unexecuted_block : 1;
+};
+
+line_info::line_info (): count (0), branches (), blocks (), exists (false),
+  unexceptional (0), has_unexecuted_block (0)
+{
+}
+
+bool
+line_info::has_block (block_t *needle)
+{
+  return std::find (blocks.begin (), blocks.end (), needle) != blocks.end ();
+}
+
 /* Describes a single function. Contains an array of basic blocks.  */
 
 typedef struct function_

Re: a new libgcov interface: __gcov_dump_all

2017-10-26 Thread Martin Liška
On 07/22/2014 06:04 PM, Xinliang David Li wrote:
> Please take a look the updated patch. It addresses the issue of using
> dlclose before dump, and potential races (between a thread closing a
> library and the dumper call).
> 
> David
> 
> On Sun, Jul 20, 2014 at 11:12 PM, Nathan Sidwell  wrote:
>> On 07/20/14 21:38, Xinliang David Li wrote:
>>>
>>> The gcov_info chain is not duplicated -- there is already one chain
>>> (linking only modules of the library) per shared library in current
>>> implementation.  My change does not affect underlying behavior at all
>>> -- it merely introduces a new interface to access private dumper
>>> methods associated with shared libs.
>>
>>
>> ah, got it.  thanks for clarifying.  Can't help thinking gcov_init should be
>> doing this, and wondering about dlload/dlclose.  Let me think
>>
>> nathan

Hi.

Unfortunately, it looks like the patch hasn't been installed on trunk.
Some folks from Firefox are interested in it.

Nathan, are you OK with the patch?  I guess a rebase will be needed.

Martin


Re: [PATCH][GCC][ARM][AArch64] Testsuite framework changes and execution tests [Patch (8/8)]

2017-10-26 Thread Kyrill Tkachov

Hi Tamar,

On 06/10/17 13:45, Tamar Christina wrote:

Hi All,

This is a minor respin of the patch with the comments addressed. Note 
this patch is now 7/8 in the series.



Regtested on arm-none-eabi, armeb-none-eabi,
aarch64-none-elf and aarch64_be-none-elf with no issues found.

Ok for trunk?

gcc/testsuite
2017-10-06  Tamar Christina  

* lib/target-supports.exp
(check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache): New.
(check_effective_target_arm_v8_2a_dotprod_neon_ok): New.
(add_options_for_arm_v8_2a_dotprod_neon): New.
(check_effective_target_arm_v8_2a_dotprod_neon_hw): New.
(check_effective_target_vect_sdot_qi): New.
(check_effective_target_vect_udot_qi): New.
* gcc.target/arm/simd/vdot-exec.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vdot-exec.c: New.
* gcc/doc/sourcebuild.texi: Document arm_v8_2a_dotprod_neon.

From: Tamar Christina
Sent: Monday, September 4, 2017 2:01:40 PM
To: Christophe Lyon
Cc: gcc-patches@gcc.gnu.org; nd; James Greenhalgh; Richard Earnshaw; 
Marcus Shawcroft
Subject: RE: [PATCH][GCC][ARM][AArch64] Testsuite framework changes 
and execution tests [Patch (8/8)]


Hi Christophe,

> >
> > gcc/testsuite
> > 2017-09-01  Tamar Christina 
> >
> > * lib/target-supports.exp
> > (check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache):
> New.
> > (check_effective_target_arm_v8_2a_dotprod_neon_ok): New.
> > (add_options_for_arm_v8_2a_dotprod_neon): New.
> > (check_effective_target_arm_v8_2a_dotprod_neon_hw): New.
> > (check_effective_target_vect_sdot_qi): New.
> > (check_effective_target_vect_udot_qi): New.
> > * gcc.target/arm/simd/vdot-exec.c: New.
>
> Aren't you defining twice P() and ARR() in vdot-exec.c ?
> I'd expect a preprocessor error, did I read too quickly?
>

Yes, they are defined twice, but they're not redefined: all the definitions
are exactly the same, so the preprocessor doesn't care. I can leave only
one definition if this is confusing.
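(For reference, a minimal standalone illustration - the macro bodies here
are invented, not the ones from vdot-exec.c.  The preprocessor only warns
when a re-definition is not token-for-token identical:)

  #define P(a, b) a##b
  #define P(a, b) a##b     /* identical re-definition: accepted silently */
  /* #define P(a, b) (a##b)   -- a differing re-definition would instead
     give 'warning: "P" redefined'.  */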

>
> Thanks,
>
> Christophe
>
> > * gcc.target/aarch64/advsimd-intrinsics/vdot-exec.c: New.
> > * gcc/doc/sourcebuild.texi: Document arm_v8_2a_dotprod_neon.
> >
> > --

This looks ok to me.

Thanks,
Kyrill






Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Richard Biener
On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  wrote:
> Hi,
>
> Add support for gimple folding of the vec_madd() (vector multiply-add)
> intrinsics.
> Testcase coverage is provided by the existing tests
>  gcc.target/powerpc/fold-vec-madd-*.c
>
> Sniff-tests appear clean.  A full regtest is currently running across 
> assorted Power systems. (P6-P9).
> OK for trunk (pending clean run results)?

You can use FMA_EXPR on integer operands as well.  Otherwise you risk
the FMA not being matched by combine later when part of the operation is
CSEd.
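(A small, hypothetical illustration of the CSE concern - not code from the
patch: once the product also has a standalone use, the multiplication is
CSEd, and combine no longer sees a single mul+add it can fuse into an FMA
insn.)

  double f (double a, double b, double c, double *tmp)
  {
    double t = a * b;   /* the product is needed on its own ...           */
    *tmp = t;           /* ... so it is CSEd into its own pseudo ...      */
    return t + c;       /* ... and this add may no longer match as fma.   */
  }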

Richard.

> Thanks,
> -Will
>
> [gcc]
>
> 2017-10-25  Will Schmidt 
>
> * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support for
>   gimple folding of vec_madd() intrinsics.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 4837e14..04c2b15 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> *gsi)
>build_int_cst (arg2_type, 0)), 
> arg0);
>  gimple_set_location (g, loc);
>  gsi_replace (gsi, g, true);
>  return true;
>}
> +
> +/* vec_madd (Float) */
> +case ALTIVEC_BUILTIN_VMADDFP:
> +case VSX_BUILTIN_XVMADDDP:
> +  {
> +   arg0 = gimple_call_arg (stmt, 0);
> +   arg1 = gimple_call_arg (stmt, 1);
> +   tree arg2 = gimple_call_arg (stmt, 2);
> +   lhs = gimple_call_lhs (stmt);
> +   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
> +   gimple_set_location (g, gimple_location (stmt));
> +   gsi_replace (gsi, g, true);
> +   return true;
> +  }
> +/* vec_madd (Integral) */
> +case ALTIVEC_BUILTIN_VMLADDUHM:
> +  {
> +   arg0 = gimple_call_arg (stmt, 0);
> +   arg1 = gimple_call_arg (stmt, 1);
> +   tree arg2 = gimple_call_arg (stmt, 2);
> +   lhs = gimple_call_lhs (stmt);
> +   tree lhs_type = TREE_TYPE (lhs);
> +   location_t loc = gimple_location (stmt);
> +   gimple_seq stmts = NULL;
> +   tree mult_result = gimple_build (&stmts, loc, MULT_EXPR,
> +  lhs_type, arg0, arg1);
> +   tree plus_result = gimple_build (&stmts, loc, PLUS_EXPR,
> +  lhs_type, mult_result, arg2);
> +   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +   update_call_from_tree (gsi, plus_result);
> +   return true;
> +  }
> +
>  default:
> if (TARGET_DEBUG_BUILTIN)
>fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
> fn_code, fn_name1, fn_name2);
>break;
>
>


Re: [RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-26 Thread Richard Biener
On Wed, Oct 25, 2017 at 5:44 PM, Jeff Law  wrote:
> On 10/24/2017 11:35 AM, Martin Sebor wrote:
>> On 10/23/2017 05:14 PM, Jeff Law wrote:
>>>
>>> Martin,
>>>
>>> I'd like your thoughts on this patch.
>>>
>>> One of the things I'm working on is changes that would allow passes that
>>> use dominator walks to trivially perform context sensitive range
>>> analysis as a part of their dominator walk.
>>>
>>> As I outlined earlier this would allow us to easily fix the false
>>> positive sprintf warning reported a week or two ago.
>>>
>>> This patch converts the sprintf warning code to perform a dominator walk
>>> rather than just walking the blocks in whatever order they appear in the
>>> basic block array.
>>>
>>> From an implementation standpoint we derive a new class sprintf_dom_walk
>>> from the dom_walker class.  Like other dom walkers we walk statements
>>> from within the before_dom_children member function.  Very standard
>>> stuff.
>>>
>>> I moved handle_gimple_call and various dependencies into the
>>> sprintf_dom_walker class to facilitate calling handle_gimple_call from
>>> within the before_dom_children member function.  There's light fallout
>>> in various places where the call_info structure was explicitly expected
>>> to be found in the pass_sprintf_length class, but is now found in the
>>> sprintf_dom_walker class.
>>>
>>> This has been bootstrapped and regression tested on x86_64-linux-gnu.
>>> I've also layered my embedded VRP analysis on top of this work and
>>> verified that it does indeed fix the reported false positive.
>>>
>>> Thoughts?
>>
>> If it lets us improve the quality of the range information I can't
>> think of a downside.
> It's potentially slower simply because the domwalk interface is more
> expensive than just iterating over the blocks with FOR_EACH_BB.  But
> that's about it.  I think the ability to get more accurate range
> information will make the compile-time hit worth it.
>
>>
>> Besides the sprintf pass, a number of other areas depend on ranges,
>> most of all the -Wstringop-overflow and truncation warnings and
>> now -Wrestrict (once my enhancement is approved).  It would be nice
>> to be able to get the same improvements there.  Does it mean that
>> those warnings will need to be moved into a standalone pass?  (I'm
>> not opposed to it, just wondering what to expect if this is
>> the route we want to go.)
> They don't necessarily have to be a standalone pass -- they just have to
> be implementable as part of a dominator walk to get the cheap context
> sensitive range data.
>
> So IIRC you've got some code to add additional warnings within the
> strlen pass.  That pass is already a dominator walk.  In theory you'll
> just add a member to the strlen_dom_walker class, then a call in
> before_dom_children and after_dom_children virtuals and you should be
> able to query the context sensitive range information.
>
> For warnings that occur in code that is not easily structured as a
> dominator walk, Andrew's work will definitely be a better choice.
>
> Andrew's work will almost certainly also generate even finer grained
> ranges because it can work on an arbitrary path through the CFG rather
> than relying on dominance relationships.  Consider
>
>      A
>     / \
>    B   C
>     \ /
>      D
>
> Range information implied by the edge A->B is usable within B because
> the edge A->B dominates B.  Similarly for range information implied by
> A->C being available in C.  But range information implied by A->B is not
> available in D because A->B does not dominate D.  SImilarly range
> information implied by A->C is not available in D.
>
> I touched on this in a private message recently.  Namely that exploiting
> range data in non-dominated blocks feels a *lot* like jump threading and
> should likely be structured as a backwards walk query (and thus is more
> suitable for Andrew's infrastructure).

On the contrary - with a backward walk you don't know which way to go.
From D to B or to C?  With a forward walk there's no such ambiguity
(unless you start from A).
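(A made-up C fragment matching the diamond above: the range implied by the
A->B edge is usable in B but not in D, which is reached from both arms.)

  extern void g (int), h (int), k (int);

  void f (int a)
  {
    if (a > 10)    /* block A: the condition implies ranges on its edges */
      g (a);       /* block B: a > 10 is known here                      */
    else
      h (a);       /* block C: a <= 10 is known here                     */
    k (a);         /* block D: dominated by neither edge, so only what
                      was already known in A holds here                  */
  }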

Note I have patches for EVRP that merge ranges from B and C to make
the info available for D, but usually there's nothing to recover here
that isn't also valid in A.  Only ranges derived from non-conditional
stmts (by means of exploiting undefined behavior) can help here.

Richard.

>
> jeff
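(Returning to the point above about layering the strlen/sprintf warnings on
a dominator walk: a rough sketch, with assumed member names and simplified
signatures rather than the actual patch, of the shape such a walker takes.)

  class sprintf_dom_walker : public dom_walker
  {
  public:
    sprintf_dom_walker () : dom_walker (CDI_DOMINATORS) {}

    /* Defined elsewhere in the pass; analyzes one call statement.  */
    bool handle_gimple_call (gimple_stmt_iterator *);

    virtual edge before_dom_children (basic_block bb)
    {
      /* Push context-sensitive range info implied by the dominating
         edge here, then visit the statements.  */
      for (gimple_stmt_iterator si = gsi_start_bb (bb);
           !gsi_end_p (si); gsi_next (&si))
        if (is_gimple_call (gsi_stmt (si)))
          handle_gimple_call (&si);
      return NULL;
    }

    virtual void after_dom_children (basic_block)
    {
      /* Pop whatever context before_dom_children pushed.  */
    }
  };

  /* Driven from the pass body roughly as:
       sprintf_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));  */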


Re: [RFA][PATCH] Provide a class interface into substitute_and_fold.

2017-10-26 Thread Richard Biener
On Tue, Oct 24, 2017 at 8:44 PM, Jeff Law  wrote:
> This is similar to the introduction of the ssa_propagate_engine, but for
> the substitution/replacements bits.
>
> In a couple places the pass specific virtual functions are just wrappers
> around existing functions.  A good example of this is
> ccp_folder::get_value.  Many other routines in tree-ssa-ccp.c want to
> use get_constant_value.  Some may be convertable to use the class
> instance, but I haven't looked closely.
>
> Another example is vrp_folder::get_value.  In this case we're wrapping
> op_with_constant_singleton_value.  In a later patch that moves into the
> to-be-introduced vr_values class so we'll delegate to that class rather
> than wrap.
>
> FWIW I did look at having a single class for the propagation engine and
> the substitution engine.  That turned out to be a bit problematical due
> to the calls into the substitution engine from the evrp bits which don't
> use the propagation engine at all.  Given propagation and substitution
> are distinct concepts I ultimately decided the cleanest path forward was
> to keep the two classes separate.
>
> Bootstrapped and regression tested on x86_64.  OK for the trunk?

So what I don't understand in this 2 part series is why you put
substitute-and-fold into a different class.

This makes it difficult for users to inherit and put the lattice in
the deriving class as we have the visit routines which will update
the lattice and the get_value hook which queries it.

So from the standpoint of maintaining state for the users, a single
class would be more appropriate.  Of course, it seems like
substitute-and-fold can be used without using the SSA
propagator itself, and the SSA propagator can be used
without the substitute-and-fold engine.

IIRC we decided against using multiple inheritance?  Which
means a user would put the lattice in the class derived from the SSA
propagation engine and do the 'inheriting' via composition, with the
substitute_and_fold engine as a member?

Your patches keep things simple (aka the lattice and most
functions are globals), but is composition what you had
in mind when doing this class-ification?
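(To make the composition question concrete - purely a hypothetical sketch
with invented names, not code from either patch: the lattice lives in the
propagator subclass, and the folder is a member that delegates back to it
instead of being a second base class.)

  class my_propagator : public ssa_propagate_engine
  {
    my_lattice lattice;                 /* pass-private state            */

    class my_folder : public substitute_and_fold_engine
    {
      my_propagator *owner;
    public:
      my_folder (my_propagator *o) : owner (o) {}
      virtual tree get_value (tree name)
      { return owner->lattice.value_of (name); }  /* delegate to lattice */
    };

    my_folder folder;                   /* composition, not inheritance  */

  public:
    my_propagator () : folder (this) {}
  };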

Just thinking that if we can encapsulate the propagation
part of all our propagators we should be able to make
them work on ranges and instantiated by other consumers
(basically what you want to do for EVRP), like a hypothetical
static analysis pass.

Both patches look ok to me though it would be nice to
do the actual composition with a comment that the
lattices might be moved here (if all workers became
member functions as well).

Thanks,
Richard.

> Jeff
>
>
>
> * tree-ssa-ccp.c (ccp_folder): New class derived from
> substitute_and_fold_engine.
> (ccp_folder::get_value): New member function.
> (ccp_folder::fold_stmt): Renamed from ccp_fold_stmt.
> (ccp_fold_stmt): Remove prototype.
> (ccp_finalize): Call substitute_and_fold from the ccp_class.
> * tree-ssa-copy.c (copy_folder): New class derived from
> substitute_and_fold_engine.
> (copy_folder::get_value): Renamed from get_value.
> (fini_copy_prop): Call substitute_and_fold from copy_folder class.
> * tree-vrp.c (vrp_folder): New class derived from
> substitute_and_fold_engine.
> (vrp_folder::fold_stmt): Renamed from vrp_fold_stmt.
> (vrp_folder::get_value): New member function.
> (vrp_finalize): Call substitute_and_fold from vrp_folder class.
> (evrp_dom_walker::before_dom_children): Similarly for replace_uses_in.
> * tree-ssa-propagate.h (substitute_and_fold_engine): New class to
> provide a class interface to folder/substitute routines.
> (ssa_prop_fold_stmt_fn): Remove typedef.
> (ssa_prop_get_value_fn): Likewise.
> (subsitute_and_fold): Remove prototype.
> (replace_uses_in): Likewise.
> * tree-ssa-propagate.c (substitute_and_fold_engine::replace_uses_in):
> Renamed from replace_uses_in.  Call the virtual member function
> (substitute_and_fold_engine::replace_phi_args_in): Similarly.
> (substitute_and_fold_dom_walker): Remove initialization of
> data member entries for callbacks.  Add substitute_and_fold_engine
> member and initialize it.
> (substitute_and_fold_dom_walker::before_dom_children): Use the
> member functions for get_value, replace_phi_args_in,
> replace_uses_in, and fold_stmt calls.
> (substitute_and_fold_engine::substitute_and_fold): Renamed from
> substitute_and_fold.  Remove assert.   Update ctor call.
>
>
> diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
> index fec562e..da06172 100644
> --- a/gcc/tree-ssa-ccp.c
> +++ b/gcc/tree-ssa-ccp.c
> @@ -188,7 +188,6 @@ static ccp_prop_value_t *const_val;
>  static unsigned n_const_val;
>
>  static void canonicalize_value (ccp_prop_value_t *);
> -static bool ccp_fold_stmt (gimple_stmt_iterator *);
>  static void ccp_lattice_meet (ccp_p

Re: [RFA][PATCH] Don't use wi->info to pass gimple location to array warning callbacks in tree-vrp.c

2017-10-26 Thread Richard Biener
On Wed, Oct 25, 2017 at 7:30 PM, Jeff Law  wrote:
>
> The array dereference warnings in tree-vrp.c use the gimple walkers to
> dig down into gimple statements looking for array accesses.  I wasn't
> keen to convert all the clients of the gimple walkers to C++ classes at
> this time.
>
> And the gimple walkers have a mechanism by which we could pass around a
> class instance -- they've got an opaque pointer (wi->info).
>
> THe pointer is already in use to hold a gimple location.  So I initially
> thought I'd have to instead have it point to a structure that would hold
> the gimple location and a class instance.
>
> However, we can get to the gimple location via other means -- we can
> just extract it from the gimple statement which is a first class entity
> within the wi structure.  That's all this patch does.
>
>
> That frees up the opaque pointer and in a future patch I can just shove
> the vr_values class instance into it.
>
> Bootstrapped and regression tested on x86_64.
>
> OK for the trunk?

OK.  This use was probably introduced before the stmt field existed in
walk_stmt_info.

Thanks,
Richard.

> Jeff
>
> ps.  Now to figure out a strategy for vrp_valueize, which are the last
> callbacks that need fixing to allow encapsulation of the vr_values bits.
>
>
> * tree-vrp.c (check_all_array_refs): Do not use wi->info to smuggle
> gimple statement locations.
> (check_array_bounds): Corresponding changes.  Get the statement's
> location directly from wi->stmt.
>
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index 2bc485c..9defbce 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -6837,10 +6837,7 @@ check_array_bounds (tree *tp, int *walk_subtree, void 
> *data)
>if (EXPR_HAS_LOCATION (t))
>  location = EXPR_LOCATION (t);
>else
> -{
> -  location_t *locp = (location_t *) wi->info;
> -  location = *locp;
> -}
> +location = gimple_location (wi->stmt);
>
>*walk_subtree = TRUE;
>
> @@ -6887,9 +6884,6 @@ check_all_array_refs (void)
>
>   memset (&wi, 0, sizeof (wi));
>
> - location_t loc = gimple_location (stmt);
> - wi.info = &loc;
> -
>   walk_gimple_op (gsi_stmt (si),
>   check_array_bounds,
>   &wi);
>


Re: [PATCH 00/13] Removal of SDB debug info support

2017-10-26 Thread Richard Biener
On Wed, Oct 25, 2017 at 11:24 PM, Jim Wilson  wrote:
> We have no targets that emit SDB debug info by default.  We dropped all
> of the SVR3 Unix and embedded COFF targets a while ago.  The only
> targets that are still able to emit SDB debug info are cygwin, mingw,
> and msdosdjgpp.
>
> I tried a cygwin build with sources modified to emit SDB by default, to
> see if the support was still usable.  I ran into multiple problems.
> There is no SDB support for IMPORTED_DECL, which was added in 2008.
> -freorder-functions and -freorder-blocks-and-partition did not work and
> had to be disabled.  I hit a cgraph assert because sdbout.c uses
> assemble_name on types, which fails if there is a function and type
> with the same name.  This also causes types to be added to the debug
> info with prepended underscores which is wrong.  I then ran into a
> problem with the i386_pe_declare_function_type call from
> i386_pe_file_end and gave up because I didn't see an easy workaround.
>
> It seems clear that the SDB support is no longer usable, and probably
> hasn't been for a while.  This support should just be removed.
>
> SDB is both a debug info format and an old Unix debugger.  There were
> some references to the debugger that I left in, changing to past tense,
> as the comments are useful history to explain why the code was written
> the way it was.  Otherwise, I tried to eliminate all references to sdb
> as a debug info format.
>
> This patch series was tested with a C only cross compiler build for all
> modified embedded targets, a default languages build for power aix,
> i686 cygwin, and x86_64 linux.  I also did gdb testsuite runs for
> cygwin and linux.  There were no regressions.
>
> As a debug info maintainer, I can self-approve some of this stuff, but
> it would be good to get a review from one of the other global
> reviewers and/or target maintainers.

You have my approval for this.  Can you add a blurb to gcc-8/changes.html,
like "support for emitting SDB debug info has been removed" in the caveats
section?

Thanks,
Richard.

> Jim
>


Re: [PATCH 06/13] remove sdb and -gcoff from non-target files

2017-10-26 Thread Richard Biener
On Wed, Oct 25, 2017 at 11:45 PM, Jim Wilson  wrote:
> This removes the -gcoff option, and various sdb related references in
> non-target files.  I also poison SDB_DEBUGGING_INFO and SDB_DEBUG.  I
> didn't see any point in poisoning the other SDB_* macros, as no one has
> used any of them in a very long time.
>
> I noticed one odd thing from removing -gcoff: using it, or any other
> unrecognized debug info type, now gives an odd-looking error message.
>
> palantir:2016$ gcc -gfoo -S tmp.c
> cc1: error: unrecognised debug output level ‘foo’
> palantir:2017$
>
> We probably should only emit this error when we have a number after -g,
> and emit some other error when a non-number appears after -g, such as
> "unrecognized debug info type 'foo'".  This is a separate problem that
> I haven't tried to fix here.

You can eventually keep the option, marking it as Ignore (like we do
for options we remove but "keep" for backward compatibility).  The
diagnostic (as a warning, given the option will just be ignored) could
be emitted from option processing in opts.c then.

Richard.

> Jim
>
> gcc/
> * common.opt (gcoff): Delete.
> (gxcoff+): Update Negative chain.
> * defaults.h: Delete all references to SDB_DEBUGGING_INFO and
> SDB_DEBUG.
> * dwarf2out.c (gen_array_type_die): Change SDB to debuggers.
> * flag-types.h (enum debug_info_type): Delete SDB_DEBUG.
> * function.c (number_blocks): Delete SDB_DEBUGGING_INFO, SDB_DEBUG,
> and SDB references.
> (expand_function_start): Change sdb reference to past tense.
> (expand_function_end): Change sdb reference to past tense.
> * gcc.c (cpp_unique_options): Delete gcoff3 reference.
> * opts.c (debug_type_names): Delete coff entry.
> (common_handle_option): Delete OPT_gcoff case.
> * system.h (SDB_DEBUG, SDB_DEBUGGING_INFO): Poison.
> ---
>  gcc/common.opt   |  6 +-
>  gcc/defaults.h   |  9 +
>  gcc/dwarf2out.c  | 12 ++--
>  gcc/flag-types.h |  1 -
>  gcc/function.c   | 10 +-
>  gcc/gcc.c|  2 +-
>  gcc/opts.c   |  6 +-
>  gcc/system.h |  3 ++-
>  8 files changed, 17 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 836f05b..25e86ec 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2868,10 +2868,6 @@ g
>  Common Driver RejectNegative JoinedOrMissing
>  Generate debug information in default format.
>
> -gcoff
> -Common Driver JoinedOrMissing Negative(gdwarf)
> -Generate debug information in COFF format.
> -
>  gcolumn-info
>  Common Driver Var(debug_column_info,1) Init(1)
>  Record DW_AT_decl_column and DW_AT_call_column in DWARF.
> @@ -2937,7 +2933,7 @@ Common Driver JoinedOrMissing Negative(gxcoff+)
>  Generate debug information in XCOFF format.
>
>  gxcoff+
> -Common Driver JoinedOrMissing Negative(gcoff)
> +Common Driver JoinedOrMissing Negative(gdwarf)
>  Generate debug information in extended XCOFF format.
>
>  Enum
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index 99cd9db..768c987 100644
> --- a/gcc/defaults.h
> +++ b/gcc/defaults.h
> @@ -894,14 +894,10 @@ see the files COPYING3 and COPYING.RUNTIME 
> respectively.  If not, see
>  #define DEFAULT_GDB_EXTENSIONS 1
>  #endif
>
> -#ifndef SDB_DEBUGGING_INFO
> -#define SDB_DEBUGGING_INFO 0
> -#endif
> -
>  /* If more than one debugging type is supported, you must define
> PREFERRED_DEBUGGING_TYPE to choose the default.  */
>
> -#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
> +#if 1 < (defined (DBX_DEBUGGING_INFO) \
>   + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
>   + defined (VMS_DEBUGGING_INFO))
>  #ifndef PREFERRED_DEBUGGING_TYPE
> @@ -913,9 +909,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #elif defined DBX_DEBUGGING_INFO
>  #define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
>
> -#elif SDB_DEBUGGING_INFO
> -#define PREFERRED_DEBUGGING_TYPE SDB_DEBUG
> -
>  #elif defined DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
>  #define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 81c95ec..ab66baf 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -20938,12 +20938,12 @@ gen_array_type_die (tree type, dw_die_ref 
> context_die)
>  add_AT_unsigned (array_die, DW_AT_ordering, DW_ORD_col_major);
>
>  #if 0
> -  /* We default the array ordering.  SDB will probably do
> - the right things even if DW_AT_ordering is not present.  It's not even
> - an issue until we start to get into multidimensional arrays anyway.  If
> - SDB is ever caught doing the Wrong Thing for multi-dimensional arrays,
> - then we'll have to put the DW_AT_ordering attribute back in.  (But if
> - and when we find out that we need to put these in, we will only do so
> +  /* We default the array ordering.  Debuggers will probably do the right
> + things even if DW_AT_o

Re: [PATCH GCC][3/3]Refine CFG and bound information for split loops

2017-10-26 Thread Richard Biener
On Fri, Oct 20, 2017 at 3:08 PM, Bin Cheng  wrote:
>
> From: Richard Biener 
> Sent: 20 October 2017 12:24
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH GCC][3/3]Refine CFG and bound information for split loops
>
> On Thu, Oct 19, 2017 at 3:26 PM, Bin Cheng  wrote:
>> Hi,
>> This is a rework of patch at  
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01037.html.
>> The new patch doesn't try to handle all cases, instead, it only handles 
>> obvious cases.
>> It also tries to add tests illustrating different cases handled.
>> Bootstrap and test for patch set on x86_64 and AArch64.  Comments?
>
> ENOPATCH
>
> Sorry for the mistake, here is the one.

+  tree cmp_rslt = gimple_build (&tmp, cmp_code, boolean_type_node, border,
+   gimple_convert (&tmp, TREE_TYPE (border),
+   newbound));
+  /* For case in which second loop must be executed, we only handle simple
+ case with unit step.  */
+  if (cmp_rslt != NULL_TREE

will always be non-NULL

+
+  gimple_seq_discard (tmp);
+  return border;
+}
+
+  /* For newbound equals border case, we can handle arbitrary steps.  */
+  cmp_rslt = gimple_build (&tmp, EQ_EXPR, boolean_type_node, border, newbound);
+  gimple_seq_discard (tmp);

you are always discarding the built stmts so why are you building them?
This is a case for the fold_buildN () API given you are only simplifying
expressions?  Or for the tree-affine stuff though that doesn't handle
compares.
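(A minimal sketch of that suggestion, reusing the operands from the quoted
hunk - assumed context, not a complete change: when the result is only used
to see whether the comparison folds, fold_build2 avoids creating and then
discarding a statement sequence.)

  /* Building statements only to throw them away ...  */
  gimple_seq tmp = NULL;
  tree cmp = gimple_build (&tmp, EQ_EXPR, boolean_type_node,
                           border, newbound);
  gimple_seq_discard (tmp);

  /* ... versus folding the expression directly, with nothing emitted:  */
  tree cmp2 = fold_build2 (EQ_EXPR, boolean_type_node, border, newbound);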

What is the difficulty for niter analysis to figure an upper bound for the
second loop?  Why isn't the generated condition for the second loop
optimized in the first place if it doesn't iterate?  Why do we generate
it at all in that case?  This is for loop-split-2?  Why is the condition
not removed by scalar passes?  It looks to me like loop splitting isn't
the correct place to do this optimization.

Note that if we know the loop(s) do not iterate (zero latch executions)
we should eventually just peel for the code generation rather
than duplicating the loop and un-looping it later.

Thanks,
Richard.

> Thanks,
> bin
>
>> Thanks,
>> bin
>> 2017-10-16  Bin Cheng  
>>
>> * tree-ssa-loop-split.c (compute_new_first_bound): New parameter.
>> Compute and return bound information for the second split loop.
>> (adjust_loop_split): New function.
>> (split_loop): Update use and call above function.
>>
>> gcc/testsuite/ChangeLog
>> 2017-10-16  Bin Cheng  
>>
>> * gcc.dg/loop-split-1.c: New test.
>> * gcc.dg/loop-split-2.c: New test.
>> * gcc.dg/loop-split-3.c: New test.
>


[build, libgcc, libgo] Adapt Solaris 12 references

2017-10-26 Thread Rainer Orth
With the change in the Solaris release model (no more major releases
like Solaris 12 but only minor ones like 11.4), the Solaris 12
references in GCC need to be adapted.

The following patch does this, consisting mostly of comment changes.

Only a few changes bear comment:

* Solaris 11.4 introduced __cxa_atexit, so we have to enable it on
  *-*-solaris2.11.  Not a problem for native builds which check for the
  actual availability of the function.

* gcc.dg/torture/pr60092.c was xfailed on *-*-solaris2.11*, but the
  underlying bug was fixed in Solaris 12/11.4.  However, now 11.3 and
  11.4 have the same configure triplet.  To avoid noise on the newest
  release, I've removed the xfail.

I've left a few references to Solaris 12 builds in
libstdc++-v3/acinclude.m4 because those hadn't been renamed
retroactively, of course.

install.texi needs some work, too, but I'll address this separately
because there's more than just the version change.

Bootstrapped without regressions on {i386-pc, sparc-sun}-solaris2.1[01]
(both Solaris 11.3 and 11.4).  I believe I need approval only for the
libgo parts.

I'm going to backport the patch to the gcc-7 and gcc-6 branches after a
bit of soak time.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-07-29  Rainer Orth  

libgo:
* Makefile.am [LIBGO_IS_SOLARIS && HAVE_STAT_TIMESPEC]: Adapt
comment for Solaris 12 renaming.
* Makefile.in: Regenerate.
* configure.ac (have_stat_timespec): Likewise.
* configure: Regenerate.
* mkrsysinfo.sh (_flow_arp_desc_t, _flow_l3_desc_t, _mac_ipaddr_t)
(_mactun_info_t): Adapt comments for Solaris 12 renaming and
backports.
* mkrsysinfo.sh (_flow_arp_desc_t, _flow_l3_desc_t, _mac_ipaddr_t)
(_mactun_info_t): Likewise.

libgcc:
* config.host (*-*-solaris2*): Adapt comment for Solaris 12
renaming.
* config/sol2/crtpg.c (__start_crt_compiler): Likewise.
* configure.ac (libgcc_cv_solaris_crts): Likewise.
* configure: Regenerate.

gcc:
* config.gcc (*-*-solaris2*): Enable default_use_cxa_atexit since
Solaris 11.  Update comment.
* configure.ac (gcc_cv_ld_pid): Adapt comment for Solaris 12
renaming.
* config/sol2.h (STARTFILE_SPEC): Likewise.
* configure: Regenerate.

gcc/testsuite:
* lib/target-supports.exp (check_effective_target_pie): Adapt
comment for Solaris 12 renaming.

* gcc.dg/torture/pr60092.c: Remove *-*-solaris2.11* dg-xfail-run-if.

# HG changeset patch
# Parent  f752fe4435b62bc0cae5d59f32c22db221b0c6f0
Adapt Solaris 12 references

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -851,8 +851,8 @@ case ${target} in
   sol2_tm_file_tail="${cpu_type}/sol2.h sol2.h"
   sol2_tm_file="${sol2_tm_file_head} ${sol2_tm_file_tail}"
   case ${target} in
-*-*-solaris2.1[2-9]*)
-  # __cxa_atexit was introduced in Solaris 12.
+*-*-solaris2.1[1-9]*)
+  # __cxa_atexit was introduced in Solaris 11.4.
   default_use_cxa_atexit=yes
   ;;
   esac
diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -208,8 +208,8 @@ along with GCC; see the file COPYING3.  
 /* We don't use the standard svr4 STARTFILE_SPEC because it's wrong for us.  */
 #undef STARTFILE_SPEC
 #ifdef HAVE_SOLARIS_CRTS
-/* Since Solaris 11.x and Solaris 12, the OS delivers crt1.o, crti.o, and
-   crtn.o, with a hook for compiler-dependent stuff like profile handling.  */
+/* Since Solaris 11.4, the OS delivers crt1.o, crti.o, and crtn.o, with a hook
+   for compiler-dependent stuff like profile handling.  */
 #define STARTFILE_SPEC "%{!shared:%{!symbolic: \
 			  crt1.o%s \
 			  %{p:%e-p is not supported; \
diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5091,7 +5091,7 @@ elif test x$gcc_cv_ld != x; then
   else
 case "$target" in
   *-*-solaris2.1[[1-9]]*)
-	# Solaris 11.x and Solaris 12 added PIE support.
+	# Solaris 11.3 added PIE support.
 	if $gcc_cv_ld -z help 2>&1 | grep -- type.*pie > /dev/null; then
 	  gcc_cv_ld_pie=yes
 	fi
diff --git a/gcc/testsuite/gcc.dg/torture/pr60092.c b/gcc/testsuite/gcc.dg/torture/pr60092.c
--- a/gcc/testsuite/gcc.dg/torture/pr60092.c
+++ b/gcc/testsuite/gcc.dg/torture/pr60092.c
@@ -4,7 +4,6 @@
 /* { dg-skip-if "No undefined weak" { nvptx-*-* } } */
 /* { dg-additional-options "-Wl,-undefined,dynamic_lookup" { target *-*-darwin* } } */
 /* { dg-additional-options "-Wl,-flat_namespace" { target *-*-darwin[89]* } } */
-/* { dg-xfail-run-if "posix_memalign modifies first arg on error" { *-*-solaris2.11* } { "-O0" } } */
 
 typedef __SIZE_TYPE__ size_t;
 extern int posix_memalign(void **memptr, size_t alignment, size_t size) __attribute__((weak));
diff --git 

Re: [C++ Patch] PR 65579 ("gcc requires definition of a static constexpr member...")

2017-10-26 Thread Paolo Carlini

Hi again,

On 24/10/2017 20:58, Jason Merrill wrote:

This seems like an odd place to add the complete_type call.  What
happens if we change the COMPLETE_TYPE_P (type) in
cp_apply_type_quals_to_decl to COMPLETE_TYPE_P (complete_type (type))?

Finally I'm back with some information.

Simply doing the above doesn't fully work.  The first symptom is the 
failure of g++.dg/init/mutable1.C, which is precisely the testcase you 
added together with the "|| !COMPLETE_TYPE_P (type)" itself.  Clearly, 
the additional condition can no longer do its job: when the type isn't 
complete, TYPE_HAS_MUTABLE_P (type) is false; and by the time it would 
be found true, we are checking !COMPLETE_TYPE_P (complete_type (type)), 
which is false because completing the type succeeded.


Thus it seems we need at least something like:

   TREE_TYPE (decl) = type = complete_type (type);

   if (TYPE_HAS_MUTABLE_P (type) || !COMPLETE_TYPE_P (type))
 type_quals &= ~TYPE_QUAL_CONST;

But then, toward the end of the testsuite, we notice a more serious 
issue, which is unrelated to the above: g++.old-deja/g++.pt/poi1.C


// { dg-do assemble  }
// Origin: Gerald Pfeifer 

template <class T>
class TLITERAL : public T
    {
    int x;
    };

class GATOM;

typedef TLITERAL<GATOM> x;
extern TLITERAL<GATOM> y;

also fails:

poi1.C: In instantiation of ‘class TLITERAL<GATOM>’:
poi1.C:13:24:   required from here
poi1.C:5:7: error: invalid use of incomplete type ‘class GATOM’
 class TLITERAL : public T
poi1.C:10:7: note: forward declaration of ‘class GATOM’
 class GATOM;

that is, trying to complete GATOM at the 'extern TLITERAL<GATOM> y;' 
line obviously fails.  Note, in case it isn't obvious, that this happens 
exactly for the cp_apply_type_quals_to_decl call at the end of 
grokdeclarator which I tried to change in my first attempt: the failure 
of poi1.C seems rather useful for figuring out what we want to do for 
this bug.


Well, as expected, explicitly checking VAR_P && 
DECL_DECLARED_CONSTEXPR_P works again - and it seems to me that, after 
all, this could make sense, given that the existing comment talks 
precisely about the additional complexities related to constexpr. 
Anyway, I'm attaching the corresponding complete patch.


Thanks!
Paolo.

/
Index: cp/typeck.c
===
--- cp/typeck.c (revision 254071)
+++ cp/typeck.c (working copy)
@@ -9544,6 +9544,9 @@ cp_apply_type_quals_to_decl (int type_quals, tree
 
   /* If the type has (or might have) a mutable component, that component
  might be modified.  */
+  if (VAR_P (decl) && DECL_DECLARED_CONSTEXPR_P (decl))
+TREE_TYPE (decl) = type = complete_type (type);
+
   if (TYPE_HAS_MUTABLE_P (type) || !COMPLETE_TYPE_P (type))
 type_quals &= ~TYPE_QUAL_CONST;
 
Index: testsuite/g++.dg/cpp0x/constexpr-template11.C
===
--- testsuite/g++.dg/cpp0x/constexpr-template11.C   (nonexistent)
+++ testsuite/g++.dg/cpp0x/constexpr-template11.C   (working copy)
@@ -0,0 +1,16 @@
+// PR c++/65579
+// { dg-do link { target c++11 } }
+
+template 
+struct S {
+int i;
+};
+
+struct T {
+  static constexpr S s = { 1 };
+};
+
+int main()
+{
+  return T::s.i;
+}


[build] Use -xbrace_comment=no with recent Solaris/x86 as

2017-10-26 Thread Rainer Orth
Recent versions of Solaris/x86 as (based on Studio 12.6 fbe) support
AVX512 insns, but with a caveat as explained in as(1):

   -xbrace_comment=yes

   The  assembler treats the text within the braces {} as comments. If
   you want the text within braces to be  treated  as  regular  AVX512
   instruction, place the text within double braces {{}}.

This was done for backwards compatibility reasons, it seems.  To have
full compatibility with gas syntax, one needs to pass
-xbrace_comment=no.  The following patch checks if the assembler used
supports that option and passes it on if so.

Bootstrapped without regressions on i386-pc-solaris2.1[01].  Will
install shortly and backport to the gcc-7 and gcc-6 branches.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2016-12-16  Rainer Orth  

* configure.ac (gcc_cv_as_ix86_xbrace_comment): Check if assembler
supports -xbrace_comment option.
* configure: Regenerate.
* config.in: Regenerate.
* config/i386/sol2.h (ASM_XBRACE_COMMENT_SPEC): Define.
(ASM_CPU_SPEC): Use it.

# HG changeset patch
# Parent  bbb7e8e9e5d2b76cd92a4ae42703471b2a14f898
Use -xbrace_comment=no with recent Solaris/x86 as

diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -65,8 +65,16 @@ along with GCC; see the file COPYING3.  
 #define ASM_CPU64_DEFAULT_SPEC "-xarch=generic64"
 #endif
 
+/* Since Studio 12.6, as needs -xbrace_comment=no so its AVX512 syntax is
+   fully compatible with gas.  */
+#ifdef HAVE_AS_XBRACE_COMMENT_OPTION
+#define ASM_XBRACE_COMMENT_SPEC "-xbrace_comment=no"
+#else
+#define ASM_XBRACE_COMMENT_SPEC ""
+#endif
+
 #undef ASM_CPU_SPEC
-#define ASM_CPU_SPEC "%(asm_cpu_default)"
+#define ASM_CPU_SPEC "%(asm_cpu_default) " ASM_XBRACE_COMMENT_SPEC
 
 /* Don't include ASM_PIC_SPEC.  While the Solaris 10+ assembler accepts -K PIC,
it gives many warnings: 
diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4033,6 +4033,11 @@ foo:	nop
 	;;
 esac
 
+gcc_GAS_CHECK_FEATURE([-xbrace_comment], gcc_cv_as_ix86_xbrace_comment,,
+  [-xbrace_comment=no], [.text],,
+  [AC_DEFINE(HAVE_AS_XBRACE_COMMENT_OPTION, 1,
+		[Define if your assembler supports -xbrace_comment option.])])
+
 # Test if the assembler supports the section flag 'e' for specifying
 # an excluded section.
 gcc_GAS_CHECK_FEATURE([.section with e], gcc_cv_as_section_has_e,


[Diagnostic Patch] don't print column zero

2017-10-26 Thread Nathan Sidwell
On the modules branch, I'm starting to add location information.  Line 
numbers don't really make sense when reporting errors reading a binary 
file, so I wanted to change the diagnostics such that line number zero 
(which is not a line) is not printed -- one just gets the file name.  I 
then noticed that we don't elide column zero (also, not a column outside 
of emacsland).


This patch changes the diagnostics, such that line-zero prints neither 
line nor column and column-zero doesn't print the column.
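(Concretely, the location prefix now degrades like this - file name and
message invented, formatting as produced by diagnostic_get_location_text:)

  foo.c:42:10: error: ...   line and column both known
  foo.c:42: error: ...      column is zero: the column is elided
  foo.c: error: ...         line is zero: both line and column are elided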


The testsuite presumes that all diagnostics have a column (which may or 
may not be specified in the test pattern).  This patch augments it such 
that a prefix of '-:' indicates 'no column'.  We still default to 
expecting a column.
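(For example - made-up test lines, not ones from the patch: a leading '-:'
in the pattern means 'expect no column', while a leading number still means
'expect that column'.)

  /* { dg-warning "-:macro is not used" } */
  /* { dg-error "10:expected initializer" } */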


The vast bulk is annotating C & C++ tests that do not have a column. 
Some of those were explicitly checking for column-zero, but many just 
expected some arbitrary column number, which happened to be zero.  Of 
course many (most?) of these diagnostics could be improved to provide a 
column.  Most are from the preprocessor.


While this is a change in the compiler's output, it's effectively 
returning to a pre-column formatting for the cases where the column 
number is not known.  I'd expect (hope?) error message parsers to be 
robust in that case. (I've found it confusing when column-zero is 
printed, as I think columns might be zero-based after all.)


bootstrapped on all languages.

ok?

nathan
--
Nathan Sidwell
2017-10-25  Nathan Sidwell  

	* diagnostic.c (maybe_line_and_column): New.
	(diagnostic_get_location_text): Use it.
	(diagnostic_report_current_module): Likewise.
	testsuite/
	* lib/gcc-dg.exp (process-message): Use -: for no column.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Mark elided
	column messages.
	* c-c++-common/cpp/pr58844-1.c: Likewise.
	* c-c++-common/cpp/pr58844-2.c: Likewise.
	* c-c++-common/cpp/warning-zero-location.c
	* g++.dg/diagnostic/pr77949.C: Likewise.
	* g++.dg/gomp/macro-4.C: Likewise.
	* gcc.dg/Wunknownprag.c: Likewise.
	* gcc.dg/builtin-redefine.c: Likewise.
	* gcc.dg/cpp/Wunknown-pragmas-1.c: Likewise.
	* gcc.dg/cpp/Wunused.c: Likewise.
	* gcc.dg/cpp/misspelled-directive-1.c: Likewise.
	* gcc.dg/cpp/redef2.c: Likewise.
	* gcc.dg/cpp/redef3.c: Likewise.
	* gcc.dg/cpp/redef4.c: Likewise.
	* gcc.dg/cpp/trad/Wunused.c: Likewise.
	* gcc.dg/cpp/trad/argcount.c: Likewise.
	* gcc.dg/cpp/trad/comment-3.c: Likewise.
	* gcc.dg/cpp/trad/comment.c: Likewise.
	* gcc.dg/cpp/trad/defined.c: Likewise.
	* gcc.dg/cpp/trad/directive.c: Likewise.
	* gcc.dg/cpp/trad/funlike-3.c: Likewise.
	* gcc.dg/cpp/trad/funlike.c: Likewise.
	* gcc.dg/cpp/trad/literals-2.c: Likewise.
	* gcc.dg/cpp/trad/macro.c: Likewise.
	* gcc.dg/cpp/trad/pr65238-4.c: Likewise.
	* gcc.dg/cpp/trad/recurse-1.c: Likewise.
	* gcc.dg/cpp/trad/recurse-2.c: Likewise.
	* gcc.dg/cpp/trad/redef2.c: Likewise.
	* gcc.dg/cpp/ucnid-11.c: Likewise.
	* gcc.dg/cpp/unc1.c: Likewise.
	* gcc.dg/cpp/unc2.c: Likewise.
	* gcc.dg/cpp/unc3.c: Likewise.
	* gcc.dg/cpp/unc4.c: Likewise.
	* gcc.dg/cpp/undef2.c: Likewise.
	* gcc.dg/cpp/warn-redefined-2.c: Likewise.
	* gcc.dg/cpp/warn-redefined.c: Likewise.
	* gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
	* gcc.dg/cpp/warn-unused-macros.c: Likewise.
	* gcc.dg/empty-source-2.c: Likewise.
	* gcc.dg/empty-source-3.c: Likewise.
	* gcc.dg/gomp/macro-4.c: Likewise.
	* gcc.dg/noncompile/pr35447-1.c: Likewise.
	* gcc.dg/plugin/location-overflow-test-1.c: Likewise.
	* gcc.dg/pr20245-1.c: Likewise.
	* gcc.dg/pr28419.c: Likewise.
	* gcc.dg/rtl/truncated-rtl-file.c: Likewise.
	* gcc.dg/unclosed-init.c: Likewise.

Index: gcc/diagnostic.c
===
--- gcc/diagnostic.c	(revision 254060)
+++ gcc/diagnostic.c	(working copy)
@@ -293,6 +293,24 @@ diagnostic_get_color_for_kind (diagnosti
   return diagnostic_kind_color[kind];
 }
 
+/* Return a formatted line and column ':%line:%column'.  Elided if
+   zero.  The result is a statically allocated buffer.  */
+
+static const char *
+maybe_line_and_column (int line, int col)
+{
+  static char result[32];
+
+  if (line)
+{
+  size_t l = sprintf (result, col ? ":%d:%d" : ":%d", line, col);
+  gcc_checking_assert (l + 1 < sizeof (result));
+}
+  else
+result[0] = 0;
+  return result;
+}
+
 /* Return a malloc'd string describing a location e.g. "foo.c:42:10".
The caller is responsible for freeing the memory.  */
 
@@ -303,19 +321,13 @@ diagnostic_get_location_text (diagnostic
   pretty_printer *pp = context->printer;
   const char *locus_cs = colorize_start (pp_show_color (pp), "locus");
   const char *locus_ce = colorize_stop (pp_show_color (pp));
-
-  if (s.file == NULL)
-return build_message_string ("%s%s:%s", locus_cs, progname, locus_ce);
-
-  if (!strcmp (s.file, N_("")))
-return build_message_string ("%s%s:%s", locus_cs, s.file, locus_ce);
-
-  if (context->show_column)
-return build_message_string ("%s%s:%d:%d:%s", locus_cs, s.file, 

Re: [04/nn] Add a VEC_SERIES rtl code

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:19 PM, Richard Sandiford
 wrote:
> This patch adds an rtl representation of a vector linear series
> of the form:
>
>   a[I] = BASE + I * STEP
>
> Like vec_duplicate;
>
> - the new rtx can be used for both constant and non-constant vectors
> - when used for constant vectors it is wrapped in a (const ...)
> - the constant form is only used for variable-length vectors;
>   fixed-length vectors still use CONST_VECTOR
>
> At the moment the code is restricted to integer elements, to avoid
> concerns over floating-point rounding.
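(For concreteness - a hypothetical scalar expansion, not part of the patch:
element I of such a vector is just BASE + I * STEP, so base 3 and step 2
over four elements give { 3, 5, 7, 9 }.)

  int v[4];
  for (int i = 0; i < 4; i++)
    v[i] = 3 + i * 2;    /* v = { 3, 5, 7, 9 } */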

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * doc/rtl.texi (vec_series): Document.
> (const): Say that the operand can be a vec_series.
> * rtl.def (VEC_SERIES): New rtx code.
> * rtl.h (const_vec_series_p_1): Declare.
> (const_vec_series_p): New function.
> * emit-rtl.h (gen_const_vec_series): Declare.
> (gen_vec_series): Likewise.
> * emit-rtl.c (const_vec_series_p_1, gen_const_vec_series)
> (gen_vec_series): Likewise.
> * optabs.c (expand_mult_highpart): Use gen_const_vec_series.
> * simplify-rtx.c (simplify_unary_operation): Handle negations
> of vector series.
> (simplify_binary_operation_series): New function.
> (simplify_binary_operation_1): Use it.  Handle VEC_SERIES.
> (test_vector_ops_series): New function.
> (test_vector_ops): Call it.
> * config/powerpcspe/altivec.md (altivec_lvsl): Use
> gen_const_vec_series.
> (altivec_lvsr): Likewise.
> * config/rs6000/altivec.md (altivec_lvsl, altivec_lvsr): Likewise.
>
> Index: gcc/doc/rtl.texi
> ===
> --- gcc/doc/rtl.texi2017-10-23 11:41:39.185050437 +0100
> +++ gcc/doc/rtl.texi2017-10-23 11:41:41.547050496 +0100
> @@ -1677,7 +1677,8 @@ are target-specific and typically repres
>  operator.  @var{m} should be a valid address mode.
>
>  The second use of @code{const} is to wrap a vector operation.
> -In this case @var{exp} must be a @code{vec_duplicate} expression.
> +In this case @var{exp} must be a @code{vec_duplicate} or
> +@code{vec_series} expression.
>
>  @findex high
>  @item (high:@var{m} @var{exp})
> @@ -2722,6 +2723,10 @@ the same submodes as the input vector mo
>  number of output parts must be an integer multiple of the number of input
>  parts.
>
> +@findex vec_series
> +@item (vec_series:@var{m} @var{base} @var{step})
> +This operation creates a vector in which element @var{i} is equal to
> +@samp{@var{base} + @var{i}*@var{step}}.  @var{m} must be a vector integer 
> mode.
>  @end table
>
>  @node Conversions
> Index: gcc/rtl.def
> ===
> --- gcc/rtl.def 2017-10-23 11:40:11.378243915 +0100
> +++ gcc/rtl.def 2017-10-23 11:41:41.549050496 +0100
> @@ -710,6 +710,11 @@ DEF_RTL_EXPR(VEC_CONCAT, "vec_concat", "
> an integer multiple of the number of input parts.  */
>  DEF_RTL_EXPR(VEC_DUPLICATE, "vec_duplicate", "e", RTX_UNARY)
>
> +/* Creation of a vector in which element I has the value BASE + I * STEP,
> +   where BASE is the first operand and STEP is the second.  The result
> +   must have a vector integer mode.  */
> +DEF_RTL_EXPR(VEC_SERIES, "vec_series", "ee", RTX_BIN_ARITH)
> +
>  /* Addition with signed saturation */
>  DEF_RTL_EXPR(SS_PLUS, "ss_plus", "ee", RTX_COMM_ARITH)
>
> Index: gcc/rtl.h
> ===
> --- gcc/rtl.h   2017-10-23 11:41:39.188050437 +0100
> +++ gcc/rtl.h   2017-10-23 11:41:41.549050496 +0100
> @@ -2816,6 +2816,51 @@ unwrap_const_vec_duplicate (T x)
>return x;
>  }
>
> +/* In emit-rtl.c.  */
> +extern bool const_vec_series_p_1 (const_rtx, rtx *, rtx *);
> +
> +/* Return true if X is a constant vector that contains a linear series
> +   of the form:
> +
> +   { B, B + S, B + 2 * S, B + 3 * S, ... }
> +
> +   for a nonzero S.  Store B and S in *BASE_OUT and *STEP_OUT on success.  */
> +
> +inline bool
> +const_vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> +  if (GET_CODE (x) == CONST_VECTOR
> +  && GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
> +return const_vec_series_p_1 (x, base_out, step_out);
> +  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_SERIES)
> +{
> +  *base_out = XEXP (XEXP (x, 0), 0);
> +  *step_out = XEXP (XEXP (x, 0), 1);
> +  return true;
> +}
> +  return false;
> +}
> +
> +/* Return true if X is a vector that contains a linear series of the
> +   form:
> +
> +   { B, B + S, B + 2 * S, B + 3 * S, ... }
> +
> +   where B and S are constant or nonconstant.  Store B and S in
> +   *BASE_OUT and *STEP_OUT on success.  */
> +
> +inline bool
> +vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> +  if (GET_CODE (x) == VEC_SERIES)
> +{
> +  *base_out = X

Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
 wrote:
> SVE needs a way of broadcasting a scalar to a variable-length vector.
> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
> be used for fixed-length vectors.  VEC_DUPLICATE_EXPR is the tree
> equivalent of the existing rtl code VEC_DUPLICATE.
>
> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
> to mark constant nodes, but in response to last year's RFC, Richard B.
> suggested it would be better to have separate codes for the constant
> and non-constant cases.  This allows VEC_DUPLICATE_EXPR to be treated
> as a normal unary operation and avoids the previous need for treating
> it as a GIMPLE_SINGLE_RHS.
>
> It might make sense to use VEC_DUPLICATE_CST for all duplicated
> vector constants, since it's a bit more compact than VECTOR_CST
> in that case, and is potentially more efficient to process.
> However, the nice thing about keeping it restricted to variable-length
> vectors is that there is then no need to handle combinations of
> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
> VECTOR_CST or never use it.
>
> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.

Index: gcc/tree-vect-generic.c
===
--- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
@@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 ssa_uniform_vector_p (tree op)
 {
   if (TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST
       || TREE_CODE (op) == CONSTRUCTOR)
     return uniform_vector_p (op);

VEC_DUPLICATE_EXPR handling?  Looks like for VEC_DUPLICATE_CST
it could directly return true.

I didn't see uniform_vector_p being updated?

Can you add verification to either verify_expr or build_vec_duplicate_cst
that the type is one of variable size?  And amend tree.def docs
accordingly.  Because otherwise we miss a lot of cases in constant
folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).

Otherwise looks ok to me.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
> (VEC_COND_EXPR): Add missing @tindex.
> * doc/md.texi (vec_duplicate@var{m}): Document.
> * tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
> * tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
> are used for VEC_DUPLICATE_CST as well.
> (tree_vector): Access base.n.nelts directly.
> * tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
> valid codes.
> (VEC_DUPLICATE_CST_ELT): New macro.
> (build_vec_duplicate_cst): Declare.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (integer_zerop, integer_onep, integer_all_onesp, integer_truep)
> (real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
> (walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
> (build_vec_duplicate_cst): New function.
> (uniform_vector_p): Handle the new codes.
> (test_vec_duplicate_predicates_int): New function.
> (test_vec_duplicate_predicates_float): Likewise.
> (test_vec_duplicate_predicates): Likewise.
> (tree_c_tests): Call test_vec_duplicate_predicates.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-isl-ast-to-gimple.c
> (translate_isl_ast_to_gimple::is_constant): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * match.pd (negate_expr_p): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-chkp.c (chkp_find_bounds_1): Likewise.
> * tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1

Re: [08/nn] Add a fixed_size_mode class

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
 wrote:
> This patch adds a fixed_size_mode machine_mode wrapper
> for modes that are known to have a fixed size.  That applies
> to all current modes, but future patches will add support for
> variable-sized modes.
>
> The use of this class should be pretty restricted.  One important
> use case is to hold the mode of static data, which can never be
> variable-sized with current file formats.  Another is to hold
> the modes of registers involved in __builtin_apply and
> __builtin_result, since those interfaces don't cope well with
> variable-sized data.
>
> The class can also be useful when reinterpreting the contents of
> a fixed-length bit string as a different kind of value.

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * machmode.h (fixed_size_mode): New class.
> * rtl.h (get_pool_mode): Return fixed_size_mode.
> * gengtype.c (main): Add fixed_size_mode.
> * target.def (get_raw_result_mode): Return a fixed_size_mode.
> (get_raw_arg_mode): Likewise.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_get_reg_raw_mode): Return a fixed_size_mode.
> * targhooks.c (default_get_reg_raw_mode): Likewise.
> * config/ia64/ia64.c (ia64_get_reg_raw_mode): Likewise.
> * config/mips/mips.c (mips_get_reg_raw_mode): Likewise.
> * config/msp430/msp430.c (msp430_get_raw_arg_mode): Likewise.
> (msp430_get_raw_result_mode): Likewise.
> * config/avr/avr-protos.h (regmask): Use as_a 
> * dbxout.c (dbxout_parms): Require fixed-size modes.
> * expr.c (copy_blkmode_from_reg, copy_blkmode_to_reg): Likewise.
> * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
> * omp-low.c (lower_oacc_reductions): Likewise.
> * simplify-rtx.c (simplify_immed_subreg): Take fixed_size_modes.
> (simplify_subreg): Update accordingly.
> * varasm.c (constant_descriptor_rtx::mode): Change to fixed_size_mode.
> (force_const_mem): Update accordingly.  Return NULL_RTX for modes
> that aren't fixed-size.
> (get_pool_mode): Return a fixed_size_mode.
> (output_constant_pool_2): Take a fixed_size_mode.
>
> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h  2017-09-15 14:47:33.184331588 +0100
> +++ gcc/machmode.h  2017-10-23 11:42:52.014721093 +0100
> @@ -652,6 +652,39 @@ GET_MODE_2XWIDER_MODE (const T &m)
>  extern const unsigned char mode_complex[NUM_MACHINE_MODES];
>  #define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
>
> +/* Represents a machine mode that must have a fixed size.  The main
> +   use of this class is to represent the modes of objects that always
> +   have static storage duration, such as constant pool entries.
> +   (No current target supports the concept of variable-size static data.)  */
> +class fixed_size_mode
> +{
> +public:
> +  typedef mode_traits::from_int from_int;
> +
> +  ALWAYS_INLINE fixed_size_mode () {}
> +  ALWAYS_INLINE fixed_size_mode (from_int m) : m_mode (machine_mode (m)) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_int_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_float_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_mode_pod &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_int_mode_pod &m) : m_mode (m) 
> {}
> +  ALWAYS_INLINE fixed_size_mode (const complex_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE operator machine_mode () const { return m_mode; }
> +
> +  static bool includes_p (machine_mode);
> +
> +protected:
> +  machine_mode m_mode;
> +};
> +
> +/* Return true if MODE has a fixed size.  */
> +
> +inline bool
> +fixed_size_mode::includes_p (machine_mode)
> +{
> +  return true;
> +}
> +
>  extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
>
>  /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
> Index: gcc/rtl.h
> ===
> --- gcc/rtl.h   2017-10-23 11:42:47.297720974 +0100
> +++ gcc/rtl.h   2017-10-23 11:42:52.015721094 +0100
> @@ -3020,7 +3020,7 @@ extern rtx force_const_mem (machine_mode
>  struct function;
>  extern rtx get_pool_constant (const_rtx);
>  extern rtx get_pool_constant_mark (rtx, bool *);
> -extern machine_mode get_pool_mode (const_rtx);
> +extern fixed_size_mode get_pool_mode (const_rtx);
>  extern rtx simplify_subtraction (rtx);
>  extern void decide_function_section (tree);
>
> Index: gcc/gengtype.c
> ===
> --- gcc/gengtype.c  2017-05-23 19:29:56.919436344 +0100
> +++ gcc/gengtype.c  2017-10-23 11:42:52.014721093 +0100
> @@ -5197,6 +5197,7 @@ #def

Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
 wrote:
> This patch adds a POD version of fixed_size_mode.  The only current use
> is for storing the __builtin_apply and __builtin_result register modes,
> which were made fixed_size_modes by the previous patch.
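For illustration, the point of the POD variant is that it has no user-declared
constructors, so arrays of it inside the target_builtins global need no dynamic
initialisation.  A rough sketch (not the exact pod_mode template from machmode.h):

   /* Sketch only: trivially constructible wrapper around a machine_mode.  */
   struct fixed_size_mode_pod_sketch
   {
     machine_mode m_mode;
     ALWAYS_INLINE operator machine_mode () const { return m_mode; }
   };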

Bah - can we update our host compiler to C++11/14 please ...?
(maybe requiring that build with GCC 4.8 as host compiler works,
GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).

Ok.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * coretypes.h (fixed_size_mode): Declare.
> (fixed_size_mode_pod): New typedef.
> * builtins.h (target_builtins::x_apply_args_mode)
> (target_builtins::x_apply_result_mode): Change type to
> fixed_size_mode_pod.
> * builtins.c (apply_args_size, apply_result_size, result_vector)
> (expand_builtin_apply_args_1, expand_builtin_apply)
> (expand_builtin_return): Update accordingly.
>
> Index: gcc/coretypes.h
> ===
> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>  class scalar_int_mode;
>  class scalar_float_mode;
>  class complex_mode;
> +class fixed_size_mode;
>  template<typename T> class opt_mode;
>  typedef opt_mode<scalar_mode> opt_scalar_mode;
>  typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
> @@ -66,6 +67,7 @@ typedef opt_mode opt_
>  template<typename T> class pod_mode;
>  typedef pod_mode<scalar_mode> scalar_mode_pod;
>  typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>
>  /* Subclasses of rtx_def, using indentation to show the class
> hierarchy, along with the relevant invariant.
> Index: gcc/builtins.h
> ===
> --- gcc/builtins.h  2017-08-30 12:18:46.602740973 +0100
> +++ gcc/builtins.h  2017-10-23 11:42:57.592545063 +0100
> @@ -29,14 +29,14 @@ struct target_builtins {
>   the register is not used for calling a function.  If the machine
>   has register windows, this gives only the outbound registers.
>   INCOMING_REGNO gives the corresponding inbound register.  */
> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>
>/* For each register that may be used for returning values, this gives
>   a mode used to copy the register's value.  VOIDmode indicates the
>   register is not used for returning values.  If the machine has
>   register windows, this gives only the outbound registers.
>   INCOMING_REGNO gives the corresponding inbound register.  */
> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>  };
>
>  extern struct target_builtins default_target_builtins;
> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c  2017-10-23 11:41:23.140260335 +0100
> +++ gcc/builtins.c  2017-10-23 11:42:57.592545063 +0100
> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>static int size = -1;
>int align;
>unsigned int regno;
> -  machine_mode mode;
>
>/* The values computed by this function never change.  */
>if (size < 0)
> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> if (FUNCTION_ARG_REGNO_P (regno))
>   {
> -   mode = targetm.calls.get_raw_arg_mode (regno);
> +   fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>
> gcc_assert (mode != VOIDmode);
>
> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>   }
> else
>   {
> -   apply_args_mode[regno] = VOIDmode;
> +   apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>   }
>  }
>return size;
> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>  {
>static int size = -1;
>int align, regno;
> -  machine_mode mode;
>
>/* The values computed by this function never change.  */
>if (size < 0)
> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> if (targetm.calls.function_value_regno_p (regno))
>   {
> -   mode = targetm.calls.get_raw_result_mode (regno);
> +   fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>
> gcc_assert (mode != VOIDmode);
>
> @@ -1421,7 +1419,7 @@ apply_result_size (void)
> apply_result_mode[regno] = mode;
>   }
> else
> - apply_result_mode[regno] = VOIDmode;
> + apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>
>/* Allow targets that use untyped_call and untyped_return to override
>  the size so that machine-specific information can be stored

Re: [12/nn] Add an is_narrower_int_mode helper function

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:24 PM, Richard Sandiford
 wrote:
> This patch adds a function for testing whether an arbitrary mode X
> is an integer mode that is narrower than integer mode Y.  This is
> useful for code like expand_float and expand_fix that could in
> principle handle vectors as well as scalars.
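For example, with the new helper (illustrative calls, not taken from the patch):

   is_narrower_int_mode (QImode, SImode);   /* true: 8-bit scalar integer  */
   is_narrower_int_mode (SFmode, SImode);   /* false: not a scalar integer */
   is_narrower_int_mode (V4SImode, SImode); /* false: vector modes are
                                               rejected rather than being
                                               compared by size */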

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * machmode.h (is_narrower_int_mode): New function
> * optabs.c (expand_float, expand_fix): Use it.
> * dwarf2out.c (rotate_loc_descriptor): Likewise.
>
> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h  2017-10-23 11:44:06.561720156 +0100
> +++ gcc/machmode.h  2017-10-23 11:44:23.979432614 +0100
> @@ -893,6 +893,17 @@ is_complex_float_mode (machine_mode mode
>return false;
>  }
>
> +/* Return true if MODE is a scalar integer mode with a precision
> +   smaller than LIMIT's precision.  */
> +
> +inline bool
> +is_narrower_int_mode (machine_mode mode, scalar_int_mode limit)
> +{
> +  scalar_int_mode int_mode;
> +  return (is_a <scalar_int_mode> (mode, &int_mode)
> + && GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (limit));
> +}
> +
>  namespace mode_iterator
>  {
>/* Start mode iterator *ITER at the first mode in class MCLASS, if any.  */
> Index: gcc/optabs.c
> ===
> --- gcc/optabs.c2017-10-23 11:44:07.732431531 +0100
> +++ gcc/optabs.c2017-10-23 11:44:23.980398548 +0100
> @@ -4820,7 +4820,7 @@ expand_float (rtx to, rtx from, int unsi
>rtx value;
>convert_optab tab = unsignedp ? ufloat_optab : sfloat_optab;
>
> -  if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION (SImode))
> +  if (is_narrower_int_mode (GET_MODE (from), SImode))
> from = convert_to_mode (SImode, from, unsignedp);
>
>libfunc = convert_optab_libfunc (tab, GET_MODE (to), GET_MODE (from));
> @@ -5002,7 +5002,7 @@ expand_fix (rtx to, rtx from, int unsign
>   that the mode of TO is at least as wide as SImode, since those are the
>   only library calls we know about.  */
>
> -  if (GET_MODE_PRECISION (GET_MODE (to)) < GET_MODE_PRECISION (SImode))
> +  if (is_narrower_int_mode (GET_MODE (to), SImode))
>  {
>target = gen_reg_rtx (SImode);
>
> Index: gcc/dwarf2out.c
> ===
> --- gcc/dwarf2out.c 2017-10-23 11:44:05.684652559 +0100
> +++ gcc/dwarf2out.c 2017-10-23 11:44:23.979432614 +0100
> @@ -14530,8 +14530,7 @@ rotate_loc_descriptor (rtx rtl, scalar_i
>dw_loc_descr_ref op0, op1, ret, mask[2] = { NULL, NULL };
>int i;
>
> -  if (GET_MODE (rtlop1) != VOIDmode
> -  && GET_MODE_BITSIZE (GET_MODE (rtlop1)) < GET_MODE_BITSIZE (mode))
> +  if (is_narrower_int_mode (GET_MODE (rtlop1), mode))
>  rtlop1 = gen_rtx_ZERO_EXTEND (mode, rtlop1);
>op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
> VAR_INIT_STATUS_INITIALIZED);


Re: [13/nn] More is_a <scalar_int_mode>

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
 wrote:
> alias.c:find_base_term and find_base_value checked:
>
>   if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
>
> but (a) comparing the precision seems more correct, since it's possible
> for modes to have the same memory size as Pmode but fewer bits and
> (b) the functions are called on arbitrary rtl, so there's no guarantee
> that we're handling an integer truncation.
>
> Since there's no point processing truncations of anything other than an
> integer, this patch checks that first.
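As a concrete (hypothetical) illustration: on a target whose Pmode is the
32-bit SImode, a partial mode such as PSImode can occupy the same 4 bytes
while carrying fewer significant bits, so

   GET_MODE_SIZE (PSImode) < GET_MODE_SIZE (Pmode)           /* 4 < 4: false */
   GET_MODE_PRECISION (PSImode) < GET_MODE_PRECISION (Pmode) /* true */

and only the precision test recognises the operation as a real narrowing.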

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * alias.c (find_base_value, find_base_term): Only process integer
> truncations.  Check the precision rather than the size.
>
> Index: gcc/alias.c
> ===
> --- gcc/alias.c 2017-10-23 11:41:25.511925516 +0100
> +++ gcc/alias.c 2017-10-23 11:44:27.544693078 +0100
> @@ -1349,6 +1349,7 @@ known_base_value_p (rtx x)
>  find_base_value (rtx src)
>  {
>unsigned int regno;
> +  scalar_int_mode int_mode;
>
>  #if defined (FIND_BASE_TERM)
>/* Try machine-dependent ways to find the base term.  */
> @@ -1475,7 +1476,8 @@ find_base_value (rtx src)
>  address modes depending on the address space.  */
>if (!target_default_pointer_address_modes_p ())
> break;
> -  if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
> +  if (!is_a <scalar_int_mode> (GET_MODE (src), &int_mode)
> + || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
> break;
>/* Fall through.  */
>  case HIGH:
> @@ -1876,6 +1878,7 @@ find_base_term (rtx x)
>cselib_val *val;
>struct elt_loc_list *l, *f;
>rtx ret;
> +  scalar_int_mode int_mode;
>
>  #if defined (FIND_BASE_TERM)
>/* Try machine-dependent ways to find the base term.  */
> @@ -1893,7 +1896,8 @@ find_base_term (rtx x)
>  address modes depending on the address space.  */
>if (!target_default_pointer_address_modes_p ())
> return 0;
> -  if (GET_MODE_SIZE (GET_MODE (x)) < GET_MODE_SIZE (Pmode))
> +  if (!is_a <scalar_int_mode> (GET_MODE (x), &int_mode)
> + || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
> return 0;
>/* Fall through.  */
>  case HIGH:


Re: [PATCH 8/N][RFC] GCOV: support multiple functions per a line

2017-10-26 Thread Nathan Sidwell

On 10/26/2017 04:44 AM, Martin Liška wrote:

Hi.

As mentioned in the cover letter, this patch was the main motivation for the
whole series.
Currently we have a list of lines (source_info::lines) per source file.  That
changes in the patch; now each function has:
map> source_lines;
That is, separate lines for the function in each source file the function
lives in.
Having a group of functions starting on a line, we print first a summary and
then each individual function:


This looks great





--
Nathan Sidwell


Re: [14/nn] Add helpers for shift count modes

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
 wrote:
> This patch adds a stub helper routine to provide the mode
> of a scalar shift amount, given the mode of the values
> being shifted.
>
> One long-standing problem has been to decide what this mode
> should be for arbitrary rtxes (as opposed to those directly
> tied to a target pattern).  Is it the mode of the shifted
> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
> the corresponding target pattern says?  (In which case what
> should the mode be when the target doesn't have a pattern?)
>
> For now the patch picks word_mode, which should be safe on
> all targets but could perhaps become suboptimal if the helper
> routine is used more often than it is in this patch.  As it
> stands the patch does not change the generated code.
>
> The patch also adds a helper function that constructs rtxes
> for constant shift amounts, again given the mode of the value
> being shifted.  As well as helping with the SVE patches, this
> is one step towards allowing CONST_INTs to have a real mode.
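A typical call-site change then looks like this (sketch with made-up variable
names, not a hunk from the patch):

   /* Before: a modeless CONST_INT.  */
   rtx amount = GEN_INT (shift);
   /* After: the helper chooses a mode based on the mode being shifted.  */
   rtx amount = gen_int_shift_amount (int_mode, shift);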

I think get_shift_amount_mode is flawed and while encapsulating
constant shift amount RTX generation into a gen_int_shift_amount
looks good to me I'd rather have that ??? in this function (and
I'd use the mode of the RTX shifted, not word_mode...).

In the end it's up to insn recognizing to convert the op to the
expected mode and for generic RTL it's us that should decide
on the mode -- on GENERIC the shift amount has to be an
integer so why not simply use a mode that is large enough to
make the constant fit?

Just throwing in some comments here, RTL isn't my primary
expertise.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * target.h (get_shift_amount_mode): New function.
> * emit-rtl.h (gen_int_shift_amount): Declare.
> * emit-rtl.c (gen_int_shift_amount): New function.
> * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
> instead of GEN_INT.
> * calls.c (shift_return_value): Likewise.
> * cse.c (fold_rtx): Likewise.
> * dse.c (find_shift_sequence): Likewise.
> * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
> (expand_shift, expand_smod_pow2): Likewise.
> * lower-subreg.c (shift_cost): Likewise.
> * simplify-rtx.c (simplify_unary_operation_1): Likewise.
> (simplify_binary_operation_1): Likewise.
> * combine.c (try_combine, find_split_point, force_int_to_mode)
> (simplify_shift_const_1, simplify_shift_const): Likewise.
> (change_zero_ext): Likewise.  Use simplify_gen_binary.
> * optabs.c (expand_superword_shift, expand_doubleword_mult)
> (expand_unop): Use gen_int_shift_amount instead of GEN_INT.
> (expand_binop): Likewise.  Use get_shift_amount_mode instead
> of word_mode as the mode of a CONST_INT shift amount.
> (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
> Use gen_int_shift_amount instead of GEN_INT.
> (expand_vec_perm): Update caller accordingly.  Use
> gen_int_shift_amount instead of GEN_INT.
>
> Index: gcc/target.h
> ===
> --- gcc/target.h2017-10-23 11:47:06.643477568 +0100
> +++ gcc/target.h2017-10-23 11:47:11.277288162 +0100
> @@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
>
>  extern struct gcc_target targetm;
>
> +/* Return the mode that should be used to hold a scalar shift amount
> +   when shifting values of the given mode.  */
> +/* ??? This could in principle be generated automatically from the .md
> +   shift patterns, but for now word_mode should be universally OK.  */
> +
> +inline scalar_int_mode
> +get_shift_amount_mode (machine_mode)
> +{
> +  return word_mode;
> +}
> +
>  #ifdef GCC_TM_H
>
>  #ifndef CUMULATIVE_ARGS_MAGIC
> Index: gcc/emit-rtl.h
> ===
> --- gcc/emit-rtl.h  2017-10-23 11:47:06.643477568 +0100
> +++ gcc/emit-rtl.h  2017-10-23 11:47:11.274393237 +0100
> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>  extern void adjust_reg_mode (rtx, machine_mode);
>  extern int mem_expr_equal_p (const_tree, const_tree);
> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>
>  extern bool need_atomic_barrier_p (enum memmodel, bool);
>
> Index: gcc/emit-rtl.c
> ===
> --- gcc/emit-rtl.c  2017-10-23 11:47:06.643477568 +0100
> +++ gcc/emit-rtl.c  2017-10-23 11:47:11.273428262 +0100
> @@ -6478,6 +6478,15 @@ need_atomic_barrier_p (enum memmodel mod
>  }
>  }
>
> +/* Return a constant shift amount for shifting a value of mode MODE
> +   by VALUE bits.  */
> +
> +rtx
> +gen_int_shift_amount (machine_mode mode, HOST_W

Re: [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:27 PM, Richard Sandiford
 wrote:
> This avoids the double evaluation mentioned in the comments and
> simplifies the change to make MEM_OFFSET variable.

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * var-tracking.c (INT_MEM_OFFSET): Replace with...
> (int_mem_offset): ...this new function.
> (var_mem_set, var_mem_delete_and_set, var_mem_delete)
> (find_mem_expr_in_1pdv, dataflow_set_preserve_mem_locs)
> (same_variable_part_p, use_type, add_stores, vt_get_decl_and_offset):
> Update accordingly.
>
> Index: gcc/var-tracking.c
> ===
> --- gcc/var-tracking.c  2017-09-12 14:28:56.401824826 +0100
> +++ gcc/var-tracking.c  2017-10-23 11:47:27.197231712 +0100
> @@ -390,8 +390,15 @@ struct variable
>  /* Pointer to the BB's information specific to variable tracking pass.  */
>  #define VTI(BB) ((variable_tracking_info *) (BB)->aux)
>
> -/* Macro to access MEM_OFFSET as an HOST_WIDE_INT.  Evaluates MEM twice.  */
> -#define INT_MEM_OFFSET(mem) (MEM_OFFSET_KNOWN_P (mem) ? MEM_OFFSET (mem) : 0)
> +/* Return MEM_OFFSET (MEM) as a HOST_WIDE_INT, or 0 if we can't.  */
> +
> +static inline HOST_WIDE_INT
> +int_mem_offset (const_rtx mem)
> +{
> +  if (MEM_OFFSET_KNOWN_P (mem))
> +return MEM_OFFSET (mem);
> +  return 0;
> +}
>
>  #if CHECKING_P && (GCC_VERSION >= 2007)
>
> @@ -2336,7 +2343,7 @@ var_mem_set (dataflow_set *set, rtx loc,
>  rtx set_src)
>  {
>tree decl = MEM_EXPR (loc);
> -  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> +  HOST_WIDE_INT offset = int_mem_offset (loc);
>
>var_mem_decl_set (set, loc, initialized,
> dv_from_decl (decl), offset, set_src, INSERT);
> @@ -2354,7 +2361,7 @@ var_mem_delete_and_set (dataflow_set *se
> enum var_init_status initialized, rtx set_src)
>  {
>tree decl = MEM_EXPR (loc);
> -  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> +  HOST_WIDE_INT offset = int_mem_offset (loc);
>
>clobber_overlapping_mems (set, loc);
>decl = var_debug_decl (decl);
> @@ -2375,7 +2382,7 @@ var_mem_delete_and_set (dataflow_set *se
>  var_mem_delete (dataflow_set *set, rtx loc, bool clobber)
>  {
>tree decl = MEM_EXPR (loc);
> -  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> +  HOST_WIDE_INT offset = int_mem_offset (loc);
>
>clobber_overlapping_mems (set, loc);
>decl = var_debug_decl (decl);
> @@ -4618,7 +4625,7 @@ find_mem_expr_in_1pdv (tree expr, rtx va
>for (node = var->var_part[0].loc_chain; node; node = node->next)
>  if (MEM_P (node->loc)
> && MEM_EXPR (node->loc) == expr
> -   && INT_MEM_OFFSET (node->loc) == 0)
> +   && int_mem_offset (node->loc) == 0)
>{
> where = node;
> break;
> @@ -4683,7 +4690,7 @@ dataflow_set_preserve_mem_locs (variable
>   /* We want to remove dying MEMs that don't refer to DECL.  */
>   if (GET_CODE (loc->loc) == MEM
>   && (MEM_EXPR (loc->loc) != decl
> - || INT_MEM_OFFSET (loc->loc) != 0)
> + || int_mem_offset (loc->loc) != 0)
>   && mem_dies_at_call (loc->loc))
> break;
>   /* We want to move here MEMs that do refer to DECL.  */
> @@ -4727,7 +4734,7 @@ dataflow_set_preserve_mem_locs (variable
>
>   if (GET_CODE (loc->loc) != MEM
>   || (MEM_EXPR (loc->loc) == decl
> - && INT_MEM_OFFSET (loc->loc) == 0)
> + && int_mem_offset (loc->loc) == 0)
>   || !mem_dies_at_call (loc->loc))
> {
>   if (old_loc != loc->loc && emit_notes)
> @@ -5254,7 +5261,7 @@ same_variable_part_p (rtx loc, tree expr
>else if (MEM_P (loc))
>  {
>expr2 = MEM_EXPR (loc);
> -  offset2 = INT_MEM_OFFSET (loc);
> +  offset2 = int_mem_offset (loc);
>  }
>else
>  return false;
> @@ -5522,7 +5529,7 @@ use_type (rtx loc, struct count_use_info
> return MO_CLOBBER;
>else if (target_for_debug_bind (var_debug_decl (expr)))
> return MO_CLOBBER;
> -  else if (track_loc_p (loc, expr, INT_MEM_OFFSET (loc),
> +  else if (track_loc_p (loc, expr, int_mem_offset (loc),
> false, modep, NULL)
>/* Multi-part variables shouldn't refer to one-part
>   variable names such as VALUEs (never happens) or
> @@ -6017,7 +6024,7 @@ add_stores (rtx loc, const_rtx expr, voi
>   rtx xexpr = gen_rtx_SET (loc, src);
>   if (same_variable_part_p (SET_SRC (xexpr),
> MEM_EXPR (loc),
> -   INT_MEM_OFFSET (loc)))
> +   int_mem_offset (loc)))
> mo.type = MO_COPY;
>   else
> mo.type = MO_S

Re: [16/nn] Factor out the mode handling in lower-subreg.c

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:27 PM, Richard Sandiford
 wrote:
> This patch adds a helper routine (interesting_mode_p) to lower-subreg.c,
> to make the decision about whether a mode can be split and, if so,
> calculate the number of bytes and words in the mode.  At present this
> function always returns true; a later patch will add cases in which it
> can return false.
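A sketch of how a later patch might make it return false, assuming the
poly_int-style GET_MODE_SIZE used elsewhere in the series (not part of this
patch):

   static inline bool
   interesting_mode_p (machine_mode mode, unsigned int *bytes,
                       unsigned int *words)
   {
     /* Variable-length modes cannot be split into word-sized pieces.  */
     if (!GET_MODE_SIZE (mode).is_constant (bytes))
       return false;
     *words = CEIL (*bytes, UNITS_PER_WORD);
     return true;
   }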

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * lower-subreg.c (interesting_mode_p): New function.
> (compute_costs, find_decomposable_subregs, decompose_register)
> (simplify_subreg_concatn, can_decompose_p, resolve_simple_move)
> (resolve_clobber, dump_choices): Use it.
>
> Index: gcc/lower-subreg.c
> ===
> --- gcc/lower-subreg.c  2017-10-23 11:47:11.274393237 +0100
> +++ gcc/lower-subreg.c  2017-10-23 11:47:23.555013148 +0100
> @@ -103,6 +103,18 @@ #define twice_word_mode \
>  #define choices \
>this_target_lower_subreg->x_choices
>
> +/* Return true if MODE is a mode we know how to lower.  When returning true,
> +   store its byte size in *BYTES and its word size in *WORDS.  */
> +
> +static inline bool
> +interesting_mode_p (machine_mode mode, unsigned int *bytes,
> +   unsigned int *words)
> +{
> +  *bytes = GET_MODE_SIZE (mode);
> +  *words = CEIL (*bytes, UNITS_PER_WORD);
> +  return true;
> +}
> +
>  /* RTXes used while computing costs.  */
>  struct cost_rtxes {
>/* Source and target registers.  */
> @@ -199,10 +211,10 @@ compute_costs (bool speed_p, struct cost
>for (i = 0; i < MAX_MACHINE_MODE; i++)
>  {
>machine_mode mode = (machine_mode) i;
> -  int factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
> -  if (factor > 1)
> +  unsigned int size, factor;
> +  if (interesting_mode_p (mode, &size, &factor) && factor > 1)
> {
> - int mode_move_cost;
> + unsigned int mode_move_cost;
>
>   PUT_MODE (rtxes->target, mode);
>   PUT_MODE (rtxes->source, mode);
> @@ -469,10 +481,10 @@ find_decomposable_subregs (rtx *loc, enu
>   continue;
> }
>
> - outer_size = GET_MODE_SIZE (GET_MODE (x));
> - inner_size = GET_MODE_SIZE (GET_MODE (inner));
> - outer_words = (outer_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> - inner_words = (inner_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> + if (!interesting_mode_p (GET_MODE (x), &outer_size, &outer_words)
> + || !interesting_mode_p (GET_MODE (inner), &inner_size,
> + &inner_words))
> +   continue;
>
>   /* We only try to decompose single word subregs of multi-word
>  registers.  When we find one, we return -1 to avoid iterating
> @@ -507,7 +519,7 @@ find_decomposable_subregs (rtx *loc, enu
> }
>else if (REG_P (x))
> {
> - unsigned int regno;
> + unsigned int regno, size, words;
>
>   /* We will see an outer SUBREG before we see the inner REG, so
>  when we see a plain REG here it means a direct reference to
> @@ -527,7 +539,8 @@ find_decomposable_subregs (rtx *loc, enu
>
>   regno = REGNO (x);
>   if (!HARD_REGISTER_NUM_P (regno)
> - && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
> + && interesting_mode_p (GET_MODE (x), &size, &words)
> + && words > 1)
> {
>   switch (*pcmi)
> {
> @@ -567,15 +580,15 @@ find_decomposable_subregs (rtx *loc, enu
>  decompose_register (unsigned int regno)
>  {
>rtx reg;
> -  unsigned int words, i;
> +  unsigned int size, words, i;
>rtvec v;
>
>reg = regno_reg_rtx[regno];
>
>regno_reg_rtx[regno] = NULL_RTX;
>
> -  words = GET_MODE_SIZE (GET_MODE (reg));
> -  words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> +  if (!interesting_mode_p (GET_MODE (reg), &size, &words))
> +gcc_unreachable ();
>
>v = rtvec_alloc (words);
>for (i = 0; i < words; ++i)
> @@ -599,25 +612,29 @@ decompose_register (unsigned int regno)
>  simplify_subreg_concatn (machine_mode outermode, rtx op,
>  unsigned int byte)
>  {
> -  unsigned int inner_size;
> +  unsigned int outer_size, outer_words, inner_size, inner_words;
>machine_mode innermode, partmode;
>rtx part;
>unsigned int final_offset;
>
> +  innermode = GET_MODE (op);
> +  if (!interesting_mode_p (outermode, &outer_size, &outer_words)
> +  || !interesting_mode_p (innermode, &inner_size, &inner_words))
> +gcc_unreachable ();
> +
>gcc_assert (GET_CODE (op) == CONCATN);
> -  gcc_assert (byte % GET_MODE_SIZE (outermode) == 0);
> +  gcc_assert (byte % outer_size == 0);
>
> -  innermode = GET_MODE (op);
> -  gcc_assert (byte < GET_MODE_SIZE (innermode));
> -  if (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))
> +  

Re: [15/nn] Use more specific hash functions in rtlhash.c

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:26 PM, Richard Sandiford
 wrote:
> Avoid using add_object when we have more specific routines available.

Ok.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * rtlhash.c (add_rtx): Use add_hwi for 'w' and add_int for 'i'.
>
> Index: gcc/rtlhash.c
> ===
> --- gcc/rtlhash.c   2017-02-23 19:54:03.0 +
> +++ gcc/rtlhash.c   2017-10-23 11:47:20.120201389 +0100
> @@ -77,11 +77,11 @@ add_rtx (const_rtx x, hash &hstate)
>  switch (fmt[i])
>{
>case 'w':
> -   hstate.add_object (XWINT (x, i));
> +   hstate.add_hwi (XWINT (x, i));
> break;
>case 'n':
>case 'i':
> -   hstate.add_object (XINT (x, i));
> +   hstate.add_int (XINT (x, i));
> break;
>case 'V':
>case 'E':


Re: [19/nn] Don't treat zero-sized ranges as overlapping

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:29 PM, Richard Sandiford
 wrote:
> Most GCC ranges seem to be represented as an offset and a size (rather
> than a start and inclusive end or start and exclusive end).  The usual
> test for whether X is in a range is of course:
>
>   x >= start && x < start + size
> or:
>   x >= start && x - start < size
>
> which means that an empty range of size 0 contains nothing.  But other
> range tests aren't as obvious.
>
> The usual test for whether one range is contained within another
> range is:
>
>   start1 >= start2 && start1 + size1 <= start2 + size2
>
> while the test for whether two ranges overlap (from ranges_overlap_p) is:
>
>  (start1 >= start2 && start1 < start2 + size2)
>   || (start2 >= start1 && start2 < start1 + size1)
>
> i.e. the ranges overlap if one range contains the start of the other
> range.  This leads to strange results like:
>
>   (start X, size 0) is a subrange of (start X, size 0) but
>   (start X, size 0) does not overlap (start X, size 0)
>
> Similarly:
>
>   (start 4, size 0) is a subrange of (start 2, size 2) but
>   (start 4, size 0) does not overlap (start 2, size 2)
>
> It seems like "X is a subrange of Y" should imply "X overlaps Y".
>
> This becomes harder to ignore with the runtime sizes and offsets
> added for SVE.  The most obvious fix seemed to be to say that
> an empty range does not overlap anything, and is therefore not
> a subrange of anything.
>
> Using the new definition of subranges didn't seem to cause any
> codegen differences in the testsuite.  But there was one change
> with the new definition of overlapping ranges.  strncpy-chk.c has:
>
>   memset (dst, 0, sizeof (dst));
>   if (strncpy (dst, src, 0) != dst || strcmp (dst, ""))
> abort();
>
> The strncpy is detected as a zero-size write, and so with the new
> definition of overlapping ranges, we treat the strncpy as having
> no effect on the strcmp (which is true).  The reaching definition
> is the memset instead.
>
> This patch makes ranges_overlap_p return false for zero-sized
> ranges, even if the other range has an unknown size.
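With that rule the examples above come out as (illustrative calls):

   ranges_overlap_p (4, 0, 2, 2);   /* false: an empty range overlaps nothing */
   ranges_overlap_p (4, 0, 4, 0);   /* false: even at the same position */
   ranges_overlap_p (2, 2, 3, 4);   /* true: [2, 4) and [3, 7) share byte 3 */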

Ok.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  
>
> gcc/
> * tree-ssa-alias.h (ranges_overlap_p): Return false if either
> range is known to be empty.
>
> Index: gcc/tree-ssa-alias.h
> ===
> --- gcc/tree-ssa-alias.h2017-03-28 16:19:22.0 +0100
> +++ gcc/tree-ssa-alias.h2017-10-23 11:47:38.181155696 +0100
> @@ -171,6 +171,8 @@ ranges_overlap_p (HOST_WIDE_INT pos1,
>   HOST_WIDE_INT pos2,
>   unsigned HOST_WIDE_INT size2)
>  {
> +  if (size1 == 0 || size2 == 0)
> +return false;
>if (pos1 >= pos2
>&& (size2 == (unsigned HOST_WIDE_INT)-1
>   || pos1 < (pos2 + (HOST_WIDE_INT) size2)))


Re: [21/nn] Minor vn_reference_lookup_3 tweak

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
 wrote:
> The repeated checks for MEM_REF made this code hard to convert to
> poly_ints as-is.  Hopefully the new structure also makes it clearer
> at a glance what the two cases are.
>
>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid repeated
> checks for MEM_REF.
>
> Index: gcc/tree-ssa-sccvn.c
> ===
> --- gcc/tree-ssa-sccvn.c2017-10-23 11:47:03.852769480 +0100
> +++ gcc/tree-ssa-sccvn.c2017-10-23 11:47:44.596155858 +0100
> @@ -2234,6 +2234,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>   || offset % BITS_PER_UNIT != 0
>   || ref->size % BITS_PER_UNIT != 0)
> return (void *)-1;
> +  at = offset / BITS_PER_UNIT;

can you move this just

>/* Extract a pointer base and an offset for the destination.  */
>lhs = gimple_call_arg (def_stmt, 0);
> @@ -2301,19 +2302,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
>
>/* The bases of the destination and the references have to agree.  */

here? Ok with that change.

Richard.

> -  if ((TREE_CODE (base) != MEM_REF
> -  && !DECL_P (base))
> - || (TREE_CODE (base) == MEM_REF
> - && (TREE_OPERAND (base, 0) != lhs
> - || !tree_fits_uhwi_p (TREE_OPERAND (base, 1
> - || (DECL_P (base)
> - && (TREE_CODE (lhs) != ADDR_EXPR
> - || TREE_OPERAND (lhs, 0) != base)))
> +  if (TREE_CODE (base) == MEM_REF)
> +   {
> + if (TREE_OPERAND (base, 0) != lhs
> + || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
> +   return (void *) -1;
> + at += tree_to_uhwi (TREE_OPERAND (base, 1));
> +   }
> +  else if (!DECL_P (base)
> +  || TREE_CODE (lhs) != ADDR_EXPR
> +  || TREE_OPERAND (lhs, 0) != base)
> return (void *)-1;
>
> -  at = offset / BITS_PER_UNIT;
> -  if (TREE_CODE (base) == MEM_REF)
> -   at += tree_to_uhwi (TREE_OPERAND (base, 1));
>/* If the access is completely outside of the memcpy destination
>  area there is no aliasing.  */
>if (lhs_offset >= at + maxsize / BITS_PER_UNIT


Re: [22/nn] Make dse.c use offset/width instead of start/end

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
 wrote:
> store_info and read_info_type in dse.c represented the ranges as
> start/end, but a lot of the internal code used offset/width instead.
> Using offset/width throughout fits better with the poly_int.h
> range-checking functions.
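With the new representation the dump helper prints half-open byte ranges,
e.g. (illustrative call) print_range (dump_file, 16, 8) prints "[16..24)".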

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * dse.c (store_info, read_info_type): Replace begin and end with
> offset and width.
> (print_range): New function.
> (set_all_positions_unneeded, any_positions_needed_p)
> (check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Update
> accordingly.
> (record_store): Likewise.  Optimize the case in which all positions
> are unneeded.
> (get_stored_val): Replace read_begin and read_end with read_offset
> and read_width.
> (replace_read): Update call accordingly.
>
> Index: gcc/dse.c
> ===
> --- gcc/dse.c   2017-10-23 11:47:11.273428262 +0100
> +++ gcc/dse.c   2017-10-23 11:47:48.294155952 +0100
> @@ -243,9 +243,12 @@ struct store_info
>/* Canonized MEM address for use by canon_true_dependence.  */
>rtx mem_addr;
>
> -  /* The offset of the first and byte before the last byte associated
> - with the operation.  */
> -  HOST_WIDE_INT begin, end;
> +  /* The offset of the first byte associated with the operation.  */
> +  HOST_WIDE_INT offset;
> +
> +  /* The number of bytes covered by the operation.  This is always exact
> + and known (rather than -1).  */
> +  HOST_WIDE_INT width;
>
>union
>  {
> @@ -261,7 +264,7 @@ struct store_info
>   bitmap bmap;
>
>   /* Number of set bits (i.e. unneeded bytes) in BITMAP.  If it is
> -equal to END - BEGIN, the whole store is unused.  */
> +equal to WIDTH, the whole store is unused.  */
>   int count;
> } large;
>  } positions_needed;
> @@ -304,10 +307,11 @@ struct read_info_type
>/* The id of the mem group of the base address.  */
>int group_id;
>
> -  /* The offset of the first and byte after the last byte associated
> - with the operation.  If begin == end == 0, the read did not have
> - a constant offset.  */
> -  int begin, end;
> +  /* The offset of the first byte associated with the operation.  */
> +  HOST_WIDE_INT offset;
> +
> +  /* The number of bytes covered by the operation, or -1 if not known.  */
> +  HOST_WIDE_INT width;
>
>/* The mem being read.  */
>rtx mem;
> @@ -586,6 +590,18 @@ static deferred_change *deferred_change_
>
>  /* The number of bits used in the global bitmaps.  */
>  static unsigned int current_position;
> +
> +/* Print offset range [OFFSET, OFFSET + WIDTH) to FILE.  */
> +
> +static void
> +print_range (FILE *file, poly_int64 offset, poly_int64 width)
> +{
> +  fprintf (file, "[");
> +  print_dec (offset, file, SIGNED);
> +  fprintf (file, "..");
> +  print_dec (offset + width, file, SIGNED);
> +  fprintf (file, ")");
> +}
>
>  
> /*
> Zeroth step.
> @@ -1212,10 +1228,9 @@ set_all_positions_unneeded (store_info *
>  {
>if (__builtin_expect (s_info->is_large, false))
>  {
> -  int pos, end = s_info->end - s_info->begin;
> -  for (pos = 0; pos < end; pos++)
> -   bitmap_set_bit (s_info->positions_needed.large.bmap, pos);
> -  s_info->positions_needed.large.count = end;
> +  bitmap_set_range (s_info->positions_needed.large.bmap,
> +   0, s_info->width);
> +  s_info->positions_needed.large.count = s_info->width;
>  }
>else
>  s_info->positions_needed.small_bitmask = HOST_WIDE_INT_0U;
> @@ -1227,8 +1242,7 @@ set_all_positions_unneeded (store_info *
>  any_positions_needed_p (store_info *s_info)
>  {
>if (__builtin_expect (s_info->is_large, false))
> -return (s_info->positions_needed.large.count
> -   < s_info->end - s_info->begin);
> +return s_info->positions_needed.large.count < s_info->width;
>else
>  return (s_info->positions_needed.small_bitmask != HOST_WIDE_INT_0U);
>  }
> @@ -1355,8 +1369,12 @@ record_store (rtx body, bb_info_t bb_inf
>set_usage_bits (group, offset, width, expr);
>
>if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, " processing const base store gid=%d[%d..%d)\n",
> -group_id, (int)offset, (int)(offset+width));
> +   {
> + fprintf (dump_file, " processing const base store gid=%d",
> +  group_id);
> + print_range (dump_file, offset, width);
> + fprintf (dump_file, "\n");
> +   }
>  }
>else
>  {
> @@ -1368,8 +1386,11 @@ record_store (rtx body, bb_info_t bb_inf
>group_id = -1;
>
>if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, " process

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Martin Jambor
Hi,

On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor  wrote:
> > Hi,
> >
> > I'd like to request comments to the patch below which aims to fix PR
> > 80689, which is an instance of a store-to-load forwarding stall on
> > x86_64 CPUs in the Image Magick benchmark, which is responsible for a
> > slow down of up to 9% compared to gcc 6, depending on options and HW
> > used.  (Actually, I have just seen 24% in one specific combination but
> > for various reasons can no longer verify it today.)
> >
> > The revision causing the regression is 237074, which increased the
> > size of the mode for copying aggregates "by pieces" to 128 bits,
> > incurring big stalls when the values being copied are also still being
> > stored in a smaller data type or if the copied values are loaded in a
> > smaller types shortly afterwards.  Such situations happen in Image
> > Magick even across calls, which means that any non-IPA flow-sensitive
> > approach would not detect them.  Therefore, the patch simply changes
> > the way we copy small BLKmode data that are simple combinations of
> > records and arrays (meaning no unions, bit-fields but also character
> > arrays are disallowed) and simply copies them one field and/or element
> > at a time.
> >
> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
> > (the structure in the benchmark has 32 bytes) but is subject to change
> > after more benchmarking and is actually zero - meaning element copying
> > never happens - on other architectures.  I believe that any
> > architecture with a store buffer can benefit but it's probably better
> > to leave it to their maintainers to find a different default value.  I
> > am not sure this is how such HW-dependant decisions should be done and
> > is the primary reason, why I am sending this RFC first.
> >
> > I have decided to implement this change at the expansion level because
> > at that point the type information is still readily available and at
> > the same time we can also handle various implicit copies, for example
> > those passing a parameter.  I found I could re-use some bits and
> > pieces of tree-SRA and so I did, creating tree-sra.h header file in
> > the process.
> >
> > I am fully aware that in the final patch the new parameter, or indeed
> > any new parameters, need to be documented.  I have skipped that
> > intentionally now and will write the documentation if feedback here is
> > generally good.
> >
> > I have bootstrapped and tested this patch on x86_64-linux, with
> > different values of the parameter and only found problems with
> > unreasonably high values leading to OOM.  I have done the same with a
> > previous version of the patch which was equivalent to the limit being
> > 64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
> > into failures of tests which assumed that structure padding was copied
> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
> > for example gcc.dg/vmx/varargs-4.c).
> >
> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
> > 8% at -O2 and -Ofast compilation levels respectively on one particular
> > new AMD CPU and by 6% and 3% on one particular old Intel machine.
> >
> > Thanks in advance for any comments,
> 
> I wonder if you can at the place you choose to hook this in elide any
> copying of padding between fields.
> 
> I'd rather have hooked such "high level" optimization in
> expand_assignment where you can be reasonably sure you're seeing an
> actual source-level construct.

I have discussed this with Honza and we eventually decided to make the
element-wise copy an alternative to emit_block_move (which uses the
larger mode for moving since GCC 7) exactly so that we handle not only
source-level assignments but also passing parameters by value and
other situations.

> 
> 35 bytes seems to be much - what is the code-size impact?

I will find out and report on that.  I need at least 32 bytes (four
long ints) to fix imagemagick, where the problematic structure is:

  typedef struct _RectangleInfo
  {
    size_t
      width,
      height;

    ssize_t
      x,
      y;
  } RectangleInfo;

...so 4 longs, no padding.  Since any aggregate of between 33 and 35
bytes needs to consist of smaller fields/elements, it seemed
reasonable to also copy them element-wise.
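For illustration, the kind of sequence that causes the stall looks roughly
like this (a made-up fragment using the structure above, not code from
Image Magick):

   RectangleInfo r, copy;
   r.width = w;               /* four separate 8-byte stores ...              */
   r.height = h;
   r.x = x0;
   r.y = y0;
   copy = r;                  /* ... then a 16-byte-at-a-time block copy: each
                                 16-byte load spans two pending store-buffer
                                 entries and cannot be forwarded, so it stalls.
                                 Copying field by field keeps load widths equal
                                 to the preceding store widths.  */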

Nevertheless, I still intend to experiment with the limit, I sent out
this RFC exactly so that I don't spend a lot of time benchmarking
something that is eventually not deemed acceptable on principle.

> 
> IIRC the reason this may be slow isn't loading in smaller types than stored
> before by the copy - the store buffers can handle this reasonably well.  It's
> solely when previous smaller stores are
> 
>   a1) not mergeabe in the store buffer
>   a2) not merged because earlier stores are already committed
> 
> and
> 
>   b) loaded afterwards as a type that would access multiple store buffe

Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>  wrote:
>> This patch adds a POD version of fixed_size_mode.  The only current use
>> is for storing the __builtin_apply and __builtin_result register modes,
>> which were made fixed_size_modes by the previous patch.
>
> Bah - can we update our host compiler to C++11/14 please ...?
> (maybe requiring that build with GCC 4.8 as host compiler works,
> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).

That'd be great :-)  It would avoid all the poly_int_pod stuff too,
and allow some clean-up of wide-int.h.

Thanks for the reviews,
Richard


>
> Ok.
>
> Thanks,
> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * coretypes.h (fixed_size_mode): Declare.
>> (fixed_size_mode_pod): New typedef.
>> * builtins.h (target_builtins::x_apply_args_mode)
>> (target_builtins::x_apply_result_mode): Change type to
>> fixed_size_mode_pod.
>> * builtins.c (apply_args_size, apply_result_size, result_vector)
>> (expand_builtin_apply_args_1, expand_builtin_apply)
>> (expand_builtin_return): Update accordingly.
>>
>> Index: gcc/coretypes.h
>> ===
>> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
>> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>>  class scalar_int_mode;
>>  class scalar_float_mode;
>>  class complex_mode;
>> +class fixed_size_mode;
>>  template<typename T> class opt_mode;
>>  typedef opt_mode<scalar_mode> opt_scalar_mode;
>>  typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
>> @@ -66,6 +67,7 @@ typedef opt_mode opt_
>>  template<typename T> class pod_mode;
>>  typedef pod_mode<scalar_mode> scalar_mode_pod;
>>  typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
>> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>>
>>  /* Subclasses of rtx_def, using indentation to show the class
>> hierarchy, along with the relevant invariant.
>> Index: gcc/builtins.h
>> ===
>> --- gcc/builtins.h  2017-08-30 12:18:46.602740973 +0100
>> +++ gcc/builtins.h  2017-10-23 11:42:57.592545063 +0100
>> @@ -29,14 +29,14 @@ struct target_builtins {
>>   the register is not used for calling a function.  If the machine
>>   has register windows, this gives only the outbound registers.
>>   INCOMING_REGNO gives the corresponding inbound register.  */
>> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>
>>/* For each register that may be used for returning values, this gives
>>   a mode used to copy the register's value.  VOIDmode indicates the
>>   register is not used for returning values.  If the machine has
>>   register windows, this gives only the outbound registers.
>>   INCOMING_REGNO gives the corresponding inbound register.  */
>> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>  };
>>
>>  extern struct target_builtins default_target_builtins;
>> Index: gcc/builtins.c
>> ===
>> --- gcc/builtins.c  2017-10-23 11:41:23.140260335 +0100
>> +++ gcc/builtins.c  2017-10-23 11:42:57.592545063 +0100
>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>>static int size = -1;
>>int align;
>>unsigned int regno;
>> -  machine_mode mode;
>>
>>/* The values computed by this function never change.  */
>>if (size < 0)
>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> if (FUNCTION_ARG_REGNO_P (regno))
>>   {
>> -   mode = targetm.calls.get_raw_arg_mode (regno);
>> +   fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>
>> gcc_assert (mode != VOIDmode);
>>
>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>>   }
>> else
>>   {
>> -   apply_args_mode[regno] = VOIDmode;
>> +   apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>   }
>>  }
>>return size;
>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>>  {
>>static int size = -1;
>>int align, regno;
>> -  machine_mode mode;
>>
>>/* The values computed by this function never change.  */
>>if (size < 0)
>> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> if (targetm.calls.function_value_regno_p (regno))
>>   {
>> -   mode = targetm.calls.get_raw_result_mode (regno);
>> +   fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>>
>> gcc_assert (mode != VOIDmode);
>>
>> @@ -1421,7 +1419,7 @@ apply_result_size (void)
>> apply_result_mode[regno]

Re: [PATCH] Fix test-suite fallout of default -Wreturn-type.

2017-10-26 Thread Martin Liška
On 10/24/2017 04:39 PM, Jason Merrill wrote:
> On 10/18/2017 08:48 AM, Martin Liška wrote:
>> This is second patch that addresses test-suite fallout. All these tests fail 
>> because -Wreturn-type is
>> now on by default.
> 
>> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C
>> -constexpr T g(T t) { return f(t); } // { dg-error "f.int" }
>> +constexpr T g(T t) { return f(t); } // { dg-error "f.int" "" { target 
>> c++14_only } }
> 
>> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-neg3.C
>> -  constexpr int bar() { return a.foo(); } // { dg-error "foo" }
>> +  constexpr int bar() { return a.foo(); } // { dg-error "foo" "" { target 
>> c++14_only } }
> 
> Why are these changes needed?  They aren't "Return a value for functions with 
> non-void return type, or change type to void, or add -Wno-return-type for 
> test."
> 
> The rest of the patch is OK.
> 
> Jason

Hi.

Sorry, I forgot to describe this change. With -std=c++11 we do:

#0  massage_constexpr_body (fun=0x76955500, body=0x76813eb8) at 
../../gcc/cp/constexpr.c:708
#1  0x0087700b in explain_invalid_constexpr_fn (fun=0x76955500) at 
../../gcc/cp/constexpr.c:896
#2  0x008799dc in cxx_eval_call_expression (ctx=0x7fffd150, 
t=0x76820118, lval=false, non_constant_p=0x7fffd1cf, 
overflow_p=0x7fffd1ce) at ../../gcc/cp/constexpr.c:1558
#3  0x008843fe in cxx_eval_constant_expression (ctx=0x7fffd150, 
t=0x76820118, lval=false, non_constant_p=0x7fffd1cf, 
overflow_p=0x7fffd1ce, jump_target=0x0) at ../../gcc/cp/constexpr.c:4069

static tree
massage_constexpr_body (tree fun, tree body)
{
  if (DECL_CONSTRUCTOR_P (fun))
body = build_constexpr_constructor_member_initializers
  (DECL_CONTEXT (fun), body);
  else if (cxx_dialect < cxx14)
{
  if (TREE_CODE (body) == EH_SPEC_BLOCK)
body = EH_SPEC_STMTS (body);
  if (TREE_CODE (body) == MUST_NOT_THROW_EXPR)
body = TREE_OPERAND (body, 0);
  body = constexpr_fn_retval (body);
}
  return body;
}

and we end up with error_mark_node, and thus potential_constant_expression_1
bails out.
That's why we don't print the later error with -std=c++11.

What should we do with that?
Thanks,
Martin


Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
 wrote:
> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
> is a tcc_constant.
>
> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
> vectors.  This avoids the need to handle combinations of VECTOR_CST
> and VEC_SERIES_CST.
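As a small illustration (hypothetical call; vectype and elt_type are assumed
to be a variable-length vector type and its element type):

   /* A constant {0, 2, 4, 6, ...}: base 0, step 2.  */
   tree series = build_vec_series (vectype,
                                   build_int_cst (elt_type, 0),
                                   build_int_cst (elt_type, 2));

For a fixed-length vector type the same constant would be represented as an
ordinary VECTOR_CST instead.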

Similar to the other patch can you document and verify that VEC_SERIES_CST
is only used on variable length vectors?

Ok with that change.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
> * doc/md.texi (vec_series@var{m}): Document.
> * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
> * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
> codes.
> (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
> (build_vec_series_cst, build_vec_series): Declare.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
> (build_vec_series_cst, build_vec_series): New functions.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
> (fold_checksum_tree): Likewise.
> (vec_series_equivalent_p): New function.
> (const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
> * expmed.c (make_tree): Handle VEC_SERIES.
> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
> (expand_expr_real_2): Handle VEC_SERIES_EXPR.
> (expand_expr_real_1): Handle VEC_SERIES_CST.
> * optabs.def (vec_series_optab): New optab.
> * optabs.h (expand_vec_series_expr): Declare.
> * optabs.c (expand_vec_series_expr): New function.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
> * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
> (verify_gimple_assign_single): Handle VEC_SERIES_CST.
> * tree-vect-generic.c (expand_vector_operations_1): Check that
> the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===
> --- gcc/doc/generic.texi2017-10-23 11:41:51.760448406 +0100
> +++ gcc/doc/generic.texi2017-10-23 11:42:34.910720660 +0100
> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
>  @tindex VEC_DUPLICATE_CST
> +@tindex VEC_SERIES_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1098,6 +1099,16 @@ instead.  The scalar element value is gi
>  @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>  element of a @code{VECTOR_CST}.
>
> +@item VEC_SERIES_CST
> +These nodes represent a vector constant in which element @var{i}
> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
> +constant @var{base} and @var{step}.  The value of @var{base} is
> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
> +given by @code{VEC_SERIES_CST_STEP}.
> +
> +These nodes are restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
>  @item STRING_CST
>  These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
>  returns the length of the string, as an @code{int}.  The
> @@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
>  @node Vectors
>  @subsection Vectors
>  @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1721,6 +1733,14 @@ a value from

Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
 wrote:
> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>  wrote:
>> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
>> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
>> is a tcc_constant.
>>
>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>> vectors.  This avoids the need to handle combinations of VECTOR_CST
>> and VEC_SERIES_CST.
>
> Similar to the other patch can you document and verify that VEC_SERIES_CST
> is only used on variable length vectors?
>
> Ok with that change.

Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
via setting step == 0?  I think you can do {1, 1, 1, 1, ...} + {1, 2, 3, 4, 5}
constant folding but you don't implement that.  Propagation can also turn
VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).

Richard.

> Thanks,
> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
>> * doc/md.texi (vec_series@var{m}): Document.
>> * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
>> * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
>> codes.
>> (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
>> (build_vec_series_cst, build_vec_series): Declare.
>> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>> (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
>> (build_vec_series_cst, build_vec_series): New functions.
>> * cfgexpand.c (expand_debug_expr): Handle the new codes.
>> * tree-pretty-print.c (dump_generic_node): Likewise.
>> * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
>> * gimple-expr.h (is_gimple_constant): Likewise.
>> * gimplify.c (gimplify_expr): Likewise.
>> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>> (func_checker::compare_operand): Likewise.
>> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>> * print-tree.c (print_node): Likewise.
>> * tree-ssa-loop.c (for_each_index): Likewise.
>> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>> (ao_ref_init_from_vn_reference): Likewise.
>> * varasm.c (const_hash_1, compare_constant): Likewise.
>> * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
>> (fold_checksum_tree): Likewise.
>> (vec_series_equivalent_p): New function.
>> (const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
>> * expmed.c (make_tree): Handle VEC_SERIES.
>> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>> * tree-inline.c (estimate_operator_cost): Likewise.
>> * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
>> (expand_expr_real_2): Handle VEC_SERIES_EXPR.
>> (expand_expr_real_1): Handle VEC_SERIES_CST.
>> * optabs.def (vec_series_optab): New optab.
>> * optabs.h (expand_vec_series_expr): Declare.
>> * optabs.c (expand_vec_series_expr): New function.
>> * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
>> * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
>> (verify_gimple_assign_single): Handle VEC_SERIES_CST.
>> * tree-vect-generic.c (expand_vector_operations_1): Check that
>> the operands also have vector type.
>>
>> Index: gcc/doc/generic.texi
>> ===
>> --- gcc/doc/generic.texi2017-10-23 11:41:51.760448406 +0100
>> +++ gcc/doc/generic.texi2017-10-23 11:42:34.910720660 +0100
>> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>>  @tindex COMPLEX_CST
>>  @tindex VECTOR_CST
>>  @tindex VEC_DUPLICATE_CST
>> +@tindex VEC_SERIES_CST
>>  @tindex STRING_CST
>>  @findex TREE_STRING_LENGTH
>>  @findex TREE_STRING_POINTER
>> @@ -1098,6 +1099,16 @@ instead.  The scalar element value is gi
>>  @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>>  element of a @code{VECTOR_CST}.
>>
>> +@item VEC_SERIES_CST
>> +These nodes represent a vector constant in which element @var{i}
>> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
>> +constant @var{base} and @var{step}.  The value of @var{base} is
>> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
>> +given by @code{VEC_SERIES_CST_STEP}.
>> +
>> +These nodes are restricted to integral types, in ord

Re: [14/nn] Add helpers for shift count modes

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
 wrote:
> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>  wrote:
>> This patch adds a stub helper routine to provide the mode
>> of a scalar shift amount, given the mode of the values
>> being shifted.
>>
>> One long-standing problem has been to decide what this mode
>> should be for arbitrary rtxes (as opposed to those directly
>> tied to a target pattern).  Is it the mode of the shifted
>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>> the corresponding target pattern says?  (In which case what
>> should the mode be when the target doesn't have a pattern?)
>>
>> For now the patch picks word_mode, which should be safe on
>> all targets but could perhaps become suboptimal if the helper
>> routine is used more often than it is in this patch.  As it
>> stands the patch does not change the generated code.
>>
>> The patch also adds a helper function that constructs rtxes
>> for constant shift amounts, again given the mode of the value
>> being shifted.  As well as helping with the SVE patches, this
>> is one step towards allowing CONST_INTs to have a real mode.
>
> I think get_shift_amount_mode is flawed and while encapsulating
> constant shift amount RTX generation into a gen_int_shift_amount
> looks good to me I'd rather have that ??? in this function (and
> I'd use the mode of the RTX shifted, not word_mode...).
>
> In the end it's up to insn recognizing to convert the op to the
> expected mode and for generic RTL it's us that should decide
> on the mode -- on GENERIC the shift amount has to be an
> integer so why not simply use a mode that is large enough to
> make the constant fit?
>
> Just throwing in some comments here, RTL isn't my primary
> expertise.

To add a little bit - shift amounts are maybe the only(?) place
where a modeless CONST_INT makes sense!  So "fixing"
that first sounds backwards.

Richard.

> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * target.h (get_shift_amount_mode): New function.
>> * emit-rtl.h (gen_int_shift_amount): Declare.
>> * emit-rtl.c (gen_int_shift_amount): New function.
>> * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>> instead of GEN_INT.
>> * calls.c (shift_return_value): Likewise.
>> * cse.c (fold_rtx): Likewise.
>> * dse.c (find_shift_sequence): Likewise.
>> * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>> (expand_shift, expand_smod_pow2): Likewise.
>> * lower-subreg.c (shift_cost): Likewise.
>> * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>> (simplify_binary_operation_1): Likewise.
>> * combine.c (try_combine, find_split_point, force_int_to_mode)
>> (simplify_shift_const_1, simplify_shift_const): Likewise.
>> (change_zero_ext): Likewise.  Use simplify_gen_binary.
>> * optabs.c (expand_superword_shift, expand_doubleword_mult)
>> (expand_unop): Use gen_int_shift_amount instead of GEN_INT.
>> (expand_binop): Likewise.  Use get_shift_amount_mode instead
>> of word_mode as the mode of a CONST_INT shift amount.
>> (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>> Use gen_int_shift_amount instead of GEN_INT.
>> (expand_vec_perm): Update caller accordingly.  Use
>> gen_int_shift_amount instead of GEN_INT.
>>
>> Index: gcc/target.h
>> ===
>> --- gcc/target.h2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/target.h2017-10-23 11:47:11.277288162 +0100
>> @@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
>>
>>  extern struct gcc_target targetm;
>>
>> +/* Return the mode that should be used to hold a scalar shift amount
>> +   when shifting values of the given mode.  */
>> +/* ??? This could in principle be generated automatically from the .md
>> +   shift patterns, but for now word_mode should be universally OK.  */
>> +
>> +inline scalar_int_mode
>> +get_shift_amount_mode (machine_mode)
>> +{
>> +  return word_mode;
>> +}
>> +
>>  #ifdef GCC_TM_H
>>
>>  #ifndef CUMULATIVE_ARGS_MAGIC
>> Index: gcc/emit-rtl.h
>> ===
>> --- gcc/emit-rtl.h  2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/emit-rtl.h  2017-10-23 11:47:11.274393237 +0100
>> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>>  extern void adjust_reg_mode (rtx, machine_mode);
>>  extern int mem_expr_equal_p (const_tree, const_tree);
>> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>>
>>  extern bool need_atomic_barrier_p (enum memmodel, bool);
>>
>> Index: gcc/emit-rtl.c
>> ===
>> --- gcc/emit-rtl.c  2017-

Re: [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:28 PM, Richard Sandiford
 wrote:
> This patch avoids some calculations of the form:
>
>   GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode)
>
> in simplify-rtx.c.  If we're dealing with CONST_VECTORs, it's better
> to use CONST_VECTOR_NUNITS, since that remains constant even after the
> SVE patches.  In other cases we can get the number from GET_MODE_NUNITS.

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
> and CONST_VECTOR_NUNITS instead of computing the number of units from
> the byte sizes of the vector and element.
> (simplify_binary_operation_1): Likewise.
> (simplify_const_binary_operation): Likewise.
> (simplify_ternary_operation): Likewise.
>
> Index: gcc/simplify-rtx.c
> ===
> --- gcc/simplify-rtx.c  2017-10-23 11:47:11.277288162 +0100
> +++ gcc/simplify-rtx.c  2017-10-23 11:47:32.868935554 +0100
> @@ -1752,18 +1752,12 @@ simplify_const_unary_operation (enum rtx
> return gen_const_vec_duplicate (mode, op);
>if (GET_CODE (op) == CONST_VECTOR)
> {
> - int elt_size = GET_MODE_UNIT_SIZE (mode);
> -  unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> - rtvec v = rtvec_alloc (n_elts);
> - unsigned int i;
> -
> - machine_mode inmode = GET_MODE (op);
> - int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
> - unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
> -
> + unsigned int n_elts = GET_MODE_NUNITS (mode);
> + unsigned int in_n_elts = CONST_VECTOR_NUNITS (op);
>   gcc_assert (in_n_elts < n_elts);
>   gcc_assert ((n_elts % in_n_elts) == 0);
> - for (i = 0; i < n_elts; i++)
> + rtvec v = rtvec_alloc (n_elts);
> + for (unsigned i = 0; i < n_elts; i++)
> RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
>   return gen_rtx_CONST_VECTOR (mode, v);
> }
> @@ -3608,9 +3602,7 @@ simplify_binary_operation_1 (enum rtx_co
>   rtx op0 = XEXP (trueop0, 0);
>   rtx op1 = XEXP (trueop0, 1);
>
> - machine_mode opmode = GET_MODE (op0);
> - int elt_size = GET_MODE_UNIT_SIZE (opmode);
> - int n_elts = GET_MODE_SIZE (opmode) / elt_size;
> + int n_elts = GET_MODE_NUNITS (GET_MODE (op0));
>
>   int i = INTVAL (XVECEXP (trueop1, 0, 0));
>   int elem;
> @@ -3637,21 +3629,8 @@ simplify_binary_operation_1 (enum rtx_co
>   mode01 = GET_MODE (op01);
>
>   /* Find out number of elements of each operand.  */
> - if (VECTOR_MODE_P (mode00))
> -   {
> - elt_size = GET_MODE_UNIT_SIZE (mode00);
> - n_elts00 = GET_MODE_SIZE (mode00) / elt_size;
> -   }
> - else
> -   n_elts00 = 1;
> -
> - if (VECTOR_MODE_P (mode01))
> -   {
> - elt_size = GET_MODE_UNIT_SIZE (mode01);
> - n_elts01 = GET_MODE_SIZE (mode01) / elt_size;
> -   }
> - else
> -   n_elts01 = 1;
> + n_elts00 = GET_MODE_NUNITS (mode00);
> + n_elts01 = GET_MODE_NUNITS (mode01);
>
>   gcc_assert (n_elts == n_elts00 + n_elts01);
>
> @@ -3771,9 +3750,8 @@ simplify_binary_operation_1 (enum rtx_co
>   rtx subop1 = XEXP (trueop0, 1);
>   machine_mode mode0 = GET_MODE (subop0);
>   machine_mode mode1 = GET_MODE (subop1);
> - int li = GET_MODE_UNIT_SIZE (mode0);
> - int l0 = GET_MODE_SIZE (mode0) / li;
> - int l1 = GET_MODE_SIZE (mode1) / li;
> + int l0 = GET_MODE_NUNITS (mode0);
> + int l1 = GET_MODE_NUNITS (mode1);
>   int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
>   if (i0 == 0 && !side_effects_p (op1) && mode == mode0)
> {
> @@ -3931,14 +3909,10 @@ simplify_binary_operation_1 (enum rtx_co
> || CONST_SCALAR_INT_P (trueop1)
> || CONST_DOUBLE_AS_FLOAT_P (trueop1)))
>   {
> -   int elt_size = GET_MODE_UNIT_SIZE (mode);
> -   unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> +   unsigned n_elts = GET_MODE_NUNITS (mode);
> +   unsigned in_n_elts = GET_MODE_NUNITS (op0_mode);
> rtvec v = rtvec_alloc (n_elts);
> unsigned int i;
> -   unsigned in_n_elts = 1;
> -
> -   if (VECTOR_MODE_P (op0_mode))
> - in_n_elts = (GET_MODE_SIZE (op0_mode) / elt_size);
> for (i = 0; i < n_elts; i++)
>   {
> if (i < in_n_elts)
> @@ -4026,16 +4

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor  wrote:
> Hi,
>
> On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
>> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor  wrote:
>> > Hi,
>> >
>> > I'd like to request comments to the patch below which aims to fix PR
>> > 80689, which is an instance of a store-to-load forwarding stall on
>> > x86_64 CPUs in the Image Magick benchmark, which is responsible for a
>> > slow down of up to 9% compared to gcc 6, depending on options and HW
>> > used.  (Actually, I have just seen 24% in one specific combination but
>> > for various reasons can no longer verify it today.)
>> >
>> > The revision causing the regression is 237074, which increased the
>> > size of the mode for copying aggregates "by pieces" to 128 bits,
>> > incurring big stalls when the values being copied are also still being
>> > stored in a smaller data type or if the copied values are loaded in a
>> > smaller types shortly afterwards.  Such situations happen in Image
>> > Magick even across calls, which means that any non-IPA flow-sensitive
>> > approach would not detect them.  Therefore, the patch simply changes
>> > the way we copy small BLKmode data that are simple combinations of
>> > records and arrays (meaning no unions, bit-fields but also character
>> > arrays are disallowed) and simply copies them one field and/or element
>> > at a time.
>> >
>> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
>> > (the structure in the benchmark has 32 bytes) but is subject to change
>> > after more benchmarking and is actually zero - meaning element copying
>> > never happens - on other architectures.  I believe that any
>> > architecture with a store buffer can benefit but it's probably better
>> > to leave it to their maintainers to find a different default value.  I
>> > am not sure this is how such HW-dependant decisions should be done and
>> > is the primary reason, why I am sending this RFC first.
>> >
>> > I have decided to implement this change at the expansion level because
>> > at that point the type information is still readily available and at
>> > the same time we can also handle various implicit copies, for example
>> > those passing a parameter.  I found I could re-use some bits and
>> > pieces of tree-SRA and so I did, creating tree-sra.h header file in
>> > the process.
>> >
>> > I am fully aware that in the final patch the new parameter, or indeed
>> > any new parameters, need to be documented.  I have skipped that
>> > intentionally now and will write the documentation if feedback here is
>> > generally good.
>> >
>> > I have bootstrapped and tested this patch on x86_64-linux, with
>> > different values of the parameter and only found problems with
>> > unreasonably high values leading to OOM.  I have done the same with a
>> > previous version of the patch which was equivalent to the limit being
>> > 64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
>> > into failures of tests which assumed that structure padding was copied
>> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
>> > for example gcc.dg/vmx/varargs-4.c).
>> >
>> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
>> > 8% at -O2 and -Ofast compilation levels respectively on one particular
>> > new AMD CPU and by 6% and 3% on one particular old Intel machine.
>> >
>> > Thanks in advance for any comments,
>>
>> I wonder if, at the place you choose to hook this in, you can elide any
>> copying of padding between fields.
>>
>> I'd rather have hooked such "high level" optimization in
>> expand_assignment where you can be reasonably sure you're seeing an
>> actual source-level construct.
>
> I have discussed this with Honza and we eventually decided to make the
> element-wise copy an alternative to emit_block_move (which uses the
> larger mode for moving since GCC 7) exactly so that we handle not only
> source-level assignments but also passing parameters by value and
> other situations.
>
>>
>> 35 bytes seems to be much - what is the code-size impact?
>
> I will find out and report on that.  I need at least 32 bytes (four
> long ints) to fix imagemagick, where the problematic structure is:
>
>   typedef struct _RectangleInfo
>   {
> size_t
>   width,
>   height;
>
> ssize_t
>   x,
>   y;
>   } RectangleInfo;
>
> ...so four longs, no padding.  Since any aggregate having between 33-35
> bytes needs to consist of smaller fields/elements, it seemed
> reasonable to also copy them element-wise.
>
> Nevertheless, I still intend to experiment with the limit, I sent out
> this RFC exactly so that I don't spend a lot of time benchmarking
> something that is eventually not deemed acceptable on principle.

I think the limit should be on the number of generated copies and not
the overall size of the structure...  If the struct were composed of
32 individual chars we wouldn't want to emit 32 loads and 32 stores...
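
As a sketch of what a copies-based limit could consult, a hypothetical
helper (not part of the posted patch; it assumes the type already passed
the scalarizability check, i.e. only records and constant-bound arrays):

/* Return the number of scalar loads/stores an element-wise copy of TYPE
   would emit, so the heuristic can cap the number of copies instead of
   the byte size of the aggregate.  */
static unsigned HOST_WIDE_INT
count_elementwise_copies (tree type)
{
  if (TREE_CODE (type) == RECORD_TYPE)
    {
      unsigned HOST_WIDE_INT n = 0;
      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
	if (TREE_CODE (fld) == FIELD_DECL)
	  n += count_elementwise_copies (TREE_TYPE (fld));
      return n;
    }
  else if (TREE_CODE (type) == ARRAY_TYPE
	   && TYPE_DOMAIN (type)
	   && TYPE_MAX_VALUE (TYPE_DOMAIN (type))
	   && tree_fits_uhwi_p (TYPE_MAX_VALUE (TYPE_DOMAIN (type))))
    /* Assumes a zero-based constant domain, as the scalarizability
       check requires for the cases of interest.  */
    return ((tree_to_uhwi (TYPE_MAX_VALUE (TYPE_DOMAIN (type))) + 1)
	    * count_elementwise_copies (TREE_TYPE (type)));
  else
    return 1;
}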

I wonder how 

Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>>  wrote:
>>> This patch adds a POD version of fixed_size_mode.  The only current use
>>> is for storing the __builtin_apply and __builtin_result register modes,
>>> which were made fixed_size_modes by the previous patch.
>>
>> Bah - can we update our host compiler to C++11/14 please ...?
>> (maybe requiring that build with GCC 4.8 as host compiler works,
>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>
> That'd be great :-)  It would avoid all the poly_int_pod stuff too,
> and allow some clean-up of wide-int.h.

Can you figure out what the oldest GCC release is that supports the C++11/14
POD handling that would be required?

Richard.

> Thanks for the reviews,
> Richard
>
>
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> 2017-10-23  Richard Sandiford  
>>> Alan Hayward  
>>> David Sherwood  
>>>
>>> gcc/
>>> * coretypes.h (fixed_size_mode): Declare.
>>> (fixed_size_mode_pod): New typedef.
>>> * builtins.h (target_builtins::x_apply_args_mode)
>>> (target_builtins::x_apply_result_mode): Change type to
>>> fixed_size_mode_pod.
>>> * builtins.c (apply_args_size, apply_result_size, result_vector)
>>> (expand_builtin_apply_args_1, expand_builtin_apply)
>>> (expand_builtin_return): Update accordingly.
>>>
>>> Index: gcc/coretypes.h
>>> ===
>>> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
>>> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
>>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>>>  class scalar_int_mode;
>>>  class scalar_float_mode;
>>>  class complex_mode;
>>> +class fixed_size_mode;
>>>  template<typename T> class opt_mode;
>>>  typedef opt_mode<scalar_mode> opt_scalar_mode;
>>>  typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
>>> @@ -66,6 +67,7 @@ typedef opt_mode opt_
>>>  template<typename T> class pod_mode;
>>>  typedef pod_mode<scalar_mode> scalar_mode_pod;
>>>  typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
>>> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>>>
>>>  /* Subclasses of rtx_def, using indentation to show the class
>>> hierarchy, along with the relevant invariant.
>>> Index: gcc/builtins.h
>>> ===
>>> --- gcc/builtins.h  2017-08-30 12:18:46.602740973 +0100
>>> +++ gcc/builtins.h  2017-10-23 11:42:57.592545063 +0100
>>> @@ -29,14 +29,14 @@ struct target_builtins {
>>>   the register is not used for calling a function.  If the machine
>>>   has register windows, this gives only the outbound registers.
>>>   INCOMING_REGNO gives the corresponding inbound register.  */
>>> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>>
>>>/* For each register that may be used for returning values, this gives
>>>   a mode used to copy the register's value.  VOIDmode indicates the
>>>   register is not used for returning values.  If the machine has
>>>   register windows, this gives only the outbound registers.
>>>   INCOMING_REGNO gives the corresponding inbound register.  */
>>> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>>  };
>>>
>>>  extern struct target_builtins default_target_builtins;
>>> Index: gcc/builtins.c
>>> ===
>>> --- gcc/builtins.c  2017-10-23 11:41:23.140260335 +0100
>>> +++ gcc/builtins.c  2017-10-23 11:42:57.592545063 +0100
>>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>>>static int size = -1;
>>>int align;
>>>unsigned int regno;
>>> -  machine_mode mode;
>>>
>>>/* The values computed by this function never change.  */
>>>if (size < 0)
>>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>>>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> if (FUNCTION_ARG_REGNO_P (regno))
>>>   {
>>> -   mode = targetm.calls.get_raw_arg_mode (regno);
>>> +   fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>>
>>> gcc_assert (mode != VOIDmode);
>>>
>>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>>>   }
>>> else
>>>   {
>>> -   apply_args_mode[regno] = VOIDmode;
>>> +   apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>>   }
>>>  }
>>>return size;
>>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>>>  {
>>>static int size = -1;
>>>int align, regno;
>>> -  machine_mode mode;
>>>
>>>/* The values computed by this function never change.  */
>>>if (size < 0)
>>> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>>>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> if (targetm.calls.function_value_regno_p (regno))
>>>  

Re: [C++ PATCH] Kill IDENTIFIER_LABEL_VALUE

2017-10-26 Thread Nathan Sidwell

On 10/25/2017 05:36 PM, Nathan Sidwell wrote:
This patch removes 'label_value' from lang_identifier, shrinking it from 
72 to 64 bytes (on a 64-bit machine).  We replace this by augmenting the 
already used per-function named_labels hash table.  This is a major win, 
because labels are extremely rare and there are many identifiers.  We 
also shrink the binding structure by a pointer, as the shadowed_labels 
list goes away.


I was a little overzealous killing code, but perhaps now this is being 
a little paranoid.  It restores the UID sorting of labels when inserting 
them into the BLOCK chain.  The original comment was confusing, as it 
mentioned code generation and then debug information.  I think this just 
affects the order of debug records, but ICBW.  For any given function, 
the iteration of the hash table should be stable across versions, unless 
the hash table implementation or the IDENTIFIER_HASH_VALUE changes.  But 
may as well be safe.


I also add the N_ translation markup I forgot about yesterday when 
taking strings out of 'inform' calls.


nathan

--
Nathan Sidwell
2017-10-26  Nathan Sidwell  

	* decl.c (sort_labels): Restore function.
	(pop_labels): Sort labels.
	(identify_goto): Add translation markup.

Index: decl.c
===
--- decl.c	(revision 254087)
+++ decl.c	(working copy)
@@ -372,6 +372,18 @@ check_label_used (tree label)
 }
 }
 
+/* Helper function to sort named label entries in a vector by DECL_UID.  */
+
+static int
+sort_labels (const void *a, const void *b)
+{
+  tree label1 = *(tree const *) a;
+  tree label2 = *(tree const *) b;
+
+  /* DECL_UIDs can never be equal.  */
+  return DECL_UID (label1) > DECL_UID (label2) ? -1 : +1;
+}
+
 /* At the end of a function, all labels declared within the function
go out of scope.  BLOCK is the top-level block for the
function.  */
@@ -382,6 +394,12 @@ pop_labels (tree block)
   if (!named_labels)
 return;
 
+  /* We need to add the labels to the block chain, so debug
+ information is emitted.  But, we want the order to be stable so
+ need to sort them first.  Otherwise the debug output could be
+ randomly ordered.  I guess it's mostly stable, unless the hash
+ table implementation changes.  */
+  auto_vec<tree> labels (named_labels->elements ());
   hash_table::iterator end (named_labels->end ());
   for (hash_table::iterator iter
 	 (named_labels->begin ()); iter != end; ++iter)
@@ -390,18 +408,21 @@ pop_labels (tree block)
 
   gcc_checking_assert (!ent->outer);
   if (ent->label_decl)
-	{
-	  check_label_used (ent->label_decl);
-
-	  /* Put the labels into the "variables" of the top-level block,
-	 so debugger can see them.  */
-	  DECL_CHAIN (ent->label_decl) = BLOCK_VARS (block);
-	  BLOCK_VARS (block) = ent->label_decl;
-	}
+	labels.quick_push (ent->label_decl);
   ggc_free (ent);
 }
-
   named_labels = NULL;
+  labels.qsort (sort_labels);
+
+  while (labels.length ())
+{
+  tree label = labels.pop ();
+
+  DECL_CHAIN (label) = BLOCK_VARS (block);
+  BLOCK_VARS (block) = label;
+
+  check_label_used (label);
+}
 }
 
 /* At the end of a block with local labels, restore the outer definition.  */
@@ -3066,8 +3087,8 @@ identify_goto (tree decl, location_t loc
 {
   bool complained
 = emit_diagnostic (diag_kind, loc, 0,
-		   decl ? "jump to label %qD" : "jump to case label",
-		   decl);
+		   decl ? N_("jump to label %qD")
+		   : N_("jump to case label"), decl);
   if (complained && locus)
 inform (*locus, "  from here");
   return complained;
@@ -3136,32 +3157,32 @@ check_previous_goto_1 (tree decl, cp_bin
 	{
 	case sk_try:
 	  if (!saw_eh)
-	inf = "enters try block";
+	inf = N_("enters try block");
 	  saw_eh = true;
 	  break;
 
 	case sk_catch:
 	  if (!saw_eh)
-	inf = "enters catch block";
+	inf = N_("enters catch block");
 	  saw_eh = true;
 	  break;
 
 	case sk_omp:
 	  if (!saw_omp)
-	inf = "enters OpenMP structured block";
+	inf = N_("enters OpenMP structured block");
 	  saw_omp = true;
 	  break;
 
 	case sk_transaction:
 	  if (!saw_tm)
-	inf = "enters synchronized or atomic statement";
+	inf = N_("enters synchronized or atomic statement");
 	  saw_tm = true;
 	  break;
 
 	case sk_block:
 	  if (!saw_cxif && level_for_constexpr_if (b->level_chain))
 	{
-	  inf = "enters constexpr if statement";
+	  inf = N_("enters constexpr if statement");
 	  loc = EXPR_LOCATION (b->level_chain->this_entity);
 	  saw_cxif = true;
 	}


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Jan Hubicka
> I think the limit should be on the number of generated copies and not
> the overall size of the structure...  If the struct were composed of
> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
> 
> I wonder how rep; movb; interacts with store to load forwarding?  Is
> that maybe optimized well on some archs?  movb should always
> forward and wasn't the setup cost for small N reasonable on modern
> CPUs?

rep mov is a win over a loop for blocks over 128 bytes on Core, and for blocks
in the range 24-128 on Zen.  This is w/o store/load forwarding, but I doubt
those provide a cheap way around it.

> 
> It probably depends on the width of the entries in the store buffer,
> if they appear in-order and the alignment of the stores (if they are larger 
> than
> 8 bytes they are surely aligned).  IIRC CPUs had smaller store buffer
> entries than cache line size.
> 
> Given that load bandwith is usually higher than store bandwith it
> might make sense to do the store combining in our copying sequence,
> like for the 8 byte entry case use sth like
> 
>   movq 0(%eax), %xmm0
>   movhps 8(%eax), %xmm0 // or vpinsert
>   mov[au]ps %xmm0, 0%(ebx)
> ...
> 
> thus do two loads per store and perform the stores in wider
> mode?

This may be somewhat faster indeed.  I am not sure if store to load
forwarding will work for the latter half when read again by halves.
It would not happen on older CPUs :)

Honza
> 
> As said a general concern was you not copying padding.  If you
> put this into an even more common place you surely will break
> stuff, no?
> 
> Richard.
> 
> >
> > Martin
> >
> >
> >>
> >> Richard.
> >>
> >> > Martin
> >> >
> >> >
> >> > 2017-10-12  Martin Jambor  
> >> >
> >> > PR target/80689
> >> > * tree-sra.h: New file.
> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
> >> > tree-sra.h.
> >> > * expr.c: Include params.h and tree-sra.h.
> >> > (emit_move_elementwise): New function.
> >> > (store_expr_with_bounds): Optionally use it.
> >> > * ipa-cp.c: Include tree-sra.h.
> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> >> > * config/i386/i386.c (ix86_option_override_internal): Set
> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> >> > * tree-sra.c: Include tree-sra.h.
> >> > (scalarizable_type_p): Renamed to
> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
> >> > second parameter to allow_char_arrays.
> >> > (extract_min_max_idx_from_array): New function.
> >> > (completely_scalarize): Moved bits of the function to
> >> > extract_min_max_idx_from_array.
> >> >
> >> > testsuite/
> >> > * gcc.target/i386/pr80689-1.c: New test.
> >> > ---
> >> >  gcc/config/i386/i386.c|   4 ++
> >> >  gcc/expr.c| 103 
> >> > --
> >> >  gcc/ipa-cp.c  |   1 +
> >> >  gcc/ipa-prop.h|   4 --
> >> >  gcc/params.def|   6 ++
> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++
> >> >  gcc/tree-sra.c|  86 
> >> > +++--
> >> >  gcc/tree-sra.h|  33 ++
> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
> >> >  create mode 100644 gcc/tree-sra.h
> >> >
> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> > index 1ee8351c21f..87f602e7ead 100644
> >> > --- a/gcc/config/i386/i386.c
> >> > +++ b/gcc/config/i386/i386.c
> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
> >> >  ix86_tune_cost->l2_cache_size,
> >> >  opts->x_param_values,
> >> >  opts_set->x_param_values);
> >> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> >> > +35,
> >> > +opts->x_param_values,
> >> > +opts_set->x_param_values);
> >> >
> >> >/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. 
> >> >  */
> >> >if (opts->x_flag_prefetch_loop_arrays < 0
> >> > diff --git a/gcc/expr.c b/gcc/expr.c
> >> > index 134ee731c29..dff24e7f166 100644
> >> > --- a/gcc/expr.c
> >> > +++ b/gcc/expr.c
> >> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
> >> >  #include "tree-chkp.h"
> >> >  #include "rtl-chkp.h"
> >> >  #include "ccmp.h"
> >> > -
> >> > +#include "params.h"
> >> > +#include "tree-sra.h"
> >> >
> >> >  /* If this is nonzero, we do not bother generating VOLATILE
> >> > around volatile memory references, and we are willing to
> >> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
> >> >return maybe_expand_insn (code, 2, ops);
> >> >  }
> >> >
> >> > +/* Generat

Re: [PATCH PR79868 ][aarch64] Fix error calls in aarch64 code so they can be translated (version 2)

2017-10-26 Thread Richard Earnshaw (lists)
On 26/09/17 00:25, Steve Ellcey wrote:
> This is a new version of my patch to fix PR target/79868, where some
> error messages are impossible to translate correctly due to how the
> strings are dynamically constructed.  It also includes some format
> changes in the error messags to make the messages more consistent with
> each other and with other GCC errors.  This was worked out with help
> from Martin Sebor.  I also had to fix some tests to match the new error
> string formats.
> 
> Tested on Aarch64 with no regressions, OK to checkin?

I can't help feeling that all this logic is somewhat excessive, and that
changing the wording of each message to include "pragma or attribute"
would solve it equally well.  With the new context highlighting it's
trivial to tell which subcase of usage is being referred to.
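
For instance, the arch messages could simply read something like this
(illustrative wording only, not text taken from the patch):

  error ("missing architecture name in %<arch%> target pragma or attribute");
  error ("unknown value %qs for %<arch%> target pragma or attribute", str);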

R.

> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2017-09-25  Steve Ellcey  
> 
>   PR target/79868
>   * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse):
>   Change argument type on aarch64_process_target_attr call.
>   * config/aarch64/aarch64-protos.h (aarch64_process_target_attr):
>   Change argument type.
>   * config/aarch64/aarch64.c (aarch64_attribute_info): Change
>   field type.
>   (aarch64_handle_attr_arch): Change argument type, use boolean
>   argument to use different strings in error calls.
>   (aarch64_handle_attr_cpu): Ditto.
>   (aarch64_handle_attr_tune): Ditto.
>   (aarch64_handle_attr_isa_flags): Ditto.
>   (aarch64_process_one_target_attr): Ditto.
>   (aarch64_process_target_attr): Ditto.
>   (aarch64_option_valid_attribute_p): Change argument type on
>   aarch64_process_target_attr call.
> 
> 
> 2017-09-25  Steve Ellcey  
> 
>   PR target/79868
>   * gcc.target/aarch64/spellcheck_1.c: Update dg-error string to match
>   new format.
>   * gcc.target/aarch64/spellcheck_2.c: Ditto.
>   * gcc.target/aarch64/spellcheck_3.c: Ditto.
>   * gcc.target/aarch64/target_attr_11.c: Ditto.
>   * gcc.target/aarch64/target_attr_12.c: Ditto.
>   * gcc.target/aarch64/target_attr_17.c: Ditto.
> 
> 
> pr79868.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
> index 177e638..c9945db 100644
> --- a/gcc/config/aarch64/aarch64-c.c
> +++ b/gcc/config/aarch64/aarch64-c.c
> @@ -165,7 +165,7 @@ aarch64_pragma_target_parse (tree args, tree pop_target)
>   information that it specifies.  */
>if (args)
>  {
> -  if (!aarch64_process_target_attr (args, "pragma"))
> +  if (!aarch64_process_target_attr (args, true))
>   return false;
>  
>aarch64_override_options_internal (&global_options);
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index e67c2ed..4323e9e 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -445,7 +445,7 @@ bool aarch64_gen_adjusted_ldpstp (rtx *, bool, 
> scalar_mode, RTX_CODE);
>  
>  void aarch64_init_builtins (void);
>  
> -bool aarch64_process_target_attr (tree, const char*);
> +bool aarch64_process_target_attr (tree, bool);
>  void aarch64_override_options_internal (struct gcc_options *);
>  
>  rtx aarch64_expand_builtin (tree exp,
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1c14008..122ed5e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -67,6 +67,7 @@
>  #include "common/common-target.h"
>  #include "selftest.h"
>  #include "selftest-rtl.h"
> +#include "intl.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -9554,15 +9555,15 @@ struct aarch64_attribute_info
>const char *name;
>enum aarch64_attr_opt_type attr_type;
>bool allow_neg;
> -  bool (*handler) (const char *, const char *);
> +  bool (*handler) (const char *, bool);
>enum opt_code opt_num;
>  };
>  
>  /* Handle the ARCH_STR argument to the arch= target attribute.
> -   PRAGMA_OR_ATTR is used in potential error messages.  */
> +   IS_PRAGMA is used in potential error messages.  */
>  
>  static bool
> -aarch64_handle_attr_arch (const char *str, const char *pragma_or_attr)
> +aarch64_handle_attr_arch (const char *str, bool is_pragma)
>  {
>const struct processor *tmp_arch = NULL;
>enum aarch64_parse_opt_result parse_res
> @@ -9579,15 +9580,22 @@ aarch64_handle_attr_arch (const char *str, const char 
> *pragma_or_attr)
>switch (parse_res)
>  {
>case AARCH64_PARSE_MISSING_ARG:
> - error ("missing architecture name in 'arch' target %s", pragma_or_attr);
> + error (is_pragma
> +? G_("missing name in % pragma")
> +: G_("missing name in % attribute"));
>   break;
>case AARCH64_PARSE_INVALID_ARG:
> - error ("unknown value %qs for 'arch' target %s", str, pragma_or_attr);
> + error (is_pragma
> +? G_("invalid name (\"%s\") in % pragma")
> +: G_("invali

Re: [PING][PATCH][Aarch64] Improve int<->FP conversions

2017-10-26 Thread James Greenhalgh
On Tue, Oct 24, 2017 at 10:47:32PM +0100, Michael Collison wrote:
> James,
> 
> The patch was tested as required. However, when I tested with the latest trunk
> there were two test failures caused by my patch; the tests needed to be updated.
> 
> The file gcc.target/aarch64/vect-vcvt.c was failing because the
> scan-assembler directives assumed the destination register to be
> an integer register. With my patch the destination can be an integer or fp
> register.
> 
> I fixed the failures and bootstrapped and tested on aarch64-linux-gnu. Okay 
> for trunk?

OK.

> --- a/gcc/testsuite/gcc.target/aarch64/vect-vcvt.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcvt.c
> @@ -56,13 +56,13 @@ TEST (SUFFIX, q, 32, 4, u,u,s)\
>  TEST (SUFFIX, q, 64, 2, u,u,d)   \
>  
>  BUILD_VARIANTS ( )
> -/* { dg-final { scan-assembler "fcvtzs\\tw\[0-9\]+, s\[0-9\]+" } } */
> -/* { dg-final { scan-assembler "fcvtzs\\tx\[0-9\]+, d\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzs\\t(w|s)\[0-9\]+, s\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzs\\t(x|d)\[0-9\]+, d\[0-9\]+" } } */
>  /* { dg-final { scan-assembler "fcvtzs\\tv\[0-9\]+\.2s, v\[0-9\]+\.2s" } } */
>  /* { dg-final { scan-assembler "fcvtzs\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s" } } */
>  /* { dg-final { scan-assembler "fcvtzs\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d" } } */
> -/* { dg-final { scan-assembler "fcvtzu\\tw\[0-9\]+, s\[0-9\]+" } } */
> -/* { dg-final { scan-assembler "fcvtzu\\tx\[0-9\]+, d\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzu\\t(w|s)\[0-9\]+, s\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzu\\t(x|d)\[0-9\]+, d\[0-9\]+" } } */
>  /* { dg-final { scan-assembler "fcvtzu\\tv\[0-9\]+\.2s, v\[0-9\]+\.2s" } } */
>  /* { dg-final { scan-assembler "fcvtzu\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s" } } */
>  /* { dg-final { scan-assembler "fcvtzu\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d" } } */

Personally I'd have used \[ws\] but this works too.
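
That is, the equivalent character-class form of one of the directives
would be something like:

/* { dg-final { scan-assembler "fcvtzs\\t\[ws\]\[0-9\]+, s\[0-9\]+" } } */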

Reviewed by: James Greenhalgh 

James



Re: [PATCH 00/13] Removal of SDB debug info support

2017-10-26 Thread Jeff Law
On 10/26/2017 03:33 AM, Richard Biener wrote:
> On Wed, Oct 25, 2017 at 11:24 PM, Jim Wilson  wrote:
>> We have no targets that emit SDB debug info by default.  We dropped all
>> of the SVR3 Unix and embedded COFF targets a while ago.  The only
>> targets that are still able to emit SDB debug info are cygwin, mingw,
>> and msdosdjgpp.
>>
>> I tried a cygwin build with sources modified to emit SDB by default, to
>> see if the support was still usable.  I ran into multiple problems.
>>  There is no SDB support for IMPORTED_DECL which was added in 2008.  -
>> freorder-functions and -freorder-blocks-and-partition did not work and
>> had to be disabled.  I hit a cgraph assert because sdbout.c uses
>> assemble_name on types, which fails if there is a function and type
>> with the same name.  This also causes types to be added to the debug
>> info with prepended underscores which is wrong.  I then ran into a
>> problem with the i386_pe_declare_function_type call from
>> i386_pe_file_end and gave up because I didn't see an easy workaround.
>>
>> It seems clear that the SDB support is no longer usable, and probably
>> hasn't been for a while.  This support should just be removed.
>>
>> SDB is both a debug info format and an old Unix debugger.  There were
>> some references to the debugger that I left in, changing to past tense,
>> as the comments are useful history to explain why the code was written
>> the was it was.  Otherwise, I tried to eliminate all references to sdb
>> as a debug info format.
>>
>> This patch series was tested with a C only cross compiler build for all
>> modified embedded targets, a default languages build for power aix,
>> i686 cygwin, and x86_64 linux.  I also did gdb testsuite runs for
>> cygwin and linux.  There were no regressions.
>>
>> As a debug info maintainer, I can self approve some of this stuff,
>> would be would be good to get a review from one of the other global
>> reviewers, and/or target maintainers.
> 
> You have my approval for this.  Can you add a blurb to gcc-8/changes.html,
> like "support for emitting SDB debug info has been removed" in the caveats
> section?
I didn't see anything I would consider controversial in the series.  I'd
echo Richi's comment about potentially keeping the flag as ignored.

jeff
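
For reference, keeping the flag as ignored would amount to a common.opt
stub along these lines (sketch only; the exact record and the list of
affected option names would need checking against the .opt syntax):

gcoff
Common Ignore
Does nothing.  Preserved for backward compatibility.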



Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Michael Matz
Hi,

On Thu, 26 Oct 2017, Martin Jambor wrote:

> > 35 bytes seems to be much - what is the code-size impact?
> 
> I will find out and report on that.  I need at least 32 bytes (four
> long ints) to fix imagemagick, where the problematic structure is:

Surely the final heuristic should look at the number of elements 
of the struct in question, not only at its size.


Ciao,
Michael.


[Patch obvious][arm testsuite] Fixup expected location in require-pic-register-loc.c

2017-10-26 Thread James Greenhalgh

Hi,

After r254010 we now add -gcolumn-info by default, which means the tests
in gcc.target/arm/require-pic-register-loc.c need adjusting so that they no
longer expect to see column zero.

That's the obvious fix, and it just extends what Jakub did in r254010, so
I've applied it as r254106.

Thanks,
James

---
2017-10-25  James Greenhalgh  

* gcc.target/arm/require-pic-register-loc.c: Use wider regex for
column information.

diff --git a/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c b/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c
index bd85e86..268e9e4 100644
--- a/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c
+++ b/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c
@@ -18,12 +18,12 @@ main (int argc)/* line 9.  */
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\.loc 1 7 0" } } */
-/* { dg-final { scan-assembler-not "\.loc 1 8 0" } } */
-/* { dg-final { scan-assembler-not "\.loc 1 9 0" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 7 \[0-9\]\+" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 8 \[0-9\]\+" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 9 \[0-9\]\+" } } */
 
 /* The loc at the start of the prologue.  */
-/* { dg-final { scan-assembler-times "\.loc 1 10 0" 1 } } */
+/* { dg-final { scan-assembler-times "\.loc 1 10 \[0-9\]\+" 1 } } */
 
 /* The loc at the end of the prologue, with the first user line.  */
-/* { dg-final { scan-assembler-times "\.loc 1 11 0" 1 } } */
+/* { dg-final { scan-assembler-times "\.loc 1 11 \[0-9\]\+" 1 } } */


Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Will Schmidt
On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
> wrote:
> > Hi,
> >
> > Add support for gimple folding of the vec_madd() (vector multiply-add)
> > intrinsics.
> > Testcase coverage is provided by the existing tests
> >  gcc.target/powerpc/fold-vec-madd-*.c
> >
> > Sniff-tests appear clean.  A full regtest is currently running across 
> > assorted Power systems. (P6-P9).
> > OK for trunk (pending clean run results)?
> 
You can use FMA_EXPR on integer operands as well.  Otherwise you risk
the FMA not being matched by combine later when part of the operation is
CSEd.

I had tried that initially, without success...  I'll probably need
another hint.  :-) 
Looking a bit closer, I think I see why the assert fired, but I'm not
sure what the proper fix would be.

So, attempting the FMA_EXPR on the integer operands (vector shorts in
this case), I end up triggering this error:

/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
 internal compiler error: in expand_expr_real_2, at expr.c:8712
0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)
/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
...


Following that back, I tripped an assert here (gcc/expr.c:
expand_expr_real_2(), ~ line 8710):

case FMA_EXPR:
  {
optab opt = fma_optab;
gimple *def0, *def2;
if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
  {
tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
tree call_expr;

gcc_assert (fn != NULL_TREE);

where gcc/builtins.c
mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
returned END_BUILTINS/NULL_TREE, due to falling through the if/else
tree:

  if (TYPE_MAIN_VARIANT (type) == double_type_node)
return fcode;
  else if (TYPE_MAIN_VARIANT (type) == float_type_node)
return fcodef;
  else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
return fcodel;
  else
return END_BUILTINS;

Looks like that only handles double/float/long double types.  A first-blush
attempt would be to add V8HI_type_node/integer_type_node to that if/else
tree, but that doesn't look like it would be nearly enough.
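
One possible guard, as an untested sketch that would slot into the
integral case of the folding code above (it assumes we only want
FMA_EXPR when the target has a direct fma pattern for the vector mode,
and otherwise keeps the MULT_EXPR/PLUS_EXPR fallback):

/* Sketch: only fold to FMA_EXPR if the target can expand it directly
   for this mode; otherwise emit the mult + plus sequence instead.  */
machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
if (optab_handler (fma_optab, mode) != CODE_FOR_nothing)
  {
    gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2);
    gimple_set_location (g, gimple_location (stmt));
    gsi_replace (gsi, g, true);
    return true;
  }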

Thanks
-Will

> 
> Richard.
> 
> > Thanks,
> > -Will
> >
> > [gcc]
> >
> > 2017-10-25  Will Schmidt 
> >
> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support 
> > for
> >   gimple folding of vec_madd() intrinsics.
> >
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 4837e14..04c2b15 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> > *gsi)
> >build_int_cst (arg2_type, 0)), 
> > arg0);
> >  gimple_set_location (g, loc);
> >  gsi_replace (gsi, g, true);
> >  return true;
> >}
> > +
> > +/* vec_madd (Float) */
> > +case ALTIVEC_BUILTIN_VMADDFP:
> > +case VSX_BUILTIN_XVMADDDP:
> > +  {
> > +   arg0 = gimple_call_arg (stmt, 0);
> > +   arg1 = gimple_call_arg (stmt, 1);
> > +   tree arg2 = gimple_call_arg (stmt, 2);
> > +   lhs = gimple_call_lhs (stmt);
> > +   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
> > +   gimple_set_location (g, gimple_location (stmt));
> > +   gsi_replace (gsi, g, true);
> > +   return true;
> > +  }
> > +/* vec_madd (Integral) */
> > +case ALTIVEC_BUILTIN_VMLADDUHM:
> > +  {
> > +   arg0 = gimple_call_arg (stmt, 0);
> > +   arg1 = gimple_call_arg (stmt, 1);
> > +   tree arg2 = gimple_call_arg (stmt, 2);
> > +   lhs = gimple_call_lhs (stmt);
> > +   tree lhs_type = TREE_TYPE (lhs);
> > +   location_t loc = gimple_location (stmt);
> > +   gimple_seq stmts = NULL;
> > +   tree mult_result = gimple_build (&stmts, loc, MULT_EXPR,
> > +  lhs_type, arg0, arg1);
> > +   tree plus_result = gimple_build (&stmts, loc, PLUS_EXPR,
> > +  lhs_type, mult_result, arg2);
> > +   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> > +   update_call_from_tree (gsi, plus_result);
> > +   return true;
> > +  }
> > +
> >  default:
> > if (TARGET_DEBUG_BUILTIN)
> >fprintf (stderr, "gimple builtin intrinsic not matched:%d %s 
> > %s\n",
> > fn_code, fn_name1, fn_name2);
> >break;
> >
> >

Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread David Malcolm
[CCing Rainer and Mike for the gcc-dg.exp part]

On Thu, 2017-10-26 at 07:33 -0400, Nathan Sidwell wrote:
> On the modules branch, I'm starting to add location
> information.  Line 
> numbers don't really make sense when reporting errors reading a
> binary 
> file, so I wanted to change the diagnostics such that line number
> zero 
> (which is not a line) is not printed -- one just gets the file
> name.  I 
> then noticed that we don't elide column zero (also, not a column
> outside 
> of emacsland).
> 
> This patch changes the diagnostics, such that line-zero prints
> neither 
> line nor column and column-zero doesn't print the column.
> 
> The testsuite presumes that all diagnostics have a column (which may
> or 
> may not be specified in the test pattern).  This patch augments it
> such 
> that a prefix of '-:' indicates 'no column'.  We still default to 
> expecting a column
> 
> The vast bulk is annotating C & C++ tests that do not have a column. 
> Some of those were explicitly checking for column-zero, but many
> just 
> expected some arbitrary column number, which happened to be
> zero.  Of 
> course many (most?) of these diagnostics could be improved to provide
> a 
> column.  Most are from the preprocessor.
> 
> While this is a change in the compiler's output, it's effectively 
> returning to a pre-column formatting for the cases where the column 
> number is not known.  I'd expect (hope?) error message parsers to be 
> robust in that case. (I've found it confusing when column-zero is 
> printed, as I think columns might be zero-based after all.)
> 
> bootstrapped on all languages.
> 
> ok?
> 
> nathan

Indeed, gcc uses 1-based columns, with 0 meaning "the whole line",
whereas Emacs uses 0-based columns (see the comment in line-map.h). 
Probably best to not print them, to avoid confusing the user.

Alternate idea: could show_column become a tri-state:
  * default: show non-zero columns
  * never: never show columns
  * always: always show a column, printing 0 for the no-column case
and then use "always" in our testsuite
?

-fno-show-column would presumably then be a legacy alias for the
"never" value.

> Index: gcc/diagnostic.c
> ===
> --- gcc/diagnostic.c  (revision 254060)
> +++ gcc/diagnostic.c  (working copy)
> @@ -293,6 +293,24 @@ diagnostic_get_color_for_kind (diagnosti
>return diagnostic_kind_color[kind];
>  }
>  
> +/* Return a formatted line and column ':%line:%column'.  Elided if
> +   zero.  The result is a statically allocated buffer.  */

> +static const char *
> +maybe_line_and_column (int line, int col)
> +{
> +  static char result[32];
> +
> +  if (line)
> +{
> +  size_t l = sprintf (result, col ? ":%d:%d" : ":%d", line, col);

Possibly a silly question, but is it OK to have a formatted string
call in which some of the arguments aren't consumed? (here "col" is only
consumed for the true case, which consumes 2 arguments; it's not consumed
for the false case).

> +  gcc_checking_assert (l + 1 < sizeof (result));

Would snprintf be safer?

Please create a selftest for the function, covering these cases:

* line == 0
* line > 0 and col == 0
* line > 0 and col > 0 (checking output for these cases)
* line == INT_MAX and col == INT_MAX (without checking output, just to tickle 
the assert)
* line == INT_MIN and col == INT_MIN (likewise)

Alternatively, please create a selftest for diagnostic_get_location_text,
testing the cases of:
* context->show_column true and false
* N_("")
* the above line/col value combos
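
For the maybe_line_and_column case, a sketch of such a selftest
(registration in the selftest runner omitted; assumes the test sits in
diagnostic.c next to the function so it can call it directly):

#if CHECKING_P
namespace selftest {

/* Verify the line/column formatting helper.  */
static void
test_maybe_line_and_column ()
{
  ASSERT_STREQ ("", maybe_line_and_column (0, 0));
  ASSERT_STREQ (":42", maybe_line_and_column (42, 0));
  ASSERT_STREQ (":42:10", maybe_line_and_column (42, 10));
  /* Extreme values: just make sure the static buffer copes.  */
  maybe_line_and_column (INT_MAX, INT_MAX);
  maybe_line_and_column (INT_MIN, INT_MIN);
}

} // namespace selftest
#endif /* CHECKING_P */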


> +}
> +  else
> +result[0] = 0;
> +  return result;
> +}
> +
>  /* Return a malloc'd string describing a location e.g. "foo.c:42:10".
> The caller is responsible for freeing the memory.  */
>  
> @@ -303,19 +321,13 @@ diagnostic_get_location_text (diagnostic
>pretty_printer *pp = context->printer;
>const char *locus_cs = colorize_start (pp_show_color (pp), "locus");
>const char *locus_ce = colorize_stop (pp_show_color (pp));
> -
> -  if (s.file == NULL)
> -return build_message_string ("%s%s:%s", locus_cs, progname, locus_ce);
> -
> -  if (!strcmp (s.file, N_("")))
> -return build_message_string ("%s%s:%s", locus_cs, s.file, locus_ce);
> -
> -  if (context->show_column)
> -return build_message_string ("%s%s:%d:%d:%s", locus_cs, s.file, s.line,
> -  s.column, locus_ce);
> -  else
> -return build_message_string ("%s%s:%d:%s", locus_cs, s.file, s.line,
> -  locus_ce);
> +  const char *file = s.file ? s.file : progname;
> +  int line = strcmp (file, N_("")) ? s.line : 0;
> +  int col = context->show_column ? s.column : 0;
> +
> +  const char *line_col = maybe_line_and_column (line, col);
> +  return build_message_string ("%s%s%s:%s", locus_cs, file,
> +line_col, locus_ce);
>  }
>  
>  /* Return a malloc'd string describing a location and the severity of the
> @@ -577,21 +589,20 @@ di

Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Jonathan Wakely

On 17/10/17 22:48 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64. The debug mode fixes have been tested manually
and individually on Linux-x64.

2017-10-17  Ville Voutilainen  

   Deduction guides for associative containers, debug mode deduction
guide fixes.
   * include/bits/stl_algobase.h (__iter_key_t)
   (__iter_val_t, __iter_to_alloc_t): New.
   * include/bits/stl_map.h: Add deduction guides.
   * include/bits/stl_multimap.h: Likewise.
   * include/bits/stl_multiset.h: Likewise.
   * include/bits/stl_set.h: Likewise.
   * include/bits/unordered_map.h: Likewise.
   * include/bits/unordered_set.h: Likewise.
   * include/debug/deque: Likewise.
   * include/debug/forward_list: Likewise.
   * include/debug/list: Likewise.
   * include/debug/map.h: Likewise.
   * include/debug/multimap.h: Likewise.
   * include/debug/multiset.h: Likewise.
   * include/debug/set.h: Likewise.
   * include/debug/unordered_map: Likewise.
   * include/debug/unordered_set: Likewise.
   * include/debug/vector: Likewise.
   * testsuite/23_containers/map/cons/deduction.cc: New.
   * testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
   * testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
   * testsuite/23_containers/set/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_multimap/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_multiset/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.





--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h


This doesn't seem like the right place for these.

Maybe stl_iterator.h with forward declarations of pair and allocator
if needed?



@@ -1429,6 +1429,25 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
#endif

_GLIBCXX_END_NAMESPACE_ALGO
+
+#if __cplusplus > 201402L
+
+  template<typename _InputIterator>
+  using __iter_key_t = remove_const_t<
+typename iterator_traits<_InputIterator>::value_type::first_type>;
+
+ template<typename _InputIterator>
+   using __iter_val_t =
+   typename iterator_traits<_InputIterator>::value_type::second_type;
+
+ template<typename _InputIterator>
+   using __iter_to_alloc_t =
+   pair<add_const_t<typename iterator_traits<_InputIterator>::value_type::first_type>,
+   typename iterator_traits<_InputIterator>::value_type::second_type>;



Inconsistent indentation for these three. Please use:

 template<...>
   using ...
 ...

Would the third one be simpler as:

 template<typename _InputIterator>
   using __iter_to_alloc_t =
 pair<add_const_t<__iter_key_t<_InputIterator>>,
  __iter_val_t<_InputIterator>>



--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -1366,6 +1366,40 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 const map<_K1, _T1, _C1, _A1>&);
};

+
+#if __cpp_deduction_guides >= 201606
+
+ template

The additions to this file seem to be indented by only one space not
two.


+set(_InputIterator, _InputIterator,
+   _Compare = _Compare(), _Allocator = _Allocator())
+   -> set<typename iterator_traits<_InputIterator>::value_type,
+ _Compare, _Allocator>;


The first line seems to be indented wrong here too.

A stray newline has crept into include/debug/unordered_set:


@@ -1031,6 +1159,7 @@ namespace __debug
  const unordered_multiset<_Value, _Hash, _Pred, _Alloc>& __y)
{ return !(__x == __y); }

+
} // namespace __debug
} // namespace std


I notice there are no copyright headers in the new test. imokwiththis.jpg
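
For context, this is the kind of usage the new guides enable (sketch;
assumes a C++17 compiler with this patch applied):

#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
  std::vector<std::pair<int, std::string>> v = { {1, "one"}, {2, "two"} };
  // Class template argument deduction via __iter_key_t/__iter_val_t:
  std::map m(v.begin(), v.end());   // deduces std::map<int, std::string>
  return m.count(1);
}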




Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka  wrote:
>> I think the limit should be on the number of generated copies and not
>> the overall size of the structure...  If the struct were composed of
>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>
>> I wonder how rep; movb; interacts with store to load forwarding?  Is
>> that maybe optimized well on some archs?  movb should always
>> forward and wasn't the setup cost for small N reasonable on modern
>> CPUs?
>
> rep mov is a win over a loop for blocks over 128 bytes on Core, and for blocks
> in the range 24-128 on Zen.  This is w/o store/load forwarding, but I doubt
> those provide a cheap way around it.
>
>>
>> It probably depends on the width of the entries in the store buffer,
>> if they appear in-order and the alignment of the stores (if they are larger 
>> than
>> 8 bytes they are surely aligned).  IIRC CPUs had smaller store buffer
>> entries than cache line size.
>>
>> Given that load bandwith is usually higher than store bandwith it
>> might make sense to do the store combining in our copying sequence,
>> like for the 8 byte entry case use sth like
>>
>>   movq 0(%eax), %xmm0
>>   movhps 8(%eax), %xmm0 // or vpinsert
>>   mov[au]ps %xmm0, 0%(ebx)
>> ...
>>
>> thus do two loads per store and perform the stores in wider
>> mode?
>
> This may be somewhat faster indeed.  I am not sure if store to load
> forwarding will work for the latter half when read again by halves.
> It would not happen on older CPUs :)

Yes, forwarding larger stores to smaller loads generally works fine
since forever with the usual restrictions of alignment/size being
power of two "halves".

The question is of course what to do for 4 byte or smaller elements or
mixed size elements.  We can do zero-extending loads
(do we have them for QI, HI mode loads as well?) and
do shift and or's.  I'm quite sure the CPUs wouldn't like to
see vpinsert's of different vector mode destinations.  So it
would be 8 byte stores from GPRs and values built up via
shift & or.

As said, the important part is that IIRC CPUs can usually
have more loads in flight than stores.  Esp. Bulldozer
with the split core was store buffer size limited (but it
could do merging of store buffer entries IIRC).

Richard.

> Honza
>>
>> As said a general concern was you not copying padding.  If you
>> put this into an even more common place you surely will break
>> stuff, no?
>>
>> Richard.
>>
>> >
>> > Martin
>> >
>> >
>> >>
>> >> Richard.
>> >>
>> >> > Martin
>> >> >
>> >> >
>> >> > 2017-10-12  Martin Jambor  
>> >> >
>> >> > PR target/80689
>> >> > * tree-sra.h: New file.
>> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
>> >> > tree-sra.h.
>> >> > * expr.c: Include params.h and tree-sra.h.
>> >> > (emit_move_elementwise): New function.
>> >> > (store_expr_with_bounds): Optionally use it.
>> >> > * ipa-cp.c: Include tree-sra.h.
>> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>> >> > * config/i386/i386.c (ix86_option_override_internal): Set
>> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>> >> > * tree-sra.c: Include tree-sra.h.
>> >> > (scalarizable_type_p): Renamed to
>> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
>> >> > second parameter to allow_char_arrays.
>> >> > (extract_min_max_idx_from_array): New function.
>> >> > (completely_scalarize): Moved bits of the function to
>> >> > extract_min_max_idx_from_array.
>> >> >
>> >> > testsuite/
>> >> > * gcc.target/i386/pr80689-1.c: New test.
>> >> > ---
>> >> >  gcc/config/i386/i386.c|   4 ++
>> >> >  gcc/expr.c| 103 
>> >> > --
>> >> >  gcc/ipa-cp.c  |   1 +
>> >> >  gcc/ipa-prop.h|   4 --
>> >> >  gcc/params.def|   6 ++
>> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++
>> >> >  gcc/tree-sra.c|  86 
>> >> > +++--
>> >> >  gcc/tree-sra.h|  33 ++
>> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
>> >> >  create mode 100644 gcc/tree-sra.h
>> >> >
>> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> >> > index 1ee8351c21f..87f602e7ead 100644
>> >> > --- a/gcc/config/i386/i386.c
>> >> > +++ b/gcc/config/i386/i386.c
>> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
>> >> >  ix86_tune_cost->l2_cache_size,
>> >> >  opts->x_param_values,
>> >> >  opts_set->x_param_values);
>> >> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>> >> > +35,
>> >> > +

Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread Nathan Sidwell

On 10/26/2017 10:34 AM, David Malcolm wrote:

[CCing Rainer and Mike for the gcc-dg.exp part]



Alternate idea: could show_column become a tri-state:
   * default: show non-zero columns
   * never: never show columns
   * always: always show a column, printing 0 for the no-column case
and then use "always" in our testsuite


One of the things this patch shows up is the number of places where 
we accept a zero column by default.  IMHO it is best to explicitly 
mark such tests.



+  size_t l = sprintf (result, col ? ":%d:%d" : ":%d", line, col);


Possibly a silly question, but is it OK to have a formatted string
call in which some of the arguments aren't consumed? (here "col" is only
consumed for the true case, which consumes 2 arguments; it's not consumed
for the false case).


Yes.


+  gcc_checking_assert (l + 1 < sizeof (result));


Would snprintf be safer?


I guess, but the assert's still needed.


Please create a selftest for the function, covering these cases:

* line == 0
* line > 0 and col == 0
* line > 0 and col > 0 (checking output for these cases)
* line == INT_MAX and col == INT_MAX (without checking output, just to tickle 
the assert)
* line == INT_MIN and col == INT_MIN (likewise)


Ok, I'll investigate this newfangled self-testing framework :)


There are some testcases where we deliberately don't have a *line*
number; what happens to these?


Those don't change.  The dg-harness already does NOT expect a column 
when lineno=0.



My Tcl skills aren't great, so hopefully someone else can review this;
CCing Rainer and Mike.

Also, is the proposed syntax for "no columns" OK?  (note the tristate
idea above)


I'm not wedded to '-:', but as mentioned above, I think the tests should 
be explicit about whether a column is expected or not (and the default 
needs to be 'expect column', because of history)


thanks for your comments.

nathan

--
Nathan Sidwell


Re: [PATCH] Improve alloca alignment

2017-10-26 Thread Jeff Law
On 10/05/2017 03:16 AM, Richard Biener wrote:
>  On Thu, Oct 5, 2017 at 1:07 AM, Jeff Law  wrote:
>> On 10/04/2017 08:53 AM, Eric Botcazou wrote:
 This seems like a SPARC target problem to me -- essentially it's
 claiming a higher STACK_BOUNDARY than it really has.
>>>
>>> No, it is not, I can guarantee you that the stack pointer is always aligned 
>>> to
>>> 64-bit boundaries on SPARC, otherwise all hell would break loose...
>> Then something is inconsistent somewhere.  Either the stack is aligned
>> prior to that code or it is not.  If it is aligned, then Wilco's patch
>> ought to keep it aligned.  If it is not properly aligned, then well, that's
>> the problem ISTM.
>>
>> Am I missing something here?
> 
> What I got from the discussion and the PR is that the stack hardregister
> is properly aligned but what GCC maps to it (virtual or frame or whatever)
> might not be at all points.
Ah!  But I'd probably claim that having the virtual unaligned is erroneous.

> 
> allocate_dynamic_stack_space uses virtual_stack_dynamic_rtx and I'm not
> sure STACK_BOUNDARY applies to it?
> 
> Not that I know anything about this here ;)
My first thought is that sure it should apply.  It just seems wrong that
STACK_BOUNDARY wouldn't apply to the virtual.  But I doubt we've ever
documented that as a requirement/assumption.

Jeff



Re: [PATCH] Improve alloca alignment

2017-10-26 Thread Jeff Law
On 10/17/2017 06:04 AM, Wilco Dijkstra wrote:
> Wilco Dijkstra wrote:
>>
>> Yes STACK_BOUNDARY applies to virtual_stack_dynamic_rtx and all other
>> virtual frame registers. It appears its main purpose is to enable alignment
>> optimizations since PREFERRED_STACK_BOUNDARY is used to align
>> local and outgoing argument area etc. So if you don't want the alignment
>> optimizations it is feasible to set STACK_BOUNDARY to a lower value
>> without changing the stack layout.
>>
>> There is also STACK_DYNAMIC_OFFSET which computes the total offset
>> from the stack. It's not obvious whether the default version should align 
>> (since
>> outgoing arguments are already aligned there is no easy way to record the
>> extra padding), but we could assert if the offset isn't aligned.
> 
> Also there is something odd in the sparc backend:
> 
> /* Given the stack bias, the stack pointer isn't actually aligned.  */
> #define INIT_EXPANDERS   \
>   do {   \
> if (crtl->emit.regno_pointer_align && SPARC_STACK_BIAS)  \
>   {  \
> REGNO_POINTER_ALIGN (STACK_POINTER_REGNUM) = BITS_PER_UNIT;  \
> REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = BITS_PER_UNIT; \
>   }  \
>   } while (0)
> 
> That lowers the alignment for the stack and frame pointer. So assuming that 
> works
> and blocks alignment optimizations, why isn't this done for the dynamic 
> offset as well?
No clue, but ISTM that it should.  Eric, can you try that and see if it
addresses these problems?  I'd really like to get this wrapped up, but I
don't have access to any sparc systems to test it myself.

Jeff


Re: [PATCH] Fix nrv-1.c false failure on aarch64.

2017-10-26 Thread Jeff Law
On 10/18/2017 10:59 AM, Egeyar Bagcioglu wrote:
> Hello,
> 
> Test case "guality.exp=nrv-1.c" fails on aarch64. Optimizations reorder
> the instructions and cause the value of a variable to be checked before
> its first assignment. The following patch is moving the
> break point to the end of the function. Therefore, it ensures that the
> break point is reached after the assignment instruction is executed.
> 
> Please review the patch and apply if legitimate.
This seems wrong.

If I understand the test correctly, we want to break on the line with
the assignment to a2.i[4] = 7 and verify that before that line executes
that a2.i[0] == 42.
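
Roughly, the shape of the scenario is (illustrative, not the literal
nrv-1.c source):

  struct A { int i[8]; };

  struct A
  f (void)
  {
    struct A a2;
    a2.i[0] = 42;
    /* ... code that must not disturb a2.i[0] ... */
    a2.i[4] = 7;    /* breakpoint here: a2.i[0] must already be 42 */
    return a2;
  }

The breakpoint sits on the a2.i[4] = 7 line, so the debugger inspects
a2.i[0] before that statement executes.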

Moving the test point to the end of the function seems to defeat the
purpose of the test.  A breakpoint at the end of the function to test
state is pointless as it doesn't reflect what a user is likely to want
to do.

I'm guessing based on your description that optimization has sunk the
assignment to a2.i[0] down past the assignment to a2.i[4]?  What
optimization did this and what do the dwarf records look like?


Jeff


Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Jonathan Wakely

On 26/10/17 15:36 +0100, Jonathan Wakely wrote:

On 17/10/17 22:48 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64. The debug mode fixes have been tested manually
and individually on Linux-x64.

2017-10-17  Ville Voutilainen  

  Deduction guides for associative containers, debug mode deduction
guide fixes.
  * include/bits/stl_algobase.h (__iter_key_t)
  (__iter_val_t, __iter_to_alloc_t): New.
  * include/bits/stl_map.h: Add deduction guides.
  * include/bits/stl_multimap.h: Likewise.
  * include/bits/stl_multiset.h: Likewise.
  * include/bits/stl_set.h: Likewise.
  * include/bits/unordered_map.h: Likewise.
  * include/bits/unordered_set.h: Likewise.


Also, please put the deduction guides for a class immediately after
the definition of that class, rather than grouping all the guides for
unordered_map and unordered_multimap together.

Thanks.
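
(For readers following along, a usage sketch of what the new guides enable
in C++17 -- this is illustrative client code, not part of the patch:)

  #include <map>
  #include <string>
  #include <type_traits>
  #include <utility>
  #include <vector>

  int main()
  {
    std::vector<std::pair<int, std::string>> v = { {1, "one"}, {2, "two"} };

    std::map m1(v.begin(), v.end());    // iterator-range guide
    std::map m2{ std::pair{3, 4.0} };   // initializer-list guide

    static_assert (std::is_same_v<decltype(m1), std::map<int, std::string>>);
    static_assert (std::is_same_v<decltype(m2), std::map<int, double>>);
    return 0;
  }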



Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener
 wrote:
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka  wrote:
>>> I think the limit should be on the number of generated copies and not
>>> the overall size of the structure...  If the struct were composed of
>>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>>
>>> I wonder how rep; movb; interacts with store to load forwarding?  Is
>>> that maybe optimized well on some archs?  movb should always
>>> forward and wasn't the setup cost for small N reasonable on modern
>>> CPUs?
>>
>> rep mov is a win over a loop for blocks over 128 bytes on core, for blocks in range
>> 24-128 on zen.  This is w/o store/load forwarding, but I doubt those provide
>> a cheap way around.
>>
>>>
>>> It probably depends on the width of the entries in the store buffer,
>>> if they appear in-order and the alignment of the stores (if they are larger 
>>> than
>>> 8 bytes they are surely aligned).  IIRC CPUs had smaller store buffer
>>> entries than cache line size.
>>>
>>> Given that load bandwith is usually higher than store bandwith it
>>> might make sense to do the store combining in our copying sequence,
>>> like for the 8 byte entry case use sth like
>>>
>>>   movq 0(%eax), %xmm0
>>>   movhps 8(%eax), %xmm0 // or vpinsert
>>>   mov[au]ps %xmm0, 0%(ebx)
>>> ...
>>>
>>> thus do two loads per store and perform the stores in wider
>>> mode?
>>
>> This may be somewhat faster indeed.  I am not sure if store to load
>> forwarding will work for the latter half when read again by halves.
>> It would not happen on older CPUs :)
>
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever with the usual restrictions of alignment/size being
> power of two "halves".
>
> The question is of course what to do for 4 byte or smaller elements or
> mixed size elements.  We can do zero-extending loads
> (do we have them for QI, HI mode loads as well?) and
> do shift and or's.  I'm quite sure the CPUs wouldn't like to
> see vpinsert's of different vector mode destinations.  So it
> would be 8 byte stores from GPRs and values built up via
> shift & or.

Like we generate

foo:
.LFB0:
.cfi_startproc
movl4(%rdi), %eax
movzwl  2(%rdi), %edx
salq$16, %rax
orq %rdx, %rax
movzbl  1(%rdi), %edx
salq$8, %rax
orq %rdx, %rax
movzbl  (%rdi), %edx
salq$8, %rax
orq %rdx, %rax
movq%rax, (%rsi)
ret

for

struct x { char e; char f; short c; int i; } a;

void foo (struct x *p, long *q)
{
 *q = (((((((unsigned long)(unsigned int)p->i) << 16)
          | ((unsigned long)(unsigned short)p->c)) << 8)
        | ((unsigned long)(unsigned char)p->f)) << 8)
      | ((unsigned long)(unsigned char)p->e);
}

if you disable the bswap pass.  Doing 4 byte stores in this
case would save some prefixes at least.  I expected the
ORs and shifts to have smaller encodings...

With 4 byte stores we end up with the same size as with
individual loads & stores.
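
(A rough source-level sketch of that 4-byte-store variant, reusing struct x
from above and assuming a little-endian LP64 target -- illustrative only:)

  void foo4 (struct x *p, long *q)
  {
    unsigned int lo = (((unsigned int)(unsigned short)p->c) << 16)
                      | (((unsigned int)(unsigned char)p->f) << 8)
                      | (unsigned int)(unsigned char)p->e;
    unsigned int hi = (unsigned int)p->i;
    unsigned int halves[2] = { lo, hi };
    /* Intent: two 4-byte stores instead of one 8-byte shift/or chain.  */
    __builtin_memcpy (q, halves, sizeof halves);
  }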

> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores.  Esp. Bulldozer
> with the split core was store buffer size limited (but it
> could do merging of store buffer entries IIRC).

Also if we do the stores in smaller chunks we are more
likely to hit the same store-to-load-forwarding issue
elsewhere.  Like in case the destination is memcpy'ed
away.

So the proposed change isn't necessarily a win without
a possible similar regression that it tries to fix.

Whole-program analysis of accesses might allow
marking affected objects.

Richard.

> Richard.
>
>> Honza
>>>
>>> As said a general concern was you not copying padding.  If you
>>> put this into an even more common place you surely will break
>>> stuff, no?
>>>
>>> Richard.
>>>
>>> >
>>> > Martin
>>> >
>>> >
>>> >>
>>> >> Richard.
>>> >>
>>> >> > Martin
>>> >> >
>>> >> >
>>> >> > 2017-10-12  Martin Jambor  
>>> >> >
>>> >> > PR target/80689
>>> >> > * tree-sra.h: New file.
>>> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
>>> >> > tree-sra.h.
>>> >> > * expr.c: Include params.h and tree-sra.h.
>>> >> > (emit_move_elementwise): New function.
>>> >> > (store_expr_with_bounds): Optionally use it.
>>> >> > * ipa-cp.c: Include tree-sra.h.
>>> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>>> >> > * config/i386/i386.c (ix86_option_override_internal): Set
>>> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>>> >> > * tree-sra.c: Include tree-sra.h.
>>> >> > (scalarizable_type_p): Renamed to
>>> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
>>> >> > second parameter to allow_char_arrays.
>>> >> > (extract_min_max_idx_from_array): New function.
>>> >> > (completely_scalarize): Moved bits of the function to
>>> >> > extract_min_max_idx_fr

Re: [PATCH][AArch64] Improve addressing of TI/TFmode

2017-10-26 Thread James Greenhalgh
On Thu, Jul 20, 2017 at 01:49:03PM +0100, Wilco Dijkstra wrote:
> In https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01125.html Jiong
> pointed out some addressing inefficiencies due to a recent change in
> regcprop (https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00775.html).
> 
> This patch improves aarch64_legitimize_address_displacement to split
> unaligned offsets of TImode and TFmode accesses.  The resulting code
> is better and no longer relies on the original regcprop optimization.
> 
> For the test we now produce:
> 
>   add x1, sp, 4
>   stp xzr, xzr, [x1, 24]
> 
> rather than:
> 
> mov x1, sp
> add x1, x1, 28
> stp xzr, xzr, [x1]
> 
> OK for commit?

OK.

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-06-20  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):
>   Improve unaligned TImode/TFmode base/offset split.
> 
> testsuite
>   * gcc.target/aarch64/ldp_stp_unaligned_2.c: New file.
> 
> --


Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 4:30 PM, Will Schmidt  wrote:
> On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
>> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
>> wrote:
>> > Hi,
>> >
>> > Add support for gimple folding of the vec_madd() (vector multiply-add)
>> > intrinsics.
>> > Testcase coverage is provided by the existing tests
>> >  gcc.target/powerpc/fold-vec-madd-*.c
>> >
>> > Sniff-tests appear clean.  A full regtest is currently running across 
>> > assorted Power systems. (P6-P9).
>> > OK for trunk (pending clean run results)?
>>
>> You can use FMA_EXPR on integer operands as well.  Otherwise you risk
>> the FMA be not matched by combine later when part of the operation is
>> CSEd.
>
> I had tried that initially, without success,..   I'll probably need
> another hint.  :-)
> Looking a bit closer, I think I see why the assert fired, but I'm not
> sure what the proper fix would be.
>
> So attempting to use FMA_EXPR on the integer operands (vector shorts in
> this case), I end up triggering this error:
>
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
>  internal compiler error: in expand_expr_real_2, at expr.c:8712
> 0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
> expand_modifier)
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
> 0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
> 0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
> ...
>
>
> which when followed back, I tripped an assert here:  (gcc/expr.c:
> expand_expr_real_2() ~ line 8710)
>
> case FMA_EXPR:
>   {
> optab opt = fma_optab;
> gimple *def0, *def2;
> if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
>   {
> tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
> tree call_expr;
>
> gcc_assert (fn != NULL_TREE);
>
> where gcc/builtins.c
> mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
> returned END_BUILTINS/NULL_TREE, due to falling through the if/else
> tree:
>
>   if (TYPE_MAIN_VARIANT (type) == double_type_node)
> return fcode;
>   else if (TYPE_MAIN_VARIANT (type) == float_type_node)
> return fcodef;
>   else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
> return fcodel;
>   else
> return END_BUILTINS;
>
> Looks like that is all double/float/long double contents.  First blush
> attempt would be to add V8HI_type_node/integer_type_node to that if/else
> tree, but that doesn't look like it would be near enough.

Well - we of course expect to have an optab for the fma with vector
short.  I thought
you had one given you have the intrinsic.  If you don't have an optab
you of course
have to open-code it.

Just thought you expected an actual machine instruction doing the integer FMA.

Richard.
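
(For reference, open-coding it in the gimple folder amounts to roughly the
following -- a sketch using the local names from the quoted patch (arg0,
arg1, arg2, lhs, lhs_type, loc, gsi), not the committed code:)

  /* lhs = arg0 * arg1 + arg2, emitted as two gimple statements.  */
  gimple_seq stmts = NULL;
  tree mult = gimple_build (&stmts, loc, MULT_EXPR, lhs_type, arg0, arg1);
  tree sum = gimple_build (&stmts, loc, PLUS_EXPR, lhs_type, mult, arg2);
  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
  gimple *g = gimple_build_assign (lhs, sum);
  gimple_set_location (g, loc);
  gsi_replace (gsi, g, true);
  return true;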

> Thanks
> -Will
>
>>
>> Richard.
>>
>> > Thanks,
>> > -Will
>> >
>> > [gcc]
>> >
>> > 2017-10-25  Will Schmidt 
>> >
>> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support 
>> > for
>> >   gimple folding of vec_madd() intrinsics.
>> >
>> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> > index 4837e14..04c2b15 100644
>> > --- a/gcc/config/rs6000/rs6000.c
>> > +++ b/gcc/config/rs6000/rs6000.c
>> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
>> > *gsi)
>> >build_int_cst (arg2_type, 0)), 
>> > arg0);
>> >  gimple_set_location (g, loc);
>> >  gsi_replace (gsi, g, true);
>> >  return true;
>> >}
>> > +
>> > +/* vec_madd (Float) */
>> > +case ALTIVEC_BUILTIN_VMADDFP:
>> > +case VSX_BUILTIN_XVMADDDP:
>> > +  {
>> > +   arg0 = gimple_call_arg (stmt, 0);
>> > +   arg1 = gimple_call_arg (stmt, 1);
>> > +   tree arg2 = gimple_call_arg (stmt, 2);
>> > +   lhs = gimple_call_lhs (stmt);
>> > +   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
>> > +   gimple_set_location (g, gimple_location (stmt));
>> > +   gsi_replace (gsi, g, true);
>> > +   return true;
>> > +  }
>> > +/* vec_madd (Integral) */
>> > +case ALTIVEC_BUILTIN_VMLADDUHM:
>> > +  {
>> > +   arg0 = gimple_call_arg (stmt, 0);
>> > +   arg1 = gimple_call_arg (stmt, 1);
>> > +   tree arg2 = gimple_call_arg (stmt, 2);
>> > +   lhs = gimple_call_lhs (stmt);
>> > +   tree lhs_type = TREE_TYPE (lhs);
>> > +   location_t loc = gimple_location (stmt);
>> > +   gimple_seq stmts = NULL;
>> > +   tree mult_result = gimple_build (&stmts, loc, MULT_EXPR,
>> > +  lhs_type, arg0, arg1);
>> > +   tree plu

Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 5:13 PM, Richard Biener
 wrote:
> On Thu, Oct 26, 2017 at 4:30 PM, Will Schmidt  
> wrote:
>> On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
>>> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
>>> wrote:
>>> > Hi,
>>> >
>>> > Add support for gimple folding of the vec_madd() (vector multiply-add)
>>> > intrinsics.
>>> > Testcase coverage is provided by the existing tests
>>> >  gcc.target/powerpc/fold-vec-madd-*.c
>>> >
>>> > Sniff-tests appear clean.  A full regtest is currently running across 
>>> > assorted Power systems. (P6-P9).
>>> > OK for trunk (pending clean run results)?
>>>
>>> You can use FMA_EXPR on integer operands as well.  Otherwise you risk
>>> the FMA be not matched by combine later when part of the operation is
>>> CSEd.
>>
>> I had tried that initially, without success,..   I'll probably need
>> another hint.  :-)
>> Looking a bit closer, I think I see why the assert fired, but I'm not
>> sure what the proper fix would be.
>>
>> So attempting to use FMA_EXPR on the integer operands (vector shorts in
>> this case), I end up triggering this error:
>>
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
>>  internal compiler error: in expand_expr_real_2, at expr.c:8712
>> 0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
>> expand_modifier)
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
>> 0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
>> 0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
>> ...
>>
>>
>> which when followed back, I tripped an assert here:  (gcc/expr.c:
>> expand_expr_real_2() ~ line 8710)
>>
>> case FMA_EXPR:
>>   {
>> optab opt = fma_optab;
>> gimple *def0, *def2;
>> if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
>>   {
>> tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
>> tree call_expr;
>>
>> gcc_assert (fn != NULL_TREE);
>>
>> where gcc/builtins.c
>> mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
>> returned END_BUILTINS/NULL_TREE, due to falling through the if/else
>> tree:
>>
>>   if (TYPE_MAIN_VARIANT (type) == double_type_node)
>> return fcode;
>>   else if (TYPE_MAIN_VARIANT (type) == float_type_node)
>> return fcodef;
>>   else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
>> return fcodel;
>>   else
>> return END_BUILTINS;
>>
>> Looks like that is all double/float/long double contents.  First blush
>> attempt would be to add V8HI_type_node/integer_type_node to that if/else
>> tree, but that doesn't look like it would be near enough.
>
> Well - we of course expect to have an optab for the fma with vector
> short.  I thought
> you had one given you have the intrinsic.  If you don't have an optab
> you of course
> have to open-code it.
>
> Just thought you expected an actual machine instruction doing the integer FMA.

So you have

(define_insn "altivec_vmladduhm"
  [(set (match_operand:V8HI 0 "register_operand" "=v")
(plus:V8HI (mult:V8HI (match_operand:V8HI 1 "register_operand" "v")
  (match_operand:V8HI 2 "register_operand" "v"))
   (match_operand:V8HI 3 "register_operand" "v")))]
  "TARGET_ALTIVEC"
  "vmladduhm %0,%1,%2,%3"
  [(set_attr "type" "veccomplex")])

but not

(define_expand "fmav8hi4"
...

or define_insn in case that's also a way to register an optab.

Richard.



> Richard.
>
>> Thanks
>> -Will
>>
>>>
>>> Richard.
>>>
>>> > Thanks,
>>> > -Will
>>> >
>>> > [gcc]
>>> >
>>> > 2017-10-25  Will Schmidt 
>>> >
>>> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add 
>>> > support for
>>> >   gimple folding of vec_madd() intrinsics.
>>> >
>>> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>> > index 4837e14..04c2b15 100644
>>> > --- a/gcc/config/rs6000/rs6000.c
>>> > +++ b/gcc/config/rs6000/rs6000.c
>>> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin 
>>> > (gimple_stmt_iterator *gsi)
>>> >build_int_cst (arg2_type, 0)), 
>>> > arg0);
>>> >  gimple_set_location (g, loc);
>>> >  gsi_replace (gsi, g, true);
>>> >  return true;
>>> >}
>>> > +
>>> > +/* vec_madd (Float) */
>>> > +case ALTIVEC_BUILTIN_VMADDFP:
>>> > +case VSX_BUILTIN_XVMADDDP:
>>> > +  {
>>> > +   arg0 = gimple_call_arg (stmt, 0);
>>> > +   arg1 = gimple_call_arg (stmt, 1);
>>> > +   tree arg2 = gimple_call_arg (stmt, 2);
>>> > +   lhs = gimple_call_lhs (stmt);
>>> > +   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, 
>>> > arg2);
>>> >

Re: [PATCH][AArch64] Simplify frame layout for stack probing

2017-10-26 Thread James Greenhalgh
On Tue, Jul 25, 2017 at 02:58:04PM +0100, Wilco Dijkstra wrote:
> This patch makes some changes to the frame layout in order to simplify
> stack probing.  We want to use the save of LR as a probe in any non-leaf
> function.  With shrinkwrapping we may only save LR before a call, so it
> is useful to define a fixed location in the callee-saves. So force LR at
> the bottom of the callee-saves even with -fomit-frame-pointer.
> 
> Also remove a rarely used frame layout that saves the callee-saves first
> with -fomit-frame-pointer.
> 
> OK for commit (and backport to GCC7)?

OK. Leave it a week before backporting.

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-07-25  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_layout_frame):
>   Ensure LR is always stored at the bottom of the callee-saves.
>   Remove frame option which saves callee-saves at top of frame.
> 


Re: [PATCH] Improve alloca alignment

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 4:55 PM, Jeff Law  wrote:
> On 10/17/2017 06:04 AM, Wilco Dijkstra wrote:
>> Wilco Dijkstra wrote:
>>>
>>> Yes STACK_BOUNDARY applies to virtual_stack_dynamic_rtx and all other
>>> virtual frame registers. It appears its main purpose is to enable alignment
>>> optimizations since PREFERRED_STACK_BOUNDARY is used to align
>>> local and outgoing argument area etc. So if you don't want the alignment
>>> optimizations it is feasible to set STACK_BOUNDARY to a lower value
>>> without changing the stack layout.
>>>
>>> There is also STACK_DYNAMIC_OFFSET which computes the total offset
>>> from the stack. It's not obvious whether the default version should align 
>>> (since
>>> outgoing arguments are already aligned there is no easy way to record the
>>> extra padding), but we could assert if the offset isn't aligned.
>>
>> Also there is something odd in the sparc backend:
>>
>> /* Given the stack bias, the stack pointer isn't actually aligned.  */
>> #define INIT_EXPANDERS   \
>>   do {   \
>> if (crtl->emit.regno_pointer_align && SPARC_STACK_BIAS)  \
>>   {  \
>> REGNO_POINTER_ALIGN (STACK_POINTER_REGNUM) = BITS_PER_UNIT;  \
>> REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = BITS_PER_UNIT; \
>>   }  \
>>   } while (0)
>>
>> That lowers the alignment for the stack and frame pointer. So assuming that 
>> works
>> and blocks alignment optimizations, why isn't this done for the dynamic 
>> offset as well?
> No clue, but ISTM that it should.  Eric, can you try that and see if it
> addresses these problems?  I'd really like to get this wrapped up, but I
> don't have access to any sparc systems to test it myself.

Or maybe adjust all non-hardreg stack pointers by the bias so they
_are_ aligned.  And of course
make sure we always use the aligned pointers when allocating.

Weird ABI ...

Richard.

> Jeff


Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Will Schmidt
On Thu, 2017-10-26 at 17:18 +0200, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 5:13 PM, Richard Biener
>  wrote:
> > On Thu, Oct 26, 2017 at 4:30 PM, Will Schmidt  
> > wrote:
> >> On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
> >>> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > Add support for gimple folding of the vec_madd() (vector multiply-add)
> >>> > intrinsics.
> >>> > Testcase coverage is provided by the existing tests
> >>> >  gcc.target/powerpc/fold-vec-madd-*.c
> >>> >
> >>> > Sniff-tests appear clean.  A full regtest is currently running across 
> >>> > assorted Power systems. (P6-P9).
> >>> > OK for trunk (pending clean run results)?
> >>>
> >>> You can use FMA_EXPR on integer operands as well.  Otherwise you risk
> >>> the FMA be not matched by combine later when part of the operation is
> >>> CSEd.
> >>
> >> I had tried that initially, without success,..   I'll probably need
> >> another hint.  :-)
> >> Looking a bit closer, I think I see why the assert fired, but I'm not
> >> sure what the proper fix would be.
> >>
> >> So attempting to use FMA_EXPR on the integer operands (vector shorts in
> >> this case), I end up triggering this error:
> >>
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
> >>  internal compiler error: in expand_expr_real_2, at expr.c:8712
> >> 0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
> >> expand_modifier)
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
> >> 0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> >> expand_modifier, rtx_def**, bool)
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
> >> 0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
> >> expand_modifier, rtx_def**, bool)
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
> >> ...
> >>
> >>
> >> which when followed back, I tripped an assert here:  (gcc/expr.c:
> >> expand_expr_real_2() ~ line 8710)
> >>
> >> case FMA_EXPR:
> >>   {
> >> optab opt = fma_optab;
> >> gimple *def0, *def2;
> >> if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
> >>   {
> >> tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
> >> tree call_expr;
> >>
> >> gcc_assert (fn != NULL_TREE);
> >>
> >> where gcc/builtins.c
> >> mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
> >> returned END_BUILTINS/NULL_TREE, due to falling through the if/else
> >> tree:
> >>
> >>   if (TYPE_MAIN_VARIANT (type) == double_type_node)
> >> return fcode;
> >>   else if (TYPE_MAIN_VARIANT (type) == float_type_node)
> >> return fcodef;
> >>   else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
> >> return fcodel;
> >>   else
> >> return END_BUILTINS;
> >>
> >> Looks like that is all double/float/long double contents.  First blush
> >> attempt would be to add V8HI_type_node/integer_type_node to that if/else
> >> tree, but that doesn't look like it would be near enough.
> >
> > Well - we of course expect to have an optab for the fma with vector
> > short.  I thought
> > you had one given you have the intrinsic.  If you don't have an optab
> > you of course
> > have to open-code it.
> >
> > Just thought you expected an actual machine instruction doing the integer 
> > FMA.
> 
> So you have
> 
> (define_insn "altivec_vmladduhm"
>   [(set (match_operand:V8HI 0 "register_operand" "=v")
> (plus:V8HI (mult:V8HI (match_operand:V8HI 1 "register_operand" "v")
>   (match_operand:V8HI 2 "register_operand" "v"))
>(match_operand:V8HI 3 "register_operand" "v")))]
>   "TARGET_ALTIVEC"
>   "vmladduhm %0,%1,%2,%3"
>   [(set_attr "type" "veccomplex")])
> 
> but not
> 
> (define_expand "fmav8hi4"
> ...
> 
> or define_insn in case that's also a way to register an optab.
> 
> Richard.

Ok.  Thanks for the guidance.  :-) 

-Will


> 
> 
> 
> > Richard.
> >
> >> Thanks
> >> -Will
> >>
> >>>
> >>> Richard.
> >>>
> >>> > Thanks,
> >>> > -Will
> >>> >
> >>> > [gcc]
> >>> >
> >>> > 2017-10-25  Will Schmidt 
> >>> >
> >>> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add 
> >>> > support for
> >>> >   gimple folding of vec_madd() intrinsics.
> >>> >
> >>> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >>> > index 4837e14..04c2b15 100644
> >>> > --- a/gcc/config/rs6000/rs6000.c
> >>> > +++ b/gcc/config/rs6000/rs6000.c
> >>> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin 
> >>> > (gimple_stmt_iterator *gsi)
> >>> >build_int_cst (arg2_type, 
> >>> > 0)), arg0);
> >>> >  gimple_set_location (g, loc);
> >>> >  gsi_replace (gsi, g, true);
> >>> >  return true;
> >>> >}
> >>> > +
> >>> > +/* vec_madd (Float

[PATCH] Fix PR81659

2017-10-26 Thread Richard Biener

The following fixes lower_eh_dispatch destroying dominator info
that was still live from previous passes.  This clears it from the
obvious place (when we think we might have created unreachable blocks).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-10-26  Richard Biener  

PR middle-end/81659
* tree-eh.c (pass_lower_eh_dispatch::execute): Free dominator
info when we redirected EH.

* g++.dg/torture/pr81659.C: New testcase.

Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 254099)
+++ gcc/tree-eh.c   (working copy)
@@ -3779,7 +3779,10 @@ pass_lower_eh_dispatch::execute (functio
 }
 
   if (redirected)
-delete_unreachable_blocks ();
+{
+  free_dominance_info (CDI_DOMINATORS);
+  delete_unreachable_blocks ();
+}
   return flags;
 }
 
Index: gcc/testsuite/g++.dg/torture/pr81659.C
===
--- gcc/testsuite/g++.dg/torture/pr81659.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr81659.C  (working copy)
@@ -0,0 +1,19 @@
+// { dg-do compile }
+
+void
+a (int b)
+{
+  if (b)
+throw;
+  try
+{
+  a (3);
+}
+  catch (int)
+{
+}
+  catch (int)
+{
+}
+}
+


Re: [PATCH][AArch64] Improve aarch64_legitimate_constant_p

2017-10-26 Thread James Greenhalgh
On Fri, Jul 07, 2017 at 12:28:11PM +0100, Wilco Dijkstra wrote:
> This patch further improves aarch64_legitimate_constant_p.  Allow all
> integer, floating point and vector constants.  Allow label references
> and non-anchor symbols with an immediate offset.  This allows such
> constants to be rematerialized, resulting in smaller code and fewer stack
> spills.
> 
> SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%.
> 
> Bootstrap OK, OK for commit?

This is mostly OK, but I think you lose one case we previously permitted,
buried in aarch64_classify_address (the CONST case).

OK with that case handled too (assuming that passes a bootstrap and test).

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-07-07  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
>   Return true for more constants, symbols and label references.
>   (aarch64_valid_floating_const): Remove unused function.
> 


Re: [PATCH, rs6000] 2/2 Add x86 SSE2 intrinsics to GCC PPC64LE target

2017-10-26 Thread Steven Munroe
On Wed, 2017-10-25 at 18:37 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Oct 17, 2017 at 01:27:16PM -0500, Steven Munroe wrote:
> > This it part 2/2 for contributing PPC64LE support for X86 SSE2
> > instrisics. This patch includes testsuite/gcc.target tests for the
> > intrinsics included by emmintrin.h. 
> 
> > --- gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > @@ -0,0 +1,83 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -mdirect-move" } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p8vector_hw } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> > "-mcpu=power8" } } */
> 
> Why this dg-skip-if?  Also, why -mdirect-move?
> 
This is a weird test because it exercises what are effectively MMX-style
operations that were added to IA under the SSE2 technology.

Normally mmintrin.h compare operations require a transfer to/from vector
registers with direct move for efficient execution on Power.

The one exception to that is _mm_cmpeq_pi8 which can be implemented
directly in GPRs using cmpb.

The cmpb instruction is from power6 but I do not want to use
-mcpu=power6 here. -mdirect-move is a compromise.
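
(To make the cmpb remark concrete -- a sketch only, assuming the documented
PowerPC __builtin_cmpb built-in (ISA 2.05/power6 and later), not the code in
the actual mmintrin.h port:)

  #include <stdint.h>

  /* MMX-style byte-equality compare done entirely in a GPR: each result
     byte is 0xff where the corresponding bytes of a and b are equal and
     0x00 otherwise, i.e. the same layout pcmpeqb produces.  */
  static inline uint64_t
  cmpeq_pi8_in_gpr (uint64_t a, uint64_t b)
  {
    return __builtin_cmpb (a, b);
  }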

I suspect that the dg-skip-if is an artifact of the early struggles to
make this stuff work across various --withcpu= settings.

I think the key is dg-require-effective-target p8vector_hw which should
allow dropping both the -mdirect-move and the whole dg-skip-if clause.

Will need to try this change and retest.

> 
> Okay for trunk with that taken care of.  Sorry it took a while.
> 
> Have you tested this on big endian btw?
> 
Yes.

I have tested on P8 BE using --withcpu=[power6 | power7 | power8 ]

> 
> Segher
> 




Re: [RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-26 Thread Jeff Law
On 10/26/2017 03:09 AM, Richard Biener wrote:
> On Wed, Oct 25, 2017 at 5:44 PM, Jeff Law  wrote:
>> On 10/24/2017 11:35 AM, Martin Sebor wrote:
>>> On 10/23/2017 05:14 PM, Jeff Law wrote:

 Martin,

 I'd like your thoughts on this patch.

 One of the things I'm working on is changes that would allow passes that
 use dominator walks to trivially perform context sensitive range
 analysis as a part of their dominator walk.

 As I outlined earlier this would allow us to easily fix the false
 positive sprintf warning reported a week or two ago.

 This patch converts the sprintf warning code to perform a dominator walk
 rather than just walking the blocks in whatever order they appear in the
 basic block array.

 From an implementation standpoint we derive a new class sprintf_dom_walk
 from the dom_walker class.  Like other dom walkers we walk statements
 from within the before_dom_children member function.  Very standard
 stuff.

 I moved handle_gimple_call and various dependencies into the
 sprintf_dom_walker class to facilitate calling handle_gimple_call from
 within the before_dom_children member function.  There's light fallout
 in various places where the call_info structure was explicitly expected
 to be found in the pass_sprintf_length class, but is now found in the
 sprintf_dom_walker class.
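
(A skeletal sketch of that shape, with approximate signatures -- the real
patch differs in details and also wires in the range analysis:)

  class sprintf_dom_walker : public dom_walker
  {
  public:
    sprintf_dom_walker () : dom_walker (CDI_DOMINATORS) {}

    edge before_dom_children (basic_block bb) FINAL OVERRIDE
    {
      /* Visit statements in dominator order; calls go through the
         (moved) handle_gimple_call member.  */
      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
           gsi_next (&gsi))
        if (is_gimple_call (gsi_stmt (gsi)))
          handle_gimple_call (&gsi);
      return NULL;
    }

    bool handle_gimple_call (gimple_stmt_iterator *);
  };

The pass's execute method would then drive it with something along the
lines of walk (ENTRY_BLOCK_PTR_FOR_FN (cfun)) on a walker instance.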

 This has been bootstrapped and regression tested on x86_64-linux-gnu.
 I've also layered my embedded VRP analysis on top of this work and
 verified that it does indeed fix the reported false positive.

 Thoughts?
>>>
>>> If it lets us improve the quality of the range information I can't
>>> think of a downside.
>> It's potentially slower simply because the domwalk interface is more
>> expensive than just iterating over the blocks with FOR_EACH_BB.  But
>> that's about it.  I think the ability to get more accurate range
>> information will make the compile-time hit worth it.
>>
>>>
>>> Besides the sprintf pass, a number of other areas depend on ranges,
>>> most of all the -Wstringop-overflow and truncation warnings and
>>> now -Wrestrict (once my enhancement is approved).  It would be nice
>>> to be able to get the same improvements there.  Does it mean that
>>> those warnings will need to be moved into a standalone pass?  (I'm
>>> not opposed to it, just wondering what to expect if this is
>>> the route we want to go.)
>> They don't necessarily have to be a standalone pass -- they just have to
>> be implementable as part of a dominator walk to get the cheap context
>> sensitive range data.
>>
>> So IIRC you've got some code to add additional warnings within the
>> strlen pass.  That pass is already a dominator walk.  In theory you'll
>> just add a member to the strlen_dom_walker class, then a call in
>> before_dom_children and after_dom_children virtuals and you should be
>> able to query the context sensitive range information.
>>
>> For warnings that occur in code that is not easily structured as a
>> dominator walk, Andrew's work will definitely be a better choice.
>>
>> Andrew's work will almost certainly also generate even finer grained
>> ranges because it can work on an arbitrary path through the CFG rather
>> than relying on dominance relationships.  Consider
>>
>> A
>>/ \
>>   B   C
>>\ /
>> D
>>
>> Range information implied by the edge A->B is usable within B because
>> the edge A->B dominates B.  Similarly for range information implied by
>> A->C being available in C.  But range information implied by A->B is not
>> available in D because A->B does not dominate D.  SImilarly range
>> information implied by A->C is not available in D.
>>
>> I touched on this in a private message recently.  Namely that exploiting
>> range data in non-dominated blocks feels a *lot* like jump threading and
>> should likely be structured as a backwards walk query (and thus is more
>> suitable for Andrew's infrastructure).
> 
> On the contrary - with a backward walk you don't know which way to go.
> From D to B or to C?  With a forward walk there's no such ambiguity
> (unless you start from A).
> 
> Note I have patches for EVRP merging ranges from B and C to make
> the info available for D but usually there's nothing to recover here
> that isn't also valid in A.  Just ranges derived from non-conditional
> stmts (by means of exploiting undefined behavior) can help here.
My point is that the range information specific to the A->B edge is
not usable to optimize D because it does not hold on all paths to D.
But it can be used to improve the precision of warnings for code within D.


It's certainly possible to merge the range information for the A->B edge
and the information for the A->C edge as we enter D (in the original
graph).  For range data implied by the edge traversal, I suspect what you
end up with is rarely, if ever, better than what we had in A.  But if there
are range generating

Re: [PATCH][AArch64] Introduce emit_frame_chain

2017-10-26 Thread James Greenhalgh
On Fri, Aug 04, 2017 at 01:26:15PM +0100, Wilco Dijkstra wrote:
> The current frame code combines the separate concepts of a frame chain
> (saving old FP,LR in a record and pointing new FP to it) and a frame
> pointer used to access locals.  Add emit_frame_chain to the aarch64_frame
> descriptor and use it in the prolog and epilog code.  For now just
> initialize it as before, so generated code is identical.
> 
> Also correctly set EXIT_IGNORE_STACK.  The current AArch64 epilog code 
> restores SP from FP if alloca is used.  If a frame pointer is used but
> there is no alloca, SP must remain valid for the epilog to work correctly.

OK.

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-08-03  Wilco Dijkstra  
> 
> gcc/
>   * config/aarch64/aarch64.h (EXIT_IGNORE_STACK): Set if alloca is used.
>   (aarch64_frame): Add emit_frame_chain boolean.
>   * config/aarch64/aarch64.c (aarch64_frame_pointer_required)
>   Move eh_return case to aarch64_layout_frame.
>   (aarch64_layout_frame): Initialize emit_frame_chain.
>   (aarch64_expand_prologue): Use emit_frame_chain.
> 


Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Ville Voutilainen
On 26 October 2017 at 18:04, Jonathan Wakely  wrote:
> Also, please put the deduction guides for a class immediately after
> the definition of that class, rather than grouping all the guides for
> unordered_map and unordered_multimap together.


Alright.

2017-10-26  Ville Voutilainen  

Deduction guides for associative containers, debug mode deduction
guide fixes.
* include/bits/stl_iterator.h (__iter_key_t)
(__iter_val_t, __iter_to_alloc_t): New.
* include/bits/stl_map.h: Add deduction guides.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/debug/deque: Likewise.
* include/debug/forward_list: Likewise.
* include/debug/list: Likewise.
* include/debug/map.h: Likewise.
* include/debug/multimap.h: Likewise.
* include/debug/multiset.h: Likewise.
* include/debug/set.h: Likewise.
* include/debug/unordered_map: Likewise.
* include/debug/unordered_set: Likewise.
* include/debug/vector: Likewise.
* testsuite/23_containers/map/cons/deduction.cc: New.
* testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
* testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.


deduction_guidos_3.diff.bz2
Description: BZip2 compressed data


Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Jonathan Wakely

On 26/10/17 19:23 +0300, Ville Voutilainen wrote:

On 26 October 2017 at 18:04, Jonathan Wakely  wrote:

Also, please put the deduction guides for a class immediately after
the definition of that class, rather than grouping all the guides for
unordered_map and unordered_multimap together.



Alright.

2017-10-26  Ville Voutilainen  

   Deduction guides for associative containers, debug mode deduction
guide fixes.
   * include/bits/stl_iterator.h (__iter_key_t)
   (__iter_val_t, __iter_to_alloc_t): New.
   * include/bits/stl_map.h: Add deduction guides.
   * include/bits/stl_multimap.h: Likewise.
   * include/bits/stl_multiset.h: Likewise.
   * include/bits/stl_set.h: Likewise.
   * include/bits/unordered_map.h: Likewise.
   * include/bits/unordered_set.h: Likewise.
   * include/debug/deque: Likewise.
   * include/debug/forward_list: Likewise.
   * include/debug/list: Likewise.
   * include/debug/map.h: Likewise.
   * include/debug/multimap.h: Likewise.
   * include/debug/multiset.h: Likewise.
   * include/debug/set.h: Likewise.
   * include/debug/unordered_map: Likewise.
   * include/debug/unordered_set: Likewise.
   * include/debug/vector: Likewise.
   * testsuite/23_containers/map/cons/deduction.cc: New.
   * testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
   * testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
   * testsuite/23_containers/set/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_multimap/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_multiset/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.


OK for trunk - thanks.




Re: [PATCH] Document --coverage and fork-like functions (PR gcov-profile/82457).

2017-10-26 Thread Sandra Loosemore

On 10/26/2017 01:21 AM, Martin Liška wrote:

On 10/20/2017 06:03 AM, Sandra Loosemore wrote:

On 10/19/2017 12:26 PM, Eric Gallager wrote:

On 10/19/17, Martin Liška  wrote:

Hi.

As discussed in the PR, we should be more precise in our documentation.
The patch does that.

Ready for trunk?
Martin

gcc/ChangeLog:

2017-10-19  Martin Liska  

 PR gcov-profile/82457
 * doc/invoke.texi: Document that one needs a non-strict ISO mode
 for fork-like functions to be properly instrumented.
---
   gcc/doc/invoke.texi | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)





The wording is kinda unclear because the modes in the parentheses are
all strict ISO modes, but the part before the parentheses says
NON-strict... I think you either need an additional "not" inside the
parentheses, or to change all the instances of -std=c* to -std=gnu*.


The wording in the patch doesn't make sense to me, either.  If I understand the 
issue correctly, the intent is probably to say something like

Unless a strict ISO C dialect option is in effect,
@code{fork} calls are detected and correctly handled without double counting.

??


Hi Sandra.

Thank you for the feedback, I'm sending version you suggested. Hope it's fine 
to install the patch?


Ummm, no.  Sorry to have been unclear; the wording I suggested above was 
intended to replace the existing sentence about fork behavior, not to be 
appended to it.


-Sandra



Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Martin Sebor

 /* The tree and const_tree overload templates.   */
 namespace wi
 {
+  class unextended_tree
+  {
+  private:
+const_tree m_t;
+
+  public:
+unextended_tree () {}


Defining no-op ctors is quite dangerous and error-prone.  I suggest
to instead default initialize the member(s):

   unextended_tree (): m_t () {}

Ditto everywhere else, such as in:


This is really performance-sensitive code though, so I don't think
we want to add any unnecessary initialisation.  Primitive types are
uninitialised by default too, and the point of this class is to
provide an integer-like interface.


I understand the performance concern (more on that below), but
to clarify the usability issues,  I don't think the analogy with
primitive types is quite fitting here: int() evaluates to zero,
as do the values of i and a[0] and a[1] after an object of type
S is constructed using its default ctor, i.e., S ():

   struct S {
 int i;
 int a[2];

 S (): i (), a () { }
   };


Sure, I realise that.  I meant that:

  int x;

doesn't initialise x to zero.  So it's a question of which case is the
most motivating one: using "x ()" to initialise x to 0 in a constructor
or "int x;" to declare a variable of type x, uninitialised.  I think the
latter use case is much more common (at least in GCC).  Rearranging
things, I said later:


I agree that the latter use case is more common in GCC, but I don't
see it as a good thing.  GCC was written in C and most code still
uses now outdated C practices such as declaring variables at the top
of a (often long) function, and usually without initializing them.
It's been established that it's far better to declare variables with
the smallest scope, and to initialize them on declaration.  Compilers
are smart enough these days to eliminate redundant initialization or
assignments.


In your other message you used the example of explicit default
initialisation, such as:

class foo
{
  foo () : x () {}
  unextended_tree x;
};

But I think we should strongly discourage that kind of thing.
If someone wants to initialise x to a particular value, like
integer_zero_node, then it would be better to do it explicitly.
If they don't care what the initial value is, then for these
integer-mimicing classes, uninitialised is as good as anything
else. :-)


What I meant was: if you want to initialise "i" to 1 in your example,
you'd have to write "i (1)".  Being able to write "i ()" instead of
"i (0)" saves one character but I don't think it adds much clarity.
Explicitly initialising something only seems worthwhile if you say
what you're initialising it to.


My comment is not motivated by convenience.  What I'm concerned
about is that defining a default ctor to be a no-op defeats the
zero-initialization semantics most users expect of T().

This is particularly concerning for a class designed to behave
like an [improved] basic integer type.  Such a class should act
as closely as possible to the type it emulates and in the least
surprising ways.  Any sort of a deviation that replaces well-
defined behavior with undefined is a gotcha and a bug waiting
to happen.

It's also a concern in generic (template) contexts where T() is
expected to zero-initialize.  A template designed to work with
a fundamental integer type should also work with a user-defined
type designed to behave like an integer.


With the new (and some existing) classes that's not so, and it
makes them harder and more error-prone to use (I just recently
learned this the hard way about offset_int and the debugging
experience is still fresh in my memory).


Sorry about the bad experience.  But that kind of thing cuts
both ways.  If I write:

poly_int64
foo (void)
{
  poly_int64 x;
  x += 2;
  return x;
}

then I get a warning about x being used uninitialised, without
having had to run anything.  If we add default initialisation
then this becomes something that has to be debugged against
a particular test case, i.e. we've stopped the compiler from
giving us useful static analysis.


With default initialization the code above becomes valid and has
the expected effect of adding 2 to zero.  It's just more robust
than the same code that uses a basic type instead.  This
seems no more unexpected and no less desirable than the well-
defined semantics of something like:

  std::string x;
  x += "2";
  return x;

or using any other C++ standard library type in a similar way.

(Incidentally, although I haven't tried with poly_int, I get no
warnings for the code above with offset_int or wide_int.)


When the ctor is inline and the initialization unnecessary,
GCC will in most instances eliminate it, so I also don't think
the suggested change would have a significant impact on
the efficiency of optimized code, but...

...if it is thought essential to provide a no-op ctor, I would
suggest to consider making its property explicit, e.g., like so:

   struct unextended_tree {

 struct Uninit { };

 // ...
 unextended_tree (Uninit) { /* no initi
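
(The quoted sketch is cut off; spelled out, the tag-type idiom being
suggested would look roughly like this -- a sketch, not a concrete
proposal for the patch:)

  struct unextended_tree
  {
    /* Tag type selecting the "leave it uninitialized" constructor.  */
    struct Uninit { };

    /* Default construction value-initializes, matching what T () does
       for fundamental types.  */
    unextended_tree () : m_t () {}

    /* Explicit opt-out for performance-critical spots.  */
    unextended_tree (Uninit) {}

  private:
    const_tree m_t;
  };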

[PATCH] RISC-V: Correct and improve the "-mabi" documentation

2017-10-26 Thread Palmer Dabbelt
The documentation for the "-mabi" argument on RISC-V was incorrect.  We
chose to treat this as a documentation bug rather than a code bug, and
to make the documentation match what GCC currently does.  In the
process, I also improved the documentation a bit.

Thanks to Alex Bradbury for finding the bug!

PR target/82717: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717

gcc/ChangeLog

2017-10-26  Palmer Dabbelt  

PR target/82717
* doc/invoke.texi (RISC-V) <-mabi>: Correct and improve.
---
 gcc/doc/invoke.texi | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71b2445f70fd..d184e1d7b7d4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21669,9 +21669,26 @@ When generating PIC code, allow the use of PLTs. 
Ignored for non-PIC.
 
 @item -mabi=@var{ABI-string}
 @opindex mabi
-Specify integer and floating-point calling convention.  This defaults to the
-natural calling convention: e.g.@ LP64 for RV64I, ILP32 for RV32I, LP64D for
-RV64G.
+@item -mabi=@var{ABI-string}
+@opindex mabi
+Specify integer and floating-point calling convention.  @var{ABI-string}
+contains two parts: the size of integer types and the registers used for
+floating-point types.  For example @samp{-march=rv64ifd -mabi=lp64d} means that
+@samp{long} and pointers are 64-bit (implicitly defining @samp{int} to be
+32-bit), and that floating-point values up to 64 bits wide are passed in F
+registers.  Contrast this with @samp{-march=rv64ifd -mabi=lp64f}, which still
+allows the compiler to generate code that uses the F and D extensions but only
+allows floating-point values up to 32 bits long to be passed in registers; or
+@samp{-march=rv64ifd -mabi=lp64}, in which no floating-point arguments will be
+passed in registers.
+
+The default for this argument is system dependent, users who want a specific
+calling convention should specify one explicitly.  The valid calling
+conventions are: @samp{ilp32}, @samp{ilp32f}, @samp{ilp32d}, @samp{lp64},
+@samp{lp64f}, and @samp{lp64d}.  Some calling conventions are impossible to
+implement on some ISAs: for example, @samp{-march=rv32if -mabi=ilp32d} is
+invalid because the ABI requires 64-bit values be passed in F registers, but F
+registers are only 32 bits wide.
 
 @item -mfdiv
 @itemx -mno-fdiv
-- 
2.13.6



Re: [RFA][PATCH] Provide a class interface into substitute_and_fold.

2017-10-26 Thread Jeff Law
On 10/26/2017 03:24 AM, Richard Biener wrote:
> On Tue, Oct 24, 2017 at 8:44 PM, Jeff Law  wrote:
>> This is similar to the introduction of the ssa_propagate_engine, but for
>> the substitution/replacements bits.
>>
>> In a couple places the pass specific virtual functions are just wrappers
>> around existing functions.  A good example of this is
>> ccp_folder::get_value.  Many other routines in tree-ssa-ccp.c want to
>> use get_constant_value.  Some may be convertable to use the class
>> instance, but I haven't looked closely.
>>
>> Another example is vrp_folder::get_value.  In this case we're wrapping
>> op_with_constant_singleton_value.  In a later patch that moves into the
>> to-be-introduced vr_values class so we'll delegate to that class rather
>> than wrap.
>>
>> FWIW I did look at having a single class for the propagation engine and
>> the substitution engine.  That turned out to be a bit problematical due
>> to the calls into the substitution engine from the evrp bits which don't
>> use the propagation engine at all.  Given propagation and substitution
>> are distinct concepts I ultimately decided the cleanest path forward was
>> to keep the two classes separate.
>>
>> Bootstrapped and regression tested on x86_64.  OK for the trunk?
> 
> So what I don't understand in this 2 part series is why you put
> substitute-and-fold into a different class.
Good question.  They're in different classes because they can and are
used independently.

For example, tree-complex uses the propagation engine, but not the
substitution engine.   EVRP uses the substitution engine, but not the
propagation engine.  The standard VRP algorithm uses both engines, but
other than shared data (vr_values), they are independent.  CCP and
copy-prop are similar to VRP.  Essentially one is a producer, the other
a consumer.

It might be possible to smash them together, but I'm not sure if that's
wise or not.  I do suspect that smashing them together would be easier
once all the other work is done if we were to make that choice.  But
composition, multiple inheritance or just passing around the class
instance may be better.  I think that's a TBD.


> 
> This makes it difficult for users to inherit and put the lattice in
> the deriving class as we have the visit routines which will update
> the lattice and the get_value hook which queries it.
Yes.  The key issue is the propagation step produces vr_values and the
substitution step consumes vr_values.

For VRP the way I solve this is to have a vr_values class in the derived
propagation engine class as well as the derived substitution engine
class.  When we're done with propagation we move the class instance from
the propagation engine to the substitution engine.

EVRP works similarly except the vr_values starts in the evrp_dom_walker
class, then moves to its substitution engine.

There's a bit of cleanup to do there in terms of implementation.  But
that's the basic model that I'm using right now.  It should be fairly
easy to move to a unioned class or multiple inheritance if we so
desired.  It shouldn't affect most of what I'm doing now around
encapsulating vr_values.

> 
> So from maintaining the state for the users using a single
> class would be more appropriate.  Of course it seems like
> substitute-and-fold can be used without using the SSA
> propagator itself and the SSA propagator can be used
> without the substitute and fold engine.
Right.  THey can and are used independently which is what led to having
independent classes.


> 
> IIRC we decided against using multiple inheritance?  Which
> means a user would put the lattice in the SSA propagation
> engine derived class and do the inheriting via composition
> as member in the substitute_and_fold engine?
Right, we have decided against using multiple inheritance.   So rather
than using  multiple inheritance I pass the vr_values object.  So in my
development tree I have this:


class vrp_prop : public ssa_propagation_engine
{
 public:
  enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL OVERRIDE;
  enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;

  /* XXX Drop the indirection through the pointer, not needed.  */
  class vr_values *vr_values;
};


class vrp_folder : public substitute_and_fold_engine
{
 public:
  tree get_value (tree) FINAL OVERRIDE;
  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
  class vr_values *vr_values;
};

In vrp_finalize:
  class vrp_folder vrp_folder;
  vrp_folder.vr_values = vr_values;
  vrp_folder.substitute_and_fold ();


I'm in the process of cleaning this up -- in particular there'll be a
ctor in vrp_folder which will require passing in a vr_values and we'll
be dropping some indirections as well.

I just went through this exact cleanup yesterday with the separated evrp
style range analyzer and evrp itself.


> Your patches keep things simple (aka the lattice and most
> functions are globals), but is composition what you had
> in mind when doing this class-ification?
Yes.  I

Re: Rename cxx1998 into normal

2017-10-26 Thread Daniel Krügler
2017-10-26 7:51 GMT+02:00 François Dumont :
> Hi
>
> We once talked about renaming the __cxx1998 namespace into something less
> C++98 biased. Here is the patch to do so.
>
> Ok with the new name ?

IMO the name should somehow still contain "cxx" somewhere, otherwise
this could easily cause a semantic ambiguity in situations where the
term "normal" has a specific meaning.

What about "__cxxdef[ault]" ?

- Daniel


Re: [PATCH] expand -fdebug-prefix-map documentation

2017-10-26 Thread Sandra Loosemore

On 10/25/2017 06:26 PM, Jim Wilson wrote:


Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 254023)
+++ gcc/doc/invoke.texi (working copy)
@@ -6981,7 +6981,12 @@ link processing time.  Merging is enabled by defau
 @item -fdebug-prefix-map=@var{old}=@var{new}
 @opindex fdebug-prefix-map
 When compiling files in directory @file{@var{old}}, record debugging
-information describing them as in @file{@var{new}} instead.
+information describing them as in @file{@var{new}} instead.  This can be
+used to replace a build time path with an install time path in the debug info.


build-time path, install-time path


+It can also be used to change an absolute path to a relative path by using
+@file{.} for @var{new}.  This can give more reproducible builds, which are
+location independent, but may require an extra command to tell gdb where to
+find the source files.

 @item -fvar-tracking
 @opindex fvar-tracking
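
(As an aside, for readers: the intended use is something like

  gcc -g -fdebug-prefix-map=/home/me/build=. -c foo.c

which makes the debug info record paths relative to "." instead of under the
absolute build directory; an extra command such as gdb's
"directory /home/me/build" then tells the debugger where the sources really
live.  The paths and the gdb command are illustrative, not part of the patch.)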


OK with that tweak to the hyphenation.

-Sandra



Re: Rename cxx1998 into normal

2017-10-26 Thread Jonathan Wakely

On 26/10/17 18:55 +0200, Daniel Krügler wrote:

2017-10-26 7:51 GMT+02:00 François Dumont :

Hi

We once talked about renaming the __cxx1998 namespace into something less
C++98-biased.  Here is the patch to do so.

Ok with the new name?


IMO the name should still contain "cxx" somewhere, otherwise this could
easily cause a semantic ambiguity in situations where the term "normal"
has a specific meaning.

What about "__cxxdef[ault]" ?


I'm not sure we need to change it at all. The name is anachronistic,
but harmless.



[PATCH] Change default optimization level to -Og

2017-10-26 Thread Wilco Dijkstra
GCC's default optimization level is -O0.  Unfortunately, unlike other compilers,
GCC generates extremely inefficient code with -O0.  It is almost unusable for
low-level debugging or manual inspection of generated code, so a -O option is
effectively always required for compilation.  -Og not only allows for fast
compilation, but also produces code that is efficient, readable, and debuggable.
Therefore -Og makes for a much better default setting.

Any comments?

2017-10-26  Wilco Dijkstra  

* opts.c (default_options_optimization): Set default to -Og.

doc/
* invoke.texi (-O0) Remove default mention.
(-Og): Add mention of default setting.

--
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3328a3b5fafa6a98007eff52d2a26af520de9128..74c33ea35b9f320b419a3417e6007d2391536f1b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7343,7 +7343,7 @@ by @option{-O2} and also turns on the following optimization flags:
 @item -O0
 @opindex O0
 Reduce compilation time and make debugging produce the expected
-results.  This is the default.
+results.
 
 @item -Os
 @opindex Os
@@ -7371,7 +7371,7 @@ Optimize debugging experience.  @option{-Og} enables optimizations
 that do not interfere with debugging. It should be the optimization
 level of choice for the standard edit-compile-debug cycle, offering
 a reasonable level of optimization while maintaining fast compilation
-and a good debugging experience.
+and a good debugging experience.  This is the default.
 @end table
 
 If you use multiple @option{-O} options, with or without level numbers,
diff --git a/gcc/opts.c b/gcc/opts.c
index dfad955e220870a3250198640f3790c804b191e0..74511215309f11445685db4894be2ab6881695d3 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -565,6 +565,12 @@ default_options_optimization (struct gcc_options *opts,
   int opt2;
   bool openacc_mode = false;
 
+  /* Set the default optimization to -Og.  */
+  opts->x_optimize_size = 0;
+  opts->x_optimize = 1;
+  opts->x_optimize_fast = 0;
+  opts->x_optimize_debug = 1;
+
   /* Scan to see what optimization level has been specified.  That will
  determine the default value of many flags.  */
   for (i = 1; i < decoded_options_count; i++)



PING Re: [patch] configure option to override TARGET_LIBC_PROVIDES_SSP

2017-10-26 Thread Sandra Loosemore

This one.

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00537.html

There was discussion about documenting this, but the actual configure 
change hasn't been reviewed yet.


-Sandra



Re: [PATCH], Enable IBM/IEEE long double format to be overridden more easily

2017-10-26 Thread Michael Meissner
On Wed, Oct 25, 2017 at 07:11:07PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Sat, Oct 21, 2017 at 09:09:58AM -0400, Michael Meissner wrote:
> > As Segher and I were discussing off-line, I have some problems with the 
> > current
> > > -mabi={ieee,ibm}longdouble switches as we start planning to modify GCC 9 
> > and
> > GLIBC 2.27/2.28 to support __float128 as the default long double format for
> > power server systems.
> > 
> > My gripes are:
> > 
> > 1)  Using Warn() in rs6000.opt means that you get two warning 
> > messages when
> > you use the switches (one from the driver, and one from cc1).
> > 
> > 2)  I feel you should not get a warning if you select the option 
> > that
> > reflects the current default behavior (i.e. -mabi=ibmlongdouble
> > currently for power server systems).
> > 
> > 3)  There is no way to silence the warning (i.e. -w doesn't work on
> > warnings in the .opt file).  Both GLIBC and LIBGCC will need the
> > ability to build support modules with an explicit long double format.
> > 
> > 4)  In the future we will need a little more flexibility in how the 
> > default
> > is set.
> > 
> > 5)  There is a mis-match between the documentation and rs6000.opt, 
> > as these
> > switches are documented, but use Undocumented in the rs6000.opt.
> 
> Agreed on all.
> 
> > These patches fix these issues.  If you use -Wno-psabi, it will silence the
> > warning.  I have built these patches on a little endian power8 system, and
> > there were no regressions.  Can I check these patches into the trunk?
> > 
> > 2017-10-21  Michael Meissner  
> > 
> > * config/rs6000/aix.h (TARGET_IEEEQUAD_DEFAULT): Set long double
> > default to IBM.
> > * config/rs6000/darwin.h (TARGET_IEEEQUAD_DEFAULT): Likewise.
> > * config/rs6000/rs6000.opt (-mabi=ieeelongdouble): Move the
> > warning to rs6000.c.  Remove the Undocumented flag, since it has
> > been documented.
> > (-mabi=ibmlongdouble): Likewise.
> 
> And more importantly, we _want_ it to be documented (right)?

I would have preferred to not document it until GCC 9 when we start the switch
to long double == _Float128, but since it was already documented (albeit only
for 32 bit), I kept it documented.  If you want me to change it to
undocumented, I can do that.

> > --- gcc/config/rs6000/rs6000.opt(revision 253961)
> > +++ gcc/config/rs6000/rs6000.opt(working copy)
> > @@ -381,10 +381,10 @@ mabi=d32
> >  Target RejectNegative Undocumented Warn(using old darwin ABI) 
> > Var(rs6000_darwin64_abi, 0)
> >  
> >  mabi=ieeelongdouble
> > -Target RejectNegative Undocumented Warn(using IEEE extended precision long 
> > double) Var(rs6000_ieeequad) Save
> > +Target RejectNegative Var(rs6000_ieeequad) Save
> >  
> >  mabi=ibmlongdouble
> > -Target RejectNegative Undocumented Warn(using IBM extended precision long 
> > double) Var(rs6000_ieeequad, 0)
> > +Target RejectNegative Var(rs6000_ieeequad, 0)
> 
> Does this need "Save" as well?

No.  For variables, you want Save on the first instance only.

> > +  if (!warned_change_long_double && warn_psabi)
> > +   {
> > + warned_change_long_double = true;
> > + if (TARGET_IEEEQUAD)
> > +   warning (0, "Using IEEE extended precision long double");
> > + else
> > +   warning (0, "Using IBM extended precision long double");
> > +   }
> 
> You can put OPT_Wpsabi in place of that 0, it's what that arg is for :-)

Ah, ok.  Thanks.
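
So presumably the hunk ends up roughly as follows (just the quoted code with
that one change folded in, not necessarily the final committed version):

  if (!warned_change_long_double && warn_psabi)
    {
      warned_change_long_double = true;
      if (TARGET_IEEEQUAD)
        warning (OPT_Wpsabi, "Using IEEE extended precision long double");
      else
        warning (OPT_Wpsabi, "Using IBM extended precision long double");
    }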

> Okay with that changed.  Thanks!
> 
> 
> Segher
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] expand -fdebug-prefix-map documentation

2017-10-26 Thread Gerald Pfeifer
On Wed, 25 Oct 2017, Jim Wilson wrote:
> The current documentation doesn't explain what the option is for, or
> how one might use it.  The attached patch expands the documentation a
> bit to try to explain this.

> OK?

Thank you for fleshing this out, Jim!

This looks fine to me (modulo Sandra's note).  Just a question: would
we refer to GDB instead of gdb here?  It feels a little in between to
me, whether we are referring to the tool or the actual binary.  I'm
sure Sandra will have guidance for us. ;-)

+@file{.} for @var{new}.  This can give more reproducible builds, which are
+location independent, but may require an extra command to tell gdb where to
+find the source files.
 
Gerald

Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Richard Sandiford
Martin Sebor  writes:
>>  /* The tree and const_tree overload templates.   */
>>  namespace wi
>>  {
>> +  class unextended_tree
>> +  {
>> +  private:
>> +const_tree m_t;
>> +
>> +  public:
>> +unextended_tree () {}
>
> Defining no-op ctors is quite dangerous and error-prone.  I suggest
> to instead default initialize the member(s):
>
>unextended_tree (): m_t () {}
>
> Ditto everywhere else, such as in:

 This is really performance-sensitive code though, so I don't think
 we want to add any unnecessary initialisation.  Primitive types are
 uninitialised by default too, and the point of this class is to
 provide an integer-like interface.
>>>
>>> I understand the performance concern (more on that below), but
>>> to clarify the usability issues,  I don't think the analogy with
>>> primitive types is quite fitting here: int() evaluates to zero,
>>> as do the values of i and a[0] and a[1] after an object of type
>>> S is constructed using its default ctor, i.e., S ():
>>>
>>>struct S {
>>>  int i;
>>>  int a[2];
>>>
>>>  S (): i (), a () { }
>>>};
>>
>> Sure, I realise that.  I meant that:
>>
>>   int x;
>>
>> doesn't initialise x to zero.  So it's a question of which case is the
>> most motivating one: using "x ()" to initialise x to 0 in a constructor
>> or "int x;" to declare a variable of type x, uninitialised.  I think the
>> latter use case is much more common (at least in GCC).  Rearranging
>> things, I said later:
>
> I agree that the latter use case is more common in GCC, but I don't
> see it as a good thing.  GCC was written in C and most code still
> uses now outdated C practices such as declaring variables at the top
> of a (often long) function, and usually without initializing them.
> It's been established that it's far better to declare variables with
> the smallest scope, and to initialize them on declaration.  Compilers
> are smart enough these days to eliminate redundant initialization or
> assignments.
>
 In your other message you used the example of explicit default
 initialisation, such as:

 class foo
 {
   foo () : x () {}
   unextended_tree x;
 };

 But I think we should strongly discourage that kind of thing.
 If someone wants to initialise x to a particular value, like
 integer_zero_node, then it would be better to do it explicitly.
 If they don't care what the initial value is, then for these
 integer-mimicing classes, uninitialised is as good as anything
 else. :-)
>>
>> What I meant was: if you want to initialise "i" to 1 in your example,
>> you'd have to write "i (1)".  Being able to write "i ()" instead of
>> "i (0)" saves one character but I don't think it adds much clarity.
>> Explicitly initialising something only seems worthwhile if you say
>> what you're initialising it to.
>
> My comment is not motivated by convenience.  What I'm concerned
> about is that defining a default ctor to be a no-op defeats the
> zero-initialization semantics most users expect of T().
>
> This is particularly concerning for a class designed to behave
> like an [improved] basic integer type.  Such a class should act
> as closely as possible to the type it emulates and in the least
> surprising ways.  Any sort of a deviation that replaces well-
> defined behavior with undefined is a gotcha and a bug waiting
> to happen.
>
> It's also a concern in generic (template) contexts where T() is
> expected to zero-initialize.  A template designed to work with
> a fundamental integer type should also work with a user-defined
> type designed to behave like an integer.

But that kind of situation is one where using "T (0)" over "T ()"
is useful.  It means that template substitution will succeed for
T that are sufficiently integer-like to have a single well-defined
zero but not for T that aren't (such as wide_int).
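
A small self-contained illustration of that point (nothing to do with the
actual poly_int or tree classes, just the shape of the argument):

  /* Spelling the starting value as T (0) means this template only
     instantiates for types that really have a single well-defined zero;
     with T () it would also "work" for types where that isn't
     meaningful.  */
  template <typename T>
  T
  sum3 (T a, T b, T c)
  {
    T total (0);
    total = total + a;
    total = total + b;
    total = total + c;
    return total;
  }

  int
  main ()
  {
    return sum3 (1, 2, 3) == 6 ? 0 : 1;   /* int has a well-defined zero.  */
  }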

>>> With the new (and some existing) classes that's not so, and it
>>> makes them harder and more error-prone to use (I just recently
>>> learned this the hard way about offset_int and the debugging
>>> experience is still fresh in my memory).
>>
>> Sorry about the bad experience.  But that kind of thing cuts
>> both ways.  If I write:
>>
>> poly_int64
>> foo (void)
>> {
>>   poly_int64 x;
>>   x += 2;
>>   return x;
>> }
>>
>> then I get a warning about x being used uninitialised, without
>> having had to run anything.  If we add default initialisation
>> then this becomes something that has to be debugged against
>> a particular test case, i.e. we've stopped the compiler from
>> giving us useful static analysis.
>
> With default initialization the code above becomes valid and has
> the expected effect of adding 2 to zero.  It's just more robust
> than the same code with that uses a basic type instead.  This
> seems no more unexpected and no less desirable than the well-
> defined semantics of something like:
>
>std::

Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Pedro Alves
On 10/26/2017 05:37 PM, Martin Sebor wrote:

> I agree that the latter use case is more common in GCC, but I don't
> see it as a good thing.  GCC was written in C and most code still
> uses now outdated C practices such as declaring variables at the top
> of a (often long) function, and usually without initializing them.
> It's been established that it's far better to declare variables with
> the smallest scope, and to initialize them on declaration.  Compilers
> are smart enough these days to eliminate redundant initialization or
> assignments.

I don't agree that that's established.  FWIW, I'm in the
"prefer the -Wuninitialized warnings" camp.  I've been looking
forward to all the VRP and threader improvements, hoping that that
warning (and -Wmaybe-uninitialized...) will improve along with them.

> My comment is not motivated by convenience.  What I'm concerned
> about is that defining a default ctor to be a no-op defeats the
> zero-initialization semantics most users expect of T().

This sounds like it's a problem because GCC is written in C++98.

You can get the semantics you want in C++11 by defining
the constructor with "= default;" :

 struct T
 {
   T(int); // some other constructor forcing me to 
   // add a default constructor.

   T() = default; // give me default construction using
  // default initialization.
   int i;
 };

And now 'T t;' leaves T::i default initialized, i.e.,
uninitialized, while T() value-initializes T::i, i.e.,
initializes it to zero.

So if that's a concern, maybe you could use "= default" 
conditionally depending on #if __cplusplus >= C++11, so that
you'd get it for stages after stage1.
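
Something like this (untested sketch):

 struct T
 {
   T (int);   // the other constructor that forced the issue

 #if __cplusplus >= 201103L
   T () = default;   // 'T t;' leaves i uninitialized, 'T ()' zeroes it
 #else
   T () {}           // C++98 fallback: today's no-op behaviour
 #endif

   int i;
 };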

Or just start requiring C++11 already. :-)

Thanks,
Pedro Alves



Re: [RFA][PATCH] Provide a class interface into substitute_and_fold.

2017-10-26 Thread Richard Biener
On October 26, 2017 6:50:15 PM GMT+02:00, Jeff Law  wrote:
>On 10/26/2017 03:24 AM, Richard Biener wrote:
>> On Tue, Oct 24, 2017 at 8:44 PM, Jeff Law  wrote:
>>> This is similar to the introduction of the ssa_propagate_engine, but
>for
>>> the substitution/replacements bits.
>>>
>>> In a couple places the pass specific virtual functions are just
>wrappers
>>> around existing functions.  A good example of this is
>>> ccp_folder::get_value.  Many other routines in tree-ssa-ccp.c want
>to
>>> use get_constant_value.  Some may be convertable to use the class
>>> instance, but I haven't looked closely.
>>>
>>> Another example is vrp_folder::get_value.  In this case we're
>wrapping
>>> op_with_constant_singleton_value.  In a later patch that moves into
>the
>>> to-be-introduced vr_values class so we'll delegate to that class
>rather
>>> than wrap.
>>>
>>> FWIW I did look at having a single class for the propagation engine
>and
>>> the substitution engine.  That turned out to be a bit problematical
>due
>>> to the calls into the substitution engine from the evrp bits which
>don't
>>> use the propagation engine at all.  Given propagation and
>substitution
>>> are distinct concepts I ultimately decided the cleanest path forward
>was
>>> to keep the two classes separate.
>>>
>>> Bootstrapped and regression tested on x86_64.  OK for the trunk?
>> 
>> So what I don't understand in this 2 part series is why you put
>> substitute-and-fold into a different class.
>Good question.  They're in different classes because they can be and are
>used independently.
>
>For example, tree-complex uses the propagation engine, but not the
>substitution engine.   EVRP uses the substitution engine, but not the
>propagation engine.  The standard VRP algorithm uses both engines, but
>other than shared data (vr_values), they are independent.  CCP and
>copy-prop are similar to VRP.  Essentially one is a producer, the other
>a consumer.
>
>It might be possible to smash them together, but I'm not sure if that's
>wise or not.  I do suspect that smashing them together would be easier
>once all the other work is done if we were to make that choice.  But
>composition, multiple inheritance or just passing around the class
>instance may be better.  I think that's a TBD.
>
>
>> 
>> This makes it difficult for users to inherit and put the lattice in
>> the deriving class as we have the visit routines which will update
>> the lattice and the get_value hook which queries it.
>Yes.  The key issue is the propagation step produces vr_values and the
>substitution step consumes vr_values.
>
>For VRP the way I solve this is to have a vr_values class in the
>derived
>propagation engine class as well as the derived substitution engine
>class.  When we're done with propagation we move the class instance
>from
>the propagation engine to the substitution engine.
>
>EVRP works similarly except the vr_values starts in the evrp_dom_walker
>class, then moves to its substitution engine.
>
>There's a bit of cleanup to do there in terms of implementation.  But
>that's the basic model that I'm using right now.  It should be fairly
>easy to move to a unioned class or multiple inheritance if we so
>desired.  It shouldn't affect most of what I'm doing now around
>encapsulating vr_values.
>
>> 
>> So from the perspective of maintaining the state for the users, using a
>> single class would be more appropriate.  Of course it seems like
>> substitute-and-fold can be used without using the SSA
>> propagator itself and the SSA propagator can be used
>> without the substitute and fold engine.
>Right.  They can be and are used independently, which is what led to having
>independent classes.
>
>
>> 
>> IIRC we decided against using multiple inheritance?  Which
>> means a user would put the lattice in the SSA propagation
>> engine derived class and do the inheriting via composition
>> as member in the substitute_and_fold engine?
>Right, we have decided against using multiple inheritance, so instead I
>pass the vr_values object around.  In my development tree I have this:
>
>
>class vrp_prop : public ssa_propagation_engine
>{
> public:
>enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL
>OVERRIDE;
>  enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;
>
>  /* XXX Drop the indirection through the pointer, not needed.  */
>  class vr_values *vr_values;
>};
>
>
>class vrp_folder : public substitute_and_fold_engine
>{
> public:
>  tree get_value (tree) FINAL OVERRIDE;
>  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
>  class vr_values *vr_values;
>};
>
>In vrp_finalize:
>  class vrp_folder vrp_folder;
>  vrp_folder.vr_values = vr_values;
>  vrp_folder.substitute_and_fold ();
>
>
>I'm in the process of cleaning this up -- in particular there'll be a
>ctor in vrp_folder which will require passing in a vr_values and we'll
>be dropping some indirections as well.
>
>I just went through this exact cleanup yesterday with the separated evrp
>style range analyzer 

Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread Eric Gallager
On 10/26/17, Nathan Sidwell  wrote:
> On 10/26/2017 10:34 AM, David Malcolm wrote:
>> [CCing Rainer and Mike for the gcc-dg.exp part]
>
>> Alternate idea: could show_column become a tri-state:
>>* default: show non-zero columns
>>* never: never show columns
>>* always: always show a column, printing 0 for the no-column case
>> and then use "always" in our testsuite
>
> One of the things this patch shows up is the number of places where
> we're accepting a zero column by default.  IMHO it is best to explicitly
> mark such tests.
>
>>> +  size_t l = sprintf (result, col ? ":%d:%d" : ":%d", line, col);
>>
>> Possibly a silly question, but is it OK to have a formatted string
>> call in which some of the arguments aren't consumed? (here "col" is only
>> consumed for the true case, which consumes 2 arguments; it's not consumed
>> for the false case).
>
> Yes.

I think I remember clang disagreeing; it printed warnings from
-Wformat-extra-args in a similar situation in gnulib's error_at_line
module.

>
>>> +  gcc_checking_assert (l + 1 < sizeof (result));
>>
>> Would snprintf be safer?
>
> I guess. but the assert's still needed.
>
>> Please create a selftest for the function, covering these cases:
>>
>> * line == 0
>> * line > 0 and col == 0
>> * line > 0 and col > 0 (checking output for these cases)
>> * line == INT_MAX and col == INT_MAX (without checking output, just to
>> tickle the assert)
>> * line == INT_MIN and col == INT_MIN (likewise)
>
> Ok, I'll investigate this newfangled self-testing framework :)
>
>> There are some testcases where we deliberately don't have a *line*
>> number; what happens to these?
>
> Those don't change.  The dg-harness already does NOT expect a column
> when lineno=0.
>
>> My Tcl skills aren't great, so hopefully someone else can review this;
>> CCing Rainer and Mike.
>>
>> Also, is the proposed syntax for "no columns" OK?  (note the tristate
>> idea above)
>
> I'm not wedded to '-:', but as mentioned above, I think the tests should
> be explicit about whether a column is expected or not (and the default
> needs to be 'expect column', because of history)
>
> thanks for your comments.
>
> nathan
>
> --
> Nathan Sidwell
>


Re: [PATCH] Change default optimization level to -Og

2017-10-26 Thread Eric Gallager
On 10/26/17, Wilco Dijkstra  wrote:
> GCC's default optimization level is -O0.  Unfortunately unlike other
> compilers,
> GCC generates extremely inefficient code with -O0.  It is almost unusable
> for
> low-level debugging or manual inspection of generated code.  So a -O option
> is
> always required for compilation.  -Og not only allows for fast compilation,
> but
> also produces code that is efficient, readable as well as debuggable.
> Therefore -Og makes for a much better default setting.
>
> Any comments?

There are a number of bugs with -Og that I'd want to see fixed before
making it the default; I'll follow this message up once I find them
all.

>
> 2017-10-26  Wilco Dijkstra  
>
>   * opts.c (default_options_optimization): Set default to -Og.
>
> doc/
>   * invoke.texi (-O0) Remove default mention.
>   (-Og): Add mention of default setting.
>
> --
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index
> 3328a3b5fafa6a98007eff52d2a26af520de9128..74c33ea35b9f320b419a3417e6007d2391536f1b
> 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7343,7 +7343,7 @@ by @option{-O2} and also turns on the following
> optimization flags:
>  @item -O0
>  @opindex O0
>  Reduce compilation time and make debugging produce the expected
> -results.  This is the default.
> +results.
>
>  @item -Os
>  @opindex Os
> @@ -7371,7 +7371,7 @@ Optimize debugging experience.  @option{-Og} enables
> optimizations
>  that do not interfere with debugging. It should be the optimization
>  level of choice for the standard edit-compile-debug cycle, offering
>  a reasonable level of optimization while maintaining fast compilation
> -and a good debugging experience.
> +and a good debugging experience.  This is the default.
>  @end table
>
>  If you use multiple @option{-O} options, with or without level numbers,
> diff --git a/gcc/opts.c b/gcc/opts.c
> index
> dfad955e220870a3250198640f3790c804b191e0..74511215309f11445685db4894be2ab6881695d3
> 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -565,6 +565,12 @@ default_options_optimization (struct gcc_options
> *opts,
>int opt2;
>bool openacc_mode = false;
>
> +  /* Set the default optimization to -Og.  */
> +  opts->x_optimize_size = 0;
> +  opts->x_optimize = 1;
> +  opts->x_optimize_fast = 0;
> +  opts->x_optimize_debug = 1;
> +
>/* Scan to see what optimization level has been specified.  That will
>   determine the default value of many flags.  */
>for (i = 1; i < decoded_options_count; i++)
>
>


Re: [patch][i386, AVX] Adding missing CMP* intrinsics

2017-10-26 Thread Kirill Yukhin
Hello Olga, Sebastian,
On 20 Oct 08:36, Peryt, Sebastian wrote:
> Hi,
> 
> This patch written by Olga Makhotina adds listed below missing intrinsics:
> _mm512_[mask_]cmpeq_[pd|ps]_mask
> _mm512_[mask_]cmple_[pd|ps]_mask
> _mm512_[mask_]cmplt_[pd|ps]_mask
> _mm512_[mask_]cmpneq_[pd|ps]_mask
> _mm512_[mask_]cmpnle_[pd|ps]_mask
> _mm512_[mask_]cmpnlt_[pd|ps]_mask
> _mm512_[mask_]cmpord_[pd|ps]_mask
> _mm512_[mask_]cmpunord_[pd|ps]_mask
> 
> Is it ok for trunk?
Your patch is OK for trunk. I've checked it in.
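
For reference, a typical use of one of these looks like the following
(illustrative only; needs -mavx512f and an AVX-512 capable target):

  #include <immintrin.h>

  /* Returns an 8-bit mask with bit i set where a[i] <= b[i].  */
  __mmask8
  lanes_le (__m512d a, __m512d b)
  {
    return _mm512_cmple_pd_mask (a, b);
  }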

--
Thanks, K

> Thanks,
> Sebastian
> 




Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread Nathan Sidwell

On 10/26/2017 02:12 PM, Eric Gallager wrote:

On 10/26/17, Nathan Sidwell  wrote:

On 10/26/2017 10:34 AM, David Malcolm wrote:



Possibly a silly question, but is it OK to have a formatted string
call in which some of the arguments aren't consumed? (here "col" is only
consumed for the true case, which consumes 2 arguments; it's not consumed
for the false case).


Yes.


I think I remember clang disagreeing; it printed warnings from
-Wformat-extra-args in a similar situation in gnulib's error_at_line
module.


C++ 21.10.1 defers to C.  C99 7.15.1 has no words saying va_arg must be
applied to exactly all the arguments captured by a va_list object.  (And I'm
pretty sure scanf can bail early.)


Now, it might be sensible to warn about:
  printf ("", 5);
because printf's semantics are known.  But that's not ill-formed, just 
inefficient.  And in this case we're doing the equivalent of:

  printf (not-compile-time-constant, 5);
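
A self-contained toy making the same point (not the patch, just the
excess-argument rule -- C says extra arguments to the printf family are
evaluated and then ignored):

  #include <cstdio>

  int
  main (void)
  {
    int line = 42, col = 0;
    char buf[32];
    /* 'col' is evaluated but ignored when the shorter format wins.  */
    std::sprintf (buf, col ? ":%d:%d" : ":%d", line, col);
    std::puts (buf);   /* prints ":42" */
    return 0;
  }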

nathan

--
Nathan Sidwell

