Re: [PATCH][RTL] Fix PR87852

2018-11-05 Thread Richard Biener
On Fri, 2 Nov 2018, Eric Botcazou wrote:

> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > OK for trunk?
> > 
> > Thanks,
> > Richard.
> > 
> > 2018-11-02  Richard Biener  
> > 
> > PR rtl-optimization/87852
> > * fwprop.c (use_killed_between): Only consider single-defs of the
> > use in the definition stmt that dominate it.
> 
> This looks OK to me, but this lacks commentary and I have a hard time parsing 
> the ChangeLog entry.  Maybe:
> 
>   * fwprop.c (use_killed_between): Only consider single-defs of the use
>   whose definition statement dominates the use.
> 
> FWIW I've attached a patch that also fixes the head comment of the function.

Thanks, I have committed the following then.

Richard.

2018-11-05  Richard Biener  

PR rtl-optimization/87852
* fwprop.c (use_killed_between): Only consider single-defs of the
use whose definition statement dominates the use.
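
For readers following along, the extra condition can be modelled outside of GCC roughly as below.  This is an illustrative sketch only: struct insn_pos, block_dominated_by, single_def_reaches and the idom array encoding of the dominator tree are hypothetical stand-ins, not GCC's df or dominance APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an insn: its basic block index and its
   position (LUID) inside that block.  */
struct insn_pos
{
  int block;
  int luid;
};

/* Return true if DOMINATOR dominates BLOCK.  IDOM[b] is the immediate
   dominator of block b; block 0 is the entry block (an assumption of
   this sketch).  */
static bool
block_dominated_by (const int *idom, int block, int dominator)
{
  assert (block >= 0 && dominator >= 0);
  while (block != dominator)
    {
      if (block == 0)
        return false;
      block = idom[block];
    }
  return true;
}

/* Return true if the single definition REG_DEF of a pseudo is known to
   execute before DEF_INSN: either earlier in the same block, or in a
   block that dominates DEF_INSN's block.  */
static bool
single_def_reaches (const int *idom, struct insn_pos reg_def,
                    struct insn_pos def_insn)
{
  if (reg_def.block == def_insn.block)
    return reg_def.luid < def_insn.luid;
  return block_dominated_by (idom, def_insn.block, reg_def.block);
}
```

With a loop whose only definition of the register sits in the loop body while DEF_INSN sits in the loop header, the check fails, which corresponds to the "used uninitialized in a loop" case mentioned in the new comment.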

Index: gcc/fwprop.c
===
--- gcc/fwprop.c (revision 265790)
+++ gcc/fwprop.c (working copy)
@@ -731,14 +731,15 @@ local_ref_killed_between_p (df_ref ref,
 }
 
 
-/* Check if the given DEF is available in INSN.  This would require full
-   computation of available expressions; we check only restricted conditions:
-   - if DEF is the sole definition of its register, go ahead;
-   - in the same basic block, we check for no definitions killing the
- definition of DEF_INSN;
-   - if USE's basic block has DEF's basic block as the sole predecessor,
- we check if the definition is killed after DEF_INSN or before
+/* Check if USE is killed between DEF_INSN and TARGET_INSN.  This would
+   require full computation of available expressions; we check only a few
+   restricted conditions:
+   - if the reg in USE has only one definition, go ahead;
+   - in the same basic block, we check for no definitions killing the use;
+   - if TARGET_INSN's basic block has DEF_INSN's basic block as its sole
+ predecessor, we check if the use is killed after DEF_INSN or before
  TARGET_INSN insn, in their respective basic blocks.  */
+
 static bool
 use_killed_between (df_ref use, rtx_insn *def_insn, rtx_insn *target_insn)
 {
@@ -762,12 +763,17 @@ use_killed_between (df_ref use, rtx_insn
  know that this definition reaches use, or we wouldn't be here.
  However, this is invalid for hard registers because if they are
  live at the beginning of the function it does not mean that we
- have an uninitialized access.  */
+ have an uninitialized access.  And we have to check for the case
+ where a register may be used uninitialized in a loop as above.  */
   regno = DF_REF_REGNO (use);
   def = DF_REG_DEF_CHAIN (regno);
   if (def
   && DF_REF_NEXT_REG (def) == NULL
-  && regno >= FIRST_PSEUDO_REGISTER)
+  && regno >= FIRST_PSEUDO_REGISTER
+  && (BLOCK_FOR_INSN (DF_REF_INSN (def)) == def_bb
+ ? DF_INSN_LUID (DF_REF_INSN (def)) < DF_INSN_LUID (def_insn)
+ : dominated_by_p (CDI_DOMINATORS,
+   def_bb, BLOCK_FOR_INSN (DF_REF_INSN (def)))))
 return false;
 
   /* Check locally if we are in the same basic block.  */


[PATCH 3/4] Fix vector memory statistics.

2018-11-05 Thread marxin

gcc/ChangeLog:

2018-11-02  Martin Liska  

* mem-stats.h (mem_alloc_description::release_instance_overhead):
Return T *.
* vec.c (struct vec_usage): Register m_element_size.
(vec_prefix::register_overhead): New arguments: elements and
element_size.
(vec_prefix::release_overhead): Subtract elements.
* vec.h (struct vec_prefix): Change signature.
(va_heap::reserve): Pass proper arguments.
(va_heap::release): Likewise.
---
 gcc/mem-stats.h | 14 --
 gcc/vec.c   | 34 +-
 gcc/vec.h   | 12 
 3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/gcc/mem-stats.h b/gcc/mem-stats.h
index 3ef6d53dfa6..860908cf585 100644
--- a/gcc/mem-stats.h
+++ b/gcc/mem-stats.h
@@ -341,8 +341,8 @@ public:
 
   /* Release PTR pointer of SIZE bytes. If REMOVE_FROM_MAP is set to true,
  remove the instance from reverse map.  */
-  void release_instance_overhead (void *ptr, size_t size,
-  bool remove_from_map = false);
+  T * release_instance_overhead (void *ptr, size_t size,
+ bool remove_from_map = false);
 
   /* Release intance object identified by PTR pointer.  */
   void release_object_overhead (void *ptr);
@@ -503,7 +503,7 @@ mem_alloc_description::register_overhead (size_t size,
 /* Release PTR pointer of SIZE bytes.  */
 
 template 
-inline void
+inline T *
 mem_alloc_description::release_instance_overhead (void *ptr, size_t size,
 		 bool remove_from_map)
 {
@@ -512,14 +512,16 @@ mem_alloc_description::release_instance_overhead (void *ptr, size_t size,
   if (!slot)
 {
   /* Due to PCH, it can really happen.  */
-  return;
+  return NULL;
 }
 
-  mem_usage_pair usage_pair = *slot;
-  usage_pair.usage->release_overhead (size);
+  T *usage = (*slot).usage;
+  usage->release_overhead (size);
 
   if (remove_from_map)
 m_reverse_map->remove (ptr);
+
+  return usage;
 }
 
 /* Release intance object identified by PTR pointer.  */
diff --git a/gcc/vec.c b/gcc/vec.c
index ff2456aead9..bfd52856e46 100644
--- a/gcc/vec.c
+++ b/gcc/vec.c
@@ -52,13 +52,14 @@ vnull vNULL;
 struct vec_usage: public mem_usage
 {
   /* Default constructor.  */
-  vec_usage (): m_items (0), m_items_peak (0) {}
+  vec_usage (): m_items (0), m_items_peak (0), m_element_size (0) {}
 
   /* Constructor.  */
   vec_usage (size_t allocated, size_t times, size_t peak,
-	 size_t items, size_t items_peak)
+	 size_t items, size_t items_peak, size_t element_size)
 : mem_usage (allocated, times, peak),
-m_items (items), m_items_peak (items_peak) {}
+m_items (items), m_items_peak (items_peak),
+m_element_size (element_size) {}
 
   /* Sum the usage with SECOND usage.  */
   vec_usage
@@ -68,7 +69,7 @@ struct vec_usage: public mem_usage
 		  m_times + second.m_times,
 		  m_peak + second.m_peak,
 		  m_items + second.m_items,
-		  m_items_peak + second.m_items_peak);
+		  m_items_peak + second.m_items_peak, 0);
   }
 
   /* Dump usage coupled to LOC location, where TOTAL is sum of all rows.  */
@@ -81,7 +82,8 @@ struct vec_usage: public mem_usage
 
 s[48] = '\0';
 
-fprintf (stderr, "%-48s %10li:%4.1f%%%10li%10li:%4.1f%%%11li%11li\n", s,
+fprintf (stderr, "%-48s %10li%11li:%4.1f%%%10li%10li:%4.1f%%%11li%11li\n", s,
+	 (long)m_element_size,
 	 (long)m_allocated, m_allocated * 100.0 / total.m_allocated,
 	 (long)m_peak, (long)m_times, m_times * 100.0 / total.m_times,
 	 (long)m_items, (long)m_items_peak);
@@ -101,8 +103,8 @@ struct vec_usage: public mem_usage
   static inline void
   dump_header (const char *name)
   {
-fprintf (stderr, "%-48s %11s%15s%10s%17s%11s\n", name, "Leak", "Peak",
-	 "Times", "Leak items", "Peak items");
+fprintf (stderr, "%-48s %10s%11s%16s%10s%17s%11s\n", name, "sizeof(T)",
+	 "Leak", "Peak", "Times", "Leak items", "Peak items");
 print_dash_line ();
   }
 
@@ -110,6 +112,8 @@ struct vec_usage: public mem_usage
   size_t m_items;
   /* Peak value of number of allocated items.  */
   size_t m_items_peak;
+  /* Size of element of the vector.  */
+  size_t m_element_size;
 };
 
 /* Vector memory description.  */
@@ -118,12 +122,14 @@ static mem_alloc_description  vec_mem_desc;
 /* Account the overhead.  */
 
 void
-vec_prefix::register_overhead (void *ptr, size_t size, size_t elements
-			   MEM_STAT_DECL)
+vec_prefix::register_overhead (void *ptr, size_t elements,
+			   size_t element_size MEM_STAT_DECL)
 {
   vec_mem_desc.register_descriptor (ptr, VEC_ORIGIN, false
 FINAL_PASS_MEM_STAT);
-  vec_usage *usage = vec_mem_desc.register_instance_overhead (size, ptr);
+  vec_usage *usage
+= vec_mem_desc.register_instance_overhead (elements * element_size, ptr);
+  usage->m_element_size = element_size;
   usage->m_items += elements;
   if (usage->m_items_peak < usage->m_items)
 usage->m_items_peak = usage->m_items;
@@ -132,13 +138,15 @@ vec_prefix::reg

[PATCH 1/4] Fix string pool statistics.

2018-11-05 Thread marxin

libcpp/ChangeLog:

2018-11-02  Martin Liska  

* include/symtab.h (ht_identifier):
Make room for ggc flag.
* symtab.c (ht_lookup_with_hash): Mark
GGC and non-GGC allocated strings.
(ht_dump_statistics): Use the information.
---
 libcpp/include/symtab.h |  4 +++-
 libcpp/symtab.c | 28 +++-
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
index c08a4f29ca2..da92192849f 100644
--- a/libcpp/include/symtab.h
+++ b/libcpp/include/symtab.h
@@ -30,12 +30,14 @@ typedef struct ht_identifier ht_identifier;
 typedef struct ht_identifier *ht_identifier_ptr;
 struct GTY(()) ht_identifier {
   const unsigned char *str;
-  unsigned int len;
   unsigned int hash_value;
+  unsigned int len : 31;
+  unsigned int ggc : 1;
 };
 
 #define HT_LEN(NODE) ((NODE)->len)
 #define HT_STR(NODE) ((NODE)->str)
+#define HT_GGC(NODE) ((NODE)->ggc)
 
 typedef struct ht cpp_hash_table;
 typedef struct ht_identifier *hashnode;
diff --git a/libcpp/symtab.c b/libcpp/symtab.c
index fd86c849f7f..1c62a16d335 100644
--- a/libcpp/symtab.c
+++ b/libcpp/symtab.c
@@ -164,10 +164,14 @@ ht_lookup_with_hash (cpp_hash_table *table, const unsigned char *str,
   memcpy (chars, str, len);
   chars[len] = '\0';
   HT_STR (node) = (const unsigned char *) chars;
+  HT_GGC (node) = 1;
 }
   else
-HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
-			   str, len);
+{
+  HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
+			 str, len);
+  HT_GGC (node) = 1;
+}
 
   if (++table->nelements * 4 >= table->nslots * 3)
 /* Must expand the string table.  */
@@ -274,7 +278,7 @@ void
 ht_dump_statistics (cpp_hash_table *table)
 {
   size_t nelts, nids, overhead, headers;
-  size_t total_bytes, longest, deleted = 0;
+  size_t total_bytes_obstack = 0, total_bytes_ggc = 0, longest, deleted = 0;
   double sum_of_squares, exp_len, exp_len2, exp2_len;
   hashnode *p, *limit;
 
@@ -285,7 +289,7 @@ ht_dump_statistics (cpp_hash_table *table)
 		 : (x) / (1024*1024
 #define LABEL(x) ((x) < 1024*10 ? ' ' : ((x) < 1024*1024*10 ? 'k' : 'M'))
 
-  total_bytes = longest = sum_of_squares = nids = 0;
+  longest = sum_of_squares = nids = 0;
   p = table->entries;
   limit = p + table->nslots;
   do
@@ -295,7 +299,11 @@ ht_dump_statistics (cpp_hash_table *table)
   {
 	size_t n = HT_LEN (*p);
 
-	total_bytes += n;
+	if (HT_GGC (*p))
+	  total_bytes_ggc += n;
+	else
+	  total_bytes_obstack += n;
+
 	sum_of_squares += (double) n * n;
 	if (n > longest)
 	  longest = n;
@@ -304,7 +312,7 @@ ht_dump_statistics (cpp_hash_table *table)
   while (++p < limit);
 
   nelts = table->nelements;
-  overhead = obstack_memory_used (&table->stack) - total_bytes;
+  overhead = obstack_memory_used (&table->stack) - total_bytes_obstack;
   headers = table->nslots * sizeof (hashnode);
 
   fprintf (stderr, "\nString pool\nentries\t\t%lu\n",
@@ -315,13 +323,15 @@ ht_dump_statistics (cpp_hash_table *table)
 	   (unsigned long) table->nslots);
   fprintf (stderr, "deleted\t\t%lu\n",
 	   (unsigned long) deleted);
-  fprintf (stderr, "bytes\t\t%lu%c (%lu%c overhead)\n",
-	   SCALE (total_bytes), LABEL (total_bytes),
+  fprintf (stderr, "GGC bytes\t%lu%c\n",
+	   SCALE (total_bytes_ggc), LABEL (total_bytes_ggc));
+  fprintf (stderr, "obstack bytes\t%lu%c (%lu%c overhead)\n",
+	   SCALE (total_bytes_obstack), LABEL (total_bytes_obstack),
 	   SCALE (overhead), LABEL (overhead));
   fprintf (stderr, "table size\t%lu%c\n",
 	   SCALE (headers), LABEL (headers));
 
-  exp_len = (double)total_bytes / (double)nelts;
+  exp_len = (double)total_bytes_obstack / (double)nelts;
   exp2_len = exp_len * exp_len;
   exp_len2 = (double) sum_of_squares / (double) nelts;
 


[PATCH 4/4] Come up with SIZE_AMOUNT and use it in memory statistics and sort stats.

2018-11-05 Thread marxin

gcc/ChangeLog:

2018-11-02  Martin Liska  

* alloc-pool.h (struct pool_usage): Use SIZE_AMOUNT.
* bitmap.h (struct bitmap_usage): Likewise.
* ggc-common.c (SCALE): Remove.
(LABEL): Likewise.
(struct ggc_usage): Use SIZE_AMOUNT. And update
compare method.
* ggc-page.c (SCALE): Remove.
(STAT_LABEL): Remove.
(ggc_print_statistics): Use SIZE_AMOUNT.
* gimple.h (SCALE): Remove.
(LABEL): Likewise.
* input.c (ONE_K): Remove.
(ONE_M): Likewise.
(SCALE): Likewise.
(STAT_LABEL): Likewise.
(FORMAT_AMOUNT): Likewise.
(dump_line_table_statistics): Use SIZE_AMOUNT.
* mem-stats.h (struct mem_usage): Likewise.
* rtl.c (dump_rtx_statistics): Likewise.
(rtx_alloc_counts): Change type to size_t.
(rtx_alloc_sizes): Likewise.
(rtx_count_cmp): New.
(dump_rtx_statistics): Sort first based on counts.
* tree.c (tree_nodes_cmp): New.
(tree_codes_cmp): New.
(dump_tree_statistics): Sort first based on counts.
* system.h (ONE_K): New.
(ONE_M): Likewise.
(SIZE_SCALE): Likewise.
(SIZE_LABEL): Likewise.
(SIZE_AMOUNT): Likewise.
* tree-cfg.c (dump_cfg_stats): Use SIZE_AMOUNT.
* tree-dfa.c (dump_dfa_stats): Likewise.
* tree-phinodes.c (phinodes_print_statistics): Likewise.
* tree-ssanames.c (ssanames_print_statistics): Likewise.
* tree.c (dump_tree_statistics): Likewise.
* vec.c (struct vec_usage): Likewise.
* trans-mem.c (tm_mangle): Enlarge buffer in order to not
trigger a -Werror=format-overflow with
--enable-gather-detailed-stats.
---
 gcc/alloc-pool.h| 18 +
 gcc/bitmap.h| 12 +++---
 gcc/ggc-common.c| 32 ++--
 gcc/ggc-page.c  | 86 +-
 gcc/gimple.c| 11 +++---
 gcc/gimple.h| 10 -
 gcc/input.c | 75 ++---
 gcc/mem-stats.h | 12 +++---
 gcc/rtl.c   | 66 +---
 gcc/system.h| 25 +
 gcc/trans-mem.c |  2 +-
 gcc/tree-cfg.c  |  8 ++--
 gcc/tree-dfa.c  | 16 
 gcc/tree-phinodes.c |  5 ++-
 gcc/tree-ssanames.c |  6 ++-
 gcc/tree.c  | 91 +
 gcc/vec.c   | 20 ++
 17 files changed, 272 insertions(+), 223 deletions(-)

diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
index d2ee0005761..d17a05ca4fb 100644
--- a/gcc/alloc-pool.h
+++ b/gcc/alloc-pool.h
@@ -63,12 +63,16 @@ struct pool_usage: public mem_usage
   {
 char *location_string = loc->to_string ();
 
-fprintf (stderr, "%-32s%-48s %6li%10li:%5.1f%%%10li%10li:%5.1f%%%12li\n",
-	 m_pool_name, location_string, (long)m_instances,
-	 (long)m_allocated, get_percent (m_allocated, total.m_allocated),
-	 (long)m_peak, (long)m_times,
+fprintf (stderr, "%-32s%-48s %5zu%c%9zu%c:%5.1f%%%9zu"
+	 "%c%9zu%c:%5.1f%%%12zu\n",
+	 m_pool_name, location_string,
+	 SIZE_AMOUNT (m_instances),
+	 SIZE_AMOUNT (m_allocated),
+	 get_percent (m_allocated, total.m_allocated),
+	 SIZE_AMOUNT (m_peak),
+	 SIZE_AMOUNT (m_times),
 	 get_percent (m_times, total.m_times),
-	 (long)m_element_size);
+	 m_element_size);
 
 free (location_string);
   }
@@ -87,8 +91,8 @@ struct pool_usage: public mem_usage
   dump_footer ()
   {
 print_dash_line ();
-fprintf (stderr, "%s%82li%10li\n", "Total", (long)m_instances,
-	 (long)m_allocated);
+fprintf (stderr, "%s%82zu%c%10zu%c\n", "Total",
+	 SIZE_AMOUNT (m_instances), SIZE_AMOUNT (m_allocated));
 print_dash_line ();
   }
 
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 5d3e8a5088e..973ea846baf 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -239,14 +239,14 @@ struct bitmap_usage: public mem_usage
   {
 char *location_string = loc->to_string ();
 
-fprintf (stderr, "%-48s %10" PRIu64 ":%5.1f%%"
-	 "%10" PRIu64 "%10" PRIu64 ":%5.1f%%"
-	 "%12" PRIu64 "%12" PRIu64 "%10s\n",
-	 location_string, (uint64_t)m_allocated,
+fprintf (stderr, "%-48s %9zu%c:%5.1f%%"
+	 "%9zu%c%9zu%c:%5.1f%%"
+	 "%11" PRIu64 "%c%11" PRIu64 "%c%10s\n",
+	 location_string, SIZE_AMOUNT (m_allocated),
 	 get_percent (m_allocated, total.m_allocated),
-	 (uint64_t)m_peak, (uint64_t)m_times,
+	 SIZE_AMOUNT (m_peak), SIZE_AMOUNT (m_times),
 	 get_percent (m_times, total.m_times),
-	 m_nsearches, m_search_iter,
+	 SIZE_AMOUNT (m_nsearches), SIZE_AMOUNT (m_search_iter),
 	 loc->m_ggc ? "ggc" : "heap");
 
 free (location_string);
diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index f83fc136d04..9fdba23ce4c 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -195,14 +195,6 @@ ggc_splay_dont_free (void * x ATTRIBUTE_UNUSED, void *nl)
   gcc_assert (!nl);
 }
 
-/* Print statistics

[PATCH 2/4] Fix GNU coding style.

2018-11-05 Thread marxin

gcc/ChangeLog:

2018-11-02  Martin Liska  

* mem-stats.h (mem_alloc_description::get_list): Fix GNU coding
style.
* vec.c: Likewise.
---
 gcc/mem-stats.h | 61 +
 gcc/vec.c   |  1 -
 2 files changed, 26 insertions(+), 36 deletions(-)

diff --git a/gcc/mem-stats.h b/gcc/mem-stats.h
index 140691e8ec1..3ef6d53dfa6 100644
--- a/gcc/mem-stats.h
+++ b/gcc/mem-stats.h
@@ -282,21 +282,21 @@ public:
 static hashval_t
 hash (value_type l)
 {
-	inchash::hash hstate;
+  inchash::hash hstate;
 
-	hstate.add_ptr ((const void *)l->m_filename);
-	hstate.add_ptr (l->m_function);
-	hstate.add_int (l->m_line);
+  hstate.add_ptr ((const void *)l->m_filename);
+  hstate.add_ptr (l->m_function);
+  hstate.add_int (l->m_line);
 
-	return hstate.end ();
+  return hstate.end ();
 }
 
 static bool
 equal (value_type l1, value_type l2)
 {
-  return l1->m_filename == l2->m_filename
-	&& l1->m_function == l2->m_function
-	&& l1->m_line == l2->m_line;
+  return (l1->m_filename == l2->m_filename
+	  && l1->m_function == l2->m_function
+	  && l1->m_line == l2->m_line);
 }
   };
 
@@ -313,59 +313,50 @@ public:
   ~mem_alloc_description ();
 
   /* Returns true if instance PTR is registered by the memory description.  */
-  bool
-  contains_descriptor_for_instance (const void *ptr);
+  bool contains_descriptor_for_instance (const void *ptr);
 
   /* Return descriptor for instance PTR.  */
-  T *
-  get_descriptor_for_instance (const void *ptr);
+  T * get_descriptor_for_instance (const void *ptr);
 
   /* Register memory allocation descriptor for container PTR which is
  described by a memory LOCATION.  */
-  T *
-  register_descriptor (const void *ptr, mem_location *location);
+  T * register_descriptor (const void *ptr, mem_location *location);
 
   /* Register memory allocation descriptor for container PTR.  ORIGIN identifies
  type of container and GGC identifes if the allocation is handled in GGC
  memory.  Each location is identified by file NAME, LINE in source code and
  FUNCTION name.  */
-  T *
-  register_descriptor (const void *ptr, mem_alloc_origin origin,
-			  bool ggc, const char *name, int line,
-			  const char *function);
+  T * register_descriptor (const void *ptr, mem_alloc_origin origin,
+			   bool ggc, const char *name, int line,
+			   const char *function);
 
   /* Register instance overhead identified by PTR pointer. Allocation takes
  SIZE bytes.  */
-  T *
-  register_instance_overhead (size_t size, const void *ptr);
+  T * register_instance_overhead (size_t size, const void *ptr);
 
   /* For containers (and GGC) where we want to track every instance object,
  we register allocation of SIZE bytes, identified by PTR pointer, belonging
  to USAGE descriptor.  */
-  void
-  register_object_overhead (T *usage, size_t size, const void *ptr);
+  void register_object_overhead (T *usage, size_t size, const void *ptr);
 
   /* Release PTR pointer of SIZE bytes. If REMOVE_FROM_MAP is set to true,
  remove the instance from reverse map.  */
-  void
-  release_instance_overhead (void *ptr, size_t size,
+  void release_instance_overhead (void *ptr, size_t size,
   bool remove_from_map = false);
 
   /* Release intance object identified by PTR pointer.  */
-  void
-  release_object_overhead (void *ptr);
+  void release_object_overhead (void *ptr);
 
   /* Get sum value for ORIGIN type of allocation for the descriptor.  */
-  T
-  get_sum (mem_alloc_origin origin);
+  T get_sum (mem_alloc_origin origin);
 
   /* Get all tracked instances registered by the description. Items
  are filtered by ORIGIN type, LENGTH is return value where we register
  the number of elements in the list. If we want to process custom order,
  CMP comparator can be provided.  */
-  mem_list_t *
-  get_list (mem_alloc_origin origin, unsigned *length,
-	int (*cmp) (const void *first, const void *second) = NULL);
+  mem_list_t * get_list (mem_alloc_origin origin, unsigned *length,
+			 int (*cmp) (const void *first,
+ const void *second) = NULL);
 
   /* Dump all tracked instances of type ORIGIN. If we want to process custom
  order, CMP comparator can be provided.  */
@@ -391,7 +382,6 @@ private:
   reverse_mem_map_t *m_reverse_map;
 };
 
-
 /* Returns true if instance PTR is registered by the memory description.  */
 
 template 
@@ -410,9 +400,9 @@ mem_alloc_description::get_descriptor_for_instance (const void *ptr)
   return m_reverse_map->get (ptr) ? (*m_reverse_map->get (ptr)).usage : NULL;
 }
 
+/* Register memory allocation descriptor for container PTR which is
+   described by a memory LOCATION.  */
 
-  /* Register memory allocation descriptor for container PTR which is
- described by a memory LOCATION.  */
 template 
 inline T*
 mem_alloc_description::register_descriptor (const void *ptr,
@@ -584,7 +574,8 @@ template 
 inline
 typena

[PATCH 0/4] Enhance and fix various issues in -fmem-report

2018-11-05 Thread marxin
Hi.

As I discussed with Richi, this patch set fixes a few issues in the memory
report. It also makes the report more readable by printing sizes with k (or M)
as units.
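
A minimal sketch of the kind of scaling meant here, assuming the same thresholds as the LABEL macro visible in patch 1/4 (values below 10k are printed as is, values below 10M in k, the rest in M).  The helper names scale_size, size_label and format_amount are hypothetical; the real SIZE_AMOUNT macro in system.h may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Scale X down to plain units, k or M, depending on its magnitude.  */
static size_t
scale_size (size_t x)
{
  if (x < 10 * 1024)
    return x;
  if (x < 10 * 1024 * 1024)
    return x / 1024;
  return x / (1024 * 1024);
}

/* Return the unit suffix matching scale_size (X).  */
static char
size_label (size_t x)
{
  if (x < 10 * 1024)
    return ' ';
  if (x < 10 * 1024 * 1024)
    return 'k';
  return 'M';
}

/* Format X as a scaled number followed by its unit into BUF.  */
static int
format_amount (char *buf, size_t buflen, size_t x)
{
  assert (buf != NULL && buflen > 0);
  return snprintf (buf, buflen, "%zu%c", scale_size (x), size_label (x));
}
```

Keeping the number and its unit as a pair is what lets a single format such as "%zu%c" print both columns consistently.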

Survives bootstrap and regression tests on x86_64-linux-gnu. And there
are no new warnings on i586-linux-gnu.

Martin

marxin (4):
  Fix string pool statistics.
  Fix GNU coding style.
  Fix vector memory statistics.
  Come up with SIZE_AMOUNT and use it in memory statistics and sort
stats.

 gcc/alloc-pool.h| 18 
 gcc/bitmap.h| 12 +++---
 gcc/ggc-common.c| 32 +--
 gcc/ggc-page.c  | 86 +++---
 gcc/gimple.c| 11 ++---
 gcc/gimple.h| 10 -
 gcc/input.c | 75 +
 gcc/mem-stats.h | 85 ++
 gcc/rtl.c   | 66 --
 gcc/system.h| 25 +++
 gcc/trans-mem.c |  2 +-
 gcc/tree-cfg.c  |  8 ++--
 gcc/tree-dfa.c  | 16 
 gcc/tree-phinodes.c |  5 ++-
 gcc/tree-ssanames.c |  6 ++-
 gcc/tree.c  | 91 ++---
 gcc/vec.c   | 51 ++-
 gcc/vec.h   | 12 --
 libcpp/include/symtab.h |  4 +-
 libcpp/symtab.c | 28 +
 20 files changed, 354 insertions(+), 289 deletions(-)

-- 
2.19.1



Re: [PATCH] i386: Remove duplicated AVX2/AVX512 vec_dup patterns

2018-11-05 Thread Uros Bizjak
On Sun, Nov 4, 2018 at 9:49 PM H.J. Lu  wrote:

> > > > Actually, we can achieve the same with pre-reload splitters. Please
> > > > see the attached patch for a couple of examples and a fix for
> > > > vbroadcastss that accesses the memory in wrong mode.
> > > >
> > >
> > > My patch removes a bunch of duplicated patterns from sse.md.  But
> > > yours adds a couple more patterns.   Isn't fewer patterns preferred?
> >
> > Playing SUBREG games before reload does not look safe to me. We would
>
> There is plenty of SUBREG usage in the i386 backend before reload.  It is
> perfectly safe to do so as long as we don't create SUBREG with a different
> register class from the base.  Do you have a testcase to show my SUBREG
> usage is unsafe?

No. However, the patch then substantially changes functionality in the
vector part of the i386 backend (expand_vec_perm_1), so it needs approval
from the relevant maintainer (Kirill).

Uros.


Re: [PATCH] newlib/configure.host: Set have_init_fini to no for OpenRISC

2018-11-05 Thread Corinna Vinschen
On Nov  3 07:00, Stafford Horne wrote:
> The new GCC port for OpenRISC will use the init_fini_array only and not
> provide the init() and fini() functions.  Disable the function usage by
> default as it's no longer needed.
> 
> Signed-off-by: Stafford Horne 
> ---
>  newlib/configure.host | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/newlib/configure.host b/newlib/configure.host
> index 27bce36a1..6c49cb750 100644
> --- a/newlib/configure.host
> +++ b/newlib/configure.host
> @@ -279,6 +279,7 @@ case "${host_cpu}" in
>   ;;
>or1k*|or1knd*)
>   machine_dir=or1k
> + have_init_fini=no
>   ;;
>powerpc*)
>   machine_dir=powerpc
> -- 
> 2.17.2

Pushed.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat




[PATCH] Fix -fsanitize=undefined vs. x + y < x (PR sanitizer/87837)

2018-11-05 Thread Jakub Jelinek
Hi!

I wish I had a better fix, but I don't: sanitizing signed integer
arithmetic in the FEs before any folding happens is complicated by
that arithmetic being created in far too many spots.
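
To make the problem concrete: the match.pd pattern touched by this patch simplifies x + y < x to y < 0 when signed overflow is undefined, so the addition, and with it the place where the overflow would be diagnosed, disappears.  The sketch below models both variants in plain C; add_overflows, folded_compare and checked_compare are hypothetical helpers written for illustration and are not what -fsanitize=signed-integer-overflow emits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true if A + B would overflow int, without performing the
   addition.  */
static bool
add_overflows (int a, int b)
{
  return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* The folded form: X + Y < X becomes Y < 0, no addition left.  */
static bool
folded_compare (int x, int y)
{
  (void) x;
  return y < 0;
}

/* The unfolded form with an explicit check before the addition.
   *OVERFLOWED is set when the addition cannot be represented.  */
static bool
checked_compare (int x, int y, bool *overflowed)
{
  assert (overflowed != 0);
  *overflowed = add_overflows (x, y);
  if (*overflowed)
    return false;
  return x + y < x;
}
```

For x = 1 and y = INT_MAX the folded form silently yields false, while the checked form reports the overflow that the new test expects to see diagnosed.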

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-11-05  Jakub Jelinek  

PR sanitizer/87837
* match.pd (X + Y < X): Don't optimize if TYPE_OVERFLOW_SANITIZED.

* c-c++-common/ubsan/pr87837.c: New test.

--- gcc/match.pd.jj 2018-10-31 10:33:07.438686055 +0100
+++ gcc/match.pd 2018-11-01 10:26:44.251883633 +0100
@@ -1572,6 +1572,7 @@ (define_operator_list COND_TERNARY
   (op:c (plus:c@2 @0 @1) @1)
   (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+   && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
&& (CONSTANT_CLASS_P (@0) || single_use (@2)))
(op @0 { build_zero_cst (TREE_TYPE (@0)); }
 /* For equality, this is also true with wrapping overflow.  */
--- gcc/testsuite/c-c++-common/ubsan/pr87837.c.jj   2018-11-01 10:37:35.159186004 +0100
+++ gcc/testsuite/c-c++-common/ubsan/pr87837.c  2018-11-01 10:39:56.162868607 +0100
@@ -0,0 +1,18 @@
+/* PR sanitizer/87837 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=signed-integer-overflow -Wno-unused-variable" } */
+
+int
+foo (int n)
+{
+  return n + __INT_MAX__ < n;
+}
+
+int
+main ()
+{
+  volatile int a = foo (1);
+  return 0;
+}
+
+/* { dg-output "signed integer overflow: 1 \\+ 2147483647 cannot be represented in type 'int'" } */

Jakub


Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Richard Biener
On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
 wrote:
>
> Hi,
> This patch adds two transforms to match.pd to CSE erf/erfc pair.
> erfc(x) is canonicalized to 1 - erf(x), and 1 - erf(x) is then reversed
> to erfc(x) when canonicalization is disabled and the result of erf(x) has
> a single use within 1 - erf(x).
>
> The patch regressed builtin-nonneg-1.c. The following test-case
> reproduces the issue with patch:
>
> void test(double d1) {
>   if (signbit(erfc(d1)))
> link_failure_erfc();
> }
>
> ssa dump:
>
>:
>   _5 = __builtin_erf (d1_4(D));
>   _1 = 1.0e+0 - _5;
>   _6 = _1 < 0.0;
>   _2 = (int) _6;
>   if (_2 != 0)
> goto ; [INV]
>   else
> goto ; [INV]
>
>:
>   link_failure_erfc ();
>
>:
>   return;
>
> As can be seen, erfc(d1) is folded to 1 - erf(d1).
> forwprop then transforms the if condition from _2 != 0
> to _5 > 1.0e+0, which defeats DCE and results in a link failure
> with an undefined reference to link_failure_erfc().
>
> So, the patch adds another transform erf(x) > 1 -> 0.

Ick.

Why not canonicalize erf (x) to 1-erfc(x) instead?

> which resolves the regression.
>
> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> Cross-testing on arm and aarch64 variants in progress.
> OK for trunk if passes ?
>
> Thanks,
> Prathamesh


Re: [PATCH 1/4] Fix string pool statistics.

2018-11-05 Thread Richard Biener
On Mon, Nov 5, 2018 at 9:07 AM marxin  wrote:
>
>
> libcpp/ChangeLog:

Hmm, the patch suggests the flag could be part of
cpp_hash_table rather than of each individual
ht_identifier?  Or is the patch confused when it
sets HT_GGC to 1 even in

   else
-HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
-  str, len);
+{
+  HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
+str, len);
+  HT_GGC (node) = 1;
+}

?  Do we really support mixed operation here?
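
A minimal sketch of how a per-identifier flag would feed the statistics, assuming (a guess at the intent, not what the patch as posted does) that the obstack branch clears the flag instead of setting it.  struct ident mirrors the proposed ht_identifier layout; struct pool_bytes and tally_bytes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Same field layout as the proposed ht_identifier: the length gives up
   one bit to make room for the GGC flag.  */
struct ident
{
  const unsigned char *str;
  unsigned int hash_value;
  unsigned int len : 31;
  unsigned int ggc : 1;
};

/* Byte totals split by allocator.  */
struct pool_bytes
{
  size_t ggc;
  size_t obstack;
};

/* Sum the string lengths of COUNT identifiers, split by the ggc flag.  */
static struct pool_bytes
tally_bytes (const struct ident *ids, size_t count)
{
  struct pool_bytes total = { 0, 0 };
  assert (ids != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
    {
      if (ids[i].ggc)
        total.ggc += ids[i].len;
      else
        total.obstack += ids[i].len;
    }
  return total;
}
```

If every identifier in one table always comes from the same allocator, a single flag on the table would give the same split without touching each node.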

> 2018-11-02  Martin Liska  
>
> * include/symtab.h (ht_identifier):
> Make room for ggc flag.
> * symtab.c (ht_lookup_with_hash): Mark
> GGC and non-GGC allocated strings.
> (ht_dump_statistics): Use the information.
> ---
>  libcpp/include/symtab.h |  4 +++-
>  libcpp/symtab.c | 28 +++-
>  2 files changed, 22 insertions(+), 10 deletions(-)
>


Re: [PATCH 2/4] Fix GNU coding style.

2018-11-05 Thread Richard Biener
On Mon, Nov 5, 2018 at 9:07 AM marxin  wrote:
>

OK

> gcc/ChangeLog:
>
> 2018-11-02  Martin Liska  
>
> * mem-stats.h (mem_alloc_description::get_list): Fix GNU coding
> style.
> * vec.c: Likewise.
> ---
>  gcc/mem-stats.h | 61 +
>  gcc/vec.c   |  1 -
>  2 files changed, 26 insertions(+), 36 deletions(-)
>


Re: [PATCH 3/4] Fix vector memory statistics.

2018-11-05 Thread Richard Biener
On Mon, Nov 5, 2018 at 9:07 AM marxin  wrote:
>
>
> gcc/ChangeLog:

   /* Release PTR pointer of SIZE bytes. If REMOVE_FROM_MAP is set to true,
  remove the instance from reverse map.  */
-  void release_instance_overhead (void *ptr, size_t size,
- bool remove_from_map = false);
+  T * release_instance_overhead (void *ptr, size_t size,
+bool remove_from_map = false);

can you document the return value?

Otherwise OK.

Richard.
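
For reference, the accounting done in the vec.c hunk of vec_prefix::register_overhead can be modelled in isolation as below.  struct vec_stats, record_alloc and record_release are hypothetical; only the field names mirror vec_usage, and the release side is a guess based on the ChangeLog line "Subtract elements".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the per-location vector statistics.  */
struct vec_stats
{
  size_t allocated;     /* Bytes currently accounted.  */
  size_t items;         /* Elements currently accounted.  */
  size_t items_peak;    /* Highest value ITEMS has reached.  */
  size_t element_size;  /* sizeof (T) of the tracked vector.  */
};

/* Account an allocation of ELEMENTS items of ELEMENT_SIZE bytes.  */
static void
record_alloc (struct vec_stats *u, size_t elements, size_t element_size)
{
  assert (u != NULL);
  u->allocated += elements * element_size;
  u->element_size = element_size;
  u->items += elements;
  if (u->items_peak < u->items)
    u->items_peak = u->items;
}

/* Account a release of ELEMENTS items; the peak is left untouched.  */
static void
record_release (struct vec_stats *u, size_t elements, size_t element_size)
{
  assert (u != NULL && u->items >= elements);
  u->allocated -= elements * element_size;
  u->items -= elements;
}
```

Passing the element count and element size separately, rather than a precomputed byte size, is what lets the report show sizeof(T) as its own column.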

> 2018-11-02  Martin Liska  
>
> * mem-stats.h (mem_alloc_description::release_instance_overhead):
> Return T *.
> * vec.c (struct vec_usage): Register m_element_size.
> (vec_prefix::register_overhead): New arguments: elements and
> element_size.
> (vec_prefix::release_overhead): Subtract elements.
> * vec.h (struct vec_prefix): Change signature.
> (va_heap::reserve): Pass proper arguments.
> (va_heap::release): Likewise.
> ---
>  gcc/mem-stats.h | 14 --
>  gcc/vec.c   | 34 +-
>  gcc/vec.h   | 12 
>  3 files changed, 37 insertions(+), 23 deletions(-)
>


[PATCH] Fix store merging wrong-code (PR tree-optimization/87859)

2018-11-05 Thread Jakub Jelinek
Hi!

My recent change for PR86844 broke the following testcases.

The problem is that we are processing the stores in the bitpos/bitsize
order.  The PR86844 change was about the following: if we try to merge two
overlapping stores of INTEGER_CST, we need to go through all other stores
that are overlapping those and come in program order before the last of
those; if they are INTEGER_CST stores, we can merge them, if they are other
stores, we punt completely.  If there are any stores ordered in the
bitpos/bitsize ordering in between those (thus also overlapping), we keep
them unmerged as is.

The problem is that if we skip any stores this way, we need to ensure that
we don't merge this store group with any statements beyond that, because
the merged store group is stored at the point of the last statement, and that
would overwrite the skipped stores.  Let's use the testcase:
  i.f.i[0] = 0; // bitpos 0, bitsize 32
  i.f.i[1] = 0; // bitpos 32, bitsize 32
  i.f.i[2] = 0; // bitpos 64, bitsize 32
  i.f.q.f7 = 1; // all other stores are bitpos the number after f, bitsize 1
  i.f.q.f2 = 1;
  i.f.q.f21 = 1;
  i.f.q.f19 = 1;
  i.f.q.f14 = 1;
  i.f.q.f5 = 1;
  i.f.q.f0 = 1;
  i.f.q.f15 = 1;
  i.f.q.f16 = 1;
  i.f.q.f6 = 1;
  i.f.q.f9 = 1;
  i.f.q.f17 = 1;
  i.f.q.f1 = 1;
  i.f.q.f8 = 1;
  i.f.q.f13 = 1;
  i.f.q.f66 = 1;
In the bitpos/bitsize order, this is:
  i.f.i[0] = 0; // bitpos 0, bitsize 32
  i.f.q.f0 = 1;
  i.f.q.f1 = 1;
  i.f.q.f2 = 1;
  i.f.q.f5 = 1;
  i.f.q.f6 = 1;
  i.f.q.f7 = 1;
  i.f.q.f8 = 1;
  i.f.q.f9 = 1;
  i.f.q.f13 = 1;
  i.f.q.f14 = 1;
  i.f.q.f15 = 1;
  i.f.q.f16 = 1;
  i.f.q.f17 = 1;
  i.f.q.f19 = 1;
  i.f.q.f21 = 1;
  i.f.i[1] = 0; // bitpos 32, bitsize 32
  i.f.i[2] = 0; // bitpos 64, bitsize 32
  i.f.q.f66 = 1;
and when trying to merge overlapping i.f.i[0] = 0; with i.f.q.f0 = 1;,
we also merge in overlapping f7, f2, f21, f19, f14, f5 and skip
overlapping, but coming after the f0 store, stores f1 .. f17 other than the
above mentioned ones (so those stores remain in the source).  Now, in this
case the merged store group would be emitted where the f0 store is and the
optimization would be still correct.  The problem is that we keep going and
merge that store group with i[1] and i[2] stores (that is ok, those are
before the f0 store, so again, still at the spot of f0 store), but then also
merge with f66 store and that makes it incorrect, the
  i.f.q.f15 = 1;
  i.f.q.f16 = 1;
  i.f.q.f6 = 1;
  i.f.q.f9 = 1;
  i.f.q.f17 = 1;
  i.f.q.f1 = 1;
  i.f.q.f8 = 1;
  i.f.q.f13 = 1;
stores remain in the IL, but because the last store of the merged group is
now at f66 store, that is where those bits are overwritten with the store
of 96 bits together.  The following patch fixes that by making sure that if
we skip this way any stores (keep them in the IL), then we ensure that we
don't grow the store group beyond that last_order point.
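
The guard described above can be sketched in plain C.  This is a minimal
model and not the code from gimple-ssa-store-merging.c: the struct and the
function names (group_model, group_init, group_note_skipped,
group_try_merge) are hypothetical, only last_order and
first_nonmergeable_order are names taken from the patch, and using UINT_MAX
as the "nothing skipped yet" value is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a merged store group.  */
struct group_model
{
  /* Program order of the last store merged so far; the merged store
     is emitted at this point.  */
  unsigned int last_order;
  /* Smallest program order of any store that was skipped and left in
     the IL; UINT_MAX (an assumption) while nothing has been skipped.  */
  unsigned int first_nonmergeable_order;
};

/* Start a group from a store with program order ORDER.  */
static void
group_init (struct group_model *g, unsigned int order)
{
  assert (g);
  g->last_order = order;
  g->first_nonmergeable_order = UINT_MAX;
}

/* Record that the store with program order ORDER stays in the IL.  */
static void
group_note_skipped (struct group_model *g, unsigned int order)
{
  assert (g);
  if (order < g->first_nonmergeable_order)
    g->first_nonmergeable_order = order;
}

/* Try to add the store with program order ORDER.  Refuse if that would
   move the emission point to or past a skipped store, because the
   merged store would then overwrite what the skipped store wrote.  */
static bool
group_try_merge (struct group_model *g, unsigned int order)
{
  assert (g);
  if (order >= g->first_nonmergeable_order)
    return false;
  if (order > g->last_order)
    g->last_order = order;
  return true;
}
```

Numbering the testcase in program order (i[0] is 0, f0 is 9, f15 is 10,
f66 is 18), the i[1] and i[2] stores still merge after f15 and the others
were skipped, while the f66 store is refused and the group stays at the
f0 store.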

Now, the first testcase also shows that we skip those stmts completely
unnecessarily, while they are overlapping stores that come after the
latter of the two overlapping stores being merged, they are still
INTEGER_CST stores and in the past we've merged them all into a constant
store just fine.  The reason PR86844 was done is that there can be
non-INTEGER_CST stores that prevent the merging.  So, this patch also
remembers which of the overlapping stores we would skip are INTEGER_CST
stores and which are other stores; if we see any to be skipped INTEGER_CSTs,
we retry the analysis with including those too; if that works out, we use
that, if it doesn't, we revert to what we chose before (skipping those
stores with magic to ensure we don't merge with other stores later in
program order).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 8.3?

2018-11-05  Jakub Jelinek  

PR tree-optimization/87859
* gimple-ssa-store-merging.c (struct merged_store_group): Add
only_constants and first_nonmergeable_order members.
(merged_store_group::merged_store_group): Initialize them.
(merged_store_group::do_merge): Clear only_constants member if
adding something other than INTEGER_CST store.
(imm_store_chain_info::coalesce_immediate_stores): Don't merge
stores with order >= first_nonmergeable_order.  Use
merged_store->only_constants instead of always recomputing it.
Set merged_store->first_nonmergeable_order if we've skipped any
stores.  Attempt to merge overlapping INTEGER_CST stores that
we would otherwise skip.

* gcc.dg/store_merging_24.c: New test.
* gcc.dg/store_merging_25.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj   2018-11-02 15:36:30.802999681 +0100
+++ gcc/gimple-ssa-store-merging.c  2018-11-05 10:20:25.980370963 +0100
@@ -1429,6 +1429,8 @@ struct merged_store_group
   unsigned int first_order;
   unsigned int last_order;
   bool bit_insertion;
+  bool only_constants;
+  unsigned int first_nonmergeable_order;
 
   auto_vec stores;
   /* We record the fi

Re: [PATCH] Fix -fsanitize=undefined vs. x + y < x (PR sanitizer/87837)

2018-11-05 Thread Richard Biener
On Mon, 5 Nov 2018, Jakub Jelinek wrote:

> Hi!
> 
> I wish I had a better fix, but I don't, trying to sanitize signed integer
> arithmetics in the FEs already before any folding there is complicated by
> that arithmetics being created just in way too many spots.

I suppose we could play some tricks and "unset" TYPE_OVERFLOW_SANITIZED
after instrumentation finished?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2018-11-05  Jakub Jelinek  
> 
>   PR sanitizer/87837
>   * match.pd (X + Y < X): Don't optimize if TYPE_OVERFLOW_SANITIZED.
> 
>   * c-c++-common/ubsan/pr87837.c: New test.
> 
> --- gcc/match.pd.jj   2018-10-31 10:33:07.438686055 +0100
> +++ gcc/match.pd  2018-11-01 10:26:44.251883633 +0100
> @@ -1572,6 +1572,7 @@ (define_operator_list COND_TERNARY
>(op:c (plus:c@2 @0 @1) @1)
>(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +   && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> && (CONSTANT_CLASS_P (@0) || single_use (@2)))
> (op @0 { build_zero_cst (TREE_TYPE (@0)); }
>  /* For equality, this is also true with wrapping overflow.  */
> --- gcc/testsuite/c-c++-common/ubsan/pr87837.c.jj 2018-11-01 10:37:35.159186004 +0100
> +++ gcc/testsuite/c-c++-common/ubsan/pr87837.c 2018-11-01 10:39:56.162868607 +0100
> @@ -0,0 +1,18 @@
> +/* PR sanitizer/87837 */
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=signed-integer-overflow -Wno-unused-variable" } */
> +
> +int
> +foo (int n)
> +{
> +  return n + __INT_MAX__ < n;
> +}
> +
> +int
> +main ()
> +{
> +  volatile int a = foo (1);
> +  return 0;
> +}
> +
> +/* { dg-output "signed integer overflow: 1 \\+ 2147483647 cannot be represented in type 'int'" } */
> 
>   Jakub
> 
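
Why the folding hides the overflow from the sanitizer can be sketched in
plain C.  This is an illustration only: folded_compare models what the
match.pd pattern leaves behind once the addition has been folded away, and
add_would_overflow is just one conventional way to test for overflow
without performing the addition.  Both names, and the self-check helper,
are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* What "x + y < x" becomes once the comparison is folded against zero:
   the addition is gone, so there is nothing left to instrument.  */
static bool
folded_compare (int x, int y)
{
  (void) x;
  return y < 0;
}

/* Overflow test that never performs the overflowing addition itself.  */
static bool
add_would_overflow (int x, int y)
{
  if (y > 0)
    return x > INT_MAX - y;
  if (y < 0)
    return x < INT_MIN - y;
  return false;
}

/* Usage: the values from the new testcase, foo (1) with y == INT_MAX.  */
static void
overflow_self_check (void)
{
  assert (!folded_compare (1, INT_MAX));
  assert (add_would_overflow (1, INT_MAX));
  assert (!add_would_overflow (1, 2));
}
```

For the testcase values the folded form simply yields 0, although
1 + INT_MAX does overflow; with the extra TYPE_OVERFLOW_SANITIZED check
the addition is kept and can be instrumented.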


Re: [PATCH v2] S/390: Allow LARL of literal pool entries

2018-11-05 Thread Ilya Leoshkevich



> On 31.10.2018 at 10:59, Ulrich Weigand wrote:
> 
> Ilya Leoshkevich wrote:
>> On 30.10.2018 at 18:22, Ulrich Weigand wrote:
>>> This definitely looks wrong.  If we haven't annotated the address,
>>> it should *not* be found by find_constant_pool_ref, since we are
>>> not going to replace it!  That was the whole point of not annotating
>>> it in the first place ...
>> 
>> There are two use cases for find_constant_pool_ref ().  One is indeed
>> replacing annotated references.  The other (in s390_mainpool_start ()
>> and s390_chunkify_start ()) is creating pool entries.  So I've decided
>> to let it find unannotated references for the second use case.
> 
> OK, but if we access the constant via relative address, we don't need
> to copy it into the back-end managed pool either; the relative address
> can just refer to the constant in the default pool maintained by the
> middle end.

Wouldn’t that prevent constant merging in case the same constant is
used with both relative and base-register addressing?


Re: [PATCH] Fix store merging wrong-code (PR tree-optimization/87859)

2018-11-05 Thread Richard Biener
On Mon, 5 Nov 2018, Jakub Jelinek wrote:

> Hi!
> 
> My recent change for PR86844 broke the following testcases.
> 
> The problem is that we are processing the stores in the bitpos/bitsize
> order.  The PR86844 change was that if we try to merge two
> overlapping stores of INTEGER_CST, we need to go through all other stores
> that are overlapping those and come in program order before the last of
> those; if they are INTEGER_CST stores, we can merge them, if they are other
> stores, we punt completely.  If there are any stores ordered in the
> bitpos/bitsize ordering in between those (thus also overlapping), we keep
> them unmerged as is.
> 
> The problem is that if we this way skip any stores, we need to ensure that
> we don't merge this store group with any statements beyond that, because
> merged store group is stored at the point of the last statement, and that
> would overwrite the skipped stores.  Let's use the testcase:
>   i.f.i[0] = 0; // bitpos 0, bitsize 32
>   i.f.i[1] = 0; // bitpos 32, bitsize 32
>   i.f.i[2] = 0; // bitpos 64, bitsize 32
>   i.f.q.f7 = 1; // all other stores are bitpos the number after f, bitsize 1
>   i.f.q.f2 = 1;
>   i.f.q.f21 = 1;
>   i.f.q.f19 = 1;
>   i.f.q.f14 = 1;
>   i.f.q.f5 = 1;
>   i.f.q.f0 = 1;
>   i.f.q.f15 = 1;
>   i.f.q.f16 = 1;
>   i.f.q.f6 = 1;
>   i.f.q.f9 = 1;
>   i.f.q.f17 = 1;
>   i.f.q.f1 = 1;
>   i.f.q.f8 = 1;
>   i.f.q.f13 = 1;
>   i.f.q.f66 = 1;
> In the bitpos/bitsize order, this is:
>   i.f.i[0] = 0; // bitpos 0, bitsize 32
>   i.f.q.f0 = 1;
>   i.f.q.f1 = 1;
>   i.f.q.f2 = 1;
>   i.f.q.f5 = 1;
>   i.f.q.f6 = 1;
>   i.f.q.f7 = 1;
>   i.f.q.f8 = 1;
>   i.f.q.f9 = 1;
>   i.f.q.f13 = 1;
>   i.f.q.f14 = 1;
>   i.f.q.f15 = 1;
>   i.f.q.f16 = 1;
>   i.f.q.f17 = 1;
>   i.f.q.f19 = 1;
>   i.f.q.f21 = 1;
>   i.f.i[1] = 0; // bitpos 32, bitsize 32
>   i.f.i[2] = 0; // bitpos 64, bitsize 32
>   i.f.q.f66 = 1;
> and when trying to merge overlapping i.f.i[0] = 0; with i.f.q.f0 = 1;,
> we also merge in overlapping f7, f2, f21, f19, f14, f5 and skip
> overlapping, but coming after the f0 store, stores f1 .. f17 other than the
> above mentioned ones (so those stores remain in the source).  Now, in this
> case the merged store group would be emitted where the f0 store is and the
> optimization would be still correct.  The problem is that we keep going and
> merge that store group with i[1] and i[2] stores (that is ok, those are
> before the f0 store, so again, still at the spot of f0 store), but then also
> merge with f66 store and that makes it incorrect, the
>   i.f.q.f15 = 1;
>   i.f.q.f16 = 1;
>   i.f.q.f6 = 1;
>   i.f.q.f9 = 1;
>   i.f.q.f17 = 1;
>   i.f.q.f1 = 1;
>   i.f.q.f8 = 1;
>   i.f.q.f13 = 1;
> stores remain in the IL, but because the last store of the merged group is
> now at f66 store, that is where those bits are overwritten with the store
> of 96 bits together.  The following patch fixes that by making sure that if
> we skip this way any stores (keep them in the IL), then we ensure that we
> don't grow the store group beyond that last_order point.
> 
> Now, the first testcase also shows that we skip those stmts completely
> unnecessarily, while they are overlapping stores that come after the
> latter of the two overlapping stores being merged, they are still
> INTEGER_CST stores and in the past we've merged them all into a constant
> store just fine.  The reason PR86844 was done is that there can be
> non-INTEGER_CST stores that prevent the merging.  So, this patch also
> remembers which of the overlapping stores we would skip are INTEGER_CST
> stores and which are other stores; if we see any to be skipped INTEGER_CSTs,
> we retry the analysis with including those too; if that works out, we use
> that, if it doesn't, we revert to what we chose before (skipping those
> stores with magic to ensure we don't merge with other stores later in
> program order).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 8.3?

OK.

Thanks,
Richard.

> 2018-11-05  Jakub Jelinek  
> 
>   PR tree-optimization/87859
>   * gimple-ssa-store-merging.c (struct merged_store_group): Add
>   only_constants and first_nonmergeable_order members.
>   (merged_store_group::merged_store_group): Initialize them.
>   (merged_store_group::do_merge): Clear only_constants member if
>   adding something other than INTEGER_CST store.
>   (imm_store_chain_info::coalesce_immediate_stores): Don't merge
>   stores with order >= first_nonmergeable_order.  Use
>   merged_store->only_constants instead of always recomputing it.
>   Set merged_store->first_nonmergeable_order if we've skipped any
>   stores.  Attempt to merge overlapping INTEGER_CST stores that
>   we would otherwise skip.
> 
>   * gcc.dg/store_merging_24.c: New test.
>   * gcc.dg/store_merging_25.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj 2018-11-02 15:36:30.802999681 +0100
> +++ gcc/gimple-ssa-store-merging.c2018-11-05 10:2

Re: [PATCH] Fix -fsanitize=undefined vs. x + y < x (PR sanitizer/87837)

2018-11-05 Thread Jakub Jelinek
On Mon, Nov 05, 2018 at 11:03:28AM +0100, Richard Biener wrote:
> On Mon, 5 Nov 2018, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > I wish I had a better fix, but I don't, trying to sanitize signed integer
> > arithmetics in the FEs already before any folding there is complicated by
> > that arithmetics being created just in way too many spots.
> 
> I suppose we could play some tricks and "unset" TYPE_OVERFLOW_SANITIZED
> after instrumentation finished?

Yes, e.g. have some cfun-> flag or property that would be cleared during the
ubsan pass (and clear from the beginning if not sanitizing integer
overflows).

Jakub


Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-05 Thread Richard Biener
On Fri, Nov 2, 2018 at 3:04 PM Richard Earnshaw (lists)
 wrote:
>
> On 02/11/2018 13:53, Richard Biener wrote:
> > On Fri, Nov 2, 2018 at 2:38 PM Richard Earnshaw (lists)
> >  wrote:
> >>
> >> Although there's no fundamental reason why shrink wrapping and
> >> speculation tracking are incompatible, a phase-ordering requirement (we
> >> need to do speculation tracking before the final basic block clean-up)
> >> means that the shrink wrapping pass can undo some of the changes the
> >> speculation tracking pass makes.  The result is that the tracking, while
> >> still safe is less comprehensive than we really want.
> >>
> >> So to keep things simple, and because the tracking code is quite
> >> expensive anyway, it seems best to just disable that pass when we are
> >> tracking speculative execution.
> >
> > Shouldn't you be able to do this per function at least?
> >
>
> do what per function?  track speculation?

disable shrink-wrapping only when any speculation was there
(this is about __builtin_speculation_safe_value, no?)

Richard.

> R.
>
> > Richard.
> >
> >> * config/aarch64/aarch64.c (aarch64_override_options): Disable
> >> shrink-wrapping when -mtrack-speculation.
> >>
> >> Committed.
>


Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-05 Thread Richard Earnshaw (lists)
On 05/11/2018 10:05, Richard Biener wrote:
> On Fri, Nov 2, 2018 at 3:04 PM Richard Earnshaw (lists)
>  wrote:
>>
>> On 02/11/2018 13:53, Richard Biener wrote:
>>> On Fri, Nov 2, 2018 at 2:38 PM Richard Earnshaw (lists)
>>>  wrote:

>>>> Although there's no fundamental reason why shrink wrapping and
>>>> speculation tracking are incompatible, a phase-ordering requirement (we
>>>> need to do speculation tracking before the final basic block clean-up)
>>>> means that the shrink wrapping pass can undo some of the changes the
>>>> speculation tracking pass makes.  The result is that the tracking, while
>>>> still safe is less comprehensive than we really want.
>>>>
>>>> So to keep things simple, and because the tracking code is quite
>>>> expensive anyway, it seems best to just disable that pass when we are
>>>> tracking speculative execution.
>>>
>>> Shouldn't you be able to do this per function at least?
>>>
>>
>> do what per function?  track speculation?
> 
> disable shrink-wrapping only when any speculation was there
> (this is about __builtin_speculation_safe_value, no?)
> 

Only indirectly.  This is about the tracking code that tracks
conditional branches and propagates that information through call/return
sequences.  Shrink wrapping messes with the prologue/epilogue sequences
after the speculation tracking pass has run and unknowingly deletes some
of the additional code that was previously inserted by the tracking pass.

R.

> Richard.
> 
>> R.
>>
>>> Richard.
>>>
>>>> * config/aarch64/aarch64.c (aarch64_override_options): Disable
>>>> shrink-wrapping when -mtrack-speculation.
>>>>
>>>> Committed.
>>



Re: [PATCH 1/3][GCC] Add new target hook asm_post_cfi_startproc

2018-11-05 Thread Sam Tebbs


On 11/05/2018 07:54 AM, Richard Biener wrote:
> On Fri, 2 Nov 2018, Sam Tebbs wrote:
>
>> On 11/02/2018 05:28 PM, Sam Tebbs wrote:
>>
>>> Hi all,
>>>
>>> This patch adds a new target hook called "asm_post_cfi_startproc". This hook is
>>> intended to be used by the aarch64 backend to emit a directive that enables
>>> support for unwinding frames signed with the pointer authentication B-key. This
>>> hook is triggered after the ".cfi_startproc" directive is emitted in
>>> gcc/dwarf2out.c.
>>>
>>> Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no regressions.
>>>
>>> Ok for trunk?
> Can you explain why existing prologue/cfi emission points are not
> enough?

I couldn't find any target hooks that were triggered at the
assembly-printing level at the correct point in time (after
.cfi_startproc is emitted); please do point me to one if that is not the
case.

An alternative could have been to implement a new reg_note but that 
would have meant adding target-specific code to target-agnostic files 
and wouldn't have been as flexible.
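
The ordering the hook relies on can be sketched in plain C.  This is a
model under stated assumptions, not the dwarf2out.c code: decl_model
stands in for GCC's tree type, model_do_cfi_startproc, hook_noop and
hook_b_key are hypothetical names, and the directive printed by
hook_b_key (.cfi_b_key_frame) is an assumption, since the thread does not
name the directive.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for GCC's `tree' type; only an opaque handle.  */
typedef const void *decl_model;

/* Hook type: called with the assembly stream and the function decl.  */
typedef void (*post_cfi_startproc_hook) (FILE *, decl_model);

/* Default behaviour: targets that do not need the hook emit nothing.  */
static void
hook_noop (FILE *out, decl_model decl)
{
  (void) out;
  (void) decl;
}

/* Assumed aarch64-style behaviour: announce B-key signed frames.  */
static void
hook_b_key (FILE *out, decl_model decl)
{
  (void) decl;
  fputs ("\t.cfi_b_key_frame\n", out);
}

/* Model of the emission point: the hook runs right after
   .cfi_startproc has been printed, never before it.  */
static void
model_do_cfi_startproc (FILE *out, decl_model decl,
                        post_cfi_startproc_hook hook)
{
  assert (out && hook);
  fputs ("\t.cfi_startproc\n", out);
  hook (out, decl);
}
```

Passing hook_noop leaves the output unchanged apart from .cfi_startproc
itself, which is the reason a do-nothing default such as
hook_void_FILEptr_tree is enough for every other target.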

Sam

>
>>> gcc/
>>> 2018-11-02  Sam Tebbs
>>>
>>> * doc/tm.texi (TARGET_ASM_POST_CFI_STARTPROC): Define.
>>> * doc/tm.texi.in (TARGET_ASM_POST_CFI_STARTPROC): Define.
>>> * dwarf2out.c (dwarf2out_do_cfi_startproc): Trigger the hook.
>>> * hooks.c (hook_void_FILEptr_tree): Define.
>>> * hooks.h (hook_void_FILEptr_tree): Define.
>>> * target.def (post_cfi_startproc): Define.
>> CCing global reviewers and dwarf maintainers.
>>
>>



[PATCH] Rip out SCEV cprop

2018-11-05 Thread Richard Biener


This rips out SCEV constant propagation, keeping only final value
replacement in the scev_cprop pass.  There's not a single testcase
that shows "SCEV" constant propagation is doing something useful.

This takes us one step closer to removing the pass completely,
doing final value replacement only when it helps another transform
like DCE (remove an otherwise empty loop) or vectorization
(removing a reduction).
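
As a reminder of what final value replacement does, here is a minimal
source-level sketch.  The function names are made up for illustration and
the pass of course works on GIMPLE, not on C source: the value a variable
has on loop exit is replaced by a closed-form expression of the iteration
count, after which the loop may be empty and removable by DCE.

```c
#include <assert.h>

/* Before: x is computed by the loop; only its final value is used.  */
static unsigned int
final_value_by_loop (unsigned int n)
{
  unsigned int x = 0;
  for (unsigned int i = 0; i < n; i++)
    x += 3;
  return x;
}

/* After: the exit value is expressed directly from the iteration
   count, so the loop no longer computes anything that is used.  */
static unsigned int
final_value_replaced (unsigned int n)
{
  return 3u * n;
}

/* Usage: both forms agree, including for a zero-trip loop.  */
static void
final_value_self_check (void)
{
  assert (final_value_by_loop (0) == final_value_replaced (0));
  assert (final_value_by_loop (7) == final_value_replaced (7));
}
```

Unsigned arithmetic is used so that both forms are well defined for every
n.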

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-11-05  Richard Biener  

* tree-scalar-evolution.h (final_value_replacement_loop): Update
prototype.
* tree-scalar-evolution.c (final_value_replacement_loop): Return
whether anything was done.
(scev_const_prop): Remove constant propagation part, fold
remains into ...
* tree-ssa-loop.c (pass_scev_cprop::execute): ... here.
(pass_data_scev_cprop): TODO_cleanup_cfg is now done
conditionally.

* gcc.dg/pr41488.c: Scan ivcanon dump instead of sccp one.
* gcc.dg/tree-ssa/scev-7.c: Likewise.

Index: gcc/testsuite/gcc.dg/pr41488.c
===
--- gcc/testsuite/gcc.dg/pr41488.c  (revision 265791)
+++ gcc/testsuite/gcc.dg/pr41488.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-sccp-scev" } */
+/* { dg-options "-O2 -fdump-tree-ivcanon-scev" } */
 
 struct struct_t
 {
@@ -14,4 +14,4 @@ void foo (struct struct_t* sp, int start
 sp->data[i+start] = 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */
+/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "ivcanon" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/scev-7.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/scev-7.c  (revision 265791)
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-7.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-sccp-scev" } */
+/* { dg-options "-O2 -fdump-tree-ivcanon-scev" } */
 
 struct struct_t
 {
@@ -14,4 +14,4 @@ void foo (struct struct_t* sp, int start
 sp->data[i+start] = 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */
+/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "ivcanon" } } */
Index: gcc/tree-scalar-evolution.c
===
--- gcc/tree-scalar-evolution.c (revision 265791)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -3537,20 +3537,20 @@ expression_expensive_p (tree expr)
 }
 }
 
-/* Do final value replacement for LOOP.  */
+/* Do final value replacement for LOOP, return true if we did anything.  */
 
-void
+bool
 final_value_replacement_loop (struct loop *loop)
 {
   /* If we do not know exact number of iterations of the loop, we cannot
  replace the final value.  */
   edge exit = single_exit (loop);
   if (!exit)
-return;
+return false;
 
   tree niter = number_of_latch_executions (loop);
   if (niter == chrec_dont_know)
-return;
+return false;
 
   /* Ensure that it is possible to insert new statements somewhere.  */
   if (!single_pred_p (exit->dest))
@@ -3563,6 +3563,7 @@ final_value_replacement_loop (struct loo
 = superloop_at_depth (loop,
  loop_depth (exit->dest->loop_father) + 1);
 
+  bool any = false;
   gphi_iterator psi;
   for (psi = gsi_start_phis (exit->dest); !gsi_end_p (psi); )
 {
@@ -3620,6 +3621,7 @@ final_value_replacement_loop (struct loo
  fprintf (dump_file, " with expr: ");
  print_generic_expr (dump_file, def);
}
+  any = true;
   def = unshare_expr (def);
   remove_phi_node (&psi, false);
 
@@ -3662,100 +3664,8 @@ final_value_replacement_loop (struct loo
  fprintf (dump_file, "\n");
}
 }
-}
-
-/* Replace ssa names for that scev can prove they are constant by the
-   appropriate constants.  Also perform final value replacement in loops,
-   in case the replacement expressions are cheap.
-
-   We only consider SSA names defined by phi nodes; rest is left to the
-   ordinary constant propagation pass.  */
-
-unsigned int
-scev_const_prop (void)
-{
-  basic_block bb;
-  tree name, type, ev;
-  gphi *phi;
-  struct loop *loop;
-  bitmap ssa_names_to_remove = NULL;
-  unsigned i;
-  gphi_iterator psi;
-
-  if (number_of_loops (cfun) <= 1)
-return 0;
-
-  FOR_EACH_BB_FN (bb, cfun)
-{
-  loop = bb->loop_father;
-
-  for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
-   {
- phi = psi.phi ();
- name = PHI_RESULT (phi);
-
- if (virtual_operand_p (name))
-   continue;
-
- type = TREE_TYPE (name);
-
- if (!POINTER_TYPE_P (type)
- && !INTEGRAL_TYPE_P (type))
-   continue;
-
-  

Re: [PATCH v3 3/3] or1k: gcc: initial support for openrisc

2018-11-05 Thread Szabolcs Nagy
On 04/11/18 09:05, Stafford Horne wrote:
> On Mon, Oct 29, 2018 at 02:28:11PM +, Szabolcs Nagy wrote:
>> On 27/10/18 05:37, Stafford Horne wrote:
...
>>> +#undef LINK_SPEC
>>> +#define LINK_SPEC "%{h*}   \
>>> +   %{static:-Bstatic}  \
>>> +   %{shared:-shared}   \
>>> +   %{symbolic:-Bsymbolic}  \
>>> +   %{!static:  \
>>> + %{rdynamic:-export-dynamic}   \
>>> + %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"
>>> +
>>> +#endif /* GCC_OR1K_LINUX_H */
>>
>> note that because of the -static-pie mess each
>> target needs a more complicated LINK_SPEC now.
> 
> Hello,
> 
> Does something like this look better?
> 
> --- a/gcc/config/or1k/linux.h
> +++ b/gcc/config/or1k/linux.h
> @@ -37,8 +37,9 @@
> %{static:-Bstatic}  \
> %{shared:-shared}   \
> %{symbolic:-Bsymbolic}  \
> -   %{!static:  \
> +   %{!static:%{!static-pie:\
>   %{rdynamic:-export-dynamic}   \
> - %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"
> + %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}} \
> +   %{static-pie:-Bstatic -pie --no-dynamic-linker -z text}"
>  
>  #endif /* GCC_OR1K_LINUX_H */

looks ok.
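
For reference, applying that hunk to the LINK_SPEC quoted further up
would give roughly the following.  This is only a sketch: the spec is
split into adjacent string literals for readability, which changes the
whitespace, and the GNU_USER_DYNAMIC_LINKER value below is a placeholder
so that the fragment stands alone; in GCC it is provided by the target
headers.

```c
/* Placeholder only: the real value comes from the GCC target headers.  */
#ifndef GNU_USER_DYNAMIC_LINKER
#define GNU_USER_DYNAMIC_LINKER "/lib/ld-placeholder.so.1"
#endif

#undef LINK_SPEC
#define LINK_SPEC "%{h*} " \
  "%{static:-Bstatic} " \
  "%{shared:-shared} " \
  "%{symbolic:-Bsymbolic} " \
  /* Dynamic options apply only if neither -static nor -static-pie.  */ \
  "%{!static:%{!static-pie: " \
    "%{rdynamic:-export-dynamic} " \
    "%{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}} " \
  /* -static-pie: static link, position independent, no interpreter.  */ \
  "%{static-pie:-Bstatic -pie --no-dynamic-linker -z text}"

/* Expands the macro once so the fragment is a complete translation unit.  */
const char or1k_link_spec_model[] = LINK_SPEC;
```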

> I have tested this out with or1k-linux-musl, but I get some LD complaints i.e.
> 
> .../or1k-linux-musl/bin/ld: .../or1k-linux-musl/lib/libc.a(exit.o): non-pic relocation against symbol __fini_array_end
> .../or1k-linux-musl/bin/ld: .../or1k-linux-musl/lib/libc.a(exit.o): non-pic relocation against symbol __fini_array_start
>
> Those are some warnings we recently added to LD, perhaps I need to rebuild the
> libc.a with PIE as well.  I will try it out, but if anyone has some suggestions
> that would be helpful.

yes, musl does not build libc.a with pic by default,
either use a gcc configured with --enable-default-pie
or CC='gcc -fPIC' when building musl.

after that -static-pie linking should work.

(maybe musl should have an --enable-static-pie config
option to make this simpler)


[PATCH][GCC] Make DR_TARGET_ALIGNMENT compile time variable

2018-11-05 Thread Andre Vieira (lists)

Hi,

This patch enables targets to describe DR_TARGET_ALIGNMENT as a
compile-time variable.  It does so by turning the variable into a
'poly_uint64'.  This should not affect the current code-generation for
any target.

We hope to use this in the near future for SVE using the
current_vector_size as the preferred target alignment for vectors.  In
fact I have a patch to do just this, but I am still trying to figure out
whether and when it is beneficial to peel for alignment with a runtime
misalignment.  The patch I am working on will change the behavior of
auto-vectorization for SVE when building vector-length agnostic code for
targets that benefit from aligned vector loads/stores.  The patch will
result in the generation of a runtime computation of misalignment and
the construction of a corresponding mask for the first iteration of the
loop.
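
A scalar model of that runtime computation might look as follows.  It is
only a sketch under stated assumptions: the alignment is a power of two in
bytes, a predicate is modelled as a bit mask with one bit per lane (at
most 64 lanes), and both function names are hypothetical rather than taken
from the vectorizer.

```c
#include <assert.h>
#include <stdint.h>

/* Number of elements by which ADDR is past the previous TARGET_ALIGN
   boundary.  TARGET_ALIGN is in bytes and only known at run time.  */
static uint64_t
misalign_in_elems (uintptr_t addr, uint64_t target_align, uint64_t elem_size)
{
  assert (target_align != 0 && (target_align & (target_align - 1)) == 0);
  assert (elem_size != 0);
  return ((uint64_t) addr & (target_align - 1)) / elem_size;
}

/* Lane mask for the first iteration when the access starts at the
   aligned-down address: the first MISALIGN lanes are switched off,
   the remaining ones are active.  */
static uint64_t
first_iteration_mask (uint64_t misalign, unsigned int lanes)
{
  assert (lanes >= 1 && lanes <= 64 && misalign < lanes);
  uint64_t all = lanes == 64 ? UINT64_MAX : (UINT64_C (1) << lanes) - 1;
  uint64_t skipped = (UINT64_C (1) << misalign) - 1;
  return all & ~skipped;
}
```

For example, a 4-byte element at address 0x1008 with a 32-byte alignment
is misaligned by two elements, so an 8-lane first iteration would run with
mask 0xfc.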

I have decided not to offer support for prolog/epilog peeling when the
target alignment is not a compile-time constant, as this didn't seem
useful; this is why 'vect_do_peeling' returns early if
DR_TARGET_ALIGNMENT is not constant.
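
What "not constant" means for a poly_uint64 can be modelled in a few
lines of C.  This is a simplified stand-in and not GCC's poly-int.h: the
value is c0 + c1 * x for a runtime x >= 0, wrap-around is ignored, and
poly_model, poly_is_constant and poly_maybe_ne are hypothetical names
that only mirror the spirit of is_constant and maybe_ne.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the value c0 + c1 * x, where x >= 0 is only known at run time
   (for SVE, x depends on the actual vector length).  */
struct poly_model
{
  uint64_t c0;
  uint64_t c1;
};

/* If P does not depend on x, store its value in *VALUE and return true;
   otherwise return false, which is the "bail out early" case.  */
static bool
poly_is_constant (struct poly_model p, uint64_t *value)
{
  assert (value);
  if (p.c1 != 0)
    return false;
  *value = p.c0;
  return true;
}

/* True if A and B can differ for at least one x.  Two such linear
   forms are equal for every x only if both coefficients match.  */
static bool
poly_maybe_ne (struct poly_model a, struct poly_model b)
{
  return a.c0 != b.c0 || a.c1 != b.c1;
}
```

In this model a fixed 16-byte alignment is { 16, 0 } and a
vector-length-dependent one is { 16, 16 }; only the former passes
poly_is_constant, and comparing the two with poly_maybe_ne returns true.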

I bootstrapped and tested this on aarch64 and x86, i.e. one target
that uses this hook and one that doesn't.

Is this OK for trunk?

Cheers,
Andre

2018-11-05  Andre Vieira  

* config/aarch64/aarch64.c
(aarch64_vectorize_preferred_vector_alignment): Change return type to
poly_uint64.
(aarch64_simd_vector_alignment_reachable): Adapt to preferred vector
alignment being a poly int.
* doc/tm.texi (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Change
return type to poly_uint64.
* target.def (default_preferred_vector_alignment): Likewise.
* targhooks.c (default_preferred_vector_alignment): Likewise.
* targhooks.h (default_preferred_vector_alignment): Likewise.
* tree-vect-data-refs.c
(vect_calculate_target_alignment): Likewise.
(vect_compute_data_ref_alignment): Adapt to vector alignment
being a poly int.
(vect_update_misalignment_for_peel): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
(vect_find_same_alignment_drs): Likewise.
(vect_duplicate_ssa_name_ptr_info): Likewise.
(vect_setup_realignment): Likewise.
(vect_can_force_dr_alignment_p): Change alignment parameter type to
poly_uint64.
* tree-vect-loop-manip.c (get_misalign_in_elems): Learn to construct a
mask with a compile time variable vector alignment.
(vect_gen_prolog_loop_niters): Adapt to vector alignment being a poly
int.
(vect_do_peeling): Exit early if vector alignment is not constant.
* tree-vect-stmts.c (ensure_base_align): Adapt to vector alignment
being a poly int.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* tree-vectorizer.h (struct dr_vec_info): Make target_alignment field a
poly_uint64.
(vect_known_alignment_in_bytes): Adapt to vector alignment being a poly
int.
(vect_can_force_dr_alignment_p): Change alignment parameter type to
poly_uint64.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4c7790826658539f71f2fd9eb9ea0329081938be..19f084abefd76255d1a4a0b726e51c7025b9cea6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14120,7 +14120,7 @@ aarch64_simd_vector_alignment (const_tree type)
 }
 
 /* Implement target hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT.  */
-static HOST_WIDE_INT
+static poly_uint64
 aarch64_vectorize_preferred_vector_alignment (const_tree type)
 {
   if (aarch64_sve_data_mode_p (TYPE_MODE (type)))
@@ -14145,9 +14145,11 @@ aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
   /* For fixed-length vectors, check that the vectorizer will aim for
  full-vector alignment.  This isn't true for generic GCC vectors
  that are wider than the ABI maximum of 128 bits.  */
+  poly_uint64 preferred_alignment =
+aarch64_vectorize_preferred_vector_alignment (type);
   if (TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
-  && (wi::to_widest (TYPE_SIZE (type))
-	  != aarch64_vectorize_preferred_vector_alignment (type)))
+  && maybe_ne (wi::to_widest (TYPE_SIZE (type)),
+		   preferred_alignment))
 return false;
 
   /* Vectors whose size is <= BIGGEST_ALIGNMENT are naturally aligned.  */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 0fcf8069b8cc948fcaf5604a1235fe269de7e8f3..328eb43ca2495dd889bc47cf136761381a594cdf 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5889,7 +5889,7 @@ For vector memory operations the cost may depend on type (@var{vectype}) and
 misalignment value (@var{misalign}).
 @end deftypefn
 
-@deftypefn {Target Hook} HOST_WIDE_INT TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT (const_tree @var{type})
+@deftypefn {Target Hook} poly_uint64 TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT (const_tree @var{type})
 This hook returns the preferred alignment in bi

Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Prathamesh Kulkarni
On Mon, 5 Nov 2018 at 15:10, Richard Biener  wrote:
>
> On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
>  wrote:
> >
> > Hi,
> > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > erfc(x) is canonicalized to 1 - erf(x), and 1 - erf(x) is then reversed
> > to erfc(x) when canonicalization is disabled and the result of erf(x)
> > has a single use within 1 - erf(x).
> >
> > The patch regressed builtin-nonneg-1.c. The following test-case
> > reproduces the issue with patch:
> >
> > void test(double d1) {
> >   if (signbit(erfc(d1)))
> > link_failure_erfc();
> > }
> >
> > ssa dump:
> >
> >:
> >   _5 = __builtin_erf (d1_4(D));
> >   _1 = 1.0e+0 - _5;
> >   _6 = _1 < 0.0;
> >   _2 = (int) _6;
> >   if (_2 != 0)
> > goto ; [INV]
> >   else
> > goto ; [INV]
> >
> >:
> >   link_failure_erfc ();
> >
> >:
> >   return;
> >
> > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > forwprop then transforms the if condition from _2 != 0
> > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > in undefined reference to link_failure_erfc().
> >
> > So, the patch adds another transform erf(x) > 1 -> 0.
>
> Ick.
>
> Why not canonicalize erf (x) to 1-erfc(x) instead?
Sorry, I didn't quite follow; won't this cause a similar issue with erf?
I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

This caused an undefined reference to link_failure_erf() in the following test-case:

extern int signbit(double);
extern void link_failure_erf(void);
extern double erf(double);

void test(double d1) {
  if (signbit(erf(d1)))
    link_failure_erf();
}

forwprop1 shows:

:
  _5 = __builtin_erfc (d1_4(D));
  _1 = 1.0e+0 - _5;
  _6 = _5 > 1.0e+0;
  _2 = (int) _6;
  if (_5 > 1.0e+0)
goto ; [INV]
  else
goto ; [INV]

   :
  link_failure_erf ();

   :
  return;

which prevents DCE from removing the call to link_failure_erf.
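
The forwprop rewrite visible in the dump, from testing the sign of
1.0 - _5 to comparing _5 against 1.0 directly, can be written out in plain
C.  This is only an illustration with hypothetical function names; it
models the "< 0.0" test from the dump, not signbit itself, and it does not
call erf or erfc.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as first expanded: is 1.0 - v negative?  */
static bool
sign_test_before (double v)
{
  return (1.0 - v) < 0.0;
}

/* The condition after forwprop: the subtraction is no longer needed.  */
static bool
sign_test_after (double v)
{
  return v > 1.0;
}

/* Usage: the two forms agree below, at and above 1.0.  */
static void
sign_test_self_check (void)
{
  assert (sign_test_before (0.5) == sign_test_after (0.5));
  assert (sign_test_before (1.0) == sign_test_after (1.0));
  assert (sign_test_before (1.5) == sign_test_after (1.5));
}
```

Whether the resulting comparison can then be folded to a constant depends
on what is known about the range of the builtin's result, which is what
the extra transform in the patch encodes for erf.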

Thanks,
Prathamesh
>
> > which resolves the regression.
> >
> > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > Cross-testing on arm and aarch64 variants in progress.
> > OK for trunk if passes ?
> >
> > Thanks,
> > Prathamesh


Re: [PATCH 1/4] Fix string pool statistics.

2018-11-05 Thread Martin Liška
On 11/5/18 10:52 AM, Richard Biener wrote:
> On Mon, Nov 5, 2018 at 9:07 AM marxin  wrote:
>>
>>
>> libcpp/ChangeLog:
> 
> Hmm, the patch suggests the flag might be instead
> part of cpp_hash_table instead of each individual
> ht_identifier?  Or the patch is confused when it
> sets HT_GGC to 1 even in
> 
>else
> -HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
> -  str, len);
> +{
> +  HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
> +str, len);
> +  HT_GGC (node) = 1;
> +}
> 
> ?  Do we really support mixed operation here?

No, simplified in attached patch.

Martin

> 
>> 2018-11-02  Martin Liska  
>>
>> * include/symtab.h (ht_identifier):
>> Make room for ggc flag.
>> * symtab.c (ht_lookup_with_hash): Mark
>> GGC and non-GGC allocated strings.
>> (ht_dump_statistics): Use the information.
>> ---
>>  libcpp/include/symtab.h |  4 +++-
>>  libcpp/symtab.c | 28 +++-
>>  2 files changed, 22 insertions(+), 10 deletions(-)
>>
From d615abbe29f0c99801de27533ec69ec63991bf8e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 2 Nov 2018 10:51:23 +0100
Subject: [PATCH 1/4] Fix string pool statistics.

libcpp/ChangeLog:

2018-11-05  Martin Liska  

	* symtab.c (ht_dump_statistics): Make dump conditional
	based on alloc_subobject.
---
 libcpp/symtab.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/libcpp/symtab.c b/libcpp/symtab.c
index fd86c849f7f..e6e5bcb1cef 100644
--- a/libcpp/symtab.c
+++ b/libcpp/symtab.c
@@ -304,7 +304,6 @@ ht_dump_statistics (cpp_hash_table *table)
   while (++p < limit);
 
   nelts = table->nelements;
-  overhead = obstack_memory_used (&table->stack) - total_bytes;
   headers = table->nslots * sizeof (hashnode);
 
   fprintf (stderr, "\nString pool\nentries\t\t%lu\n",
@@ -315,9 +314,16 @@ ht_dump_statistics (cpp_hash_table *table)
 	   (unsigned long) table->nslots);
   fprintf (stderr, "deleted\t\t%lu\n",
 	   (unsigned long) deleted);
-  fprintf (stderr, "bytes\t\t%lu%c (%lu%c overhead)\n",
-	   SCALE (total_bytes), LABEL (total_bytes),
-	   SCALE (overhead), LABEL (overhead));
+
+  if (table->alloc_subobject)
+fprintf (stderr, "GGC bytes\t%lu%c\n",
+	 SCALE (total_bytes), LABEL (total_bytes));
+  else
+{
+  overhead = obstack_memory_used (&table->stack) - total_bytes;
+  fprintf (stderr, "obstack bytes\t%lu%c (%lu%c overhead)\n",
+	   SCALE (total_bytes), LABEL (total_bytes));
+}
   fprintf (stderr, "table size\t%lu%c\n",
 	   SCALE (headers), LABEL (headers));
 
-- 
2.19.1
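
The conditional output can be tried outside of libcpp with a small
sketch.  Assumptions: SCALE and LABEL below are modelled on the macros in
symtab.c and are not copied from this patch, model_format_pool_bytes is a
hypothetical helper that formats into a buffer instead of printing to
stderr, and ggc_allocated plays the role of the table->alloc_subobject
test.  The sketch passes all four arguments that the two %lu%c pairs in
the obstack format string consume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Scale a byte count to bytes, kilobytes or megabytes (modelled on the
   macros used by ht_dump_statistics; an assumption, not a copy).  */
#define SCALE(x) ((unsigned long) ((x) < 1024*10 \
                  ? (x) \
                  : ((x) < 1024*1024*10 \
                     ? (x) / 1024 \
                     : (x) / (1024*1024))))
#define LABEL(x) ((x) < 1024*10 ? ' ' : ((x) < 1024*1024*10 ? 'k' : 'M'))

/* Format the "bytes" line of the string pool statistics into BUF.
   Returns what snprintf returns.  */
static int
model_format_pool_bytes (char *buf, size_t size, bool ggc_allocated,
                         size_t total_bytes, size_t obstack_used)
{
  assert (buf && size > 0);
  if (ggc_allocated)
    /* Strings live in GGC memory: the obstack is unused, so there is
       no meaningful overhead figure to report.  */
    return snprintf (buf, size, "GGC bytes\t%lu%c\n",
                     SCALE (total_bytes), LABEL (total_bytes));

  /* Strings live on the obstack: report its bookkeeping overhead.  */
  size_t overhead = obstack_used - total_bytes;
  return snprintf (buf, size, "obstack bytes\t%lu%c (%lu%c overhead)\n",
                   SCALE (total_bytes), LABEL (total_bytes),
                   SCALE (overhead), LABEL (overhead));
}
```

With 20480 string bytes in an obstack that uses 20992 bytes in total, the
line reads "obstack bytes" followed by 20k and an overhead of 512 bytes.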



Re: [PATCH 3/4] Fix vector memory statistics.

2018-11-05 Thread Martin Liška
On 11/5/18 10:56 AM, Richard Biener wrote:
> On Mon, Nov 5, 2018 at 9:07 AM marxin  wrote:
>>
>>
>> gcc/ChangeLog:
> 
>/* Release PTR pointer of SIZE bytes. If REMOVE_FROM_MAP is set to true,
>   remove the instance from reverse map.  */
> -  void release_instance_overhead (void *ptr, size_t size,
> - bool remove_from_map = false);
> +  T * release_instance_overhead (void *ptr, size_t size,
> +bool remove_from_map = false);
> 
> can you document the return value?

Sure, fixed in attached patch.

Martin

> 
> Otherwise OK.
> 
> Richard.
> 
>> 2018-11-02  Martin Liska  
>>
>> * mem-stats.h (mem_alloc_description::release_instance_overhead):
>> Return T *.
>> * vec.c (struct vec_usage): Register m_element_size.
>> (vec_prefix::register_overhead): New arguments: elements and
>> element_size.
>> (vec_prefix::release_overhead): Subtract elements.
>> * vec.h (struct vec_prefix): Change signature.
>> (va_heap::reserve): Pass proper arguments.
>> (va_heap::release): Likewise.
>> ---
>>  gcc/mem-stats.h | 14 --
>>  gcc/vec.c   | 34 +-
>>  gcc/vec.h   | 12 
>>  3 files changed, 37 insertions(+), 23 deletions(-)
>>
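
For readers following the ChangeLog above, here is a minimal C sketch of why returning the usage record is useful: the caller can adjust extra per-record counters (such as an element count) after the bytes have been released. The struct names, the array-based reverse map and release_vector_overhead are hypothetical stand-ins for illustration, not GCC's mem_alloc_description or vec_usage.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-location usage record (hypothetical, not GCC's vec_usage).  */
struct usage
{
  size_t allocated;   /* Bytes currently accounted for.  */
  size_t items;       /* Elements currently accounted for.  */
};

/* Toy reverse-map entry: which usage record owns PTR.  */
struct map_entry
{
  void *ptr;
  struct usage *usage;
};

/* Release SIZE bytes for PTR.  Return the owning usage record, or NULL
   when PTR was never registered.  */
static struct usage *
release_instance_overhead (struct map_entry *map, size_t n,
			   void *ptr, size_t size)
{
  for (size_t i = 0; i < n; i++)
    if (map[i].ptr == ptr)
      {
	assert (map[i].usage->allocated >= size);
	map[i].usage->allocated -= size;
	return map[i].usage;
      }
  return NULL;
}

/* Caller-side wrapper: also subtract ELEMENTS, which is only possible
   because the usage record is handed back.  */
static void
release_vector_overhead (struct map_entry *map, size_t n, void *ptr,
			 size_t size, size_t elements)
{
  struct usage *u = release_instance_overhead (map, n, ptr, size);
  if (u != NULL)
    {
      assert (u->items >= elements);
      u->items -= elements;
    }
}
```

With a void return, the wrapper would need a second lookup in the map to find the record it has just updated.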

From ff75c10bc06eb9c172188cb49863cd78fd0540b7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 2 Nov 2018 12:52:28 +0100
Subject: [PATCH 3/4] Fix vector memory statistics.

gcc/ChangeLog:

2018-11-02  Martin Liska  

	* mem-stats.h (mem_alloc_description::release_instance_overhead):
	Return T *.
	* vec.c (struct vec_usage): Register m_element_size.
	(vec_prefix::register_overhead): New arguments: elements and
	element_size.
	(vec_prefix::release_overhead): Subtract elements.
	* vec.h (struct vec_prefix): Change signature.
	(va_heap::reserve): Pass proper arguments.
	(va_heap::release): Likewise.
---
 gcc/mem-stats.h | 17 ++---
 gcc/vec.c   | 34 +-
 gcc/vec.h   | 12 
 3 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/gcc/mem-stats.h b/gcc/mem-stats.h
index 3ef6d53dfa6..b7f7e06a1c7 100644
--- a/gcc/mem-stats.h
+++ b/gcc/mem-stats.h
@@ -340,9 +340,10 @@ public:
   void register_object_overhead (T *usage, size_t size, const void *ptr);
 
   /* Release PTR pointer of SIZE bytes. If REMOVE_FROM_MAP is set to true,
- remove the instance from reverse map.  */
-  void release_instance_overhead (void *ptr, size_t size,
-  bool remove_from_map = false);
+ remove the instance from reverse map.  Return memory usage that belongs
+ to this memory description.  */
+  T * release_instance_overhead (void *ptr, size_t size,
+ bool remove_from_map = false);
 
   /* Release intance object identified by PTR pointer.  */
   void release_object_overhead (void *ptr);
@@ -503,7 +504,7 @@ mem_alloc_description::register_overhead (size_t size,
 /* Release PTR pointer of SIZE bytes.  */
 
 template 
-inline void
+inline T *
 mem_alloc_description::release_instance_overhead (void *ptr, size_t size,
 		 bool remove_from_map)
 {
@@ -512,14 +513,16 @@ mem_alloc_description::release_instance_overhead (void *ptr, size_t size,
   if (!slot)
 {
   /* Due to PCH, it can really happen.  */
-  return;
+  return NULL;
 }
 
-  mem_usage_pair usage_pair = *slot;
-  usage_pair.usage->release_overhead (size);
+  T *usage = (*slot).usage;
+  usage->release_overhead (size);
 
   if (remove_from_map)
 m_reverse_map->remove (ptr);
+
+  return usage;
 }
 
 /* Release intance object identified by PTR pointer.  */
diff --git a/gcc/vec.c b/gcc/vec.c
index ff2456aead9..bfd52856e46 100644
--- a/gcc/vec.c
+++ b/gcc/vec.c
@@ -52,13 +52,14 @@ vnull vNULL;
 struct vec_usage: public mem_usage
 {
   /* Default constructor.  */
-  vec_usage (): m_items (0), m_items_peak (0) {}
+  vec_usage (): m_items (0), m_items_peak (0), m_element_size (0) {}
 
   /* Constructor.  */
   vec_usage (size_t allocated, size_t times, size_t peak,
-	 size_t items, size_t items_peak)
+	 size_t items, size_t items_peak, size_t element_size)
 : mem_usage (allocated, times, peak),
-m_items (items), m_items_peak (items_peak) {}
+m_items (items), m_items_peak (items_peak),
+m_element_size (element_size) {}
 
   /* Sum the usage with SECOND usage.  */
   vec_usage
@@ -68,7 +69,7 @@ struct vec_usage: public mem_usage
 		  m_times + second.m_times,
 		  m_peak + second.m_peak,
 		  m_items + second.m_items,
-		  m_items_peak + second.m_items_peak);
+		  m_items_peak + second.m_items_peak, 0);
   }
 
   /* Dump usage coupled to LOC location, where TOTAL is sum of all rows.  */
@@ -81,7 +82,8 @@ struct vec_usage: public mem_usage
 
 s[48] = '\0';
 
-fprintf (stderr, "%-48s %10li:%4.1f%%%10li%10li:%4.1f%%%11li%11li\n", s,
+fprintf (stderr, "%-48s %10li%11li:%4.1f%%%10li%10li:%4.1f%%%11li%11li\n", s,
+	 (l

Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-05 Thread Renlin Li

Hi Segher,

On 11/03/2018 02:34 AM, Jeff Law wrote:

On 11/2/18 5:54 PM, Segher Boessenkool wrote:

On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:

The original rtx is generated by expand_builtin_setjmp_receiver to adjust
the frame pointer.

And later in LRA, it will try to eliminate frame_pointer with the hard frame
pointer, which is defined in ELIMINABLE_REGS.

Your change splits the insn into two.
As a result, it no longer matches the "from" and "to" regs defined in
ELIMINABLE_REGS.
The if statement that generates the adjustment insn is skipped,
and the original instruction is simply deleted!

I don't follow why, or what should have prevented it from being deleted.


Probably, we don't want to split the move rtx if they are related to
entries defined in ELIMINABLE_REGS?

One thing I can easily do is not making an intermediate pseudo when copying
*to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
looking at (setjmp-4.c).

This indeed helps, see patch below.  Could you try that on the whole
testsuite?

Thanks,


Segher


p.s. It still is a problem in the arm backend, but this won't hurt combine,
so why not.


 From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
Message-Id: 
<814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.seg...@kernel.crashing.org>
From: Segher Boessenkool 
Date: Fri, 2 Nov 2018 23:33:32 +
Subject: [PATCH] combine: Don't break up copy from hard to fixed reg

---
  gcc/combine.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index dfb0b44..15e941a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14998,6 +14998,8 @@ make_more_copies (void)
continue;
  if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
continue;
+ if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
+   continue;
  
  	  rtx new_reg = gen_reg_rtx (GET_MODE (dest));

  rtx_insn *new_insn = gen_move_insn (new_reg, src);
-- 1.8.3.1

It certainly helps the armeb test results.


Yes, I can also see it helps a lot with the regression test.
Thanks for working on it!


Besides the correctness issue, there are performance regressions, as other
people have also reported.

I analysed one case, gcc.c-torture/execute/builtins/memcpy-chk.c.
In this case, two additional register moves and callee saves are emitted.

The problem is that make_more_copies splits a move into two. Ideally, the RA
could figure this out and make the best register allocation. In reality,
however, the scheduler will in some cases reorder the instructions, which
changes the live ranges of the registers and thus the interference graph of
the pseudo registers.

This forces the RA to choose a different register, which makes the move
instruction non-redundant, or at least impossible for the RA to eliminate.

For example,

set r102, r1

After combine:
insn x: set r103, r1
insn x+1: set r102, r103

After scheduler:
insn x: set r103, r1
...
...
...
insn x+1: set r102, r103

After IRA, r1 could be assigned to operands used in the instructions between
insn x and x+1, so r103 conflicts with r1. LRA has to assign r103 a different
hard register. This causes one additional move, and probably one more callee
save/restore.

Nothing is obviously wrong here. But...

One simple case that is probably not beneficial is splitting a store to a
hard register. According to your comment on make_more_copies, you might want
to apply the transformation only to hard-reg-to-pseudo copies?

Regards,
Renlin






Jeff



Re: [PATCH][GCC] Make DR_TARGET_ALIGNMENT compile time variable

2018-11-05 Thread Richard Biener
On Mon, Nov 5, 2018 at 1:07 PM Andre Vieira (lists)
 wrote:
>
>
> Hi,
>
> This patch enables targets to describe DR_TARGET_ALIGNMENT as a
> compile-time variable.  It does so by turning the variable into a
> 'poly_uint64'.  This should not affect the current code-generation for
> any target.
>
> We hope to use this in the near future for SVE using the
> current_vector_size as the preferred target alignment for vectors.  In
> fact I have a patch to do just this, but I am still trying to figure out
> whether and when it is beneficial to peel for alignment with a runtime
> misalignment.

In fact in most cases I have seen the issue is that it's not visible whether
peeling will be able to align _all_ references and doing peeling only to
align some is hardly beneficial.  To improve things the vectorizer would
have to version the loop for the case where peeling can reach alignment
for a group of DRs and then vectorize one copy with peeling for alignment
and one copy with unaligned accesses.
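
A minimal C sketch of the "can peeling align all references" question, under simplifying assumptions that are not taken from the vectorizer itself: every reference has unit stride and the same element size, the target alignment is a power of two, and the runtime addresses are known. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of scalar iterations to peel so that ADDR becomes aligned to
   ALIGN bytes, advancing ELEM_SIZE bytes per iteration.  Assumes ALIGN
   is a power of two and a multiple of ELEM_SIZE, and that ADDR is
   already ELEM_SIZE-aligned.  */
static size_t
peel_iters (uintptr_t addr, size_t elem_size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  assert (elem_size != 0 && align % elem_size == 0);
  assert (addr % elem_size == 0);

  size_t misalign = addr & (align - 1);
  return misalign == 0 ? 0 : (align - misalign) / elem_size;
}

/* True if one common peel count aligns all N references, i.e. peeling
   for alignment would leave no reference misaligned.  */
static bool
peeling_aligns_all (const uintptr_t *addrs, size_t n,
		    size_t elem_size, size_t align)
{
  if (n == 0)
    return true;

  size_t first = peel_iters (addrs[0], elem_size, align);
  for (size_t i = 1; i < n; i++)
    if (peel_iters (addrs[i], elem_size, align) != first)
      return false;
  return true;
}
```

A runtime check of this shape is what a versioned loop could branch on: one copy peels and uses aligned accesses when the check succeeds, the other copy uses unaligned accesses.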

>  The patch I am working on will change the behavior of
> auto-vectorization for SVE when building vector-length agnostic code for
> targets that benefit from aligned vector loads/stores.  The patch will
> result in  the generation of a runtime computation of misalignment and
> the construction of a corresponding mask for the first iteration of the
> loop.
>
> I have decided to not offer support for prolog/epilog peeling when the
> target alignment is not compile-time constant, as this didn't seem
> useful, this is why 'vect_do_peeling' returns early if
> DR_TARGET_ALIGNMENT is not constant.
>
> I bootstrapped and tested this on aarch64 and x86 basically
> bootstrapping one target that uses this hook and one that doesn't.
>
> Is this OK for trunk?

The patch looks good but I wonder whether it is really necessary at this
point.

Thanks,
Richard.

> Cheers,
> Andre
>
> 2018-11-05  Andre Vieira  
>
> * config/aarch64/aarch64.c 
> (aarch64_vectorize_preferred_vector_alignment):
> Change return type to poly_uint64.
> (aarch64_simd_vector_alignment_reachable): Adapt to preferred vector
> alignment being a poly int.
> * doc/tm.texi (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Change 
> return
> type to poly_uint64.
> * target.def (default_preferred_vector_alignment): Likewise.
> * targhooks.c (default_preferred_vector_alignment): Likewise.
> * targhooks.h (default_preferred_vector_alignment): Likewise.
> * tree-vect-data-refs.c
> (vect_calculate_target_alignment): Likewise.
> (vect_compute_data_ref_alignment): Adapt to vector alignment
> being a poly int.
> (vect_update_misalignment_for_peel): Likewise.
> (vect_enhance_data_refs_alignment): Likewise.
> (vect_find_same_alignment_drs): Likewise.
> (vect_duplicate_ssa_name_ptr_info): Likewise.
> (vect_setup_realignment): Likewise.
> (vect_can_force_dr_alignment_p): Change alignment parameter type to
> poly_uint64.
> * tree-vect-loop-manip.c (get_misalign_in_elems): Learn to construct 
> a mask
> with a compile time variable vector alignment.
> (vect_gen_prolog_loop_niters): Adapt to vector alignment being a poly 
> int.
> (vect_do_peeling): Exit early if vector alignment is not constant.
> * tree-vect-stmts.c (ensure_base_align): Adapt to vector alignment 
> being a
> poly int.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> * tree-vectorizer.h (struct dr_vec_info): Make target_alignment field 
> a
> poly_uint64.
> (vect_known_alignment_in_bytes): Adapt to vector alignment being a 
> poly
> int.
> (vect_can_force_dr_alignment_p): Change alignment parameter type to
> poly_uint64.


Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Richard Biener
On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
 wrote:
>
> On Mon, 5 Nov 2018 at 15:10, Richard Biener  
> wrote:
> >
> > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > Hi,
> > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > erfc(x) is canonicalized to 1 - erf(x), and 1 - erf(x) is then reversed
> > > back to erfc(x) when canonicalization is disabled and the result of erf(x)
> > > has a single use within 1 - erf(x).
> > >
> > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > reproduces the issue with patch:
> > >
> > > void test(double d1) {
> > >   if (signbit(erfc(d1)))
> > > link_failure_erfc();
> > > }
> > >
> > > ssa dump:
> > >
> > >:
> > >   _5 = __builtin_erf (d1_4(D));
> > >   _1 = 1.0e+0 - _5;
> > >   _6 = _1 < 0.0;
> > >   _2 = (int) _6;
> > >   if (_2 != 0)
> > > goto ; [INV]
> > >   else
> > > goto ; [INV]
> > >
> > >:
> > >   link_failure_erfc ();
> > >
> > >:
> > >   return;
> > >
> > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > forwprop then transforms the if condition from _2 != 0
> > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > in undefined reference to link_failure_erfc().
> > >
> > > So, the patch adds another transform erf(x) > 1 -> 0.
> >
> > Ick.
> >
> > Why not canonicalize erf (x) to 1-erfc(x) instead?
> Sorry I didn't quite follow, won't this cause similar issue with erf ?
> I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
>
> This caused undefined reference to link_failure_erf() in following test-case:
>
> extern int signbit(double);
> extern void link_failure_erf(void);
> extern double erf(double);
>
> void test(double d1) {
>   if (signbit(erf(d1)))
> link_failure_erf();
> }

But that's already not optimized without any canonicalization
because erf returns sth in range [-1, 1].

I suggested the change because we have limited support for FP
value-ranges and nonnegative is one thing we can compute
(and erfc as opposed to erf is nonnegative).

> forwprop1 shows:
>
> :
>   _5 = __builtin_erfc (d1_4(D));
>   _1 = 1.0e+0 - _5;
>   _6 = _5 > 1.0e+0;
>   _2 = (int) _6;
>   if (_5 > 1.0e+0)
> goto ; [INV]
>   else
> goto ; [INV]
>
>:
>   link_failure_erf ();
>
>:
>   return;
>
> which defeats DCE to remove call to link_failure_erf.
>
> Thanks,
> Prathamesh
> >
> > > which resolves the regression.
> > >
> > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > Cross-testing on arm and aarch64 variants in progress.
> > > OK for trunk if passes ?
> > >
> > > Thanks,
> > > Prathamesh


Re: [PATCH 1/4] Fix string pool statistics.

2018-11-05 Thread Richard Biener
On Mon, Nov 5, 2018 at 1:17 PM Martin Liška  wrote:
>
> On 11/5/18 10:52 AM, Richard Biener wrote:
> > On Mon, Nov 5, 2018 at 9:07 AM marxin  wrote:
> >>
> >>
> >> libcpp/ChangeLog:
> >
> > Hmm, the patch suggests the flag might be instead
> > part of cpp_hash_table instead of each individual
> > ht_identifier?  Or the patch is confused when it
> > sets HT_GGC to 1 even in
> >
> >else
> > -HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
> > -  str, len);
> > +{
> > +  HT_STR (node) = (const unsigned char *) obstack_copy0 (&table->stack,
> > +str, len);
> > +  HT_GGC (node) = 1;
> > +}
> >
> > ?  Do we really support mixed operation here?
>
> No, simplified in attached patch.

OK.

Thanks,
Richard.

> Martin
>
> >
> >> 2018-11-02  Martin Liska  
> >>
> >> * include/symtab.h (ht_identifier):
> >> Make room for ggc flag.
> >> * symtab.c (ht_lookup_with_hash): Mark
> >> GGC and non-GGC allocated strings.
> >> (ht_dump_statistics): Use the information.
> >> ---
> >>  libcpp/include/symtab.h |  4 +++-
> >>  libcpp/symtab.c | 28 +++-
> >>  2 files changed, 22 insertions(+), 10 deletions(-)
> >>


Re: [PATCH 3/4] Fix vector memory statistics.

2018-11-05 Thread Richard Biener
On Mon, Nov 5, 2018 at 1:17 PM Martin Liška  wrote:
>
> On 11/5/18 10:56 AM, Richard Biener wrote:
> > On Mon, Nov 5, 2018 at 9:07 AM marxin  wrote:
> >>
> >>
> >> gcc/ChangeLog:
> >
> >/* Release PTR pointer of SIZE bytes. If REMOVE_FROM_MAP is set to true,
> >   remove the instance from reverse map.  */
> > -  void release_instance_overhead (void *ptr, size_t size,
> > - bool remove_from_map = false);
> > +  T * release_instance_overhead (void *ptr, size_t size,
> > +bool remove_from_map = false);
> >
> > can you document the return value?
>
> Sure, fixed in attached patch.

OK.

> Martin
>
> >
> > Otherwise OK.
> >
> > Richard.
> >
> >> 2018-11-02  Martin Liska  
> >>
> >> * mem-stats.h (mem_alloc_description::release_instance_overhead):
> >> Return T *.
> >> * vec.c (struct vec_usage): Register m_element_size.
> >> (vec_prefix::register_overhead): New arguments: elements and
> >> element_size.
> >> (vec_prefix::release_overhead): Subtract elements.
> >> * vec.h (struct vec_prefix): Change signature.
> >> (va_heap::reserve): Pass proper arguments.
> >> (va_heap::release): Likewise.
> >> ---
> >>  gcc/mem-stats.h | 14 --
> >>  gcc/vec.c   | 34 +-
> >>  gcc/vec.h   | 12 
> >>  3 files changed, 37 insertions(+), 23 deletions(-)
> >>
>


Re: [ARM] Implement division using vrecpe, vrecps

2018-11-05 Thread Wilco Dijkstra
Hi Prathamesh,

Prathamesh Kulkarni wrote:
> Thanks for the suggestions. The last time I benchmarked the patch
> (around Jan 2016)
> I got following results with the patch for SPEC2006:
>
> a15: +0.64% overall, 481.wrf: +6.46%
> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
> a57: +0.35% overall, 481.wrf: +3.84%
> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01209.html)
>
> Do these numbers look acceptable ?
> I am benchmarking the patch on ToT, and will report if there are any
> performance improvements found with the patch.

Yes those results are quite good - in fact they seemed too good to be true at
first. However looking at arm/neon.md there isn't a division pattern. So I
think it's worth mentioning in the description that your patch actually adds
vectorization of division. Disassembling the AArch64 wrf binary shows several
hundred vector division instructions - so the speedup makes sense now since
many more loops are being vectorized.

It's a shame this pattern wasn't added many years ago... It's a good idea to
add a vectorized (r)sqrt too as this will improve wrf even further.

Wilco

[committed] Cherry-pick asan fix (PR sanitizer/87860)

2018-11-05 Thread Martin Liška
Hi.

There's a sparc fix that I've just installed in libsanitizer upstream
repository. I'm going to install it into GCC's trunk.

Martin
From c43ed4eb4d76ec25e42b954a00a1684de09011da Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 5 Nov 2018 14:30:00 +0100
Subject: [PATCH] Fix build on sparc64-linux-gnu.

libsanitizer/ChangeLog:

2018-11-05  Martin Liska  

	PR sanitizer/87860
	* sanitizer_common/sanitizer_linux.cc:  Cherry-pick upstream
	r346129.
---
 libsanitizer/sanitizer_common/sanitizer_linux.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_linux.cc b/libsanitizer/sanitizer_common/sanitizer_linux.cc
index f1f70ec57fc..30d6521f9e9 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_linux.cc
@@ -1944,14 +1944,14 @@ static void GetPcSpBp(void *context, uptr *pc, uptr *sp, uptr *bp) {
 #elif defined(__sparc__)
   ucontext_t *ucontext = (ucontext_t*)context;
   uptr *stk_ptr;
-# if defined (__sparcv9)
+# if defined(__sparcv9) || defined (__arch64__)
 # ifndef MC_PC
 #  define MC_PC REG_PC
 # endif
 # ifndef MC_O6
 #  define MC_O6 REG_O6
 # endif
-# ifdef SANITIZER_SOLARIS
+# if SANITIZER_SOLARIS
 #  define mc_gregs gregs
 # endif
   *pc = ucontext->uc_mcontext.mc_gregs[MC_PC];
-- 
2.19.1



Fix SPEC gcc miscompile with LTO

2018-11-05 Thread Jan Hubicka
Hi,
this patch fixes the miscompare I introduced in the spec2006 GCC benchmark
when built with LTO.
The problem is that fld_incomplete_type_of builds a new pointer type to the
incomplete type rather than the complete one, but ends up giving it the wrong
TYPE_CANONICAL.

This patch also improves TBAA with early opts because we do not lose info
by producing incomplete variants.
Note that build_pointer_type may return an existing type, in which case I
overwrite its TYPE_CANONICAL, but I believe this should be harmless
because all pointers to a given type should have canonicals constructed
the same way.

lto-bootstrapped/regtested x86_64-linux.

Honza
* gcc.dg/lto/tbaa-1.c: New testcase.
* tree.c (fld_incomplete_type_of): Copy TYPE_CANONICAL while creating
pointer type.
Index: testsuite/gcc.dg/lto/tbaa-1.c
===
--- testsuite/gcc.dg/lto/tbaa-1.c   (nonexistent)
+++ testsuite/gcc.dg/lto/tbaa-1.c   (working copy)
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -flto -fdump-tree-evrp" } */
+typedef struct rtx_def *rtx;
+typedef struct cselib_val_struct
+{
+  union
+  {
+  } u;
+  struct elt_loc_list *locs;
+}
+cselib_val;
+struct elt_loc_list
+{
+  struct elt_loc_list *next;
+  rtx loc;
+};
+static int n_useless_values;
+unchain_one_elt_loc_list (pl)
+ struct elt_loc_list **pl;
+{
+  struct elt_loc_list *l = *pl;
+  *pl = l->next;
+}
+
+discard_useless_locs (x, info)
+ void **x;
+{
+  cselib_val *v = (cselib_val *) * x;
+  struct elt_loc_list **p = &v->locs;
+  int had_locs = v->locs != 0;
+  while (*p)
+{
+  unchain_one_elt_loc_list (p);
+  p = &(*p)->next;
+}
+  if (had_locs && v->locs == 0)
+{
+  n_useless_values++;
+}
+}
+/* { dg-final { scan-tree-dump-times "n_useless_values" 2 "evrp" } } */
 
Index: tree.c
===
--- tree.c  (revision 265766)
+++ tree.c  (working copy)
@@ -5146,6 +5146,7 @@ fld_incomplete_type_of (tree t, struct f
  else
first = build_reference_type_for_mode (t2, TYPE_MODE (t),
TYPE_REF_CAN_ALIAS_ALL (t));
+ TYPE_CANONICAL (first) = TYPE_CANONICAL (TYPE_MAIN_VARIANT (t));
  add_tree_to_fld_list (first, fld);
  return fld_type_variant (first, t, fld);
}


Re: [PATCH] x86: Update VFIXUPIMM* Intrinsics to align with the latest Intel SDM

2018-11-05 Thread H.J. Lu
On Sun, Nov 4, 2018 at 11:00 PM Uros Bizjak  wrote:
>
> On Mon, Nov 5, 2018 at 6:54 AM Wei Xiao  wrote:
> >
> > > Please also rename these:
> > >
> > >  _mm512_mask_fixupimm_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
> > > __m512i __C, const int __imm, const int __R)
> > >
> > >  _mm512_mask_fixupimm_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
> > > __m512i __C, const int __imm, const int __R)
> > >
> > >  _mm_mask_fixupimm_round_sd (__m128d __A, __mmask8 __U, __m128d __B,
> > >  __m128i __C, const int __imm, const int __R)
> > >
> > >  _mm_mask_fixupimm_round_ss (__m128 __A, __mmask8 __U, __m128 __B,
> > >  __m128i __C, const int __imm, const int __R)
> > >
> > >  _mm512_mask_fixupimm_pd (__m512d __A, __mmask8 __U, __m512d __B,
> > >   __m512i __C, const int __imm)
> > >
> > > _mm512_mask_fixupimm_ps (__m512 __A, __mmask16 __U, __m512 __B,
> > >   __m512i __C, const int __imm)
> > >
> > >  _mm_mask_fixupimm_sd (__m128d __A, __mmask8 __U, __m128d __B,
> > >__m128i __C, const int __imm)
> > >
> > >  _mm_mask_fixupimm_ss (__m128 __A, __mmask8 __U, __m128 __B,
> > >__m128i __C, const int __imm)
> > >
> > >  _mm256_mask_fixupimm_pd (__m256d __A, __mmask8 __U, __m256d __B,
> > >   __m256i __C, const int __imm)
> > >
> > >  _mm256_mask_fixupimm_ps (__m256 __A, __mmask8 __U, __m256 __B,
> > >   __m256i __C, const int __imm)
> > >
> > >   _mm_mask_fixupimm_pd (__m128d __A, __mmask8 __U, __m128d __B,
> > >__m128i __C, const int __imm)
> > >
> > >  _mm_mask_fixupimm_ps (__m128 __A, __mmask8 __U, __m128 __B,
> > >__m128i __C, const int __imm)
> > >
> > > Uros.
> >
> > As attached, I have renamed above intrinsics according to
> > aforementioned convention:
> >
> > [ __m512. __W,] __mmask. __U, __m512x __A, __m512x __B, ..., const int
> > _imm, const int __R].
>
> LGTM.
>

LGTM.

Thanks.


-- 
H.J.


Re: [ARM] Implement division using vrecpe, vrecps

2018-11-05 Thread Ramana Radhakrishnan
On 26/10/2018 06:04, Prathamesh Kulkarni wrote:
> Hi,
> This is a rebased version of patch that adds a pattern to neon.md for
> implementing division with multiplication by reciprocal using
> vrecpe/vrecps with -funsafe-math-optimizations excluding -Os.
> The newly added test-cases are not vectorized on armeb target with
> -O2. I posted the analysis for that here:
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01765.html
> 
> Briefly, the difference between little and big-endian vectorizer is in
> arm_builtin_support_vector_misalignment() which calls
> default_builtin_support_vector_misalignment() for big-endian case, and
> that returns false because
> movmisalign_optab does not exist for V2SF mode. This isn't observed
> with -O3 because loop peeling for alignment gets enabled.
> 
> It seems that the test cases in patch appear unsupported on armeb,
> after r221677 thus this patch requires no changes to
> target-supports.exp to adjust for armeb (unlike last time which
> stalled the patch).
> 
> Bootstrap+tested on arm-linux-gnueabihf.
> Cross-tested on arm*-*-* variants.
> OK for trunk ?
> 
> Thanks,
> Prathamesh
> 
> 
> tcwg-319-3.txt
> 
> 2018-10-26  Prathamesh Kulkarni
> 
>   * config/arm/neon.md (div3): New pattern.
> 
> testsuite/
>   * gcc.target/arm/neon-vect-div-1.c: New test.
>   * gcc.target/arm/neon-vect-div-2.c: Likewise.
> 
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 5aeee4b08c1..25ed45d381a 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -620,6 +620,38 @@
>   (const_string "neon_mul_")))]
>   )
>   
> +/* Perform division using multiply-by-reciprocal.
> +   Reciprocal is calculated using Newton-Raphson method.
> +   Enabled with -funsafe-math-optimizations -freciprocal-math
> +   and disabled for -Os since it increases code size .  */
> +
> +(define_expand "div3"
> +  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
> +(div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
> +   (match_operand:VCVTF 2 "s_register_operand" "w")))]
> +  "TARGET_NEON && !optimize_size
> +   && flag_unsafe_math_optimizations && flag_reciprocal_math"

I would prefer this to be more granular than
flag_unsafe_math_optimizations && flag_reciprocal_math, which really is just
flag_reciprocal_math, as that is turned on by default with
-funsafe-math-optimizations.

I think this should really be just flag_reciprocal_math.


Otherwise ok.

regards
Ramana




> +  {
> +rtx rec = gen_reg_rtx (mode);
> +rtx vrecps_temp = gen_reg_rtx (mode);
> +
> +/* Reciprocal estimate.  */
> +emit_insn (gen_neon_vrecpe (rec, operands[2]));
> +
> +/* Perform 2 iterations of newton-raphson method.  */
> +for (int i = 0; i < 2; i++)
> +  {
> + emit_insn (gen_neon_vrecps (vrecps_temp, rec, operands[2]));
> + emit_insn (gen_mul3 (rec, rec, vrecps_temp));
> +  }
> +
> +/* We now have reciprocal in rec, perform operands[0] = operands[1] * rec.  */
> +emit_insn (gen_mul3 (operands[0], operands[1], rec));
> +DONE;
> +  }
> +)
> +
> +
>   (define_insn "mul3add_neon"
> [(set (match_operand:VDQW 0 "s_register_operand" "=w")
>   (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" 
> "w")
> diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
> new file mode 100644
> index 000..50d04b4175b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
> @@ -0,0 +1,16 @@
> +/* Test pattern div3.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-require-effective-target vect_hw_misalign } */
> +/* { dg-options "-O2 -ftree-vectorize -funsafe-math-optimizations -fdump-tree-vect-details" } */
> +/* { dg-add-options arm_neon } */
> +
> +void
> +foo (int len, float * __restrict p, float *__restrict x)
> +{
> +  len = len & ~31;
> +  for (int i = 0; i < len; i++)
> +p[i] = p[i] / x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
> new file mode 100644
> index 000..606f54b4e0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
> @@ -0,0 +1,16 @@
> +/* Test pattern div3.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-require-effective-target vect_hw_misalign } */
> +/* { dg-options "-O3 -ftree-vectorize -funsafe-math-optimizations -fdump-tree-vect-details -fno-reciprocal-math" } */
> +/* { dg-add-options arm_neon } */
> +
> +void
> +foo (int len, float * __restrict p, float *__restrict x)
> +{
> +  len = len & ~31;
> +  for (int i = 0; i < len; i++)
> +p[i] = p[i] / x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops" "vect" } } */
> 
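
As a scalar illustration of what the quoted expander emits, here is a minimal C sketch of division by reciprocal refinement. It assumes the caller supplies a rough estimate x0 of 1/d (standing in for vrecpe) and models vrecps as the scalar expression 2 - d*x; recip_step and approx_div are hypothetical names, and the real pattern works on vector lanes rather than scalars.

```c
#include <assert.h>

/* One Newton-Raphson refinement of X as an estimate of 1/D.
   The factor (2 - D*X) is the scalar analogue of what vrecps yields.  */
static float
recip_step (float d, float x)
{
  return x * (2.0f - d * x);
}

/* Approximate N / D starting from the rough reciprocal estimate X0,
   using two refinement steps as in the quoted pattern.  */
static float
approx_div (float n, float d, float x0)
{
  assert (d != 0.0f);

  float x = x0;
  for (int i = 0; i < 2; i++)
    x = recip_step (d, x);

  /* Final multiply: N * (1/D).  */
  return n * x;
}
```

Each step roughly squares the relative error of the estimate, which is why a coarse initial estimate plus two steps gets close to single precision but does not guarantee a correctly rounded quotient; that is the reason the transformation belongs behind -freciprocal-math.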



Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Prathamesh Kulkarni
On Mon, 5 Nov 2018 at 18:14, Richard Biener  wrote:
>
> On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 5 Nov 2018 at 15:10, Richard Biener  
> > wrote:
> > >
> > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > Hi,
> > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > > > erfc(x) is canonicalized to 1 - erf(x), and 1 - erf(x) is then
> > > > > reversed back to erfc(x) when canonicalization is disabled and the
> > > > > result of erf(x) has a single use within 1 - erf(x).
> > > >
> > > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > > reproduces the issue with patch:
> > > >
> > > > void test(double d1) {
> > > >   if (signbit(erfc(d1)))
> > > > link_failure_erfc();
> > > > }
> > > >
> > > > ssa dump:
> > > >
> > > >:
> > > >   _5 = __builtin_erf (d1_4(D));
> > > >   _1 = 1.0e+0 - _5;
> > > >   _6 = _1 < 0.0;
> > > >   _2 = (int) _6;
> > > >   if (_2 != 0)
> > > > goto ; [INV]
> > > >   else
> > > > goto ; [INV]
> > > >
> > > >:
> > > >   link_failure_erfc ();
> > > >
> > > >:
> > > >   return;
> > > >
> > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > > forwprop then transforms the if condition from _2 != 0
> > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > > in undefined reference to link_failure_erfc().
> > > >
> > > > So, the patch adds another transform erf(x) > 1 -> 0.
> > >
> > > Ick.
> > >
> > > Why not canonicalize erf (x) to 1-erfc(x) instead?
> > Sorry I didn't quite follow, won't this cause similar issue with erf ?
> > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
> >
> > This caused undefined reference to link_failure_erf() in following 
> > test-case:
> >
> > extern int signbit(double);
> > extern void link_failure_erf(void);
> > extern double erf(double);
> >
> > void test(double d1) {
> >   if (signbit(erf(d1)))
> > link_failure_erf();
> > }
>
> But that's already not optimized without any canonicalization
> because erf returns sth in range [-1, 1].
>
> I suggested the change because we have limited support for FP
> value-ranges and nonnegative is one thing we can compute
> (and erfc as opposed to erf is nonnegative).
Ah right, thanks for the explanation.
Unfortunately this still regresses builtin-nonneg-1.c, which can be
reproduced with following test-case:

extern int signbit(double);
extern void link_failure_erf(void);
extern double erf(double);
extern double fabs(double);

void test(double d1) {
  if (signbit(erf(fabs(d1))))
link_failure_erf();
}

signbit(erf(fabs(d1))) is transformed to 0 without the patch, but with the
patch it gets canonicalized to signbit(1 - erfc(fabs(d1))), which similarly
defeats DCE.

forwprop1 shows:
 :
  _1 = ABS_EXPR ;
  _6 = __builtin_erfc (_1);
  _2 = 1.0e+0 - _6;
  _7 = _6 > 1.0e+0;
  _3 = (int) _7;
  if (_6 > 1.0e+0)
goto ; [INV]
  else
goto ; [INV]

   :
  link_failure_erf ();

   :
  return;

I assume we would need to somehow tell gcc that the canonicalized
expression 1 - erfc(x) would not exceed 1.0 ?
Is there a better way to do that apart from defining pattern (1 -
erfc(x)) > 1.0 -> 0
which I agree doesn't look ideal to add in match.pd ?
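
For reference, the range facts behind this question can be written out. This is a sketch over the reals, ignoring NaNs, signed zeros and rounding, and it says nothing about how GCC's value-range code would have to represent them:

```latex
\begin{align*}
\operatorname{erf}(x) &= \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt,
  & \operatorname{erfc}(x) &= 1 - \operatorname{erf}(x), \\
-1 < \operatorname{erf}(x) &< 1,
  & 0 < \operatorname{erfc}(x) &< 2
  \quad \text{for all real } x, \\
0 \le \operatorname{erf}(x) &< 1,
  & 0 < \operatorname{erfc}(x) &\le 1
  \quad \text{for } x \ge 0.
\end{align*}
```

With x = fabs(d1) >= 0, the last line gives erfc(x) <= 1, so the condition _6 > 1.0e+0 in the dump above can never hold. Nonnegativity of erfc alone does not show this; the upper bound erfc(x) <= 1 for nonnegative x is the fact that is needed.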

Thanks
Prathamesh
>
> > forwprop1 shows:
> >
> > :
> >   _5 = __builtin_erfc (d1_4(D));
> >   _1 = 1.0e+0 - _5;
> >   _6 = _5 > 1.0e+0;
> >   _2 = (int) _6;
> >   if (_5 > 1.0e+0)
> > goto ; [INV]
> >   else
> > goto ; [INV]
> >
> >:
> >   link_failure_erf ();
> >
> >:
> >   return;
> >
> > which defeats DCE to remove call to link_failure_erf.
> >
> > Thanks,
> > Prathamesh
> > >
> > > > which resolves the regression.
> > > >
> > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > > Cross-testing on arm and aarch64 variants in progress.
> > > > OK for trunk if passes ?
> > > >
> > > > Thanks,
> > > > Prathamesh


Backports to 8.3

2018-11-05 Thread Jakub Jelinek
Hi!

I've backported from trunk, bootstrapped/regtested on x86_64-linux and
i686-linux and committed to gcc-8-branch the following 6 patches.

Jakub
2018-11-05  Jakub Jelinek  

Backported from mainline
2018-10-19  Jakub Jelinek  

PR middle-end/85488
PR middle-end/87649
* omp-low.c (check_omp_nesting_restrictions): Diagnose ordered without
depend closely nested inside of loop with ordered clause with
a parameter.

* c-c++-common/gomp/doacross-2.c: New test.
* c-c++-common/gomp/sink-3.c: Expect another error during error
recovery.

--- gcc/omp-low.c   (revision 265334)
+++ gcc/omp-low.c   (revision 265335)
@@ -2762,14 +2762,25 @@ check_omp_nesting_restrictions (gimple *
  case GIMPLE_OMP_FOR:
if (gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_TASKLOOP)
  goto ordered_in_taskloop;
-   if (omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
-OMP_CLAUSE_ORDERED) == NULL)
+   tree o;
+   o = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
+OMP_CLAUSE_ORDERED);
+   if (o == NULL)
  {
error_at (gimple_location (stmt),
  "%<ordered%> region must be closely nested inside "
  "a loop region with an %<ordered%> clause");
return false;
  }
+   if (OMP_CLAUSE_ORDERED_EXPR (o) != NULL_TREE
+   && omp_find_clause (c, OMP_CLAUSE_DEPEND) == NULL_TREE)
+ {
+   error_at (gimple_location (stmt),
+ "%<ordered%> region without %<depend%> clause may "
+ "not be closely nested inside a loop region with "
+ "an %<ordered%> clause with a parameter");
+   return false;
+ }
return true;
  case GIMPLE_OMP_TARGET:
if (gimple_omp_target_kind (ctx->stmt)
--- gcc/testsuite/c-c++-common/gomp/sink-3.c(revision 265334)
+++ gcc/testsuite/c-c++-common/gomp/sink-3.c(revision 265335)
@@ -14,7 +14,7 @@ foo ()
   for (i=0; i < 100; ++i)
 {
 #pragma omp ordered depend(sink:poo-1,paa+1) /* { dg-error "poo.*declared.*paa.*declared" } */
-bar(&i);
+bar(&i);/* { dg-error "may not be closely nested" "" { target *-*-* } .-1 } */
 #pragma omp ordered depend(source)
 }
 }
--- gcc/testsuite/c-c++-common/gomp/doacross-2.c(nonexistent)
+++ gcc/testsuite/c-c++-common/gomp/doacross-2.c(revision 265335)
@@ -0,0 +1,49 @@
+/* PR middle-end/87649 */
+
+void
+foo (void)
+{
+  int i;
+  #pragma omp for ordered(1)
+  for (i = 0; i < 64; i++)
+{
+  #pragma omp ordered  /* { dg-error "'ordered' region without 'depend' clause may not be closely nested inside a loop region with an 'ordered' clause with a parameter" } */
+  ;
+}
+  #pragma omp for ordered(1)
+  for (i = 0; i < 64; i++)
+{
+  #pragma omp ordered threads  /* { dg-error "'ordered' region without 'depend' clause may not be closely nested inside a loop region with an 'ordered' clause with a parameter" } */
+  ;
+}
+}
+
+void
+bar (void)
+{
+  int i;
+  #pragma omp for ordered
+  for (i = 0; i < 64; i++)
+{
+  #pragma omp ordered depend(source)   /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */
+  #pragma omp ordered depend(sink: i - 1)  /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */
+}
+  #pragma omp for
+  for (i = 0; i < 64; i++)
+{
+  #pragma omp ordered depend(source)   /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */
+  #pragma omp ordered depend(sink: i - 1)  /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */
+}
+  #pragma omp for
+  for (i = 0; i < 64; i++)
+{
+  #pragma omp ordered  /* { dg-error "'ordered' region must be closely nested inside a loop region with an 'ordered' clause" } */
+  ;
+}
+  #pragma omp for
+  for (i = 0; i < 64; i++)
+{
+  #pragma omp ordered threads  /* { dg-error "'ordered' region must be closely nested inside a loop region with an 'ordered' clause" } */
+  ;
+}
+}
2018-11-05  Jakub Jelinek  

Backported from mainline
2018-10-20  Jakub Jelinek  

PR middle-end/87647
* varasm.c (decode_addr_const): Handle COMPOUND_LITERAL_EXPR.

* gcc.c-torture/compile/pr87647.c: New test.

--- gcc/varasm.c(revision 265340)
+++ gcc/varasm.c(revision 265341)
@@ -2953,6 +2953,11 @@ decode

[PATCH] S/390: Introduce relative_long attribute

2018-11-05 Thread Ilya Leoshkevich
In order to properly fix PR87762, we need to distinguish between
instructions which support relative addressing and instructions which
don't.  We could check whether the existing "type" attribute is equal to
"larl", but there are notable exceptions (lrl, for example), and
changing them makes scheduling worse on z10.  We could also check
whether the existing "op_type" attribute is equal to "RIL-b" or "RIL-c".
However, adding a new attribute provides more flexibility, since we
don't depend on idiosyncrasies which might be introduced into PoP in the
future.

gcc/ChangeLog:

2018-11-05  Ilya Leoshkevich  

PR target/87762
* config/s390/s390.md: Add relative_long attribute.
---
 gcc/config/s390/s390.md | 94 +
 1 file changed, 67 insertions(+), 27 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e4049c25406..c203bf9ad12 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -570,6 +570,13 @@
 ]
(const_int 0)))
 
+;; Whether an instruction supports relative long addressing.
+;; Currently this corresponds to RIL-b and RIL-c instruction formats,
+;; but having a separate attribute, as opposed to reusing op_type,
+;; provides additional flexibility.
+
+(define_attr "relative_long" "no,yes" (const_string "no"))
+
 ;; Pipeline description for z900.
 (include "2064.md")
 
@@ -1130,7 +1137,8 @@
cgfrl\t%0,%1"
   [(set_attr "op_type"  "RRE,RXY,RIL")
(set_attr "z10prop" "z10_c,*,*")
-   (set_attr "type" "*,*,larl")])
+   (set_attr "type" "*,*,larl")
+   (set_attr "relative_long" "*,*,yes")])
 
 
 
@@ -1146,7 +1154,8 @@
   [(set_attr "op_type"  "RX,RXY,RIL")
(set_attr "cpu_facility" "*,longdisp,z10")
(set_attr "type" "*,*,larl")
-   (set_attr "z196prop" "z196_cracked,z196_cracked,z196_cracked")])
+   (set_attr "z196prop" "z196_cracked,z196_cracked,z196_cracked")
+   (set_attr "relative_long" "*,*,yes")])
 
 (define_insn "*cmphi_ccs_z10"
   [(set (reg CC_REGNUM)
@@ -1166,7 +1175,8 @@
cgh\t%0,%1
cghrl\t%0,%1"
   [(set_attr "op_type" "RXY,RIL")
-   (set_attr "type""*,larl")])
+   (set_attr "type""*,larl")
+   (set_attr "relative_long" "*,yes")])
 
 ; cr, chi, cfi, c, cy, cgr, cghi, cgfi, cg, chsi, cghsi, crl, cgrl
 (define_insn "*cmp_ccs"
@@ -1187,7 +1197,8 @@
   [(set_attr "op_type" "RR,RI,SIL,RIL,RX,RXY,RIL")
(set_attr "cpu_facility" "*,*,z10,extimm,*,longdisp,z10")
(set_attr "type" "*,*,*,*,*,*,larl")
-   (set_attr "z10prop" "z10_super_c,z10_super,z10_super,z10_super,z10_super,z10_super,z10_super")])
+   (set_attr "z10prop" "z10_super_c,z10_super,z10_super,z10_super,z10_super,z10_super,z10_super")
+   (set_attr "relative_long" "*,*,*,*,*,*,yes")])
 
 
 ; Compare (unsigned) instructions
@@ -1201,7 +1212,8 @@
   "clhrl\t%0,%1"
   [(set_attr "op_type" "RIL")
(set_attr "type""larl")
-   (set_attr "z10prop" "z10_super")])
+   (set_attr "z10prop" "z10_super")
+   (set_attr "relative_long" "yes")])
 
 ; clhrl, clghrl
 (define_insn "*cmp_ccu_zerohi_rldi"
@@ -1213,7 +1225,8 @@
   "clhrl\t%0,%1"
   [(set_attr "op_type" "RIL")
(set_attr "type""larl")
-   (set_attr "z10prop" "z10_super")])
+   (set_attr "z10prop" "z10_super")
+   (set_attr "relative_long" "yes")])
 
 (define_insn "*cmpdi_ccu_zero"
   [(set (reg CC_REGNUM)
@@ -1228,7 +1241,8 @@
   [(set_attr "op_type"  "RRE,RXY,RIL")
(set_attr "cpu_facility" "*,*,z10")
(set_attr "type" "*,*,larl")
-   (set_attr "z10prop" "z10_super_c,z10_super_E1,z10_super")])
+   (set_attr "z10prop" "z10_super_c,z10_super_E1,z10_super")
+   (set_attr "relative_long" "*,*,yes")])
 
 (define_insn "*cmpdi_ccu"
   [(set (reg CC_REGNUM)
@@ -1248,7 +1262,8 @@
   [(set_attr "op_type" "RRE,RIL,RIL,SIL,RXY,SS,SS")
(set_attr "cpu_facility" "*,extimm,z10,z10,*,*,*")
(set_attr "type" "*,*,larl,*,*,*,*")
-   (set_attr "z10prop" "z10_super_c,z10_super,z10_super,z10_super,z10_super,*,*")])
+   (set_attr "z10prop" "z10_super_c,z10_super,z10_super,z10_super,z10_super,*,*")
+   (set_attr "relative_long" "*,*,yes,*,*,*,*")])
 
 (define_insn "*cmpsi_ccu"
   [(set (reg CC_REGNUM)
@@ -1267,7 +1282,8 @@
   [(set_attr "op_type" "RR,RIL,RIL,SIL,RX,RXY,SS,SS")
(set_attr "cpu_facility" "*,extimm,z10,z10,*,longdisp,*,*")
(set_attr "type" "*,*,larl,*,*,*,*,*")
-   (set_attr "z10prop" "z10_super_c,z10_super,z10_super,z10_super,z10_super,z10_super,*,*")])
+   (set_attr "z10prop" "z10_super_c,z10_super,z10_super,z10_super,z10_super,z10_super,*,*")
+   (set_attr "relative_long" "*,*,yes,*,*,*,*,*")])
 
 (define_insn "*cmphi_ccu"
   [(set (reg CC_REGNUM)
@@ -1805,6 +1821,10 @@
 *,
 *,*,*,*,*,*,*,
 z10_super_A1")
+   (set_attr "relative_long" "*,*,*,*,*,*,*,*,*,*,
+  *,yes,*,*,*,*,*,*,*,*,
+  yes,*,*,*,*,*,*,*,*,*,
+  *,*,ye

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-05 Thread Jan Hubicka
> 
> Did you mean "the nearest common dominator"?

If the nearest common dominator appears in the loop while all uses are
out of loops, this will result in suboptimal xor placement.
In this case you want to split edges out of the loop.

In general this is what the LCM framework will do for you if the problem
is modelled in a similar way as in mode_switching.  At function entry the
mode is "no zero register needed" and all conversions need mode "zero
register needed".  Mode switching should then make the correct placement
decisions (reaching the minimal number of executions of xor).

Jeff, what is your opinion on the approach taken by the patch?
It seems like a special case of a more general issue, but I do not see a
very elegant way to solve it, at least in the GCC 9 horizon, so if
the placement is correct we can probably go either with a new pass or
with making this part of mode switching (which is run by the x86 backend
anyway).

Honza
> 
> > of the set of all uses of the zero register?
> >
> 
> Here is the updated patch to add a pass to generate a single
> 
>   vxorps  %xmmN, %xmmN, %xmmN
> 
> at entry of the nearest common dominator for basic blocks with SF/DF
> conversions.  OK for trunk?
> 
> Thanks.
> 
> 
> -- 
> H.J.

> From e2a437f48778ae9586f2038220840ecc41566f69 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Wed, 15 Aug 2018 09:58:31 -0700
> Subject: [PATCH] i386: Add pass_remove_partial_avx_dependency
> 
> With -mavx, for
> 
> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
> extern float f;
> extern double d;
> extern int i;
> 
> void
> foo (void)
> {
>   d = f;
>   f = i;
> }
> 
> we need to generate
> 
>   vxorp[ds]   %xmmN, %xmmN, %xmmN
>   ...
>   vcvtss2sd   f(%rip), %xmmN, %xmmX
>   ...
>   vcvtsi2ss   i(%rip), %xmmN, %xmmY
> 
> to avoid partial XMM register stall.  This patch adds a pass to generate
> a single
> 
>   vxorps  %xmmN, %xmmN, %xmmN
> 
> at entry of the nearest common dominator for basic blocks with SF/DF
> conversions, instead of generating one
> 
>   vxorp[ds]   %xmmN, %xmmN, %xmmN
> 
> for each SF/DF conversion.
> 
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
> 
> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
> 
> are
> 
> 1. On Broadwell server:
> 
> 500.perlbench_r (-0.82%)
> 502.gcc_r (0.73%)
> 505.mcf_r (-0.24%)
> 520.omnetpp_r (-2.22%)
> 523.xalancbmk_r (-1.47%)
> 525.x264_r (0.31%)
> 531.deepsjeng_r (0.27%)
> 541.leela_r (0.85%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (-0.34%)
> Geomean: (-0.23%)
> 
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-1.88%)
> 508.namd_r (0.00%)
> 510.parest_r (-0.56%)
> 511.povray_r (0.49%)
> 519.lbm_r (-1.28%)
> 521.wrf_r (-0.28%)
> 526.blender_r (0.55%)
> 527.cam4_r (-0.20%)
> 538.imagick_r (2.52%)
> 544.nab_r (-0.18%)
> 549.fotonik3d_r (-0.51%)
> 554.roms_r (-0.22%)
> Geomean: (0.00%)
> 
> 2. On Skylake client:
> 
> 500.perlbench_r (-0.29%)
> 502.gcc_r (-0.36%)
> 505.mcf_r (1.77%)
> 520.omnetpp_r (-0.26%)
> 523.xalancbmk_r (-3.69%)
> 525.x264_r (-0.32%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (-0.46%)
> 548.exchange2_r (0.00%)
> 557.xz_r (0.00%)
> Geomean: (-0.34%)
> 
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-0.56%)
> 508.namd_r (0.87%)
> 510.parest_r (0.00%)
> 511.povray_r (-0.73%)
> 519.lbm_r (0.84%)
> 521.wrf_r (0.00%)
> 526.blender_r (-0.81%)
> 527.cam4_r (-0.43%)
> 538.imagick_r (2.55%)
> 544.nab_r (0.28%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.32%)
> Geomean: (0.12%)
> 
> 3. On Skylake server:
> 
> 500.perlbench_r (-0.55%)
> 502.gcc_r (0.69%)
> 505.mcf_r (0.00%)
> 520.omnetpp_r (-0.33%)
> 523.xalancbmk_r (-0.21%)
> 525.x264_r (-0.27%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (0.00%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (0.00%)
> Geomean: (0.00%)
> 
> 503.bwaves_r (0.58%)
> 507.cactuBSSN_r (0.00%)
> 508.namd_r (0.00%)
> 510.parest_r (0.18%)
> 511.povray_r (-0.58%)
> 519.lbm_r (0.25%)
> 521.wrf_r (0.40%)
> 526.blender_r (0.34%)
> 527.cam4_r (0.19%)
> 538.imagick_r (5.87%)
> 544.nab_r (0.17%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.00%)
> Geomean: (0.62%)
> 
> On Skylake client, impacts on 538.imagick_r are
> 
> size before:
> 
>    text    data     bss     dec     hex filename
> 2555577   10876    5576 2572029  273efd imagick_r.exe
> 
> size after:
> 
>    text    data     bss     dec     hex filename
> 2511825   10876    5576 2528277  269415 imagick_r.exe
> 
> number of vxorp[ds]:
> 
> before  after  difference
> 14570   4515   -69%
> 
> gcc/
> 
> 2018-08-28  H.J. Lu  
>   Sunil K Pandey  
> 
>   PR target/87007
>   * config/i386/i386-passes.def: Add
>   pass_remove_partial_avx_dependency.
>   * config/i386/i386-protos.h
>   (make_pass_remove_partial_avx_dependency): New.
>   * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>   New function.
>   (pass_data_remove_partial_avx_dependency): New.
>   (pass_remove_partial_avx_dependency): Likewise.
>

[PATCH][OBVIOUS] Fix printf call in symtab.c.

2018-11-05 Thread Martin Liška
Hi.

I'm sending an obvious fix for something that I forgot to adjust.

Martin
From 873c7df254f98b27a83272abd9f60adb32741026 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 5 Nov 2018 15:22:20 +0100
Subject: [PATCH] Fix printf call in symtab.c.

libcpp/ChangeLog:

2018-11-05  Martin Liska  

	* symtab.c (ht_dump_statistics): Fix format and
	pass missing argument.
---
 libcpp/symtab.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libcpp/symtab.c b/libcpp/symtab.c
index e6e5bcb1cef..0976c43b002 100644
--- a/libcpp/symtab.c
+++ b/libcpp/symtab.c
@@ -321,8 +321,9 @@ ht_dump_statistics (cpp_hash_table *table)
   else
 {
   overhead = obstack_memory_used (&table->stack) - total_bytes;
-  fprintf (stderr, "obstack bytes\t%lu%c (%lu%c overhead)\n",
-	   SCALE (total_bytes), LABEL (total_bytes));
+  fprintf (stderr, "obstack bytes\t%zu%c (%zu%c overhead)\n",
+	   SCALE (total_bytes), LABEL (total_bytes),
+	   SCALE (overhead), LABEL (overhead));
 }
   fprintf (stderr, "table size\t%lu%c\n",
 	   SCALE (headers), LABEL (headers));
-- 
2.19.1



Re: [PATCH][OBVIOUS] Fix printf call in symtab.c.

2018-11-05 Thread Martin Liška
One more change is needed.

Thanks for understanding,
Martin
From 60d59b8b3deea1b59c135705b333ddf2ab6a9ba4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 5 Nov 2018 15:28:21 +0100
Subject: [PATCH] Do not use %zu format in libcpp.

libcpp/ChangeLog:

2018-11-05  Martin Liska  

	* symtab.c (ht_dump_statistics): Replace %zu with %lu format.
---
 libcpp/symtab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/symtab.c b/libcpp/symtab.c
index 0976c43b002..de30bb83bfc 100644
--- a/libcpp/symtab.c
+++ b/libcpp/symtab.c
@@ -321,7 +321,7 @@ ht_dump_statistics (cpp_hash_table *table)
   else
 {
   overhead = obstack_memory_used (&table->stack) - total_bytes;
-  fprintf (stderr, "obstack bytes\t%zu%c (%zu%c overhead)\n",
+  fprintf (stderr, "obstack bytes\t%lu%c (%lu%c overhead)\n",
 	   SCALE (total_bytes), LABEL (total_bytes),
 	   SCALE (overhead), LABEL (overhead));
 }
-- 
2.19.1



Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count

2018-11-05 Thread Jan Hubicka
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index f4d0c340a0a..4289bc5a004 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -200,11 +200,11 @@ public:
   ret.m_quality = profile_guessed;
   return ret;
 }
-  static profile_probability always ()
+  static profile_probability always (enum profile_quality q = profile_precise)

There are functions to convert a value into a given precision.  If you want
to guess that something is always taken, you write
profile_probability::always().guessed ()
So for autofdo we only need to add an .atofdo() conversion method.

@@ -459,10 +459,12 @@ public:
   return RDIV (val * m_val, max_probability);
 }
 
-  /* Return 1-*THIS.  */
+  /* Return 1-*THIS.  It's meaningless to invert an uninitialized value.  */
   profile_probability invert () const
 {
-  return profile_probability::always() - *this;
+  if (! initialized_p ())
+   return *this;
+  return profile_probability::always (m_quality) - *this;

How does this change the behaviour?  If THIS is uninitialized,
profile_probability::always() - *this will return uninitialized, which seems
to make sense.
If you have a value of some quality, it will merge the qualities
and return the value in the corresponding m_quality.



[PATCH] S/390: Increase register move costs for CC_REGS

2018-11-05 Thread Robin Dapp
Hi,

the attached patch increases the move costs for moves involving the CC
register.  This saves us some instructions in SPEC CPU2006.

Regards
 Robin

--

gcc/ChangeLog:

2018-11-05  Robin Dapp  

* config/s390/s390.c (s390_register_move_cost): Increase costs
for moves involving the CC reg.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 762c6bff07b..0f33101d779 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3416,6 +3416,11 @@ s390_register_move_cost (machine_mode mode,
 	  && reg_classes_intersect_p (to, GENERAL_REGS)))
 return 10;
 
+  /* We usually do not want to copy via CC.  */
+  if (reg_classes_intersect_p (from, CC_REGS)
+   || reg_classes_intersect_p (to, CC_REGS))
+return 5;
+
   return 1;
 }
 


Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count

2018-11-05 Thread Jan Hubicka
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index 4289bc5a004..2b5e3269250 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -218,6 +218,11 @@ public:
 }
 
 
+  /* Return true if value is zero.  */
+  bool never_p () const
+{
+  return m_val == 0;
+}
   /* Return true if value has been initialized.  */
   bool initialized_p () const
 {
@@ -288,9 +293,9 @@ public:
 }
   profile_probability operator+ (const profile_probability &other) const
 {
-  if (other == profile_probability::never ())
+  if (other.never_p ())
return *this;
-  if (*this == profile_probability::never ())
+  if (this->never_p ())

This is not a correct change.  If you add guessed 0 to precise 0,
the result needs to be guessed 0 because we are no longer sure the code
will not get executed.  This is why all the checks here go explicitly
to profile_probability::never.
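
The difference can be shown with a toy model.  The following is only a minimal sketch in plain C, not the profile-count.h code: `struct prob`, `enum quality`, `prob_add_strict` and `prob_add_loose` are hypothetical names, only two quality levels are modelled, clamping to the maximum probability is left out, and equality is assumed to compare the quality as well as the value.

```c
#include <assert.h>

/* Toy stand-ins for the profile quality levels; only the ordering
   (guessed is weaker than precise) matters here.  */
enum quality { Q_GUESSED = 1, Q_PRECISE = 2 };

struct prob
{
  unsigned val;          /* Scaled probability; 0 means "never".  */
  enum quality quality;  /* How much the value can be trusted.  */
};

/* Toy counterpart of never (): a precise zero.  */
struct prob
prob_never (void)
{
  struct prob p = { 0, Q_PRECISE };
  return p;
}

/* Equality that compares both the value and the quality.  */
int
prob_equal (struct prob a, struct prob b)
{
  return a.val == b.val && a.quality == b.quality;
}

/* Plain sum; the result takes the weaker of the two qualities.  */
static struct prob
prob_sum (struct prob a, struct prob b)
{
  struct prob r;
  r.val = a.val + b.val;
  r.quality = a.quality < b.quality ? a.quality : b.quality;
  return r;
}

/* Addition that short-circuits only on a precise zero.  */
struct prob
prob_add_strict (struct prob a, struct prob b)
{
  if (prob_equal (b, prob_never ()))
    return a;
  if (prob_equal (a, prob_never ()))
    return b;
  return prob_sum (a, b);
}

/* Addition that short-circuits on any zero value, whatever its quality.  */
struct prob
prob_add_loose (struct prob a, struct prob b)
{
  if (b.val == 0)
    return a;
  if (a.val == 0)
    return b;
  return prob_sum (a, b);
}

/* Precise 0 + guessed 0: the strict variant keeps the weaker quality,
   the loose variant wrongly reports a precise zero.  */
void
prob_demo (void)
{
  struct prob precise_zero = prob_never ();
  struct prob guessed_zero = { 0, Q_GUESSED };

  assert (prob_add_strict (precise_zero, guessed_zero).quality == Q_GUESSED);
  assert (prob_add_loose (precise_zero, guessed_zero).quality == Q_PRECISE);
}
```

In this model the value-only check loses the information that one operand was merely guessed, which is the case the explicit comparison against never () is there to preserve.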

Honza


[PATCH][RTL] Add simplify pattern for bitfield insertion

2018-11-05 Thread Richard Biener


The PR18041 testcase is about bitfield insertion of the style

 b->bit |= <...>

where the RMW cycle we end up generating contains redundant
masking and ORing of the original b->bit value.  The following
adds a combine pattern in simplify-rtx to specifically match

  (X & C) | ((X | Y) & ~C)

and simplifies that to X | (Y & ~C).  That helps improve
code generation from

        movzbl  (%rdi), %eax
        orl     %eax, %esi
        andl    $-2, %eax
        andl    $1, %esi
        orl     %esi, %eax
        movb    %al, (%rdi)

to

        andl    $1, %esi
        orb     %sil, (%rdi)

If you OR in more state, association might break the pattern again.
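
For reference, the identity behind the pattern can be checked exhaustively on small operands.  This is only a minimal sketch in plain C and not part of the patch; the helper names `merge_before`, `merge_after` and `check_identity_8bit` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Form seen for the read-modify-write cycle:
   (X & C) | ((X | Y) & ~C).  */
uint8_t
merge_before (uint8_t x, uint8_t y, uint8_t c)
{
  return (uint8_t) ((x & c) | ((x | y) & ~c));
}

/* Simplified form: X | (Y & ~C).  */
uint8_t
merge_after (uint8_t x, uint8_t y, uint8_t c)
{
  return (uint8_t) (x | (y & ~c));
}

/* Assert that both forms agree for every 8-bit X, Y and C.  */
void
check_identity_8bit (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      for (unsigned c = 0; c < 256; c++)
        assert (merge_before ((uint8_t) x, (uint8_t) y, (uint8_t) c)
                == merge_after ((uint8_t) x, (uint8_t) y, (uint8_t) c));
}
```

The reason it holds: distributing the AND gives (X & C) | (X & ~C) | (Y & ~C), and the first two terms together are just X.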

Still the bug was long-time assigned to me (for doing sth on
the tree level for combining multiple adjacent bitfield accesses
as in the original testcase).  So this is my shot at the part
of the problem that isn't going to be solved on trees.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

The "simpler" testcase manages to break the combination on
x86-64 with -m32, a combine missed-optimization I guess.

A similar case can be made for b->bit &= <...>.

OK for trunk?

Thanks,
Richard.

2018-11-05  Richard Biener  

PR middle-end/18041
* simplify-rtx.c (simplify_binary_operation_1): Add pattern
matching bitfield insertion.

* gcc.target/i386/pr18041-1.c: New testcase.
* gcc.target/i386/pr18041-2.c: Likewise.

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 2ff68ceb4e3..0d53135f1ff 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2857,6 +2857,38 @@ simplify_binary_operation_1 (enum rtx_code code, 
machine_mode mode,
XEXP (op0, 1));
 }
 
+  /* The following happens with bitfield merging.
+ (X & C) | ((X | Y) & ~C) -> X | (Y & ~C) */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (op1) == AND
+ && CONST_INT_P (XEXP (op0, 1))
+ && CONST_INT_P (XEXP (op1, 1))
+ && (INTVAL (XEXP (op0, 1))
+ == ~INTVAL (XEXP (op1, 1))))
+   {
+ /* The IOR may be on both sides.  */
+ rtx top0 = NULL_RTX, top1 = NULL_RTX;
+ if (GET_CODE (XEXP (op1, 0)) == IOR)
+   top0 = op0, top1 = op1;
+ else if (GET_CODE (XEXP (op0, 0)) == IOR)
+   top0 = op1, top1 = op0;
+ if (top0 && top1)
+   {
+ /* X may be on either side of the inner IOR.  */
+ rtx tem = NULL_RTX;
+ if (rtx_equal_p (XEXP (top0, 0),
+  XEXP (XEXP (top1, 0), 0)))
+   tem = XEXP (XEXP (top1, 0), 1);
+ else if (rtx_equal_p (XEXP (top0, 0),
+   XEXP (XEXP (top1, 0), 1)))
+   tem = XEXP (XEXP (top1, 0), 0);
+ if (tem)
+   return simplify_gen_binary (IOR, mode, XEXP (top0, 0),
+   simplify_gen_binary
+ (AND, mode, tem, XEXP (top1, 1)));
+   }
+   }
+
   tem = simplify_byte_swapping_operation (code, mode, op0, op1);
   if (tem)
return tem;
diff --git a/gcc/testsuite/gcc.target/i386/pr18041-1.c 
b/gcc/testsuite/gcc.target/i386/pr18041-1.c
new file mode 100644
index 000..24da41a02ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr18041-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct B { unsigned bit0 : 1; unsigned bit1 : 1; };
+
+void
+foo (struct B *b)
+{
+b->bit0 = b->bit0 | b->bit1;
+}
+
+/* { dg-final { scan-assembler-times "and" 1 } } */
+/* { dg-final { scan-assembler-times "or" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr18041-2.c 
b/gcc/testsuite/gcc.target/i386/pr18041-2.c
new file mode 100644
index 000..00ebd2ae36d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr18041-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct B { unsigned bit0 : 1; unsigned bit1 : 1; };
+
+void
+bar  (struct B *b, int x)
+{
+  b->bit0 |= x;
+}
+
+/* This fails to combine in 32bit mode but not for x32.  */
+/* { dg-final { scan-assembler-times "and" 1 { xfail { { ! x32 } && ilp32 } } } } */
+/* { dg-final { scan-assembler-times "or" 1 { xfail { { ! x32 } && ilp32 } } } } */


Re: Fix SPEC gcc micompile with LTO

2018-11-05 Thread Richard Biener
On Mon, 5 Nov 2018, Jan Hubicka wrote:

> Hi,
> this patch fixes the miscompare I introduced to spec2006 GCC benchmark
> when built with LTO.
> The problem is that fld_incomplete_type_of builds new pointer type to
> incomplete type rather than complete but it ends up giving wrong type
> canonical.
> 
> This patch also improves TBAA with early opts because we do not lose info
> by producing incomplete variants. 
> Note that build_pointer_type may return existing type and in that case I
> overwrite TYPE_CANONICAL of it, but I believe it should be harmless
> because all pointers to a given type should have canonicals constructed
> same way.
> 
> lto-bootstrapped/regtested x86_64-linux.
> 
> Honza
>   * gcc.dg/lto/tbaa-1.c: New testcase.
>   * tree.c (fld_incomplete_type_of): Copy TYPE_CANONICAL while creating
>   pointer type.
> Index: testsuite/gcc.dg/lto/tbaa-1.c
> ===
> --- testsuite/gcc.dg/lto/tbaa-1.c (nonexistent)
> +++ testsuite/gcc.dg/lto/tbaa-1.c (working copy)
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -flto -fdump-tree-evrp" } */
> +typedef struct rtx_def *rtx;
> +typedef struct cselib_val_struct
> +{
> +  union
> +  {
> +  } u;
> +  struct elt_loc_list *locs;
> +}
> +cselib_val;
> +struct elt_loc_list
> +{
> +  struct elt_loc_list *next;
> +  rtx loc;
> +};
> +static int n_useless_values;
> +unchain_one_elt_loc_list (pl)
> + struct elt_loc_list **pl;
> +{
> +  struct elt_loc_list *l = *pl;
> +  *pl = l->next;
> +}
> +
> +discard_useless_locs (x, info)
> + void **x;
> +{
> +  cselib_val *v = (cselib_val *) * x;
> +  struct elt_loc_list **p = &v->locs;
> +  int had_locs = v->locs != 0;
> +  while (*p)
> +{
> +  unchain_one_elt_loc_list (p);
> +  p = &(*p)->next;
> +}
> +  if (had_locs && v->locs == 0)
> +{
> +  n_useless_values++;
> +}
> +}
> +/* { dg-final { scan-tree-dump-times "n_useless_values" 2 "evrp" } } */  
>
> Index: tree.c
> ===
> --- tree.c(revision 265766)
> +++ tree.c(working copy)
> @@ -5146,6 +5146,7 @@ fld_incomplete_type_of (tree t, struct f
> else
>   first = build_reference_type_for_mode (t2, TYPE_MODE (t),
>   TYPE_REF_CAN_ALIAS_ALL (t));
> +   TYPE_CANONICAL (first) = TYPE_CANONICAL (TYPE_MAIN_VARIANT (t));

Hmm, this _should_ be a no-op.  Can you, before that line, add

  gcc_assert (TYPE_CANONICAL (t2) != t2
  && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));

?  That is, the incomplete variant should share TYPE_CANONICAL with
the pointed-to type and be _not_ the canonical leader (otherwise
all other pointer types are bogus).


> add_tree_to_fld_list (first, fld);
> return fld_type_variant (first, t, fld);
>   }
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Fix PR87873

2018-11-05 Thread Richard Biener


The fragile PHI copying logic in the vectorizer got confused by
constants in loop-closed PHI nodes.  Fixed like the following.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

From a965417cbefd54f45ac6f2b6e3d5dc39c307da09 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Mon, 5 Nov 2018 13:02:48 +0100
Subject: [PATCH] fix-pr87873

2018-11-05  Richard Biener  

PR tree-optimization/87873
* tree-ssa-loop-manip.h (split_loop_exit_edge): Add copy_constants_p
argument.
* tree-ssa-loop-manip.c (split_loop_exit_edge): Likewise.
* tree-vect-loop.c (vect_transform_loop): When splitting the
loop exit also create forwarder PHIs for constants.
* tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges):
Handle constant to_arg, add extra checking we match up the correct
PHIs.

* gcc.dg/pr87873.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/pr87873.c b/gcc/testsuite/gcc.dg/pr87873.c
new file mode 100644
index 000..63d05342b40
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87873.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-loop-vectorize" } */
+
+long k3;
+int gs;
+
+void
+s2 (int aj)
+{
+  while (aj < 1)
+{
+  gs ^= 1;
+  k3 = (long) gs * 2;
+  if (k3 != 0)
+   k3 = 0;
+
+  ++aj;
+}
+}
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 5acee6c98f3..726590ac6df 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -773,10 +773,12 @@ verify_loop_closed_ssa (bool verify_ssa_p, struct loop 
*loop)
 }
 
 /* Split loop exit edge EXIT.  The things are a bit complicated by a need to
-   preserve the loop closed ssa form.  The newly created block is returned.  */
+   preserve the loop closed ssa form.  If COPY_CONSTANTS_P is true then
+   forwarder PHIs are also created for constant arguments.
+   The newly created block is returned.  */
 
 basic_block
-split_loop_exit_edge (edge exit)
+split_loop_exit_edge (edge exit, bool copy_constants_p)
 {
   basic_block dest = exit->dest;
   basic_block bb = split_edge (exit);
@@ -796,12 +798,13 @@ split_loop_exit_edge (edge exit)
 
   /* If the argument of the PHI node is a constant, we do not need
 to keep it inside loop.  */
-  if (TREE_CODE (name) != SSA_NAME)
+  if (TREE_CODE (name) != SSA_NAME
+ && !copy_constants_p)
continue;
 
   /* Otherwise create an auxiliary phi node that will copy the value
 of the SSA name out of the loop.  */
-  new_name = duplicate_ssa_name (name, NULL);
+  new_name = duplicate_ssa_name (PHI_RESULT (phi), NULL);
   new_phi = create_phi_node (new_name, bb);
   add_phi_arg (new_phi, name, exit, locus);
   SET_USE (op_p, new_name);
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 390ac6f8278..ddda5cf7515 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -37,7 +37,7 @@ checking_verify_loop_closed_ssa (bool verify_ssa_p, struct 
loop *loop = NULL)
 verify_loop_closed_ssa (verify_ssa_p, loop);
 }
 
-extern basic_block split_loop_exit_edge (edge);
+extern basic_block split_loop_exit_edge (edge, bool = false);
 extern basic_block ip_end_pos (struct loop *);
 extern basic_block ip_normal_pos (struct loop *);
 extern void standard_iv_increment_position (struct loop *,
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 1d1d1147696..f1b023b4e4e 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -977,10 +977,15 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge 
to)
}
   if (TREE_CODE (from_arg) != SSA_NAME)
gcc_assert (operand_equal_p (from_arg, to_arg, 0));
-  else
+  else if (TREE_CODE (to_arg) == SSA_NAME)
{
  if (get_current_def (to_arg) == NULL_TREE)
-   set_current_def (to_arg, get_current_def (from_arg));
+   {
+ gcc_assert (types_compatible_p (TREE_TYPE (to_arg),
+ TREE_TYPE (get_current_def
+  (from_arg))));
+ set_current_def (to_arg, get_current_def (from_arg));
+   }
}
   gsi_next (&gsi_from);
   gsi_next (&gsi_to);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3cdf46d723c..51be405b5a0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8236,7 +8236,7 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   edge e = single_exit (loop);
   if (! single_pred_p (e->dest))
 {
-  split_loop_exit_edge (e);
+  split_loop_exit_edge (e, true);
   if (dump_enabled_p ())
dump_printf (MSG_NOTE, "split exit edge\n");
 }


Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-05 Thread Jeff Law
On 11/5/18 7:21 AM, Jan Hubicka wrote:
>>
>> Did you mean "the nearest common dominator"?
> 
> If the nearest common dominator appears in the loop while all uses are
> out of loops, this will result in suboptimal xor placement.
> In this case you want to split edges out of the loop.
> 
> In general this is what the LCM framework will do for you if the problem
> is modelled in a similar way as in mode_switching.  At function entry the
> mode is "no zero register needed" and all conversions need mode "zero
> register needed".  Mode switching should then make the correct placement
> decisions (reaching the minimal number of executions of xor).
> 
> Jeff, what is your opinion on the approach taken by the patch?
> It seems like a special case of a more general issue, but I do not see a
> very elegant way to solve it, at least in the GCC 9 horizon, so if
> the placement is correct we can probably go either with a new pass or
> with making this part of mode switching (which is run by the x86 backend
> anyway).
So I haven't followed this discussion at all, but I did touch on this
issue a month or two ago with a target patch that was trying to avoid
the partial stalls.

My assumption is that we're trying to find one or more places to
initialize the upper half of an avx register so as to avoid partial
register stall at existing sites that set the upper half.

This sounds like a classic PRE/LCM style problem (of which mode
switching is just another variant).   A common-dominator approach is
closer to a classic GCSE and is going to result in more initializations
at sub-optimal points than a PRE/LCM style.

The only advantage a common-dominator approach would have that I could
think of would be potentially further separating the initialization from
the subsequent use points which avoid store-store stalls or somesuch.  I
doubt this effect would be enough to overcome the inherent advantages of
a PRE/LCM approach.


ISTM that if we were to scan the RTL noting which instructions set the
upper part of the avx register, which instructions have potential stalls
and which instructions reset the upper half to an indeterminate state
(calls), then we have the local properties.  Then we feed that into a
traditional LCM solver and we get back the optimal points.

The only weirdness is that we don't want to move existing instructions
that set the upper bits.  Those are essentially fixed.  So maybe this is
actually better modeled by Click's algorithm which has the concept of
pinned instructions.  But Click's algorithm assumes SSA.  Ugh.

I'd probably have to sit down with it for a while -- it might be
possible to handle the fixed instructions using some of the ideas from
Click (essentially exposing the earliest/latest results from LCM, then
picking a point on the dominator path between earliest and latest).



jeff


Re: [PATCH v3 3/3] or1k: gcc: initial support for openrisc

2018-11-05 Thread Rich Felker
On Mon, Nov 05, 2018 at 11:13:53AM +, Szabolcs Nagy wrote:
> On 04/11/18 09:05, Stafford Horne wrote:
> > On Mon, Oct 29, 2018 at 02:28:11PM +, Szabolcs Nagy wrote:
> >> On 27/10/18 05:37, Stafford Horne wrote:
> ...
> >>> +#undef LINK_SPEC
> >>> +#define LINK_SPEC "%{h*} \
> >>> +   %{static:-Bstatic}\
> >>> +   %{shared:-shared} \
> >>> +   %{symbolic:-Bsymbolic}\
> >>> +   %{!static:\
> >>> + %{rdynamic:-export-dynamic} \
> >>> + %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"
> >>> +
> >>> +#endif /* GCC_OR1K_LINUX_H */
> >>
> >> note that because of the -static-pie mess each
> >> target needs a more complicated LINK_SPEC now.
> > 
> > Hello,
> > 
> > Does something like this look better?
> > 
> > --- a/gcc/config/or1k/linux.h
> > +++ b/gcc/config/or1k/linux.h
> > @@ -37,8 +37,9 @@
> > %{static:-Bstatic}  \
> > %{shared:-shared}   \
> > %{symbolic:-Bsymbolic}  \
> > -   %{!static:  \
> > +   %{!static:%{!static-pie:\
> >   %{rdynamic:-export-dynamic}   \
> > - %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"
> > + %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}} \
> > +   %{static-pie:-Bstatic -pie --no-dynamic-linker -z text}"
> >  
> >  #endif /* GCC_OR1K_LINUX_H */
> 
> looks ok.
> 
> > I have tested this out with or1k-linux-musl, but I get some LD complaints 
> > i.e.
> > 
> > .../or1k-linux-musl/bin/ld: .../or1k-linux-musl/lib/libc.a(exit.o): non-pic 
> > relocation against symbol __fini_array_end
> > .../or1k-linux-musl/bin/ld: .../or1k-linux-musl/lib/libc.a(exit.o): non-pic 
> > relocation against symbol __fini_array_start
> > 
> > Those are some warnings we recently added to LD, perhaps I need to rebuild 
> > the
> > libc.a with PIE as well.  I will try it out, but if anyone has some 
> > suggestions
> > that would be helpful.
> 
> yes, musl does not build libc.a with pic by default,
> either use a gcc configured with --enable-default-pie
> or CC='gcc -fPIC' when building musl.
> 
> after that -static-pie linking should work.
> 
> (maybe musl should have an --enable-static-pie config
> option to make this simpler)

For practical purposes, if you want to use static pie, you need a
default-pie toolchain. This is because _every_ static lib you might
link needs to be built with -fPIE (or -fPIC), and ensuring that
happens on a package-by-package basis is largely impractical; at least
it's on the same order of magnitude of difficulty as other systems
integration/packaging tasks.

However from the musl side it might make sense to produce a libc_pic.a
as part of the build process. This would make it easy to replace
libc.a with libc_pic.a if desired, and could also be used as the basis
for linking libc.so and to allow production of a stripped-down libc.so
that only includes symbols a fixed set of binaries depend on. We could
discuss something like this on the musl list.

Rich


Re: [PATCH, testsuite] ignore some "conflicting types for built-in" messages

2018-11-05 Thread Paul Koning



> On Nov 3, 2018, at 10:12 PM, Jeff Law  wrote:
> 
> On 11/1/18 1:13 PM, Paul Koning wrote:
>> A number of test cases contain declarations like:
>>  void *memcpy();
>> which currently are silently accepted on most platforms but not on all; 
>> pdp11 (and possibly some others) generate a "conflicting types for built-in 
>> function" warning.
>> 
>> It was suggested to prune those messages because the test cases where these 
>> occur are not looking for the message but are testing some other issue, so 
>> the message is not relevant.  The attached patch adds dg-prune-output 
>> directives to do so.
>> 
>> Ok for trunk?
>> 
>>  paul
>> 
>> ChangeLog:
>> 
>> 2018-11-01  Paul Koning  
>> 
>>  * gcc.dg/Walloca-16.c: Ignore conflicting types for built-in
>>  warnings.
>>  * gcc.dg/Wrestrict-4.c: Ditto.
>>  * gcc.dg/Wrestrict-5.c: Ditto.
>>  * gcc.dg/pr83463.c: Ditto.
>>  * gcc.dg/torture/pr55890-2.c: Ditto.
>>  * gcc.dg/torture/pr55890-3.c: Ditto.
>>  * gcc.dg/torture/pr71816.c: Ditto.
> ISTM it'd be better to just fix memcpy to have a correct prototype.
> 
> jeff

I can do that, but I'm wondering if some systems have different prototypes than 
the C standard calls for so I'd end up breaking those.

paul



Re: Fix SPEC gcc micompile with LTO

2018-11-05 Thread Jan Hubicka
> Hmm, this _should_ be a no-op.  Can you, before that line, add
> 
>   gcc_assert (TYPE_CANONICAL (t2) != t2
>   && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
> 
> ?  That is, the incomplete variant should share TYPE_CANONICAL with
> the pointed-to type and be _not_ the canonical leader (otherwise
> all other pointer types are bogus).

It looks like a good idea.  I am re-checking with that change, which has
already found a bug: build_distinct_type_variant actually resets
TYPE_CANONICAL, which I had missed earlier.  So I am testing

Index: tree.c
===
--- tree.c  (revision 265807)
+++ tree.c  (working copy)
@@ -5146,6 +5146,9 @@ fld_incomplete_type_of (tree t, struct f
  else
first = build_reference_type_for_mode (t2, TYPE_MODE (t),
TYPE_REF_CAN_ALIAS_ALL (t));
+ gcc_assert (TYPE_CANONICAL (t2) != t2
+ && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
+ TYPE_CANONICAL (first) = TYPE_CANONICAL (TYPE_MAIN_VARIANT (t));
  add_tree_to_fld_list (first, fld);
  return fld_type_variant (first, t, fld);
}
@@ -5169,6 +5172,7 @@ fld_incomplete_type_of (tree t, struct f
  SET_TYPE_MODE (copy, VOIDmode);
  SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
  TYPE_SIZE_UNIT (copy) = NULL;
+ TYPE_CANONICAL (copy) = t;
  if (AGGREGATE_TYPE_P (t))
{
  TYPE_FIELDS (copy) = NULL;

Honza


Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-05 Thread Jan Hubicka
> On 11/5/18 7:21 AM, Jan Hubicka wrote:
> >>
> >> Did you mean "the nearest common dominator"?
> > 
> > If the nearest common dominator appears in the loop while all uses are
> > out of loops, this will result in suboptimal xor placement.
> > In this case you want to split edges out of the loop.
> > 
> > In general this is what the LCM framework will do for you if the problem
> > is modelled in a similar way as in mode_switching.  At function entry the
> > mode is "no zero register needed" and all conversions need mode "zero
> > register needed".  Mode switching should then do the correct placement
> > decisions (reaching minimal number of executions of xor).
> > 
> > Jeff, what is your opinion on the approach taken by the patch?
> > It seems like a special case of a more general issue, but I do not see
> > a very elegant way to solve it at least in the GCC 9 horizon, so if
> > the placement is correct we can probably go either with a new pass or
> > making this part of mode switching (which is anyway run by x86 backend)
> So I haven't followed this discussion at all, but did touch on this
> issue with some patch a month or two ago with a target patch that was
> trying to avoid the partial stalls.
> 
> My assumption is that we're trying to find one or more places to
> initialize the upper half of an avx register so as to avoid partial
> register stall at existing sites that set the upper half.
> 
> This sounds like a classic PRE/LCM style problem (of which mode
> switching is just another variant).   A common-dominator approach is
> closer to a classic GCSE and is going to result in more initializations
> at sub-optimal points than a PRE/LCM style.

Yes, it is the usual code placement problem.  It is a special case because
the zero register is not modified by the conversion (we just need to have a
zero somewhere).  So basically we do not have kills of the zero except
for the entry block.

Honza


Re: Fix SPEC gcc micompile with LTO

2018-11-05 Thread Richard Biener
On Mon, 5 Nov 2018, Jan Hubicka wrote:

> > Hmm, this _should_ be a no-op.  Can you, before that line, add
> > 
> >   gcc_assert (TYPE_CANONICAL (t2) != t2
> >   && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
> > 
> > ?  That is, the incomplete variant should share TYPE_CANONICAL with
> > the pointed-to type and be _not_ the canonical leader (otherwise
> > all other pointer types are bogus).
> 
> It looks like a good idea.  I am re-checking with that change, which has
> already found a bug: build_distinct_type_variant actually resets
> TYPE_CANONICAL, which I had missed earlier.  So I am testing
> 
> Index: tree.c
> ===
> --- tree.c(revision 265807)
> +++ tree.c(working copy)
> @@ -5146,6 +5146,9 @@ fld_incomplete_type_of (tree t, struct f
> else
>   first = build_reference_type_for_mode (t2, TYPE_MODE (t),
>   TYPE_REF_CAN_ALIAS_ALL (t));
> +   gcc_assert (TYPE_CANONICAL (t2) != t2
> +   && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
> +   TYPE_CANONICAL (first) = TYPE_CANONICAL (TYPE_MAIN_VARIANT (t));

As said, the TYPE_CANONICAL assignment should already be done exactly this
way in build_{pointer,reference}_type_for_mode.  So you should be able to
drop this from the patch.

> add_tree_to_fld_list (first, fld);
> return fld_type_variant (first, t, fld);
>   }
> @@ -5169,6 +5172,7 @@ fld_incomplete_type_of (tree t, struct f
> SET_TYPE_MODE (copy, VOIDmode);
> SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
> TYPE_SIZE_UNIT (copy) = NULL;
> +   TYPE_CANONICAL (copy) = t;

Or use build_variant_type_copy in the first place?  But you do not
seem to queue the new types in the variant list?

> if (AGGREGATE_TYPE_P (t))
>   {
> TYPE_FIELDS (copy) = NULL;
> 
> Honza
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Fix SPEC gcc micompile with LTO

2018-11-05 Thread Jan Hubicka
> > + gcc_assert (TYPE_CANONICAL (t2) != t2
> > + && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
> > + TYPE_CANONICAL (first) = TYPE_CANONICAL (TYPE_MAIN_VARIANT (t));
> 
> As said, the TYPE_CANONICAL assignment should already be done exactly this
> way in build_{pointer,reference}_type_for_mode.  So you should be able to
> drop this from the patch.

OK, I have turned this into a sanity check and am re-testing.
> 
> >   add_tree_to_fld_list (first, fld);
> >   return fld_type_variant (first, t, fld);
> > }
> > @@ -5169,6 +5172,7 @@ fld_incomplete_type_of (tree t, struct f
> >   SET_TYPE_MODE (copy, VOIDmode);
> >   SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
> >   TYPE_SIZE_UNIT (copy) = NULL;
> > + TYPE_CANONICAL (copy) = t;
> 
> Or use build_variant_type_copy in the first place?  But you do not
> seem to queue the new types in the variant list?

build_variant_type_copy would set TYPE_MAIN_VARIANT (copy) to be T.
I do not want this to happen since streaming would then pick the complete
type as well.

Honza


Re: [PATCH, testsuite] ignore some "conflicting types for built-in" messages

2018-11-05 Thread Martin Sebor

On 11/05/2018 08:12 AM, Paul Koning wrote:
>
>> On Nov 3, 2018, at 10:12 PM, Jeff Law  wrote:
>>
>> On 11/1/18 1:13 PM, Paul Koning wrote:
>>> A number of test cases contain declarations like:
>>>  void *memcpy();
>>> which currently are silently accepted on most platforms but not on all;
>>> pdp11 (and possibly some others) generate a "conflicting types for built-in
>>> function" warning.
>>>
>>> It was suggested to prune those messages because the test cases where these
>>> occur are not looking for the message but are testing some other issue, so
>>> the message is not relevant.  The attached patch adds dg-prune-output
>>> directives to do so.
>>>
>>> Ok for trunk?
>>>
>>> paul
>>>
>>> ChangeLog:
>>>
>>> 2018-11-01  Paul Koning
>>>
>>> * gcc.dg/Walloca-16.c: Ignore conflicting types for built-in
>>> warnings.
>>> * gcc.dg/Wrestrict-4.c: Ditto.
>>> * gcc.dg/Wrestrict-5.c: Ditto.
>>> * gcc.dg/pr83463.c: Ditto.
>>> * gcc.dg/torture/pr55890-2.c: Ditto.
>>> * gcc.dg/torture/pr55890-3.c: Ditto.
>>> * gcc.dg/torture/pr71816.c: Ditto.
>>
>> ISTM it'd be better to just fix memcpy to have a correct prototype.
>>
>> jeff
>
> I can do that, but I'm wondering if some systems have different prototypes
> than the C standard calls for so I'd end up breaking those.


The tests verify that GCC doesn't crash on calls to built-ins
declared without a prototype.  We don't want to declare them,
that would defeat the purpose of the tests.

Martin


Re: Fix SPEC gcc micompile with LTO

2018-11-05 Thread Jan Hubicka
Hi,
this is patch I ended up testing.  It ensures that canonical types of
copies I create are same as of originals C++ FE has its own refernece
type construction (cp_build_reference_type) and it creates additional
pointer types with TYPE_REF_IS_RVALUE set and it has different
TYPE_CANONICAL.

Obviously we do not see this in the middle-end and we end up merging the
types despite the fact they have different TYPE_CANONICAL.
I guess I can imitate the behaviour in fld_incomplete_type_of by
implementing my own variant of build_pointer_type that also matches
TYPE_CANONICAL of the pointer it creates.  I wonder if there are better
solutions?

Honza

Index: tree.c
===
--- tree.c  (revision 265807)
+++ tree.c  (working copy)
@@ -5118,6 +5118,7 @@ fld_type_variant (tree first, tree t, st
   TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
   TYPE_NAME (v) = TYPE_NAME (t);
   TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
+  TYPE_CANONICAL (v) = TYPE_CANONICAL (t);
   add_tree_to_fld_list (v, fld);
   return v;
 }
@@ -5146,6 +5147,10 @@ fld_incomplete_type_of (tree t, struct f
  else
first = build_reference_type_for_mode (t2, TYPE_MODE (t),
TYPE_REF_CAN_ALIAS_ALL (t));
+ gcc_assert (TYPE_CANONICAL (t2) != t2
+ && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t))
+ && TYPE_CANONICAL (first)
+== TYPE_CANONICAL (TYPE_MAIN_VARIANT (t)));
  add_tree_to_fld_list (first, fld);
  return fld_type_variant (first, t, fld);
}
@@ -5169,6 +5174,7 @@ fld_incomplete_type_of (tree t, struct f
  SET_TYPE_MODE (copy, VOIDmode);
  SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
  TYPE_SIZE_UNIT (copy) = NULL;
+ TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
  if (AGGREGATE_TYPE_P (t))
{
  TYPE_FIELDS (copy) = NULL;


Re: Fix SPEC gcc micompile with LTO

2018-11-05 Thread Jan Hubicka
> Hi,
> this is patch I ended up testing.  It ensures that canonical types of
> copies I create are same as of originals C++ FE has its own refernece
A piece of the mail got lost, rendering the paragraph unreadable.  I wanted
to say:

This is the patch I ended up testing.  It ensures that canonical types of
copies I create are the same as those of the originals.  It however ICEs
building auto-profile.c because the C++ FE has its own reference type
construction (cp_build_reference_type) and it creates additional pointer
types with TYPE_REF_IS_RVALUE set and it has different TYPE_CANONICAL.
> 
> Obviously we do not see this in the middle-end and we end up merging the
> types despite the fact they have different TYPE_CANONICAL.
> I guess I can imitate the behaviour in fld_incomplete_type_of by
> implementing my own variant of build_pointer_type that also matches
> TYPE_CANONICAL of the pointer it creates.  I wonder if there are better
> solutions?
> 
> Honza
> 
> Index: tree.c
> ===
> --- tree.c(revision 265807)
> +++ tree.c(working copy)
> @@ -5118,6 +5118,7 @@ fld_type_variant (tree first, tree t, st
>TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
>TYPE_NAME (v) = TYPE_NAME (t);
>TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
> +  TYPE_CANONICAL (v) = TYPE_CANONICAL (t);
>add_tree_to_fld_list (v, fld);
>return v;
>  }
> @@ -5146,6 +5147,10 @@ fld_incomplete_type_of (tree t, struct f
> else
>   first = build_reference_type_for_mode (t2, TYPE_MODE (t),
>   TYPE_REF_CAN_ALIAS_ALL (t));
> +   gcc_assert (TYPE_CANONICAL (t2) != t2
> +   && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t))
> +   && TYPE_CANONICAL (first)
> +  == TYPE_CANONICAL (TYPE_MAIN_VARIANT (t)));
> add_tree_to_fld_list (first, fld);
> return fld_type_variant (first, t, fld);
>   }
> @@ -5169,6 +5174,7 @@ fld_incomplete_type_of (tree t, struct f
> SET_TYPE_MODE (copy, VOIDmode);
> SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
> TYPE_SIZE_UNIT (copy) = NULL;
> +   TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
> if (AGGREGATE_TYPE_P (t))
>   {
> TYPE_FIELDS (copy) = NULL;


Re: [PATCH], Remove power9 fusion support

2018-11-05 Thread Mike Stump
On Nov 2, 2018, at 11:37 AM, Michael Meissner  wrote:
> 
> As I discussed in my 2018 Cauldron talk, the PowerPC GCC compiler supported a
> subset of the original design for fusion in the power9 hardware using 
> peepholes
> to fuse together ADDIS instructions and floating point load/store operations.
> 
> However, while fusion was part of the original power9 design, by the time the
> machine came out, the fusion support was no longer part of the architecture.
> 
> This patch removes all of the so-called power9 fusion support for the GCC
> compiler.  It leaves -mpower9-fusion as a deprecated switch

So, I'd just remove the flag support as well.  Anyone that hits on it, will 
want to examine their code and have the opportunity to fix it.


[PATCH] S/390: Make tests expect column numbers in RTL output

2018-11-05 Thread Ilya Leoshkevich
RTL output now includes column numbers in addition to line numbers,
like this:

  "gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c":16:1

This confuses some S/390 tests.

gcc/testsuite/ChangeLog:

2018-11-05  Ilya Leoshkevich  

* gcc.target/s390/md/andc-splitter-1.c: Add colon to
expectations.
* gcc.target/s390/md/andc-splitter-2.c: Likewise.
* gcc.target/s390/md/setmem_long-1.c: Likewise.
---
 .../gcc.target/s390/md/andc-splitter-1.c | 16 
 .../gcc.target/s390/md/andc-splitter-2.c | 16 
 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c |  4 ++--
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c 
b/gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c
index 3f0677cfd76..36f2cfc53de 100644
--- a/gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c
+++ b/gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c
@@ -14,26 +14,26 @@
 __attribute__ ((noinline))
 unsigned long andc_vv(unsigned long a, unsigned long b)
 { return ~b & a; }
-/* { dg-final { scan-assembler ":16 .\* \{\\*anddi3\}" } } */
-/* { dg-final { scan-assembler ":16 .\* \{\\*xordi3\}" } } */
+/* { dg-final { scan-assembler ":16:.\* \{\\*anddi3\}" } } */
+/* { dg-final { scan-assembler ":16:.\* \{\\*xordi3\}" } } */
 
 __attribute__ ((noinline))
 unsigned long andc_pv(unsigned long *a, unsigned long b)
 { return ~b & *a; }
-/* { dg-final { scan-assembler ":22 .\* \{\\*anddi3\}" } } */
-/* { dg-final { scan-assembler ":22 .\* \{\\*xordi3\}" } } */
+/* { dg-final { scan-assembler ":22:.\* \{\\*anddi3\}" } } */
+/* { dg-final { scan-assembler ":22:.\* \{\\*xordi3\}" } } */
 
 __attribute__ ((noinline))
 unsigned long andc_vp(unsigned long a, unsigned long *b)
 { return ~*b & a; }
-/* { dg-final { scan-assembler ":28 .\* \{\\*anddi3\}" } } */
-/* { dg-final { scan-assembler ":28 .\* \{\\*xordi3\}" } } */
+/* { dg-final { scan-assembler ":28:.\* \{\\*anddi3\}" } } */
+/* { dg-final { scan-assembler ":28:.\* \{\\*xordi3\}" } } */
 
 __attribute__ ((noinline))
 unsigned long andc_pp(unsigned long *a, unsigned long *b)
 { return ~*b & *a; }
-/* { dg-final { scan-assembler ":34 .\* \{\\*anddi3\}" } } */
-/* { dg-final { scan-assembler ":34 .\* \{\\*xordi3\}" } } */
+/* { dg-final { scan-assembler ":34:.\* \{\\*anddi3\}" } } */
+/* { dg-final { scan-assembler ":34:.\* \{\\*xordi3\}" } } */
 
 /* { dg-final { scan-assembler-times "\tngr\?k\?\t" 4 } } */
 /* { dg-final { scan-assembler-times "\txgr\?\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/s390/md/andc-splitter-2.c 
b/gcc/testsuite/gcc.target/s390/md/andc-splitter-2.c
index 89c8ea25f99..75ab75b5273 100644
--- a/gcc/testsuite/gcc.target/s390/md/andc-splitter-2.c
+++ b/gcc/testsuite/gcc.target/s390/md/andc-splitter-2.c
@@ -14,26 +14,26 @@
 __attribute__ ((noinline))
 unsigned int andc_vv(unsigned int a, unsigned int b)
 { return ~b & a; }
-/* { dg-final { scan-assembler ":16 .\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
-/* { dg-final { scan-assembler ":16 .\* \{\\*xorsi3\}" } } */
+/* { dg-final { scan-assembler ":16:.\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
+/* { dg-final { scan-assembler ":16:.\* \{\\*xorsi3\}" } } */
 
 __attribute__ ((noinline))
 unsigned int andc_pv(unsigned int *a, unsigned int b)
 { return ~b & *a; }
-/* { dg-final { scan-assembler ":22 .\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
-/* { dg-final { scan-assembler ":22 .\* \{\\*xorsi3\}" } } */
+/* { dg-final { scan-assembler ":22:.\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
+/* { dg-final { scan-assembler ":22:.\* \{\\*xorsi3\}" } } */
 
 __attribute__ ((noinline))
 unsigned int andc_vp(unsigned int a, unsigned int *b)
 { return ~*b & a; }
-/* { dg-final { scan-assembler ":28 .\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
-/* { dg-final { scan-assembler ":28 .\* \{\\*xorsi3\}" } } */
+/* { dg-final { scan-assembler ":28:.\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
+/* { dg-final { scan-assembler ":28:.\* \{\\*xorsi3\}" } } */
 
 __attribute__ ((noinline))
 unsigned int andc_pp(unsigned int *a, unsigned int *b)
 { return ~*b & *a; }
-/* { dg-final { scan-assembler ":34 .\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
-/* { dg-final { scan-assembler ":34 .\* \{\\*xorsi3\}" } } */
+/* { dg-final { scan-assembler ":34:.\* \{\\*andsi3_\(esa\|zarch\)\}" } } */
+/* { dg-final { scan-assembler ":34:.\* \{\\*xorsi3\}" } } */
 
 /* { dg-final { scan-assembler-times "\tnr\?k\?\t" 4 } } */
 /* { dg-final { scan-assembler-times "\txr\?k\?\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c 
b/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
index dec7197cfa9..a1d1c11df37 100644
--- a/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
+++ b/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
@@ -23,8 +23,8 @@ void test2(char *p, int c, int len)
 }
 
 /* Check that the right patterns are used.  */
-/* { dg-final { scan-assembler-times {c"?:16 .*{[*]setmem_long_?3?1?z?}} 1 } } 
*/
-/* { dg-final { scan-assembler-times

Re: [PATCH] S/390: Make tests expect column numbers in RTL output

2018-11-05 Thread Andreas Krebbel
On 05.11.18 17:32, Ilya Leoshkevich wrote:
> RTL output now includes column numbers in addition to line numbers,
> like this:
> 
>   "gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c":16:1
> 
> This confuses some S/390 tests.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-11-05  Ilya Leoshkevich  
> 
>   * gcc.target/s390/md/andc-splitter-1.c: Add colon to
>   expectations.
>   * gcc.target/s390/md/andc-splitter-2.c: Likewise.
>   * gcc.target/s390/md/setmem_long-1.c: Likewise.

Ok. Thanks!

Andreas



Re: [PATCH] S/390: Increase register move costs for CC_REGS

2018-11-05 Thread Andreas Krebbel
On 05.11.18 15:38, Robin Dapp wrote:
> Hi,
> 
> the attached patch increases the move costs for moves involving the CC
> register.  This saves us some instructions in SPEC CPU2006.
> 
> Regards
>  Robin
> 
> --
> 
> gcc/ChangeLog:
> 
> 2018-11-05  Robin Dapp  
> 
>   * config/s390/s390.c (s390_register_move_cost): Increase costs
>   for moves involving the CC reg.
> 

Ok. Thanks!

Andreas



Re: [PATCH, testsuite] ignore some "conflicting types for built-in" messages

2018-11-05 Thread Jeff Law
On 11/5/18 8:12 AM, Paul Koning wrote:
> 
> 
>> On Nov 3, 2018, at 10:12 PM, Jeff Law  wrote:
>>
>> On 11/1/18 1:13 PM, Paul Koning wrote:
>>> A number of test cases contain declarations like:
>>>  void *memcpy();
>>> which currently are silently accepted on most platforms but not on all; 
>>> pdp11 (and possibly some others) generate a "conflicting types for built-in 
>>> function" warning.
>>>
>>> It was suggested to prune those messages because the test cases where these 
>>> occur are not looking for the message but are testing some other issue, so 
>>> the message is not relevant.  The attached patch adds dg-prune-output 
>>> directives to do so.
>>>
>>> Ok for trunk?
>>>
>>> paul
>>>
>>> ChangeLog:
>>>
>>> 2018-11-01  Paul Koning  
>>>
>>> * gcc.dg/Walloca-16.c: Ignore conflicting types for built-in
>>> warnings.
>>> * gcc.dg/Wrestrict-4.c: Ditto.
>>> * gcc.dg/Wrestrict-5.c: Ditto.
>>> * gcc.dg/pr83463.c: Ditto.
>>> * gcc.dg/torture/pr55890-2.c: Ditto.
>>> * gcc.dg/torture/pr55890-3.c: Ditto.
>>> * gcc.dg/torture/pr71816.c: Ditto.
>> ISTM it'd be better to just fix memcpy to have a correct prototype.
>>
>> jeff
> 
> I can do that, but I'm wondering if some systems have different prototypes
> than the C standard calls for so I'd end up breaking those.
I wouldn't worry about those.  I think the bigger question (thanks
Martin) is whether or not any of those tests are checking for issues
that arise specifically due to not having a full prototype available
(and in those cases your fix is probably more appropriate).

Probably the only way to figure that out is to dig into the history of
each one :(  Mighty unpleasant.

jeff


Re: [PATCH][RTL] Add simplify pattern for bitfield insertion

2018-11-05 Thread Jeff Law
On 11/5/18 7:44 AM, Richard Biener wrote:
> 
> The PR18041 testcase is about bitfield insertion of the style
> 
>  b->bit |= <...>
> 
> where the RMW cycle we end up generating contains redundant
> masking and ORing of the original b->bit value.  The following
> adds a combine pattern in simplify-rtx to specifically match
> 
>   (X & C) | ((X | Y) & ~C)
> 
> and simplifying that to X | (Y & ~C).  That helps improving 
> code-generation from
> 
> movzbl  (%rdi), %eax
> orl %eax, %esi
> andl$-2, %eax
> andl$1, %esi
> orl %esi, %eax
> movb%al, (%rdi)
> 
> to
> 
> andl$1, %esi
> orb %sil, (%rdi)
> 
> if you OR in more state association might break the pattern again.
> 
> Still the bug was long-time assigned to me (for doing sth on
> the tree level for combining multiple adjacent bitfield accesses
> as in the original testcase).  So this is my shot at the part
> of the problem that isn't going to be solved on trees.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> The "simpler" testcase manages to break the combination on
> x86-64 with -m32, a combine missed-optimization I guess.
> 
> A similar case can be made for b->bit &= <...>.
> 
> OK for trunk?
> 
> Thanks,
> Richard.
> 
> 2018-11-05  Richard Biener  
> 
>   PR middle-end/18041
>   * simplify-rtx.c (simplify_binary_operation_1): Add pattern
>   matching bitfield insertion.
> 
>   * gcc.target/i386/pr18041-1.c: New testcase.
>   * gcc.target/i386/pr18041-2.c: Likewise.
There was at least one more BZ in this space (older than 18041).
Essentially all the pieces are there for combine to figure out we've got
a bitfield twiddle, but the structure of some of combine's code made
it exceedingly hard to exploit.  I wonder if this would help.   I'm sure
I'll look at it during the stage3/stage4 cycle, so we'll know then.

OK for the trunk.  As you note there's likely corresponding cases for
BIT-AND as the toplevel op.

jeff





Re: [PATCH 2/4] Fix GNU coding style.

2018-11-05 Thread Martin Sebor

On 11/02/2018 04:37 AM, marxin wrote:
> gcc/ChangeLog:
>
> 2018-11-02  Martin Liska
>
> * mem-stats.h (mem_alloc_description::get_list): Fix GNU coding
> style.
> * vec.c: Likewise.


I have no preference here or even know what the style guide calls
for (nor have I been able to find it) but I've always assumed
the convention for declaring functions that return pointers (and
variables of pointer types) was

  T *func (...);

(with a space after the type and before the name) as opposed to
either of:

  T* func (...);
or
  T * func (...);

The former also appears to be the dominant style in GCC.

So I'm mostly just curious: is there a recommended or preferred
style or does it not matter?

Thanks
Martin


[PATCH][rs6000] use index form addresses more often for l[wh]brx/st[wh]brx

2018-11-05 Thread Aaron Sawdey
This does the same thing for bswap<mode>2 that I previously did for bswapdi2.
The predicates for bswap<mode>2_{load,store} are now
indexed_or_indirect_operand, and bswap<mode>2 uses
rs6000_force_indexed_or_indirect_mem to make sure the
address is appropriate for that predicate.

Bootstrap/regtest passes on ppc64le power8/power9, ok for trunk?

Thanks!
Aaron



2018-11-05  Aaron Sawdey  

* config/rs6000/rs6000.md (bswap<mode>2): Force address into register
if not in indexed or indirect form.
(bswap<mode>2_load): Change predicate to indexed_or_indirect_operand.
(bswap<mode>2_store): Ditto.


Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 265753)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -2411,9 +2411,15 @@
 src = force_reg (mode, src);

   if (MEM_P (src))
-emit_insn (gen_bswap2_load (dest, src));
+{
+   src = rs6000_force_indexed_or_indirect_mem (src);
+   emit_insn (gen_bswap2_load (dest, src));
+}
   else if (MEM_P (dest))
-emit_insn (gen_bswap2_store (dest, src));
+{
+   dest = rs6000_force_indexed_or_indirect_mem (dest);
+   emit_insn (gen_bswap2_store (dest, src));
+}
   else
 emit_insn (gen_bswap2_reg (dest, src));
   DONE;
@@ -2421,13 +2427,13 @@

 (define_insn "bswap2_load"
   [(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
-   (bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
+   (bswap:HSI (match_operand:HSI 1 "indexed_or_indirect_operand" "Z")))]
   ""
   "lbrx %0,%y1"
   [(set_attr "type" "load")])

 (define_insn "bswap2_store"
-  [(set (match_operand:HSI 0 "memory_operand" "=Z")
+  [(set (match_operand:HSI 0 "indexed_or_indirect_operand" "=Z")
(bswap:HSI (match_operand:HSI 1 "gpc_reg_operand" "r")))]
   ""
   "stbrx %1,%y0"




-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: Fix SPEC gcc micompile with LTO

2018-11-05 Thread Richard Biener
On November 5, 2018 5:11:09 PM GMT+01:00, Jan Hubicka  wrote:
>> Hi,
>> this is patch I ended up testing.  It ensures that canonical types of
>> copies I create are same as of originals C++ FE has its own refernece
>piece of mail got lost rendering the paragraph unreadable.  I wanted to
>say:
>
>This is patch I ended up testing.  It ensures that canonical types of
>copies I create are same as of originals.  It however ICEs building
>auto-profile.c because C++ FE has its own reference  type construction
>(cp_build_reference_type) and it creates additional pointer types with
>TYPE_REF_IS_RVALUE set and it has different TYPE_CANONICAL.

Hmm. I guess we need to fix that, otherwise alias will be broken (or you need 
to resort to a FE specific routine for pointer building). 

Richard. 

>> Obviously we do not see this in the middle-end and we end up merging the
>> types despite the fact they have different TYPE_CANONICAL.
>> I guess I can imitate the behaviour in fld_incomplete_type_of by
>> implementing my own variant of build_pointer_type that also matches
>> TYPE_CANONICAL of the pointer it creates.  I wonder if there are better
>> solutions?
>> 
>> Honza
>> 
>> Index: tree.c
>> ===
>> --- tree.c   (revision 265807)
>> +++ tree.c   (working copy)
>> @@ -5118,6 +5118,7 @@ fld_type_variant (tree first, tree t, st
>>TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
>>TYPE_NAME (v) = TYPE_NAME (t);
>>TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
>> +  TYPE_CANONICAL (v) = TYPE_CANONICAL (t);
>>add_tree_to_fld_list (v, fld);
>>return v;
>>  }
>> @@ -5146,6 +5147,10 @@ fld_incomplete_type_of (tree t, struct f
>>else
>>  first = build_reference_type_for_mode (t2, TYPE_MODE (t),
>>  TYPE_REF_CAN_ALIAS_ALL (t));
>> +  gcc_assert (TYPE_CANONICAL (t2) != t2
>> +  && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t))
>> +  && TYPE_CANONICAL (first)
>> + == TYPE_CANONICAL (TYPE_MAIN_VARIANT (t)));
>>add_tree_to_fld_list (first, fld);
>>return fld_type_variant (first, t, fld);
>>  }
>> @@ -5169,6 +5174,7 @@ fld_incomplete_type_of (tree t, struct f
>>SET_TYPE_MODE (copy, VOIDmode);
>>SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
>>TYPE_SIZE_UNIT (copy) = NULL;
>> +  TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
>>if (AGGREGATE_TYPE_P (t))
>>  {
>>TYPE_FIELDS (copy) = NULL;



Re: [PATCH] Verify that last argument of __builtin_expect_with_probability is a real cst (PR c/87811).

2018-11-05 Thread Martin Sebor

On 11/01/2018 07:45 AM, Martin Liška wrote:

On 11/1/18 1:15 PM, Jakub Jelinek wrote:

On Thu, Nov 01, 2018 at 01:09:16PM +0100, Martin Liška wrote:

-range 0.0 to 1.0, inclusive.
+range 0.0 to 1.0, inclusive.  The @var{probability} argument must be
+a compiler time constant.


When you say must, I think error_at should be used rather than warning_at.
If others disagree I'm open for leaving it as is.


Error is fine for me as well.




@@ -2474,6 +2481,11 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
  *predictor = PRED_BUILTIN_EXPECT_WITH_PROBABILITY;
  *probability = probi;
}
+ else
+ warning_at (gimple_location (def), 0,
+ "probability argument %qE must be a in the "
+ "range 0.0 to 1.0", prob);


Wrong indentation.

And, no diagnostics for -O0 (which should also be covered by a testcase).


Test for that added.




+/* { dg-options "-O2 -fdump-tree-profile_estimate -frounding-math" } */


Why the -frounding-math options?


I remember I had some issue with:
  tree r = fold_build2_initializer_loc (UNKNOWN_LOCATION,
                                        MULT_EXPR, t, prob, base);

on targets with non-IEEE floating-point arithmetic (s390?).

I think test coverage should handle both that and when that option is not
used if that option makes any difference.


It will eventually pop up if we install new tests w/o rounding math.



Jakub




Martin



I noticed a few minor issues in the hunks below:

--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12046,7 +12046,8 @@
 when testing pointer or floating-point values.

 This function has the same semantics as @code{__builtin_expect},
 but the caller provides the expected probability that @var{exp} == @var{c}.

 The last argument, @var{probability}, is a floating-point value in the
-range 0.0 to 1.0, inclusive.
+range 0.0 to 1.0, inclusive.  The @var{probability} argument must be
+a compiler time constant.

The term is "compile-time constant" but please see below.

--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2467,6 +2467,13 @@
 expr_expected_value_1 (tree type, tree op0, enum tree_code code,
  base = build_real_from_int_cst (t, base);
  tree r = fold_build2_initializer_loc (UNKNOWN_LOCATION,
                                        MULT_EXPR, t, prob, base);
+ if (TREE_CODE (r) != REAL_CST)
+   {
+ error_at (gimple_location (def),
+   "probability argument %qE must be a compile "
+   "time constant", prob);
+ return NULL;
}

According to the GCC coding conventions, the term "compile-time" should be
hyphenated when used as an adjective.  But the term used in other
diagnostics is either "constant integer" or "constant integer
expressions", so I would suggest using that instead, here and in the
manual.

@@ -2474,6 +2481,11 @@
 expr_expected_value_1 (tree type, tree op0, enum tree_code code,
  *predictor = PRED_BUILTIN_EXPECT_WITH_PROBABILITY;
  *probability = probi;
}
+ else
+   error_at (gimple_location (def),
+ "probability argument %qE must be a in the "
+ "range 0.0 to 1.0", prob);
+

There's a stray 'a' in the text of the error.

But it's not really meaningful to say

  3.14 must be in the range 0.0 to 1.0

because that simply cannot happen.  We could say "argument 2 must
be in the range" but I would instead suggest to rephrase the error
along the same lines as other similar messages GCC already issues:

  "probability %qE is outside the range [0.0, 1.0]"

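The suggested wording can be illustrated with a minimal standalone sketch.
This is not GCC code: check_probability is a hypothetical helper, and it
formats a plain double with snprintf where GCC would use error_at and %qE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not part of GCC): returns true if PROB is a valid
   probability.  Otherwise it writes a diagnostic in the suggested style
   into MSG and returns false.  */
static bool
check_probability (double prob, char *msg, size_t msg_size)
{
  assert (msg != NULL && msg_size > 0);

  /* The negated form also rejects NaN, for which every ordered
     comparison is false.  */
  if (!(prob >= 0.0 && prob <= 1.0))
    {
      snprintf (msg, msg_size,
                "probability %g is outside the range [0.0, 1.0]", prob);
      return false;
    }

  msg[0] = '\0';
  return true;
}
```

Calling check_probability (3.14, buf, sizeof buf) fills buf with a message
that names the offending value and the permitted range, without claiming
that the value "must be" anything.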
Martin


Re: [PATCH libquadmath/PR68686]

2018-11-05 Thread Joseph Myers
On Sat, 3 Nov 2018, Jeff Law wrote:

> Note that Joseph's follow-up doesn't touch on the gamma problem AFAICT,
> but instead touches on the larger issues around trying to keep the
> quadmath implementations between glibc and gcc more in sync.

The second version of my patch 
 does address 
the gamma problem.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH libquadmath/PR68686]

2018-11-05 Thread Jakub Jelinek
On Fri, Nov 02, 2018 at 11:43:03PM +, Joseph Myers wrote:
> Here's an updated version of the patch that also updates most of the 
> previously omitted libquadmath/math/ files that are based on glibc sources 
> (not fmaq.c or rem_pio2q.c), including *gamma*.  It adds exp2q and 
> issignalingq as new public interfaces, given how they are used in the 
> current glibc versions of some of the functions already present in 
> libquadmath, but doesn't add any other new functions from glibc.

LGTM (with a suitable ChangeLog).

Jakub


Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-05 Thread Renlin Li




On 11/05/2018 12:35 PM, Renlin Li wrote:

Hi Segher,

On 11/03/2018 02:34 AM, Jeff Law wrote:

On 11/2/18 5:54 PM, Segher Boessenkool wrote:

On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:

The original rtx is generated by expand_builtin_setjmp_receiver to adjust
the frame pointer.

Later, in LRA, it will try to eliminate the frame pointer in favour of the
hard frame pointer, as defined in ELIMINABLE_REGS.

Your change splits the insn into two, so it no longer matches the "from"
and "to" regs defined in ELIMINABLE_REGS.
The if statement that generates the adjustment insn is skipped,
and the original instruction is simply deleted!

I don't follow why, or what should have prevented it from being deleted.


Probably, we don't want to split the move rtx if they are related to
entries defined in ELIMINABLE_REGS?

One thing I can easily do is not making an intermediate pseudo when copying
*to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
looking at (setjmp-4.c).

This indeed helps, see patch below.  Could you try that on the whole
testsuite?

Thanks,


Segher


p.s. It still is a problem in the arm backend, but this won't hurt combine,
so why not.


 From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
Message-Id: 
<814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.seg...@kernel.crashing.org>
From: Segher Boessenkool 
Date: Fri, 2 Nov 2018 23:33:32 +
Subject: [PATCH] combine: Don't break up copy from hard to fixed reg

---
  gcc/combine.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index dfb0b44..15e941a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14998,6 +14998,8 @@ make_more_copies (void)
  continue;
    if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
  continue;
+  if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
+    continue;
    rtx new_reg = gen_reg_rtx (GET_MODE (dest));
    rtx_insn *new_insn = gen_move_insn (new_reg, src);
-- 1.8.3.1

It certainly helps the armeb test results.


Yes, I can also see it helps a lot with the regression test.
Thanks for working on it!


Besides the correctness issue, there are performance regressions, as other
people have also reported.

I analysed one case, gcc.c-torture/execute/builtins/memcpy-chk.c.
In this case, two additional register moves and callee saves are emitted.

The problem is that make_more_copies splits a move into two.  Ideally, the RA
could figure this out and make the best register allocation.  However, in
reality, the scheduler will in some cases reschedule the instructions, which
changes the live ranges of registers and thus the interference graph of the
pseudo registers.

This forces the RA to choose a different register for it, so the move
instruction is no longer redundant, or at least the RA can no longer
eliminate it.

For example,

set r102, r1

After combine:
insn x: set r103, r1
insn x+1: set r102, r103

After scheduler:
insn x: set r103, r1
...
...
...
insn x+1: set r102, r103

After IRA, r1 could be assigned to operands used in instructions in between
insn x and x+1, so r103 conflicts with r1.  LRA has to assign r103 a
different hard register.


Sorry, this is not correct. Instructions scheduled between x and x+1 directly 
use hard register r1.
It is not IRA/LRA assigning r1 to the operands.
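The effect of the scheduler on the conflict can be sketched with a toy
live-range model.  This is only an illustration: struct live_range and
ranges_interfere are hypothetical names, insn positions are plain integers,
and none of this is how IRA actually represents conflicts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a register is live from the insn that defines it
   (FIRST) up to its last use (LAST).  */
struct live_range
{
  int first;
  int last;
};

/* Two ranges interfere if they overlap in more than a single boundary
   insn.  A range that ends at the insn where the other begins (a plain
   copy) does not count as a conflict.  */
static bool
ranges_interfere (struct live_range a, struct live_range b)
{
  assert (a.first <= a.last && b.first <= b.last);
  return a.first < b.last && b.first < a.last;
}
```

With insn x at position 10, r1 = {0, 10} and r103 = {10, 11} do not
interfere, so the copy could be coalesced.  If the scheduler moves insns
that use r1 in between, so that r1 = {0, 13} and r103 = {10, 14}, the two
ranges interfere and r103 needs a different hard register.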


To reproduce this particular case, you could use:
cc1 -O3 -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp gcc.c-torture/execute/builtins/memcpy-chk.c

This insn has been split:

(insn 152 150 154 11 (set (mem/c:QI (plus:SI (reg/f:SI 266)
(const_int 24 [0x18])) [0 MEM[(void *)&p + 20B]+4 S1 A32])
(reg:QI 1 r1)) "memcpy-chk-reduce.c":48:3 189 {*arm_movqi_insn}
 (expr_list:REG_DEAD (reg:QI 1 r1)
(nil)))


Regards,
Renlin



This causes one additional move, and probably one more callee save/restore.

Nothing is obviously wrong here. But...

One simple case that is probably not beneficial is splitting a store from a
hard register.  According to your comment on make_more_copies, you might want
to apply the transformation only to hard-reg-to-pseudo copies?

Regards,
Renlin






Jeff



Re: [PATCH, testsuite] ignore some "conflicting types for built-in" messages

2018-11-05 Thread Paul Koning



> On Nov 5, 2018, at 11:45 AM, Jeff Law  wrote:
> 
>>> ...
>> 
>> I can do that, but I'm wondering if some systems have different prototypes
>> than the C standard calls for so I'd end up breaking those.
> I wouldn't worry about those.  I think the bigger question (thanks
> Martin) is whether or not any of those tests are checking for issues
> that arise specifically due to not having a full prototype available
> (and in those cases your fix is probably more appropriate).
> 
> Probably the only way to figure that out is to dig into the history of
> each one :(  Mighty unpleasant.
> 
> jeff

I took a quick look.  PR83655 is specifically about an issue due to a 
declaration with no prototype, but the others (55890, 71816, 83463, 83603, 
84244) are not so clear to me.  Still, what IS clear is that none of them are 
interested in messages that may or may not be generated as a result of these 
funny declarations.  In other words, pruning the messages still looks 
appropriate.

So where do I go from here?  Without the change I can deal with this by 
recognizing these cases as false failures when I do my test runs.

paul



Re: [PATCH libquadmath/PR68686]

2018-11-05 Thread Joseph Myers
On Sun, 4 Nov 2018, Ed Smith-Rowland wrote:

> I looked in glibc.  Unfortunately, I see how they have the same mistake:
> glibc/math/w_tgammal_compat.c:
>     long double
>     __tgammal(long double x)
>     {
>         int local_signgam;
>         long double y = __ieee754_gammal_r(x,&local_signgam);
>     ...
>     return local_signgam < 0 ? - y : y;
>     }
> I'm very sure this is where tgammaq came from.
> Ditto for glibc/math/w_tgamma_compat.c and glibc/math/w_tgammaf_compat.c.

No, that's not a mistake.  __ieee754_gammal_r returns +/- the gamma 
function and sets the integer pointed to by the second argument to 
indicate whether to negate the result.  (This isn't a particularly good 
interface design for tgamma, as opposed to lgamma; unfortunately 
__gammal_r_finite, with this interface, is a public ABI.)
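As a rough illustration of that interface shape, here is a toy model.  It is
not the glibc code: toy_abs_gamma_r and toy_tgamma are hypothetical names,
the helper handles only half-integer arguments k + 0.5, and it returns the
magnitude with the sign reported separately, which simplifies what the real
routine does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns |Gamma (k + 0.5)| for integer K and stores
   -1 or 1 in *SIGN to tell the caller whether to negate the result.  */
static double
toy_abs_gamma_r (int k, int *sign)
{
  double g = 1.7724538509055160273;   /* Gamma (0.5) = sqrt (pi) */
  double x = 0.5;

  assert (sign != NULL);

  /* Gamma (x + 1) = x * Gamma (x), stepping upwards.  */
  while (k > 0)
    {
      g *= x;
      x += 1.0;
      k--;
    }

  /* Gamma (x) = Gamma (x + 1) / x, stepping downwards; the sign flips
     each time a negative X is divided out.  */
  while (k < 0)
    {
      x -= 1.0;
      g /= x;
      k++;
    }

  *sign = g < 0 ? -1 : 1;
  return g < 0 ? -g : g;
}

/* Wrapper in the style of the quoted __tgammal: apply the sign locally
   instead of touching a global signgam.  */
static double
toy_tgamma (int k)
{
  int local_signgam;
  double y = toy_abs_gamma_r (k, &local_signgam);
  return local_signgam < 0 ? -y : y;
}
```

In this model toy_tgamma (-1) is negative and toy_tgamma (-2) is positive,
which is the alternation of signs the wrapper has to restore.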

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH libquadmath/PR68686]

2018-11-05 Thread Joseph Myers
On Sun, 4 Nov 2018, Ed Smith-Rowland wrote:

> I *do* think a couple tests should be added to test-signgam-*.c to test
> alternation of signs:

The main tests for results of libm functions are in auto-libm-test-in 
(from which auto-libm-test-out-* are generated by gen-auto-libm-tests.c) 
and libm-test-*.inc.  I believe these already cover signgam setting 
thoroughly.  test-signgam-*.c are specifically for ISO C namespace issues 
(an ISO C program should be able to define its own variable called signgam 
and not have lgamma affect it, but an XSI program must be able to use the 
signgam variable defined in libm and have it affected by lgamma).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Joseph Myers
On Sun, 4 Nov 2018, Jeff Law wrote:

> Don't we have a flag specific to honoring nans?  Would that be better to
> use than flag_unsafe_math_optimizations?  As Uli mentioned, there's

That's only relevant for the comparison optimization, of course.

Converting erfc to 1-erf is dubious, since the whole point of erfc is for 
cases where 1-erf is inaccurate.  (Conversion in the other direction also 
needs flag_unsafe_math_optimizations.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Jeff Law
On 11/5/18 11:27 AM, Joseph Myers wrote:
> On Sun, 4 Nov 2018, Jeff Law wrote:
> 
>> Don't we have a flag specific to honoring nans?  Would that be better to
>> use than flag_unsafe_math_optimizations?  As Uli mentioned, there's
> 
> That's only relevant for the comparison optimization, of course.
> 
> Converting erfc to 1-erf is dubious, since the whole point of erfc is for 
> cases where 1-erf is inaccurate.  (Conversion in the other direction also 
> needs flag_unsafe_math_optimizations.)
> 
Understood.  Thanks for clarifying.  It seems like
unsafe-math-optimization is a better fit than the nan specific flag.

jeff


Re: [PATCH] Come up with htab_hash_string_vptr and use string-specific if possible.

2018-11-05 Thread Michael Matz
Hi,

On Fri, 2 Nov 2018, Martin Liška wrote:

> V2 of the patch.
> 
> Thoughts?

Wherever the new function belongs, it certainly isn't system.h.  Also, the
definition in a header seems excessive.  Sure, it enables inlining of it,
but that seems like premature optimization.  It contains a loop, and inlining
anything with loops that aren't very likely to loop just once or never
just bloats code for no gain.  Also, as the function is a leaf there won't be
any second-order effect from inlining.
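For illustration, an out-of-line layout could look roughly like the sketch
below.  The name sketch_hash_string is hypothetical, and the multiplier and
offset are just one simple multiplicative scheme, not necessarily what the
patch or libiberty uses.

```c
#include <assert.h>
#include <stddef.h>

/* This declaration is all a header would need to carry.  */
unsigned int sketch_hash_string (const char *s);

/* The definition, with its loop, lives in exactly one .c file, so the
   loop body is not copied into every caller.  */
unsigned int
sketch_hash_string (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  unsigned int h = 0;
  unsigned char c;

  assert (s != NULL);

  /* One multiply-add per character; unsigned arithmetic wraps.  */
  while ((c = *p++) != 0)
    h = h * 67 + c - 113;

  return h;
}
```

Callers pay for one call per string, which is small next to the
per-character loop for anything but very short strings.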


Ciao,
Michael.

Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Michael Matz
Hi,

On Mon, 5 Nov 2018, Jeff Law wrote:

> >> Don't we have a flag specific to honoring nans?  Would that be better 
> >> to use than flag_unsafe_math_optimizations?  As Uli mentioned, 
> >> there's
> > 
> > That's only relevant for the comparison optimization, of course.
> > 
> > Converting erfc to 1-erf is dubious, since the whole point of erfc is 
> > for cases where 1-erf is inaccurate.  (Conversion in the other 
> > direction also needs flag_unsafe_math_optimizations.)
> > 
> Understood.  Thanks for clarifying.  It seems like 
> unsafe-math-optimization is a better fit than the nan specific flag.

But still we should consider general usefulness, even with unsafe-math.
In this case we would replace a use of a slow function that the user
specifically chose to deal with inaccuracies with an equally slow function
(plus a little arithmetic that is shadowed by the function's slowness) that
now produces exactly the inaccuracies the user wanted to avoid.  I.e. the
speed gain is zero.  The "canonicalization gain" referred to in the PR
might be real, but it comes at the cost of introducing definite
catastrophic cancellation.

IMHO that's not a sensible transformation to do, under any flags.


Ciao,
Michael.


[PATCH] gcc: xtensa: don't force PIC for uclinux target

2018-11-05 Thread Max Filippov
xtensa-uclinux uses bFLT executable file format that cannot relocate
fields representing offsets from data to code. C++ objects built as PIC
use offsets to encode FDE structures. As a result C++ exception handling
doesn't work correctly on xtensa-uclinux. Don't use PIC by default on
xtensa-uclinux.

gcc/
2018-11-04  Max Filippov  

* config/xtensa/uclinux.h (XTENSA_ALWAYS_PIC): Change to 0.
---
 gcc/config/xtensa/uclinux.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/uclinux.h b/gcc/config/xtensa/uclinux.h
index ba26187c8f7a..1cb334919c7c 100644
--- a/gcc/config/xtensa/uclinux.h
+++ b/gcc/config/xtensa/uclinux.h
@@ -60,7 +60,7 @@ along with GCC; see the file COPYING3.  If not see
 #define LOCAL_LABEL_PREFIX "."
 
 /* Always enable "-fpic" for Xtensa Linux.  */
-#define XTENSA_ALWAYS_PIC 1
+#define XTENSA_ALWAYS_PIC 0
 
 #undef TARGET_LIBC_HAS_FUNCTION
 #define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
-- 
2.11.0



[PATCH v2] gcc: xtensa: don't force PIC for uclinux target

2018-11-05 Thread Max Filippov
xtensa-uclinux uses bFLT executable file format that cannot relocate
fields representing offsets from data to code. C++ objects built as PIC
use offsets to encode FDE structures. As a result C++ exception handling
doesn't work correctly on xtensa-uclinux. Don't use PIC by default on
xtensa-uclinux.

gcc/
2018-11-04  Max Filippov  

* config/xtensa/uclinux.h (XTENSA_ALWAYS_PIC): Change to 0.
---
Changes v1->v2:
- fix up comment for the XTENSA_ALWAYS_PIC macro

 gcc/config/xtensa/uclinux.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/uclinux.h b/gcc/config/xtensa/uclinux.h
index ba26187c8f7a..c7743df9d97c 100644
--- a/gcc/config/xtensa/uclinux.h
+++ b/gcc/config/xtensa/uclinux.h
@@ -59,8 +59,8 @@ along with GCC; see the file COPYING3.  If not see
 #undef LOCAL_LABEL_PREFIX
 #define LOCAL_LABEL_PREFIX "."
 
-/* Always enable "-fpic" for Xtensa Linux.  */
-#define XTENSA_ALWAYS_PIC 1
+/* Don't enable "-fpic" for Xtensa uclinux.  */
+#define XTENSA_ALWAYS_PIC 0
 
 #undef TARGET_LIBC_HAS_FUNCTION
 #define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
-- 
2.11.0
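As a rough model of what such a target macro does, consider the sketch
below.  It is an assumption for illustration only: effective_flag_pic is a
hypothetical function, and the real option-override logic in the xtensa
backend has more cases than this.

```c
#include <assert.h>

/* Hypothetical model: USER_FLAG_PIC is 0 (no PIC), 1 (-fpic) or 2 (-fPIC)
   as requested on the command line.  ALWAYS_PIC stands in for the value
   of a target macro such as XTENSA_ALWAYS_PIC.  */
static int
effective_flag_pic (int always_pic, int user_flag_pic)
{
  assert (user_flag_pic >= 0 && user_flag_pic <= 2);

  /* A target that always wants PIC turns it on even when the user did
     not ask for it.  */
  if (always_pic && user_flag_pic == 0)
    return 1;

  return user_flag_pic;
}
```

With the macro at 0, a plain compile stays non-PIC, while an explicit -fpic
or -fPIC from the user is still honoured.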



Re: PR83750: CSE erf/erfc pair

2018-11-05 Thread Paul Koning



> On Nov 5, 2018, at 1:48 PM, Michael Matz  wrote:
> 
> Hi,
> 
> On Mon, 5 Nov 2018, Jeff Law wrote:
> 
>>>> Don't we have a flag specific to honoring nans?  Would that be better 
>>>> to use than flag_unsafe_math_optimizations?  As Uli mentioned, 
>>>> there's
>>> 
>>> That's only relevant for the comparison optimization, of course.
>>> 
>>> Converting erfc to 1-erf is dubious, since the whole point of erfc is 
>>> for cases where 1-erf is inaccurate.  (Conversion in the other 
>>> direction also needs flag_unsafe_math_optimizations.)
>>> 
>> Understood.  Thanks for clarifying.  It seems like 
>> unsafe-math-optimization is a better fit than the nan specific flag.
> 
> But still we should consider general usefulness, even with unsafe-math.
> In this case we would replace a use of a slow function that the user
> specifically chose to deal with inaccuracies with an equally slow function
> (plus a little arithmetic that is shadowed by the function's slowness) that
> now produces exactly the inaccuracies the user wanted to avoid.  I.e. the
> speed gain is zero.  The "canonicalization gain" referred to in the PR
> might be real, but it comes at the cost of introducing definite
> catastrophic cancellation.
> 
> IMHO that's not a sensible transformation to do, under any flags.

That seems right.  The same goes for log vs. log1p, and exp vs. expm1.
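The cancellation is easy to see in the exp vs. expm1 case.  The sketch below
uses short hand-written Taylor series instead of the libm functions, so the
helper names (exp_small, expm1_small, naive_expm1) are hypothetical and the
code is only meaningful for very small x.

```c
#include <assert.h>

/* Truncated Taylor series for exp (x); assumes |x| is much smaller
   than 1.  */
static double
exp_small (double x)
{
  assert (x > -1e-3 && x < 1e-3);
  return 1.0 + x + x * x / 2.0 + x * x * x / 6.0;
}

/* The same series without the leading 1, i.e. exp (x) - 1 computed
   directly, so no large term swamps the small ones.  */
static double
expm1_small (double x)
{
  assert (x > -1e-3 && x < 1e-3);
  return x + x * x / 2.0 + x * x * x / 6.0;
}

/* The "canonicalized" form: compute exp (x) first, then subtract 1.
   The volatile forces the intermediate to be rounded to double.  */
static double
naive_expm1 (double x)
{
  volatile double e = exp_small (x);
  return e - 1.0;
}
```

For x = 1e-10 the direct form keeps essentially full double precision, while
the naive form inherits the rounding error of 1.0 + x and loses a large part
of its significant digits.  The same mechanism applies to 1 - erf (x) for
large x.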

paul



Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-05 Thread Jeff Law
On 11/1/18 4:07 PM, Peter Bergner wrote:
> On 11/1/18 1:50 PM, Renlin Li wrote:
>> Is there any update on this issue?
>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
> 
> From the analysis I've done, my commit is just exposing latent issues
> in LRA.  Can you try the patch I submitted here to see if it helps?
> 
>   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
> 
> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
> Jeff threw it on his testers and said he saw an arm issue and was
> trying to come up with a test case for me to debug.
So I don't think the ARM issues are related to your patch, they may have
been related to the combiner changes that went in around the same time.

At this point your patch appears to be DTRT across the board.  The only
fallout is the bogus s390 asm it caught in the kernel.

Jeff


Re: [PATCH libquadmath/PR68686]

2018-11-05 Thread Ed Smith-Rowland

On 11/5/18 1:19 PM, Joseph Myers wrote:

On Sun, 4 Nov 2018, Ed Smith-Rowland wrote:


I looked in glibc.  Unfortunately, I see how they have the same mistake:
glibc/math/w_tgammal_compat.c:
     long double
     __tgammal(long double x)
     {
         int local_signgam;
         long double y = __ieee754_gammal_r(x,&local_signgam);
     ...
     return local_signgam < 0 ? - y : y;
     }
I'm very sure this is where tgammaq came from.
Ditto for glibc/math/w_tgamma_compat.c and glibc/math/w_tgammaf_compat.c.

No, that's not a mistake.  __ieee754_gammal_r returns +/- the gamma
function and sets the integer pointed to by the second argument to
indicate whether to negate the result.  (This isn't a particularly good
interface design for tgamma, as opposed to lgamma; unfortunately
__gammal_r_finite, with this interface, is a public ABI.)

Excellent, I missed the replacement of expq (lgammaq (x)) with the
re-entrant lgamma with the good sign.

Thank you.  I'll let someone else check this off ;-)
FWIW, I'd like to see C++ at least return narrow types like
template<typename _Tp>
  struct lgamma_t
  {
    _Tp log_abs_gamma; // Or something less verbose.
    int sign; // Maybe try for same size as _Tp.
  };
C could do something like this also.  That's a discussion for another forum.



Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-05 Thread Peter Bergner
On 11/5/18 1:20 PM, Jeff Law wrote:
> On 11/1/18 4:07 PM, Peter Bergner wrote:
>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>> Is there any update on this issue?
>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
>>
>> From the analysis I've done, my commit is just exposing latent issues
>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>
>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>> Jeff threw it on his testers and said he saw an arm issue and was
>> trying to come up with a test case for me to debug.
> So I don't think the ARM issues are related to your patch, they may have
> been related to the combiner changes that went in around the same time.
> 
> At this point your patch appears to be DTRT across the board.  The only
> fallout is the bogus s390 asm it caught in the kernel.

Cool.  I will note that I contacted the s390 kernel guys and gave them a
fix to their broken constraints in that asm and they are going to fix it.

Is the above an approval to commit the patch mentioned above or do you
still want to wait until the ARM issues are fully resolved?

Peter



Re: [PATCH, testsuite] add "inf" target attribute

2018-11-05 Thread Paul Koning



> On Nov 4, 2018, at 2:33 PM, Jeff Law  wrote:
> 
> On 11/1/18 1:30 PM, Paul Koning wrote:
>> A number of test cases fail on pdp11 because they use the "inf" float value 
>> which does not exist on that target (nor on VAX).  Rainer Orth and Joseph 
>> Myers suggested adding a new effective-target keyword to check for this, and 
>> require it for tests that have that dependency.
>> 
>> The attached patch implements this.  Ok for trunk?
>> 
>>  paul
>> 
>> ChangeLog:
>> 
>> 2018-11-01  Paul Koning  
>> 
>>  * doc/sourcebuild.texi (target attributes): Document new "inf"
>>  effective target keyword.
> OK with me.
> 
> jeff

Thanks.  Committed, with the doc change clarified to address Joseph's comment.

paul

Index: doc/sourcebuild.texi
===
--- doc/sourcebuild.texi(revision 265814)
+++ doc/sourcebuild.texi(revision 265815)
@@ -1393,8 +1393,11 @@ for any options added with @code{dg-add-options}.
 Target has runtime support for any options added with
 @code{dg-add-options} for any @code{_Float@var{n}} or
 @code{_Float@var{n}x} type.
+
+@item inf
+Target supports floating point infinite (@code{inf}) for type
+@code{double}.
 @end table
-
 @subsubsection Fortran-specific attributes
 
 @table @code
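To make the dependency concrete, here is a small runtime probe in C.  It is
only a sketch with hypothetical names (target_has_inf, require_inf); the
effective-target check in the testsuite is implemented differently, and on a
target without infinities the overflow below may trap instead of producing a
usable value.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical probe: on a target whose double has an infinity, doubling
   the largest finite value overflows to a value that compares greater
   than DBL_MAX.  The volatiles keep the compiler from folding the
   computation at compile time.  */
static bool
target_has_inf (void)
{
  volatile double big = DBL_MAX;
  volatile double doubled = big * 2.0;
  return doubled > DBL_MAX;
}

/* A test that depends on infinity would bail out here on targets such as
   pdp11 that do not provide one.  */
static void
require_inf (void)
{
  assert (target_has_inf ());
}
```

Tests that call something like require_inf correspond to the ones that now
carry the "inf" effective-target requirement.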



[PATCH 0/9]: C++ P0482R5 char8_t implementation

2018-11-05 Thread Tom Honermann
This series of patches provides an implementation of the core language 
and library changes for C++ proposal P0482R5 [1].  These changes are 
believed to be complete with the exception of the proposed mbrtoc8() and 
c8rtomb() functions (the expectation is that the C library will provide 
mbrtoc8() and c8rtomb(); future patches will address that support and 
integration).


These changes do not impact default gcc behavior.  A new -fchar8_t 
option is provided to enable the P0482R5 changes, and -fno-char8_t is 
provided to explicitly disable them.


Patch 1: Documentation updates
Patch 2: Core language support
Patch 3: New core language tests
Patch 4: Updates to existing core language tests
Patch 5: Standard library support
Patch 6: A small correction to a common testsuite header file
Patch 7: New standard library tests
Patch 8: Updates to existing standard library tests
Patch 9: Updates to gdb pretty printing support

Tom.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r5.html


[PATCH 1/9]: C++ P0482R5 char8_t: Documentation updates

2018-11-05 Thread Tom Honermann

This patch adds documentation for new -fchar8_t and -fno-char8_t options.

gcc/ChangeLog:

2018-11-04  Tom Honermann  
 * doc/invoke.texi (-fchar8_t): Document new option.

Tom.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 57491f1033c..cd3a2a715db 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -206,7 +206,7 @@ in the following sections.
 @item C++ Language Options
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n}  -fno-access-control @gol
--faligned-new=@var{n}  -fargs-in-order=@var{n}  -fcheck-new @gol
+-faligned-new=@var{n}  -fargs-in-order=@var{n}  -fchar8_t -fcheck-new @gol
 -fconstexpr-depth=@var{n}  -fconstexpr-loop-limit=@var{n} @gol
 -fno-elide-constructors @gol
 -fno-enforce-eh-specs @gol
@@ -2432,6 +2432,53 @@ but few users will need to override the default of
 
 This flag is enabled by default for @option{-std=c++17}.
 
+@item -fchar8_t
+@itemx -fno-char8_t
+@opindex fchar8_t
+@opindex fno-char8_t
+Enable support for the P0482 proposal including the addition of a
+new @code{char8_t} fundamental type, changes to the types of UTF-8
+string and character literals, new signatures for user defined
+literals, and new specializations of standard library class templates
+@code{std::numeric_limits}, @code{std::char_traits},
+and @code{std::hash}.
+
+This option enables functions to be overloaded for ordinary and UTF-8
+strings:
+
+@smallexample
+int f(const char *);// #1
+int f(const char8_t *); // #2
+int v1 = f("text"); // Calls #1
+int v2 = f(u8"text");   // Calls #2
+@end smallexample
+
+and introduces new signatures for user defined literals:
+
+@smallexample
+int operator""_udl1(char8_t);
+int v3 = u8'x'_udl1;
+int operator""_udl2(const char8_t*, std::size_t);
+int v4 = u8"text"_udl2;
+template int operator""_udl3();
+int v5 = u8"text"_udl3;
+@end smallexample
+
+The change to the types of UTF-8 string and character literals introduces
+incompatibilities with ISO C++11 and later standards.  For example, the
+following code is well-formed under ISO C++11, but is ill-formed when
+@option{-fchar8_t} is specified.
+
+@smallexample
+char ca[] = u8"text";   // error: char-array initialized from wide string
+const char *cp = u8"text";  // error: invalid conversion from 'const char8_t*' to 'const char*'
+int f(const char*);
+auto v = f(u8"text");   // error: invalid conversion from 'const char8_t*' to 'const char*'
+std::string s1@{u8"text"@};   // error: no matching function for call to 'std::basic_string::basic_string()'
+using namespace std::literals;
+std::string s2 = u8"text"s; // error: conversion from 'basic_string' to non-scalar type 'basic_string' requested
+@end smallexample
+
 @item -fcheck-new
 @opindex fcheck-new
 Check that the pointer returned by @code{operator new} is non-null



[PATCH 2/9]: C++ P0482R5 char8_t: Core language support

2018-11-05 Thread Tom Honermann
This patch adds support for the P0482R5 core language changes.  This 
includes:

- The -fchar8_t and -fno-char8_t command line options.
- char8_t as a keyword.
- The char8_t builtin type as a non-aliasing unsigned integral
  character type of size 1.
- Use of char8_t as a simple type specifier.
- u8 character literals with type char8_t.
- u8 string literals with type array of const char8_t.
- User defined literal operators that accept char8_t and char8_t pointer
  types.
- New __cpp_char8_t predefined feature test macro.
- New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
  macros.
- Name mangling and demangling for char8_t (using Du).
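
One reason an unsigned one-byte code unit type is convenient can be sketched
in plain C.  This is only an analogy: sketch_char8 is a hypothetical typedef
standing in for char8_t, and it does not model the non-aliasing property or
any of the C++ overloading behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for char8_t: an unsigned type of size 1.  */
typedef unsigned char sketch_char8;

/* Count code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx).  Because the code units are
   unsigned, bytes of 0x80 and above need no special handling for sign
   extension.  */
static size_t
utf8_code_points (const sketch_char8 *s)
{
  size_t n = 0;

  assert (s != NULL);

  for (; *s != 0; ++s)
    if ((*s & 0xC0) != 0x80)
      ++n;

  return n;
}
```

For the byte sequence 'a', 0xC3, 0xA9 (an "a" followed by a two-byte
sequence) this counts two code points from three code units.
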

gcc/ChangeLog:

2018-11-04  Tom Honermann  

 * defaults.h: Define CHAR8_TYPE.

gcc/c-family/ChangeLog:

2018-11-04  Tom Honermann  
 * c-family/c-common.c (c_common_reswords): Add char8_t.
 (fix_string_type): Use char8_t for the type of u8 string literals.
 (c_common_get_alias_set): char8_t doesn't alias.
 (c_common_nodes_and_builtins): Define char8_t as a builtin type in
 C++.
 (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
 (keyword_begins_type_specifier): Add RID_CHAR8.
 * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
 (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
 Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
 Define char8_type_node and char8_array_type_node.
 * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
 __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
 (c_cpp_builtins): Predefine __cpp_char8_t.
 * c-family/c-lex.c (lex_string): Use char8_array_type_node as the
 type of CPP_UTF8STRING.
 (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
 * c-family/c.opt: Add the -fchar8_t command line option.

gcc/c/ChangeLog:

2018-11-04  Tom Honermann  

 * c/c-typeck.c (char_type_p): Add char8_type_node.
 (digest_init): Handle initialization by a u8 string literal of
 char8_t type.

gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  

 * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
 * cp/decl.c (grokdeclarator): Handle invalid type specifier
 combinations involving char8_t.
 * cp/lex.c (init_reswords): Add char8_t as a reserved word.
 * cp/mangle.c (write_builtin_type): Add name mangling for char8_t
 (Du).
 * cp/parser.c (cp_keyword_starts_decl_specifier_p,
 cp_parser_simple_type_specifier): Recognize char8_t as a simple
 type specifier.
 (cp_parser_string_literal): Use char8_array_type_node for the type
 of CPP_UTF8STRING.
 (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
 headers.
 * cp/rtti.c (emit_support_tinfos): type_info support for char8_t.
 * cp/tree.c (char_type_p): Recognize char8_t as a character type.
 * cp/typeck.c (string_conv_p): Handle conversions of u8 string
 literals of char8_t type.
 (check_literal_operator_args): Handle UDLs with u8 string literals
 of char8_t type.
 * cp/typeck2.c (digest_init_r): Disallow initializing a char array
 with a u8 string literal.

libiberty/ChangeLog:

2018-10-31  Tom Honermann  
 * cp-demangle.c (cplus_demangle_builtin_types,
 cplus_demangle_type): Add name demangling for char8_t (Du).
 * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
 new char8_t type.

Tom.
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f10cf89c3a7..c7d88eb9a22 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -79,6 +79,7 @@ machine_mode c_default_pointer_mode = VOIDmode;
 	tree signed_char_type_node;
 	tree wchar_type_node;
 
+	tree char8_type_node;
 	tree char16_type_node;
 	tree char32_type_node;
 
@@ -128,6 +129,11 @@ machine_mode c_default_pointer_mode = VOIDmode;
 
 	tree wchar_array_type_node;
 
+   Type `char8_t[SOMENUMBER]' or something like it.
+   Used when a UTF-8 string literal is created.
+
+	tree char8_array_type_node;
+
Type `char16_t[SOMENUMBER]' or something like it.
Used when a UTF-16 string literal is created.
 
@@ -450,6 +456,7 @@ const struct c_common_resword c_common_reswords[] =
   { "case",		RID_CASE,	0 },
   { "catch",		RID_CATCH,	D_CXX_OBJC | D_CXXWARN },
   { "char",		RID_CHAR,	0 },
+  { "char8_t",		RID_CHAR8,	D_CXX_CHAR8_T_FLAGS | D_CXXWARN },
   { "char16_t",		RID_CHAR16,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "char32_t",		RID_CHAR32,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "class",		RID_CLASS,	D_CXX_OBJC | D_CXXWARN },
@@ -746,6 +753,11 @@ fix_string_type (tree value)
   nchars = length;
   e_type = char_type_node;
 }
+  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
+{
+  nchars = length / (TYPE_PRECISION (char8_type_node) / BITS_PER_UNIT);
+  e_type = char8_type_node;
+}
   else if (TREE_TYPE (value) == char16_array_type_node)
 {
   nchars = length / (TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT);
@@ -813,7 +825,8 @@ fix_string_type (tree value)
C

[PATCH 3/9]: C++ P0482R5 char8_t: New core language tests

2018-11-05 Thread Tom Honermann
This patch adds new tests that exercise the new behavior when support
for char8_t is enabled and that protect against unintended behavioral
impact when support for char8_t is not enabled.  In some cases, existing
tests suffice to exercise existing behavior, and such tests have been
cloned to validate behavior when char8_t is enabled.  In other cases,
tests are added to validate behavior both when char8_t support is and is
not enabled.
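
As a rough illustration of the kind of property these tests check (this
is a sketch, not one of the test files in the patch), the element type of
a u8 string literal can be inspected at compile time.  The alias name
u8_elem_type is invented here; __cpp_char8_t is the feature test macro
named in the ChangeLog entries.

```cpp
#include <type_traits>

// Element type of a u8 string literal, with the reference and the
// cv-qualifiers stripped.  u8_elem_type is a name made up for this sketch.
typedef std::remove_cv<
  std::remove_reference<decltype(u8""[0])>::type>::type u8_elem_type;

#ifdef __cpp_char8_t
// char8_t support enabled: u8 literals are arrays of const char8_t.
static_assert(std::is_same<u8_elem_type, char8_t>::value,
              "u8 literal should have char8_t elements");
#else
// char8_t support not enabled: u8 literals are arrays of const char.
static_assert(std::is_same<u8_elem_type, char>::value,
              "u8 literal should have char elements");
#endif
```

Compiling the same file with and without char8_t support should take the
two different branches, which is the "-1.C" / "-2.C" pairing used by many
of the tests listed below.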


gcc/testsuite/ChangeLog:

2018-11-04  Tom Honermann  
 * g++.dg/cpp0x/udlit-implicit-conv-neg-char8_t.C: New test cloned
 from udlit-implicit-conv-neg.C.  Validates handling of ill-formed
 uses of char8_t based user defined literals.
 * g++.dg/cpp0x/udlit-resolve-char8_t.C: New test cloned from
 udlit-resolve.C.  Validates handling of well-formed uses of char8_t
 based user defined literals.
 * g++.dg/ext/char8_t-aliasing-1.C: New test; validates warnings
 for type punning with char8_t types.  Illustrates that char8_t does
 not alias.
 * g++.dg/ext/char8_t-char-literal-1.C: New test; validates u8
 character literals have type char if char8_t support is not
 enabled.
 * g++.dg/ext/char8_t-char-literal-2.C: New test; validates u8
 character literals have type char8_t if char8_t support is
 enabled.
 * g++.dg/ext/char8_t-deduction-1.C: New test; validates char is
 deduced for u8 character and string literals if char8_t support is
 not enabled.
 * g++.dg/ext/char8_t-deduction-2.C: New test; validates char8_t is
 deduced for u8 character and string literals if char8_t support is
 enabled.
 * g++.dg/ext/char8_t-feature-test-macro-1.C: New test; validates
 that the __cpp_char8_t feature test macro is not defined if char8_t
 support is not enabled.
 * g++.dg/ext/char8_t-feature-test-macro-2.C: New test; validates
 that the __cpp_char8_t feature test macro is defined with the
 correct value if char8_t support is enabled.
 * g++.dg/ext/char8_t-init-1.C: New test; validates initialization
 by u8 character and string literals when support for char8_t is not
 enabled.
 * g++.dg/ext/char8_t-init-2.C: New test; validates initialization
 by u8 character and string literals when support for char8_t is
 enabled.
 * g++.dg/ext/char8_t-keyword-1.C: New test; validates that char8_t
 is not a keyword if support for char8_t is not enabled.
 * g++.dg/ext/char8_t-keyword-2.C: New test; validates that char8_t
 is a keyword if support for char8_t is enabled.
 * g++.dg/ext/char8_t-limits-1.C: New test; validates that char8_t
 is unsigned and sufficiently large to store the required range of
 char8_t values.
 * g++.dg/ext/char8_t-overload-1.C: New test; validates overload
 resolution for u8 character and string literal arguments when
 support for char8_t is not enabled.
 * g++.dg/ext/char8_t-overload-2.C: New test; validates overload
 resolution for u8 character and string literal arguments when
 support for char8_t is enabled.
 * g++.dg/ext/char8_t-predefined-macros-1.C: New test; validates
 that the __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE
 predefined macros are not defined when support for char8_t is not
 enabled.
 * g++.dg/ext/char8_t-predefined-macros-2.C: New test; validates
 that the __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE
 predefined macros are defined when support for char8_t is enabled.
 * g++.dg/ext/char8_t-sizeof-1.C: New test; validates that the size
 of char8_t and u8 character literals is 1.
 * g++.dg/ext/char8_t-specialization-1.C: New test; validates
 template specialization for u8 character literal template
 arguments when support for char8_t is not enabled.
 * g++.dg/ext/char8_t-specialization-2.C: New test; validates
 template specialization for char8_t and u8 character literal
 template arguments when support for char8_t is enabled.
 * g++.dg/ext/char8_t-string-literal-1.C: New test; validates the
 type of u8 string literals when support for char8_t is not enabled.
 * g++.dg/ext/char8_t-string-literal-2.C: New test; validates the
 type of u8 string literals when support for char8_t is enabled.
 * g++.dg/ext/char8_t-type-specifier-1.C: New test; validates that
 char8_t is not recognized as a type specifier when support for
 char8_t is not enabled.
 * g++.dg/ext/char8_t-type-specifier-2.C: New test; validates that
 char8_t is recognized as a type specifier when support for char8_t
 is enabled.
 * g++.dg/ext/char8_t-typedef-1.C: New test; validates that
 declarations of char8_t as a typedef are accepted when support for
 char8_t is not enabled.
 * g++.dg/ext/char8_t-typedef-2.C: New test; validates that
 declarations of char8_t as a typedef are not accepted when support
 for char8_t is enabled.
 * g++.dg/ext/char8_t-udl-1.C: New test; validates overloading for
 u8 char

[PATCH 6/9]: C++ P0482R5 char8_t: A small correction to a common testsuite header file

2018-11-05 Thread Tom Honermann
This patch corrects ambiguous partial specializations of 
typelist::detail::append_.  Previously, neither
append_<chain<Hd, Tl>, Typelist_Chain> nor
append_<Typelist_Chain, null_type> was a better match for
append_<chain<Hd, Tl>, null_type>.
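
The ambiguity can be reproduced with a cut-down stand-in for the
typelist machinery.  The sketch below is not the header itself; chain,
null_type and append_ are simplified re-declarations written for this
illustration, and it already uses the constrained specialization, so it
compiles.

```cpp
#include <type_traits>

struct null_type { };

template<typename Hd, typename Tl>
  struct chain { };

template<typename Typelist_Chain0, typename Typelist_Chain1>
  struct append_;

// #1: non-empty first list, any second list.
template<typename Hd, typename Tl, typename Typelist_Chain>
  struct append_<chain<Hd, Tl>, Typelist_Chain>
  {
    typedef chain<Hd, typename append_<Tl, Typelist_Chain>::type> type;
  };

// #2: empty first list.
template<typename Typelist_Chain>
  struct append_<null_type, Typelist_Chain>
  {
    typedef Typelist_Chain type;
  };

// #3: empty second list, constrained to a chain first list.  Written as
// append_<Typelist_Chain, null_type> instead, neither #1 nor #3 would be
// more specialized for append_<chain<Hd, Tl>, null_type>.
template<typename Hd, typename Tl>
  struct append_<chain<Hd, Tl>, null_type>
  {
    typedef chain<Hd, Tl> type;
  };

// #4: both lists empty.
template<>
  struct append_<null_type, null_type>
  {
    typedef null_type type;
  };
```

With #3 constrained, every match for #3 is also a match for #1 but not
the other way round, so partial ordering picks #3.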


libstdc++-v3/ChangeLog:

2018-11-04  Tom Honermann  

 * include/ext/typelist.h: Constrained a partial specialization of
   typelist::detail::append_ to only match chain.

Tom.
diff --git a/libstdc++-v3/include/ext/typelist.h b/libstdc++-v3/include/ext/typelist.h
index b21f01ffb43..2cdbc3efafa 100644
--- a/libstdc++-v3/include/ext/typelist.h
+++ b/libstdc++-v3/include/ext/typelist.h
@@ -215,10 +215,10 @@ namespace detail
   typedef Typelist_Chain 			  		type;
 };
 
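-  template<typename Typelist_Chain>
-struct append_<Typelist_Chain, null_type>
+  template<typename Hd, typename Tl>
+struct append_<chain<Hd, Tl>, null_type>
 {
-  typedef Typelist_Chain 	type;
+  typedef chain<Hd, Tl> 	type;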
 };
 
   template<>


[PATCH 5/9]: C++ P0482R5 char8_t: Standard library support

2018-11-05 Thread Tom Honermann
This patch adds support to libstdc++ for the P0482R5 standard library 
changes.  This includes:

- New char8_t based specializations:
  - std::numeric_limits
  - std::char_traits
  - std::hash
  - std::hash
  - std::hash
  - std::codecvt
  - std::codecvt
  - std::codecvt_byname
  - std::codecvt_byname
- New char8_t overloads:
  - u8string operator""s(const char8_t* str, size_t len);
  - u8string_view operator""sv(const char8_t* str, size_t len);
- New type aliases:
  - std::u8string
  - std::u8string_view
  - std::atomic_char8_t
- Changed function signatures:
  - filesystem::path::u8string() returns u8string.
  - filesystem::path::generic_u8string() returns u8string.
- typeinfo for char8_t.
- New macros:
  - __cpp_lib_char8_t
  - ATOMIC_CHAR8_T_LOCK_FREE

For types and templates that existed in an experimental form prior to 
standardization, both the experimental and standardized variants have 
been updated.  The updates to the experimental versions are optional.
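
For code that has to build both with and without these library
additions, one possible pattern is to key an alias off the feature test
macros.  This is only a sketch; utf8_string, greeting and code_units are
names invented for the example, and it assumes the library defines
__cpp_lib_char8_t only when the compiler defines __cpp_char8_t.

```cpp
#include <cstddef>
#include <string>

#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
// char8_t support enabled: u8 literals initialize a std::u8string.
typedef std::u8string utf8_string;
#else
// char8_t support not enabled: u8 literals are still arrays of char.
typedef std::string utf8_string;
#endif

// Build a string from a u8 literal; works in both configurations.
inline utf8_string
greeting()
{
  return utf8_string(u8"caf\u00e9");
}

// Number of UTF-8 code units (not code points) in S.
inline std::size_t
code_units(const utf8_string& s)
{
  return s.size();
}
```

U+00E9 encodes as two UTF-8 code units, so code_units(greeting()) is 5
in either configuration.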


I'm not very familiar with how ABI versioning is done and I'm not 
confident that the changes in the .ver files are correct.  In 
particular, I'm unsure as to whether a CXXABI_3.0 section may be needed 
in gnu-versioned-namespace.ver and whether I'm correct in adding a new 
CXXABI_1.3.12 section in gnu.ver.  If I'm not mistaken, CXXABI has not
yet been bumped for GCC 9 and so needs to be, but GLIBCXX has already
been bumped and therefore does not need to be.


gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  

 * name-lookup.c (get_std_name_hint): Added u8string as a name hint.

libstdc++-v3/ChangeLog:

2018-11-04  Tom Honermann  

 * config/abi/pre/gnu-versioned-namespace.ver (CXXABI_2.0): Add
 typeinfo symbols for char8_t.
 * config/abi/pre/gnu.ver: Add CXXABI_1.3.12.
 (GLIBCXX_3.4.26): Add symbols for specializations of
 numeric_limits and codecvt that involve char8_t.
 (CXXABI_1.3.12): Add typeinfo symbols for char8_t.
 * include/bits/atomic_base.h: Add atomic_char8_t.
 * include/bits/basic_string.h: Add std::hash and
 operator""s(const char8_t*, size_t).
 * include/bits/c++config: Define _GLIBCXX_USE_CHAR8_T and
 __cpp_lib_char8_t.
 * include/bits/char_traits.h: Add char_traits.
 * include/bits/codecvt.h: Add
 codecvt,
 codecvt,
 codecvt_byname, and
 codecvt_byname.
 * include/bits/cpp_type_traits.h: Add __is_integer to
 recognize char8_t as an integral type.
 * include/bits/fs_path.h (path::__is_encoded_char): Recognize
 char8_t.
 (path::u8string): Return std::u8string when char8_t support is
 enabled.
 (path::generic_u8string): Likewise.
 (path::_S_convert): Handle conversion from char8_t input.
 (path::_S_str_convert): Likewise.
 * include/bits/functional_hash.h: Add hash.
 * include/bits/locale_conv.h (__str_codecvt_out): Add overloads for
 char8_t.
 * include/bits/locale_facets.h (_GLIBCXX_NUM_UNICODE_FACETS): Bump
 for new char8_t specializations.
 * include/bits/localefwd.h: Add missing declarations of
 codecvt and
 codecvt.  Add char8_t declarations
 codecvt and
 codecvt.
 * include/bits/postypes.h: Add u8streampos.
 * include/bits/stringfwd.h: Add declarations of
 char_traits and u8string.
 * include/c_global/cstddef: Add __byte_operand.
 * include/experimental/bits/fs_path.h (path::__is_encoded_char):
 Recognize char8_t.
 (path::u8string): Return std::u8string when char8_t support is
 enabled.
 (path::generic_u8string): Likewise.
 (path::_S_convert): Handle conversion from char8_t input.
 (path::_S_str_convert): Likewise.
 * include/experimental/string: Add u8string.
 * include/experimental/string_view: Add u8string_view,
 hash, and
 operator""sv(const char8_t*, size_t).
 * include/std/atomic: Add atomic and atomic_char8_t.
 * include/std/charconv (__is_int_to_chars_type): Recognize char8_t
 as a character type.
 * include/std/limits: Add numeric_limits.
 * include/std/string_view: Add u8string_view,
 hash, and
 operator""sv(const char8_t*, size_t).
 * include/std/type_traits: Add __is_integral_helper,
 __make_unsigned, and __make_signed.
 * libsupc++/atomic_lockfree_defines.h: Define
 ATOMIC_CHAR8_T_LOCK_FREE.
 * src/c++11/Makefile.am: Compile with -fchar8_t when compiling
 codecvt.cc and limits.cc so that char8_t specializations of
 numeric_limits and codecvt are emitted.
 * src/c++11/Makefile.in: Likewise.
 * src/c++11/codecvt.cc: Define members of
 codecvt,
 codecvt,
 codecvt_byname, and
 codecvt_byname.
 * src/c++11/limits.cc: Define members of
 numeric_limits.
 * src/c++98/Makefile.am: Compile with -fchar8_t when compiling
 locale_init.cc and localename.cc.
 * src/c++98/Makefile.in: Likewise.
 * src/c++98/locale_init.cc: Add initialization for the
 codecvt and
 codecvt facets.
 * src/c++98/localename.cc: 

[PATCH 4/9]: C++ P0482R5 char8_t: Updates to existing core language tests

2018-11-05 Thread Tom Honermann
This patch fills existing testing gaps related to support for u8
character and string literals.  None of these changes exercises new
char8_t functionality; they are intended to guard against regressions in
the behavior of u8 literals when support for char8_t is not enabled.
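
The raw-string additions below all follow one idea: a u8 raw string
literal must hold the same code units as the equivalent escaped u8
literal.  A compile-time sketch of that check follows; same_chars is a
helper invented for this example, not something from the testsuite, and
unlike the tests it deduces the element type so that it builds whether
or not char8_t is enabled.

```cpp
#include <cstddef>

// Compare two string literals element by element, including the
// terminating NUL.  Recursive so that it is a valid C++11 constexpr
// function.
template<typename CharT, std::size_t N, std::size_t M>
  constexpr bool
  same_chars(const CharT (&a)[N], const CharT (&b)[M], std::size_t i = 0)
  {
    return N == M
	   && (i == N || (a[i] == b[i] && same_chars(a, b, i + 1)));
  }

// In a raw string the backslash and the double quote are ordinary
// characters; in the escaped form they need escaping.
static_assert(same_chars(u8R"x(a\b"c)x", u8"a\\b\"c"),
	      "raw and escaped u8 literals should match");
```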


gcc/testsuite/ChangeLog:

2018-11-04  Tom Honermann  

 * c-c++-common/raw-string-13.c: Added test cases for u8 raw string
 literals.
 * c-c++-common/raw-string-15.c: Likewise.
 * g++.dg/cpp0x/constexpr-wstring2.C: Added test cases for u8
 literals.
 * g++.dg/ext/utf-array-short-wchar.C: Likewise.
 * g++.dg/ext/utf-array.C: Likewise.
 * g++.dg/ext/utf-cxx98.C: Likewise.
 * g++.dg/ext/utf-dflt.C: Likewise.
 * g++.dg/ext/utf-gnuxx98.C: Likewise.
 * gcc.dg/utf-array-short-wchar.c: Likewise.
 * gcc.dg/utf-array.c: Likewise.

Tom.
diff --git a/gcc/testsuite/c-c++-common/raw-string-13.c b/gcc/testsuite/c-c++-common/raw-string-13.c
index 1b37405cee9..fa11edaa7aa 100644
--- a/gcc/testsuite/c-c++-common/raw-string-13.c
+++ b/gcc/testsuite/c-c++-common/raw-string-13.c
@@ -62,6 +62,47 @@ const char s16[] = R"??(??)??";
 const char s17[] = R"?(?)??)?";
 const char s18[] = R"??(??)??)??)??";
 
+const char u800[] = u8R"??=??()??'??!??-\
+(a)#[{}]^|~";
+)??=??";
+const char u801[] = u8R"a(
+)\
+a"
+)a";
+const char u802[] = u8R"a(
+)a\
+"
+)a";
+const char u803[] = u8R"ab(
+)a\
+b"
+)ab";
+const char u804[] = u8R"a??/(x)a??/";
+const char u805[] = u8R"abcdefghijklmn??(abc)abcdefghijklmn??";
+const char u806[] = u8R"abcdefghijklm??/(abc)abcdefghijklm??/";
+const char u807[] = u8R"abc(??)\
+abc";)abc";
+const char u808[] = u8R"def(de)\
+def";)def";
+const char u809[] = u8R"a(??)\
+a"
+)a";
+const char u810[] = u8R"a(??)a\
+"
+)a";
+const char u811[] = u8R"ab(??)a\
+b"
+)ab";
+const char u812[] = u8R"a#(a#)a??=)a#";
+const char u813[] = u8R"a#(??)a??=??)a#";
+const char u814[] = u8R"??/(x)??/
+";)??/";
+const char u815[] = u8R"??/(??)??/
+";)??/";
+const char u816[] = u8R"??(??)??";
+const char u817[] = u8R"?(?)??)?";
+const char u818[] = u8R"??(??)??)??)??";
+
 const char16_t u00[] = uR"??=??()??'??!??-\
 (a)#[{}]^|~";
 )??=??";
@@ -211,6 +252,25 @@ main (void)
   TEST (s16, "??");
   TEST (s17, "?)??");
   TEST (s18, "??"")??"")??");
+  TEST (u800, u8"??""??"")??""'??""!??""-\\\n(a)#[{}]^|~\";\n");
+  TEST (u801, u8"\n)\\\na\"\n");
+  TEST (u802, u8"\n)a\\\n\"\n");
+  TEST (u803, u8"\n)a\\\nb\"\n");
+  TEST (u804, u8"x");
+  TEST (u805, u8"abc");
+  TEST (u806, u8"abc");
+  TEST (u807, u8"??"")\\\nabc\";");
+  TEST (u808, u8"de)\\\ndef\";");
+  TEST (u809, u8"??"")\\\na\"\n");
+  TEST (u810, u8"??"")a\\\n\"\n");
+  TEST (u811, u8"??"")a\\\nb\"\n");
+  TEST (u812, u8"a#)a??""=");
+  TEST (u813, u8"??"")a??""=??");
+  TEST (u814, u8"x)??""/\n\";");
+  TEST (u815, u8"??"")??""/\n\";");
+  TEST (u816, u8"??");
+  TEST (u817, u8"?)??");
+  TEST (u818, u8"??"")??"")??");
   TEST (u00, u"??""??"")??""'??""!??""-\\\n(a)#[{}]^|~\";\n");
   TEST (u01, u"\n)\\\na\"\n");
   TEST (u02, u"\n)a\\\n\"\n");
diff --git a/gcc/testsuite/c-c++-common/raw-string-15.c b/gcc/testsuite/c-c++-common/raw-string-15.c
index 9dfdaabd87d..1d101dc8393 100644
--- a/gcc/testsuite/c-c++-common/raw-string-15.c
+++ b/gcc/testsuite/c-c++-common/raw-string-15.c
@@ -62,6 +62,47 @@ const char s16[] = R"??(??)??";
 const char s17[] = R"?(?)??)?";
 const char s18[] = R"??(??)??)??)??";
 
+const char u800[] = u8R"??=??()??'??!??-\
+(a)#[{}]^|~";
+)??=??";
+const char u801[] = u8R"a(
+)\
+a"
+)a";
+const char u802[] = u8R"a(
+)a\
+"
+)a";
+const char u803[] = u8R"ab(
+)a\
+b"
+)ab";
+const char u804[] = u8R"a??/(x)a??/";
+const char u805[] = u8R"abcdefghijklmn??(abc)abcdefghijklmn??";
+const char u806[] = u8R"abcdefghijklm??/(abc)abcdefghijklm??/";
+const char u807[] = u8R"abc(??)\
+abc";)abc";
+const char u808[] = u8R"def(de)\
+def";)def";
+const char u809[] = u8R"a(??)\
+a"
+)a";
+const char u810[] = u8R"a(??)a\
+"
+)a";
+const char u811[] = u8R"ab(??)a\
+b"
+)ab";
+const char u812[] = u8R"a#(a#)a??=)a#";
+const char u813[] = u8R"a#(??)a??=??)a#";
+const char u814[] = u8R"??/(x)??/
+";)??/";
+const char u815[] = u8R"??/(??)??/
+";)??/";
+const char u816[] = u8R"??(??)??";
+const char u817[] = u8R"?(?)??)?";
+const char u818[] = u8R"??(??)??)??)??";
+
 const char16_t u00[] = uR"??=??()??'??!??-\
 (a)#[{}]^|~";
 )??=??";
@@ -211,6 +252,25 @@ main (void)
   TEST (s16, "??");
   TEST (s17, "?)??");
   TEST (s18, "??"")??"")??");
+  TEST (u800, u8"??""??"")??""'??""!??""-\\\n(a)#[{}]^|~\";\n");
+  TEST (u801, u8"\n)\\\na\"\n");
+  TEST (u802, u8"\n)a\\\n\"\n");
+  TEST (u803, u8"\n)a\\\nb\"\n");
+  TEST (u804, u8"x");
+  TEST (u805, u8"abc");
+  TEST (u806, u8"abc");
+  TEST (u807, u8"??"")\\\nabc\";");
+  TEST (u808, u8"de)\\\ndef\";");
+  TEST (u809, u8"??"")\\\na\"\n");
+  TEST (u810, u8"??"")a\\\n\"\n");
+  TEST (u811, u8"??"")a\\\nb\"\n");
+  TEST (u812, u8"a#)a??""=");
+  TEST (u813, u8"??"")a??""=??");
+  TEST (u814, u8"x)??""/

[PATCH 8/9]: C++ P0482R5 char8_t: Updates to existing standard library tests

2018-11-05 Thread Tom Honermann
This patch augments existing tests to validate behavior for char8_t.  In 
all cases, added test cases are cloned from existing tests for wchar_t 
or char16_t.


A few tests required updates to line numbers for diagnostic messages.
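
To give a flavour of what the augmented tests assert, here is a small
compile-time sketch (not taken from the testsuite).  The flag
char8_t_checked is a name made up for the example; the checks themselves
only run when both __cpp_char8_t and __cpp_lib_char8_t are defined.

```cpp
#include <limits>
#include <type_traits>

#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
// char8_t is an unsigned integral type as far as the library is concerned.
static_assert(std::is_integral<char8_t>::value, "char8_t is integral");
static_assert(std::is_unsigned<char8_t>::value, "char8_t is unsigned");

// numeric_limits is specialized and has the range of unsigned char.
static_assert(std::numeric_limits<char8_t>::is_specialized,
	      "numeric_limits<char8_t> is specialized");
static_assert(std::numeric_limits<char8_t>::lowest() == 0,
	      "lowest() is zero");
static_assert(std::numeric_limits<char8_t>::max()
	      == std::numeric_limits<unsigned char>::max(),
	      "max() matches unsigned char");

constexpr bool char8_t_checked = true;
#else
// Nothing to check without char8_t support.
constexpr bool char8_t_checked = false;
#endif
```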

libstdc++-v3/ChangeLog:

2018-11-04  Tom Honermann  

 * testsuite/18_support/byte/ops.cc: Validate
 std::to_integer, std::to_integer, and
 std::to_integer.
 * testsuite/18_support/numeric_limits/dr559.cc: Validate
 std::numeric_limits.
 * testsuite/18_support/numeric_limits/lowest.cc: Validate
 std::numeric_limits::lowest().
 * testsuite/18_support/numeric_limits/max_digits10.cc: Validate
 std::numeric_limits::max_digits10.
 * testsuite/18_support/type_info/fundamental.cc: Validate
 typeinfo for char8_t.
 * testsuite/20_util/from_chars/1_neg.cc: Validate std::from_chars
 with char8_t.
 * testsuite/20_util/hash/requirements/explicit_instantiation.cc:
 Validate explicit instantiation of std::hash.
 * testsuite/20_util/is_integral/value.cc: Validate
 std::is_integral.
 * testsuite/20_util/make_signed/requirements/typedefs-4.cc:
 Validate std::make_signed.
 * testsuite/21_strings/basic_string/cons/char/deduction.cc:
 Validate u8string construction from char8_t sources.
 * testsuite/21_strings/basic_string_view/operations/compare/
 char/70483.cc: Validate substr operations on u8string_view.
 * testsuite/21_strings/basic_string_view/typedefs.cc: Validate that
 the u8string_view typedef is defined.
 * testsuite/21_strings/char_traits/requirements/
 constexpr_functions.cc: Validate char_traits constexpr
 member functions.
 * testsuite/21_strings/char_traits/requirements/
 constexpr_functions_c++17.cc: Validate char_traits C++17
 constexpr member functions.
 * testsuite/21_strings/headers/string/types_std_c++0x.cc: Validate
 that the u8string typedef is defined.
 * testsuite/22_locale/locale/cons/unicode.cc: Validate the presence
 of the std::codecvt and
 std::codecvt facets.
 * testsuite/29_atomics/atomic/cons/assign_neg.cc: Update line
 numbers.
 * testsuite/29_atomics/atomic/cons/copy_neg.cc: Likewise.
 * testsuite/29_atomics/atomic_integral/cons/assign_neg.cc:
 Likewise.
 * testsuite/29_atomics/atomic_integral/cons/copy_neg.cc: Likewise.
 * testsuite/29_atomics/atomic_integral/is_always_lock_free.cc:
 Validate std::atomic::is_always_lock_free.
 * testsuite/29_atomics/atomic_integral/operators/bitwise_neg.cc:
 Update line numbers.
 * testsuite/29_atomics/atomic_integral/operators/decrement_neg.cc:
 Likewise.
 * testsuite/29_atomics/atomic_integral/operators/increment_neg.cc:
 Likewise.
 * testsuite/29_atomics/headers/atomic/macros.cc: Validate
 ATOMIC_CHAR8_T_LOCK_FREE and add a missing error message for
 ATOMIC_CHAR16_T_LOCK_FREE.
 * testsuite/29_atomics/headers/atomic/types_std_c++0x.cc: Validate
 std::atomic_char8_t.
 * testsuite/29_atomics/headers/atomic/types_std_c++0x_neg.cc:
 Validate atomic_char8_t.
 * testsuite/experimental/string_view/typedefs.cc: Validate that
 the u8string_view typedef is defined.
 * testsuite/util/testsuite_common_types.h (integral_types,
 integral_types_gnu, atomic_integrals_no_bool, atomic_integrals):
 Add char8_t to the typelist chains of integral types.

Tom.
diff --git a/libstdc++-v3/testsuite/18_support/byte/ops.cc b/libstdc++-v3/testsuite/18_support/byte/ops.cc
index 6f2755eb0a5..dfbaa8b2efa 100644
--- a/libstdc++-v3/testsuite/18_support/byte/ops.cc
+++ b/libstdc++-v3/testsuite/18_support/byte/ops.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-// { dg-options "-std=gnu++17" }
+// { dg-options "-std=gnu++17 -fchar8_t" }
 // { dg-do compile { target c++17 } }
 
 #include 
@@ -218,7 +218,13 @@ constexpr bool test_to_integer(unsigned char c)
 
 static_assert( test_to_integer(0) );
 static_assert( test_to_integer(255) );
+static_assert( test_to_integer(0) );
 static_assert( test_to_integer(255) );
 static_assert( test_to_integer(0) );
 static_assert( test_to_integer(255) );
-
+static_assert( test_to_integer(0) );
+static_assert( test_to_integer(255) );
+static_assert( test_to_integer(0) );
+static_assert( test_to_integer(255) );
+static_assert( test_to_integer(0) );
+static_assert( test_to_integer(255) );
diff --git a/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc b/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc
index 150db958807..f72b265dc77 100644
--- a/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc
+++ b/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-options "-fchar8_t" }
 
 // 2010-02-17  Paolo Carlini  
 //
@@ -84,6 +85,9 @@ int main()
   do_test();
   do_test();
   do_test();
+#ifdef _GLIBCXX_USE_CHAR8_T
+  do_test();
+#endif
   d

[PATCH 7/9]: C++ P0482R5 char8_t: New standard library tests

2018-11-05 Thread Tom Honermann
This patch adds new tests for char8_t standard library features.  Most 
of these tests were cloned from existing tests that exercise char16_t 
and adapted for char8_t.  Only testsuite/experimental/feat-char8_t.cc 
and testsuite/ext/char8_t/atomic-1.cc are net new tests.
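
As an example of what the literal "types" tests look at, the result type
of the s suffix applied to a u8 literal can be checked at compile time.
This is a sketch assuming C++14 or later, not a copy of
types-char8_t.cc; u8_s_result is a name invented here.

```cpp
#include <string>
#include <type_traits>

using namespace std::literals::string_literals;

// Type produced by operator""s for a u8 string literal.
typedef decltype(u8"abc"s) u8_s_result;

#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
// char8_t support enabled: the char8_t overload returns std::u8string.
static_assert(std::is_same<u8_s_result, std::u8string>::value,
	      "u8\"...\"s should be std::u8string");
#else
// char8_t support not enabled: the char overload returns std::string.
static_assert(std::is_same<u8_s_result, std::string>::value,
	      "u8\"...\"s should be std::string");
#endif
```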


libstdc++-v3/ChangeLog:

2018-11-04  Tom Honermann  

 * testsuite/18_support/numeric_limits/char8_t.cc: New test cloned
 from char16_32_t.cc; validates numeric_limits.
 * testsuite/21_strings/basic_string/literals/types-char8_t.cc: New
 test cloned from types.cc; validates operator""s for char8_t
 returns u8string.
 * testsuite/21_strings/basic_string/literals/values-char8_t.cc: New
 test cloned from values.cc; validates construction and comparison
 of u8string values.
 * testsuite/21_strings/basic_string/requirements/
 explicit_instantiation/char8_t/1.cc: New test cloned from
 char16_t/1.cc; validates explicit instantiation of
 basic_string.
 * testsuite/21_strings/basic_string_view/literals/types-char8_t.cc:
 New test cloned from types.cc; validates operator""sv for char8_t
 returns u8string_view.
 * testsuite/21_strings/basic_string_view/literals/
 values-char8_t.cc: New test cloned from values.cc; validates
 construction and comparison of u8string_view values.
 * testsuite/21_strings/basic_string_view/requirements/
 explicit_instantiation/char8_t/1.cc: New test cloned from
 char16_t/1.cc; validates explicit instantiation of
 basic_string_view.
 * testsuite/21_strings/char_traits/requirements/char8_t/65049.cc:
 New test cloned from char16_t/65049.cc; validates that
 char_traits is not vulnerable to the concerns in PR65049.
 * testsuite/21_strings/char_traits/requirements/char8_t/
 typedefs.cc: New test cloned from char16_t/typedefs.cc; validates
 that char_traits member typedefs are present and correct.
 * testsuite/21_strings/char_traits/requirements/
 explicit_instantiation/char8_t/1.cc: New test cloned from
 char16_t/1.cc; validates explicit instantiation of
 char_traits.
 * testsuite/22_locale/codecvt/char16_t-char8_t.cc: New test cloned
 from char16_t.cc; validates
 codecvt.
 * testsuite/22_locale/codecvt/char32_t-char8_t.cc: New test cloned
 from char32_t.cc; validates
 codecvt.
 * testsuite/22_locale/codecvt/utf8-char8_t.cc: New test cloned from
 utf8.cc; validates codecvt and
 codecvt.
 * testsuite/27_io/filesystem/path/native/string-char8_t.cc: New
 test cloned from string.cc; validates filesystem::path construction
 from char8_t input.
 * testsuite/experimental/feat-char8_t.cc: New test; validates that
 the __cpp_lib_char8_t feature test macro is defined with the
 correct value.
 * testsuite/experimental/filesystem/path/native/string-char8_t.cc:
 New test cloned from string.cc; validates filesystem::path
 construction from char8_t input.
 * testsuite/experimental/string_view/literals/types-char8_t.cc: New
 test cloned from types.cc; validates operator""sv for char8_t
 returns u8string_view.
 * testsuite/experimental/string_view/literals/values-char8_t.cc:
 New test cloned from values.cc; validates construction and
 comparison of u8string_view values.
 * testsuite/experimental/string_view/requirements/
 explicit_instantiation/char8_t/1.cc: New test cloned from
 char16_t/1.cc; validates explicit instantiation of
 basic_string_view.
 * testsuite/ext/char8_t/atomic-1.cc: New test; validates that
 ATOMIC_CHAR8_T_LOCK_FREE is not defined if char8_t support is not
 enabled.

Tom.
diff --git a/libstdc++-v3/testsuite/18_support/numeric_limits/char8_t.cc b/libstdc++-v3/testsuite/18_support/numeric_limits/char8_t.cc
new file mode 100644
index 000..346463d7244
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/numeric_limits/char8_t.cc
@@ -0,0 +1,71 @@
+// { dg-do run { target c++11 } }
+// { dg-require-cstdint "" }
+// { dg-options "-fchar8_t" }
+
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include 
+#include 
+#include 
+
+// Test specializations for char8_t.
+template
+  void
+  do_test()
+  {
+typedef std::numeric_limits char_type;
+typedef std::

[PATCH 9/9]: C++ P0482R5 char8_t: Updates to gdb pretty printing support

2018-11-05 Thread Tom Honermann
This patch adds recognition of the u8string and u8string_view type 
aliases to the gdb pretty printer extension.


libstdc++-v3/ChangeLog:

2018-11-04  Tom Honermann  

 * python/libstdcxx/v6/printers.py (register_type_printers): Add
 type printers for u8string and u8string_view.
 * testsuite/libstdc++-prettyprinters/whatis.cc: Validate
 recognition of u8string.

Tom.
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 827c87b70ea..f9e638e210d 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1554,7 +1554,7 @@ def register_type_printers(obj):
 return
 
 # Add type printers for typedefs std::string, std::wstring etc.
-for ch in ('', 'w', 'u16', 'u32'):
+for ch in ('', 'w', 'u8', 'u16', 'u32'):
 add_one_type_printer(obj, 'basic_string', ch + 'string')
 add_one_type_printer(obj, '__cxx11::basic_string', ch + 'string')
 # Typedefs for __cxx11::basic_string used to be in namespace __cxx11:
@@ -1604,7 +1604,7 @@ def register_type_printers(obj):
 
 # Add type printers for experimental::basic_string_view typedefs.
 ns = 'experimental::fundamentals_v1::'
-for ch in ('', 'w', 'u16', 'u32'):
+for ch in ('', 'w', 'u8', 'u16', 'u32'):
 add_one_type_printer(obj, ns + 'basic_string_view',
  ns + ch + 'string_view')
 
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/whatis.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/whatis.cc
index 90f3994314b..d74bf7c5e9b 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/whatis.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/whatis.cc
@@ -1,5 +1,5 @@
 // { dg-do run { target c++11 } }
-// { dg-options "-g -O0" }
+// { dg-options "-g -O0 -fchar8_t" }
 // { dg-skip-if "" { *-*-* } { "-D_GLIBCXX_PROFILE" } }
 
 // Copyright (C) 2011-2018 Free Software Foundation, Inc.
@@ -130,6 +130,9 @@ holder cregex_token_iterator_holder;
 std::sregex_token_iterator *sregex_token_iterator_ptr;
 holder sregex_token_iterator_holder;
 // { dg-final { whatis-test sregex_token_iterator_holder "holder" } }
+std::u8string *u8string_ptr;
+holder u8string_holder;
+// { dg-final { whatis-test u8string_holder "holder" } }
 std::u16string *u16string_ptr;
 holder u16string_holder;
 // { dg-final { whatis-test u16string_holder "holder" } }
@@ -240,6 +243,8 @@ main()
   placeholder(&cregex_token_iterator_holder);
   placeholder(&sregex_token_iterator_ptr);
   placeholder(&sregex_token_iterator_holder);
+  placeholder(&u8string_ptr);
+  placeholder(&u8string_holder);
   placeholder(&u16string_ptr);
   placeholder(&u16string_holder);
   placeholder(&u32string_ptr);


Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-05 Thread Jeff Law
On 11/5/18 12:36 PM, Peter Bergner wrote:
> On 11/5/18 1:20 PM, Jeff Law wrote:
>> On 11/1/18 4:07 PM, Peter Bergner wrote:
>>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>>> Is there any update on this issue?
>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a
>>>> while.
>>>
>>> From the analysis I've done, my commit is just exposing latent issues
>>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>>
>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>>> Jeff threw it on his testers and said he saw an arm issue and was
>>> trying to come up with a test case for me to debug.
>> So I don't think the ARM issues are related to your patch, they may have
>> been related to the combiner changes that went in around the same time.
>>
>> At this point your patch appears to be DTRT across the board.  The only
>> fallout is the bogus s390 asm it caught in the kernel.
> 
> Cool.  I will note that I contacted the s390 kernel guys and gave them a
> fix to their broken constraints in that asm and they are going to fix it.
Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
the kernel folks do it right.


> 
> Is the above an approval to commit the patch mentioned above or do you
> still want to wait until the ARM issues are fully resolved?
I think knowing the patch addresses all the known issues related to the
earlier IRA/LRA change unblocks the review step.  I don't think we need
to wait for the other ARM issues to be resolved -- they seem to be
unrelated to the IRA/LRA changes.

jeff


[PATCH 2/2] C++: improvements to binary operator diagnostics (PR c++/87504)

2018-11-05 Thread David Malcolm
The C frontend is able (where expression locations are available) to print
problems with binary operators in 3-location form, labelling the types of
the expressions:

  arg_0 op arg_1
  ~~~~~ ^~ ~~~~~
    |        |
    |        arg1 type
    arg0 type

The C++ frontend currently just shows the combined location:

  arg_0 op arg_1
  ~~~~~~^~~~~~~~

and fails to highlight where the subexpressions are, or their types.

This patch introduces an op_location_t struct for handling the above
operator-location vs combined-location split, and a new
class binary_op_rich_location for displaying the above, so that the
C++ frontend is able to use the more detailed 3-location form for
type mismatches in binary operators, and for -Wtautological-compare
(where types are not displayed).  Both forms can be seen in this
example:

bad-binary-ops.C:69:20: error: no match for 'operator&&' (operand types are
  's' and 't')
   69 |   return ns_4::foo && ns_4::inner::bar;
      |          ~~~~~~~~~ ^~ ~~~~~~~~~~~~~~~~
      |                |                   |
      |                s                   t
bad-binary-ops.C:69:20: note: candidate: 'operator&&(bool, bool)'
   69 |   return ns_4::foo && ns_4::inner::bar;
      |          ~~~~~~~~~~^~~~~~~~~~~~~~~~~~
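The operator-location vs combined-location split described above can be pictured with a small self-contained sketch. This is only an illustration under stated assumptions, not the code from the patch: the member names, the use of 0 as "unknown location", and the helper legacy_report are hypothetical, and location_t is reduced to a plain integer.

```cpp
#include <cassert>

// Stand-in for GCC's location_t (assumption: a plain integer handle).
typedef unsigned int location_t;

// Sketch of a type carrying both locations of a binary expression.
// Member names are illustrative, not taken from the patch.
struct op_location_t
{
  location_t m_operator_loc;  // location of the operator token alone
  location_t m_combined_loc;  // location covering "arg_0 op arg_1"

  // Only the combined location is known; 0 plays the role of "unknown".
  op_location_t (location_t combined_loc)
  : m_operator_loc (0), m_combined_loc (combined_loc) {}

  // Both locations are known.
  op_location_t (location_t operator_loc, location_t combined_loc)
  : m_operator_loc (operator_loc), m_combined_loc (combined_loc) {}

  // Implicit conversion, so callers that expect a single location_t
  // keep receiving the combined location.
  operator location_t () const { return m_combined_loc; }
};

// Hypothetical legacy consumer that only understands one location.
static location_t
legacy_report (location_t loc)
{
  return loc;
}
```

With a type of this shape, a function whose first parameter changes from location_t to const op_location_t & can still be called with a bare location_t, while diagnostics that want the 3-location form can read the operator location separately.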

The patch also allows for some uses of macros in
-Wtautological-compare, where both sides of the comparison have
been spelled the same way, e.g.:

Wtautological-compare-ranges.c:23:11: warning: self-comparison always
   evaluates to true [-Wtautological-compare]
   23 |   if (FOO == FOO);
      |           ^~
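A rough sketch of the "spelled the same way" idea follows. It assumes that the warning was previously suppressed whenever a macro expansion was involved, and that the two operands have already been found to be equal; the operand_origin struct and should_warn_self_comparison are hypothetical, and the real spelled_the_same_p in c-warn.c works on trees and the line table rather than on strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of where an operand's tokens came from.
struct operand_origin
{
  bool from_macro;         // true if the operand came from a macro expansion
  std::string macro_name;  // name of that macro, empty otherwise
};

// Both operands count as "spelled the same way" when each one comes
// from an expansion of the same macro.
static bool
spelled_the_same_p (const operand_origin &lhs, const operand_origin &rhs)
{
  return lhs.from_macro && rhs.from_macro
         && lhs.macro_name == rhs.macro_name;
}

// Decide whether to warn about a comparison whose operands are equal:
// outside macros always warn; with macros, only if the spelling matches.
static bool
should_warn_self_comparison (const operand_origin &lhs,
                             const operand_origin &rhs)
{
  if (!lhs.from_macro && !rhs.from_macro)
    return true;
  return spelled_the_same_p (lhs, rhs);
}
```

Under this model "FOO == FOO" warns, whereas two different macros that happen to expand to the same value stay quiet, which is the distinction the example above relies on.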

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, in
conjunction with the previous patch.

OK for trunk?
Dave

gcc/c-family/ChangeLog:
PR c++/87504
* c-common.h (warn_tautological_cmp): Convert 1st param from
location_t to const op_location_t &.
* c-warn.c (find_array_ref_with_const_idx_r): Strip location
wrapper when testing for INTEGER_CST.
(warn_tautological_bitwise_comparison): Convert 1st param from
location_t to const op_location_t &; use it to build a
binary_op_rich_location, and use this.
(spelled_the_same_p): New function.
(warn_tautological_cmp): Convert 1st param from location_t to
const op_location_t &.  Warn for macro expansions if
spelled_the_same_p.  Use binary_op_rich_location.

gcc/c/ChangeLog:
PR c++/87504
* c-typeck.c (class maybe_range_label_for_tree_type_mismatch):
Move from here to gcc-rich-location.h and gcc-rich-location.c.
(build_binary_op): Use struct op_location_t and
class binary_op_rich_location.

gcc/cp/ChangeLog:
PR c++/87504
* call.c (op_error): Convert 1st param from location_t to
const op_location_t &.  Use binary_op_rich_location for binary
ops.
(build_conditional_expr_1): Convert 1st param from location_t to
const op_location_t &.
(build_conditional_expr): Likewise.
(build_new_op_1): Likewise.
(build_new_op): Likewise.
* cp-tree.h (build_conditional_expr): Likewise.
(build_new_op): Likewise.
(build_x_binary_op): Likewise.
(cp_build_binary_op): Likewise.
* parser.c (cp_parser_primary_expression): Build a location
for id-expression nodes.
(cp_parser_binary_expression): Use an op_location_t when
calling build_x_binary_op.
(cp_parser_operator): Build a location for user-defined literals.
* typeck.c (build_x_binary_op): Convert 1st param from location_t
to const op_location_t &.
(cp_build_binary_op): Likewise.  Use binary_op_rich_location.

gcc/ChangeLog:
PR c++/87504
* gcc-rich-location.c
(maybe_range_label_for_tree_type_mismatch::get_text): Move here from
c/c-typeck.c.
(binary_op_rich_location::binary_op_rich_location): New ctor.
(binary_op_rich_location::use_operator_loc_p): New function.
* gcc-rich-location.h
(class maybe_range_label_for_tree_type_mismatch): Move here from
c/c-typeck.c.
(struct op_location_t): New forward decl.
(class binary_op_rich_location): New class.
* tree.h (struct op_location_t): New struct.

gcc/testsuite/ChangeLog:
* c-c++-common/Wtautological-compare-ranges.c: New test.
* g++.dg/cpp0x/pr51420.C: Add -fdiagnostics-show-caret and update
expected output.
* g++.dg/cpp0x/udlit-declare-neg.C: Update expected columns in
output.
* g++.dg/cpp0x/udlit-member-neg.C: Likewise.
* g++.dg/diagnostic/bad-binary-ops.C: Update expected output from
1-location form to 3-location form, with labelling of ranges with
types.  Add examples of id-expression nodes with namespaces.
* g++.dg/diagnostic/param-type-mismatch-2.C: Likewise.
---
 gcc/c-family/c-common.h|  3 +-
 gcc/c-family/c-warn.c 

[PATCH 1/2] C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-11-05 Thread David Malcolm
The C++ frontend gained various location wrapper nodes in r256448 (GCC 8).
That patch:
  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00799.html
added wrapper nodes around all nodes with !CAN_HAVE_LOCATION_P for:

* arguments at callsites, and for

* typeid, alignof, sizeof, and offsetof.

This is a followup to that patch, adding many more location wrappers
to the C++ frontend.  It adds location wrappers for nodes with
!CAN_HAVE_LOCATION_P to:

* all literal nodes (in cp_parser_primary_expression)

* all id-expression nodes (in finish_id_expression), except within a
  decltype.

* all mem-initializer nodes within a mem-initializer-list
  (in cp_parser_mem_initializer)

However, the patch also adds some suppressions: regions in the parser
for which wrapper nodes will not be created:

* within a template-parameter-list or template-argument-list (in
  cp_parser_template_parameter_list and cp_parser_template_argument_list
  respectively), to avoid encoding the spelling location of the nodes
  in types.  For example, "array<10>" and "array<10>" are the same type,
  despite the fact that the two different "10" tokens are spelled in
  different locations in the source.

* within a gnu-style attribute (none of our handlers are set up to cope
  with location wrappers yet)

* within various OpenMP clauses
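One way to picture such suppressed regions is a nesting counter managed by an RAII guard. The sketch below is a simplified model, not the patch's implementation: every name in it (suppress_wrappers_depth, auto_suppress_wrappers, node, maybe_wrap_with_location, parse_template_argument) is a hypothetical stand-in, and the "node" is a toy struct rather than a tree.

```cpp
#include <cassert>

// Hypothetical nesting counter: > 0 means "do not create wrappers".
static int suppress_wrappers_depth = 0;

// RAII guard: suppresses wrapper creation for the lifetime of the object,
// so nested regions and early returns are handled automatically.
class auto_suppress_wrappers
{
public:
  auto_suppress_wrappers () { ++suppress_wrappers_depth; }
  ~auto_suppress_wrappers () { --suppress_wrappers_depth; }
};

// Toy expression node; a "wrapper" merely records a source location.
struct node
{
  int value;
  bool is_wrapper;
  unsigned int loc;
};

// Attach LOC to N unless we are inside a suppressed region.
static node
maybe_wrap_with_location (node n, unsigned int loc)
{
  if (suppress_wrappers_depth > 0)
    return n;
  n.is_wrapper = true;
  n.loc = loc;
  return n;
}

// Hypothetical parser routine for one template argument: the guard
// keeps the spelling location out of anything that feeds into a type.
static node
parse_template_argument (int value, unsigned int loc)
{
  auto_suppress_wrappers sentinel;
  node n = { value, false, 0 };
  return maybe_wrap_with_location (n, loc);
}
```

In this model the two "10" tokens of "array<10>" and "array<10>" yield identical nodes whatever their source positions, while the same literal parsed outside the guarded region would carry its location.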

The patch enables various improvements to locations for bad
initializations, for -Wchar-subscripts, and enables various other
improvements in the followup patch.

For example, given the following buggy mem-initializer:

class X {
  X() : bad(42),
        good(42)
  { }
  void* bad;
  int good;
};

previously, our diagnostic was on the final close parenthesis of the
mem-initializer-list, leaving it unclear where the problem is:

t.cc: In constructor 'X::X()':
t.cc:3:16: error: invalid conversion from 'int' to 'void*' [-fpermissive]
    3 |         good(42)
      |                ^
      |                |
      |                int

whereas with the patch we highlight which expression is bogus:

t.cc: In constructor 'X::X()':
t.cc:2:13: error: invalid conversion from 'int' to 'void*' [-fpermissive]
    2 |   X() : bad(42),
      |             ^~
      |             |
      |             int

Similarly, the diagnostic for this bogus initialization:

i.cc:1:44: error: initializer-string for array of chars is too long [-fpermissive]
    1 | char test[3][4] = { "ok", "too long", "ok" };
      |                                            ^

is improved by the patch so that it indicates which string is too long:

i.cc:1:27: error: initializer-string for array of chars is too long [-fpermissive]
    1 | char test[3][4] = { "ok", "too long", "ok" };
      |                           ^~~~~~~~~~

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, in
conjunction with the followup patch [1]

I did some light performance testing, comparing release builds with and
without the patch on kdecore.cc (preprocessed all-of-KDE) and a test file
that includes all of the C++ stdlib (but does nothing with it), in both
cases compiling at -O3 -g.  In both cases there was no significant
difference in the overall wallclock time for all of compilation:

kdecore.cc total wallclock time:

http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,58.26,61.79&chco=FF,FF&chdl=control|experiment&chds=58.26,61.79&chd=t:59.55,60.26,60.53,60.35,60.17,60.27,59.26,60.01,60.21,60.23,60.1,60.2,60.12,60.48,60.32,60.18,60.01,60.01,60.04,59.96,60.1,60.11,60.21,60.36,60.08,60.1,60.16,60.01,60.21,60.15,60.12,60.09,59.96,60.12,60.06,60.12,60.05,60.11,59.93,59.99|59.6,59.3,60.03,60.1,60.49,60.35,60.03,60.1,59.87,60.39,60.1,59.96,60.19,60.45,59.97,59.91,60.0,59.99,60.09,60.15,60.79,59.98,60.16,60.09,60.02,60.05,60.32,60.01,59.95,59.88,60.1,60.07,60.22,59.87,60.04,60.11,60.01,60.09,59.86,59.86&chxl=0:|1|8|16|24|32|40|2:||Iteration|3:||Time+(secs)&chtt=Compilation+of+kdecore.cc+at+-O3+with+-g+for+x86_64-pc-linux-gnu:+total:+wall

cp-stdlib.cc total wallclock time:

http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.88,4.59&chco=FF,FF&chdl=control|experiment&chds=1.88,4.59&chd=t:3.59,2.94,2.95,2.94,2.94,2.93,2.92,2.94,2.93,2.94,2.94,2.88,2.94,2.9,2.94,2.9,2.94,2.93,2.94,2.93,2.95,2.93,2.9,2.9,2.94,2.99,2.95,3.0,2.94,3.0,2.94,2.99,2.95,2.95,2.9,2.99,2.94,2.99,2.94,2.96|3.54,2.92,2.93,2.88,2.94,2.92,2.93,2.92,2.9,2.93,2.89,2.93,2.9,2.93,2.89,2.91,2.93,2.92,2.89,2.93,2.93,2.92,2.93,2.92,2.93,2.92,2.88,2.92,2.89,2.93,2.94,2.92,2.9,2.92,2.92,2.91,2.94,2.92,2.98,2.88&chxl=0:|1|8|16|24|32|40|2:||Iteration|3:||Time+(secs)&chtt=Compilation+of+cp-stdlib.cc+at+-O3+with+-g+for+x86_64-pc-linux-gnu:+total:+wall

-ftime-report did show that kdecore.cc's "phase parsing" was 3% slower
by wallclock:

http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.71,3.95&chco=FF,FF&chdl=control|experiment&chds=1.71,3.95&chd=t:2.74,2.72,2.73,2.8,2.72,2.73,2.72,2.74,2.73,2.73,2.73,2.73,2.73,2.72,2.7

Re: [PATCH v3 3/3] or1k: gcc: initial support for openrisc

2018-11-05 Thread Richard Henderson
On 11/4/18 9:05 AM, Stafford Horne wrote:
> I have had some inquiries into helping
> bootstrap some linux nommu machines.

For nommu, we'd need to implement an FDPIC ABI.

Otherwise, code segments cannot be mapped separately
from data segments.  I believe that the arm (32bit)
port has recently added support for this, so you may
be able to find patches from which to crib.


r~

