Re: [PATCH v2 3/9] Introduce can_vector_compare_p function

2019-08-27 Thread Richard Sandiford
Ilya Leoshkevich  writes:
>> Am 26.08.2019 um 15:17 schrieb Ilya Leoshkevich :
>> 
>>> Am 26.08.2019 um 15:06 schrieb Richard Biener :
>>> 
>>> On Mon, Aug 26, 2019 at 1:54 PM Ilya Leoshkevich  wrote:
 
> Am 26.08.2019 um 10:49 schrieb Richard Biener 
> :
> 
> On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich  
> wrote:
>> 
>>> Am 23.08.2019 um 13:24 schrieb Richard Biener 
>>> :
>>> 
>>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
>>>  wrote:
 
 Ilya Leoshkevich  writes:
> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, 
> machine_mode mode,
> return 0;
> }
> 
> +/* can_vector_compare_p presents fake rtx binary operations to the 
> back-end
> +   in order to determine its capabilities.  In order to avoid 
> creating fake
> +   operations on each call, values from previous calls are cached in 
> a global
> +   cached_binops hash_table.  It contains rtxes, which can be looked 
> up using
> +   binop_keys.  */
> +
> +struct binop_key {
> +  enum rtx_code code;	 /* Operation code.  */
> +  machine_mode value_mode;   /* Result mode.  */
> +  machine_mode cmp_op_mode;  /* Operand mode.  */
> +};
> +
> +struct binop_hasher : pointer_hash_mark, ggc_cache_remove {
> +  typedef rtx value_type;
> +  typedef binop_key compare_type;
> +
> +  static hashval_t
> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode 
> cmp_op_mode)
> +  {
> +inchash::hash hstate (0);
> +hstate.add_int (code);
> +hstate.add_int (value_mode);
> +hstate.add_int (cmp_op_mode);
> +return hstate.end ();
> +  }
> +
> +  static hashval_t
> +  hash (const rtx &ref)
> +  {
> +return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP 
> (ref, 0)));
> +  }
> +
> +  static bool
> +  equal (const rtx &ref1, const binop_key &ref2)
> +  {
> +return (GET_CODE (ref1) == ref2.code)
> +&& (GET_MODE (ref1) == ref2.value_mode)
> +&& (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> +  }
> +};
> +
> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> +
> +static rtx
> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> +   machine_mode cmp_op_mode)
> +{
> +  if (!cached_binops)
> +cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> +  binop_key key = { code, value_mode, cmp_op_mode };
> +  hashval_t hash = binop_hasher::hash (code, value_mode, 
> cmp_op_mode);
> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> +  if (!*slot)
> +*slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx 
> (cmp_op_mode),
> + gen_reg_rtx (cmp_op_mode));
> +  return *slot;
> +}
 
 Sorry, I didn't mean anything this complicated.  I just meant that
 we should have a single cached rtx that we can change via PUT_CODE and
 PUT_MODE_RAW for each new query, rather than allocating a new rtx each
 time.
 
 Something like:
 
 static GTY ((cache)) rtx cached_binop;
 
 rtx
 get_cached_binop (machine_mode mode, rtx_code code, machine_mode 
 op_mode)
 {
 if (cached_binop)
 {
   PUT_CODE (cached_binop, code);
   PUT_MODE_RAW (cached_binop, mode);
   PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
   PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
 }
 else
 {
   rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
   rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
   cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
 }
 return cached_binop;
 }
>>> 
>>> Hmm, maybe we need  auto_rtx (code) that constructs such
>>> RTX on the stack instead of wasting a GC root (and causing
>>> issues for future threading of GCC ;)).
>> 
>> Do you mean something like this?
>> 
>> union {
>> char raw[rtx_code_size[code]];
>> rtx rtx;
>> } binop;
>> 
>> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
>> anything useful), or should I implement this?
> 
> It doesn't exist AFAIK, I thought about using alloca like
> 
> rtx tem;
> rtx_alloca (tem, PLUS);
> 
> and due to using alloca rtx_alloca has to be a macro like
> 
> #define rtx_alloca(r, code) r = (rtx)alloca (RTX_CODE_SIZE(code))

Re: [Patch][PR91504] Inlining misses some logical operation folding

2019-08-27 Thread Richard Biener
On Mon, 26 Aug 2019, Segher Boessenkool wrote:

> On Mon, Aug 26, 2019 at 02:04:25PM +0200, Richard Biener wrote:
> > On Mon, 26 Aug 2019, kamlesh kumar wrote:
> > > +/* (~a & b) ^ a  -->   (a | b)   */
> > > +(simplify
> > > + (bit_xor:c (bit_and:cs (bit_not @0) @1) @0)
> > > + (bit_ior @0 @1))
> > > +
> > 
> > Are you sure?
> > 
> > (~1804289383 & 846930886) ^ 1804289383 != 1804289383 | 846930886
> 
> Both are hex 7bfb67e7.
> 
> a|b = a|(b&~a) = a^(b&~a)
> 
> >   if ((~a & b) ^ a != a | b)
> 
> != has higher precedence than ^ and | (and &).

Oops.  Need more coffee (and -Wall).

Patch is OK.

Thanks,
Richard.


[PATCH] Share a prevailing name for removed debug info symbols w/ LTO.

2019-08-27 Thread Martin Liška
Hi.

The patch is about better symbol table manipulation
for debug info objects.  It fixes the reported issue
on hppa64-hp-hpux11.11.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

libiberty/ChangeLog:

2019-08-27  Martin Liska  

PR lto/91478
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
First find a WEAK HIDDEN symbol in symbol table that will be
preserved.  Later, use the symbol name for all removed symbols.
---
 libiberty/simple-object-elf.c | 71 +++
 1 file changed, 48 insertions(+), 23 deletions(-)


diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index 75159266596..b637e4b0c92 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -1366,30 +1366,17 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  return errmsg;
 	}
 
-  /* If we are processing .symtab purge __gnu_lto_slim symbol
-	 from it and any symbols in discarded sections.  */
+  /* If we are processing .symtab purge any symbols
+	 in discarded sections.  */
   if (sh_type == SHT_SYMTAB)
 	{
 	  unsigned entsize = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	  shdr, sh_entsize, Elf_Addr);
 	  unsigned strtab = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	 shdr, sh_link, Elf_Word);
-	  unsigned char *strshdr = shdrs + (strtab - 1) * shdr_size;
-	  off_t stroff = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-	  strshdr, sh_offset, Elf_Addr);
-	  size_t strsz = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-	  strshdr, sh_size, Elf_Addr);
-	  char *strings = XNEWVEC (char, strsz);
-	  char *gnu_lto = strings;
+	  size_t prevailing_name_idx = 0;
 	  unsigned char *ent;
 	  unsigned *shndx_table = NULL;
-	  simple_object_internal_read (sobj->descriptor,
-   sobj->offset + stroff,
-   (unsigned char *)strings,
-   strsz, &errmsg, err);
-	  /* Find first '\0' in strings.  */
-	  gnu_lto = (char *) memchr (gnu_lto + 1, '\0',
- strings + strsz - gnu_lto);
 	  /* Read the section index table if present.  */
 	  if (symtab_indices_shndx[i - 1] != 0)
 	{
@@ -1404,6 +1391,41 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	   (unsigned char *)shndx_table,
 	   sidxsz, &errmsg, err);
 	}
+
+	  /* Find a WEAK HIDDEN symbol whose name we will use for removed
+	 symbols.  */
+	  for (ent = buf; ent < buf + length; ent += entsize)
+	{
+	  unsigned st_shndx = ELF_FETCH_FIELD (type_functions, ei_class,
+		   Sym, ent,
+		   st_shndx, Elf_Half);
+	  unsigned char *st_info;
+	  unsigned char *st_other;
+	  if (ei_class == ELFCLASS32)
+		{
+		  st_info = &((Elf32_External_Sym *)ent)->st_info;
+		  st_other = &((Elf32_External_Sym *)ent)->st_other;
+		}
+	  else
+		{
+		  st_info = &((Elf64_External_Sym *)ent)->st_info;
+		  st_other = &((Elf64_External_Sym *)ent)->st_other;
+		}
+	  if (st_shndx == SHN_XINDEX)
+		st_shndx = type_functions->fetch_Elf_Word
+		((unsigned char *)(shndx_table + (ent - buf) / entsize));
+
+	  if (st_shndx != SHN_COMMON
+		  && !(st_shndx != SHN_UNDEF
+		   && st_shndx < shnum
+		   && pfnret[st_shndx - 1] == -1)
+		  && ELF_ST_BIND (*st_info) == STB_WEAK
+		  && *st_other == STV_HIDDEN)
+		prevailing_name_idx = ELF_FETCH_FIELD (type_functions, ei_class,
+		   Sym, ent,
+		   st_name, Elf_Addr);
+	}
+
 	  for (ent = buf; ent < buf + length; ent += entsize)
 	{
 	  unsigned st_shndx = ELF_FETCH_FIELD (type_functions, ei_class,
@@ -1426,9 +1448,10 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  if (st_shndx == SHN_XINDEX)
 		st_shndx = type_functions->fetch_Elf_Word
 		((unsigned char *)(shndx_table + (ent - buf) / entsize));
-	  /* Eliminate all COMMONs - this includes __gnu_lto_v1
-		 and __gnu_lto_slim which otherwise cause endless
-		 LTO plugin invocation.  */
+	  /* Eliminate all COMMONs - this includes __gnu_lto_slim
+		 which otherwise causes endless LTO plugin invocation.
+		 FIXME: remove the condition once we remove emission
+		 of __gnu_lto_slim symbol.  */
 	  if (st_shndx == SHN_COMMON)
 		discard = 1;
 	  /* We also need to remove symbols referring to sections
@@ -1460,12 +1483,15 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 		  else
 		{
 		  /* Make discarded global symbols hidden weak
-			 undefined and sharing the gnu_lto_ name.  */
+			 undefined and sharing a name of a prevailing
+			 symbol.  */
 		  bind = STB_WEAK;
 		  other = STV_HIDDEN;
+
 		  ELF_SET_FIELD (type_functions, ei_class, Sym,
- ent, st_name, Elf_Word,
- gnu_lto - strings);
+ ent, st_name, Elf_Addr,
+ prevailing_name_idx);
+
 		  ELF_SET_FIELD (type_functions, ei_class, Sym,
  e

Re: [RFC PATCH, i386]: Improve STV pass by correcting the cost of moves to/from XMM reg

2019-08-27 Thread Uros Bizjak
On Mon, Aug 26, 2019 at 2:49 PM Richard Biener  wrote:
>
> On Mon, 26 Aug 2019, Uros Bizjak wrote:
>
> > On Mon, Aug 26, 2019 at 2:11 PM Uros Bizjak  wrote:
> > >
> > > On Mon, Aug 26, 2019 at 1:46 PM Richard Biener  wrote:
> > > >
> > > > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > > >
> > > > > On Fri, Aug 23, 2019 at 1:52 PM Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > > > > >
> > > > > > > This is currently a heads-up patch that removes the minimum 
> > > > > > > limitation
> > > > > > > of cost of moves to/from XMM reg. The immediate benefit is the 
> > > > > > > removal
> > > > > > > of mismatched spills, caused by subreg usage.
> > > > > > >
> > > > > > > *If* the patch proves to be beneficial (as in "doesn't regress
> > > > > > > important benchmarks"), then we should be able to un-hide the
> > > > > > > inter-regset moves from RA and allow it to collapse some moves. 
> > > > > > > As an
> > > > > > > example, patched compiler removes a movd in 
> > > > > > > gcc.target/i386/minmax-6.c
> > > > > > > and still avoids mismatched spill.
> > > > > > >
> > > > > > > 2019-08-23  Uroš Bizjak  
> > > > > > >
> > > > > > > * config/i386/i386.c (ix86_register_move_cost): Do not
> > > > > > > limit the cost of moves to/from XMM register to minimum 8.
> > > > > > > * config/i386/i386-features.c
> > > > > > > (general_scalar_chain::make_vector_copies): Do not generate
> > > > > > > zeroing move from GPR to XMM register, use gen_move_insn
> > > > > > > instead of gen_gpr_to_xmm_move_src.
> > > > > > > (general_scalar_chain::convert_op): Ditto.
> > > > > > > (gen_gpr_to_xmm_move_src): Remove.
> > > > > > >
> > > > > > > The patch was bootstrapped and regression tested on 
> > > > > > > x86_64-linux-gnu
> > > > > > > {,-m32}, configured w/ and w/o -with-arch=ivybridge.
> > > > > > >
> > > > > > > The patch regresses PR80481 scan-asm-not (where the compiler 
> > > > > > > generates
> > > > > > > unrelated XMM spill on register starved x86_32). However, during 
> > > > > > > the
> > > > > > > analysis, I found that the original issue is not fixed, and is 
> > > > > > > still
> > > > > > > visible without -funroll-loops [1].
> > > > > > >
> > > > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80481#c10
> > > > > > >
> > > > > > > So, I'd wait for the HJ's benchmark results of the cost to/from 
> > > > > > > XMM
> > > > > > > change before proceeding with the patch.
> > > > > >
> > > > > > Otherwise looks good to me, it might clash (whitespace wise)
> > > > > > with my STV "rewrite" though.
> > > > > >
> > > > > > We might want to adjust ix86_register_move_cost separately from
> > > > > > the STV change to use regular moves though?  Just to make bisection
> > > > > > point to either of the two - STV "fallout" is probably minor
> > > > > > compared to fallout elsewhere...
> > > > >
> > > > > Yes, this is also my plan.
> > > >
> > > > Btw, when testing w/ the costing disabled I run into
> > > >
> > > > (insn 31 7 8 2 (set (subreg:V4SI (reg:SI 107) 0)
> > > > (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (const:DI (plus:DI
> > > > (symbol_ref:DI ("peakbuf") [flags 0x2]   > > > peakbuf>)
> > > > (const_int 40002 [0x17d7d220]))) [1
> > > > peakbuf.PeaksInBuf+0 S4 A64]))
> > > > (const_vector:V4SI [
> > > > (const_int 0 [0]) repeated x4
> > > > ])
> > > > (const_int 1 [0x1])))
> > > >
> > > > being not recognized (for the large immediate I guess).  So when
> > > > just doing
> > >
> > > Hard to say without testcase, but I don't think this is valid memory
> > > address. Can you see in dumps which insn is getting converted here?
> >
> > Ah, we can use movabsdi in this case.
>
> Yes, this is simply STVing such a single load... (quite pointless
> of course).
>
> > > >   (set (reg:SI 107) (mem:SI ...))
> > > >
> > > > here we expect IRA to be able to allocate 107 into xmm, realizing
> > > > it needs to reload the RHS first?
> > > >
> > > > For current code, is testing x86_64_general_operand like in the
> > > > following the correct thing to do?
> > >
> > > x86_64_general_operand will only limit immediates using
> > > x86_64_immediate_operand, it will allow all memory_operands.
> >
> > Yes, I guess it is OK, invalid embedded address can be loaded using movabsq.
>
> OK, combining this with your patch for extended testing now.

Please note that this new code will be removed by the above patch. The
failing pattern will be replaced with simple move to SUBREG, where RA
is able to correctly reload the operands.

Uros.


Re: [RFC PATCH, i386]: Improve STV pass by correcting the cost of moves to/from XMM reg

2019-08-27 Thread Uros Bizjak
On Tue, Aug 27, 2019 at 9:55 AM Uros Bizjak  wrote:

> > > > > > > > This is currently a heads-up patch that removes the minimum 
> > > > > > > > limitation
> > > > > > > > of cost of moves to/from XMM reg. The immediate benefit is the 
> > > > > > > > removal
> > > > > > > > of mismatched spills, caused by subreg usage.
> > > > > > > >
> > > > > > > > *If* the patch proves to be beneficial (as in "doesn't regress
> > > > > > > > important benchmarks"), then we should be able to un-hide the
> > > > > > > > inter-regset moves from RA and allow it to collapse some moves. 
> > > > > > > > As an
> > > > > > > > example, patched compiler removes a movd in 
> > > > > > > > gcc.target/i386/minmax-6.c
> > > > > > > > and still avoids mismatched spill.
> > > > > > > >
> > > > > > > > 2019-08-23  Uroš Bizjak  
> > > > > > > >
> > > > > > > > * config/i386/i386.c (ix86_register_move_cost): Do not
> > > > > > > > limit the cost of moves to/from XMM register to minimum 8.
> > > > > > > > * config/i386/i386-features.c
> > > > > > > > (general_scalar_chain::make_vector_copies): Do not generate
> > > > > > > > zeroing move from GPR to XMM register, use gen_move_insn
> > > > > > > > instead of gen_gpr_to_xmm_move_src.
> > > > > > > > (general_scalar_chain::convert_op): Ditto.
> > > > > > > > (gen_gpr_to_xmm_move_src): Remove.
> > > > > > > >
> > > > > > > > The patch was bootstrapped and regression tested on 
> > > > > > > > x86_64-linux-gnu
> > > > > > > > {,-m32}, configured w/ and w/o -with-arch=ivybridge.
> > > > > > > >
> > > > > > > > The patch regresses PR80481 scan-asm-not (where the compiler 
> > > > > > > > generates
> > > > > > > > unrelated XMM spill on register starved x86_32). However, 
> > > > > > > > during the
> > > > > > > > analysis, I found that the original issue is not fixed, and is 
> > > > > > > > still
> > > > > > > > visible without -funroll-loops [1].
> > > > > > > >
> > > > > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80481#c10
> > > > > > > >
> > > > > > > > So, I'd wait for the HJ's benchmark results of the cost to/from 
> > > > > > > > XMM
> > > > > > > > change before proceeding with the patch.
> > > > > > >
> > > > > > > Otherwise looks good to me, it might clash (whitespace wise)
> > > > > > > with my STV "rewrite" though.
> > > > > > >
> > > > > > > We might want to adjust ix86_register_move_cost separately from
> > > > > > > the STV change to use regular moves though?  Just to make 
> > > > > > > bisection
> > > > > > > point to either of the two - STV "fallout" is probably minor
> > > > > > > compared to fallout elsewhere...
> > > > > >
> > > > > > Yes, this is also my plan.
> > > > >
> > > > > Btw, when testing w/ the costing disabled I run into
> > > > >
> > > > > (insn 31 7 8 2 (set (subreg:V4SI (reg:SI 107) 0)
> > > > > (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (const:DI 
> > > > > (plus:DI
> > > > > (symbol_ref:DI ("peakbuf") [flags 0x2]   > > > > peakbuf>)
> > > > > (const_int 40002 [0x17d7d220]))) [1
> > > > > peakbuf.PeaksInBuf+0 S4 A64]))
> > > > > (const_vector:V4SI [
> > > > > (const_int 0 [0]) repeated x4
> > > > > ])
> > > > > (const_int 1 [0x1])))
> > > > >
> > > > > being not recognized (for the large immediate I guess).  So when
> > > > > just doing
> > > >
> > > > Hard to say without testcase, but I don't think this is valid memory
> > > > address. Can you see in dumps which insn is getting converted here?
> > >
> > > Ah, we can use movabsdi in this case.
> >
> > Yes, this is simply STVing such a single load... (quite pointless
> > of course).
> >
> > > > >   (set (reg:SI 107) (mem:SI ...))
> > > > >
> > > > > here we expect IRA to be able to allocate 107 into xmm, realizing
> > > > > it needs to reload the RHS first?
> > > > >
> > > > > For current code, is testing x86_64_general_operand like in the
> > > > > following the correct thing to do?
> > > >
> > > > x86_64_general_operand will only limit immediates using
> > > > x86_64_immediate_operand, it will allow all memory_operands.
> > >
> > > Yes, I guess it is OK, invalid embedded address can be loaded using 
> > > movabsq.
> >
> > OK, combining this with your patch for extended testing now.
>
> Please note that this new code will be removed by the above patch. The
> failing pattern will be replaced with simple move to SUBREG, where RA
> is able to correctly reload the operands.

BTW: A testcase would come in handy here to check if the above assumption
really stands...

Uros.


Re: [PR fortran/91496] !GCC$ directives error if mistyped or unknown

2019-08-27 Thread Paul Richard Thomas
Hi Harald,

This is OK for trunk.

Thanks!

Paul

On Mon, 26 Aug 2019 at 22:13, Harald Anlauf  wrote:
>
> Dear all,
>
> the attached patch adds Fortran support for the following pragmas
> (loop annotations): IVDEP (ignore vector dependencies), VECTOR, and
> NOVECTOR.  Furthermore, it downgrades unsupported directives from
> error to warning (by default, it stays an error with -pedantic),
> thus fixing the PR.
>
> It has no effect on existing code (thus regtested cleanly on
> x86_64-pc-linux-gnu), but gives users an option for fine-grained
> control of optimization.  The above pragmas are supported by other
> compilers (with different sentinels, e.g. !DIR$ for Intel, Cray,
> sometimes with slightly different keywords).
>
> OK for trunk, and backport to 9?
>
> Thanks,
> Harald
>
>
> 2019-08-26  Harald Anlauf  
>
> PR fortran/91496
> * gfortran.h: Extend struct gfc_iterator for loop annotations.
> * array.c (gfc_copy_iterator): Copy loop annotations by IVDEP,
> VECTOR, and NOVECTOR pragmas.
> * decl.c (gfc_match_gcc_ivdep, gfc_match_gcc_vector)
> (gfc_match_gcc_novector): New matcher functions handling IVDEP,
> VECTOR, and NOVECTOR pragmas.
> * match.h: Declare prototypes of matcher functions handling IVDEP,
> VECTOR, and NOVECTOR pragmas.
> * parse.c (decode_gcc_attribute, parse_do_block)
> (parse_executable): Decode IVDEP, VECTOR, and NOVECTOR pragmas;
> emit warning for unrecognized pragmas instead of error.
> * trans-stmt.c (gfc_trans_simple_do, gfc_trans_do): Add code to
> emit annotations for IVDEP, VECTOR, and NOVECTOR pragmas.
> * gfortran.texi: Document IVDEP, VECTOR, and NOVECTOR pragmas.
>
> 2019-08-26  Harald Anlauf  
>
> PR fortran/91496
> * gfortran.dg/pr91496.f90: New testcase.
>


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-27 Thread Christophe Lyon
On Fri, 23 Aug 2019 at 04:14, Jeff Law  wrote:
>
> On 8/12/19 4:09 PM, Martin Sebor wrote:
>
> >
> > gcc-83431.diff
> >
> > PR tree-optimization/83431 - -Wformat-truncation may incorrectly report 
> > truncation
> >
> > gcc/ChangeLog:
> >
> >   PR c++/83431
> >   * gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
> >   (sprintf_dom_walker): Remove class.
> >   (get_int_range): Make argument const.
> >   (directive::fmtfunc, directive::set_precision): Same.
> >   (format_none): Same.
> >   (build_intmax_type_nodes): Same.
> >   (adjust_range_for_overflow): Same.
> >   (format_floating): Same.
> >   (format_character): Same.
> >   (format_string): Same.
> >   (format_plain): Same.
> >   (get_int_range): Cast away constness.
> >   (format_integer): Same.
> >   (get_string_length): Call get_range_strlen_dynamic.  Handle
> >   null lendata.maxbound.
> >   (should_warn_p): Adjust argument scope qualifier.
> >   (maybe_warn): Same.
> >   (format_directive): Same.
> >   (parse_directive): Same.
> >   (is_call_safe): Same.
> >   (try_substitute_return_value): Same.
> >   (sprintf_dom_walker::handle_printf_call): Rename...
> >   (handle_printf_call): ...to this.  Initialize target to host charmap
> >   here instead of in pass_sprintf_length::execute.
> >   (struct call_info): Make global.
> >   (sprintf_dom_walker::compute_format_length): Make global.
> >   (sprintf_dom_walker::handle_gimple_call): Same.
> >   * passes.def (pass_sprintf_length): Replace with pass_strlen.
> >   * print-rtl.c (print_pattern): Reduce the number of spaces to
> >   avoid -Wformat-truncation.
> >   * tree-pass.h (make_pass_warn_printf): New function.
> >   * tree-ssa-strlen.c (strlen_optimize): New variable.
> >   (get_string_length): Add comments.
> >   (get_range_strlen_dynamic): New function.
> >   (check_and_optimize_call): New function.
> >   (handle_integral_assign): New function.
> >   (strlen_check_and_optimize_stmt): Factor code out into
> >   strlen_check_and_optimize_call and handle_integral_assign.
> >   (strlen_dom_walker::evrp): New member.
> >   (strlen_dom_walker::before_dom_children): Use evrp member.
> >   (strlen_dom_walker::after_dom_children): Use evrp member.
> >   (printf_strlen_execute): New function.
> >   (pass_strlen::gate): Update to handle printf calls.
> >   (dump_strlen_info): New function.
> >   (pass_data_warn_printf): New variable.
> >   (pass_warn_printf): New class.
> >   * tree-ssa-strlen.h (get_range_strlen_dynamic): Declare.
> >   (handle_printf_call): Same.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR c++/83431
> >   * gcc.dg/strlenopt-63.c: New test.
> >   * gcc.dg/pr79538.c: Adjust text of expected warning.
> >   * gcc.dg/pr81292-1.c: Adjust pass name.
> >   * gcc.dg/pr81292-2.c: Same.
> >   * gcc.dg/pr81703.c: Same.
> >   * gcc.dg/strcmpopt_2.c: Same.
> >   * gcc.dg/strcmpopt_3.c: Same.
> >   * gcc.dg/strcmpopt_4.c: Same.
> >   * gcc.dg/strlenopt-1.c: Same.
> >   * gcc.dg/strlenopt-10.c: Same.
> >   * gcc.dg/strlenopt-11.c: Same.
> >   * gcc.dg/strlenopt-13.c: Same.
> >   * gcc.dg/strlenopt-14g.c: Same.
> >   * gcc.dg/strlenopt-14gf.c: Same.
> >   * gcc.dg/strlenopt-15.c: Same.
> >   * gcc.dg/strlenopt-16g.c: Same.
> >   * gcc.dg/strlenopt-17g.c: Same.
> >   * gcc.dg/strlenopt-18g.c: Same.
> >   * gcc.dg/strlenopt-19.c: Same.
> >   * gcc.dg/strlenopt-1f.c: Same.
> >   * gcc.dg/strlenopt-2.c: Same.
> >   * gcc.dg/strlenopt-20.c: Same.
> >   * gcc.dg/strlenopt-21.c: Same.
> >   * gcc.dg/strlenopt-22.c: Same.
> >   * gcc.dg/strlenopt-22g.c: Same.
> >   * gcc.dg/strlenopt-24.c: Same.
> >   * gcc.dg/strlenopt-25.c: Same.
> >   * gcc.dg/strlenopt-26.c: Same.
> >   * gcc.dg/strlenopt-27.c: Same.
> >   * gcc.dg/strlenopt-28.c: Same.
> >   * gcc.dg/strlenopt-29.c: Same.
> >   * gcc.dg/strlenopt-2f.c: Same.
> >   * gcc.dg/strlenopt-3.c: Same.
> >   * gcc.dg/strlenopt-30.c: Same.
> >   * gcc.dg/strlenopt-31g.c: Same.
> >   * gcc.dg/strlenopt-32.c: Same.
> >   * gcc.dg/strlenopt-33.c: Same.
> >   * gcc.dg/strlenopt-33g.c: Same.
> >   * gcc.dg/strlenopt-34.c: Same.
> >   * gcc.dg/strlenopt-35.c: Same.
> >   * gcc.dg/strlenopt-4.c: Same.
> >   * gcc.dg/strlenopt-48.c: Same.
> >   * gcc.dg/strlenopt-49.c: Same.
> >   * gcc.dg/strlenopt-4g.c: Same.
> >   * gcc.dg/strlenopt-4gf.c: Same.
> >   * gcc.dg/strlenopt-5.c: Same.
> >   * gcc.dg/strlenopt-50.c: Same.
> >   * gcc.dg/strlenopt-51.c: Same.
> >   * gcc.dg/strlenopt-52.c: Same.
> >   * gcc.dg/strlenopt-53.c: Same.
> >   * gcc.dg/strlenopt-54.c: Same.
> >   * gcc.dg/strlenopt-55.c: Same.
> >   * gcc.dg/strlenopt-56.c: Same.
> >   * gcc.d

[PATCH/RFC] Simplify wrapped RTL op

2019-08-27 Thread Robin Dapp
Hi,

as announced in the wrapped-binop gimple patch mail, on s390 we still
emit odd code in front of loops:

  void v1 (unsigned long *in, unsigned long *out, unsigned int n)
  {
    int i;
    for (i = 0; i < n; i++)
      {
        out[i] = in[i];
      }
  }

   -->

   aghi	%r1,-8
   srlg	%r1,%r1,3
   aghi	%r1,1

This is created by doloop after getting niter from the loop as n - 1 or
"n * 8 - 8" with a step width of 8.  Since s390's doloop pattern
compares against 1, we add 1 to niter, resulting in the code above.

When going a similar route as with the gimple patch, something like

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 9359a3cdb4d..9c06c9b6ee9 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2364,6 +2364,24 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 					    in1, in2));
 	}
 
+      /* Transform (plus (lshiftrt (plus A -C1) C2) C3) to (lshiftrt A C2)
+	 if C1 == -C3 * (1 << C2).  */
+      if (CONST_SCALAR_INT_P (op1)
+	  && GET_CODE (op0) == LSHIFTRT
+	  && CONST_SCALAR_INT_P (XEXP (op0, 1))
+	  && GET_CODE (XEXP (op0, 0)) == PLUS
+	  && CONST_SCALAR_INT_P (XEXP (XEXP (op0, 0), 1)))
+	{
+	  rtx c3 = op1;
+	  rtx c2 = XEXP (op0, 1);
+	  rtx c1 = XEXP (XEXP (op0, 0), 1);
+
+	  rtx a = XEXP (XEXP (op0, 0), 0);
+
+	  if (-INTVAL (c3) * (1 << INTVAL (c2)) == INTVAL (c1))
+	    return simplify_gen_binary (LSHIFTRT, mode, a, c2);
+	}
+
       /* (plus (comparison A B) C) can become (neg (rev-comp A B)) if
 	 C is 1 and STORE_FLAG_VALUE is -1 or if C is -1 and
 	 STORE_FLAG_VALUE is 1.  */

helps immediately, yet overflow/range information is not considered.  Do
we somehow guarantee that the niter-related expressions we created up to
doloop do not overflow?  I did not notice anything when looking through
the code.  Granted, the simplification seems oddly specific and is
probably not useful for a wide range of targets and situations.


Another approach would be to store "niter+1" (== n) when niter (== n-1)
is calculated and, when we need to do the increment, use the niter+1
that we already have without needing to simplify (n - 8) >> 3 + 1.

Any comments on this?

The patch above bootstraps and the test suite shows no regressions on
s390, FWIW.

Regards
 Robin



Re: [RFC PATCH, i386]: Improve STV pass by correcting the cost of moves to/from XMM reg

2019-08-27 Thread Richard Biener
On Tue, 27 Aug 2019, Uros Bizjak wrote:

> On Tue, Aug 27, 2019 at 9:55 AM Uros Bizjak  wrote:
> 
> > > > > > > > > This is currently a heads-up patch that removes the minimum 
> > > > > > > > > limitation
> > > > > > > > > of cost of moves to/from XMM reg. The immediate benefit is 
> > > > > > > > > the removal
> > > > > > > > > of mismatched spills, caused by subreg usage.
> > > > > > > > >
> > > > > > > > > *If* the patch proves to be beneficial (as in "doesn't regress
> > > > > > > > > important benchmarks"), then we should be able to un-hide the
> > > > > > > > > inter-regset moves from RA and allow it to collapse some 
> > > > > > > > > moves. As an
> > > > > > > > > example, patched compiler removes a movd in 
> > > > > > > > > gcc.target/i386/minmax-6.c
> > > > > > > > > and still avoids mismatched spill.
> > > > > > > > >
> > > > > > > > > 2019-08-23  Uroš Bizjak  
> > > > > > > > >
> > > > > > > > > * config/i386/i386.c (ix86_register_move_cost): Do not
> > > > > > > > > limit the cost of moves to/from XMM register to minimum 8.
> > > > > > > > > * config/i386/i386-features.c
> > > > > > > > > (general_scalar_chain::make_vector_copies): Do not 
> > > > > > > > > generate
> > > > > > > > > zeroing move from GPR to XMM register, use gen_move_insn
> > > > > > > > > instead of gen_gpr_to_xmm_move_src.
> > > > > > > > > (general_scalar_chain::convert_op): Ditto.
> > > > > > > > > (gen_gpr_to_xmm_move_src): Remove.
> > > > > > > > >
> > > > > > > > > The patch was bootstrapped and regression tested on 
> > > > > > > > > x86_64-linux-gnu
> > > > > > > > > {,-m32}, configured w/ and w/o -with-arch=ivybridge.
> > > > > > > > >
> > > > > > > > > The patch regresses PR80481 scan-asm-not (where the compiler 
> > > > > > > > > generates
> > > > > > > > > unrelated XMM spill on register starved x86_32). However, 
> > > > > > > > > during the
> > > > > > > > > analysis, I found that the original issue is not fixed, and 
> > > > > > > > > is still
> > > > > > > > > visible without -funroll-loops [1].
> > > > > > > > >
> > > > > > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80481#c10
> > > > > > > > >
> > > > > > > > > So, I'd wait for the HJ's benchmark results of the cost 
> > > > > > > > > to/from XMM
> > > > > > > > > change before proceeding with the patch.
> > > > > > > >
> > > > > > > > Otherwise looks good to me, it might clash (whitespace wise)
> > > > > > > > with my STV "rewrite" though.
> > > > > > > >
> > > > > > > > We might want to adjust ix86_register_move_cost separately from
> > > > > > > > the STV change to use regular moves though?  Just to make 
> > > > > > > > bisection
> > > > > > > > point to either of the two - STV "fallout" is probably minor
> > > > > > > > compared to fallout elsewhere...
> > > > > > >
> > > > > > > Yes, this is also my plan.
> > > > > >
> > > > > > Btw, when testing w/ the costing disabled I run into
> > > > > >
> > > > > > (insn 31 7 8 2 (set (subreg:V4SI (reg:SI 107) 0)
> > > > > > (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (const:DI 
> > > > > > (plus:DI
> > > > > > (symbol_ref:DI ("peakbuf") [flags 0x2]   > > > > > peakbuf>)
> > > > > > (const_int 40002 [0x17d7d220]))) [1
> > > > > > peakbuf.PeaksInBuf+0 S4 A64]))
> > > > > > (const_vector:V4SI [
> > > > > > (const_int 0 [0]) repeated x4
> > > > > > ])
> > > > > > (const_int 1 [0x1])))
> > > > > >
> > > > > > being not recognized (for the large immediate I guess).  So when
> > > > > > just doing
> > > > >
> > > > > Hard to say without testcase, but I don't think this is valid memory
> > > > > address. Can you see in dumps which insn is getting converted here?
> > > >
> > > > Ah, we can use movabsdi in this case.
> > >
> > > Yes, this is simply STVing such a single load... (quite pointless
> > > of course).
> > >
> > > > > >   (set (reg:SI 107) (mem:SI ...))
> > > > > >
> > > > > > here we expect IRA to be able to allocate 107 into xmm, realizing
> > > > > > it needs to reload the RHS first?
> > > > > >
> > > > > > For current code, is testing x86_64_general_operand like in the
> > > > > > following the correct thing to do?
> > > > >
> > > > > x86_64_general_operand will only limit immediates using
> > > > > x86_64_immediate_operand, it will allow all memory_operands.
> > > >
> > > > Yes, I guess it is OK, invalid embedded address can be loaded using 
> > > > movabsq.
> > >
> > > OK, combining this with your patch for extended testing now.
> >
> > Please note that this new code will be removed by the above patch. The
> > failing pattern will be replaced with simple move to SUBREG, where RA
> > is able to correctly reload the operands.
> 
> BTW: A testcase would come in handy here to check whether the above
> assumption really stands...

int arr[1L<<31];
int max;

void foo ()
{
  max = arr[1L<<30 + 1] < arr[1L<<30 + 2] ? arr[1L<<30 + 2] : arr[1L<<30 + 1];
}

which has two loads:

(

Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment

2019-08-27 Thread Kyrill Tkachov

Hi Bernd,

On 8/15/19 8:47 PM, Bernd Edlinger wrote:

Hi,

this is the split-out part from the "Fix not 8-byte aligned ldrd/strd on
ARMv5 (PR 89544)" patch, which sanitizes the middle-end interface to the
back-end for strict alignment, plus a couple of bug fixes that are necessary
to survive bootstrap.  It is intended to be applied after the PR 89544 fix.

I think it would be possible to change the default implementation of
STACK_SLOT_ALIGNMENT to make all stack variables always naturally aligned,
instead of doing that only in assign_parm_setup_stack, but I would still like
to avoid changing too many things that do not seem to have a problem: that
would affect many targets, and more kinds of variables that probably do not
have a strict-alignment problem.  But I am ready to take your advice though.


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
Is it OK for trunk?


I'm not opposed to the checks but...




Thanks
Bernd.



Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (Revision 274531)
+++ gcc/config/arm/arm.md   (Arbeitskopie)
@@ -5838,6 +5838,12 @@
(match_operand:DI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (DImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+ >= GET_MODE_ALIGNMENT (DImode));
   if (can_create_pseudo_p ())
 {
   if (!REG_P (operands[0]))
@@ -6014,6 +6020,12 @@
   {
   rtx base, offset, tmp;
 
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (SImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+ >= GET_MODE_ALIGNMENT (SImode));
   if (TARGET_32BIT || TARGET_HAVE_MOVT)
 {
   /* Everything except mem = const or mem = mem can be done easily.  */
@@ -6503,6 +6515,12 @@
(match_operand:HI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (HImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+ >= GET_MODE_ALIGNMENT (HImode));
   if (TARGET_ARM)
 {
   if (can_create_pseudo_p ())
@@ -6912,6 +6930,12 @@
(match_operand:HF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (HFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+ >= GET_MODE_ALIGNMENT (HFmode));
   if (TARGET_32BIT)
 {
   if (MEM_P (operands[0]))
@@ -6976,6 +7000,12 @@
(match_operand:SF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (SFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+ >= GET_MODE_ALIGNMENT (SFmode));
   if (TARGET_32BIT)
 {
   if (MEM_P (operands[0]))
@@ -7071,6 +7101,12 @@
(match_operand:DF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (DFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+ >= GET_MODE_ALIGNMENT (DFmode));
   if (TARGET_32BIT)
 {
   if (MEM_P (operands[0]))
Index: gcc/config/arm/neon.md
===
--- gcc/config/arm/neon.md  (Revision 274531)
+++ gcc/config/arm/neon.md  (Arbeitskopie)
@@ -127,6 +127,12 @@
(match_operand:TI 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (TImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+ >= GET_MODE_ALIGNMENT (TImode));
   if (can_create_pseudo_p ())
 {
   if (!REG_P (operands[0]))
@@ -139,6 +145,12 @@
(match_operand:VSTRUCT 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+  || MEM_ALIGN (operands[0])
+ >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+  || MEM_ALIGN (operands[1])
+   

Re: [PATCH V3 01/11] Update config.sub and config.guess.

2019-08-27 Thread Segher Boessenkool
On Mon, Aug 26, 2019 at 07:14:40PM +0200, Jose E. Marchesi wrote:
>   * config.sub: Import upstream version 2019-06-30.
>   * config.guess: Import upstream version 2019-07-24.

These fall under the "obvious" rule.  Please make sure you install the
current upstream version.  Thanks,


Segher


Re: [PATCH V3 10/11] bpf: manual updates for eBPF

2019-08-27 Thread Segher Boessenkool
On Mon, Aug 26, 2019 at 07:14:49PM +0200, Jose E. Marchesi wrote:
> +@table @gcctabopt
> +@item -mframe-limit=@var{bytes}
> +This specifies the hard limit for frame sizes, in bytes.  Currently,
> +the value that can be specified should be less or equal than

"less than or equal to".

I didn't check the core port this time.  But I did look at everything
else, and it is ready to go I'd say :-)


Segher


Re: [SVE] PR86753

2019-08-27 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> On Mon, 26 Aug 2019 at 14:48, Richard Biener  
> wrote:
>>
>> On Sun, Aug 25, 2019 at 11:13 PM Prathamesh Kulkarni
>>  wrote:
>> >
>> > On Fri, 23 Aug 2019 at 19:43, Richard Sandiford
>> >  wrote:
>> > >
>> > > Prathamesh Kulkarni  writes:
>> > > > On Fri, 23 Aug 2019 at 18:15, Richard Sandiford
>> > > >  wrote:
>> > > >>
>> > > >> Prathamesh Kulkarni  writes:
>> > > >> > On Thu, 22 Aug 2019 at 16:44, Richard Biener 
>> > > >> >  wrote:
>> > > >> >> It looks a bit odd to me.  I'd have expected it to work by 
>> > > >> >> generating
>> > > >> >> the stmts as before in the vectorizer and then on the stmts we care
>> > > >> >> invoke vn_visit_stmt that does both value-numbering and 
>> > > >> >> elimination.
>> > > >> >> Alternatively you could ask the VN state to generate the stmt for
>> > > >> >> you via vn_nary_build_or_lookup () (certainly that needs a bit more
>> > > >> >> work).  One complication might be availability if you don't 
>> > > >> >> value-number
>> > > >> >> all stmts in the block, but well.  I'm not sure constraining to a 
>> > > >> >> single
>> > > >> >> block is necessary - I've thought of having a "CSE"ing gimple_build
>> > > >> >> for some time (add & CSE new stmts onto a sequence), so one
>> > > >> >> should keep this mode in mind when designing the one working on
>> > > >> >> an existing BB.  Note as you write it it depends on visiting the
>> > > >> >> stmts in proper order - is that guaranteed when for example
>> > > >> >> vectorizing SLP?
>> > > >> > Hi,
>> > > >> > Indeed, I wrote the function with assumption that, stmts would be
>> > > >> > visited in proper order.
>> > > >> > This doesn't affect SLP currently, because call to vn_visit_stmt in
>> > > >> > vect_transform_stmt is
>> > > >> > conditional on cond_to_vec_mask, which is only allocated inside
>> > > >> > vect_transform_loop.
>> > > >> > But I agree we could make it more general.
>> > > >> > AFAIU, the idea of constraining VN to single block was to avoid 
>> > > >> > using defs from
>> > > >> > non-dominating scalar stmts during outer-loop vectorization.
>> > > >>
>> > > >> Maybe we could do the numbering in a separate walk immediately before
>> > > >> the transform phase instead.
>> > > > Um sorry, I didn't understand. Do you mean we should do dom based VN
>> > > > just before transform phase
>> > > > or run full VN ?
>> > >
>> > > No, I just meant that we could do a separate walk of the contents
>> > > of the basic block:
>> > >
>> > > > @@ -8608,6 +8609,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>> > > >  {
>> > > >basic_block bb = bbs[i];
>> > > >stmt_vec_info stmt_info;
>> > > > +  vn_bb_init (bb);
>> > > > +  loop_vinfo->cond_to_vec_mask = new cond_vmask_map_type (8);
>> > > >
>> > >
>> > > ...here, rather than doing it on the fly during vect_transform_stmt
>> > > itself.  The walk should be gated on LOOP_VINFO_FULLY_MASKED_P so that
>> > > others don't have to pay the compile-time penalty.  (Same for
>> > > cond_to_vec_mask itself really.)
>> > Hi,
>> > Does the attached patch look OK ?
>> > In patch, I put call to vn_visit stmt in bb loop in
>> > vect_transform_loop to avoid replicating logic for processing phi and
>> > stmts.
>> > AFAIU, vect_transform_loop_stmt is only called from bb loop, so
>> > compile time penalty for checking cond_to_vec_mask
>> > should be pretty small ?
>> > If this is not OK, I will walk bb immediately before the bb loop.
>>
>> So if I understand correctly you never have vectorizable COND_EXPRs
>> in SLP mode?  Because we vectorize all SLP chains before entering
>> the loop in vect_transform_loop where you VN existing scalar(!) stmts.

On the "!": the idea behind the patch is to find cases in which a
scalar condition is used in both a statement that needs to be masked
for correctness reasons and a statement that we can choose to mask
if we want to.  It also tries (opportunistically) to match the ?: order
with other conditions.

That's why it's operating on scalar values rather than vector values.
In principle it could be done as a subpass before vectorisation rather
than on the fly, when there aren't any vector stmts around.

>> Then all this new hash-table stuff should not be needed since this
>> is what VN should provide you with.  You of course need to visit
>> generated condition stmts.  And condition support is weak
>> in VN due to it possibly having two operations in a single stmt.
>> Bad GIMPLE IL.  So I'm not sure VN is up to the task here or
>> why you even need it given you are doing your own hashing?
> Well, we thought of using VN for comparing operands for cases
> operand_equal_p would not
> work. Actually, VN seems not to be required for test-cases in PR
> because both conditions
> are _4 != 0 (_35 = _4 != 0 and in cond_expr), which works to match
> with operand_equal_p.

Right, that's why I was suggesting in the earlier thread that we
treat value numbering as a follow-on.  But...

> Input to vectorizer is:
>

[arm][aarch64] Add comments warning that stack-protector initializer insns shouldn't be split

2019-08-27 Thread Richard Earnshaw (lists)
Following the publication of https://kb.cert.org/vuls/id/129209/ I've 
been having a look at GCC's implementation for Arm and AArch64.  I 
haven't identified any issues yet, but it's a bit early to be completely 
sure.


One observation, however, is that the instruction sequence that 
initializes the stack canary might be vulnerable to producing a reusable 
value if it were ever split early.  I don't think we ever would, because 
the memory locations involved with the stack protector are all marked 
volatile to ensure that the values are only loaded at the point in time 
when the test is intended to happen, and that also has the effect of 
making it unlikely that the value would be reused without reloading. 
Nevertheless, defence in depth is probably warranted here.


So this patch just adds some comments warning that the patterns should 
not be split.


* config/arm/arm.md (stack_protect_set_insn): Add security-related
comment.
* config/aarch64/aarch64.md (stack_protect_set_<mode>): Likewise.

committed to trunk.
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 274945)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -7016,6 +7016,8 @@
  }
  [(set_attr "type" "mrs")])
 
+;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
+;; canary value does not live beyond the life of this sequence.
 (define_insn "stack_protect_set_"
   [(set (match_operand:PTR 0 "memory_operand" "=m")
 	(unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")]
@@ -7022,7 +7024,7 @@
 	 UNSPEC_SP_SET))
(set (match_scratch:PTR 2 "=&r") (const_int 0))]
   ""
-  "ldr\\t%2, %1\;str\\t%2, %0\;mov\t%2,0"
+  "ldr\\t%2, %1\;str\\t%2, %0\;mov\t%2, 0"
   [(set_attr "length" "12")
(set_attr "type" "multiple")])
 
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md	(revision 274945)
+++ gcc/config/arm/arm.md	(working copy)
@@ -8208,6 +8208,8 @@
   [(set_attr "arch" "t1,32")]
 )
 
+;; DO NOT SPLIT THIS INSN.  It's important for security reasons that the
+;; canary value does not live beyond the life of this sequence.
 (define_insn "*stack_protect_set_insn"
   [(set (match_operand:SI 0 "memory_operand" "=m,m")
 	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
@@ -8215,8 +8217,8 @@
(clobber (match_dup 1))]
   ""
   "@
-   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
-   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1, #0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1, #0"
   [(set_attr "length" "8,12")
(set_attr "conds" "clob,nocond")
(set_attr "type" "multiple")


Re: [PATCH/RFC] Simplify wrapped RTL op

2019-08-27 Thread Segher Boessenkool
On Tue, Aug 27, 2019 at 11:12:32AM +0200, Robin Dapp wrote:
> as announced in the wrapped-binop gimple patch mail, on s390 we still
> emit odd code in front of loops:

>   aghi	%r1,-8
>   srlg	%r1,%r1,3
>   aghi	%r1,1

This is done like this because %r1 might be 0.

We see this same problem on Power; there are quite a few PRs about it.

[ ... ]

> helps immediately, yet overflow/range information is not considered.

Yeah, and it has to be.

> Do
> we somehow guarantee that the niter-related we created until doloop do
> not overflow?  I did not note something when looking through the code.
> Granted, the simplification seems oddly specific and is probably not
> useful for a wide range of targets and situations.

You're at least the third target, and it's pretty annoying, and it tends
to cost more than two insns (because things can often be simplified
further after this).  It won't do super much for execution time, there
is a loop after this after all, a handful of insns executed once can't
be all that expensive relatively.

> Another approach would be to store "niter+1" (== n) when niter (== n-1)
> is calculated and, when we need to do the increment, use the niter+1
> that we already have without needing to simplify (n - 8) >> 3 + 1.
> 
> Any comments on this?
> 
> The patch above bootstraps and test suite is without regressions on s390
> fwiw.

When something similar was tried before there were regressions for
rs6000.  I'll find the PR later.

I was hoping that now that ivopts learns about doloops, this can be
handled better as well.  Ideally the doloop pass can move closer to
expand, and do much less analysis and work, all the heavy lifting has
been done already.


Segher


Re: [RFC] [AARCH64] Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2019-08-27 Thread Kyrill Tkachov

Hi Shaokun,

On 8/22/19 3:10 PM, Shaokun Zhang wrote:

The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
Let's support the two bits if they are enabled, then we can get some
performance benefit from this feature.

2019-08-22  Shaokun Zhang 

    * config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC

This needs to mention __aarch64_sync_cache_range as the function being 
changed.




---
 libgcc/config/aarch64/sync-cache.c | 56 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/aarch64/sync-cache.c 
b/libgcc/config/aarch64/sync-cache.c

index 791f5e42ff44..0b057efbdcab 100644
--- a/libgcc/config/aarch64/sync-cache.c
+++ b/libgcc/config/aarch64/sync-cache.c
@@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */

+#define CTR_IDC_SHIFT   28
+#define CTR_DIC_SHIFT   29
+
 void __aarch64_sync_cache_range (const void *, const void *);

 void
@@ -41,32 +44,43 @@ __aarch64_sync_cache_range (const void *base, const void *end)

   icache_lsize = 4 << (cache_info & 0xF);
   dcache_lsize = 4 << ((cache_info >> 16) & 0xF);

-  /* Loop over the address range, clearing one cache line at once.
- Data cache must be flushed to unification first to make sure the
- instruction cache fetches the updated data.  'end' is exclusive,
- as per the GNU definition of __clear_cache.  */
+  /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of Unification
+     is not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_IDC_SHIFT) & 0x1 == 0x0) {
+    /* Loop over the address range, clearing one cache line at once.
+   Data cache must be flushed to unification first to make sure the
+   instruction cache fetches the updated data.  'end' is exclusive,
+   as per the GNU definition of __clear_cache.  */

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));

-  for (; address < (const char *) end; address += dcache_lsize)
-    asm volatile ("dc\tcvau, %0"
- :
- : "r" (address)
- : "memory");
+    for (; address < (const char *) end; address += dcache_lsize)
+  asm volatile ("dc\tcvau, %0"
+   :
+   : "r" (address)
+   : "memory");
+  }

   asm volatile ("dsb\tish" : : : "memory");

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
+  /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
+     Unification is not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_DIC_SHIFT) & 0x1 == 0x0) {
+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));

-  for (; address < (const char *) end; address += icache_lsize)
-    asm volatile ("ic\tivau, %0"
- :
- : "r" (address)
- : "memory");
+    for (; address < (const char *) end; address += icache_lsize)
+  asm volatile ("ic\tivau, %0"
+   :
+   : "r" (address)
+   : "memory");

-  asm volatile ("dsb\tish; isb" : : : "memory");
+    asm volatile ("dsb\tish" : : : "memory");
+  }
+  asm volatile("isb" : : : "memory")
 }


This looks ok to me (but you'll need approval from the maintainers).

There is a question of whether we need the barriers if both DIC and IDC
are 1 (in which case no cache-maintenance instructions are emitted).


I think we still want them to ensure the writes have been completed and 
the fetches from the updated cache are up-to-date.


For arch versions before CTR_EL0.{DIC, IDC} these bits are reserved as 
zero so the code will do the right thing on those targets.


How has this patch been tested? Do you also have any performance results 
you can share?


Thanks,

Kyrill



--
2.7.4



Re: [PATCH] Share a prevailing name for remove debug info symbols w/ LTO.

2019-08-27 Thread Richard Biener
On Tue, Aug 27, 2019 at 9:28 AM Martin Liška  wrote:
>
> Hi.
>
> The patch is about better symbol table manipulation
> for debug info objects. The patch fixes reported issue
> on hppa64-hp-hpux11.11.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

+   prevailing_name_idx = ELF_FETCH_FIELD (type_functions, ei_class,
+  Sym, ent,
+  st_name, Elf_Addr);

this should be Elf_Word.  Please add a break; after you found a symbol.
Please also amend the comment before this loop to say that we know
there's a prevailing weak hidden symbol at the start of the .debug_info section
so we'll always at least find that one.

-undefined and sharing the gnu_lto_ name.  */
+undefined and sharing a name of a prevailing
+symbol.  */
  bind = STB_WEAK;
  other = STV_HIDDEN;
+
  ELF_SET_FIELD (type_functions, ei_class, Sym,
-ent, st_name, Elf_Word,
-gnu_lto - strings);
+ent, st_name, Elf_Addr,
+prevailing_name_idx);
+

Likewise Elf_Word, no need to add vertical spacing before and after
this stmt.

  ELF_SET_FIELD (type_functions, ei_class, Sym,




> Thanks,
> Martin
>
> libiberty/ChangeLog:
>
> 2019-08-27  Martin Liska  
>
> PR lto/91478
> * simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
> First find a WEAK HIDDEN symbol in symbol table that will be
> preserved.  Later, use the symbol name for all removed symbols.
> ---
>  libiberty/simple-object-elf.c | 71 +++
>  1 file changed, 48 insertions(+), 23 deletions(-)
>
>


[committed] Fix libgomp scan* tests for non-avx_runtime testing (PR libgomp/91530)

2019-08-27 Thread Jakub Jelinek
Hi!

On targets that aren't avx capable, but are sse2 capable, we don't actually
pass -mavx nor -msse2, but expect vectorized messages that don't appear if
-msse2 isn't the default.

Fixed thusly, tested on x86_64-linux and i686-linux, committed to trunk.

2019-08-27  Jakub Jelinek  

PR libgomp/91530
* testsuite/libgomp.c/scan-11.c: Add -msse2 option for sse2_runtime
targets.
* testsuite/libgomp.c/scan-12.c: Likewise.
* testsuite/libgomp.c/scan-13.c: Likewise.
* testsuite/libgomp.c/scan-14.c: Likewise.
* testsuite/libgomp.c/scan-15.c: Likewise.
* testsuite/libgomp.c/scan-16.c: Likewise.
* testsuite/libgomp.c/scan-17.c: Likewise.
* testsuite/libgomp.c/scan-18.c: Likewise.
* testsuite/libgomp.c/scan-19.c: Likewise.
* testsuite/libgomp.c/scan-20.c: Likewise.
* testsuite/libgomp.c++/scan-9.C: Likewise.
* testsuite/libgomp.c++/scan-10.C: Likewise.
* testsuite/libgomp.c++/scan-11.C: Likewise.
* testsuite/libgomp.c++/scan-12.C: Likewise.
* testsuite/libgomp.c++/scan-14.C: Likewise.
* testsuite/libgomp.c++/scan-15.C: Likewise.
* testsuite/libgomp.c++/scan-13.C: Likewise.  Use sse2_runtime
instead of i?86-*-* x86_64-*-* as target for scan-tree-dump-times.
* testsuite/libgomp.c++/scan-16.C: Likewise.

--- libgomp/testsuite/libgomp.c/scan-11.c.jj	2019-07-20 21:00:52.0 +0200
+++ libgomp/testsuite/libgomp.c/scan-11.c	2019-08-23 16:07:38.472809282 +0200
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target size32plus } */
 /* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" { target sse2_runtime } } } */
 
--- libgomp/testsuite/libgomp.c/scan-12.c.jj	2019-07-20 21:00:52.0 +0200
+++ libgomp/testsuite/libgomp.c/scan-12.c	2019-08-23 16:08:19.569368541 +0200
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target size32plus } */
 /* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" { target sse2_runtime } } } */
 
--- libgomp/testsuite/libgomp.c/scan-13.c.jj	2019-07-20 21:00:52.0 +0200
+++ libgomp/testsuite/libgomp.c/scan-13.c	2019-08-23 16:08:23.264328915 +0200
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target size32plus } */
 /* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" { target sse2_runtime } } } */
 
--- libgomp/testsuite/libgomp.c/scan-14.c.jj	2019-07-20 21:00:52.0 +0200
+++ libgomp/testsuite/libgomp.c/scan-14.c	2019-08-23 16:08:27.015288688 +0200
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target size32plus } */
 /* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" { target sse2_runtime } } } */
 
--- libgomp/testsuite/libgomp.c/scan-15.c.jj	2019-07-20 21:00:52.0 +0200
+++ libgomp/testsuite/libgomp.c/scan-15.c	2019-08-23 16:08:31.287242873 +0200
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target size32plus } */
 /* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" { target sse2_runtime } } } */
 
--- libgomp/testsuite/libgomp.c/scan-16.c.jj	2019-07-20 21:00:52.0 +0200
+++ libgomp/testsuite/libgomp.c/scan-16.c	2019-08-23 16:08:34.878204362 +0200
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target size32plus } */
 /* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" { target sse2_runtime } } } */
 
--- libgomp/testsuite/libgomp.c/scan-17.c.jj	2019-07-20 21:00:52.0 +0200
+++ libgomp/testsuite/libgomp.c/scan-17.c	2019-08-23 16:08:41.647131768 +0200
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target size32plus } */
 /* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-msse

Re: [SVE] PR86753

2019-08-27 Thread Richard Biener
On Tue, Aug 27, 2019 at 11:58 AM Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Mon, 26 Aug 2019 at 14:48, Richard Biener  
> > wrote:
> >>
> >> On Sun, Aug 25, 2019 at 11:13 PM Prathamesh Kulkarni
> >>  wrote:
> >> >
> >> > On Fri, 23 Aug 2019 at 19:43, Richard Sandiford
> >> >  wrote:
> >> > >
> >> > > Prathamesh Kulkarni  writes:
> >> > > > On Fri, 23 Aug 2019 at 18:15, Richard Sandiford
> >> > > >  wrote:
> >> > > >>
> >> > > >> Prathamesh Kulkarni  writes:
> >> > > >> > On Thu, 22 Aug 2019 at 16:44, Richard Biener 
> >> > > >> >  wrote:
> >> > > >> >> It looks a bit odd to me.  I'd have expected it to work by 
> >> > > >> >> generating
> >> > > >> >> the stmts as before in the vectorizer and then on the stmts we 
> >> > > >> >> care
> >> > > >> >> invoke vn_visit_stmt that does both value-numbering and 
> >> > > >> >> elimination.
> >> > > >> >> Alternatively you could ask the VN state to generate the stmt for
> >> > > >> >> you via vn_nary_build_or_lookup () (certainly that needs a bit 
> >> > > >> >> more
> >> > > >> >> work).  One complication might be availability if you don't 
> >> > > >> >> value-number
> >> > > >> >> all stmts in the block, but well.  I'm not sure constraining to 
> >> > > >> >> a single
> >> > > >> >> block is necessary - I've thought of having a "CSE"ing 
> >> > > >> >> gimple_build
> >> > > >> >> for some time (add & CSE new stmts onto a sequence), so one
> >> > > >> >> should keep this mode in mind when designing the one working on
> >> > > >> >> an existing BB.  Note as you write it it depends on visiting the
> >> > > >> >> stmts in proper order - is that guaranteed when for example
> >> > > >> >> vectorizing SLP?
> >> > > >> > Hi,
> >> > > >> > Indeed, I wrote the function with assumption that, stmts would be
> >> > > >> > visited in proper order.
> >> > > >> > This doesn't affect SLP currently, because call to vn_visit_stmt 
> >> > > >> > in
> >> > > >> > vect_transform_stmt is
> >> > > >> > conditional on cond_to_vec_mask, which is only allocated inside
> >> > > >> > vect_transform_loop.
> >> > > >> > But I agree we could make it more general.
> >> > > >> > AFAIU, the idea of constraining VN to single block was to avoid 
> >> > > >> > using defs from
> >> > > >> > non-dominating scalar stmts during outer-loop vectorization.
> >> > > >>
> >> > > >> Maybe we could do the numbering in a separate walk immediately 
> >> > > >> before
> >> > > >> the transform phase instead.
> >> > > > Um sorry, I didn't understand. Do you mean we should do dom based VN
> >> > > > just before transform phase
> >> > > > or run full VN ?
> >> > >
> >> > > No, I just meant that we could do a separate walk of the contents
> >> > > of the basic block:
> >> > >
> >> > > > @@ -8608,6 +8609,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
> >> > > >  {
> >> > > >basic_block bb = bbs[i];
> >> > > >stmt_vec_info stmt_info;
> >> > > > +  vn_bb_init (bb);
> >> > > > +  loop_vinfo->cond_to_vec_mask = new cond_vmask_map_type (8);
> >> > > >
> >> > >
> >> > > ...here, rather than doing it on the fly during vect_transform_stmt
> >> > > itself.  The walk should be gated on LOOP_VINFO_FULLY_MASKED_P so that
> >> > > others don't have to pay the compile-time penalty.  (Same for
> >> > > cond_to_vec_mask itself really.)
> >> > Hi,
> >> > Does the attached patch look OK ?
> >> > In patch, I put call to vn_visit stmt in bb loop in
> >> > vect_transform_loop to avoid replicating logic for processing phi and
> >> > stmts.
> >> > AFAIU, vect_transform_loop_stmt is only called from bb loop, so
> >> > compile time penalty for checking cond_to_vec_mask
> >> > should be pretty small ?
> >> > If this is not OK, I will walk bb immediately before the bb loop.
> >>
> >> So if I understand correctly you never have vectorizable COND_EXPRs
> >> in SLP mode?  Because we vectorize all SLP chains before entering
> >> the loop in vect_transform_loop where you VN existing scalar(!) stmts.
>
> On the "!": the idea behind the patch is to find cases in which a
> scalar condition is used in both a statement that needs to be masked
> for correctness reasons and a statement that we can choose to mask
> if we want to.  It also tries (opportunistically) to match the ?: order
> with other conditions.
>
> That's why it's operating on scalar values rather than vector values.
> In principle it could be done as a subpass before vectorisation rather
> than on the fly, when there aren't any vector stmts around.
>
> >> Then all this new hash-table stuff should not be needed since this
> >> is what VN should provide you with.  You of course need to visit
> >> generated condition stmts.  And condition support is weak
> >> in VN due to it possibly having two operations in a single stmt.
> >> Bad GIMPLE IL.  So I'm not sure VN is up to the task here or
> >> why you even need it given you are doing your own hashing?
> > Well, we thought of using VN for comparing operands for cases
> > operand_equal_p would

Re: [PATCH V3 05/11] bpf: new GCC port

2019-08-27 Thread Richard Sandiford
Mostly trivial formatting comments, but I think there's still a couple
of substantive points too.

jema...@gnu.org (Jose E. Marchesi) writes:
> +/* Return the builtin code corresponding to the kernel helper builtin
> +   __builtin_NAME, or 0 if the name doesn't correspond to a kernel
> +   helper builtin.  */
> +
> +static inline int
> +bpf_helper_code (const char *name)
> +{
> +  int i;
> +
> +  for (i = 1; i < BPF_BUILTIN_HELPER_MAX; ++i)
> +{
> +  if (strcmp (name, bpf_helper_names[i]) == 0)
> + return i;
> +}

Redundant braces, usual GCC style is to leave them out.

> +/* Return an RTX representing the place where a function returns or
> +   receives a value of data type RET_TYPE, a tree node representing a
> +   data type.  */
> +
> +static rtx
> +bpf_function_value (const_tree ret_type,
> + const_tree fntype_or_decl ATTRIBUTE_UNUSED,

This *is* used. :-)  I only noticed because...

> + bool outgoing ATTRIBUTE_UNUSED)
> +{
> +  enum machine_mode mode;
> +  int unsignedp;
> +
> +  mode = TYPE_MODE (ret_type);
> +  if (INTEGRAL_TYPE_P (ret_type))
> +mode = promote_function_mode (ret_type, mode, &unsignedp, 
> fntype_or_decl, 1);

...long line.

> +/* Return the initial difference between the specified pair of
> +   registers.  The registers that can figure in FROM, and TO, are
> +   specified by ELIMINABLE_REGS in bpf.h.
> +
> +   This function is used in the definition of
> +   INITIAL_ELIMINATION_OFFSET in bpf.h  */
> +
> +HOST_WIDE_INT
> +bpf_initial_elimination_offset (int from,
> + int to)

Odd line split.

> +{
> +  HOST_WIDE_INT ret;
> +
> +  if (from == ARG_POINTER_REGNUM && to == STACK_POINTER_REGNUM)
> +{
> +  ret = (cfun->machine->local_vars_size
> +  + cfun->machine->callee_saved_reg_size);
> +}

Redundant braces.

> +/* Return true if X (a RTX) is a legitimate memory address on the
> +   target machine for a memory operand of mode MODE.  */
> +
> +static bool
> +bpf_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
> +   rtx x,
> +   bool strict)
> +{
> +  switch (GET_CODE (x))
> +{
> +case LABEL_REF:
> +case SYMBOL_REF:
> +case CONST:
> +  /* These are assumed to fit in 32-bit, because the kernel
> +  imposes a limit to the size of eBPF programs.  */
> +  return true;
> +  break;

Usual style is not to break after a return.  Same for rest of file.

> +/* Return true if an argument at the position indicated by CUM should
> +   be passed by reference.  If the hook returns true, a copy of that
> +   argument is made in memory and a pointer to the argument is passed
> +   instead of the argument itself.  */
> +
> +static bool
> +bpf_pass_by_reference (cumulative_args_t cum ATTRIBUTE_UNUSED,
> +const function_arg_info &arg)
> +{
> +  unsigned num_bytes
> += (arg.type
> +   ? int_size_in_bytes (arg.type) : GET_MODE_SIZE (arg.mode));

arg.type_size_in_bytes ()

> +
> +  /* Pass aggregates and values bigger than 5 words by reference.
> + Everything else is passed by copy.  */
> +  return ((arg.type && AGGREGATE_TYPE_P (arg.type))

arg.aggregate_type_p ()

> +   || (num_bytes > 8*5));
> +}
> +
> +#undef TARGET_PASS_BY_REFERENCE
> +#define TARGET_PASS_BY_REFERENCE bpf_pass_by_reference
> +
> +/* Return a RTX indicating whether a function argument is passed in a
> +   register and if so, which register.  */
> +
> +static rtx
> +bpf_function_arg (cumulative_args_t ca, const function_arg_info &arg)
> +{
> +  CUMULATIVE_ARGS *cum = get_cumulative_args (ca);
> +
> +  if (*cum < 5)
> +return gen_rtx_REG (arg.mode, *cum + 1);
> +  else
> +/* An error will be emitted for this in
> +   bpf_function_arg_advance.  */
> +return NULL_RTX;
> +}
> +
> +#undef TARGET_FUNCTION_ARG
> +#define TARGET_FUNCTION_ARG bpf_function_arg
> +
> +/* Update the summarizer variable pointed by CA to advance past an
> +   argument in the argument list.  */
> +
> +static void
> +bpf_function_arg_advance (cumulative_args_t ca,
> +   const function_arg_info &arg)
> +{
> +  CUMULATIVE_ARGS *cum = get_cumulative_args (ca);
> +
> +  if (*cum > 4)
> +error ("too many function arguments for eBPF");
> +  else
> +{
> +  unsigned num_bytes
> + = (arg.type
> +? int_size_in_bytes (arg.type) : GET_MODE_SIZE (arg.mode));

arg.type_size_in_bytes ()

> +  unsigned num_words
> + = CEIL (num_bytes, UNITS_PER_WORD);
> +
> +  *cum += num_words;
> +}

I think my previous comment still stands here.  *cum > 4 won't raise
an error for 4 DI arguments followed by a TI argument, which requires
6 registers in total.  It'd also be good to avoid repeating the error
message within a single argument list.

Something like:

  *cum <= 5 && *cum + num_words > 5

might be better, so that you only report an error on the argument that
tries to include r6.

> +/* Output the assembly c

[C++ Patch] Improve check_var_type locations

2019-08-27 Thread Paolo Carlini

Hi,

by adding a location_t parameter we can improve the locations of the 
error messages. At the moment the locations of the first and third 
message end up being input_location anyway, but we can imagine improving 
those later (this is a much more general issue, the locations we use when 
unnamed entities are involved, see for example all the 'if (name)' in 
grokdeclarator and elsewhere: input_location is very rarely the best 
location).


Tested on x86_64-linux.

Thanks, Paolo.

/

/cp
2019-08-27  Paolo Carlini  

* decl.c (check_var_type): Add location_t parameter and use it.
(grokdeclarator): Adjust call.
* pt.c (tsubst_decl): Likewise.
* cp-tree.h: Adjust declaration.

/testsuite
2019-08-27  Paolo Carlini  

* g++.dg/spellcheck-typenames.C: Adjust expected locations.
* g++.dg/cpp0x/pr84676.C: Check locations.
* g++.dg/other/pr88187.C: Likewise.
* g++.dg/parse/crash13.C: Likewise.
* g++.dg/parse/crash46.C: Likewise.
* g++.dg/parse/template28.C: Likewise.
* g++.dg/parse/typename4.C: Likewise.
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 274945)
+++ cp/cp-tree.h(working copy)
@@ -6469,7 +6469,7 @@ extern tree cxx_comdat_group  (tree);
 extern bool cp_missing_noreturn_ok_p   (tree);
 extern bool is_direct_enum_init(tree, tree);
 extern void initialize_artificial_var  (tree, vec *);
-extern tree check_var_type (tree, tree);
+extern tree check_var_type (tree, tree, location_t);
 extern tree reshape_init(tree, tree, tsubst_flags_t);
 extern tree next_initializable_field (tree);
 extern tree fndecl_declared_return_type(tree);
Index: cp/decl.c
===
--- cp/decl.c   (revision 274945)
+++ cp/decl.c   (working copy)
@@ -10278,19 +10278,20 @@ check_special_function_return_type (special_functi
error-recovery purposes.  */
 
 tree
-check_var_type (tree identifier, tree type)
+check_var_type (tree identifier, tree type, location_t loc)
 {
   if (VOID_TYPE_P (type))
 {
   if (!identifier)
-   error ("unnamed variable or field declared void");
+   error_at (loc, "unnamed variable or field declared void");
   else if (identifier_p (identifier))
{
  gcc_assert (!IDENTIFIER_ANY_OP_P (identifier));
- error ("variable or field %qE declared void", identifier);
+ error_at (loc, "variable or field %qE declared void",
+   identifier);
}
   else
-   error ("variable or field declared void");
+   error_at (loc, "variable or field declared void");
   type = error_mark_node;
 }
 
@@ -12407,7 +12408,7 @@ grokdeclarator (const cp_declarator *declarator,
  error message later.  */
   if (decl_context != PARM)
 {
-  type = check_var_type (unqualified_id, type);
+  type = check_var_type (unqualified_id, type, id_loc);
   if (type == error_mark_node)
 return error_mark_node;
 }
Index: cp/pt.c
===
--- cp/pt.c (revision 274945)
+++ cp/pt.c (working copy)
@@ -13894,7 +13895,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t com
/* Wait until cp_finish_decl to set this again, to handle
   circular dependency (template/instantiate6.C). */
DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (r) = 0;
-   type = check_var_type (DECL_NAME (r), type);
+   type = check_var_type (DECL_NAME (r), type,
+  DECL_SOURCE_LOCATION (r));
 
if (DECL_HAS_VALUE_EXPR_P (t))
  {
Index: testsuite/g++.dg/cpp0x/pr84676.C
===
--- testsuite/g++.dg/cpp0x/pr84676.C(revision 274945)
+++ testsuite/g++.dg/cpp0x/pr84676.C(working copy)
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++11 } }
 
 int a;
-void b(__attribute__((c([](int *) {} (a == (0 = auto));  // { dg-error "" }
+void b(__attribute__((c([](int *) {} (a == (0 = auto));  // { dg-error 
"6:variable or field .b. declared void" }
+// { dg-error "expected" "" { target c++11 } .-1 }
Index: testsuite/g++.dg/other/pr88187.C
===
--- testsuite/g++.dg/other/pr88187.C(revision 274945)
+++ testsuite/g++.dg/other/pr88187.C(working copy)
@@ -2,6 +2,6 @@
 // { dg-do compile }
 
 template  struct A;
-void f (A ()); // { dg-error "variable or field 'f' declared void" "" { target 
c++14_down } }
+void f (A ()); // { dg-error "6:variable or field 'f' declared void" "" { 
target c++14_down } }
// { dg-error "missing template arguments before '\\(' token" 
"" { target c++14_d

Re: [PATCH v2 3/9] Introduce can_vector_compare_p function

2019-08-27 Thread Richard Biener
On Tue, Aug 27, 2019 at 9:01 AM Richard Sandiford
 wrote:
>
> Ilya Leoshkevich  writes:
> >> Am 26.08.2019 um 15:17 schrieb Ilya Leoshkevich :
> >>
> >>> Am 26.08.2019 um 15:06 schrieb Richard Biener 
> >>> :
> >>>
> >>> On Mon, Aug 26, 2019 at 1:54 PM Ilya Leoshkevich  
> >>> wrote:
> 
> > Am 26.08.2019 um 10:49 schrieb Richard Biener 
> > :
> >
> > On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich  
> > wrote:
> >>
> >>> Am 23.08.2019 um 13:24 schrieb Richard Biener 
> >>> :
> >>>
> >>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
> >>>  wrote:
> 
>  Ilya Leoshkevich  writes:
> > @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, 
> > machine_mode mode,
> > return 0;
> > }
> >
> > +/* can_vector_compare_p presents fake rtx binary operations to the 
> > back-end
> > +   in order to determine its capabilities.  In order to avoid 
> > creating fake
> > +   operations on each call, values from previous calls are cached 
> > in a global
> > +   cached_binops hash_table.  It contains rtxes, which can be 
> > looked up using
> > +   binop_keys.  */
> > +
> > +struct binop_key {
> > +  enum rtx_code code;/* Operation code.  */
> > +  machine_mode value_mode;   /* Result mode. */
> > +  machine_mode cmp_op_mode;  /* Operand mode.*/
> > +};
> > +
> > +struct binop_hasher : pointer_hash_mark, 
> > ggc_cache_remove {
> > +  typedef rtx value_type;
> > +  typedef binop_key compare_type;
> > +
> > +  static hashval_t
> > +  hash (enum rtx_code code, machine_mode value_mode, machine_mode 
> > cmp_op_mode)
> > +  {
> > +inchash::hash hstate (0);
> > +hstate.add_int (code);
> > +hstate.add_int (value_mode);
> > +hstate.add_int (cmp_op_mode);
> > +return hstate.end ();
> > +  }
> > +
> > +  static hashval_t
> > +  hash (const rtx &ref)
> > +  {
> > +return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP 
> > (ref, 0)));
> > +  }
> > +
> > +  static bool
> > +  equal (const rtx &ref1, const binop_key &ref2)
> > +  {
> > +return (GET_CODE (ref1) == ref2.code)
> > +&& (GET_MODE (ref1) == ref2.value_mode)
> > +&& (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> > +  }
> > +};
> > +
> > +static GTY ((cache)) hash_table *cached_binops;
> > +
> > +static rtx
> > +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> > +   machine_mode cmp_op_mode)
> > +{
> > +  if (!cached_binops)
> > +cached_binops = hash_table::create_ggc (1024);
> > +  binop_key key = { code, value_mode, cmp_op_mode };
> > +  hashval_t hash = binop_hasher::hash (code, value_mode, 
> > cmp_op_mode);
> > +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, 
> > INSERT);
> > +  if (!*slot)
> > +*slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx 
> > (cmp_op_mode),
> > + gen_reg_rtx (cmp_op_mode));
> > +  return *slot;
> > +}
> 
>  Sorry, I didn't mean anything this complicated.  I just meant that
>  we should have a single cached rtx that we can change via PUT_CODE 
>  and
>  PUT_MODE_RAW for each new query, rather than allocating a new rtx 
>  each
>  time.
> 
>  Something like:
> 
>  static GTY ((cache)) rtx cached_binop;
> 
>  rtx
>  get_cached_binop (machine_mode mode, rtx_code code, machine_mode 
>  op_mode)
>  {
>  if (cached_binop)
>  {
>    PUT_CODE (cached_binop, code);
>    PUT_MODE_RAW (cached_binop, mode);
>    PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>    PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>  }
>  else
>  {
>    rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>    rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>    cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>  }
>  return cached_binop;
>  }
> >>>
> >>> Hmm, maybe we need  auto_rtx (code) that constructs such
> >>> RTX on the stack instead of wasting a GC root (and causing
> >>> issues for future threading of GCC ;)).
> >>
> >> Do you mean something like this?
> >>
> >> union {
> >> char raw[rtx_code_size[code]];
> >> rtx rtx;
> >> } binop;
> >>
> >> Does this exist alread

Re: [PATCH] Share a prevailing name for remove debug info symbols w/ LTO.

2019-08-27 Thread Martin Liška
On 8/27/19 12:40 PM, Richard Biener wrote:
> On Tue, Aug 27, 2019 at 9:28 AM Martin Liška  wrote:
>>
>> Hi.
>>
>> The patch is about better symbol table manipulation
>> for debug info objects. The patch fixes reported issue
>> on hppa64-hp-hpux11.11.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> 
> +   prevailing_name_idx = ELF_FETCH_FIELD (type_functions, 
> ei_class,
> +  Sym, ent,
> +  st_name, Elf_Addr);
> 
> this should be Elf_Word.  Please add a break; after you found a symbol.
> Please also amend the comment before this loop to say that we know
> there's a prevailing weak hidden symbol at the start of the .debug_info 
> section
> so we'll always at least find that one.
> 
> -undefined and sharing the gnu_lto_ name.  */
> +undefined and sharing a name of a prevailing
> +symbol.  */
>   bind = STB_WEAK;
>   other = STV_HIDDEN;
> +
>   ELF_SET_FIELD (type_functions, ei_class, Sym,
> -ent, st_name, Elf_Word,
> -gnu_lto - strings);
> +ent, st_name, Elf_Addr,
> +prevailing_name_idx);
> +
> 
> Likewise Elf_Word, no need to add vertical spacing before and after
> this stmt.
> 
>   ELF_SET_FIELD (type_functions, ei_class, Sym,

Thanks for the review; all should be addressed in the updated patch.

Martin

> 
> 
> 
> 
>> Thanks,
>> Martin
>>
>> libiberty/ChangeLog:
>>
>> 2019-08-27  Martin Liska  
>>
>> PR lto/91478
>> * simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
>> First find a WEAK HIDDEN symbol in symbol table that will be
>> preserved.  Later, use the symbol name for all removed symbols.
>> ---
>>  libiberty/simple-object-elf.c | 71 +++
>>  1 file changed, 48 insertions(+), 23 deletions(-)
>>
>>

>From 44cd8d309922464697847e1c70f2bd37e748e7bd Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 26 Aug 2019 16:11:34 +0200
Subject: [PATCH] Share a prevailing name for remove debug info symbols w/ LTO.

libiberty/ChangeLog:

2019-08-27  Martin Liska  

	PR lto/91478
	* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
	First find a WEAK HIDDEN symbol in symbol table that will be
	preserved.  Later, use the symbol name for all removed symbols.
---
 libiberty/simple-object-elf.c | 71 ---
 1 file changed, 49 insertions(+), 22 deletions(-)

diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index 75159266596..03ca42498f3 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -1366,30 +1366,17 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  return errmsg;
 	}
 
-  /* If we are processing .symtab purge __gnu_lto_slim symbol
-	 from it and any symbols in discarded sections.  */
+  /* If we are processing .symtab purge any symbols
+	 in discarded sections.  */
   if (sh_type == SHT_SYMTAB)
 	{
 	  unsigned entsize = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	  shdr, sh_entsize, Elf_Addr);
 	  unsigned strtab = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	 shdr, sh_link, Elf_Word);
-	  unsigned char *strshdr = shdrs + (strtab - 1) * shdr_size;
-	  off_t stroff = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-	  strshdr, sh_offset, Elf_Addr);
-	  size_t strsz = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-	  strshdr, sh_size, Elf_Addr);
-	  char *strings = XNEWVEC (char, strsz);
-	  char *gnu_lto = strings;
+	  size_t prevailing_name_idx = 0;
 	  unsigned char *ent;
 	  unsigned *shndx_table = NULL;
-	  simple_object_internal_read (sobj->descriptor,
-   sobj->offset + stroff,
-   (unsigned char *)strings,
-   strsz, &errmsg, err);
-	  /* Find first '\0' in strings.  */
-	  gnu_lto = (char *) memchr (gnu_lto + 1, '\0',
- strings + strsz - gnu_lto);
 	  /* Read the section index table if present.  */
 	  if (symtab_indices_shndx[i - 1] != 0)
 	{
@@ -1404,6 +1391,45 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	   (unsigned char *)shndx_table,
 	   sidxsz, &errmsg, err);
 	}
+
+	  /* Find a WEAK HIDDEN symbol whose name we will use for removed
+	 symbols.  We know there's a prevailing weak hidden symbol
+	 at the start of the .debug_info section.  */
+	  for (ent = buf; ent < buf + length; ent += entsize)
+	{
+	  unsigned st_shndx = ELF_FETCH_FIELD (type_functions, ei_class,
+		   Sym, ent,
+		   st_shndx, Elf_Half);
+	  unsigned char *st_info;
+	  unsigned char *st_other;
+	  if (ei_clas

Re: [SVE] PR86753

2019-08-27 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Aug 27, 2019 at 11:58 AM Richard Sandiford
>  wrote:
>> ifcvt produces:
>>
>>[local count: 1063004407]:
>>   # i_34 = PHI 
>>   # ivtmp_5 = PHI 
>>   _1 = (long unsigned int) i_34;
>>   _2 = _1 * 2;
>>   _3 = a_23(D) + _2;
>>   _4 = *_3;
>>   _7 = b_24(D) + _2;
>>   _49 = _4 > 0;
>>   _8 = .MASK_LOAD (_7, 16B, _49);
>>   _12 = _4 > 0;
>>   _13 = _8 > 0;
>>   _9 = _12 & _13;
>>   _10 = _4 > 0;
>>   _11 = _8 > 0;
>>   _27 = ~_11;
>>   _15 = _10 & _27;
>>   _14 = c_25(D) + _2;
>>   iftmp.0_26 = .MASK_LOAD (_14, 16B, _15);
>>   iftmp.0_19 = _9 ? _4 : iftmp.0_26;
>>   _17 = x_28(D) + _2;
>>   _50 = _4 > 0;
>>   .MASK_STORE (_17, 16B, _50, iftmp.0_19);
>>   i_30 = i_34 + 1;
>>   ivtmp_6 = ivtmp_5 - 1;
>>   if (ivtmp_6 != 0)
>> goto ; [98.99%]
>>   else
>> goto ; [1.01%]
>>
>>[local count: 1052266994]:
>>   goto ; [100.00%]
>>
>> which has 4 copies of _4 > 0 (a[i] > 0) and 2 copies of _8 > 0 (b[i] > 0).
>
> Huh.  if-conversion does
>
>   /* Now all statements are if-convertible.  Combine all the basic
>  blocks into one huge basic block doing the if-conversion
>  on-the-fly.  */
>   combine_blocks (loop);
>
>   /* Delete dead predicate computations.  */
>   ifcvt_local_dce (loop->header);
>
>   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
>  and stores are involved.  CSE only the loop body, not the entry
>  PHIs, those are to be kept in sync with the non-if-converted copy.
>  ???  We'll still keep dead stores though.  */
>   exit_bbs = BITMAP_ALLOC (NULL);
>   bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
>   bitmap_set_bit (exit_bbs, loop->latch->index);
>   todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
>
> which should remove those redundant _4 > 0 checks.  In fact when I
> run this on x86_64 with -mavx512bw I see
>
>[local count: 1063004407]:
>   # i_25 = PHI 
>   # ivtmp_24 = PHI 
>   _1 = (long unsigned int) i_25;
>   _2 = _1 * 2;
>   _3 = a_14(D) + _2;
>   _4 = *_3;
>   _5 = b_15(D) + _2;
>   _49 = _4 > 0;
>   _6 = .MASK_LOAD (_5, 16B, _49);
>   _22 = _6 > 0;
>   _28 = ~_22;
>   _29 = _28 & _49;
>   _7 = c_16(D) + _2;
>   iftmp.0_17 = .MASK_LOAD (_7, 16B, _29);
>   iftmp.0_10 = _29 ? iftmp.0_17 : _4;
>   _8 = x_18(D) + _2;
>   .MASK_STORE (_8, 16B, _49, iftmp.0_10);
>   i_20 = i_25 + 1;
>   ivtmp_12 = ivtmp_24 - 1;
>   if (ivtmp_12 != 0)
>
> after if-conversion (that should be the case already on the GCC 9 branch).

Gah, sorry for the noise.  Turns out I still had a local change that was
trying to poke the patch into doing something wrong.  Will try to check
my facts more carefully next time.

The redundant pattern statements I was thinking of come from
vect_recog_mask_conversion_pattern, but I guess that isn't so
interesting here.

So yeah, let's drop this whole vn thing for now...

Thanks,
Richard


[PATCH][STV] More compile-time improvements

2019-08-27 Thread Richard Biener


With STV forced for all chains I run into the same issue as I
fixed for the analysis phase, which is looping over all defs
of a pseudo rather than just the interesting ones.  The following
fixes this by recording insn/pseudo pairs in mark_dual_mode_def;
well, not actually pairs, but tracking insns in addition to
pseudos, which allows us to cheaply test set membership and
visit only the defs that are interesting (assuming the number
of defs on an insn is far smaller than the number of defs
of a pseudo).

This way the original testcase with STV forced goes down from
200s to 24s compile-time with

 machine dep reorg  :   7.09 ( 20%)   0.01 (  2%)   8.18 ( 21%)   1854 kB (  2%)

where almost all of the compile-time is spent in DF's
deferred rescan processing.  I'm not sure we even need
that, but I'm not too eager to dig more into DF than necessary
right now (we call df_finish () at the end, which removes
the DU/UD_CHAIN and the MD problem (what do we need that for!?)).
We don't appropriately scan all insns we add, so the prevailing
problems like LIVE are not updated, but well...

Bootstrapped and tested on x86_64-unknown-linux-gnu with STV
forced and for Haswell this time.

OK?

Thanks,
Richard.

2019-08-27  Richard Biener  

* config/i386/i386-features.h
(general_scalar_chain::~general_scalar_chain): Add.
(general_scalar_chain::insns_conv): New bitmap.
(general_scalar_chain::n_sse_to_integer): New.
(general_scalar_chain::n_integer_to_sse): Likewise.
(general_scalar_chain::make_vector_copies): Adjust signature.
* config/i386/i386-features.c
(general_scalar_chain::general_scalar_chain): Outline,
initialize new members.
(general_scalar_chain::~general_scalar_chain): New.
(general_scalar_chain::mark_dual_mode_def): Record insns
we need to insert conversions at and count them.
(general_scalar_chain::compute_convert_gain): Account
for conversion instructions at chain boundary.
(general_scalar_chain::make_vector_copies): Generate a single
copy for a def by a specific insn.
(general_scalar_chain::convert_registers): First populate
defs_map, then make copies at out-of chain insns.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274945)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -320,6 +320,20 @@ scalar_chain::add_to_queue (unsigned ins
   bitmap_set_bit (queue, insn_uid);
 }
 
+general_scalar_chain::general_scalar_chain (enum machine_mode smode_,
+   enum machine_mode vmode_)
+ : scalar_chain (smode_, vmode_)
+{
+  insns_conv = BITMAP_ALLOC (NULL);
+  n_sse_to_integer = 0;
+  n_integer_to_sse = 0;
+}
+
+general_scalar_chain::~general_scalar_chain ()
+{
+  BITMAP_FREE (insns_conv);
+}
+
 /* For DImode conversion, mark register defined by DEF as requiring
conversion.  */
 
@@ -328,15 +342,27 @@ general_scalar_chain::mark_dual_mode_def
 {
   gcc_assert (DF_REF_REG_DEF_P (def));
 
-  if (bitmap_bit_p (defs_conv, DF_REF_REGNO (def)))
-return;
-
+  /* Record the def/insn pair so we can later efficiently iterate over
+ the defs to convert on insns not in the chain.  */
+  bool reg_new = bitmap_set_bit (defs_conv, DF_REF_REGNO (def));
+  if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def)))
+{
+  if (!bitmap_set_bit (insns_conv, DF_REF_INSN_UID (def))
+ && !reg_new)
+   return;
+  n_integer_to_sse++;
+}
+  else
+{
+  if (!reg_new)
+   return;
+  n_sse_to_integer++;
+}
+ 
   if (dump_file)
 fprintf (dump_file,
 "  Mark r%d def in insn %d as requiring both modes in chain #%d\n",
 DF_REF_REGNO (def), DF_REF_INSN_UID (def), chain_id);
-
-  bitmap_set_bit (defs_conv, DF_REF_REGNO (def));
 }
 
 /* For TImode conversion, it is unused.  */
@@ -523,7 +549,7 @@ general_scalar_chain::compute_convert_ga
   || GET_CODE (src) == ASHIFTRT
   || GET_CODE (src) == LSHIFTRT)
{
- if (CONST_INT_P (XEXP (src, 0)))
+ if (CONST_INT_P (XEXP (src, 0)))
igain -= vector_const_cost (XEXP (src, 0));
  igain += m * ix86_cost->shift_const - ix86_cost->sse_op;
  if (INTVAL (XEXP (src, 1)) >= 32)
@@ -588,9 +614,12 @@ general_scalar_chain::compute_convert_ga
   if (dump_file)
 fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
 
-  /* ???  What about integer to SSE?  */
-  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
-cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
+  /* Cost the integer to sse and sse to integer moves.  */
+  cost += n_sse_to_integer * ix86_cost->sse_to_integer;
+  /* ???  integer_to_sse but we only have that in the RA cost table.
+ Assume sse_to_integer/integer_to_sse are the same which they
+ are at the moment.  

Re: [PATCH][STV] More compile-time improvements

2019-08-27 Thread Uros Bizjak
On Tue, Aug 27, 2019 at 2:14 PM Richard Biener  wrote:
>
>
> With STV forced for all chains I run into the same issue as I
> fixed for the analysis phase, which is looping over all defs
> of a pseudo rather than just the interesting ones.  The following
> fixes this by recording insn/pseudo pairs in mark_dual_mode_def;
> well, not actually pairs, but tracking insns in addition to
> pseudos, which allows us to cheaply test set membership and
> visit only the defs that are interesting (assuming the number
> of defs on an insn is far smaller than the number of defs
> of a pseudo).
>
> This way the original testcase with STV forced goes down from
> 200s to 24s compile-time with
>
>  machine dep reorg  :   7.09 ( 20%)   0.01 (  2%)   8.18 ( 21%)   1854 kB (  2%)
>
> where almost all of the compile-time is spent in DF's
> deferred rescan processing.  I'm not sure we even need
> that, but I'm not too eager to dig more into DF than necessary
> right now (we call df_finish () at the end, which removes
> the DU/UD_CHAIN and the MD problem (what do we need that for!?)).
> We don't appropriately scan all insns we add, so the prevailing
> problems like LIVE are not updated, but well...
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu with STV
> forced and for Haswell this time.
>
> OK?
>
> Thanks,
> Richard.
>
> 2019-08-27  Richard Biener  
>
> * config/i386/i386-features.h
> (general_scalar_chain::~general_scalar_chain): Add.
> (general_scalar_chain::insns_conv): New bitmap.
> (general_scalar_chain::n_sse_to_integer): New.
> (general_scalar_chain::n_integer_to_sse): Likewise.
> (general_scalar_chain::make_vector_copies): Adjust signature.
> * config/i386/i386-features.c
> (general_scalar_chain::general_scalar_chain): Outline,
> initialize new members.
> (general_scalar_chain::~general_scalar_chain): New.
> (general_scalar_chain::mark_dual_mode_def): Record insns
> we need to insert conversions at and count them.
> (general_scalar_chain::compute_convert_gain): Account
> for conversion instructions at chain boundary.
> (general_scalar_chain::make_vector_copies): Generate a single
> copy for a def by a specific insn.
> (general_scalar_chain::convert_registers): First populate
> defs_map, then make copies at out-of chain insns.

LGTM.

Thanks,
Uros.

> Index: gcc/config/i386/i386-features.c
> ===
> --- gcc/config/i386/i386-features.c (revision 274945)
> +++ gcc/config/i386/i386-features.c (working copy)
> @@ -320,6 +320,20 @@ scalar_chain::add_to_queue (unsigned ins
>bitmap_set_bit (queue, insn_uid);
>  }
>
> +general_scalar_chain::general_scalar_chain (enum machine_mode smode_,
> +   enum machine_mode vmode_)
> + : scalar_chain (smode_, vmode_)
> +{
> +  insns_conv = BITMAP_ALLOC (NULL);
> +  n_sse_to_integer = 0;
> +  n_integer_to_sse = 0;
> +}
> +
> +general_scalar_chain::~general_scalar_chain ()
> +{
> +  BITMAP_FREE (insns_conv);
> +}
> +
>  /* For DImode conversion, mark register defined by DEF as requiring
> conversion.  */
>
> @@ -328,15 +342,27 @@ general_scalar_chain::mark_dual_mode_def
>  {
>gcc_assert (DF_REF_REG_DEF_P (def));
>
> -  if (bitmap_bit_p (defs_conv, DF_REF_REGNO (def)))
> -return;
> -
> +  /* Record the def/insn pair so we can later efficiently iterate over
> + the defs to convert on insns not in the chain.  */
> +  bool reg_new = bitmap_set_bit (defs_conv, DF_REF_REGNO (def));
> +  if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def)))
> +{
> +  if (!bitmap_set_bit (insns_conv, DF_REF_INSN_UID (def))
> + && !reg_new)
> +   return;
> +  n_integer_to_sse++;
> +}
> +  else
> +{
> +  if (!reg_new)
> +   return;
> +  n_sse_to_integer++;
> +}
> +
>if (dump_file)
>  fprintf (dump_file,
>  "  Mark r%d def in insn %d as requiring both modes in chain 
> #%d\n",
>  DF_REF_REGNO (def), DF_REF_INSN_UID (def), chain_id);
> -
> -  bitmap_set_bit (defs_conv, DF_REF_REGNO (def));
>  }
>
>  /* For TImode conversion, it is unused.  */
> @@ -523,7 +549,7 @@ general_scalar_chain::compute_convert_ga
>|| GET_CODE (src) == ASHIFTRT
>|| GET_CODE (src) == LSHIFTRT)
> {
> - if (CONST_INT_P (XEXP (src, 0)))
> + if (CONST_INT_P (XEXP (src, 0)))
> igain -= vector_const_cost (XEXP (src, 0));
>   igain += m * ix86_cost->shift_const - ix86_cost->sse_op;
>   if (INTVAL (XEXP (src, 1)) >= 32)
> @@ -588,9 +614,12 @@ general_scalar_chain::compute_convert_ga
>if (dump_file)
>  fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
>
> -  /* ???  What about integer to SSE?  */
> -  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
> -cost += DF_REG_DEF_COUNT (insn_

Re: [PATCH] Share a prevailing name for remove debug info symbols w/ LTO.

2019-08-27 Thread Richard Biener
On Tue, Aug 27, 2019 at 1:31 PM Martin Liška  wrote:
>
> On 8/27/19 12:40 PM, Richard Biener wrote:
> > On Tue, Aug 27, 2019 at 9:28 AM Martin Liška  wrote:
> >>
> >> Hi.
> >>
> >> The patch is about better symbol table manipulation
> >> for debug info objects. The patch fixes reported issue
> >> on hppa64-hp-hpux11.11.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > +   prevailing_name_idx = ELF_FETCH_FIELD (type_functions, 
> > ei_class,
> > +  Sym, ent,
> > +  st_name, Elf_Addr);
> >
> > this should be Elf_Word.  Please add a break; after you found a symbol.
> > Please also amend the comment before this loop to say that we know
> > there's a prevailing weak hidden symbol at the start of the .debug_info 
> > section
> > so we'll always at least find that one.
> >
> > -undefined and sharing the gnu_lto_ name.  */
> > +undefined and sharing a name of a prevailing
> > +symbol.  */
> >   bind = STB_WEAK;
> >   other = STV_HIDDEN;
> > +
> >   ELF_SET_FIELD (type_functions, ei_class, Sym,
> > -ent, st_name, Elf_Word,
> > -gnu_lto - strings);
> > +ent, st_name, Elf_Addr,
> > +prevailing_name_idx);
> > +
> >
> > Likewise Elf_Word, no need to add vertical spacing before and after
> > this stmt.
> >
> >   ELF_SET_FIELD (type_functions, ei_class, Sym,
>
> Thanks for the review; all should be addressed in the updated patch.

OK.

Richard.

> Martin
>
> >
> >
> >
> >
> >> Thanks,
> >> Martin
> >>
> >> libiberty/ChangeLog:
> >>
> >> 2019-08-27  Martin Liska  
> >>
> >> PR lto/91478
> >> * simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
> >> First find a WEAK HIDDEN symbol in symbol table that will be
> >> preserved.  Later, use the symbol name for all removed symbols.
> >> ---
> >>  libiberty/simple-object-elf.c | 71 +++
> >>  1 file changed, 48 insertions(+), 23 deletions(-)
> >>
> >>
>


Re: [PATCH V3 05/11] bpf: new GCC port

2019-08-27 Thread Jose E. Marchesi


Hi Richard.

> +(define_insn "mulsidi3"
> +  [(set (match_operand:DI   0 "register_operand" "=r,r")
> +(sign_extend:DI
> + (mult:SI (match_operand:SI 1 "register_operand" "0,0")
> +  (match_operand:SI 2 "reg_or_imm_operand" "r,I"]
> +  ""
> +  "mul32\t%0,%2"
> +  [(set_attr "type" "alu32")])

Sorry, Segher was right and I was wrong: mulsidi3 is instead:

  [(set (match_operand:DI  0 "register_operand" "=r,r")
(mult:DI
 (sign_extend:DI (match_operand:SI 1 "register_operand" "0,0"))
 (sign_extend:DI (match_operand:SI 2 "reg_or_imm_operand" "r,I"]

i.e. extend the operands rather than the result.  So the define_insn
shouldn't be called mulsidi3 after all.

Are you sure this is a sign extension though?  From a quick look at the
kernel sources, I got the impression it was a zero extension instead.

You are right.  This is from linux/kernel/bpf/core.c:

#define ALU(OPCODE, OP) \
ALU64_##OPCODE##_X: \
DST = DST OP SRC;   \
CONT;   \
ALU_##OPCODE##_X:   \
DST = (u32) DST OP (u32) SRC;   \
CONT;   \
ALU64_##OPCODE##_K: \
DST = DST OP IMM;   \
CONT;   \
ALU_##OPCODE##_K:   \
DST = (u32) DST OP (u32) IMM;   \
CONT;

[...]

ALU(MUL,  *)

So, mul32 zero-extends arguments and then multiplies, leaving the result
in the 64-bit register.  This is an unsigned widening multiplication,
which according to the internal manuals should be handled with a pattern
like:

(define_insn "umulsidi3"
  [(set (match_operand:DI  0 "register_operand" "=r,r")
(mult:DI
  (zero_extend:DI (match_operand:SI 1 "register_operand" "0,0"))
  (zero_extend:DI (match_operand:SI 2 "reg_or_imm_operand" "r,I"))))]
  ""
  "mul32\t%0,%2"
  [(set_attr "type" "alu32")])



Re: [PATCH V3 05/11] bpf: new GCC port

2019-08-27 Thread Jose E. Marchesi

> +(define_insn "mulsidi3"
> +  [(set (match_operand:DI   0 "register_operand" "=r,r")
> +(sign_extend:DI
> + (mult:SI (match_operand:SI 1 "register_operand" "0,0")
> +  (match_operand:SI 2 "reg_or_imm_operand" "r,I"))))]
> +  ""
> +  "mul32\t%0,%2"
> +  [(set_attr "type" "alu32")])

Sorry, Segher was right and I was wrong: mulsidi3 is instead:

  [(set (match_operand:DI  0 "register_operand" "=r,r")
(mult:DI
 (sign_extend:DI (match_operand:SI 1 "register_operand" "0,0"))
 (sign_extend:DI (match_operand:SI 2 "reg_or_imm_operand" "r,I"))))]

i.e. extend the operands rather than the result.  So the define_insn
shouldn't be called mulsidi3 after all.

Are you sure this is a sign extension though?  From a quick look at the
kernel sources, I got the impression it was a zero extension instead.

You are right.  This is from linux/kernel/bpf/core.c:

#define ALU(OPCODE, OP) \
ALU64_##OPCODE##_X: \
DST = DST OP SRC;   \
CONT;   \
ALU_##OPCODE##_X:   \
DST = (u32) DST OP (u32) SRC;   \
CONT;   \
ALU64_##OPCODE##_K: \
DST = DST OP IMM;   \
CONT;   \
ALU_##OPCODE##_K:   \
DST = (u32) DST OP (u32) IMM;   \
CONT;

[...]

ALU(MUL,  *)

So, mul32 zero-extends arguments and then multiplies, leaving the result
in the 64-bit register.

glglgl, scratch that, it is actually a 32-bit multiplication that then
gets extended to 64-bits:

(define_insn "*mulsidi3_zeroextended"
  [(set (match_operand:DI  0 "register_operand" "=r,r")
(zero_extend:DI
 (mult:SI (match_operand:SI 1 "register_operand" "0,0")
  (match_operand:SI 2 "reg_or_imm_operand" "r,I"))))]
  ""
  "mul32\t%0,%2"
  [(set_attr "type" "alu32")])
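The difference the thread converges on (a widening multiply vs. a 32-bit multiply whose result is zero-extended) only shows once the SImode product overflows. A small C++ sketch makes the distinction concrete; the helper names are invented for illustration:

```cpp
#include <cstdint>

// umulsidi3-style: zero-extend the operands first, multiply in 64 bits.
uint64_t widening_umul (uint32_t a, uint32_t b)
{
  return (uint64_t) a * (uint64_t) b;
}

// *mulsidi3_zeroextended-style: multiply in 32 bits, then zero-extend
// the truncated product, as described for BPF's mul32 above.
uint64_t mul32_zext (uint32_t a, uint32_t b)
{
  return (uint64_t) (uint32_t) (a * b);
}
```

The two agree for small operands, but for 0x80000000 * 2 the widening form yields 0x100000000 while the truncate-then-extend form yields 0.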


[PR 91468] Small fixes in ipa-cp.c and ipa-prop.c

2019-08-27 Thread Martin Jambor
Hi,

Feng Xue read through much of ipa-cp.c and ipa-prop.c and reported a few
redundancies and small errors in PR 91468.  The patch below fixes all of
them, specifically:

1) A typo in ipcp_modif_dom_walker::before_dom_children where the wrong
   tree variable was checked for being a VIEW_CONVERT_EXPR.

2) update_jump_functions_after_inlining currently handles combinations
   of unary arithmetic functions and ancestor jump functions which make
   no sense, cannot happen in meaningful code, and the code path could
   conceivably be triggered only if LTO was abused to avoid
   type-casting.  In any case the handling should not be there and does
   not do anything useful (see discussion in bugzilla for more) and so
   the patch removes it.

3) compute_complex_assign_jump_func tests a few things twice, because of
   a rather mechanical cleanup of mine, so these are removed.

4) merge_agg_lats_step contains a redundant condition too, but this one
   is an important correctness invariant, so I strengthened the already
   existing checking assert afterwards to be a normal assert.

Passed bootstrap and testing on x86_64-linux.  OK for trunk?

Thanks,

Martin


2019-08-26  Martin Jambor  

PR ipa/91468
* ipa-cp.c (merge_agg_lats_step): Removed redundant test, made a
checking assert a normal assert to test it really is redundant.
* ipa-prop.c (compute_complex_assign_jump_func): Removed
redundant test.
(update_jump_functions_after_inlining): Removed combining unary
arithmetic operations with an ancestor jump function.
(ipcp_modif_dom_walker::before_dom_children): Fix wrong use of rhs
instead of t.
---
 gcc/ipa-cp.c   |  8 +++-
 gcc/ipa-prop.c | 12 ++--
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 0046064fea1..33d52fe5537 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2026,15 +2026,13 @@ merge_agg_lats_step (class ipcp_param_lattices *dest_plats,
 
   if (**aglat && (**aglat)->offset == offset)
 {
-  if ((**aglat)->size != val_size
- || ((**aglat)->next
- && (**aglat)->next->offset < offset + val_size))
+  if ((**aglat)->size != val_size)
{
  set_agg_lats_to_bottom (dest_plats);
  return false;
}
-  gcc_checking_assert (!(**aglat)->next
-  || (**aglat)->next->offset >= offset + val_size);
+  gcc_assert (!(**aglat)->next
+ || (**aglat)->next->offset >= offset + val_size);
   return true;
 }
   else
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 1a0e12e6c0c..a23aa2590a0 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1243,9 +1243,7 @@ compute_complex_assign_jump_func (struct ipa_func_body_info *fbi,
break;
  }
case GIMPLE_UNARY_RHS:
- if (is_gimple_assign (stmt)
- && gimple_assign_rhs_class (stmt) == GIMPLE_UNARY_RHS
- && ! CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
+ if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
ipa_set_jf_unary_pass_through (jfunc, index,
   gimple_assign_rhs_code (stmt));
default:;
@@ -2725,12 +2723,6 @@ update_jump_functions_after_inlining (struct cgraph_edge *cs,
  dst->value.ancestor.agg_preserved &=
src->value.pass_through.agg_preserved;
}
- else if (src->type == IPA_JF_PASS_THROUGH
-  && TREE_CODE_CLASS (src->value.pass_through.operation) == tcc_unary)
-   {
- dst->value.ancestor.formal_id = src->value.pass_through.formal_id;
- dst->value.ancestor.agg_preserved = false;
-   }
  else if (src->type == IPA_JF_ANCESTOR)
{
  dst->value.ancestor.formal_id = src->value.ancestor.formal_id;
@@ -4933,7 +4925,7 @@ ipcp_modif_dom_walker::before_dom_children (basic_block bb)
{
  /* V_C_E can do things like convert an array of integers to one
 bigger integer and similar things we do not handle below.  */
- if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+ if (TREE_CODE (t) == VIEW_CONVERT_EXPR)
{
  vce = true;
  break;
-- 
2.22.0



Re: [RFC][vect]PR: 65930 teach vectorizer to handle SUM reductions with sign-change casts

2019-08-27 Thread Richard Biener
On Fri, 23 Aug 2019, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> I have come up with a way to teach the vectorizer to handle sign-changing
> reductions, restricted to SUM operations as I'm not sure other reductions are
> equivalent with different signs.
> 
> The main idea of this approach is to let it recognize reductions of the
> form: Phi->NopConversion?->Plus/Minus-reduction->NopConversion?->Phi. Then
> vectorize the statements normally, with some extra workarounds to handle the
> conversions. This is mainly needed where it looks for uses of the result of
> the reduction, we now need to check the uses of the result of the conversion
> instead.
> 
> I am curious to know what you think of this approach. I have regression tested
> this on aarch64 and x86_64 with AVX512 and it shows no regressions. On the
> one-month-old version of trunk I tested on, it even seems to make
> gcc.dg/vect/pr89268.c pass, where it used to fail with an ICE complaining
> about a definition not dominating a use.

Aww.  Yeah, I had a half-way working patch along this line as well
and threw it away because of ugliness.

So I was hoping we can at some point refactor the reduction detection
code to use the path discovery in check_reduction_path (which is basically
a lame SCC finding algorithm), massage the detected reduction path
and in the reduction PHI meta-data record something like
"this reduction SUMs _1, _4, _3 and _5" plus for the conversions
"do the reduction in SIGN" and during code-generation just look at
the PHI node and the backedge def which we'd replace.

But of course I stopped short trying that because the reduction code
is a big mess.  And I threw away the attempt that looked like yours
because I didn't want to make an even bigger mess out of it :/

On the branch throwing away the non-SLP paths I started to 
refactor^Wrewrite all this but got stuck as well.  One thing I
realized on the branch is that nested cycle handling should be
more straight-forward and done in a separate vectorizable_*
routine.  Not sure it simplified things a lot, but well.  Maybe
also simply always building a SLP graph for reductions only
helps.

Well.

Maybe you can try experimenting with amending check_reduction_path
with conversion support - from a quick look your patch wouldn't
handle

_1 = PHI <.., _4>
_2 = (unsigned) _1;
_3 = _2 + ...;
_4 = (signed) _3;

since the last stmt you expect is still a PLUS?
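For reference, a minimal loop with exactly that PHI -> conversion -> plus -> conversion -> PHI shape (a hypothetical example, not taken from the PR) would be:

```cpp
#include <cstdint>

// Sum reduction whose accumulator round-trips through sign-changing
// nop conversions on every iteration.
int sum_with_casts (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s = (int) ((unsigned) s + (unsigned) a[i]);
  return s;
}

// Small driver: sums {1, -2, 3, -4, 5}.
int demo ()
{
  const int a[] = {1, -2, 3, -4, 5};
  return sum_with_casts (a, 5);
}
```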

> The initial benchmarks I did also show a 14% improvement on x264_r on SPEC2017
> for aarch64.

Interesting, on x86 IIRC I didn't see any such big effect on x264_r
but it was the testcase in which I first ran into this.

Richard.


Re: [RFC][vect]PR: 65930 teach vectorizer to handle SUM reductions with sign-change casts

2019-08-27 Thread Richard Biener
On Tue, 27 Aug 2019, Richard Biener wrote:

> On Fri, 23 Aug 2019, Andre Vieira (lists) wrote:
> 
> > Hi Richard,
> > 
> > I have come up with a way to teach the vectorizer to handle sign-changing
> > reductions, restricted to SUM operations as I'm not sure other reductions 
> > are
> > equivalent with different signs.
> > 
> > The main idea of this approach is to let it recognize reductions of the
> > form: Phi->NopConversion?->Plus/Minus-reduction->NopConversion?->Phi. Then
> > vectorize the statements normally, with some extra workarounds to handle the
> > conversions. This is mainly needed where it looks for uses of the result of
> > the reduction, we now need to check the uses of the result of the conversion
> > instead.
> > 
> > I am curious to know what you think of this approach. I have regression 
> > tested
> > this on aarch64 and x86_64 with AVX512 and it shows no regressions. On the
> > one-month-old version of trunk I tested on, it even seems to make
> > gcc.dg/vect/pr89268.c pass, where it used to fail with an ICE complaining
> > about a definition not dominating a use.
> 
> Aww.  Yeah, I had a half-way working patch along this line as well
> and threw it away because of ugliness.
> 
> So I was hoping we can at some point refactor the reduction detection
> code to use the path discovery in check_reduction_path (which is basically
> a lame SCC finding algorithm), massage the detected reduction path
> and in the reduction PHI meta-data record something like
> "this reduction SUMs _1, _4, _3 and _5" plus for the conversions
> "do the reduction in SIGN" and during code-generation just look at
> the PHI node and the backedge def which we'd replace.
> 
> But of course I stopped short trying that because the reduction code
> is a big mess.  And I threw away the attempt that looked like yours
> because I didn't want to make an even bigger mess out of it :/
> 
> On the branch throwing away the non-SLP paths I started to 
> refactor^Wrewrite all this but got stuck as well.

Before you start looking I figured this all is only in my
working tree...

Richard.


[PATCH] Remove code leftover that has never been used.

2019-08-27 Thread Martin Liška
Hi.

I would like to remove a leftover that hasn't been used since
it was introduced.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2019-08-27  Martin Liska  

PR tree-optimization/90970
* builtins.c (check_access): Remove assignment to maxread
as it hasn't been used since it was introduced in r255755.
---
 gcc/builtins.c | 5 -
 1 file changed, 5 deletions(-)


diff --git a/gcc/builtins.c b/gcc/builtins.c
index f902e246f1f..0b25adc17a0 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3475,11 +3475,6 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (maxread)
 {
   get_size_range (maxread, range);
-
-  /* Use the lower end for MAXREAD from now on.  */
-  if (range[0])
-	maxread = range[0];
-
   if (range[0] && dstsize && tree_fits_uhwi_p (dstsize))
 	{
 	  location_t loc = tree_nonartificial_location (exp);



libgo patch committed: Rebuild runtime.inc if mkruntimeinc.sh changes

2019-08-27 Thread Ian Lance Taylor
This libgo patch rebuilds runtime.inc if mkruntimeinc.sh changes.  The
Makefile was missing a dependency.  Also remove runtime.inc.raw in
mostlyclean.  Bootstrapped on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 274935)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-58c0fc64d91edc53ef9828b85cf3dc86aeb94e12
+a6ddd0e1208a7d229c10be630c1110b3914038f5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 274169)
+++ libgo/Makefile.am   (working copy)
@@ -612,7 +612,7 @@ s-zdefaultcc: Makefile
 # compiling runtime) to prune out certain types that should not be
 # exported back to C. See comments in mkruntimeinc.sh for more details.
 runtime.inc: s-runtime-inc; @true
-s-runtime-inc: runtime.lo Makefile
+s-runtime-inc: runtime.lo mkruntimeinc.sh Makefile
$(SHELL) $(srcdir)/mkruntimeinc.sh
$(SHELL) $(srcdir)/mvifdiff.sh tmp-runtime.inc runtime.inc
$(STAMP) $@
@@ -1205,7 +1205,8 @@ MOSTLYCLEANFILES = \
s-libcalls s-libcalls-list s-syscall_arch s-gen-sysinfo s-sysinfo \
s-errno s-epoll \
libgo.head libgo.sum.sep libgo.log.sep libgo.var \
-   libcalls-list runtime.inc runtime.inc.tmp2 runtime.inc.tmp3
+   libcalls-list \
+   runtime.inc runtime.inc.tmp2 runtime.inc.tmp3 runtime.inc.raw
 
 mostlyclean-local:
find . -name '*.lo' -print | xargs $(LIBTOOL) --mode=clean rm -f


Re: [SVE] PR86753

2019-08-27 Thread Prathamesh Kulkarni
On Tue, 27 Aug 2019 at 17:29, Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Aug 27, 2019 at 11:58 AM Richard Sandiford
> >  wrote:
> >> ifcvt produces:
> >>
> >>[local count: 1063004407]:
> >>   # i_34 = PHI 
> >>   # ivtmp_5 = PHI 
> >>   _1 = (long unsigned int) i_34;
> >>   _2 = _1 * 2;
> >>   _3 = a_23(D) + _2;
> >>   _4 = *_3;
> >>   _7 = b_24(D) + _2;
> >>   _49 = _4 > 0;
> >>   _8 = .MASK_LOAD (_7, 16B, _49);
> >>   _12 = _4 > 0;
> >>   _13 = _8 > 0;
> >>   _9 = _12 & _13;
> >>   _10 = _4 > 0;
> >>   _11 = _8 > 0;
> >>   _27 = ~_11;
> >>   _15 = _10 & _27;
> >>   _14 = c_25(D) + _2;
> >>   iftmp.0_26 = .MASK_LOAD (_14, 16B, _15);
> >>   iftmp.0_19 = _9 ? _4 : iftmp.0_26;
> >>   _17 = x_28(D) + _2;
> >>   _50 = _4 > 0;
> >>   .MASK_STORE (_17, 16B, _50, iftmp.0_19);
> >>   i_30 = i_34 + 1;
> >>   ivtmp_6 = ivtmp_5 - 1;
> >>   if (ivtmp_6 != 0)
> >> goto ; [98.99%]
> >>   else
> >> goto ; [1.01%]
> >>
> >>[local count: 1052266994]:
> >>   goto ; [100.00%]
> >>
> >> which has 4 copies of _4 > 0 (a[i] > 0) and 2 copies of _8 > 0 (b[i] > 0).
> >
> > Huh.  if-conversion does
> >
> >   /* Now all statements are if-convertible.  Combine all the basic
> >  blocks into one huge basic block doing the if-conversion
> >  on-the-fly.  */
> >   combine_blocks (loop);
> >
> >   /* Delete dead predicate computations.  */
> >   ifcvt_local_dce (loop->header);
> >
> >   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
> >  and stores are involved.  CSE only the loop body, not the entry
> >  PHIs, those are to be kept in sync with the non-if-converted copy.
> >  ???  We'll still keep dead stores though.  */
> >   exit_bbs = BITMAP_ALLOC (NULL);
> >   bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> >   bitmap_set_bit (exit_bbs, loop->latch->index);
> >   todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
> >
> > which should remove those redundant _4 > 0 checks.  In fact when I
> > run this on x86_64 with -mavx512bw I see
> >
> >[local count: 1063004407]:
> >   # i_25 = PHI 
> >   # ivtmp_24 = PHI 
> >   _1 = (long unsigned int) i_25;
> >   _2 = _1 * 2;
> >   _3 = a_14(D) + _2;
> >   _4 = *_3;
> >   _5 = b_15(D) + _2;
> >   _49 = _4 > 0;
> >   _6 = .MASK_LOAD (_5, 16B, _49);
> >   _22 = _6 > 0;
> >   _28 = ~_22;
> >   _29 = _28 & _49;
> >   _7 = c_16(D) + _2;
> >   iftmp.0_17 = .MASK_LOAD (_7, 16B, _29);
> >   iftmp.0_10 = _29 ? iftmp.0_17 : _4;
> >   _8 = x_18(D) + _2;
> >   .MASK_STORE (_8, 16B, _49, iftmp.0_10);
> >   i_20 = i_25 + 1;
> >   ivtmp_12 = ivtmp_24 - 1;
> >   if (ivtmp_12 != 0)
> >
> > after if-conversion (that should be the case already on the GCC 9 branch).
>
> Gah, sorry for the noise.  Turns out I still had a local change that was
> trying to poke the patch into doing something wrong.  Will try to check
> my facts more carefully next time.
>
> The redundant pattern statements I was thinking of come from
> vect_recog_mask_conversion_pattern, but I guess that isn't so
> interesting here.
>
> So yeah, let's drop this whole vn thing for now...
The attached version drops VN, and uses operand_equal_p for comparison.
Does it look OK?

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
index 5c04bcdb3f5..a1b0667dab5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
@@ -15,5 +15,9 @@ f (double *restrict a, double *restrict b, double *restrict c,
 }
 }
 
-/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+/* See https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01644.html
+   for XFAILing the below test.  */
+
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
 /* { dg-final { scan-assembler-not {\tfmad\t} } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b0cbbac0cb5..0fc7171d7ea 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8609,6 +8609,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   basic_block bb = bbs[i];
   stmt_vec_info stmt_info;
 
+  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+	loop_vinfo->cond_to_vec_mask = new cond_vmask_map_type (8);
+
   for (gphi_iterator si = gsi_start_phis (bb); !gsi_end_p (si);
 	   gsi_next (&si))
 {
@@ -8717,6 +8720,12 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 		}
 	}
 	}
+
+  if (loop_vinfo->cond_to_vec_mask)
+	{
+	  delete loop_vinfo->cond_to_vec_mask;
+	  loop_vinfo->cond_to_vec_mask = 0;
+	}
 }/* BBs in loop */
 
   /* The vectorization factor is always > 1, so if we use an IV increment of 1.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 1e2dfe5d22d..862206b3256 100644
--- a/

Re: [PATCH] Remove code leftover that has never been used.

2019-08-27 Thread Martin Sebor

On 8/27/19 8:04 AM, Martin Liška wrote:

Hi.

I would like to remove a leftover that hasn't been used since
it was introduced.

Ready for trunk?


It's fine with me, thank you.

Martin


Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-27 Thread Martin Sebor

On 8/27/19 3:09 AM, Christophe Lyon wrote:

On Fri, 23 Aug 2019 at 04:14, Jeff Law  wrote:


On 8/12/19 4:09 PM, Martin Sebor wrote:



gcc-83431.diff

PR tree-optimization/83431 - -Wformat-truncation may incorrectly report truncation

gcc/ChangeLog:

   PR c++/83431
   * gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
   (sprintf_dom_walker): Remove class.
   (get_int_range): Make argument const.
   (directive::fmtfunc, directive::set_precision): Same.
   (format_none): Same.
   (build_intmax_type_nodes): Same.
   (adjust_range_for_overflow): Same.
   (format_floating): Same.
   (format_character): Same.
   (format_string): Same.
   (format_plain): Same.
   (get_int_range): Cast away constness.
   (format_integer): Same.
   (get_string_length): Call get_range_strlen_dynamic.  Handle
   null lendata.maxbound.
   (should_warn_p): Adjust argument scope qualifier.
   (maybe_warn): Same.
   (format_directive): Same.
   (parse_directive): Same.
   (is_call_safe): Same.
   (try_substitute_return_value): Same.
   (sprintf_dom_walker::handle_printf_call): Rename...
   (handle_printf_call): ...to this.  Initialize target to host charmap
   here instead of in pass_sprintf_length::execute.
   (struct call_info): Make global.
   (sprintf_dom_walker::compute_format_length): Make global.
   (sprintf_dom_walker::handle_gimple_call): Same.
   * passes.def (pass_sprintf_length): Replace with pass_strlen.
   * print-rtl.c (print_pattern): Reduce the number of spaces to
   avoid -Wformat-truncation.
   * tree-pass.h (make_pass_warn_printf): New function.
   * tree-ssa-strlen.c (strlen_optimize): New variable.
   (get_string_length): Add comments.
   (get_range_strlen_dynamic): New function.
   (check_and_optimize_call): New function.
   (handle_integral_assign): New function.
   (strlen_check_and_optimize_stmt): Factor code out into
   strlen_check_and_optimize_call and handle_integral_assign.
   (strlen_dom_walker::evrp): New member.
   (strlen_dom_walker::before_dom_children): Use evrp member.
   (strlen_dom_walker::after_dom_children): Use evrp member.
   (printf_strlen_execute): New function.
   (pass_strlen::gate): Update to handle printf calls.
   (dump_strlen_info): New function.
   (pass_data_warn_printf): New variable.
   (pass_warn_printf): New class.
   * tree-ssa-strlen.h (get_range_strlen_dynamic): Declare.
   (handle_printf_call): Same.

gcc/testsuite/ChangeLog:

   PR c++/83431
   * gcc.dg/strlenopt-63.c: New test.
   * gcc.dg/pr79538.c: Adjust text of expected warning.
   * gcc.dg/pr81292-1.c: Adjust pass name.
   * gcc.dg/pr81292-2.c: Same.
   * gcc.dg/pr81703.c: Same.
   * gcc.dg/strcmpopt_2.c: Same.
   * gcc.dg/strcmpopt_3.c: Same.
   * gcc.dg/strcmpopt_4.c: Same.
   * gcc.dg/strlenopt-1.c: Same.
   * gcc.dg/strlenopt-10.c: Same.
   * gcc.dg/strlenopt-11.c: Same.
   * gcc.dg/strlenopt-13.c: Same.
   * gcc.dg/strlenopt-14g.c: Same.
   * gcc.dg/strlenopt-14gf.c: Same.
   * gcc.dg/strlenopt-15.c: Same.
   * gcc.dg/strlenopt-16g.c: Same.
   * gcc.dg/strlenopt-17g.c: Same.
   * gcc.dg/strlenopt-18g.c: Same.
   * gcc.dg/strlenopt-19.c: Same.
   * gcc.dg/strlenopt-1f.c: Same.
   * gcc.dg/strlenopt-2.c: Same.
   * gcc.dg/strlenopt-20.c: Same.
   * gcc.dg/strlenopt-21.c: Same.
   * gcc.dg/strlenopt-22.c: Same.
   * gcc.dg/strlenopt-22g.c: Same.
   * gcc.dg/strlenopt-24.c: Same.
   * gcc.dg/strlenopt-25.c: Same.
   * gcc.dg/strlenopt-26.c: Same.
   * gcc.dg/strlenopt-27.c: Same.
   * gcc.dg/strlenopt-28.c: Same.
   * gcc.dg/strlenopt-29.c: Same.
   * gcc.dg/strlenopt-2f.c: Same.
   * gcc.dg/strlenopt-3.c: Same.
   * gcc.dg/strlenopt-30.c: Same.
   * gcc.dg/strlenopt-31g.c: Same.
   * gcc.dg/strlenopt-32.c: Same.
   * gcc.dg/strlenopt-33.c: Same.
   * gcc.dg/strlenopt-33g.c: Same.
   * gcc.dg/strlenopt-34.c: Same.
   * gcc.dg/strlenopt-35.c: Same.
   * gcc.dg/strlenopt-4.c: Same.
   * gcc.dg/strlenopt-48.c: Same.
   * gcc.dg/strlenopt-49.c: Same.
   * gcc.dg/strlenopt-4g.c: Same.
   * gcc.dg/strlenopt-4gf.c: Same.
   * gcc.dg/strlenopt-5.c: Same.
   * gcc.dg/strlenopt-50.c: Same.
   * gcc.dg/strlenopt-51.c: Same.
   * gcc.dg/strlenopt-52.c: Same.
   * gcc.dg/strlenopt-53.c: Same.
   * gcc.dg/strlenopt-54.c: Same.
   * gcc.dg/strlenopt-55.c: Same.
   * gcc.dg/strlenopt-56.c: Same.
   * gcc.dg/strlenopt-6.c: Same.
   * gcc.dg/strlenopt-61.c: Same.
   * gcc.dg/strlenopt-7.c: Same.
   * gcc.dg/strlenopt-8.c: Same.
   * gcc.dg/strlenopt-9.c: Same.
   * gcc.dg/strlenopt.h (snprintf, snprintf): Declare.
   * gcc.dg/tree-ssa/builtin-snprintf-6.c: New test.
   * gcc.dg/tre

Re: [SVE] PR86753

2019-08-27 Thread Richard Sandiford
Richard should have the final say, but some comments...

Prathamesh Kulkarni  writes:
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 1e2dfe5d22d..862206b3256 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>  
>  static tree
>  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
> -  gimple_stmt_iterator *gsi)
> +  gimple_stmt_iterator *gsi, tree mask,
> +  cond_vmask_map_type *cond_to_vec_mask)

"scalar_mask" might be a better name.  But maybe we should key off the
vector mask after all, now that we're relying on the code having no
redundancies.

Passing the vinfo would be better than passing the cond_vmask_map_type
directly.

>  {
>gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>if (!loop_mask)
>  return vec_mask;
>  
>gcc_assert (TREE_TYPE (loop_mask) == mask_type);
> +
> +  tree *slot = 0;
> +  if (cond_to_vec_mask)

The pointer should never be null in this context.

> +{
> +  cond_vmask_key cond (mask, loop_mask);
> +  slot = &cond_to_vec_mask->get_or_insert (cond);
> +  if (*slot)
> + return *slot;
> +}
> +
>tree and_res = make_temp_ssa_name (mask_type, NULL, "vec_mask_and");
>gimple *and_stmt = gimple_build_assign (and_res, BIT_AND_EXPR,
> vec_mask, loop_mask);
>gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
> +
> +  if (slot)
> +*slot = and_res;
>return and_res;
>  }
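The lookup-or-compute shape being added here is ordinary memoization: look up a (cond, loop_mask) key, reuse the cached mask if present, otherwise build the AND and record it. A generic sketch, with hypothetical string placeholders standing in for the vectorizer's trees:

```cpp
#include <map>
#include <string>
#include <utility>

// Cache keyed on (cond, loop_mask), mirroring the hash_map
// get_or_insert pattern in the patch under review.
static std::map<std::pair<std::string, std::string>, std::string> cache;

std::string combine_masks (const std::string &cond, const std::string &loop_mask)
{
  auto key = std::make_pair (cond, loop_mask);
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;                        // reuse the cached AND
  std::string and_res = "(" + cond + " & " + loop_mask + ")";
  cache[key] = and_res;                       // record for later queries
  return and_res;
}
```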
> [...]
> @@ -9975,6 +9997,38 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>/* Handle cond expr.  */
>for (j = 0; j < ncopies; j++)
>  {
> +  tree vec_mask = NULL_TREE;
> +
> +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)

Nit: one condition per line when the whole thing doesn't fit on a single line.

> +   && TREE_CODE_CLASS (TREE_CODE (cond_expr)) == tcc_comparison

Why restrict this to embedded comparisons?  It should work for separate
comparisons too.

> +   && loop_vinfo->cond_to_vec_mask)

This should always be nonnull given the above.

> + {
> +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +   if (masks)

This is never null.

> + {
> +   tree loop_mask = vect_get_loop_mask (gsi, masks,
> +ncopies, vectype, j);
> +
> +   cond_vmask_key cond (cond_expr, loop_mask);
> +   tree *slot = loop_vinfo->cond_to_vec_mask->get (cond);
> +   if (slot && *slot)
> + vec_mask = *slot;
> +   else
> + {
> +   cond.cond_ops.code
> + = invert_tree_comparison (cond.cond_ops.code, true);
> +   slot = loop_vinfo->cond_to_vec_mask->get (cond);
> +   if (slot && *slot)
> + {
> +   vec_mask = *slot;
> +   tree tmp = then_clause;
> +   then_clause = else_clause;
> +   else_clause = tmp;

Can use std::swap.

> + }
> + }
> + }
> + }
> +
>stmt_vec_info new_stmt_info = NULL;
>if (j == 0)
>   {
> @@ -10054,6 +10108,8 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>  
> if (masked)
>   vec_compare = vec_cond_lhs;
> +   else if (vec_mask)
> + vec_compare = vec_mask;

If we do drop the comparison check above, this should win over "masked".

> @@ -193,6 +194,81 @@ public:
>poly_uint64 min_value;
>  };
>  
> +struct cond_vmask_key

I'm no good at naming things, but since "vmask" doesn't occur elsewhere
in target-independent code, how about "vec_masked_cond_key"?

> +{
> +  cond_vmask_key (tree t, tree loop_mask_)
> +: cond_ops (t), loop_mask (loop_mask_)
> +  {}
> +
> +  hashval_t hash () const
> +  {
> +inchash::hash h;
> +h.add_int (cond_ops.code);
> +h.add_int (TREE_HASH (cond_ops.op0));
> +h.add_int (TREE_HASH (cond_ops.op1));

These two need to use inchash::add_expr, since you're hashing for
operand_equal_p.

> +h.add_int (TREE_HASH (loop_mask));
> +return h.end ();
> +  }
> +
> +  void mark_empty ()
> +  {
> +loop_mask = NULL_TREE;
> +  }
> +
> +  bool is_empty ()
> +  {
> +return loop_mask == NULL_TREE;
> +  }
> +
> +  tree_cond_ops cond_ops;
> +  tree loop_mask;
> +};
> +
> +inline bool operator== (const cond_vmask_key& c1, const cond_vmask_key& c2)
> +{
> +  return c1.loop_mask == c2.loop_mask
> +  && c1.cond_ops == c2.cond_ops;

Multi-line expressions should be in brackets (or just put this one on
a single line).

> +}
> +
> +struct cond_vmask_key_traits

Might as well make this:

template<>
struct default_hash_traits<cond_vmask_key>

and then you can drop the third template parameter from hash_map.

> +{
> +  typedef

Re: [PATCH V3 05/11] bpf: new GCC port

2019-08-27 Thread Jose E. Marchesi


> +(define_expand "zero_extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand")
> + (zero_extend:DI (match_operand:SI 1 "reg_or_indirect_memory_operand")))]
> +  ""
> +{
> +  if (register_operand (operands[1], SImode))
> +{
> +  operands[1] = gen_lowpart (DImode, operands[1]);
> +  emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
> +  emit_insn (gen_lshrdi3 (operands[0], operands[0], GEN_INT (32)));
> +  DONE;
> +}
> +})
> +
> +(define_insn "*zero_extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> + (zero_extend:DI (match_operand:SI 1 "reg_or_indirect_memory_operand" "0,m")))]
> +  ""
> +  "@
> +   lsh\t%0,32\n\trsh\t%0,32
> +   ldxw\t%0,%1"
> +  [(set_attr "type" "alu,ldx")
> +   (set_attr "length" "16,8")])

Sorry, should have noticed last time, but: you shouldn't need to handle
register operands here given the expander above.  It's OK if you find it
improves code quality, but it'd be interesting to know why if so.

If I remove the 0,=r alternative from the insn above, and also adjust
the predicate to indirect_memory_operand, then I get a segfault in one
test, in update_costs_from_allocno (ira-color.c:1382), because:

(gdb) print mode
$1 = E_SImode
(gdb) print default_target_ira_int->x_ira_register_move_cost[mode]
$13 = (move_table *) 0x0

What I think is going on is:

1. The expand above is used, and

2. there is no insn in the program matched by a pattern that involves a
   SI operand, and therefore record_operand_costs is never called on
   an SImode operand, and therefore the lazily-initialized
   x_ira_register_move_cost is never filled in for E_SImode, and then

3. ira() -> ira_color() -> color () -> do_coloring () ->
   ira_traverse_loop_tree () -> color_pass () -> color_allocnos () ->
   update_costs_from_prefs () -> update_costs_from_allocno () *CRASH*

Is this a bug, or am I expected to somehow trigger the initialization of
the SImode entry in the ira register move table in some other way?

> +
> +;;; Sign-extension
> +
> +;; Sign-extending a 32-bit value into a 64-bit value is achieved using
> +;; shifting, with instructions generated by the expand below.
> +
> +(define_expand "extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand")
> + (sign_extend:DI (match_operand:SI 1 "register_operand")))]
> +  ""
> +{
> +  operands[1] = gen_lowpart (DImode, operands[1]);
> +  emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
> +  emit_insn (gen_ashrdi3 (operands[0], operands[0], GEN_INT (32)));
> +  DONE;
> +})
> +
> +(define_insn "*extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (sign_extend:DI (match_operand:SI 1 "register_operand" "0")))]
> +  ""
> +  "lsh\t%0,32\n\tarsh\t%0,32"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "16")])

Likewise this define_insn shouldn't be needed.

The removal of this one doesn't trigger any problem that I can see
running compile.exp.  I'm glad to get rid of yet another insn! :)
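The shift-pair idioms these expanders emit can be sketched in C++ on the 64-bit register value directly. One caveat on the sketch: an arithmetic right shift of a negative signed value is implementation-defined in ISO C/C++, but GCC defines it as arithmetic, which is what BPF's arsh does.

```cpp
#include <cstdint>

// lsh 32 / rsh 32: logical shifts clear the upper half (zero extension).
uint64_t zext_via_shifts (uint64_t r)
{
  return (r << 32) >> 32;
}

// lsh 32 / arsh 32: the arithmetic right shift replicates bit 31
// across the upper half (sign extension).
uint64_t sext_via_shifts (uint64_t r)
{
  return (uint64_t) ((int64_t) (r << 32) >> 32);
}
```

For a register holding garbage in its upper half and 0x80000001 in its lower half, the first returns 0x0000000080000001 and the second 0xFFFFFFFF80000001.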


Re: [PATCH V3 05/11] bpf: new GCC port

2019-08-27 Thread Segher Boessenkool
On Tue, Aug 27, 2019 at 02:59:01PM +0200, Jose E. Marchesi wrote:
> glglgl, scratch that, it is actually a 32-bit multiplication that then
> gets extended to 64-bits:
> 
> (define_insn "*mulsidi3_zeroextended"
>   [(set (match_operand:DI 0 "register_operand" "=r,r")
> (zero_extend:DI
>  (mult:SI (match_operand:SI 1 "register_operand" "0,0")
>   (match_operand:SI 2 "reg_or_imm_operand" "r,I"))))]
>   ""
>   "mul32\t%0,%2"
>   [(set_attr "type" "alu32")])

Like pretty much *all* the 32-bit instructions, btw...  All of them could
have a variant with a zero_extend like this.  Something for define_subst,
perhaps.

On rs6000 we call such patterns "*rotlsi3_64" (where the pure 32-bit one
is called "rotlsi3"), maybe you want some similar naming?  Something that
makes it clearer that it is the same insn as mulsi3.


Segher


Re: [PATCH V3 05/11] bpf: new GCC port

2019-08-27 Thread Segher Boessenkool
Hi!

On Tue, Aug 27, 2019 at 12:08:50PM +0100, Richard Sandiford wrote:
> Mostly trivial formatting comments, but I think there's still a couple
> of substantive points too.

Sorry to piggy-back on your review once again.

> jema...@gnu.org (Jose E. Marchesi) writes:
> > +static rtx
> > +bpf_function_value (const_tree ret_type,
> > +   const_tree fntype_or_decl ATTRIBUTE_UNUSED,
> 
> This *is* used. :-)  I only noticed because...

More modern style doesn't use ATTRIBUTE_UNUSED; it simply leaves the
parameter unnamed (or commented out, "const_tree /*fntype_or_decl*/").
Good luck accidentally using it, with that C++ style :-)

> > +(define_expand "zero_extendsidi2"
> > +  [(set (match_operand:DI 0 "register_operand")
> > +   (zero_extend:DI (match_operand:SI 1 "reg_or_indirect_memory_operand")))]
> > +  ""
> > +{
> > +  if (register_operand (operands[1], SImode))
> > +{
> > +  operands[1] = gen_lowpart (DImode, operands[1]);
> > +  emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
> > +  emit_insn (gen_lshrdi3 (operands[0], operands[0], GEN_INT (32)));
> > +  DONE;
> > +}
> > +})
> > +
> > +(define_insn "*zero_extendsidi2"
> > +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> > +   (zero_extend:DI (match_operand:SI 1 "reg_or_indirect_memory_operand" "0,m")))]
> > +  ""
> > +  "@
> > +   lsh\t%0,32\n\trsh\t%0,32
> > +   ldxw\t%0,%1"
> > +  [(set_attr "type" "alu,ldx")
> > +   (set_attr "length" "16,8")])
> 
> Sorry, should have noticed last time, but: you shouldn't need to handle
> register operands here given the expander above.  It's OK if you find it
> improves code quality, but it'd be interesting to know why if so

Can't you just do "mov32 dst,src" for this?  Or "add32 dst,0" if that has
any advantages (maybe it is a shorter insn?  No idea).

So that then gets you

(define_insn "zero_extendsidi2"
  [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI
  (match_operand:SI 1 "reg_or_indirect_memory_operand" "r,m")))]
  ""
  "@
   mov32\t%0,%1
   ldxw\t%0,%1"
  [(set_attr "type" "alu,ldx")])

(and no expander or anything).

(You still need one for the sign-extend, but only a define_expand for that,
nothing more).


Segher


[PATCH, testsuite]: Use -mfpmath=sse in gcc.target/i386/sse4_1-round-roundeven-?.c tests

2019-08-27 Thread Uros Bizjak
Fix 32bit testing, where -mfpmath=387 is the default.

2019-08-27  Uroš Bizjak  

* gcc.target/i386/sse4_1-round-roundeven-1.c (dg-options):
Add -mfpmath=sse.
* gcc.target/i386/sse4_1-round-roundeven-2.c (dg-options): Ditto.

Tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: gcc.target/i386/sse4_1-round-roundeven-1.c
===
--- gcc.target/i386/sse4_1-round-roundeven-1.c  (revision 274961)
+++ gcc.target/i386/sse4_1-round-roundeven-1.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse4.1" } */
+/* { dg-options "-O2 -msse4.1 -mfpmath=sse" } */
 
 __attribute__((noinline, noclone)) double
 f1 (double x)
Index: gcc.target/i386/sse4_1-round-roundeven-2.c
===
--- gcc.target/i386/sse4_1-round-roundeven-2.c  (revision 274961)
+++ gcc.target/i386/sse4_1-round-roundeven-2.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target sse4 } */
-/* { dg-options "-O2 -msse4.1" } */
+/* { dg-options "-O2 -msse4.1 -mfpmath=sse" } */
 
 #include "sse4_1-check.h"
 #include "sse4_1-round-roundeven-1.c"


Re: [PATCH V3 05/11] bpf: new GCC port

2019-08-27 Thread Jose E. Marchesi


> +(define_expand "zero_extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand")
> +	(zero_extend:DI (match_operand:SI 1 "reg_or_indirect_memory_operand")))]
> +  ""
> +{
> +  if (register_operand (operands[1], SImode))
> +{
> +  operands[1] = gen_lowpart (DImode, operands[1]);
> +  emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
> +  emit_insn (gen_lshrdi3 (operands[0], operands[0], GEN_INT (32)));
> +  DONE;
> +}
> +})
> +
> +(define_insn "*zero_extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> +	(zero_extend:DI (match_operand:SI 1 "reg_or_indirect_memory_operand" "0,m")))]
> +  ""
> +  "@
> +   lsh\t%0,32\n\trsh\t%0,32
> +   ldxw\t%0,%1"
> +  [(set_attr "type" "alu,ldx")
> +   (set_attr "length" "16,8")])

Sorry, should have noticed last time, but: you shouldn't need to handle
register operands here given the expander above.  It's OK if you find it
improves code quality, but it'd be interesting to know why if so

If I remove the 0,=r alternative from the insn above, and also adjust
the predicate to indirect_memory_operand, then I get a segfault in one
test, in update_costs_from_allocno (ira-color.c:1382), because:

(gdb) print mode
$1 = E_SImode
(gdb) print default_target_ira_int->x_ira_register_move_cost[mode]
$13 = (move_table *) 0x0

What I think is going on is:

1. The expand above is used, and

2. there is no insn in the program matched by a pattern that involves an
   SI operand, and therefore record_operand_costs is never called on an
   SImode operand, and therefore the lazily-initialized
   x_ira_register_move_cost is never filled in for E_SImode, and then

3. ira() -> ira_color() -> color () -> do_coloring () ->
   ira_traverse_loop_tree () -> color_pass () -> color_allocnos () ->
   update_costs_from_prefs () -> update_costs_from_allocno () *CRASH*

Is this a bug, or am I expected to somehow trigger the initialization of
the SImode entry in the ira register move table in some other way?

This is the backtrace btw:

jemarch@termi:~/gnu/src/gcc-git/build-bpf/gcc$ PATH=.:$PATH ./xgcc -O2 -c /home/jemarch/gnu/src/gcc-git/gcc/testsuite/gcc.c-torture/compile/pr39928-2.c
during RTL pass: ira
/home/jemarch/gnu/src/gcc-git/gcc/testsuite/gcc.c-torture/compile/pr39928-2.c: In function ‘vq_nbest’:
/home/jemarch/gnu/src/gcc-git/gcc/testsuite/gcc.c-torture/compile/pr39928-2.c:8:1: internal compiler error: Segmentation fault
8 | }
  | ^
0xfa8cb7 crash_signal
../../gcc/toplev.c:326
0x7f278d44e05f ???

/build/glibc-yWQXbR/glibc-2.24/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0xd4ec8b update_costs_from_allocno
../../gcc/ira-color.c:1382
0xd4ee09 update_costs_from_prefs
../../gcc/ira-color.c:1422
0xd535d9 color_allocnos
../../gcc/ira-color.c:3170
0xd53e21 color_pass
../../gcc/ira-color.c:3309
0xd39b10 ira_traverse_loop_tree(bool, ira_loop_tree_node*, void (*)(ira_loop_tree_node*), void (*)(ira_loop_tree_node*))
../../gcc/ira-build.c:1781
0xd54703 do_coloring
../../gcc/ira-color.c:3460
0xd58203 color
../../gcc/ira-color.c:4837
0xd58746 ira_color()
../../gcc/ira-color.c:4968
0xd34990 ira
../../gcc/ira.c:5365
0xd35118 execute
../../gcc/ira.c:5663
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


[PATCH, i386]: Fix PR 91528, ICE in ix86_expand_prologue

2019-08-27 Thread Uros Bizjak
When stack alignment is increased in convert_scalars_to_vector, we
have to update several other dependent fields in the crtl struct, similar
to what expand_stack_alignment from cfgexpand.c does.

2019-08-27  Uroš Bizjak  

PR target/91528
* config/i386/i386-features.c (convert_scalars_to_vector):
Update crtl->stack_realign_needed, crtl->stack_realign_tried and
crtl->stack_realign_processed.  Update crtl->drap_reg by calling
targetm.calls.get_drap_rtx.  If drap_rtx is non-null, then
update crtl->args.internal_arg_pointer and call fixup_tail_calls.

testsuite/ChangeLog:

2019-08-27  Uroš Bizjak  

PR target/91528
* gcc.target/i386/pr91528.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386-features.c
===
--- config/i386/i386-features.c (revision 274958)
+++ config/i386/i386-features.c (working copy)
@@ -1651,6 +1651,32 @@ convert_scalars_to_vector (bool timode_p)
crtl->stack_alignment_needed = 128;
   if (crtl->stack_alignment_estimated < 128)
crtl->stack_alignment_estimated = 128;
+
+  crtl->stack_realign_needed
+   = INCOMING_STACK_BOUNDARY < crtl->stack_alignment_estimated;
+  crtl->stack_realign_tried = crtl->stack_realign_needed;
+
+  crtl->stack_realign_processed = true;
+
+  if (!crtl->drap_reg)
+   {
+ rtx drap_rtx = targetm.calls.get_drap_rtx ();
+
+ /* stack_realign_drap and drap_rtx must match.  */
+ gcc_assert ((stack_realign_drap != 0) == (drap_rtx != NULL));
+
+ /* Do nothing if NULL is returned,
+which means DRAP is not needed.  */
+ if (drap_rtx != NULL)
+   {
+ crtl->args.internal_arg_pointer = drap_rtx;
+
+ /* Call fixup_tail_calls to clean up
+REG_EQUIV note if DRAP is needed. */
+ fixup_tail_calls ();
+   }
+   }
+
   /* Fix up DECL_RTL/DECL_INCOMING_RTL of arguments.  */
   if (TARGET_64BIT)
for (tree parm = DECL_ARGUMENTS (current_function_decl);
Index: testsuite/gcc.target/i386/pr91528.c
===
--- testsuite/gcc.target/i386/pr91528.c (nonexistent)
+++ testsuite/gcc.target/i386/pr91528.c (working copy)
@@ -0,0 +1,14 @@
+/* PR target/91528 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-Os -mavx512vbmi2 -mforce-drap" } */
+
+extern long int labs (long int j);
+
+int
+main ()
+{
+  long *a = (long *)"empty";
+  int i = 1441516387;
+  a[i] = labs (a[i]);
+  return 0;
+}


Re: C++ PATCH to implement C++20 P1143R2, constinit (PR c++/91360)

2019-08-27 Thread Marek Polacek
On Fri, Aug 23, 2019 at 03:10:37PM -0700, Jason Merrill wrote:
> > +/* True if DECL is declared 'constinit'.  */
> > +#define DECL_DECLARED_CONSTINIT_P(DECL) \
> > +  DECL_LANG_FLAG_0 (VAR_DECL_CHECK (STRIP_TEMPLATE (DECL)))
> 
> Hmm, given that 'constinit' only affects the declaration, do we really need
> a flag on the VAR_DECL?

I was able to do without DECL_DECLARED_CONSTINIT_P.  To achieve that I
introduced LOOKUP_CONSTINIT (yes, it's an abomination; we should probably
extirpate those LOOKUP_ macros).

But that wasn't enough for variable templates, so I also had to introduce
TINFO_VAR_DECLARED_CONSTINIT.  I suppose the DECL_TEMPLATE_INFO bits aren't
as precious as the VAR_DECL bits.

> Hmm, the existing code limiting the unification of constexpr to class-scope
> variables seems wrong:
> 
> constexpr float pi = 3.14;
> extern const float pi;
> constexpr float x = pi; // should be OK

Thanks for fixing this meanwhile.

Is this any better?

Bootstrapped/regtested on x86_64-linux.

2019-08-27  Marek Polacek  

PR c++/91360 - Implement C++20 P1143R2: constinit.
* c-common.c (c_common_reswords): Add constinit and __constinit.
(keyword_is_decl_specifier): Handle RID_CONSTINIT.
* c-common.h (enum rid): Add RID_CONSTINIT, RID_FIRST_CXX20, and
RID_LAST_CXX20.
(D_CXX20): Define.
* c-cppbuiltin.c (c_cpp_builtins): Define __cpp_constinit.
* c-format.c (cxx_keywords): Add "constinit".
* c.opt (Wc++2a-compat, Wc++20-compat): New options.

* cp-tree.h (TINFO_VAR_DECLARED_CONSTINIT): Define.
(LOOKUP_CONSTINIT): Define.
(enum cp_decl_spec): Add ds_constinit.
* decl.c (check_tag_decl): Give an error for constinit in type
declarations.
(check_initializer): Also check LOOKUP_CONSTINIT.
(cp_finish_decl): Add checking for a constinit declaration.  Set
TINFO_VAR_DECLARED_CONSTINIT.
(grokdeclarator): Add checking for a declaration with the constinit
specifier.
* lex.c (init_reswords): Handle D_CXX20.
* parser.c (cp_lexer_get_preprocessor_token): Pass a better location
to warning_at.  Warn about C++20 keywords.
(cp_keyword_starts_decl_specifier_p): Handle RID_CONSTINIT.
(cp_parser_diagnose_invalid_type_name): Add an inform about constinit.
(cp_parser_decomposition_declaration): Maybe pass LOOKUP_CONSTINIT to
cp_finish_decl.
(cp_parser_decl_specifier_seq): Handle RID_CONSTINIT.
(cp_parser_init_declarator): Maybe pass LOOKUP_CONSTINIT to
cp_finish_decl.
(set_and_check_decl_spec_loc): Add "constinit".
* pt.c (tsubst_decl): Set TINFO_VAR_DECLARED_CONSTINIT.
(instantiate_decl): Maybe pass LOOKUP_CONSTINIT to cp_finish_decl.
* typeck2.c (store_init_value): If a constinit variable wasn't
initialized using a constant initializer, give an error.

* doc/invoke.texi: Document -Wc++20-compat.

* g++.dg/cpp2a/constinit1.C: New test.
* g++.dg/cpp2a/constinit2.C: New test.
* g++.dg/cpp2a/constinit3.C: New test.
* g++.dg/cpp2a/constinit4.C: New test.
* g++.dg/cpp2a/constinit5.C: New test.
* g++.dg/cpp2a/constinit6.C: New test.
* g++.dg/cpp2a/constinit7.C: New test.
* g++.dg/cpp2a/constinit8.C: New test.
* g++.dg/cpp2a/constinit9.C: New test.
* g++.dg/cpp2a/constinit10.C: New test.
* g++.dg/cpp2a/constinit11.C: New test.
* g++.dg/cpp2a/constinit12.C: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index d516deaf24c..17ca1e683a2 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -326,8 +326,9 @@ static bool nonnull_check_p (tree, unsigned HOST_WIDE_INT);
C --std=c89: D_C99 | D_CXXONLY | D_OBJC | D_CXX_OBJC
C --std=c99: D_CXXONLY | D_OBJC
ObjC is like C except that D_OBJC and D_CXX_OBJC are not set
-   C++ --std=c++98: D_CONLY | D_CXX11 | D_OBJC
-   C++ --std=c++11: D_CONLY | D_OBJC
+   C++ --std=c++98: D_CONLY | D_CXX11 | D_CXX20 | D_OBJC
+   C++ --std=c++11: D_CONLY | D_CXX20 | D_OBJC
+   C++ --std=c++2a: D_CONLY | D_OBJC
ObjC++ is like C++ except that D_OBJC is not set
 
If -fno-asm is used, D_ASM is added to the mask.  If
@@ -392,6 +393,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__complex__", RID_COMPLEX,0 },
   { "__const", RID_CONST,  0 },
   { "__const__",   RID_CONST,  0 },
+  { "__constinit", RID_CONSTINIT,  D_CXXONLY },
   { "__decltype",   RID_DECLTYPE,   D_CXXONLY },
   { "__direct_bases",   RID_DIRECT_BASES, D_CXXONLY },
   { "__extension__",   RID_EXTENSION,  0 },
@@ -462,6 +464,7 @@ const struct c_common_resword c_common_reswords[] =
   { "class",   RID_CLASS,  D_CXX_OBJC | D_CXXWARN },
   { "const",   RID_CONST,  0 },
   { "constexpr",   RID_CONSTEXPR,  D_CXXONLY | D_CXX11 | D_CXXWARN },
+  { "constinit",   RID_CONSTINIT, 

Re: [SVE] PR86753

2019-08-27 Thread Prathamesh Kulkarni
On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
 wrote:
>
> Richard should have the final say, but some comments...
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 1e2dfe5d22d..862206b3256 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
> >
> >  static tree
> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
> > -  gimple_stmt_iterator *gsi)
> > +  gimple_stmt_iterator *gsi, tree mask,
> > +  cond_vmask_map_type *cond_to_vec_mask)
>
> "scalar_mask" might be a better name.  But maybe we should key off the
> vector mask after all, now that we're relying on the code having no
> redundancies.
>
> Passing the vinfo would be better than passing the cond_vmask_map_type
> directly.
>
> >  {
> >gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
> >if (!loop_mask)
> >  return vec_mask;
> >
> >gcc_assert (TREE_TYPE (loop_mask) == mask_type);
> > +
> > +  tree *slot = 0;
> > +  if (cond_to_vec_mask)
>
> The pointer should never be null in this context.
Disabling the check for NULL results in a segfault with cond_arith_4.c
because we reach prepare_load_store_mask via vect_schedule_slp, called
from here in vect_transform_loop:
 /* Schedule the SLP instances first, then handle loop vectorization
 below.  */
  if (!loop_vinfo->slp_instances.is_empty ())
{
  DUMP_VECT_SCOPE ("scheduling SLP instances");
  vect_schedule_slp (loop_vinfo);
}

which is before bb processing loop.
>
> > +{
> > +  cond_vmask_key cond (mask, loop_mask);
> > +  slot = &cond_to_vec_mask->get_or_insert (cond);
> > +  if (*slot)
> > + return *slot;
> > +}
> > +
> >tree and_res = make_temp_ssa_name (mask_type, NULL, "vec_mask_and");
> >gimple *and_stmt = gimple_build_assign (and_res, BIT_AND_EXPR,
> > vec_mask, loop_mask);
> >gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
> > +
> > +  if (slot)
> > +*slot = and_res;
> >return and_res;
> >  }
> > [...]
> > @@ -9975,6 +9997,38 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >/* Handle cond expr.  */
> >for (j = 0; j < ncopies; j++)
> >  {
> > +  tree vec_mask = NULL_TREE;
> > +
> > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
>
> Nit: one condition per line when the whole thing doesn't fit on a single line.
>
> > +   && TREE_CODE_CLASS (TREE_CODE (cond_expr)) == tcc_comparison
>
> Why restrict this to embedded comparisons?  It should work for separate
> comparisons too.
>
> > +   && loop_vinfo->cond_to_vec_mask)
>
> This should always be nonnull given the above.
>
> > + {
> > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +   if (masks)
>
> This is never null.
>
> > + {
> > +   tree loop_mask = vect_get_loop_mask (gsi, masks,
> > +ncopies, vectype, j);
> > +
> > +   cond_vmask_key cond (cond_expr, loop_mask);
> > +   tree *slot = loop_vinfo->cond_to_vec_mask->get (cond);
> > +   if (slot && *slot)
> > + vec_mask = *slot;
> > +   else
> > + {
> > +   cond.cond_ops.code
> > + = invert_tree_comparison (cond.cond_ops.code, true);
> > +   slot = loop_vinfo->cond_to_vec_mask->get (cond);
> > +   if (slot && *slot)
> > + {
> > +   vec_mask = *slot;
> > +   tree tmp = then_clause;
> > +   then_clause = else_clause;
> > +   else_clause = tmp;
>
> Can use std::swap.
>
> > + }
> > + }
> > + }
> > + }
> > +
> >stmt_vec_info new_stmt_info = NULL;
> >if (j == 0)
> >   {
> > @@ -10054,6 +10108,8 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >
> > if (masked)
> >   vec_compare = vec_cond_lhs;
> > +   else if (vec_mask)
> > + vec_compare = vec_mask;
>
> If we do drop the comparison check above, this should win over "masked".
>
> > @@ -193,6 +194,81 @@ public:
> >poly_uint64 min_value;
> >  };
> >
> > +struct cond_vmask_key
>
> I'm no good at naming things, but since "vmask" doesn't occur elsewhere
> in target-independent code, how about "vec_masked_cond_key"?
>
> > +{
> > +  cond_vmask_key (tree t, tree loop_mask_)
> > +: cond_ops (t), loop_mask (loop_mask_)
> > +  {}
> > +
> > +  hashval_t hash () const
> > +  {
> > +inchash::hash h;
> > +h.add_int (cond_ops.code);
> > +h.add_int (TREE_HASH (cond_ops.op0));
> > +h.add_int (TREE_HASH (cond_ops.op1));
>
> These two need to use inchash::add_expr, since you're hashing for

Re: [PR fortran/91496] !GCC$ directives error if mistyped or unknown

2019-08-27 Thread Harald Anlauf
Committed to trunk as svn revision 274966, after removing some
accidentally left-over unused variable declarations (copy&paste).
The actual committed version is attached.

Thanks, Paul, for the quick review!

Unless there are strong objections, I'd like to commit this to
9-branch, too, so that this can be used in the 9.3 release.
Applies/regtests cleanly.  Will wait for a week or so.

Harald

On 08/27/19 10:33, Paul Richard Thomas wrote:
> Hi Harald,
>
> This is OK for trunk.
>
> Thanks!
>
> Paul
>
> On Mon, 26 Aug 2019 at 22:13, Harald Anlauf  wrote:
>>
>> Dear all,
>>
>> the attached patch adds Fortran support for the following pragmas
>> (loop annotations): IVDEP (ignore vector dependencies), VECTOR, and
>> NOVECTOR.  Furthermore, it downgrades unsupported directives from
>> error to warning (by default, it stays an error with -pedantic),
>> thus fixing the PR.
>>
>> It has no effect on existing code (thus regtested cleanly on
>> x86_64-pc-linux-gnu), but gives users an option for fine-grained
>> control of optimization.  The above pragmas are supported by other
>> compilers (with different sentinels, e.g. !DIR$ for Intel, Cray,
>> sometimes with slightly different keywords).
>>
>> OK for trunk, and backport to 9?
>>
>> Thanks,
>> Harald
>>
>>
>> 2019-08-26  Harald Anlauf  
>>
>> PR fortran/91496
>> * gfortran.h: Extend struct gfc_iterator for loop annotations.
>> * array.c (gfc_copy_iterator): Copy loop annotations by IVDEP,
>> VECTOR, and NOVECTOR pragmas.
>> * decl.c (gfc_match_gcc_ivdep, gfc_match_gcc_vector)
>> (gfc_match_gcc_novector): New matcher functions handling IVDEP,
>> VECTOR, and NOVECTOR pragmas.
>> * match.h: Declare prototypes of matcher functions handling IVDEP,
>> VECTOR, and NOVECTOR pragmas.
>> * parse.c (decode_gcc_attribute, parse_do_block)
>> (parse_executable): Decode IVDEP, VECTOR, and NOVECTOR pragmas;
>> emit warning for unrecognized pragmas instead of error.
>> * trans-stmt.c (gfc_trans_simple_do, gfc_trans_do): Add code to
>> emit annotations for IVDEP, VECTOR, and NOVECTOR pragmas.
>> * gfortran.texi: Document IVDEP, VECTOR, and NOVECTOR pragmas.
>>
>> 2019-08-26  Harald Anlauf  
>>
>> PR fortran/91496
>> * gfortran.dg/pr91496.f90: New testcase.
>>
>
>

Index: gcc/fortran/array.c
===
--- gcc/fortran/array.c (Revision 274964)
+++ gcc/fortran/array.c (Arbeitskopie)
@@ -2185,6 +2185,9 @@
   dest->end = gfc_copy_expr (src->end);
   dest->step = gfc_copy_expr (src->step);
   dest->unroll = src->unroll;
+  dest->ivdep = src->ivdep;
+  dest->vector = src->vector;
+  dest->novector = src->novector;
 
   return dest;
 }
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c  (Revision 274964)
+++ gcc/fortran/decl.c  (Arbeitskopie)
@@ -99,6 +99,11 @@
 /* Set upon parsing a !GCC$ unroll n directive for use in the next loop.  */
 int directive_unroll = -1;
 
+/* Set upon parsing supported !GCC$ pragmas for use in the next loop.  */
+bool directive_ivdep = false;
+bool directive_vector = false;
+bool directive_novector = false;
+
 /* Map of middle-end built-ins that should be vectorized.  */
 hash_map *gfc_vectorized_builtins;
 
@@ -11528,3 +11533,53 @@
 
   return MATCH_YES;
 }
+
+/* Match an !GCC$ IVDEP statement.
+   When we come here, we have already matched the !GCC$ IVDEP string.  */
+
+match
+gfc_match_gcc_ivdep (void)
+{
+  if (gfc_match_eos () == MATCH_YES)
+{
+  directive_ivdep = true;
+  return MATCH_YES;
+}
+
+  gfc_error ("Syntax error in !GCC$ IVDEP directive at %C");
+  return MATCH_ERROR;
+}
+
+/* Match an !GCC$ VECTOR statement.
+   When we come here, we have already matched the !GCC$ VECTOR string.  */
+
+match
+gfc_match_gcc_vector (void)
+{
+  if (gfc_match_eos () == MATCH_YES)
+{
+  directive_vector = true;
+  directive_novector = false;
+  return MATCH_YES;
+}
+
+  gfc_error ("Syntax error in !GCC$ VECTOR directive at %C");
+  return MATCH_ERROR;
+}
+
+/* Match an !GCC$ NOVECTOR statement.
+   When we come here, we have already matched the !GCC$ NOVECTOR string.  */
+
+match
+gfc_match_gcc_novector (void)
+{
+  if (gfc_match_eos () == MATCH_YES)
+{
+  directive_novector = true;
+  directive_vector = false;
+  return MATCH_YES;
+}
+
+  gfc_error ("Syntax error in !GCC$ NOVECTOR directive at %C");
+  return MATCH_ERROR;
+}
Index: gcc/fortran/gfortran.h
===
--- gcc/fortran/gfortran.h  (Revision 274964)
+++ gcc/fortran/gfortran.h  (Arbeitskopie)
@@ -2418,6 +2418,9 @@
 {
   gfc_expr *var, *start, *end, *step;
   unsigned short unroll;
+  bool ivdep;
+  bool vector;
+  bool novector;
 }
 gfc_iterator;
 
@@ -2794,6 +2797,9 @@
 bool gfc_in_match_data (void);
 match gfc_match_char_spec (gfc

Re: C++ PATCH to implement C++20 P1143R2, constinit (PR c++/91360)

2019-08-27 Thread Paolo Carlini

Hi,

On 14/08/19 23:22, Marek Polacek wrote:

+  /* [dcl.spec]/2 "At most one of the constexpr, consteval, and constinit
+ keywords shall appear in a decl-specifier-seq."  */
+  if (constinit_p && constexpr_p)
+{
+  error_at (min_location (declspecs->locations[ds_constinit],
+ declspecs->locations[ds_constexpr]),
+   "can use at most one of the % and % "
+   "specifiers");


For this error we also have the option of using a gcc_rich_location, and 
add_range, etc, like for signed_p && unsigned_p, for example. Just 
saying, since we have the infrastructure ready...


Paolo.



Re: C++ PATCH to implement C++20 P1143R2, constinit (PR c++/91360)

2019-08-27 Thread Marek Polacek
On Tue, Aug 27, 2019 at 09:54:50PM +0200, Paolo Carlini wrote:
> Hi,
> 
> On 14/08/19 23:22, Marek Polacek wrote:
> > +  /* [dcl.spec]/2 "At most one of the constexpr, consteval, and constinit
> > + keywords shall appear in a decl-specifier-seq."  */
> > +  if (constinit_p && constexpr_p)
> > +{
> > +  error_at (min_location (declspecs->locations[ds_constinit],
> > + declspecs->locations[ds_constexpr]),
> > +   "can use at most one of the % and % "
> > +   "specifiers");
> 
> For this error we also have the option of using a gcc_rich_location, and
> add_range, etc, like for signed_p && unsigned_p, for example. Just saying,
> since we have the infrastructure ready...

Happy to polish the diagnostic after the core bits are in ;-).

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA


Re: C++ PATCH for c++/81676 - bogus -Wunused warnings in constexpr if

2019-08-27 Thread Marek Polacek
On Fri, Aug 23, 2019 at 02:20:53PM -0700, Jason Merrill wrote:
> On 8/19/19 11:28 AM, Marek Polacek wrote:
> > On Fri, Aug 16, 2019 at 06:20:20PM -0700, Jason Merrill wrote:
> > > On 8/16/19 2:29 PM, Marek Polacek wrote:
> > > > This patch is an attempt to fix the annoying -Wunused-but-set-* 
> > > > warnings that
> > > > tend to occur with constexpr if.  When we have something like
> > > > 
> > > > template < typename T >
> > > > int f(T v){
> > > > if constexpr(sizeof(T) == sizeof(int)){
> > > >   return v;
> > > > }else{
> > > >   return 0;
> > > > }
> > > > }
> > > > 
> > > > and call f('a'), then the condition is false, meaning that we won't 
> > > > instantiate
> > > > the then-branch, as per tsubst_expr/IF_STMT:
> > > > 17284   if (IF_STMT_CONSTEXPR_P (t) && integer_zerop (tmp))
> > > > 17285 /* Don't instantiate the THEN_CLAUSE. */;
> > > > so we'll never get round to mark_exp_read-ing the decls used in the
> > > > then-branch, causing finish_function to emit "parameter set but not 
> > > > used"
> > > > warnings.
> > > > 
> > > > It's unclear how to best deal with this.  Marking the decls DECL_READ_P 
> > > > while
> > > > parsing doesn't seem like a viable approach
> > > 
> > > Why not?
> > 
> > Well, while parsing, we're in a template and so the condition won't be
> > evaluated until tsubst_expr.  So we can't tell which branch is dead.
> 
> But if a decl is used on one branch, we shouldn't warn even if it isn't used
> on the selected branch.

I didn't want to mark the decls as read multiple times but we do it anyway
so that's no longer my concern.  So...

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-08-27  Marek Polacek  

PR c++/81676 - bogus -Wunused warnings in constexpr if.
* semantics.c (maybe_mark_exp_read_r): New function.
(finish_if_stmt): Call it on THEN_CLAUSE and ELSE_CLAUSE.

* g++.dg/cpp1z/constexpr-if31.C: New test.
* g++.dg/cpp1z/constexpr-if32.C: New test.

diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index 1f7745933f9..2ba8e635ab7 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -774,6 +774,18 @@ finish_else_clause (tree if_stmt)
   ELSE_CLAUSE (if_stmt) = pop_stmt_list (ELSE_CLAUSE (if_stmt));
 }
 
+/* Callback for cp_walk_tree to mark all {VAR,PARM}_DECLs in a tree as
+   read.  */
+
+static tree
+maybe_mark_exp_read_r (tree *tp, int *, void *)
+{
+  tree t = *tp;
+  if (VAR_P (t) || TREE_CODE (t) == PARM_DECL)
+mark_exp_read (t);
+  return NULL_TREE;
+}
+
 /* Finish an if-statement.  */
 
 void
@@ -781,6 +793,16 @@ finish_if_stmt (tree if_stmt)
 {
   tree scope = IF_SCOPE (if_stmt);
   IF_SCOPE (if_stmt) = NULL;
+  if (IF_STMT_CONSTEXPR_P (if_stmt))
+{
+  /* Prevent various -Wunused warnings.  We might not instantiate
+either of these branches, so we would not mark the variables
+used in that branch as read.  */
+  cp_walk_tree_without_duplicates (&THEN_CLAUSE (if_stmt),
+  maybe_mark_exp_read_r, NULL);
+  cp_walk_tree_without_duplicates (&ELSE_CLAUSE (if_stmt),
+  maybe_mark_exp_read_r, NULL);
+}
   add_stmt (do_poplevel (scope));
 }
 
diff --git gcc/testsuite/g++.dg/cpp1z/constexpr-if31.C gcc/testsuite/g++.dg/cpp1z/constexpr-if31.C
new file mode 100644
index 000..02140cff9fd
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1z/constexpr-if31.C
@@ -0,0 +1,79 @@
+// PR c++/81676 - bogus -Wunused warnings in constexpr if.
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wall -Wextra" }
+
+template  int
+f1 (T v)
+{
+  T x = 0;
+  if constexpr(sizeof(T) == sizeof(int))
+return v + x;
+  else
+return 0;
+}
+
+template  int
+f2 (T v) // { dg-warning "unused parameter .v." }
+{
+  T x = 0;
+  if constexpr(sizeof(T) == sizeof(int))
+return x;
+  else
+return 0;
+}
+
+template  int
+f3 (T v)
+{
+  T x = 0; // { dg-warning "unused variable .x." }
+  if constexpr(sizeof(T) == sizeof(int))
+return v;
+  else
+return 0;
+}
+
+template  int
+f4 (T v)
+{
+  T x = 0;
+  if constexpr(sizeof(T) == sizeof(int))
+return 0;
+  else
+return v + x;
+}
+
+template  int
+f5 (T v) // { dg-warning "unused parameter .v." }
+{
+  T x = 0;
+  if constexpr(sizeof(T) == sizeof(int))
+return 0;
+  else
+return x;
+}
+
+template  int
+f6 (T v)
+{
+  T x = 0; // { dg-warning "unused variable .x." }
+  if constexpr(sizeof(T) == sizeof(int))
+return 0;
+  else
+return v;
+}
+
+int main()
+{
+  f1(0);
+  f1('a');
+  f2(0);
+  f2('a');
+  f3(0);
+  f3('a');
+  f4(0);
+  f4('a');
+  f5(0);
+  f5('a');
+  f6(0);
+  f6('a');
+}
diff --git gcc/testsuite/g++.dg/cpp1z/constexpr-if32.C gcc/testsuite/g++.dg/cpp1z/constexpr-if32.C
new file mode 100644
index 000..13a6039fce6
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1z/constexpr-if32.C
@@ -0,0 +1,16 @@
+// PR c++/81676 - bogus -Wunused warnings in constexpr if.
+// { d

C++ PATCH for c++/91428 - warn about std::is_constant_evaluated in if constexpr

2019-08-27 Thread Marek Polacek
As discussed in 91428 and in
,

  if constexpr (std::is_constant_evaluated ())
// ...
  else
// ...

always evaluates the true branch.  Someone in the SO post said "But hopefully
compilers will just diagnose that case" so I'm adding a warning.

I didn't want to invent a completely new warning so I'm tagging along
-Wtautological-compare.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-08-27  Marek Polacek  

PR c++/91428 - warn about std::is_constant_evaluated in if constexpr.
* cp-tree.h (decl_in_std_namespace_p): Declare.
* semantics.c (is_std_constant_evaluated_p): New.
(finish_if_stmt_cond): Warn about "std::is_constant_evaluated ()" in
an if-constexpr.
* typeck.c (decl_in_std_namespace_p): No longer static.

* g++.dg/cpp2a/is-constant-evaluated9.C: New test.

diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index 42f180d1dd3..225dbb67c63 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -7496,6 +7496,7 @@ extern tree finish_left_unary_fold_expr  (tree, int);
 extern tree finish_right_unary_fold_expr (tree, int);
 extern tree finish_binary_fold_expr  (tree, tree, int);
 extern bool treat_lvalue_as_rvalue_p(tree, bool);
+extern bool decl_in_std_namespace_p (tree);
 
 /* in typeck2.c */
 extern void require_complete_eh_spec_types (tree, tree);
diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index 1f7745933f9..8603e57e7f7 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -723,6 +723,28 @@ begin_if_stmt (void)
   return r;
 }
 
+/* Returns true if FN, a CALL_EXPR, is a call to
+   std::is_constant_evaluated or __builtin_is_constant_evaluated.  */
+
+static bool
+is_std_constant_evaluated_p (tree fn)
+{
+  /* std::is_constant_evaluated takes no arguments.  */
+  if (call_expr_nargs (fn) != 0)
+return false;
+
+  tree fndecl = cp_get_callee_fndecl_nofold (fn);
+  if (fndecl_built_in_p (fndecl, CP_BUILT_IN_IS_CONSTANT_EVALUATED,
+BUILT_IN_FRONTEND))
+return true;
+
+  if (!decl_in_std_namespace_p (fndecl))
+return false;
+
+  tree name = DECL_NAME (fndecl);
+  return name && id_equal (name, "is_constant_evaluated");
+}
+
 /* Process the COND of an if-statement, which may be given by
IF_STMT.  */
 
@@ -738,6 +760,20 @@ finish_if_stmt_cond (tree cond, tree if_stmt)
 converted to bool.  */
   && TYPE_MAIN_VARIANT (TREE_TYPE (cond)) == boolean_type_node)
 {
+  /* if constexpr (std::is_constant_evaluated()) is always true,
+so give the user a clue.  */
+  if (warn_tautological_compare)
+   {
+ tree t = cond;
+ if (TREE_CODE (t) == CLEANUP_POINT_EXPR)
+   t = TREE_OPERAND (t, 0);
+ if (TREE_CODE (t) == CALL_EXPR
+ && is_std_constant_evaluated_p (t))
+   warning_at (EXPR_LOCATION (cond), OPT_Wtautological_compare,
+   "%qs always evaluates to true in %",
+   "std::is_constant_evaluated");
+   }
+
   cond = instantiate_non_dependent_expr (cond);
   cond = cxx_constant_value (cond, NULL_TREE);
 }
diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index e2a4f285a72..c09bb309142 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -9328,7 +9328,7 @@ maybe_warn_about_returning_address_of_local (tree retval)
 
 /* Returns true if DECL is in the std namespace.  */
 
-static bool
+bool
 decl_in_std_namespace_p (tree decl)
 {
   return (decl != NULL_TREE
diff --git gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated9.C 
gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated9.C
new file mode 100644
index 000..37833698992
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated9.C
@@ -0,0 +1,49 @@
+// PR c++/91428 - warn about std::is_constant_evaluated in if constexpr.
+// { dg-do compile { target c++2a } }
+// { dg-options "-Wtautological-compare" }
+
+namespace std {
+  constexpr inline bool
+  is_constant_evaluated () noexcept
+  {
+return __builtin_is_constant_evaluated (); 
+  }
+}
+
+constexpr int
+foo(int i)
+{
+  if constexpr (std::is_constant_evaluated ()) // { dg-warning 
".std::is_constant_evaluated. always evaluates to true in .if constexpr." }
+return 42;
+  else
+return i;
+}
+
+constexpr int
+foo2(int i)
+{
+  if constexpr (__builtin_is_constant_evaluated ()) // { dg-warning 
".std::is_constant_evaluated. always evaluates to true in .if constexpr." }
+return 42;
+  else
+return i;
+}
+
+constexpr int
+foo3(int i)
+{
+  // I is not a constant expression but we short-circuit it.
+  if constexpr (__builtin_is_constant_evaluated () || i)
+return 42;
+  else
+return i;
+}
+
+constexpr int
+foo4(int i)
+{
+  const int j = 0;
+  if constexpr (j && __builtin_is_constant_evaluated ())
+return 42;
+  else
+return i;
+}


[PATCH] PR fortran/91565 -- Extra checks on ORDER

2019-08-27 Thread Steve Kargl
The attached patch implements additional checks on the
ORDER dummy argument for the RESHAPE intrinsic function.
Built and regression tested on x86_64-*-freebsd.  OK to
commit?

2019-08-27  Steven G. Kargl  

PR fortran/91565
* simplify.c (gfc_simplify_reshape): Add additional checks of the
ORDER dummy argument.

2019-08-27  Steven G. Kargl  

PR fortran/91565
* gfortran.dg/pr91565.f90: New test.
-- 
Steve
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c	(revision 274961)
+++ gcc/fortran/simplify.c	(working copy)
@@ -6495,7 +6503,14 @@ gfc_simplify_real (gfc_expr *e, gfc_expr *k)
   if (e->expr_type != EXPR_CONSTANT)
 return NULL;
 
+  /* For explicit conversion, turn off -Wconversion and -Wconversion-extra
+ warning.  */
+  tmp1 = warn_conversion;
+  tmp2 = warn_conversion_extra;
+  warn_conversion = warn_conversion_extra = 0;
   result = gfc_convert_constant (e, BT_REAL, kind);
+  warn_conversion = tmp1;
+  warn_conversion_extra = tmp2;
   if (result == &gfc_bad_expr)
 return &gfc_bad_expr;
 
@@ -6668,6 +6683,9 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shap
   mpz_init (index);
   rank = 0;
 
+  for (i = 0; i < GFC_MAX_DIMENSIONS; i++)
+x[i] = 0;
+
   for (;;)
 {
   e = gfc_constructor_lookup_expr (shape_exp->value.constructor, rank);
@@ -6692,9 +6710,29 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shap
 }
   else
 {
-  for (i = 0; i < rank; i++)
-	x[i] = 0;
+  mpz_t size;
+  int order_size, shape_size;
 
+  if (order_exp->rank != shape_exp->rank)
+	{
+	  gfc_error ("Shapes of ORDER at %L and SHAPE at %L are different",
+		 &order_exp->where, &shape_exp->where);
+	  return &gfc_bad_expr;
+	}
+
+  gfc_array_size (shape_exp, &size);
+  shape_size = mpz_get_ui (size);
+  mpz_clear (size);
+  gfc_array_size (order_exp, &size);
+  order_size = mpz_get_ui (size);
+  mpz_clear (size);
+  if (order_size != shape_size)
+	{
+	  gfc_error ("Sizes of ORDER at %L and SHAPE at %L are different",
+		 &order_exp->where, &shape_exp->where);
+	  return &gfc_bad_expr;
+	}
+
   for (i = 0; i < rank; i++)
 	{
 	  e = gfc_constructor_lookup_expr (order_exp->value.constructor, i);
@@ -6704,7 +6742,12 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shap
 
 	  gcc_assert (order[i] >= 1 && order[i] <= rank);
 	  order[i]--;
-	  gcc_assert (x[order[i]] == 0);
+	  if (x[order[i]] != 0)
+	{
+	  gfc_error ("ORDER at %L is not a permutation of the size of "
+			 "SHAPE at %L", &order_exp->where, &shape_exp->where);
+	  return &gfc_bad_expr;
+	}
 	  x[order[i]] = 1;
 	}
 }
Index: gcc/testsuite/gfortran.dg/pr91565.f90
===
--- gcc/testsuite/gfortran.dg/pr91565.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91565.f90	(working copy)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! PR fortran/91565
+! Contributed by Gerhard Steinmetz
+program p
+   integer, parameter :: a(2) = [2,2]  ! { dg-error "\(1\)" }
+   print *, reshape([1,2,3,4,5,6], [2,3], order=a) ! { dg-error "not a permutation" }
+end
+
+subroutine foo
+   integer, parameter :: a(1) = 1  ! { dg-error "\(1\)" }
+   print *, reshape([1,2,3,4,5,6], [2,3], order=a) ! { dg-error "are different" }
+end
+
+subroutine bar
+   integer, parameter :: a(1,2) = 1! { dg-error "\(1\)" }
+   print *, reshape([1,2,3,4,5,6], [2,3], order=a) ! { dg-error "are different" }
+end


[PATCH] correct an ILP32/LP64 bug in sprintf warning (PR 91567)

2019-08-27 Thread Martin Sebor

The recent sprintf+strlen integration doesn't handle unbounded
string lengths entirely correctly for ILP32 targets and causes
-Wformat-overflow false positives in some common cases, including
during GCC bootstrap targeting such systems.  The attached patch
fixes that mistake.  (I think this code could be cleaned up and
simplified some more but in the interest of unblocking the ILP32
bootstrap and Glibc builds I haven't taken the time to do that.)
The patch also adjusts down the maximum strlen result set by EVRP
to PTRDIFF_MAX - 2, to match what the strlen pass does.

The strlen maximum would ideally be computed in terms of
max_object_size() (for which there would ideally be a --param
setting), and checked the same way to avoid off-by-one mistakes
between subsystems and their clients.  I have not made this change
here but added a FIXME comment mentioning it.  I plan to add such
a parameter and use it in max_object_size() in a future change.

Testing with an ILP32 compiler also ran into the known limitation
of the strlen pass being unable to determine the length of array
members of local aggregates (PR 83543) initialized using
the braced-list syntax.  gcc.dg/tree-ssa/builtin-snprintf-6.c
fails a few cases as a result.  I've xfailed the assertions
for targets other than x86_64 where it isn't an issue.

Martin
PR tree-optimization/91567 - Spurious -Wformat-overflow warnings building glibc (32-bit only)

gcc/ChangeLog:

	PR tree-optimization/91567
	* gimple-ssa-sprintf.c (get_string_length): Handle more forms of lengths
	of unknown strings.
	* vr-values.c (vr_values::extract_range_basic): Set strlen upper bound
	to PTRDIFF_MAX - 2.

gcc/testsuite/ChangeLog:

	PR tree-optimization/91567
	* gcc.dg/tree-ssa/builtin-snprintf-6.c: Xfail a subset of assertions
	on targets other than x86_64 to work around PR 83543.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: New test.

Index: gcc/gimple-ssa-sprintf.c
===
--- gcc/gimple-ssa-sprintf.c	(revision 274960)
+++ gcc/gimple-ssa-sprintf.c	(working copy)
@@ -1994,11 +1994,22 @@ get_string_length (tree str, unsigned eltsize, con
  or it's SIZE_MAX otherwise.  */
 
   /* Return the default result when nothing is known about the string.  */
-  if (lendata.maxbound
-  && integer_all_onesp (lendata.maxbound)
-  && integer_all_onesp (lendata.maxlen))
-return fmtresult ();
+  if (lendata.maxbound)
+{
+  if (integer_all_onesp (lendata.maxbound)
+  	  && integer_all_onesp (lendata.maxlen))
+  	return fmtresult ();
 
+  if (!tree_fits_uhwi_p (lendata.maxbound)
+	  || !tree_fits_uhwi_p (lendata.maxlen))
+  	return fmtresult ();
+
+  unsigned HOST_WIDE_INT lenmax = tree_to_uhwi (max_object_size ()) - 2;
+  if (lenmax <= tree_to_uhwi (lendata.maxbound)
+	  && lenmax <= tree_to_uhwi (lendata.maxlen))
+	return fmtresult ();
+}
+
   HOST_WIDE_INT min
 = (tree_fits_uhwi_p (lendata.minlen)
? tree_to_uhwi (lendata.minlen)
Index: gcc/vr-values.c
===
--- gcc/vr-values.c	(revision 274960)
+++ gcc/vr-values.c	(working copy)
@@ -1319,7 +1319,12 @@ vr_values::extract_range_basic (value_range *vr, g
 		tree max = vrp_val_max (ptrdiff_type_node);
 		wide_int wmax = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE (max)));
 		tree range_min = build_zero_cst (type);
-		tree range_max = wide_int_to_tree (type, wmax - 1);
+		/* To account for the terminating NUL, the maximum length
+		   is one less than the maximum array size, which in turn
+		   is one  less than PTRDIFF_MAX (or SIZE_MAX where it's
+		   smaller than the former type).
+		   FIXME: Use max_object_size() - 1 here.  */
+		tree range_max = wide_int_to_tree (type, wmax - 2);
 		vr->set (VR_RANGE, range_min, range_max);
 		return;
 	  }
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-6.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-6.c	(revision 274960)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-6.c	(working copy)
@@ -65,6 +65,10 @@ void test_assign_init_list (void)
   T (5, ARGS ({ 1, 2, 3, 4, 5, 6, 0 }), "s=%.*s", 3, &a[2]);
 }
 
+#if __x86_64__
+
+/* Enabled only on x86_64 to work around PR 83543.  */
+
 #undef T
 #define T(expect, init, fmt, ...)			\
   do {			\
@@ -87,7 +91,10 @@ void test_assign_aggregate (void)
   T (5, "123456", "s=%.*s", 3, &s.a[2]);
 }
 
+/* { dg-final { scan-tree-dump-times "Function test_assign_aggregate" 1 "optimized" { xfail { ! x86_64-*-* } } } } */
 
+#endif   /* x86_64 */
+
 #undef T
 #define T(expect, init, fmt, ...)			\
   do {			\
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c	(working copy)
@@ -0,0 +1,58 @@
+/

Re: [PATCH] builtin fadd variants implementation

2019-08-27 Thread Joseph Myers
On Mon, 26 Aug 2019, Tejas Joshi wrote:

> Hello.
> I have made changes in the patch according to the above corrections.
> However, I didn't understand how these following testcases are
> supposed to handle. Will you please elaborate some more?
> 
> > (E.g. fadd (0x1.01p0, FLT_MIN), as an example from the glibc
> > tests: cases where an intermediate rounding produces a result half way
> > between two values of the narrower type, but the exact value is such that
> > the result of fadd should end up with last bit odd whereas double rounding
> > would result in last bit even in such half-way cases.)

The point of this is to demonstrate that fadd (x, y) is different from 
(float) (x + y), by testing with inputs for which the two evaluate to 
different values.

> > Then you should have some tests of what does *not* get optimized with
> > given compiler options if possible.  (Such a test might e.g. define a
> > static fadd function locally that checks it gets called as expected, or
> > else check the exceptions / errno if you rely on a suitable libm being
> > available.)

These would include:

* A test where the result is within range but inexact; say fadd (1, 
DBL_MIN).  With -ftrapping-math -fno-rounding-math, or -frounding-math 
-fno-trapping-math, or -frounding-math -ftrapping-math, this should not be 
folded; that is, it should be compiled to call a fadd function (which you 
might define in the test as a static function that sets a variable to 
indicate that it was called, so the test can verify at runtime that the 
call did not get folded).

* But the same inputs, with -fno-trapping-math -fno-rounding-math 
-fmath-errno, *should* get folded (so test the same inputs with those 
options with a link_error test like those for roundeven).

* Then similarly test overflow / underflow cases (e.g. fadd (DBL_MAX, 
DBL_MAX) or fadd (DBL_MIN, DBL_MIN)) with -fno-trapping-math 
-fno-rounding-math -fmath-errno (make sure they don't get folded), and 
with -fno-trapping-math -fno-rounding-math -fno-math-errno (make sure that 
in that case they do get folded, so link_error tests).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] correct an ILP32/LP64 bug in sprintf warning (PR 91567)

2019-08-27 Thread Jeff Law
On 8/27/19 4:34 PM, Martin Sebor wrote:
> The recent sprintf+strlen integration doesn't handle unbounded
> string lengths entirely correctly for ILP32 targets and causes
> -Wformat-overflow false positives in some common cases, including
> during GCC bootstrap targeting such systems  The attached patch
> fixes that mistake.  (I think this code could be cleaned up and
> simplified some more but in the interest of unblocking the ILP32
> bootstrap and Glibc builds I haven't taken the time to do that.)
> The patch also adjusts down the maximum strlen result set by EVRP
> to PTRDIFF_MAX - 2, to match what the strlen pass does.
> 
> The strlen maximum would ideally be computed in terms of
> max_object_size() (for which there would ideally be a --param
> setting), and checked the same way to avoid off-by-one mistakes
> between subsystems and their clients.  I have not made this change
> here but added a FIXME comment mentioning it.  I plan to add such
> a parameter and use it in max_object_size() in a future change.
> 
> Testing with an ILP32 compiler also ran into the known limitation
> of the strlen pass being unable to determine the length of array
> members of local aggregates (PR 83543) initialized using
> the braced-list syntax.  gcc.dg/tree-ssa/builtin-snprintf-6.c
> fails a few cases as a result.I've xfailed the assertions
> for targets other than x86_64 where it isn't an issue.
> 
> Martin
> 
> gcc-91567.diff
> 
> PR tree-optimization/91567 - Spurious -Wformat-overflow warnings building 
> glibc (32-bit only)
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/91567
>   * gimple-ssa-sprintf.c (get_string_length): Handle more forms of lengths
>   of unknown strings.
>   * vr-values.c (vr_values::extract_range_basic): Set strlen upper bound
>   to PTRDIFF_MAX - 2.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/91567
>   * gcc.dg/tree-ssa/builtin-snprintf-6.c: Xfail a subset of assertions
>   on targets other than x86_64 to work around PR 83543.
>   * gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: New test.
OK.  I'm not sure this will fix the glibc issue (it looked like it was
unrelated to ILP32 to me).

jeff


Fix ICE from sprintf/strlen integration work

2019-08-27 Thread Jeff Law

This showed up as an ICE when building the kernel on ppc64le after the
sprintf/strlen integration.

In simplest terms we can't size the ssa_ver_to_stridx array until after
we have initialized the loop optimizer and SCEV as they create new
SSA_NAMEs.  Usually this isn't a problem, but it can be if we need more
names than are currently available on the freelist of SSA_NAMEs to recycle.

Bootstrapped and regression tested on ppc64le.  Also verified the kernel
builds again.

Installing on the trunk,

jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 39459d087b2..1d9d6cf0ff2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2019-08-27  Jeff Law  
+
+   * tree-ssa-strlen.c (printf_strlen_execute): Initialize
+   the loop optimizer and SCEV before sizing ssa_ver_to_stridx.
+
 2019-08-27  Uroš Bizjak  
 
PR target/91528
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 37133de87f5..2560faf8526 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2019-08-27  Jeff Law  
+
+   * gcc.c-torture/compile/20190827-1.c: New test.
+
 2019-08-27  Harald Anlauf  
 
PR fortran/91496
diff --git a/gcc/testsuite/gcc.c-torture/compile/20190827-1.c 
b/gcc/testsuite/gcc.c-torture/compile/20190827-1.c
new file mode 100644
index 000..f0956179b1d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/20190827-1.c
@@ -0,0 +1,104 @@
+typedef unsigned char __u8;
+typedef __u8 u8;
+typedef u8 u_int8_t;
+typedef unsigned int gfp_t;
+
+struct list_head
+{
+  struct list_head *next, *prev;
+};
+extern int strcmp (const char *, const char *);
+enum
+{
+  NFPROTO_UNSPEC = 0,
+  NFPROTO_INET = 1,
+  NFPROTO_IPV4 = 2,
+  NFPROTO_ARP = 3,
+  NFPROTO_NETDEV = 5,
+  NFPROTO_BRIDGE = 7,
+  NFPROTO_IPV6 = 10,
+  NFPROTO_DECNET = 12,
+  NFPROTO_NUMPROTO,
+};
+
+struct xt_target
+{
+  struct list_head list;
+  const char name[29];
+  u_int8_t revision;
+};
+
+struct xt_af
+{
+  struct list_head target;
+};
+
+static struct xt_af *xt;
+
+struct xt_af * kcalloc (int, int, int);
+
+static int
+target_revfn (u8 af, const char *name, u8 revision, int *bestp)
+{
+  const struct xt_target *t;
+  int have_rev = 0;
+
+  for (t = (
+{
+void *__mptr = (void *)((&xt[af].target)->next);
+((typeof (*t) *) (__mptr -
+  __builtin_offsetof (typeof (*t), list)));}
+   ); &t->list != (&xt[af].target); t = (
+ {
+ void *__mptr =
+ (void *)((t)->list.next);
+ ((typeof (*(t)) *) (__mptr -
+ 
__builtin_offsetof
+ (typeof
+  (*(t)),
+  list)));}
+   ))
+{
+  if (strcmp (t->name, name) == 0)
+   {
+ if (t->revision > *bestp)
+   *bestp = t->revision;
+ if (t->revision == revision)
+   have_rev = 1;
+   }
+}
+
+  if (af != NFPROTO_UNSPEC && !have_rev)
+return target_revfn (NFPROTO_UNSPEC, name, revision, bestp);
+
+  return have_rev;
+}
+
+int
+xt_find_revision (u8 af, const char *name, u8 revision, int target, int *err)
+{
+  int have_rev, best = -1;
+
+  have_rev = target_revfn (af, name, revision, &best);
+
+
+  if (best == -1)
+{
+  *err = -2;
+  return 0;
+}
+
+}
+
+
+static int __attribute__ ((__section__ (".init.text")))
+  __attribute__ ((__cold__)) xt_init (void)
+{
+  xt =
+kcalloc (NFPROTO_NUMPROTO, sizeof (struct xt_af),
+(((gfp_t) (0x400u | 0x800u)) | ((gfp_t) 0x40u) |
+ ((gfp_t) 0x80u)));
+}
+
+int init_module (void) __attribute__ ((__copy__ (xt_init)))
+  __attribute__ ((alias ("xt_init")));;
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 5c5b83833c8..d38352a0c4c 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -4850,13 +4850,6 @@ printf_strlen_execute (function *fun, bool warn_only)
 {
   strlen_optimize = !warn_only;
 
-  gcc_assert (!strlen_to_stridx);
-  if (warn_stringop_overflow || warn_stringop_truncation)
-strlen_to_stridx = new hash_map ();
-
-  ssa_ver_to_stridx.safe_grow_cleared (num_ssa_names);
-  max_stridx = 1;
-
   calculate_dominance_info (CDI_DOMINATORS);
 
   bool use_scev = optimize > 0 && flag_printf_return_value;
@@ -4866,6 +4859,15 @@ printf_strlen_execute (function *fun, bool warn_only)
   scev_initialize ();
 }
 
+  gcc_assert (!strlen_to_stridx);
+  if (warn_stringop_overflow || warn_stringop_truncation)
+strlen_to_stridx = new hash_map ();
+
+  /* This has to happen after

[PATCH V4 03/11] testsuite: annotate c-torture/compile tests with dg-require-stack-size

2019-08-27 Thread Jose E. Marchesi
This patch annotates tests that make use of a significant amount of
stack space.  Embedded and other restricted targets may have problems
compiling and running these tests.  Note that the annotations are in
many cases not exact.

testsuite/ChangeLog:

* gcc.c-torture/compile/2609-1.c: Annotate with
dg-require-stack-size.
* gcc.c-torture/compile/2804-1.c: Likewise.
* gcc.c-torture/compile/20020304-1.c: Likewise.
* gcc.c-torture/compile/20020604-1.c: Likewise.
* gcc.c-torture/compile/20021015-1.c: Likewise.
* gcc.c-torture/compile/20050303-1.c: Likewise.
* gcc.c-torture/compile/20060421-1.c: Likewise.
* gcc.c-torture/compile/20071207-1.c: Likewise.
* gcc.c-torture/compile/20080903-1.c: Likewise.
* gcc.c-torture/compile/20121027-1.c: Likewise.
* gcc.c-torture/compile/20151204.c: Likewise.
* gcc.c-torture/compile/920501-12.c: Likewise.
* gcc.c-torture/compile/920501-4.c: Likewise.
* gcc.c-torture/compile/920723-1.c: Likewise.
* gcc.c-torture/compile/921202-1.c: Likewise.
* gcc.c-torture/compile/931003-1.c: Likewise.
* gcc.c-torture/compile/931004-1.c: Likewise.
* gcc.c-torture/compile/950719-1.c: Likewise.
* gcc.c-torture/compile/951222-1.c: Likewise.
* gcc.c-torture/compile/990517-1.c: Likewise.
* gcc.c-torture/compile/bcopy.c: Likewise.
* gcc.c-torture/compile/pr23929.c: Likewise.
* gcc.c-torture/compile/pr25310.c: Likewise.
* gcc.c-torture/compile/pr34458.c: Likewise.
* gcc.c-torture/compile/pr39937.c: Likewise.
* gcc.c-torture/compile/pr41181.c: Likewise.
* gcc.c-torture/compile/pr41634.c: Likewise.
* gcc.c-torture/compile/pr43415.c: Likewise.
* gcc.c-torture/compile/pr43417.c: Likewise.
* gcc.c-torture/compile/pr44788.c: Likewise.
* gcc.c-torture/compile/sound.c: Likewise.
---
 gcc/testsuite/ChangeLog  | 35 
 gcc/testsuite/gcc.c-torture/compile/2609-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/2804-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20020304-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20020604-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20021015-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20050303-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20060421-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20071207-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20080903-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20121027-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20151204.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/920501-12.c  |  1 +
 gcc/testsuite/gcc.c-torture/compile/920501-4.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/920723-1.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/921202-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/931003-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/931004-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/950719-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/951222-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/990517-1.c   |  3 ++
 gcc/testsuite/gcc.c-torture/compile/bcopy.c  |  1 +
 gcc/testsuite/gcc.c-torture/compile/pr23929.c|  1 +
 gcc/testsuite/gcc.c-torture/compile/pr25310.c|  1 +
 gcc/testsuite/gcc.c-torture/compile/pr34458.c|  1 +
 gcc/testsuite/gcc.c-torture/compile/pr39937.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr41181.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr41634.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr43415.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr43417.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr44788.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/sound.c  |  1 +
 32 files changed, 84 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/2609-1.c 
b/gcc/testsuite/gcc.c-torture/compile/2609-1.c
index f03aa35a7ac..e41701cc6d9 100644
--- a/gcc/testsuite/gcc.c-torture/compile/2609-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/2609-1.c
@@ -1,3 +1,5 @@
+/* { dg-require-stack-size "1024" } */
+
 int main ()
 {
   char temp[1024] = "tempfile";
diff --git a/gcc/testsuite/gcc.c-torture/compile/2804-1.c 
b/gcc/testsuite/gcc.c-torture/compile/2804-1.c
index 35464c212d2..550669b53a3 100644
--- a/gcc/testsuite/gcc.c-torture/compile/2804-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/2804-1.c
@@ -6,6 +6,7 @@
 /* { dg-skip-if "Not enough 64-bit registers" { pdp11-*-* 

[PATCH V4 01/11] Update config.sub and config.guess.

2019-08-27 Thread Jose E. Marchesi
* config.sub: Import upstream version 2019-06-30.
* config.guess: Import upstream version 2019-07-24.
---
 ChangeLog|   5 ++
 config.guess | 264 +++
 config.sub   |  50 +--
 3 files changed, 240 insertions(+), 79 deletions(-)

diff --git a/config.guess b/config.guess
index 8e2a58b864f..97ad0733304 100755
--- a/config.guess
+++ b/config.guess
@@ -2,7 +2,7 @@
 # Attempt to guess a canonical system name.
 #   Copyright 1992-2019 Free Software Foundation, Inc.
 
-timestamp='2019-01-03'
+timestamp='2019-07-24'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
@@ -262,6 +262,9 @@ case 
"$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in
 *:SolidBSD:*:*)
echo "$UNAME_MACHINE"-unknown-solidbsd"$UNAME_RELEASE"
exit ;;
+*:OS108:*:*)
+   echo "$UNAME_MACHINE"-unknown-os108_"$UNAME_RELEASE"
+   exit ;;
 macppc:MirBSD:*:*)
echo powerpc-unknown-mirbsd"$UNAME_RELEASE"
exit ;;
@@ -275,8 +278,8 @@ case 
"$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in
echo "$UNAME_MACHINE"-unknown-redox
exit ;;
 mips:OSF1:*.*)
-echo mips-dec-osf1
-exit ;;
+   echo mips-dec-osf1
+   exit ;;
 alpha:OSF1:*:*)
case $UNAME_RELEASE in
*4.0)
@@ -385,20 +388,7 @@ case 
"$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in
echo sparc-hal-solaris2"`echo "$UNAME_RELEASE"|sed -e 's/[^.]*//'`"
exit ;;
 sun4*:SunOS:5.*:* | tadpole*:SunOS:5.*:*)
-   set_cc_for_build
-   SUN_ARCH=sparc
-   # If there is a compiler, see if it is configured for 64-bit objects.
-   # Note that the Sun cc does not turn __LP64__ into 1 like gcc does.
-   # This test works for both compilers.
-   if [ "$CC_FOR_BUILD" != no_compiler_found ]; then
-   if (echo '#ifdef __sparcv9'; echo IS_64BIT_ARCH; echo '#endif') | \
-   (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \
-   grep IS_64BIT_ARCH >/dev/null
-   then
-   SUN_ARCH=sparcv9
-   fi
-   fi
-   echo "$SUN_ARCH"-sun-solaris2"`echo "$UNAME_RELEASE"|sed -e 
's/[^.]*//'`"
+   echo sparc-sun-solaris2"`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'`"
exit ;;
 i86pc:AuroraUX:5.*:* | i86xen:AuroraUX:5.*:*)
echo i386-pc-auroraux"$UNAME_RELEASE"
@@ -998,22 +988,50 @@ EOF
exit ;;
 mips:Linux:*:* | mips64:Linux:*:*)
set_cc_for_build
+   IS_GLIBC=0
+   test x"${LIBC}" = xgnu && IS_GLIBC=1
sed 's/^//' << EOF > "$dummy.c"
#undef CPU
-   #undef ${UNAME_MACHINE}
-   #undef ${UNAME_MACHINE}el
+   #undef mips
+   #undef mipsel
+   #undef mips64
+   #undef mips64el
+   #if ${IS_GLIBC} && defined(_ABI64)
+   LIBCABI=gnuabi64
+   #else
+   #if ${IS_GLIBC} && defined(_ABIN32)
+   LIBCABI=gnuabin32
+   #else
+   LIBCABI=${LIBC}
+   #endif
+   #endif
+
+   #if ${IS_GLIBC} && defined(__mips64) && defined(__mips_isa_rev) && 
__mips_isa_rev>=6
+   CPU=mipsisa64r6
+   #else
+   #if ${IS_GLIBC} && !defined(__mips64) && defined(__mips_isa_rev) && 
__mips_isa_rev>=6
+   CPU=mipsisa32r6
+   #else
+   #if defined(__mips64)
+   CPU=mips64
+   #else
+   CPU=mips
+   #endif
+   #endif
+   #endif
+
#if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL) || 
defined(MIPSEL)
-   CPU=${UNAME_MACHINE}el
+   MIPS_ENDIAN=el
#else
#if defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || 
defined(MIPSEB)
-   CPU=${UNAME_MACHINE}
+   MIPS_ENDIAN=
#else
-   CPU=
+   MIPS_ENDIAN=
#endif
#endif
 EOF
-   eval "`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^CPU'`"
-   test "x$CPU" != x && { echo "$CPU-unknown-linux-$LIBC"; exit; }
+   eval "`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep 
'^CPU\|^MIPS_ENDIAN\|^LIBCABI'`"
+   test "x$CPU" != x && { echo 
"$CPU${MIPS_ENDIAN}-unknown-linux-$LIBCABI"; exit; }
;;
 mips64el:Linux:*:*)
echo "$UNAME_MACHINE"-unknown-linux-"$LIBC"
@@ -1126,7 +1144,7 @@ EOF
*Pentium)UNAME_MACHINE=i586 ;;
*Pent*|*Celeron) UNAME_MACHINE=i686 ;;
esac
-   echo 
"$UNAME_MACHINE-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}{$UNAME_VERSION}"
+   echo 
"$UNAME_MACHINE-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION}"
exit ;;
 i*86:*:3.2:*)
if test -f /usr/options/cb.name; then
@@ -1310,38 +1328,39 @@ EOF
echo "$UNAME_MACHINE"-apple-rhapsody"$UNAME_RELEASE"
exit ;;
 *:Darwin:*:*)
-   UNAME_PROCESSOR=`uname -p` || UNAME_PROCESSOR=unknown
-   set_cc_for_build
-   if test "$UNAME_PROCESSOR" = unknown ; then
-   

[PATCH V4 06/11] bpf: new libgcc port

2019-08-27 Thread Jose E. Marchesi
This patch adds an eBPF port to libgcc.

As of today, compiled eBPF programs do not follow a single-entry-point
scheme.  Instead, a BPF "executable" is a relocatable ELF object
file containing multiple entry points in certain named sections.

Also, the BPF loaders in the kernel do not execute .init/.fini
constructors/destructors.  Therefore, this patch provides empty crti.S
and crtn.S files.

libgcc/ChangeLog:

* config.host: Set cpu_type for bpf-*-* targets.
* config/bpf/t-bpf: New file.
* config/bpf/crti.S: Likewise.
* config/bpf/crtn.S: Likewise.
---
 libgcc/ChangeLog |  7 +++
 libgcc/config.host   |  7 +++
 libgcc/config/bpf/crti.S |  0
 libgcc/config/bpf/crtn.S |  0
 libgcc/config/bpf/t-bpf  | 23 +++
 5 files changed, 37 insertions(+)
 create mode 100644 libgcc/config/bpf/crti.S
 create mode 100644 libgcc/config/bpf/crtn.S
 create mode 100644 libgcc/config/bpf/t-bpf

diff --git a/libgcc/config.host b/libgcc/config.host
index 503ebb6be20..2e9fbc35482 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -107,6 +107,9 @@ avr-*-*)
 bfin*-*)
cpu_type=bfin
;;
+bpf-*-*)
+cpu_type=bpf
+;;
 cr16-*-*)
;;
 crisv32-*-*)
@@ -526,6 +529,10 @@ bfin*-*)
tmake_file="$tmake_file bfin/t-bfin t-fdpbit"
extra_parts="crtbegin.o crtend.o crti.o crtn.o"
 ;;
+bpf-*-*)
+tmake_file="$tmake_file ${cpu_type}/t-${cpu_type}"
+extra_parts="crti.o crtn.o"
+   ;;
 cr16-*-elf)
tmake_file="${tmake_file} cr16/t-cr16 cr16/t-crtlibid t-fdpbit"
extra_parts="$extra_parts crti.o crtn.o crtlibid.o"
diff --git a/libgcc/config/bpf/crti.S b/libgcc/config/bpf/crti.S
new file mode 100644
index 000..e69de29bb2d
diff --git a/libgcc/config/bpf/crtn.S b/libgcc/config/bpf/crtn.S
new file mode 100644
index 000..e69de29bb2d
diff --git a/libgcc/config/bpf/t-bpf b/libgcc/config/bpf/t-bpf
new file mode 100644
index 000..88129a78f61
--- /dev/null
+++ b/libgcc/config/bpf/t-bpf
@@ -0,0 +1,23 @@
+LIB2ADDEH = 
+
+crti.o: $(srcdir)/config/bpf/crti.S
+   $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $<
+
+crtn.o: $(srcdir)/config/bpf/crtn.S
+   $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $<
+
+# Some of the functions defined in libgcc2 exceed the eBPF stack
+# limit, or other restrictions imposed by this peculiar target.
+# Therefore we have to exclude them here.
+#
+# Patterns in bpf.md must guarantee that no calls to the excluded
+# functions are ever generated, and compiler tests should make sure
+# this holds.
+#
+# Note that the modes in the function names below are misleading: di
+# means TImode.
+LIB2FUNCS_EXCLUDE = _mulvdi3 _divdi3 _moddi3 _divmoddi4 _udivdi3 _umoddi3 \
+_udivmoddi4
+
+# Prevent building "advanced" stuff (for example, gcov support).
+INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
-- 
2.11.0



[PATCH V4 00/11] eBPF support for GCC

2019-08-27 Thread Jose E. Marchesi
[Differences from V3:
. Formatting/style fixes:
  + Remove redundant braces.
  + Remove unneeded ATTRIBUTE_UNUSED.
  + Truncate too long lines.
  + Remove an odd line split.
  + Do not break after returns.
  + Use function_arg_info methods instead of auxiliary functions.
  + Fix indentation in cbranchdi4.
. Use `sorry' and sorry_at to report lack of support for valid
  constructs.
. Rename GR_REGS to GENERAL_REGS and exclude the pseudo arg_reg
  from the class.
. Fix opcode in "sub3" insns.
. The mul32 instruction zero-extends arguments and then performs
  a multiplication.  Use the right pattern for this in bpf.md and
  name it umulsidi3.
. Use a mov32 instruction for zero-extending 32-bit values to 64-bit.
. Remove the *extendsidi2 insn, as it is not needed.
. (now for real, ahem) Remove dubious code from the move expanders, by
  not accepting constant addresses as legit.  Rework handling of call
  instructions accordingly.
. Improve "too many arguments" error reporting.
  New test gcc.target/bpf/diag-funargs-3.c.
. xfail gcc.target/bpf/constant-calls.c.
. Rebased to today's master.]

Hi people!

This patch series introduces a port of GCC to eBPF, which is a virtual
machine that resides in the Linux kernel.  Initially intended for
user-level packet capture and filtering, eBPF is nowadays generalized
to serve as a general-purpose infrastructure also for non-networking
purposes.

The binutils support is already upstream.  See
https://sourceware.org/ml/binutils/2019-05/msg00306.html.

eBPF architecture and ABI
=========================
   
Documentation for eBPF can be found in the linux kernel source tree,
file Documentation/networking/filter.txt.  It covers the instructions
set, the way the interpreter works and the many restrictions imposed
by the kernel verifier.
   
As for the ABI, at this moment compiled eBPF doesn't have very well
established conventions.  The details of what is expected to be in an
ELF file containing eBPF are determined, in practice, by what the llvm
BPF backend generates and what is expected by the two existing
kernel loaders: bpf_load.c and libbpf.

We hope that the addition of this port to the GNU toolchain will help
to mature this domain.

Overview of the patch series
============================
   
The first few patches are preparatory:

. The first patch updates config.guess and config.sub from the
  'config' upstream project, in order to recognize bpf-*-* triplets.

. The second patch fixes an integrity check in opt-functions.awk.

. The third patch annotates many tests in the gcc.c-torture/compile
  testsuite with their requirements in terms of stack frame size,
  using the existing dg-require-stack-size machinery.

. The fourth patch introduces a new effective target flag called
  indirect_call, and annotates the tests in gcc.c-torture/compile
  accordingly.

The rest of the patches are BPF specific:

The fifth patch adds the new GCC port proper.  Machine description,
implementation of target hooks and macros, command-line options and
the like.

The sixth patch adds a libgcc port for eBPF.  At the moment, it is
minimal and it basically addresses the limitations imposed by the
target, by excluding a few functions in libgcc2 (all of them related
to TImodes) whose default implementations exceed the eBPF stack limit.

The seventh, eighth and ninth patches deal with testing the new
port. The gcc.target testsuite is extended with eBPF-specific tests,
covering the backend-specific built-in functions and diagnostics.  The
check-effective-target functions are made aware of eBPF targets. Many
tests in the gcc.c-torture/compile testsuite are annotated to be
skipped in bpf-*-* targets, since they violate some restriction
imposed by the hardware (such as surpassing the stack limit).  The
resulting testsuite doesn't have unexpected failures, and is currently
the principal way to check for regressions in the port.  Likewise,
many tests in the gcc.dg testsuite are annotated to be skipped in
bpf-*-* targets.

The tenth patch adds documentation updates to the GCC manual,
including information on the new command line options and compiler
built-ins.

Finally, the last patch adds myself as the maintainer of the BPF port.
I personally commit to evolve and maintain the port for as long as
necessary, and to find a suitable replacement in case I have to step
down for whatever reason.

Some notes on the port
======================

As a compilation target, eBPF is rather peculiar.  This is mainly due
to the quite hard restrictions imposed by the kernel verifier, and
also due to the security-driven design of the architecture itself.

To list a few examples:

. The stack is disjoint, and each stack frame corresponding to a
  function activation is isolated: it is not possible for a callee to
  access the stack frame of the caller, nor for a caller to access the
  stack frame of its callees.  The frame pointer register is
  read-only.

. Therefore it is not possible to pass arguments in the stack.

[PATCH V4 08/11] bpf: make target-supports.exp aware of eBPF

2019-08-27 Thread Jose E. Marchesi
This patch makes several of the effective target checks in
target-supports.exp aware of eBPF targets.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_trampolines): Adapt to eBPF.
(check_effective_target_stack_size): Likewise.
(dg-effective-target-value): Likewise.
(check_effective_target_indirect_jumps): Likewise.
(check_effective_target_nonlocal_goto): Likewise.
(check_effective_target_global_constructor): Likewise.
(check_effective_target_return_address): Likewise.
---
 gcc/testsuite/ChangeLog   |  9 +
 gcc/testsuite/lib/target-supports.exp | 18 +++---
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f457a46a02b..ce08a2f8421 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -526,7 +526,8 @@ proc check_effective_target_trampolines { } {
 || [istarget nvptx-*-*]
 || [istarget hppa2.0w-hp-hpux11.23]
 || [istarget hppa64-hp-hpux11.23]
-|| [istarget pru-*-*] } {
+|| [istarget pru-*-*]
+|| [istarget bpf-*-*] } {
return 0;
 }
 return 1
@@ -781,7 +782,7 @@ proc add_options_for_tls { flags } {
 # Return 1 if indirect jumps are supported, 0 otherwise.
 
 proc check_effective_target_indirect_jumps {} {
-if { [istarget nvptx-*-*] } {
+if { [istarget nvptx-*-*] || [istarget bpf-*-*] } {
return 0
 }
 return 1
@@ -790,7 +791,7 @@ proc check_effective_target_indirect_jumps {} {
 # Return 1 if nonlocal goto is supported, 0 otherwise.
 
 proc check_effective_target_nonlocal_goto {} {
-if { [istarget nvptx-*-*] } {
+if { [istarget nvptx-*-*] || [istarget bpf-*-*] } {
return 0
 }
 return 1
@@ -799,10 +800,9 @@ proc check_effective_target_nonlocal_goto {} {
 # Return 1 if global constructors are supported, 0 otherwise.
 
 proc check_effective_target_global_constructor {} {
-if { [istarget nvptx-*-*] } {
-   return 0
-}
-if { [istarget amdgcn-*-*] } {
+if { [istarget nvptx-*-*]
+|| [istarget amdgcn-*-*]
+|| [istarget bpf-*-*] } {
return 0
 }
 return 1
@@ -825,6 +825,10 @@ proc check_effective_target_return_address {} {
 if { [istarget nvptx-*-*] } {
return 0
 }
+# No notion of return address in eBPF.
+if { [istarget bpf-*-*] } {
+   return 0
+}
 # It could be supported on amdgcn, but isn't yet.
 if { [istarget amdgcn*-*-*] } {
return 0
-- 
2.11.0



[PATCH V4 04/11] testsuite: new require effective target indirect_calls

2019-08-27 Thread Jose E. Marchesi
This patch adds a new effective-target check to the testsuite
infrastructure: indirect_calls.  This new check tells
whether a target supports calls to non-constant call targets.

This patch also annotates the tests in the gcc.c-torture testsuite
that require support for indirect calls.

gcc/ChangeLog:

* doc/sourcebuild.texi (Effective-Target Keywords): Document
indirect_calls.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_indirect_calls):
New proc.
* gcc.c-torture/compile/20010102-1.c: Annotate with
dg-require-effective-target indirect_calls.
* gcc.c-torture/compile/20010107-1.c: Likewise.
* gcc.c-torture/compile/20011109-1.c: Likewise.
* gcc.c-torture/compile/20011218-1.c: Likewise.
* gcc.c-torture/compile/20011229-1.c: Likewise.
* gcc.c-torture/compile/20020129-1.c: Likewise.
* gcc.c-torture/compile/20020320-1.c: Likewise.
* gcc.c-torture/compile/20020706-1.c: Likewise.
* gcc.c-torture/compile/20020706-2.c: Likewise.
* gcc.c-torture/compile/20021205-1.c: Likewise.
* gcc.c-torture/compile/20030921-1.c: Likewise.
* gcc.c-torture/compile/20031023-1.c: Likewise.
* gcc.c-torture/compile/20031023-2.c: Likewise.
* gcc.c-torture/compile/20031023-3.c: Likewise.
* gcc.c-torture/compile/20031023-4.c: Likewise.
* gcc.c-torture/compile/20040614-1.c: Likewise.
* gcc.c-torture/compile/20040909-1.c: Likewise.
* gcc.c-torture/compile/20050122-1.c: Likewise.
* gcc.c-torture/compile/20050202-1.c: Likewise.
* gcc.c-torture/compile/20060208-1.c: Likewise.
* gcc.c-torture/compile/20081108-1.c: Likewise.
* gcc.c-torture/compile/20150327.c: Likewise.
* gcc.c-torture/compile/920428-2.c: Likewise.
* gcc.c-torture/compile/920928-5.c: Likewise.
* gcc.c-torture/compile/930117-1.c: Likewise.
* gcc.c-torture/compile/930607-1.c: Likewise.
* gcc.c-torture/compile/991213-2.c: Likewise.
* gcc.c-torture/compile/callind.c: Likewise.
* gcc.c-torture/compile/calls-void.c: Likewise.
* gcc.c-torture/compile/calls.c: Likewise.
* gcc.c-torture/compile/pr21840.c: Likewise.
* gcc.c-torture/compile/pr32139.c: Likewise.
* gcc.c-torture/compile/pr35607.c: Likewise.
* gcc.c-torture/compile/pr37433-1.c: Likewise.
* gcc.c-torture/compile/pr37433.c: Likewise.
* gcc.c-torture/compile/pr39941.c: Likewise.
* gcc.c-torture/compile/pr40080.c: Likewise.
* gcc.c-torture/compile/pr43635.c: Likewise.
* gcc.c-torture/compile/pr43791.c: Likewise.
* gcc.c-torture/compile/pr43845.c: Likewise.
* gcc.c-torture/compile/pr44043.c: Likewise.
* gcc.c-torture/compile/pr51694.c: Likewise.
* gcc.c-torture/compile/pr77754-2.c: Likewise.
* gcc.c-torture/compile/pr77754-3.c: Likewise.
* gcc.c-torture/compile/pr77754-4.c: Likewise.
* gcc.c-torture/compile/pr89663-2.c: Likewise.
* gcc.c-torture/compile/pta-1.c: Likewise.
* gcc.c-torture/compile/stack-check-1.c: Likewise.
* gcc.dg/Walloc-size-larger-than-18.c: Likewise.
---
 gcc/ChangeLog  |  5 ++
 gcc/doc/sourcebuild.texi   |  4 ++
 gcc/testsuite/ChangeLog| 55 ++
 gcc/testsuite/gcc.c-torture/compile/20010102-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20010107-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20011109-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20011218-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20011229-1.c   |  3 ++
 gcc/testsuite/gcc.c-torture/compile/20020129-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20020320-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20020706-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20020706-2.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20021205-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20030921-1.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/20031023-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20031023-2.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20031023-3.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20031023-4.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20040614-1.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/20040909-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20050122-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20050202-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20060208-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20081108-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20150327.c |  2 +
 gcc/testsuite/gcc.c-torture/compile/920428-2.c |  2 +
 gcc/testsuite/gcc.c-torture/compile/920928-5.c |  3 ++
 gcc/testsuite/gcc.c-torture/compile/930117-1.c |  2 +
 gcc/testsuite/gcc.c-torture/compile/930607

[PATCH V4 10/11] bpf: manual updates for eBPF

2019-08-27 Thread Jose E. Marchesi
gcc/ChangeLog:

* doc/invoke.texi (Option Summary): Cover eBPF.
(eBPF Options): New section.
* doc/extend.texi (BPF Built-in Functions): Likewise.
(BPF Kernel Helpers): Likewise.
---
 gcc/ChangeLog   |   7 +++
 gcc/doc/extend.texi | 171 
 gcc/doc/invoke.texi |  37 
 3 files changed, 215 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4aea4d31761..e821cafff1e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13604,6 +13604,8 @@ instructions, but allow the compiler to schedule those 
calls.
 * ARM ARMv8-M Security Extensions::
 * AVR Built-in Functions::
 * Blackfin Built-in Functions::
+* BPF Built-in Functions::
+* BPF Kernel Helpers::
 * FR-V Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
@@ -14601,6 +14603,175 @@ void __builtin_bfin_csync (void)
 void __builtin_bfin_ssync (void)
 @end smallexample
 
+@node BPF Built-in Functions
+@subsection BPF Built-in Functions
+
+The following built-in functions are available for eBPF targets.
+
+@deftypefn {Built-in Function} unsigned long long __builtin_bpf_load_byte 
(unsigned long long @var{offset})
+Load a byte from the @code{struct sk_buff} packet data pointed by the register 
@code{%r6} and return it.
+@end deftypefn
+
+@deftypefn {Built-in Function} unsigned long long __builtin_bpf_load_half 
(unsigned long long @var{offset})
+Load 16-bits from the @code{struct sk_buff} packet data pointed by the 
register @code{%r6} and return it.
+@end deftypefn
+
+@deftypefn {Built-in Function} unsigned long long __builtin_bpf_load_word 
(unsigned long long @var{offset})
+Load 32-bits from the @code{struct sk_buff} packet data pointed by the 
register @code{%r6} and return it.
+@end deftypefn
+
+@node BPF Kernel Helpers
+@subsection BPF Kernel Helpers
+
+These built-in functions are available for calling kernel helpers, and
+they are available depending on the kernel version selected as the
+CPU.
+
+Rather than using the built-ins directly, it is preferred for programs
+to include @file{bpf-helpers.h} and use the wrappers defined there.
+
+For a full description of what the helpers do, the arguments they
+take, and the returned value, see the
+@file{linux/include/uapi/linux/bpf.h} in a Linux source tree.
+
+@smallexample
+void *__builtin_bpf_helper_map_lookup_elem (void *map, void *key)
+int   __builtin_bpf_helper_map_update_elem (void *map, void *key,
+void *value,
+unsigned long long flags)
+int   __builtin_bpf_helper_map_delete_elem (void *map, const void *key)
+int   __builtin_bpf_helper_map_push_elem (void *map, const void *value,
+  unsigned long long flags)
+int   __builtin_bpf_helper_map_pop_elem (void *map, void *value)
+int   __builtin_bpf_helper_map_peek_elem (void *map, void *value)
+int __builtin_bpf_helper_clone_redirect (void *skb,
+ unsigned int ifindex,
+ unsigned long long flags)
+int __builtin_bpf_helper_skb_get_tunnel_key (void *ctx, void *key, int size, 
int flags)
+int __builtin_bpf_helper_skb_set_tunnel_key (void *ctx, void *key, int size, 
int flags)
+int __builtin_bpf_helper_skb_get_tunnel_opt (void *ctx, void *md, int size)
+int __builtin_bpf_helper_skb_set_tunnel_opt (void *ctx, void *md, int size)
+int __builtin_bpf_helper_skb_get_xfrm_state (void *ctx, int index, void *state,
+int size, int flags)
+static unsigned long long __builtin_bpf_helper_skb_cgroup_id (void *ctx)
+static unsigned long long __builtin_bpf_helper_skb_ancestor_cgroup_id
+ (void *ctx, int level)
+int __builtin_bpf_helper_skb_vlan_push (void *ctx, __be16 vlan_proto, __u16 
vlan_tci)
+int __builtin_bpf_helper_skb_vlan_pop (void *ctx)
+int __builtin_bpf_helper_skb_ecn_set_ce (void *ctx)
+
+int __builtin_bpf_helper_skb_load_bytes (void *ctx, int off, void *to, int len)
+int __builtin_bpf_helper_skb_load_bytes_relative (void *ctx, int off, void 
*to, int len, __u32 start_header)
+int __builtin_bpf_helper_skb_store_bytes (void *ctx, int off, void *from, int 
len, int flags)
+int __builtin_bpf_helper_skb_under_cgroup (void *ctx, void *map, int index)
+int __builtin_bpf_helper_skb_change_head (void *, int len, int flags)
+int __builtin_bpf_helper_skb_pull_data (void *, int len)
+int __builtin_bpf_helper_skb_change_proto (void *ctx, __be16 proto, __u64 
flags)
+int __builtin_bpf_helper_skb_change_type (void *ctx, __u32 type)
+int __builtin_bpf_helper_skb_change_tail (void *ctx, __u32 len, __u64 flags)
+int __builtin_bpf_helper_skb_adjust_room (void *ctx, __s32 len_diff, __u32 
mode,
+ unsigned long long flags)
+@end smallexample
+
+Other helpers:
+
+@smallexample
+int __builtin_bpf_helper_probe_rea

[PATCH V4 09/11] bpf: adjust GCC testsuite to eBPF limitations

2019-08-27 Thread Jose E. Marchesi
This patch causes many tests in gcc.dg and gcc.c-torture to be
skipped on bpf-*-* targets.  This is due to the many limitations
imposed by eBPF on what would otherwise be perfectly valid C code: no
support for more than 5 arguments in function calls, no support for
indirect jumps, a very limited range for direct jumps, etc.

Hopefully some of these restrictions will be relaxed in the future.
Also, as semantics associated with object linking get developed in
eBPF, it may be possible at some point to provide a set of standard
run-time libraries for eBPF programs.

gcc/testsuite/ChangeLog:

* gcc.dg/builtins-config.h: eBPF doesn't support C99 standard
functions.
* gcc.c-torture/compile/20101217-1.c: Add a function prototype for
printf.
* gcc.c-torture/compile/2211-1.c: Skip if target bpf-*-*.
* gcc.c-torture/compile/poor.c: Likewise.
* gcc.c-torture/compile/pr25311.c: Likewise.
* gcc.c-torture/compile/920501-7.c: Likewise.
* gcc.c-torture/compile/2403-1.c: Likewise.
* gcc.c-torture/compile/20001226-1.c: Likewise.
* gcc.c-torture/compile/20030903-1.c: Likewise.
* gcc.c-torture/compile/20031125-1.c: Likewise.
* gcc.c-torture/compile/20040101-1.c: Likewise.
* gcc.c-torture/compile/20040317-2.c: Likewise.
* gcc.c-torture/compile/20040726-1.c: Likewise.
* gcc.c-torture/compile/20051216-1.c: Likewise.
* gcc.c-torture/compile/900313-1.c: Likewise.
* gcc.c-torture/compile/920625-1.c: Likewise.
* gcc.c-torture/compile/930421-1.c: Likewise.
* gcc.c-torture/compile/930623-1.c: Likewise.
* gcc.c-torture/compile/961004-1.c: Likewise.
* gcc.c-torture/compile/980504-1.c: Likewise.
* gcc.c-torture/compile/980816-1.c: Likewise.
* gcc.c-torture/compile/990625-1.c: Likewise.
* gcc.c-torture/compile/DFcmp.c: Likewise.
* gcc.c-torture/compile/HIcmp.c: Likewise.
* gcc.c-torture/compile/HIset.c: Likewise.
* gcc.c-torture/compile/QIcmp.c: Likewise.
* gcc.c-torture/compile/QIset.c: Likewise.
* gcc.c-torture/compile/SFset.c: Likewise.
* gcc.c-torture/compile/SIcmp.c: Likewise.
* gcc.c-torture/compile/SIset.c: Likewise.
* gcc.c-torture/compile/UHIcmp.c: Likewise.
* gcc.c-torture/compile/UQIcmp.c: Likewise.
* gcc.c-torture/compile/USIcmp.c: Likewise.
* gcc.c-torture/compile/consec.c: Likewise.
* gcc.c-torture/compile/limits-fndefn.c: Likewise.
* gcc.c-torture/compile/lll.c: Likewise.
* gcc.c-torture/compile/parms.c: Likewise.
* gcc.c-torture/compile/pass.c: Likewise.
* gcc.c-torture/compile/pp.c: Likewise.
* gcc.c-torture/compile/pr32399.c: Likewise.
* gcc.c-torture/compile/pr34091.c: Likewise.
* gcc.c-torture/compile/pr34688.c: Likewise.
* gcc.c-torture/compile/pr37258.c: Likewise.
* gcc.c-torture/compile/pr37327.c: Likewise.
* gcc.c-torture/compile/pr37381.c: Likewise.
* gcc.c-torture/compile/pr37669-2.c: Likewise.
* gcc.c-torture/compile/pr37669.c: Likewise.
* gcc.c-torture/compile/pr37742-3.c: Likewise.
* gcc.c-torture/compile/pr44063.c: Likewise.
* gcc.c-torture/compile/pr48596.c: Likewise.
* gcc.c-torture/compile/pr51856.c: Likewise.
* gcc.c-torture/compile/pr54428.c: Likewise.
* gcc.c-torture/compile/pr54713-1.c: Likewise.
* gcc.c-torture/compile/pr54713-2.c: Likewise.
* gcc.c-torture/compile/pr54713-3.c: Likewise.
* gcc.c-torture/compile/pr55921.c: Likewise.
* gcc.c-torture/compile/pr70240.c: Likewise.
* gcc.c-torture/compile/pr70355.c: Likewise.
* gcc.c-torture/compile/pr82052.c: Likewise.
* gcc.c-torture/compile/pr83487.c: Likewise.
* gcc.c-torture/compile/pr86122.c: Likewise.
* gcc.c-torture/compile/pret-arg.c: Likewise.
* gcc.c-torture/compile/regs-arg-size.c: Likewise.
* gcc.c-torture/compile/structret.c: Likewise.
* gcc.c-torture/compile/uuarg.c: Likewise.
* gcc.dg/20001009-1.c: Likewise.
* gcc.dg/20020418-1.c: Likewise.
* gcc.dg/20020426-2.c: Likewise.
* gcc.dg/20020430-1.c: Likewise.
* gcc.dg/20040306-1.c: Likewise.
* gcc.dg/20040622-2.c: Likewise.
* gcc.dg/20050603-2.c: Likewise.
* gcc.dg/20050629-1.c: Likewise.
* gcc.dg/20061026.c: Likewise.
* gcc.dg/Warray-bounds-3.c: Likewise.
* gcc.dg/Warray-bounds-30.c: Likewise.
* gcc.dg/Wframe-larger-than-2.c: Likewise.
* gcc.dg/Wframe-larger-than.c: Likewise.
* gcc.dg/Wrestrict-11.c: Likewise.
* gcc.c-torture/compile/2804-1.c: Likewise.
---
 gcc/testsuite/ChangeLog| 85 ++
 gcc/testsuite/gcc.c-torture/compile/2211-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/2403-1.c   |  2 +

[PATCH V4 11/11] bpf: add myself as the maintainer for the eBPF port

2019-08-27 Thread Jose E. Marchesi
ChangeLog:

* MAINTAINERS: Add myself as the maintainer of the eBPF port.
Remove myself from Write After Approval section.
---
 ChangeLog   | 5 +
 MAINTAINERS | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d8402949bc..5d69d696c2c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -57,6 +57,7 @@ arm port  Ramana Radhakrishnan

 arm port   Kyrylo Tkachov  
 avr port   Denis Chertykov 
 bfin port  Jie Zhang   
+bpf port   Jose E. Marchesi
 c6x port   Bernd Schmidt   
 cris port  Hans-Peter Nilsson  
 c-sky port Xianmiao Qu 
@@ -497,7 +498,6 @@ Luis Machado

 Ziga Mahkovec  
 Matthew Malcomson  
 Mikhail Maltsev
-Jose E. Marchesi   
 Patrick Marlier

 Simon Martin   
 Alejandro Martinez 

-- 
2.11.0



[PATCH V4 07/11] bpf: gcc.target eBPF testsuite

2019-08-27 Thread Jose E. Marchesi
This patch adds a new testsuite to gcc.target, with eBPF-specific
tests.

Tests are included for:
- Target specific diagnostics.
- All built-in functions.

testsuite/ChangeLog:

* gcc.target/bpf/bpf.exp: New file.
* gcc.target/bpf/builtin-load.c: Likewise.
* gcc.target/bpf/constant-calls.c: Likewise.
* gcc.target/bpf/diag-funargs.c: Likewise.
* gcc.target/bpf/diag-indcalls.c: Likewise.
* gcc.target/bpf/helper-bind.c: Likewise.
* gcc.target/bpf/helper-bpf-redirect.c: Likewise.
* gcc.target/bpf/helper-clone-redirect.c: Likewise.
* gcc.target/bpf/helper-csum-diff.c: Likewise.
* gcc.target/bpf/helper-csum-update.c: Likewise.
* gcc.target/bpf/helper-current-task-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-fib-lookup.c: Likewise.
* gcc.target/bpf/helper-get-cgroup-classid.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-get-current-comm.c: Likewise.
* gcc.target/bpf/helper-get-current-pid-tgid.c: Likewise.
* gcc.target/bpf/helper-get-current-task.c: Likewise.
* gcc.target/bpf/helper-get-current-uid-gid.c: Likewise.
* gcc.target/bpf/helper-get-hash-recalc.c: Likewise.
* gcc.target/bpf/helper-get-listener-sock.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-numa-node-id.c: Likewise.
* gcc.target/bpf/helper-get-prandom-u32.c: Likewise.
* gcc.target/bpf/helper-get-route-realm.c: Likewise.
* gcc.target/bpf/helper-get-smp-processor-id.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/helper-get-stack.c: Likewise.
* gcc.target/bpf/helper-get-stackid.c: Likewise.
* gcc.target/bpf/helper-ktime-get-ns.c: Likewise.
* gcc.target/bpf/helper-l3-csum-replace.c: Likewise.
* gcc.target/bpf/helper-l4-csum-replace.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-map-delete-elem.c: Likewise.
* gcc.target/bpf/helper-map-lookup-elem.c: Likewise.
* gcc.target/bpf/helper-map-peek-elem.c: Likewise.
* gcc.target/bpf/helper-map-pop-elem.c: Likewise.
* gcc.target/bpf/helper-map-push-elem.c: Likewise.
* gcc.target/bpf/helper-map-update-elem.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-pop-data.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-push-data.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-perf-event-output.c: Likewise.
* gcc.target/bpf/helper-perf-event-read.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-probe-read.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-probe-write-user.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-rc-pointer-rel.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-set-hash.c: Likewise.
* gcc.target/bpf/helper-set-hash-invalid.c: Likewise.
* gcc.target/bpf/helper-setsockopt.c: Likewise.
* gcc.target/bpf/helper-skb-adjust-room.c: Likewise.
* gcc.target/bpf/helper-skb-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-skb-change-head.c: Likewise.
* gcc.target/bpf/helper-skb-change-proto.c: Likewise.
* gcc.target/bpf/helper-skb-change-tail.c: Likewise.
* gcc.target/bpf/helper-skb-change-type.c: Likewise.
* gcc.target/bpf/helper-skb-ecn-set-ce.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-opt.c: Likewise.
* gcc.target/bpf/helper-skb-get-xfrm-state.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes-relative.c: Likewise.
* gcc.target/bpf/helper-skb-pull-data.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-opt.c: Likewise.
*

[PATCH V4 02/11] opt-functions.awk: fix comparison of limit, begin and end

2019-08-27 Thread Jose E. Marchesi
The function integer_range_info makes sure that, if provided, the
initial value falls within the specified range.  However, it is
necessary to coerce the values into numeric context before comparing
them, to make sure awk uses arithmetic order and not lexicographic
order.

gcc/ChangeLog:

* opt-functions.awk (integer_range_info): Make sure values are in
numeric context before operating with them.
---
 gcc/ChangeLog | 5 +
 gcc/opt-functions.awk | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
index 1190e6d6b66..0b28a6c3bff 100644
--- a/gcc/opt-functions.awk
+++ b/gcc/opt-functions.awk
@@ -346,8 +346,9 @@ function search_var_name(name, opt_numbers, opts, flags, 
n_opts)
 function integer_range_info(range_option, init, option)
 {
 if (range_option != "") {
-   start = nth_arg(0, range_option);
-   end = nth_arg(1, range_option);
+   init = init + 0;
+   start = nth_arg(0, range_option) + 0;
+   end = nth_arg(1, range_option) + 0;
if (init != "" && init != "-1" && (init < start || init > end))
  print "#error initial value " init " of '" option "' must be in range 
[" start "," end "]"
return start ", " end
-- 
2.11.0



[PATCH V4 05/11] bpf: new GCC port

2019-08-27 Thread Jose E. Marchesi


This patch adds a port for the Linux kernel eBPF architecture to GCC.

ChangeLog:

  * configure.ac: Support for bpf-*-* targets.
  * configure: Regenerate.

contrib/ChangeLog:

  * config-list.mk (LIST): Disable go in bpf-*-* targets.

gcc/ChangeLog:

  * config.gcc: Support for bpf-*-* targets.
  * common/config/bpf/bpf-common.c: New file.
  * config/bpf/t-bpf: Likewise.
  * config/bpf/predicates.md: Likewise.
  * config/bpf/constraints.md: Likewise.
  * config/bpf/bpf.opt: Likewise.
  * config/bpf/bpf.md: Likewise.
  * config/bpf/bpf.h: Likewise.
  * config/bpf/bpf.c: Likewise.
  * config/bpf/bpf-protos.h: Likewise.
  * config/bpf/bpf-opts.h: Likewise.
  * config/bpf/bpf-helpers.h: Likewise.
  * config/bpf/bpf-helpers.def: Likewise.
---
 ChangeLog  |   5 +
 configure  |  54 ++-
 configure.ac   |  54 ++-
 contrib/ChangeLog  |   4 +
 contrib/config-list.mk |   3 +-
 gcc/ChangeLog  |  16 +
 gcc/common/config/bpf/bpf-common.c |  55 +++
 gcc/config.gcc |   9 +
 gcc/config/bpf/bpf-helpers.def | 194 
 gcc/config/bpf/bpf-helpers.h   | 327 +
 gcc/config/bpf/bpf-opts.h  |  56 +++
 gcc/config/bpf/bpf-protos.h|  33 ++
 gcc/config/bpf/bpf.c   | 958 +
 gcc/config/bpf/bpf.h   | 532 
 gcc/config/bpf/bpf.md  | 497 +++
 gcc/config/bpf/bpf.opt | 123 +
 gcc/config/bpf/constraints.md  |  32 ++
 gcc/config/bpf/predicates.md   |  72 +++
 gcc/config/bpf/t-bpf   |   0
 19 files changed, 3021 insertions(+), 3 deletions(-)
 create mode 100644 gcc/common/config/bpf/bpf-common.c
 create mode 100644 gcc/config/bpf/bpf-helpers.def
 create mode 100644 gcc/config/bpf/bpf-helpers.h
 create mode 100644 gcc/config/bpf/bpf-opts.h
 create mode 100644 gcc/config/bpf/bpf-protos.h
 create mode 100644 gcc/config/bpf/bpf.c
 create mode 100644 gcc/config/bpf/bpf.h
 create mode 100644 gcc/config/bpf/bpf.md
 create mode 100644 gcc/config/bpf/bpf.opt
 create mode 100644 gcc/config/bpf/constraints.md
 create mode 100644 gcc/config/bpf/predicates.md
 create mode 100644 gcc/config/bpf/t-bpf

diff --git a/configure.ac b/configure.ac
index 1fe97c001cc..b8ce2ad20b9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -638,6 +638,9 @@ case "${target}" in
 # No hosted I/O support.
 noconfigdirs="$noconfigdirs target-libssp"
 ;;
+  bpf-*-*)
+noconfigdirs="$noconfigdirs target-libssp"
+;;
   powerpc-*-aix* | rs6000-*-aix*)
 noconfigdirs="$noconfigdirs target-libssp"
 ;;
@@ -672,12 +675,43 @@ if test "${ENABLE_LIBSTDCXX}" = "default" ; then
 avr-*-*)
   noconfigdirs="$noconfigdirs target-libstdc++-v3"
   ;;
+bpf-*-*)
+  noconfigdirs="$noconfigdirs target-libstdc++-v3"
+  ;;
 ft32-*-*)
   noconfigdirs="$noconfigdirs target-libstdc++-v3"
   ;;
   esac
 fi
 
+# Disable C++ on systems where it is known to not work.
+# For testing, you can override this with --enable-languages=c++.
+case ,${enable_languages}, in
+  *,c++,*)
+;;
+  *)
+  case "${target}" in
+bpf-*-*)
+  unsupported_languages="$unsupported_languages c++"
+  ;;
+  esac
+  ;;
+esac
+
+# Disable Objc on systems where it is known to not work.
+# For testing, you can override this with --enable-languages=objc.
+case ,${enable_languages}, in
+  *,objc,*)
+;;
+  *)
+  case "${target}" in
+bpf-*-*)
+  unsupported_languages="$unsupported_languages objc"
+  ;;
+  esac
+  ;;
+esac
+
 # Disable D on systems where it is known to not work.
 # For testing, you can override this with --enable-languages=d.
 case ,${enable_languages}, in
@@ -687,6 +721,9 @@ case ,${enable_languages}, in
 case "${target}" in
   *-*-darwin*)
unsupported_languages="$unsupported_languages d"
+;;
+  bpf-*-*)
+   unsupported_languages="$unsupported_languages d"
;;
 esac
 ;;
@@ -715,6 +752,9 @@ case "${target}" in
 # See .
 unsupported_languages="$unsupported_languages fortran"
 ;;
+  bpf-*-*)
+unsupported_languages="$unsupported_languages fortran"
+;;
 esac
 
 # Disable libffi for some systems.
@@ -761,6 +801,9 @@ case "${target}" in
   arm*-*-symbianelf*)
 noconfigdirs="$noconfigdirs target-libffi"
 ;;
+  bpf-*-*)
+noconfigdirs="$noconfigdirs target-libffi"
+;;
   cris-*-* | crisv32-*-*)
 case "${target}" in
   *-*-linux*)
@@ -807,7 +850,7 @@ esac
 # Disable the go frontend on systems where it is known to not work. Please keep
 # this in sync with contrib/config-list.mk.
 case "${target}" in
-*-*-darwin* | *-*-cygwin* | *-*-mingw*)
+*-*-darwin* | *-*-cygwin* | *-*-mingw* | bpf-* )
 unsupported_languages="$unsupported_languages go"
  

Re: [PATCH V4 00/11] eBPF support for GCC

2019-08-27 Thread Jose E. Marchesi


. The mul32 instruction zero-extends arguments and then performs
  a multiplication.  Use the right pattern for this in bpf.md and
  name it umulsidi3.

Sorry, here I meant to say that the mul32 instruction multiplies and
then zero-extends its result.


[PATCH] PR fortran/91564 -- Additional checks on STATUS

2019-08-27 Thread Steve Kargl
The attached patch has been built and tested on x86_64-*-freebsd.
It adds additional checks for the status dummy argument, and
thereby prevents an ICE.  OK to commit?

2019-08-27  Steven G. Kargl  

PR fortran/91564
* check.c (gfc_check_kill_sub): Additional checks on status dummy
argument.

2019-08-27  Steven G. Kargl  

PR fortran/91564
* gfortran.dg/pr91564.f90: New test.
-- 
Steve
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c	(revision 274961)
+++ gcc/fortran/check.c	(working copy)
@@ -3301,6 +3301,22 @@ gfc_check_kill_sub (gfc_expr *pid, gfc_expr *sig, gfc_
 
   if (!scalar_check (status, 2))
 	return false;
+
+  if (status->expr_type != EXPR_VARIABLE)
+	{
+	  gfc_error ("STATUS at %L shall be an INTENT(OUT) variable",
+		 &status->where);
+	  return false;
+	}
+
+  if (status->expr_type == EXPR_VARIABLE
+	  && status->symtree && status->symtree->n.sym
+	  && status->symtree->n.sym->attr.intent == INTENT_IN)
+	{
+	  gfc_error ("%qs at %L shall be an INTENT(OUT) variable",
+		 status->symtree->name, &status->where);
+	  return false;
+	}
 }
 
   return true;
Index: gcc/testsuite/gfortran.dg/pr91564.f90
===
--- gcc/testsuite/gfortran.dg/pr91564.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91564.f90	(working copy)
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! PR fortran/91564
+! Contributed by Gerhard Steinmetz.
+program p
+   integer i, j
+   call kill (1, 2, 3)! { dg-error "shall be an INTENT" }
+   i = 42
+   call bar(i, j)
+end
+
+subroutine bar(n, m)
+   integer, intent(in) :: n
+   integer, intent(inout) :: m
+   call kill (1, 3, n)! { dg-error "shall be an INTENT" }
+   call kill (1, 3, m)
+end subroutine bar


[PATCH] PR fortran/91551 -- ALLOCATED has one argument

2019-08-27 Thread Steve Kargl
The attached patch was built and tested on i586-*-freebsd.
It includes a check for ALLOCATED with no arguments.
OK to commit?

2019-08-28  Steven G. Kargl  

PR fortran/91551
* intrinsic.c (sort_actual): ALLOCATED has one argument. Check for
no argument case.

2019-08-28  Steven G. Kargl  

PR fortran/91551
* gfortran.dg/allocated_3.f90: New test.

-- 
Steve
Index: gcc/fortran/intrinsic.c
===
--- gcc/fortran/intrinsic.c	(revision 274900)
+++ gcc/fortran/intrinsic.c	(working copy)
@@ -4190,35 +4190,45 @@ sort_actual (const char *name, gfc_actual_arglist **ap
 
   /* ALLOCATED has two mutually exclusive keywords, but only one
  can be present at time and neither is optional. */
-  if (strcmp (name, "allocated") == 0 && a->name)
+  if (strcmp (name, "allocated") == 0)
 {
-  if (strcmp (a->name, "scalar") == 0)
+  if (!a)
 	{
-  if (a->next)
-	goto whoops;
-	  if (a->expr->rank != 0)
-	{
-	  gfc_error ("Scalar entity required at %L", &a->expr->where);
-	  return false;
-	}
-  return true;
+	  gfc_error ("ALLOCATED intrinsic at %L requires an array or scalar "
+		 "allocatable entity", where);
+	  return false;
 	}
-  else if (strcmp (a->name, "array") == 0)
+
+  if (a->name)
 	{
-  if (a->next)
-	goto whoops;
-	  if (a->expr->rank == 0)
+	  if (strcmp (a->name, "scalar") == 0)
 	{
-	  gfc_error ("Array entity required at %L", &a->expr->where);
+	  if (a->next)
+		goto whoops;
+	  if (a->expr->rank != 0)
+		{
+		  gfc_error ("Scalar entity required at %L", &a->expr->where);
+		  return false;
+		}
+	  return true;
+	}
+	  else if (strcmp (a->name, "array") == 0)
+	{
+	  if (a->next)
+		goto whoops;
+	  if (a->expr->rank == 0)
+		{
+		  gfc_error ("Array entity required at %L", &a->expr->where);
+		  return false;
+		}
+	  return true;
+	}
+	  else
+	{
+	  gfc_error ("Invalid keyword %qs in %qs intrinsic function at %L",
+			 a->name, name, &a->expr->where);
 	  return false;
 	}
-  return true;
-	}
-  else
-	{
-	  gfc_error ("Invalid keyword %qs in %qs intrinsic function at %L",
-		 a->name, name, &a->expr->where);
-	  return false;
 	}
 }
 
Index: gcc/testsuite/gfortran.dg/allocated_3.f90
===
--- gcc/testsuite/gfortran.dg/allocated_3.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/allocated_3.f90	(working copy)
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! PR fortran/91551
+! Contributed by Gerhard Steinmetz
+program p
+   if (allocated()) stop 1 ! { dg-error "requires an array or scalar allocatable" }
+end


Re: C++ PATCH for c++/81676 - bogus -Wunused warnings in constexpr if

2019-08-27 Thread Jason Merrill
On Tue, Aug 27, 2019 at 4:10 PM Marek Polacek  wrote:
> On Fri, Aug 23, 2019 at 02:20:53PM -0700, Jason Merrill wrote:
> > On 8/19/19 11:28 AM, Marek Polacek wrote:
> > > On Fri, Aug 16, 2019 at 06:20:20PM -0700, Jason Merrill wrote:
> > > > On 8/16/19 2:29 PM, Marek Polacek wrote:
> > > > > This patch is an attempt to fix the annoying -Wunused-but-set-* 
> > > > > warnings that
> > > > > tend to occur with constexpr if.  When we have something like
> > > > >
> > > > > template < typename T >
> > > > > int f(T v){
> > > > > if constexpr(sizeof(T) == sizeof(int)){
> > > > >   return v;
> > > > > }else{
> > > > >   return 0;
> > > > > }
> > > > > }
> > > > >
> > > > > and call f('a'), then the condition is false, meaning that we won't 
> > > > > instantiate
> > > > > the then-branch, as per tsubst_expr/IF_STMT:
> > > > > 17284   if (IF_STMT_CONSTEXPR_P (t) && integer_zerop (tmp))
> > > > > 17285 /* Don't instantiate the THEN_CLAUSE. */;
> > > > > so we'll never get round to mark_exp_read-ing the decls used in the
> > > > > then-branch, causing finish_function to emit "parameter set but not 
> > > > > used"
> > > > > warnings.
> > > > >
> > > > > It's unclear how to best deal with this.  Marking the decls 
> > > > > DECL_READ_P while
> > > > > parsing doesn't seem like a viable approach
> > > >
> > > > Why not?
> > >
> > > Well, while parsing, we're in a template and so the condition won't be
> > > evaluated until tsubst_expr.  So we can't tell which branch is dead.
> >
> > But if a decl is used on one branch, we shouldn't warn even if it isn't used
> > on the selected branch.
>
> I didn't want to mark the decls as read multiple times but we do it anyway
> so that's no longer my concern.  So...
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-08-27  Marek Polacek  
>
> PR c++/81676 - bogus -Wunused warnings in constexpr if.
> * semantics.c (maybe_mark_exp_read_r): New function.
> (finish_if_stmt): Call it on THEN_CLAUSE and ELSE_CLAUSE.

I was thinking of adding mark_exp_read to places where we currently
take a shortcut in templates, like check_return_expr or
build_x_binary_op, but this is probably simpler.  The patch is OK.

Jason


Re: C++ PATCH for c++/91428 - warn about std::is_constant_evaluated in if constexpr

2019-08-27 Thread Jason Merrill
On Tue, Aug 27, 2019 at 5:50 PM Marek Polacek  wrote:
>
> As discussed in 91428 and in
> ,
>
>   if constexpr (std::is_constant_evaluated ())
> // ...
>   else
> // ...
>
> always evaluates the true branch.  Someone in the SO post said "But hopefully
> compilers will just diagnose that case" so I'm adding a warning.
>
> I didn't want to invent a completely new warning so I'm tagging along
> -Wtautological-compare.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
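For readers outside the PR, a minimal sketch of the pitfall (an illustration, not part of the patch; it assumes C++17 and GCC's __builtin_is_constant_evaluated builtin, available since GCC 9 — std::is_constant_evaluated is its C++20 wrapper):

```cpp
#include <cassert>

// Pitfall: the condition of `if constexpr` is manifestly constant-evaluated,
// so the builtin always yields true there and the else branch is discarded.
constexpr int pitfall() {
  if constexpr (__builtin_is_constant_evaluated())
    return 1;   // always taken, even when pitfall() is called at run time
  else
    return 2;   // dead code -- the case the new warning diagnoses
}

// Intended usage: a plain `if` really distinguishes the two contexts.
constexpr int intended() {
  if (__builtin_is_constant_evaluated())
    return 1;   // taken during constant evaluation
  else
    return 2;   // taken at run time
}
```

A genuine run-time call to pitfall() still returns 1, because the branch was selected at compile time; intended() returns 2 at run time and 1 in a constant expression.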

OK.

Jason


Re: [PATCH V4 02/11] opt-functions.awk: fix comparison of limit, begin and end

2019-08-27 Thread Ulrich Drepper
On Wed, Aug 28, 2019 at 1:47 AM Jose E. Marchesi 
wrote:

>  function integer_range_info(range_option, init, option)
>  {
>  if (range_option != "") {
> -   start = nth_arg(0, range_option);
> -   end = nth_arg(1, range_option);
> +   init = init + 0;
> +   start = nth_arg(0, range_option) + 0;
> +   end = nth_arg(1, range_option) + 0;
> if (init != "" && init != "-1" && (init < start || init > end))


In this case the test for init != "" is at least unnecessary.

Maybe something else has to be used.  I didn't trace the uses but if init
is deliberately set to "" then the test would have to be replaced with init
!= 0.


[PATCH] Add vec_sh{l,r}_v4sf (PR libgomp/91530)

2019-08-27 Thread Jakub Jelinek
Hi!

The following two testcases FAIL to be vectorized, because SSE2 doesn't have
many permutation instructions, and the ones that actually work (whole-vector
shifts) aren't enabled for V4SFmode.

The following patch fixes it by enabling those optabs also for V4SFmode (and
V2DFmode).  Strictly speaking, we need it only for the VI_128 modes plus
V4SFmode, but I'm not sure it is worth adding yet another iterator for
VI_128 + V4SF and the instructions actually do work for V2DFmode too, just
there are also other permutation instructions that handle V2DFmode.
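As a rough illustration of what these expanders generate (an SSE2 intrinsics sketch, not the patch itself; the helper name is invented): a whole-vector shift on V4SF reinterprets the four floats as one 128-bit integer, applies the byte shift, and reinterprets back — nothing in the operation depends on the element type.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Shift a V4SF left by one 4-byte lane via the whole-register (V1TI) domain,
// mirroring what the vec_shl_ expander does: lowpart to V1TI, byte
// shift with pslldq, lowpart back to V4SF.
__m128 vec_shl_v4sf_one_lane(__m128 v) {
  __m128i bits = _mm_castps_si128(v);   // reinterpret floats as integer bits
  bits = _mm_slli_si128(bits, 4);       // pslldq: shift whole vector left 4 bytes
  return _mm_castsi128_ps(bits);        // reinterpret back as four floats
}
```

pslldq shifts toward the most significant byte, so lane 0 becomes zero and the remaining lanes move up one position; this type-agnosticism is why widening the iterator from VI_128 to V_128 is safe.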

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-08-28  Jakub Jelinek  

PR libgomp/91530
* config/i386/sse.md (vec_shl_, vec_shr_): Use
V_128 iterator instead of VI_128.

* testsuite/libgomp.c/scan-21.c: New test.
* testsuite/libgomp.c/scan-22.c: New test.

--- gcc/config/i386/sse.md.jj   2019-08-27 12:26:25.385089103 +0200
+++ gcc/config/i386/sse.md  2019-08-27 13:50:42.594849445 +0200
@@ -12047,9 +12047,9 @@ (define_insn "3"
   [(set (match_dup 3)
(ashift:V1TI
-(match_operand:VI_128 1 "register_operand")
+(match_operand:V_128 1 "register_operand")
 (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
-   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
+   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
   "TARGET_SSE2"
 {
   operands[1] = gen_lowpart (V1TImode, operands[1]);
@@ -12060,9 +12060,9 @@ (define_expand "vec_shl_"
 (define_expand "vec_shr_"
   [(set (match_dup 3)
(lshiftrt:V1TI
-(match_operand:VI_128 1 "register_operand")
+(match_operand:V_128 1 "register_operand")
 (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
-   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
+   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
   "TARGET_SSE2"
 {
   operands[1] = gen_lowpart (V1TImode, operands[1]);
--- libgomp/testsuite/libgomp.c/scan-21.c.jj2019-08-27 22:56:03.805127837 
+0200
+++ libgomp/testsuite/libgomp.c/scan-21.c   2019-08-27 22:58:26.347043679 
+0200
@@ -0,0 +1,6 @@
+/* { dg-require-effective-target size32plus } */
+/* { dg-require-effective-target avx_runtime } */
+/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 
-mno-sse3" } */
+/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
+
+#include "scan-13.c"
--- libgomp/testsuite/libgomp.c/scan-22.c.jj2019-08-27 22:56:51.034437425 
+0200
+++ libgomp/testsuite/libgomp.c/scan-22.c   2019-08-27 22:59:01.978522645 
+0200
@@ -0,0 +1,6 @@
+/* { dg-require-effective-target size32plus } */
+/* { dg-require-effective-target avx_runtime } */
+/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 
-mno-sse3" } */
+/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
+
+#include "scan-17.c"

Jakub


Re: [PATCH] PR fortran/91564 -- Additional checks on STATUS

2019-08-27 Thread Janne Blomqvist
On Wed, Aug 28, 2019 at 3:37 AM Steve Kargl
 wrote:
>
> The attached patch has been built and tested on x86_64-*-freebsd.
> It adds additional checks for the status dummy argument, and
> thereby prevents an ICE.  OK to commit?

Ok.

-- 
Janne Blomqvist