Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu  wrote:

> > > gcc/ChangeLog:
> > >
> > > PR target/97194
> > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > * config/i386/sse.md (vec_set): Support both constant
> > > and variable index vec_set.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> >
> > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > +(define_predicate "vec_setm_operand"
> > +  (ior (and (match_operand 0 "register_operand")
> > +(match_test "TARGET_AVX2"))
> > +   (match_code "const_int")))
> > +
> >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> >  (define_predicate "reg_or_pm1_operand"
> >(ior (match_operand 0 "register_operand")
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index b153a87fb98..1798e5dea75 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> >  (define_expand "vec_set"
> >[(match_operand:V 0 "register_operand")
> > (match_operand: 1 "register_operand")
> > -   (match_operand 2 "const_int_operand")]
> > +   (match_operand 2 "vec_setm_operand")]
> >
> > You need to specify a mode, otherwise a register of any mode can pass here.
> >
> Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> cut
> ---
> bool
> can_vec_set_var_idx_p (machine_mode vec_mode)
> {
>   if (!VECTOR_MODE_P (vec_mode))
> return false;
>
>   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
>   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
>   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
>   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
>
>   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
>
>   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
>  && insn_operand_matches (icode, 1, reg2)
>  && insn_operand_matches (icode, 2, reg3);
> }
> ---
>
> reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> fail insn_operand_matches (icode, 2, reg3)
> ---
> (gdb) p insn_operand_matches(icode,2,reg3)
> $5 = false
> (gdb)
> ---
>
> Maybe we need to change
>
> rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
>
> to
>
> rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
>
> cc Richard Biener, any thoughts?

There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
specify SImode for operand 2 in vec_setM pattern and allow register
operands. I wonder if and how they manage to generate the pattern.

Uros.


Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak  wrote:
>
> On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu  wrote:
>
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/97194
> > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > * config/i386/sse.md (vec_set): Support both constant
> > > > and variable index vec_set.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > >
> > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > +(define_predicate "vec_setm_operand"
> > > +  (ior (and (match_operand 0 "register_operand")
> > > +(match_test "TARGET_AVX2"))
> > > +   (match_code "const_int")))
> > > +
> > >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> > >  (define_predicate "reg_or_pm1_operand"
> > >(ior (match_operand 0 "register_operand")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index b153a87fb98..1798e5dea75 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > >  (define_expand "vec_set"
> > >[(match_operand:V 0 "register_operand")
> > > (match_operand: 1 "register_operand")
> > > -   (match_operand 2 "const_int_operand")]
> > > +   (match_operand 2 "vec_setm_operand")]
> > >
> > > You need to specify a mode, otherwise a register of any mode can pass 
> > > here.
> > >
> > Yes, theoretically, we only accept integer types. But in 
> > can_vec_set_var_idx_p
> > cut
> > ---
> > bool
> > can_vec_set_var_idx_p (machine_mode vec_mode)
> > {
> >   if (!VECTOR_MODE_P (vec_mode))
> > return false;
> >
> >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> >
> >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> >
> >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
> >  && insn_operand_matches (icode, 1, reg2)
> >  && insn_operand_matches (icode, 2, reg3);
> > }
> > ---
> >
> > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > fail insn_operand_matches (icode, 2, reg3)
> > ---
> > (gdb) p insn_operand_matches(icode,2,reg3)
> > $5 = false
> > (gdb)
> > ---
> >
> > Maybe we need to change
> >
> > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> >
> > to
> >
> > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> >
> > cc Richard Biener, any thoughts?
>
> There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> specify SImode for operand 2 in vec_setM pattern and allow register
> operands. I wonder if and how they manage to generate the pattern.
>
> Uros.

Variable-index vec_set was enabled by r11-3486, about two months ago, in
[1]. But for the two targets above, the code has been there since
GCC 10 (maybe earlier; I only looked at the gcc10 branch), so I don't
think that code was written for [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
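
For context, the source-level operation behind a variable-index vec_set
can be sketched with the GNU C vector extension.  This is only an
illustration: the names v4si, set_lane and get_lane are made up here and
do not come from the patch or its tests, and whether a compiler expands
the store through a vec_set pattern depends on the target and options.

```c
#include <stdint.h>

/* GNU C vector extension: four 32-bit lanes (hypothetical typedef name).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Store VAL into lane IDX of VEC, where IDX is only known at run time.
   The caller must keep IDX within 0..3.  */
v4si
set_lane (v4si vec, int32_t val, int idx)
{
  vec[idx] = val;
  return vec;
}

/* Read lane IDX back so that callers can check the result.  */
int32_t
get_lane (v4si vec, int idx)
{
  return vec[idx];
}
```

With a constant idx the old const_int_operand predicate was enough; the
run-time idx in set_lane is the case the thread is about.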


-- 
BR,
Hongtao


Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 12, 2020 at 5:12 PM Hongtao Liu  wrote:
>
> On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak  wrote:
> >
> > On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu  wrote:
> >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/97194
> > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New 
> > > > > function.
> > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > * config/i386/sse.md (vec_set): Support both constant
> > > > > and variable index vec_set.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > >
> > > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > > +(define_predicate "vec_setm_operand"
> > > > +  (ior (and (match_operand 0 "register_operand")
> > > > +(match_test "TARGET_AVX2"))
> > > > +   (match_code "const_int")))
> > > > +
> > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word 
> > > > shifts.
> > > >  (define_predicate "reg_or_pm1_operand"
> > > >(ior (match_operand 0 "register_operand")
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > index b153a87fb98..1798e5dea75 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > >  (define_expand "vec_set"
> > > >[(match_operand:V 0 "register_operand")
> > > > (match_operand: 1 "register_operand")
> > > > -   (match_operand 2 "const_int_operand")]
> > > > +   (match_operand 2 "vec_setm_operand")]
> > > >
> > > > You need to specify a mode, otherwise a register of any mode can pass 
> > > > here.
> > > >
> > > Yes, theoretically, we only accept integer types. But in 
> > > can_vec_set_var_idx_p
> > > cut
> > > ---
> > > bool
> > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > {
> > >   if (!VECTOR_MODE_P (vec_mode))
> > > return false;
> > >
> > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > >
> > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, 
> > > reg1)
> > >  && insn_operand_matches (icode, 1, reg2)
> > >  && insn_operand_matches (icode, 2, reg3);
> > > }
> > > ---
> > >
> > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > fail insn_operand_matches (icode, 2, reg3)
> > > ---
> > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > $5 = false
> > > (gdb)
> > > ---
> > >
> > > Maybe we need to change
> > >
> > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > > to
> > >
> > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > > cc Richard Biener, any thoughts?
> >
> > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > specify SImode for operand 2 in vec_setM pattern and allow register
> > operands. I wonder if and how they manage to generate the pattern.
> >
> > Uros.
>
> Variable index vec_set is enabled by r11-3486, about two months ago in
> [1]. But for the upper two targets, the codes are already there since
> GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> those codes are for [1].
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
>
>
> --
> BR,
> Hongtao

Correction: [1] is https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html

-- 
BR,
Hongtao


Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 12, 2020 at 5:15 PM Hongtao Liu  wrote:
>
> On Thu, Nov 12, 2020 at 5:12 PM Hongtao Liu  wrote:
> >
> > On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak  wrote:
> > >
> > > On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu  wrote:
> > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR target/97194
> > > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New 
> > > > > > function.
> > > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > > * config/i386/sse.md (vec_set): Support both constant
> > > > > > and variable index vec_set.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > >
> > > > > +;; True for registers, or const_int_operand, used to vec_setm 
> > > > > expander.
> > > > > +(define_predicate "vec_setm_operand"
> > > > > +  (ior (and (match_operand 0 "register_operand")
> > > > > +(match_test "TARGET_AVX2"))
> > > > > +   (match_code "const_int")))
> > > > > +
> > > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word 
> > > > > shifts.
> > > > >  (define_predicate "reg_or_pm1_operand"
> > > > >(ior (match_operand 0 "register_operand")
> > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > index b153a87fb98..1798e5dea75 100644
> > > > > --- a/gcc/config/i386/sse.md
> > > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > > >  (define_expand "vec_set"
> > > > >[(match_operand:V 0 "register_operand")
> > > > > (match_operand: 1 "register_operand")
> > > > > -   (match_operand 2 "const_int_operand")]
> > > > > +   (match_operand 2 "vec_setm_operand")]
> > > > >
> > > > > You need to specify a mode, otherwise a register of any mode can pass 
> > > > > here.
> > > > >
> > > > Yes, theoretically, we only accept integer types. But in 
> > > > can_vec_set_var_idx_p
> > > > cut
> > > > ---
> > > > bool
> > > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > > {
> > > >   if (!VECTOR_MODE_P (vec_mode))
> > > > return false;
> > > >
> > > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > >
> > > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > > >
> > > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, 
> > > > reg1)
> > > >  && insn_operand_matches (icode, 1, reg2)
> > > >  && insn_operand_matches (icode, 2, reg3);
> > > > }
> > > > ---
> > > >
> > > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > > fail insn_operand_matches (icode, 2, reg3)
> > > > ---
> > > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > > $5 = false
> > > > (gdb)
> > > > ---
> > > >
> > > > Maybe we need to change
> > > >
> > > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > >
> > > > to
> > > >
> > > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > > >
> > > > cc Richard Biener, any thoughts?
> > >
> > > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > > specify SImode for operand 2 in vec_setM pattern and allow register
> > > operands. I wonder if and how they manage to generate the pattern.
> > >
> > > Uros.
> >
> > Variable index vec_set is enabled by r11-3486, about two months ago in
> > [1]. But for the upper two targets, the codes are already there since
> > GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> > those codes are for [1].
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
> >
> >
> > --
> > BR,
> > Hongtao
>
> Correct [1] 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
>
> --
> BR,
> Hongtao

In https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554592.html
it says:

> >> +can_vec_set_var_idx_p (enum tree_code code, machine_mode vec_mode,
> >> +  machine_mode value_mode, machine_mode idx_mode)
> >
> > toplevel comment missing
> >
> >> +{
> >> +  gcc_assert (code == VECTOR_TYPE);
> >
> > what's the point of pasing 'code' here then?  Since the optab only has a 
> > single
> > mode, the vector mode, the value_mode is redundant as well.  And I guess
> > we might want to handle "arbitrary" index modes?  That is, the .md expanders
> > shoul

Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
Hi,
with ipa-icf we often run into the problem that operand_equal_p does not
match ADDR_EXPRs that take the address of fields from two different instances
of the same class (at identical offsets).  A similar problem can also happen
for record types with LTO if they did not get tree merged.
This patch makes fold-const compare offsets rather than pointer
equality of FIELD_DECLs.  This is done in OEP_ADDRESS_OF mode only, since
it is not TBAA safe and the TBAA side should be correctly solved by my
ICF patch.
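
The situation described above can be sketched at the source level as
follows.  This is an illustration only: struct A, struct B, union AB and
fields_share_address are hypothetical names, not taken from the patch
or from any testcase.

```c
#include <stddef.h>

/* Two distinct record types with the same layout.  Their FIELD_DECLs
   are different objects inside the compiler, so comparing the fields
   by identity fails even though the offsets agree.  */
struct A { int x; double y; };
struct B { int x; double y; };

/* Overlay both records on the same storage.  */
union AB { struct A a; struct B b; };

/* Return nonzero when A.y and B.y designate the same address, both by
   comparing the field offsets and by comparing the actual pointers.  */
int
fields_share_address (void)
{
  union AB u;
  return offsetof (struct A, y) == offsetof (struct B, y)
	 && (const char *) &u.a.y == (const char *) &u.b.y;
}
```

Comparing offsets answers "same address?" here, while comparing the
field declarations themselves would say "different".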

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_compare::operand_equal_p): When comparing addresses
look into field offsets for COMPONENT_REFs.
(operand_compare::hash_operand): Likewise.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index c47557daeba..a4e8cccb1b7 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3312,9 +3312,41 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
case COMPONENT_REF:
  /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
 may be NULL when we're called to compare MEM_EXPRs.  */
- if (!OP_SAME_WITH_NULL (0)
- || !OP_SAME (1))
+ if (!OP_SAME_WITH_NULL (0))
return false;
+ /* Most of time we only need to compare FIELD_DECLs for equality.
+However when determining address look into actual offsets.
+These may match for unions and unshared record types.  */
+ if (!OP_SAME (1))
+   {
+ if (flags & OEP_ADDRESS_OF)
+   {
+ tree field0 = TREE_OPERAND (arg0, 1);
+ tree field1 = TREE_OPERAND (arg1, 1);
+ tree type0 = DECL_CONTEXT (field0);
+ tree type1 = DECL_CONTEXT (field1);
+
+ if (TREE_CODE (type0) == RECORD_TYPE
+ && DECL_BIT_FIELD_REPRESENTATIVE (field0))
+   field0 = DECL_BIT_FIELD_REPRESENTATIVE (field0);
+ if (TREE_CODE (type1) == RECORD_TYPE
+ && DECL_BIT_FIELD_REPRESENTATIVE (field1))
+   field1 = DECL_BIT_FIELD_REPRESENTATIVE (field1);
+ /* Assume that different FIELD_DECLs never overlap within a
+RECORD_TYPE.  */
+ if (type0 == type1 && TREE_CODE (type0) == RECORD_TYPE)
+   return false;
+ if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
+   DECL_FIELD_OFFSET (field1),
+   flags & ~OEP_ADDRESS_OF)
+ || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
+  DECL_FIELD_BIT_OFFSET (field1),
+  flags & ~OEP_ADDRESS_OF))
+   return false;
+   }
+ else
+   return false;
+   }
  flags &= ~OEP_ADDRESS_OF;
  return OP_SAME_WITH_NULL (2);
 
@@ -3787,9 +3819,28 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
  sflags = flags;
  break;
 
+   case COMPONENT_REF:
+ if (flags & OEP_ADDRESS_OF)
+   {
+ tree field = TREE_OPERAND (t, 1);
+ tree type = DECL_CONTEXT (field);
+
+ if (TREE_CODE (type) == RECORD_TYPE
+ && DECL_BIT_FIELD_REPRESENTATIVE (field))
+   field = DECL_BIT_FIELD_REPRESENTATIVE (field);
+
+ hash_operand (TREE_OPERAND (t, 0), hstate, flags);
+ hash_operand (DECL_FIELD_OFFSET (field),
+   hstate, flags & ~OEP_ADDRESS_OF);
+ hash_operand (DECL_FIELD_BIT_OFFSET (field),
+   hstate, flags & ~OEP_ADDRESS_OF);
+ hash_operand (TREE_OPERAND (t, 2), hstate,
+   flags & ~OEP_ADDRESS_OF);
+ return;
+   }
+ break;
case ARRAY_REF:
case ARRAY_RANGE_REF:
-   case COMPONENT_REF:
case BIT_FIELD_REF:
  sflags &= ~OEP_ADDRESS_OF;
  break;


Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 12, 2020 at 10:36:28AM +0100, Jan Hubicka wrote:
>   * fold-const.c (operand_compare::operand_equal_p): When comparing 
> addresses
>   look info field offsets for COMPONENT_REFs.
>   (operand_compare::hash_operand): Likewise.
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index c47557daeba..a4e8cccb1b7 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -3312,9 +3312,41 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
>   case COMPONENT_REF:
> /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
>may be NULL when we're called to compare MEM_EXPRs.  */
> -   if (!OP_SAME_WITH_NULL (0)
> -   || !OP_SAME (1))
> +   if (!OP_SAME_WITH_NULL (0))
>   return false;
> +   /* Most of time we only need to compare FIELD_DECLs for equality.
> +  However when determining address look into actual offsets.
> +  These may match for unions and unshared record types.  */
> +   if (!OP_SAME (1))
> + {
> +   if (flags & OEP_ADDRESS_OF)
> + {
> +   tree field0 = TREE_OPERAND (arg0, 1);
> +   tree field1 = TREE_OPERAND (arg1, 1);
> +   tree type0 = DECL_CONTEXT (field0);
> +   tree type1 = DECL_CONTEXT (field1);
> +
> +   if (TREE_CODE (type0) == RECORD_TYPE
> +   && DECL_BIT_FIELD_REPRESENTATIVE (field0))
> + field0 = DECL_BIT_FIELD_REPRESENTATIVE (field0);
> +   if (TREE_CODE (type1) == RECORD_TYPE
> +   && DECL_BIT_FIELD_REPRESENTATIVE (field1))
> + field1 = DECL_BIT_FIELD_REPRESENTATIVE (field1);
> +   /* Assume that different FIELD_DECLs never overlap within a
> +  RECORD_TYPE.  */
> +   if (type0 == type1 && TREE_CODE (type0) == RECORD_TYPE)
> + return false;
> +   if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
> + DECL_FIELD_OFFSET (field1),
> + flags & ~OEP_ADDRESS_OF)
> +   || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
> +DECL_FIELD_BIT_OFFSET (field1),
> +flags & ~OEP_ADDRESS_OF))

If it is an address, why do you need to handle
DECL_BIT_FIELD_REPRESENTATIVE?  Taking address of a bit-field is not allowed.
Perhaps just return false if the fields are bit-fields (or assert they
aren't), and just compare DECL_FIELD*OFFSET of the fields themselves?

Jakub



Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
> > + if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
> > +   DECL_FIELD_OFFSET (field1),
> > +   flags & ~OEP_ADDRESS_OF)
> > + || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
> > +  DECL_FIELD_BIT_OFFSET (field1),
> > +  flags & ~OEP_ADDRESS_OF))
> 
> If it is an address, why do you need to handle
> DECL_BIT_FIELD_REPRESENTATIVE?  Taking address of a bit-field is not allowed.
> Perhaps just return false if the fields are bit-fields (or assert they
> aren't), and just compare DECL_FIELD*OFFSET of the fields themselves?

I took the code from nonoverlapping_component_refs_p_1; however, in
compare_ao_refs I call compare_operands with OEP_ADDRESS for memory
operands, so it would be useful there.  I think it makes sense in that
context: in order to match memory accesses for equivalence, we first want
to check that they access the same memory location...

Honza
> 
>   Jakub
> 


Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 12, 2020 at 10:49:40AM +0100, Jan Hubicka wrote:
> > > +   if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
> > > + DECL_FIELD_OFFSET (field1),
> > > + flags & ~OEP_ADDRESS_OF)
> > > +   || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
> > > +DECL_FIELD_BIT_OFFSET (field1),
> > > +flags & ~OEP_ADDRESS_OF))
> > 
> > If it is an address, why do you need to handle
> > DECL_BIT_FIELD_REPRESENTATIVE?  Taking address of a bit-field is not 
> > allowed.
> > Perhaps just return false if the fields are bit-fields (or assert they
> > aren't), and just compare DECL_FIELD*OFFSET of the fields themselves?
> 
> I took the code from nonoverlapping_component_refs_p_1, however in
> compare_ao_refs I call compare_operands on OEP_ADDRESS for memory
> operands, so it would be useful there.  I think it makes sense in that
> context - in order to match memory accesses for equivalence we want to
> first compare that they access same memory location...

If OEP_ADDRESS is also used on non-addressable stuff, just to check
that two COMPONENT_REFs access the same memory, then just comparing
DECL_BIT_FIELD_REPRESENTATIVE is not sufficient; you could have:
struct S { int c; int a : 7, b : 1; };
struct T { int c; int a : 7, b : 1; };
and compare s->a vs. t->b with OEP_ADDRESS: the offsets of their
DECL_BIT_FIELD_REPRESENTATIVEs are the same, yet we don't want to say
the two bit-fields are the same.
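
A small sketch of why the two bit-fields must stay distinct, using the
same struct definitions.  The helper names a_survives_in_t and
b_write_keeps_a are hypothetical, and the code assumes (as is the case
in practice) that S and T get the same layout.

```c
#include <stddef.h>
#include <string.h>

struct S { int c; int a : 7, b : 1; };
struct T { int c; int a : 7, b : 1; };

/* Write S.a, copy the bytes into a T, and check that T.a sees the value
   while T.b stays zero: a and b live in the same storage unit but
   occupy different bits.  */
int
a_survives_in_t (void)
{
  struct S s;
  struct T t;
  memset (&s, 0, sizeof s);
  s.a = 5;
  memcpy (&t, &s, sizeof t);
  return t.a == 5 && t.b == 0;
}

/* Writing b must not disturb a, although both share one storage unit.  */
int
b_write_keeps_a (void)
{
  struct S s = { 0, 5, 0 };
  s.b = -1;	/* Sets the single bit, whether b is signed or unsigned.  */
  return s.a == 5 && s.b != 0;
}
```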

Jakub



[PATCH] tree-optimization/97806 - fix PRE expression post order

2020-11-12 Thread Richard Biener
This fixes the postorder compute for the case of multiple
expression leaders for a value.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
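
The idea of walking every leader of a value can be sketched outside of
GCC as follows.  This is a simplified model under stated assumptions,
not the code from tree-ssa-pre.c: struct expr, struct walk, visit_value,
visit_expr and postorder are hypothetical names, bitmaps are replaced by
small bool arrays, and callers must stay within MAX_EXPR expressions and
MAX_VAL value ids.

```c
#include <stdbool.h>

#define MAX_EXPR 16
#define MAX_VAL 16

/* An expression computes VALUE from up to two operand values
   (-1 means "no operand").  Several expressions may share a value.  */
struct expr { int value; int op_val[2]; bool in_set; };

struct walk
{
  const struct expr *exprs;
  int n_exprs;
  bool expr_visited[MAX_EXPR];
  bool val_visited[MAX_VAL];
  int post[MAX_EXPR];
  int n_post;
};

static void visit_expr (struct walk *w, int e);

/* Visit every expression in the set that computes VAL, not just one.  */
static void
visit_value (struct walk *w, int val)
{
  for (int e = 0; e < w->n_exprs; e++)
    if (w->exprs[e].in_set && w->exprs[e].value == val)
      visit_expr (w, e);
}

/* Recurse into the operand values first, then emit E (postorder).  */
static void
visit_expr (struct walk *w, int e)
{
  if (w->expr_visited[e])
    return;
  w->expr_visited[e] = true;
  for (int i = 0; i < 2; i++)
    {
      int v = w->exprs[e].op_val[i];
      if (v >= 0 && !w->val_visited[v])
	{
	  w->val_visited[v] = true;
	  visit_value (w, v);
	}
    }
  w->post[w->n_post++] = e;
}

/* Fill OUT with a postorder of the in-set expressions; return the count.  */
int
postorder (const struct expr *exprs, int n, int *out)
{
  struct walk w = { .exprs = exprs, .n_exprs = n };
  for (int e = 0; e < n; e++)
    if (exprs[e].in_set && !w.val_visited[exprs[e].value])
      {
	w.val_visited[exprs[e].value] = true;
	visit_value (&w, exprs[e].value);
      }
  for (int i = 0; i < w.n_post; i++)
    out[i] = w.post[i];
  return w.n_post;
}
```

With two expressions for the same operand value, both are emitted before
the expression that uses that value.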

2020-11-12  Richard Biener  

PR tree-optimization/97806
* tree-ssa-pre.c (pre_expr_DFS): New overload for visiting
values, visiting all leaders for a value.  Use a bitmap
for visited values.
(sorted_array_from_bitmap_set): Walk over values and adjust.

* gcc.dg/pr97806.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr97806.c | 16 
 gcc/tree-ssa-pre.c | 70 +++---
 2 files changed, 56 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr97806.c

diff --git a/gcc/testsuite/gcc.dg/pr97806.c b/gcc/testsuite/gcc.dg/pr97806.c
new file mode 100644
index 000..9ec3299c0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97806.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int b;
+long c;
+int g();
+void h(long *);
+void i(long *);
+void d() {
+  int e, f = b - e;
+  if (g())
+h(&c + f);
+  else
+i(&c + f);
+  __builtin_memset(0, 0, f * 8);
+}
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 10d223bd365..eb181735e7f 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -805,16 +805,36 @@ bitmap_set_free (bitmap_set_t set)
   bitmap_clear (&set->values);
 }
 
+static void
+pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap expr_visited,
+ bitmap val_visited, vec<pre_expr> &post);
+
+/* DFS walk leaders of VAL to their operands with leaders in SET, collecting
+   expressions in SET in postorder into POST.  */
+
+static void
+pre_expr_DFS (unsigned val, bitmap_set_t set, bitmap expr_visited,
+ bitmap val_visited, vec<pre_expr> &post)
+{
+  unsigned int i;
+  bitmap_iterator bi;
+
+  /* Iterate over all leaders and DFS recurse.  Borrowed from
+ bitmap_find_leader.  */
+  bitmap exprset = value_expressions[val];
+  EXECUTE_IF_AND_IN_BITMAP (exprset, &set->expressions, 0, i, bi)
+pre_expr_DFS (expression_for_id (i),
+ set, expr_visited, val_visited, post);
+}
 
 /* DFS walk EXPR to its operands with leaders in SET, collecting
expressions in SET in postorder into POST.  */
 
 static void
-pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap visited,
- hash_set > &leader_set,
- vec<pre_expr> &post)
+pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap expr_visited,
+ bitmap val_visited, vec<pre_expr> &post)
 {
-  if (!bitmap_set_bit (visited, get_expression_id (expr)))
+  if (!bitmap_set_bit (expr_visited, get_expression_id (expr)))
 return;
 
   switch (expr->kind)
@@ -829,12 +849,9 @@ pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap visited,
unsigned int op_val_id = VN_INFO (nary->op[i])->value_id;
/* If we already found a leader for the value we've
   recursed already.  Avoid the costly bitmap_find_leader.  */
-   if (!leader_set.add (op_val_id))
- {
-   pre_expr leader = bitmap_find_leader (set, op_val_id);
-   if (leader)
- pre_expr_DFS (leader, set, visited, leader_set, post);
- }
+   if (bitmap_bit_p (&set->values, op_val_id)
+   && bitmap_set_bit (val_visited, op_val_id))
+ pre_expr_DFS (op_val_id, set, expr_visited, val_visited, post);
  }
break;
   }
@@ -854,12 +871,10 @@ pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap visited,
if (!op[n] || TREE_CODE (op[n]) != SSA_NAME)
  continue;
unsigned op_val_id = VN_INFO (op[n])->value_id;
-   if (!leader_set.add (op_val_id))
- {
-   pre_expr leader = bitmap_find_leader (set, op_val_id);
-   if (leader)
- pre_expr_DFS (leader, set, visited, leader_set, post);
- }
+   if (bitmap_bit_p (&set->values, op_val_id)
+   && bitmap_set_bit (val_visited, op_val_id))
+ pre_expr_DFS (op_val_id,
+   set, expr_visited, val_visited, post);
  }
  }
break;
@@ -879,20 +894,15 @@ sorted_array_from_bitmap_set (bitmap_set_t set)
   vec result;
 
   /* Pre-allocate enough space for the array.  */
-  size_t len = bitmap_count_bits (&set->expressions);
-  result.create (len);
-  hash_set > leader_set (2*len);
-
-  auto_bitmap visited (&grand_bitmap_obstack);
-  bitmap_tree_view (visited);
-  FOR_EACH_EXPR_ID_IN_SET (set, i, bi)
-{
-  pre_expr expr = expression_for_id (i);
-  /* Hoist insertion calls us with a value-set we have to and with,
-do so.  */
-  if (bitmap_set_contains_value (set, get_expr_value_id (expr)))
-   pre_expr_DFS (expr, set, visited, leader_set, result);
-}
+  result.create (bitmap_count_bits (&set->expressions));
+
+  auto_bitmap expr_visited (&grand_bitmap_obstack)

[PATCH 0/2] Use Graphite for OpenACC "kernels" regions

2020-11-12 Thread Frederik Harwath


Hi,
the two following patches implement a new handling of the loops in
OpenACC "kernels" regions which is based on Graphite and which is meant
to replace the current handling based on the "parloops" pass.  This
extends the class of OpenACC codes using "kernels" regions that can be
analysed by GCC's OpenACC implementation considerably.
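
As an illustration of the kind of input this concerns, here are two
loops in "kernels" regions marked "auto", one without and one with a
loop-carried dependence.  The function names scale and prefix_sum are
made up for this sketch and are not from the patch's testsuite; without
-fopenacc the pragmas are ignored and the code runs sequentially.

```c
/* Every iteration writes a distinct element and reads only IN, so the
   iterations are independent and the loop could be parallelized.  */
void
scale (int n, float *restrict out, const float *restrict in, float k)
{
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
  {
#pragma acc loop auto
    for (int i = 0; i < n; i++)
      out[i] = k * in[i];
  }
}

/* Iteration i reads a[i - 1], which iteration i - 1 has just written,
   so a dependence analysis must keep this loop sequential.  */
void
prefix_sum (int n, float *a)
{
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop auto
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```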

We would like to incorporate this work into master soon, but further
work will be necessary in the next weeks to resolve some open questions,
clean up the code etc. In particular, the patches cannot be applied on
master currently because they rely on other patches which have not been
committed to master yet, e.g. the re-ordering of the OpenACC passes to
run device lowering after Graphite which has recently been submitted
(subject "Move pass_oacc_device_lower after pass_graphite"), the
transformation pass which converts OpenACC kernels regions to parallel
regions from OG10 (commit 809ea59722263eb6c2d48402e1eed80727134038).

Best regards,
Frederik


Frederik Harwath (2):
  [WIP] OpenACC: Add Graphite-based handling of "auto" loops
  OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels

 gcc/c-family/c.opt|   5 +-
 gcc/common.opt|   8 +
 gcc/doc/invoke.texi   |  10 +-
 gcc/doc/passes.texi   |   6 +-
 gcc/flag-types.h  |   1 +
 gcc/gimple-pretty-print.c |   3 +
 gcc/gimple.h  |   9 +-
 gcc/gimplify.c|   1 +
 gcc/graphite-dependences.c|  12 +-
 gcc/graphite-isl-ast-to-gimple.c  |  77 +-
 gcc/graphite-oacc.h   |  90 ++
 gcc/graphite-scop-detection.c | 828 ++
 gcc/graphite-sese-to-poly.c   |  26 +-
 gcc/graphite.c| 403 -
 gcc/graphite.h|  11 +-
 gcc/internal-fn.h |   7 +-
 gcc/omp-expand.c  |  89 +-
 gcc/omp-general.c |  19 +-
 gcc/omp-general.h |   1 +
 gcc/omp-low.c |  76 +-
 gcc/omp-oacc-kernels.c|  59 +-
 gcc/omp-offload.c | 223 -
 gcc/predict.c |   2 +-
 .../goacc/kernels-conversion-parloops.c   |  61 ++
 .../c-c++-common/goacc/kernels-conversion.c   |  12 +-
 .../graphite/alias-0-no-runtime-check.c   |  20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c   |  22 +
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 gcc/tree-chrec-oacc.h |  45 +
 gcc/tree-chrec.c  |  16 +-
 gcc/tree-data-ref.c   | 112 ++-
 gcc/tree-data-ref.h   |   8 +-
 gcc/tree-loop-distribution.c  |  17 +-
 gcc/tree-parloops.c   |  16 +-
 gcc/tree-scalar-evolution.c   | 257 +-
 gcc/tree-ssa-loop-ivcanon.c   |   9 +-
 gcc/tree-ssa-loop-niter.c |  13 +
 gcc/tree-ssa-loop.c   |  10 +
 39 files changed, 2265 insertions(+), 377 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 gcc/tree-chrec-oacc.h

--
2.17.1
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter


[PATCH 1/2] [WIP] OpenACC: Add Graphite-based handling of "auto" loops

2020-11-12 Thread Frederik Harwath


This patch enables the use of Graphite for the analysis of OpenACC
"auto" loops. The goal is to decide if a loop may be parallelized
(i.e. converted to an "independent" loop) or not.  Graphite and the
functionality on which it relies (scalar evolution, data references) are
extended to interpret the internal representation of OpenACC loop
constructs that is encoded (e.g. through calls to OpenACC-specific
internal functions) in the OpenACC outlined functions (".omp_fn") and to
ignore some artifacts of the outlining process that are not relevant for
the analysis of the original loops (e.g. pointers introduced for the
purpose of offloading are irrelevant to the question of whether the
original loops can be parallelized). This is done in a way that
does not impact code which does not use OpenACC.  Furthermore, Graphite
gains functionality that broadens its applicability to
real-world code (e.g. runtime alias checking).  The OpenACC lowering is
extended to use the result of Graphite's analysis to assign
"independent" clauses to loops.
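
To illustrate what a runtime alias check buys, here is a minimal
hand-written sketch in C. It is not the code Graphite generates; the
function names are hypothetical and only show the idea of guarding a
loop with an overlap test when aliasing cannot be resolved statically.

```c
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if the N-element int ranges starting at A and B do not
   overlap.  The comparison is done on integer addresses to avoid a
   relational comparison of pointers into different objects.  */
int
ranges_disjoint (const int *a, const int *b, size_t n)
{
  uintptr_t ua = (uintptr_t) a, ub = (uintptr_t) b;
  uintptr_t len = (uintptr_t) (n * sizeof (int));
  return ua + len <= ub || ub + len <= ua;
}

/* Hand-written analogue of a loop guarded by a runtime alias check
   (hypothetical example, not compiler output).  */
void
shift_add (int *dst, const int *src, size_t n)
{
  if (ranges_disjoint (dst, src, n))
    {
      /* No overlap: the iterations are independent of each other, so
         this version of the loop could be run in parallel.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
    }
  else
    {
      /* Possible overlap: keep the original sequential order.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
    }
}
```

Both branches compute the same result here; the point is only that the
first branch is known to be free of loop-carried dependences.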
---
 gcc/common.opt                                |   8 +
 gcc/graphite-dependences.c                    |  12 +-
 gcc/graphite-isl-ast-to-gimple.c              |  77 +-
 gcc/graphite-oacc.h                           |  90 ++
 gcc/graphite-scop-detection.c                 | 828 ++
 gcc/graphite-sese-to-poly.c                   |  26 +-
 gcc/graphite.c                                | 403 -
 gcc/graphite.h                                |  11 +-
 gcc/internal-fn.h                             |   7 +-
 gcc/omp-expand.c                              |  26 +-
 gcc/omp-offload.c                             | 173 +++-
 gcc/predict.c                                 |   2 +-
 .../graphite/alias-0-no-runtime-check.c       |  20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c       |  22 +
 gcc/tree-chrec-oacc.h                         |  45 +
 gcc/tree-chrec.c                              |  16 +-
 gcc/tree-data-ref.c                           | 112 ++-
 gcc/tree-data-ref.h                           |   8 +-
 gcc/tree-loop-distribution.c                  |  17 +-
 gcc/tree-scalar-evolution.c                   | 257 +-
 gcc/tree-ssa-loop-ivcanon.c                   |   9 +-
 gcc/tree-ssa-loop-niter.c                     |  13 +
 23 files changed, 1870 insertions(+), 333 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/tree-chrec-oacc.h

diff --git a/gcc/common.opt b/gcc/common.opt
index dfed6ec76ba..caaeaa1aa6f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1600,6 +1600,14 @@ fgraphite-identity
 Common Report Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-non-affine-accesses
+Common Report Var(flag_graphite_non_affine_accesses) Init(0)
+Allow Graphite to handle non-affine data accesses.
+
+fgraphite-runtime-alias-checks
+Common Report Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loops if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 7078c949800..76ba027cdf3 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -82,7 +82,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
  {
if (dump_file)
  {
-   fprintf (dump_file, "Adding read to depedence graph: ");
+   fprintf (dump_file, "Adding read to dependence graph: ");
print_pdr (dump_file, pdr);
  }
isl_union_map *um
@@ -90,7 +90,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
reads = isl_union_map_union (reads, um);
if (dump_file)
  {
-   fprintf (dump_file, "Reads depedence graph: ");
+   fprintf (dump_file, "Reads dependence graph: ");
print_isl_union_map (dump_file, reads);
  }
  }
@@ -98,7 +98,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
  {
if (dump_file)
  {
-   fprintf (dump_file, "Adding must write to depedence graph: ");
+   fprintf (dump_file, "Adding must write to dependence graph: ");
print_pdr (dump_file, pdr);
  }
isl_union_map *um
@@ -106,7 +106,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
must_writes = isl_union_map_union (must_writes, um);
if (dump_file)
 

[PATCH 2/2] OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels

2020-11-12 Thread Frederik Harwath


This patch changes the "kernels" conversion to route loops in OpenACC
"kernels" regions through Graphite. This is done by converting the loops
in "kernels" regions which are not yet known to be "independent" to
"auto" loops as in the current (OG10) "parloops" based "kernels"
handling. Afterwards, the "kernels" regions will be treated
essentially like "parallel" regions. A new internal target kind, however,
still makes it possible to distinguish between the two types of regions,
which is useful for diagnostic messages.
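
As a rough illustration of the kind of input this concerns, the sketch
below shows a loop in a "kernels" region without an explicit
"independent" clause, so the compiler has to decide on its own whether
it may be parallelized. The function name and the data clause are made
up for this example; it would be built with -fopenacc (and, per this
patch, -fopenacc-kernels=split as the default handling).

```c
#include <stddef.h>

/* Scale N floats in place.  The loop inside the 'kernels' region has no
   'independent' clause, so dependence analysis must show that it can be
   parallelized.  Without -fopenacc the pragma is ignored and the loop
   simply runs sequentially.  */
void
scale (float *x, size_t n, float factor)
{
#pragma acc kernels copy(x[0:n])
  {
    for (size_t i = 0; i < n; i++)
      x[i] = x[i] * factor;
  }
}
```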

The old "parloops" based "kernels" handling will be deprecated, but is
still available through the command line options
"-fopenacc-kernels=split-parloops" and "-fopenacc-kernels=parloops".
---
 gcc/c-family/c.opt                            |  5 +-
 gcc/doc/invoke.texi                           | 10 ++-
 gcc/doc/passes.texi                           |  6 +-
 gcc/flag-types.h                              |  1 +
 gcc/gimple-pretty-print.c                     |  3 +
 gcc/gimple.h                                  |  9 ++-
 gcc/gimplify.c                                |  1 +
 gcc/omp-expand.c                              | 63 +--
 gcc/omp-general.c                             | 19 -
 gcc/omp-general.h                             |  1 +
 gcc/omp-low.c                                 | 76 +++
 gcc/omp-oacc-kernels.c                        | 59 --
 gcc/omp-offload.c                             | 50 +++-
 .../goacc/kernels-conversion-parloops.c       | 61 +++
 .../c-c++-common/goacc/kernels-conversion.c   | 12 +--
 .../gfortran.dg/goacc/kernels-reductions.f90  | 37 +
 gcc/tree-parloops.c                           | 16 +++-
 gcc/tree-ssa-loop.c                           | 10 +++
 18 files changed, 395 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4ef7ea76aa1..255ff84ca4b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1747,7 +1747,7 @@ Specify default OpenACC compute dimensions.

 fopenacc-kernels=
 C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
--fopenacc-kernels=[split|parloops] Configure OpenACC 'kernels' constructs handling.
+-fopenacc-kernels=[split|split-parloops|parloops]  Configure OpenACC 'kernels' constructs handling.

 Enum
 Name(openacc_kernels) Type(enum openacc_kernels)
@@ -1755,6 +1755,9 @@ Name(openacc_kernels) Type(enum openacc_kernels)
 EnumValue
 Enum(openacc_kernels) String(split) Value(OPENACC_KERNELS_SPLIT)

+EnumValue
+Enum(openacc_kernels) String(split-parloops) Value(OPENACC_KERNELS_SPLIT_PARLOOPS)
+
 EnumValue
 Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fe04b4d8e6a..d713d6ae8ab 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2266,12 +2266,20 @@ permitted.
 @opindex fopenacc-kernels
 @cindex OpenACC accelerator programming
 Configure OpenACC 'kernels' constructs handling.
+
 With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
 are split into a sequence of compute constructs, each then handled
-individually.
+individually. The data dependence analysis that is necessary to
+determine if loops can be parallelized is performed by the Graphite
+pass.
 This is the default.
+With @option{-fopenacc-kernels=split-parloops}, OpenACC 'kernels' constructs
+are split into a sequence of compute constructs, each then handled
+individually.
+This is deprecated.
 With @option{-fopenacc-kernels=parloops}, the whole OpenACC
 'kernels' constructs is handled by the @samp{parloops} pass.
+This is deprecated.

 @item -fopenmp
 @opindex fopenmp
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 7424690dac3..5dda056a2bb 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -248,9 +248,9 @@ constraints in order to generate the points-to sets.  It is located in

 This is a pass group for processing OpenACC kernels regions.  It is a
 subpass of the IPA OpenACC pass group that runs on offloaded functions
-containing OpenACC kernels loops.  It is located in
-@file{tree-ssa-loop.c} and is described by
-@code{pass_ipa_oacc_kernels}.
+containing OpenACC kernels loops if @samp{parloops} based handling of
+kernels regions is used. It is located in @file{tree-ssa-loop.c} and
+is described by @code{pass_ipa_oacc_kernels}.

 @item Target clone

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index e2255a56745..058c4e214af 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -376,6 +376,7 @@ enum cf_protection_level
 enum openacc_kernels
 {
   OPENACC_KERNELS_SPLIT,
+  OPENACC_KERNELS_SPLIT_PARLOOPS,
   OPENACC_KERNELS_PARLOOPS
 };

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 54a6d318dc5..b4a2

[PATCH][RFC] Make mingw-w64 printf/scanf attribute alias to ms_printf/ms_scanf only for C89

2020-11-12 Thread Jonathan Yong via Gcc-patches

The libgomp build fails because of a false -Wformat error, even though:
1. Correct C99 inttypes.h macros are used.
2. __mingw_* C99 wrappers are used.
3. The printf attribute is used, but it was aliased to ms_printf.

The attached patch makes mingw-w64 printf attribute equivalent to other 
platforms on C99 or later. This allows libgomp to build again with 
-Werror on. This patch should not affect the original mingw.org 
distribution in any way.


For C99 or later, the mingw-w64 headers already wrap printf/scanf 
properly, and inttypes.h also gives the correct C99 specifiers, so it 
makes sense to treat the printf attribute as C99 compliant. Under C89 
mode, the headers would produce MS specific specifiers, so the printf 
attribute under C89 reverts to the old behavior of being aliased to 
ms_printf.
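
The lookup this implies can be sketched as follows. This is a
simplified, self-contained model and not the patch itself: the struct,
the table contents, and the toy_* names are assumptions made for
illustration, and a plain flag stands in for the language-standard
argument. Only the field names named_attr_src/named_attr_dst and the
NULL-terminated walk are taken from the diff.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for GCC's target_ovr_attr; field names follow the patch.  */
struct toy_ovr_attr
{
  const char *named_attr_src;
  const char *named_attr_dst;
};

/* NULL-terminated override table (contents assumed for this sketch).  */
static const struct toy_ovr_attr toy_c89_overrides[] = {
  { "ms_printf", "printf" },
  { "ms_scanf", "scanf" },
  { NULL, NULL }
};

/* Hypothetical callback: return the override table for C89, or NULL
   when no overrides apply (C99 or later).  */
static const struct toy_ovr_attr *
toy_format_overrides (int c99_or_later)
{
  return c99_or_later ? NULL : toy_c89_overrides;
}

/* Map ATTR_NAME to the format name the target wants to be checked.  */
const char *
toy_convert_format_name (const char *attr_name, int c99_or_later)
{
  const struct toy_ovr_attr *ovr = toy_format_overrides (c99_or_later);
  if (ovr != NULL)
    for (size_t i = 0; ovr[i].named_attr_src != NULL; i++)
      {
        /* Already the target-specific name: keep it.  */
        if (strcmp (ovr[i].named_attr_src, attr_name) == 0)
          return attr_name;
        /* Generic name: redirect it to the target-specific one.  */
        if (strcmp (ovr[i].named_attr_dst, attr_name) == 0)
          return ovr[i].named_attr_src;
      }
  /* The real function goes on to select a gnu_* format; omitted here.  */
  return attr_name;
}
```

With this model, "printf" maps to "ms_printf" in C89 mode and stays
"printf" otherwise.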


However, this might break other code that assumes otherwise. I don't 
think there is a solution that satisfies everyone, but at least this allows 
C99/C++11 compliant code to build again with -Werror. Comments?
From 397097aba5a098f18da92704e9ca7560adb4f29c Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Thu, 12 Nov 2020 06:45:00 +
Subject: [PATCH] gcc: make mingw-w64 printf/scanf attribute gnu equivalents in
 C99

Makes printf/scanf attribute equivalent to gnu_printf and gnu_scanf
in C99 mode. Fixes false positive warnings for functions with printf
attribute, even when correct C99 style printf specifiers are used.

12-11-2020  Jonathan Yong  <10wa...@gmail.com>

gcc/ChangeLog:

	* c-family/c-format.c
	(convert_format_name_to_system_name):
	Turn TARGET_OVERRIDES_FORMAT_ATTRIBUTES into a callback
	that returns a list of attribute aliases.
	* config/i386/mingw-w64.h (TARGET_OVERRIDES_FORMAT_C89):
	Define. Tell GCC to use ms_printf and ms_scanf in C89.
	* config/i386/mingw32.h:
	(TARGET_OVERRIDES_FORMAT_ATTRIBUTES):
	Point to wrapper function and update description.
	(TARGET_OVERRIDES_FORMAT_ATTRIBUTES_COUNT): remove.
	* config/i386/msformat-c.c:
	(mingw_format_attribute_overrides):
	New callback.
	(mingw_format_attribute_overrides_table): make null
	terminated.
---
 gcc/c-family/c-format.c      | 28 
 gcc/config/i386/mingw-w64.h  |  4 
 gcc/config/i386/mingw32.h    |  8 ++--
 gcc/config/i386/msformat-c.c | 23 ---
 4 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 77d24ad94e4..4edf4c64f79 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -5077,7 +5077,7 @@ extern const format_kind_info TARGET_FORMAT_TYPES[];
 #endif
 
 #ifdef TARGET_OVERRIDES_FORMAT_ATTRIBUTES
-extern const target_ovr_attr TARGET_OVERRIDES_FORMAT_ATTRIBUTES[];
+extern const target_ovr_attr* TARGET_OVERRIDES_FORMAT_ATTRIBUTES (int format_std_version);
 #endif
 #ifdef TARGET_OVERRIDES_FORMAT_INIT
   extern void TARGET_OVERRIDES_FORMAT_INIT (void);
@@ -5102,6 +5102,9 @@ static const char *
 convert_format_name_to_system_name (const char *attr_name)
 {
   int i;
+#ifdef TARGET_OVERRIDES_FORMAT_ATTRIBUTES
+  const target_ovr_attr* override_attributes;
+#endif
 
   if (attr_name == NULL || *attr_name == 0
   || strncmp (attr_name, "gcc_", 4) == 0)
@@ -5112,18 +5115,19 @@ convert_format_name_to_system_name (const char *attr_name)
 
 #ifdef TARGET_OVERRIDES_FORMAT_ATTRIBUTES
   /* Check if format attribute is overridden by target.  */
-  if (TARGET_OVERRIDES_FORMAT_ATTRIBUTES != NULL
-  && TARGET_OVERRIDES_FORMAT_ATTRIBUTES_COUNT > 0)
+  override_attributes = TARGET_OVERRIDES_FORMAT_ATTRIBUTES(C_STD_VER);
+  if (override_attributes != NULL)
 {
-  for (i = 0; i < TARGET_OVERRIDES_FORMAT_ATTRIBUTES_COUNT; ++i)
-{
-  if (cmp_attribs (TARGET_OVERRIDES_FORMAT_ATTRIBUTES[i].named_attr_src,
-			   attr_name))
-return attr_name;
-  if (cmp_attribs (TARGET_OVERRIDES_FORMAT_ATTRIBUTES[i].named_attr_dst,
-			   attr_name))
-return TARGET_OVERRIDES_FORMAT_ATTRIBUTES[i].named_attr_src;
-}
+  for (i = 0;
+	   override_attributes[i].named_attr_src != NULL
+	   && override_attributes[i].named_attr_dst != NULL;
+	   ++i)
+	{
+	if (cmp_attribs (override_attributes[i].named_attr_src, attr_name))
+	  return attr_name;
+	if (cmp_attribs (override_attributes[i].named_attr_dst, attr_name))
+	  return override_attributes[i].named_attr_src;
+  }
 }
 #endif
   /* Otherwise default to gnu format.  */
diff --git a/gcc/config/i386/mingw-w64.h b/gcc/config/i386/mingw-w64.h
index 0d0aa939996..586d151a082 100644
--- a/gcc/config/i386/mingw-w64.h
+++ b/gcc/config/i386/mingw-w64.h
@@ -104,3 +104,7 @@ along with GCC; see the file COPYING3.  If not see
original mingw32.  */
 #undef TARGET_LIBC_HAS_FUNCTION
 #define TARGET_LIBC_HAS_FUNCTION gnu_libc_has_function
+
+/* alias printf/scanf attributes to MS variants when in C89 */
+#undef TARGET_OVERRIDES_FORMAT_C89
+#define TARGET_OVERRIDES_FORMAT_C89
diff --git a/gcc/config/i386/mingw32.h b

Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
> On Thu, Nov 12, 2020 at 10:49:40AM +0100, Jan Hubicka wrote:
> > > > + if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
> > > > +   DECL_FIELD_OFFSET (field1),
> > > > +   flags & ~OEP_ADDRESS_OF)
> > > > + || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
> > > > +  DECL_FIELD_BIT_OFFSET (field1),
> > > > +  flags & ~OEP_ADDRESS_OF))
> > > 
> > > If it is an address, why do you need to handle
> > > DECL_BIT_FIELD_REPRESENTATIVE?  Taking address of a bit-field is not
> > > allowed.
> > > Perhaps just return false if the fields are bit-fields (or assert they
> > > aren't), and just compare DECL_FIELD*OFFSET of the fields themselves?
> > 
> > I took the code from nonoverlapping_component_refs_p_1, however in
> > compare_ao_refs I call compare_operands on OEP_ADDRESS for memory
> > operands, so it would be useful there.  I think it makes sense in that
> > context - in order to match memory accesses for equivalence we want to
> > first compare that they access the same memory location...
> 
> If OEP_ADDRESS is used also on non-addressable stuff, just to compare
> that two COMPONENT_REFs access the same memory, then just comparing
> DECL_BIT_FIELD_REPRESENTATIVE is not sufficient, you could have:
> struct S { int c; int a : 7, b : 1; };
> struct T { int c; int a : 7, b : 1; };
> and compare s->a vs. t->b with OEP_ADDRESS and the offsets of their
> DECL_BIT_FIELD_REPRESENTATIVE is the same, yet we don't want to say
> the two bit-fields are the same.

You are right, I was just thinking of that.  I suppose it indeed makes
more sense to assert that there are no bitfields here and in the AO
comparison take care of stripping the last bitfield reference and
handling it specially?

Honza
> 
>   Jakub
> 


Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 12, 2020 at 11:29:21AM +0100, Jan Hubicka wrote:
> > If OEP_ADDRESS is used also on non-addressable stuff, just to compare
> > that two COMPONENT_REFs access the same memory, then just comparing
> > DECL_BIT_FIELD_REPRESENTATIVE is not sufficient, you could have:
> > struct S { int c; int a : 7, b : 1; };
> > struct T { int c; int a : 7, b : 1; };
> > and compare s->a vs. t->b with OEP_ADDRESS and the offsets of their
> > DECL_BIT_FIELD_REPRESENTATIVE is the same, yet we don't want to say
> > the two bit-fields are the same.
> 
> You are right, I was just thinking of that.  I suppose it indeed makes
> more sense to assert that there are no bitfields here and in the AO
> comparison take care of stripping the last bitfield reference and
> handling it specially?

Or just compare DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET of the fields
rather than their DECL_BIT_FIELD_REPRESENTATIVE?

Jakub



Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
> On Thu, Nov 12, 2020 at 11:29:21AM +0100, Jan Hubicka wrote:
> > > If OEP_ADDRESS is used also on non-addressable stuff, just to compare
> > > that two COMPONENT_REFs access the same memory, then just comparing
> > > DECL_BIT_FIELD_REPRESENTATIVE is not sufficient, you could have:
> > > struct S { int c; int a : 7, b : 1; };
> > > struct T { int c; int a : 7, b : 1; };
> > > and compare s->a vs. t->b with OEP_ADDRESS and the offsets of their
> > > DECL_BIT_FIELD_REPRESENTATIVE is the same, yet we don't want to say
> > > the two bit-fields are the same.
> > 
> > You are right, I was just thinking of that.  I suppose it indeed makes
> > more sense to assert that there are no bitfields here and in the AO
> > comparison take care of stripping the last bitfield reference and
> > handling it specially?
> 
> Or just compare DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET of the fields
> rather than their DECL_BIT_FIELD_REPRESENTATIVE?

I think I will need to compare bitfields specially at the ao_ref_compare
side anyway to distinguish

 struct S { int c; int a : 5, b : 1; };
 struct T { int c; int a : 5, b : 3; };

s->b and t->b. Those have the same base address (bitwise) but still we do
not want to consider them equal.
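
A small standalone C sketch of why equal bit positions are not enough:
the two 'b' fields below start at the same place but hold different
ranges of values. The helper names are made up for illustration, and
the bit-fields are declared unsigned (unlike the structs above) so that
out-of-range stores wrap in a well-defined way.

```c
/* Same layout up to 'b'; 'b' is 1 bit wide in S and 3 bits wide in T.  */
struct S { int c; unsigned a : 5, b : 1; };
struct T { int c; unsigned a : 5, b : 3; };

/* Store V into the 1-bit field and read it back.  */
unsigned
store_b_in_S (unsigned v)
{
  struct S s = { 0, 0, 0 };
  s.b = v;   /* Reduced modulo 2.  */
  return s.b;
}

/* Store V into the 3-bit field and read it back.  */
unsigned
store_b_in_T (unsigned v)
{
  struct T t = { 0, 0, 0 };
  t.b = v;   /* Reduced modulo 8.  */
  return t.b;
}
```

Storing 7 yields 1 through S but 7 through T, so treating s->b and t->b
as the same access would be wrong even though they share a bit offset.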

So handling this in operand_equal_p is probably not that useful, and
perhaps an extra sanity check would be.  I will fire up a bootstrap to see
if there is any other place that trips bitfields with OEP_ADDRESS_OF.

Thanks,
Honza


Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 12, 2020 at 11:39:07AM +0100, Jan Hubicka wrote:
> > On Thu, Nov 12, 2020 at 11:29:21AM +0100, Jan Hubicka wrote:
> > > > If OEP_ADDRESS is used also on non-addressable stuff, just to compare
> > > > that two COMPONENT_REFs access the same memory, then just comparing
> > > > DECL_BIT_FIELD_REPRESENTATIVE is not sufficient, you could have:
> > > > struct S { int c; int a : 7, b : 1; };
> > > > struct T { int c; int a : 7, b : 1; };
> > > > and compare s->a vs. t->b with OEP_ADDRESS and the offsets of their
> > > > DECL_BIT_FIELD_REPRESENTATIVE is the same, yet we don't want to say
> > > > the two bit-fields are the same.
> > > 
> > > You are right, I was just thinking of that.  I suppose it indeed makes
> > > more sense to assert that there are no bitfields here and in the AO
> > > comparison take care of stripping the last bitfield reference and
> > > handling it specially?
> > 
> > Or just compare DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET of the fields
> > rather than their DECL_BIT_FIELD_REPRESENTATIVE?
> 
> I think I will need to compare bitfields specially at the ao_ref_compare
> side anyway to distinguish
> 
>  struct S { int c; int a : 5, b : 1; };
>  struct T { int c; int a : 5, b : 3; };
> 
> s->b and t->b. Those have the same base address (bitwise) but still we do
> not want to consider them equal.

How is that different from:
struct S { long long d; int e; };
struct T { long long d; long long e; };
s->e vs. t->e ?
One thing is comparison of the address (as it is comparing
DECL_FIELD_BIT_OFFSET too, it is essentially bit-address), and another thing
(unrelated to OEP_ADDRESS comparisons) is if you need to ensure the access
has the same size, in that case you just compare the bit size of the access...
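
The distinction can be seen in plain C, as a sketch outside of GCC's
internals: the two 'e' members below start at the same offset, and only
a separate size comparison tells the accesses apart. The helper names
are hypothetical.

```c
#include <stddef.h>

struct S { long long d; int e; };
struct T { long long d; long long e; };

/* Nonzero if 'e' starts at the same byte offset in both structs
   (the "address" part of the comparison).  */
int
same_start (void)
{
  return offsetof (struct S, e) == offsetof (struct T, e);
}

/* Nonzero if an access to 'e' covers the same number of bytes in both
   structs (the separate "size" part of the comparison).  */
int
same_size (void)
{
  return sizeof (((struct S *) 0)->e) == sizeof (((struct T *) 0)->e);
}
```

On targets where int is narrower than long long, same_start () is
nonzero while same_size () is zero.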

Jakub



Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
> On Thu, Nov 12, 2020 at 11:39:07AM +0100, Jan Hubicka wrote:
> > > On Thu, Nov 12, 2020 at 11:29:21AM +0100, Jan Hubicka wrote:
> > > > > If OEP_ADDRESS is used also on non-addressable stuff, just to compare
> > > > > that two COMPONENT_REFs access the same memory, then just comparing
> > > > > DECL_BIT_FIELD_REPRESENTATIVE is not sufficient, you could have:
> > > > > struct S { int c; int a : 7, b : 1; };
> > > > > struct T { int c; int a : 7, b : 1; };
> > > > > and compare s->a vs. t->b with OEP_ADDRESS and the offsets of their
> > > > > DECL_BIT_FIELD_REPRESENTATIVE is the same, yet we don't want to say
> > > > > the two bit-fields are the same.
> > > > 
> > > > You are right, I was just thinking of that.  I suppose it indeed makes
> > > > more sense to assert that there are no bitfields here and in the AO
> > > > comparison take care of stripping the last bitfield reference and
> > > > handling it specially?
> > > 
> > > Or just compare DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET of the fields
> > > rather than their DECL_BIT_FIELD_REPRESENTATIVE?
> > 
> > I think I will need to compare bitfields specially at the ao_ref_compare
> > side anyway to distinguish
> > 
> >  struct S { int c; int a : 5, b : 1; };
> >  struct T { int c; int a : 5, b : 3; };
> > 
> > s->b and t->b. Those have the same base address (bitwise) but still we do
> > not want to consider them equal.
> 
> How is that different from:
> struct S { long long d; int e; };
> struct T { long long d; long long e; };
> s->e vs. t->e ?
> One thing is comparison of the address (as it is comparing
> DECL_FIELD_BIT_OFFSET too, it is essentially bit-address), and another thing
> (unrelated to OEP_ADDRESS comparisons) is if you need to ensure the access
> has the same size, in that case you just compare the bit size of the access...

I believe for that I only need to compare TYPE_SIZE of the access.
Bitfields are special since their types are wider and they do extension.

Honza
> 
>   Jakub
> 


Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
> > How is that different from:
> > struct S { long long d; int e; };
> > struct T { long long d; long long e; };
> > s->e vs. t->e ?
> > One thing is comparison of the address (as it is comparing
> > DECL_FIELD_BIT_OFFSET too, it is essentially bit-address), and another thing
> > (unrelated to OEP_ADDRESS comparisons) is if you need to ensure the access
> > has the same size, in that case you just compare the bit size of the
> > access...
> 
> I believe for that I only need to compare TYPE_SIZE of the access.
> Bitfields are special since their types are wider and they do extension.

So the extra sanity check fires in

0xe3d093 operand_compare::operand_equal_p(tree_node const*, tree_node const*, unsigned int)
        ../../gcc/fold-const.c:3329
0xe4066d operand_equal_p(tree_node const*, tree_node const*, unsigned int)
        ../../gcc/fold-const.c:3949
0x167fad5 refs_hasher::equal(ref_to_bb const*, ref_to_bb const*)
        ../../gcc/tree-ssa-phiopt.c:2152
0x1680555 hash_table::find_slot_with_hash(ref_to_bb* const&, unsigned int, insert_option)
        ../../gcc/hash-table.h:981
0x16802d6 hash_table::find_slot(ref_to_bb* const&, insert_option)
        ../../gcc/hash-table.h:435
0x167b4f2 nontrapping_dom_walker::add_or_mark_expr(basic_block_def*, tree_node*, bool)
        ../../gcc/tree-ssa-phiopt.c:2261
0x167b380 nontrapping_dom_walker::before_dom_children(basic_block_def*)
        ../../gcc/tree-ssa-phiopt.c:2211
0x23705e1 dom_walker::walk(basic_block_def*)
        ../../gcc/domwalk.c:309
0x167b637 get_non_trapping
        ../../gcc/tree-ssa-phiopt.c:2308
0x167410e tree_ssa_phiopt_worker
        ../../gcc/tree-ssa-phiopt.c:177
0x1673f65 tree_ssa_cs_elim
        ../../gcc/tree-ssa-phiopt.c:127
0x167d84c execute
        ../../gcc/tree-ssa-phiopt.c:3192

This makes sort of sense: the refs hasher is trying to prove that two
accesses go to exactly the same location in order to eliminate NULL
pointer checks.  If my ao_ref_compare gets approved (which, I guess, is
not at all clear at this point), I suppose we could use it here.  We only
want the accesses to have the same address and size; we do not care about
TBAA properties here.

It is not obvious to me how this optimization is safe without
-fnon-call-exceptions though.

I will test the patch without the assert for now.

Honza


[Patch] Fortran: improve location data for OpenACC/OpenMP directives [PR97782]

2020-11-12 Thread Tobias Burnus

For code like
 !$acc kernels
... a lot of loops and other code
 !$acc end kernels

gfortran generates
  #pragma ..._kernels
{
  ... lot of code
}

As the PR shows, the location associated with the #pragma
is not the 'acc kernels' line but one near the 'acc end kernels'
line.

The reason is that first the {...} code is generated and only then
the outer #pragma. And using input_location as location then points to
the wrong line:
 ...
 oacc_clauses = gfc_trans_omp_clauses (...) // fine so far
 stmt = gfc_trans_omp_code (code->block->next, true); // translates {...}
 stmt = build2_loc (input_location, construct_code, ... // wrong location
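
The ordering problem can be modelled with a toy C sketch. None of the
names below are gfortran functions; a global integer stands in for
input_location, and translating the body is assumed to simply advance
it to the line after the region.

```c
/* Toy stand-in for a global "current location": the line currently
   being translated.  */
static int toy_input_line;

struct toy_code { int line; int n_body_lines; };
struct toy_stmt { int line; };

/* Translating the body advances the global location past the region.  */
static void
toy_translate_body (const struct toy_code *code)
{
  toy_input_line = code->line + code->n_body_lines + 1;
}

/* Old behaviour: the location is read after the body was translated.  */
struct toy_stmt
toy_build_with_input_location (const struct toy_code *code)
{
  toy_input_line = code->line;
  toy_translate_body (code);
  struct toy_stmt stmt = { toy_input_line };
  return stmt;
}

/* Patched behaviour: use the location stored with the directive.  */
struct toy_stmt
toy_build_with_code_location (const struct toy_code *code)
{
  toy_input_line = code->line;
  toy_translate_body (code);
  struct toy_stmt stmt = { code->line };
  return stmt;
}
```

For a directive on line 10 with a five-line body, the first variant
reports line 16 and the second reports line 10.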

This patch tries to fix this in most cases; I am sure I missed some and
others could be handled better.

In the testsuite, it affects two tests by moving a 'dg-message' line with
  optimized: assigned OpenACC gang loop parallelism
from the loop-line one up to the 'kernels'-line; I think either location
is fine.

The PR has a testcase (not included) which works with -fopt-info-omp-all.
In principle, it should also have an effect on warnings (if there are
any) and it unsurprisingly affects -fdump-tree-*-lineno.

Comments, remarks, does it look good to you?

Tobias

Fortran: improve location data for OpenACC/OpenMP directives [PR97782]

gcc/fortran/ChangeLog:

	PR fortran/97782
	* trans-openmp.c (gfc_trans_oacc_construct, gfc_trans_omp_parallel_do,
	gfc_trans_omp_parallel_do_simd, gfc_trans_omp_parallel_sections,
	gfc_trans_omp_parallel_workshare, gfc_trans_omp_sections,
	gfc_trans_omp_single, gfc_trans_omp_task, gfc_trans_omp_teams,
	gfc_trans_omp_target, gfc_trans_omp_target_data,
	gfc_trans_omp_workshare): Use code->loc instead of input_location
	when building the OMP_/OACC_ construct.

gcc/testsuite/ChangeLog:

	PR fortran/97782
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Move dg-message
	one line up.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.

 gcc/fortran/trans-openmp.c | 50 +++---
 .../goacc/classify-kernels-unparallelized.f95  |  4 +-
 .../gfortran.dg/goacc/classify-kernels.f95 |  4 +-
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d2559bd0c0a..6b4ad6a7050 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3922,8 +3922,8 @@ gfc_trans_oacc_construct (gfc_code *code)
   oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
 	code->loc, false, true);
   stmt = gfc_trans_omp_code (code->block->next, true);
-  stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
-		 oacc_clauses);
+  stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
+		 void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
 }
@@ -5351,8 +5351,8 @@ gfc_trans_omp_parallel_do (gfc_code *code, stmtblock_t *pblock,
 }
   else if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, NULL_TREE);
-  stmt = build2_loc (input_location, OMP_PARALLEL, void_type_node, stmt,
-		 omp_clauses);
+  stmt = build2_loc (gfc_get_location (&code->loc), OMP_PARALLEL,
+		 void_type_node, stmt, omp_clauses);
   OMP_PARALLEL_COMBINED (stmt) = 1;
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
@@ -5394,8 +5394,8 @@ gfc_trans_omp_parallel_do_simd (gfc_code *code, stmtblock_t *pblock,
 stmt = build3_v (BIND_EXPR, NULL, stmt, NULL_TREE);
   if (flag_openmp)
 {
-  stmt = build2_loc (input_location, OMP_PARALLEL, void_type_node, stmt,
-			 omp_clauses);
+  stmt = build2_loc (gfc_get_location (&code->loc), OMP_PARALLEL,
+			 void_type_node, stmt, omp_clauses);
   OMP_PARALLEL_COMBINED (stmt) = 1;
 }
   gfc_add_expr_to_block (&block, stmt);
@@ -5421,8 +5421,8 @@ gfc_trans_omp_parallel_sections (gfc_code *code)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
   else
 poplevel (0, 0);
-  stmt = build2_loc (input_location, OMP_PARALLEL, void_type_node, stmt,
-		 omp_clauses);
+  stmt = build2_loc (gfc_get_location (&code->loc), OMP_PARALLEL,
+		 void_type_node, stmt, omp_clauses);
   OMP_PARALLEL_COMBINED (stmt) = 1;
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
@@ -5444,8 +5444,8 @@ gfc_trans_omp_parallel_workshare (gfc_code *code)
   pushlevel ();
   stmt = gfc_trans_omp_workshare (code, &workshare_clauses);
   stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
-  stmt = build2_loc (input_location, OMP_PARALLEL, void_type_node, stmt,
-		 omp_clauses);
+  stmt = build2_loc (gfc_get_location (&code->loc), OMP_PARALLEL,
+		 void_type_node, stmt, omp_clauses);
   OMP_PARALLEL_COMBINED (s

[PATCH] Avoid PRE insert iteration when possible

2020-11-12 Thread Richard Biener
The following makes sure to only iterate PRE insertion when
necessary, which is when AVAIL_OUT of a predecessor of a
block we already visited changed (that is, backedge destinations).

To not regress, this also makes sure to locally iterate insertion,
since even a topological sort of the expressions isn't enough to
guarantee we get all opportunities of a block in one iteration.
This avoids a costly re-computation of the topologically sorted expression
array (more micro-optimization is possible here).
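
The iteration condition can be sketched on a toy CFG. This is not the
code in tree-ssa-pre.c; the struct and function names are hypothetical,
and visit_order merely models the order in which the insertion walk
reaches the blocks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CFG node: visit_order is the position of the block
   in the insertion walk, succs lists the successor block indices.  */
struct toy_block
{
  int visit_order;
  size_t n_succs;
  const int *succs;
};

/* Return true if a change to the AVAIL_OUT of block B can affect a
   block the walk has already visited, i.e. B has an edge to a block
   visited no later than B itself (a backedge destination).  Only then
   is another global iteration needed.  */
bool
change_requires_iteration (const struct toy_block *blocks, int b)
{
  for (size_t i = 0; i < blocks[b].n_succs; i++)
    {
      int succ = blocks[b].succs[i];
      if (blocks[succ].visit_order <= blocks[b].visit_order)
        return true;
    }
  return false;
}
```

In a CFG with edges 0->1, 1->2, 2->1 and 2->3, only a change in block 2
would require another iteration, because block 1 was already visited.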

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-12  Richard Biener  

* tree-ssa-pre.c (bitmap_value_replace_in_set): Return
whether we have changed anything.
(do_pre_regular_insertion): Get topologically sorted array
of expressions from caller.
(do_pre_partial_partial_insertion): Likewise.
(insert): Compute topologically sorted arrays of expressions
here and locally iterate actual insertion.  Iterate only
when AVAIL_OUT of an already visited block source changed.
---
 gcc/tree-ssa-pre.c | 86 ++
 1 file changed, 57 insertions(+), 29 deletions(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index eb181735e7f..9db1b0258f7 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -524,7 +524,7 @@ static struct
 static bool do_partial_partial;
 static pre_expr bitmap_find_leader (bitmap_set_t, unsigned int);
 static void bitmap_value_insert_into_set (bitmap_set_t, pre_expr);
-static void bitmap_value_replace_in_set (bitmap_set_t, pre_expr);
+static bool bitmap_value_replace_in_set (bitmap_set_t, pre_expr);
 static void bitmap_set_copy (bitmap_set_t, bitmap_set_t);
 static bool bitmap_set_contains_value (bitmap_set_t, unsigned int);
 static void bitmap_insert_into_set (bitmap_set_t, pre_expr);
@@ -974,14 +974,14 @@ bitmap_set_equal (bitmap_set_t a, bitmap_set_t b)
 }
 
 /* Replace an instance of EXPR's VALUE with EXPR in SET if it exists,
-   and add it otherwise.  */
+   and add it otherwise.  Return true if any changes were made.  */
 
-static void
+static bool
 bitmap_value_replace_in_set (bitmap_set_t set, pre_expr expr)
 {
   unsigned int val = get_expr_value_id (expr);
   if (value_id_constant_p (val))
-return;
+return false;
 
   if (bitmap_set_contains_value (set, val))
 {
@@ -1002,13 +1002,14 @@ bitmap_value_replace_in_set (bitmap_set_t set, pre_expr expr)
  if (bitmap_clear_bit (&set->expressions, i))
{
  bitmap_set_bit (&set->expressions, get_expression_id (expr));
- return;
+ return i != get_expression_id (expr);
}
}
   gcc_unreachable ();
 }
-  else
-bitmap_insert_into_set (set, expr);
+
+  bitmap_insert_into_set (set, expr);
+  return true;
 }
 
 /* Insert EXPR into SET if EXPR's value is not already present in
@@ -3158,7 +3159,8 @@ insert_into_preds_of_block (basic_block block, unsigned int exprnum,
  && expr->kind != REFERENCE)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Skipping insertion of phi for partial redundancy: Looks like an induction variable\n");
+   fprintf (dump_file, "Skipping insertion of phi for partial "
+"redundancy: Looks like an induction variable\n");
  nophi = true;
}
 }
@@ -3166,6 +3168,10 @@ insert_into_preds_of_block (basic_block block, unsigned int exprnum,
   /* Make the necessary insertions.  */
   FOR_EACH_EDGE (pred, ei, block->preds)
 {
+  /* When we are not inserting a PHI node do not bother inserting
+into places that do not dominate the anticipated computations.  */
+  if (nophi && !dominated_by_p (CDI_DOMINATORS, block, pred->src))
+   continue;
   gimple_seq stmts = NULL;
   tree builtexpr;
   bprime = pred->src;
@@ -3308,15 +3314,14 @@ insert_into_preds_of_block (basic_block block, unsigned int exprnum,
 */
 
 static bool
-do_pre_regular_insertion (basic_block block, basic_block dom)
+do_pre_regular_insertion (basic_block block, basic_block dom,
+ vec exprs)
 {
   bool new_stuff = false;
-  vec exprs;
   pre_expr expr;
   auto_vec avail;
   int i;
 
-  exprs = sorted_array_from_bitmap_set (ANTIC_IN (block));
   avail.safe_grow (EDGE_COUNT (block->preds), true);
 
   FOR_EACH_VEC_ELT (exprs, i, expr)
@@ -3464,7 +3469,6 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
}
 }
 
-  exprs.release ();
   return new_stuff;
 }
 
@@ -3476,15 +3480,14 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
remove the later computation.  */
 
 static bool
-do_pre_partial_partial_insertion (basic_block block, basic_block dom)
+do_pre_partial_partial_insertion (basic_block block, basic_block dom,
+ vec exprs)
 {
   bool new_stuff = false;
-  vec exprs;
   pre_expr expr;
   auto_vec avail;
   int i;
 
-  exprs =

[PATCH] ipa-cp: Work with time benefits and frequencies in sreals

2020-11-12 Thread Martin Jambor
Hi,

this patch converts the variables that hold time benefits and
frequencies in IPA-CP from plain integers to sreals, avoiding the need
to cap them to avoid overflows and also fixing potential underflows.

Size costs corresponding to individual constants are left as ints so
that they do not take up too much space.  Care must be taken that
adding them up does not overflow, especially in the case of
prop_size_cost, because in cases of extremely long chains of lattice
dependencies it can overflow (e.g. in testsuite/gcc.dg/ipa/pr50744.c).
The overall size is already tracked in long ints.
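
To make the size-cost concern concrete, here is a minimal sketch of a saturating addition for non-negative int costs. The function name and the exact overflow test are assumptions for illustration and are not copied from ipa-cp.c; the only behaviour taken from the ChangeLog below is that a possible overflow yields INT_MAX.

```cpp
#include <climits>

// Hypothetical saturating add for size costs.  Costs are assumed to be
// non-negative, so only overflow past INT_MAX needs handling.
int saturating_add (int a, int b)
{
  // Check before adding, because signed overflow itself is undefined.
  if (b > 0 && a > INT_MAX - b)
    return INT_MAX;
  return a + b;
}
```

A caller can then treat INT_MAX as "too big to consider" and bail out instead of comparing against a wrapped-around value.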

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-11-11  Martin Jambor  

* ipa-cp.c (class ipcp_value_base): Change the type of
local_time_benefit and prop_time_benefit to sreal.  Adjust the
constructor initializer.
(ipcp_lattice::print): Dump sreals.
(struct caller_statistics): Change the type of freq_sum to sreal.
(gather_caller_stats): Work with sreal freq_sum.
(incorporate_penalties): Work with sreal evaluation.
(good_cloning_opportunity_p): Adjusted for sreal time_benefit
and freq_sum.  Bail out if size_cost is INT_MAX.
(perform_estimation_of_a_value): Work with sreal time_benefit.  Avoid
unnecessary capping.
(estimate_local_effects): Pass sreal time benefit to
good_cloning_opportunity_p without capping it.  Adjust dumping.
(safe_add): If there can be overflow, return INT_MAX.
(propagate_effects): Work with sreal times.
(get_info_about_necessary_edges): Work with sreal frequencies.
(decide_about_value): Likewise and with sreal time benefits.
---
 gcc/ipa-cp.c | 151 ---
 1 file changed, 82 insertions(+), 69 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 140515668a6..961dd05f03c 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -156,16 +156,22 @@ public:
 class ipcp_value_base
 {
 public:
-  /* Time benefit and size cost that specializing the function for this value
- would bring about in this function alone.  */
-  int local_time_benefit, local_size_cost;
-  /* Time benefit and size cost that specializing the function for this value
- can bring about in it's callees (transitively).  */
-  int prop_time_benefit, prop_size_cost;
+  /* Time benefit that specializing the function for this value would bring
+ about in this function alone.  */
+  sreal local_time_benefit;
+  /* Time benefit that specializing the function for this value can bring about
+ in its callees.  */
+  sreal prop_time_benefit;
+  /* Size cost that specializing the function for this value would bring about
+ in this function alone.  */
+  int local_size_cost;
+  /* Size cost that specializing the function for this value can bring about in
+ its callees.  */
+  int prop_size_cost;
 
   ipcp_value_base ()
-: local_time_benefit (0), local_size_cost (0),
-  prop_time_benefit (0), prop_size_cost (0) {}
+: local_time_benefit (0), prop_time_benefit (0),
+  local_size_cost (0), prop_size_cost (0) {}
 };
 
 /* Describes one particular value stored in struct ipcp_lattice.  */
@@ -499,10 +505,10 @@ ipcp_lattice::print (FILE * f, bool dump_sources, bool dump_benefits)
}
 
   if (dump_benefits)
-   fprintf (f, " [loc_time: %i, loc_size: %i, "
-"prop_time: %i, prop_size: %i]\n",
-val->local_time_benefit, val->local_size_cost,
-val->prop_time_benefit, val->prop_size_cost);
+   fprintf (f, " [loc_time: %g, loc_size: %i, "
+"prop_time: %g, prop_size: %i]\n",
+val->local_time_benefit.to_double (), val->local_size_cost,
+val->prop_time_benefit.to_double (), val->prop_size_cost);
 }
   if (!dump_benefits)
 fprintf (f, "\n");
@@ -668,7 +674,8 @@ ipcp_versionable_function_p (struct cgraph_node *node)
 struct caller_statistics
 {
   profile_count count_sum;
-  int n_calls, n_hot_calls, freq_sum;
+  sreal freq_sum;
+  int n_calls, n_hot_calls;
 };
 
 /* Initialize fields of STAT to zeroes.  */
@@ -696,7 +703,7 @@ gather_caller_stats (struct cgraph_node *node, void *data)
   {
 if (cs->count.ipa ().initialized_p ())
  stats->count_sum += cs->count.ipa ();
-   stats->freq_sum += cs->frequency ();
+   stats->freq_sum += cs->sreal_frequency ();
stats->n_calls++;
if (cs->maybe_hot_p ())
  stats->n_hot_calls ++;
@@ -3224,9 +3231,9 @@ hint_time_bonus (cgraph_node *node, const ipa_call_estimates &estimates)
 /* If there is a reason to penalize the function described by INFO in the
cloning goodness evaluation, do so.  */
 
-static inline int64_t
+static inline sreal
 incorporate_penalties (cgraph_node *node, ipa_node_params *info,
-  int64_t evaluation)
+  srea

Re: [Patch] Fortran: improve location data for OpenACC/OpenMP directives [PR97782]

2020-11-12 Thread Thomas Schwinge
Hi Tobias!

On 2020-11-12T12:45:24+0100, Tobias Burnus  wrote:
> For code like
>   !$acc kernels
>  ... a lot of loops and other code
>   !$acc end kernels
>
> gfortran generates
>#pragma ..._kernels
>  {
>... lot of code
>  }
>
> As the PR shows, the location associated with the #pragma
> is not the 'acc kernels' line but the one near the 'acc end kernels'
> line.
>
> The reason is that first the {...} code is generated and only then
> the outer #pragma. And using input_location as location then points to
> the wrong line:
>   ...
>   oacc_clauses = gfc_trans_omp_clauses (...) // fine so far
>   stmt = gfc_trans_omp_code (code->block->next, true); // translates {...}
>   stmt = build2_loc (input_location, construct_code, ... // wrong location
>
> This patch tries to fix this in most cases; I am sure I missed some and
> others could be handled better.

I guess we eventually should get rid of *all* such uses of
'input_location' (and then 'gfc_set_backend_locus', etc., too)...  ;-)
(One step at a time.)  :-)
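
For readers less familiar with the problem, a small model of it, assuming nothing about the real gfortran data structures: current_line plays the role of a global like input_location, and Node is a made-up stand-in for the generated statement. Translating the body moves the global, so a construct built afterwards from the global gets the wrong line, while one built from the location stored with the directive does not.

```cpp
#include <string>

// Hypothetical stand-in for a global "current location" such as
// input_location.
static int current_line = 0;

// Hypothetical stand-in for a generated statement with a location.
struct Node
{
  std::string kind;
  int line;
};

// Translating the body advances the global past the directive line.
static void translate_body (int end_line)
{
  current_line = end_line;
}

// Problematic order: the global is read only after the body was translated.
Node build_from_global (int directive_line, int end_line)
{
  current_line = directive_line;
  translate_body (end_line);
  return Node{"kernels", current_line};
}

// Fixed: use the location that was recorded with the directive itself.
Node build_from_directive (int directive_line, int end_line)
{
  current_line = directive_line;
  translate_body (end_line);
  return Node{"kernels", directive_line};
}
```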

> In the testsuite, it affects two tests by moving a 'dg-message' line with
>optimized: assigned OpenACC gang loop parallelism
> from the loop-line one up to the 'kernels'-line; I think either location
> is fine.

That change is good: for current 'parloops' OpenACC 'kernels', these
diagnostics are actually expected on the 'kernels' directive line, not on
'loop's.

> The PR has a testcase (not included) which works with -fopt-info-omp-all.

Your call whether you'd like to include that one (why not?) -- but then,
I'll establish further/similar testsuite coverage with other pending
patches of mine.

> In principle, it should also have an effect on warnings (if there are
> any) and it unsurprisingly affects -fdump-tree-*-lineno.
>
> Comments, remarks, does it look good to you?

I have not verified all the details, but it conceptually looks good to
me, thanks.

Reviewed-by: Thomas Schwinge 

Will you then please backport that to releases/gcc-10 branch, too?


Grüße
 Thomas


> Fortran: improve location data for OpenACC/OpenMP directives [PR97782]
>
> gcc/fortran/ChangeLog:
>
>   PR fortran/97782
>   * trans-openmp.c (gfc_trans_oacc_construct, gfc_trans_omp_parallel_do,
>   gfc_trans_omp_parallel_do_simd, gfc_trans_omp_parallel_sections,
>   gfc_trans_omp_parallel_workshare, gfc_trans_omp_sections,
>   gfc_trans_omp_single, gfc_trans_omp_task, gfc_trans_omp_teams,
>   gfc_trans_omp_target, gfc_trans_omp_target_data,
>   gfc_trans_omp_workshare): Use code->loc instead of input_location
>   when building the OMP_/OACC_ construct.
>
> gcc/testsuite/ChangeLog:
>
>   PR fortran/97782
>   * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Move dg-message
>   one line up.
>   * gfortran.dg/goacc/classify-kernels.f95: Likewise.
>
>  gcc/fortran/trans-openmp.c | 50 +++---
>  .../goacc/classify-kernels-unparallelized.f95  |  4 +-
>  .../gfortran.dg/goacc/classify-kernels.f95 |  4 +-
>  3 files changed, 30 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> index d2559bd0c0a..6b4ad6a7050 100644
> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -3922,8 +3922,8 @@ gfc_trans_oacc_construct (gfc_code *code)
>oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
>   code->loc, false, true);
>stmt = gfc_trans_omp_code (code->block->next, true);
> -  stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
> -  oacc_clauses);
> +  stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
> +  void_type_node, stmt, oacc_clauses);
>gfc_add_expr_to_block (&block, stmt);
>return gfc_finish_block (&block);
>  }
> @@ -5351,8 +5351,8 @@ gfc_trans_omp_parallel_do (gfc_code *code, stmtblock_t *pblock,
>  }
>else if (TREE_CODE (stmt) != BIND_EXPR)
>  stmt = build3_v (BIND_EXPR, NULL, stmt, NULL_TREE);
> -  stmt = build2_loc (input_location, OMP_PARALLEL, void_type_node, stmt,
> -  omp_clauses);
> +  stmt = build2_loc (gfc_get_location (&code->loc), OMP_PARALLEL,
> +  void_type_node, stmt, omp_clauses);
>OMP_PARALLEL_COMBINED (stmt) = 1;
>gfc_add_expr_to_block (&block, stmt);
>return gfc_finish_block (&block);
> @@ -5394,8 +5394,8 @@ gfc_trans_omp_parallel_do_simd (gfc_code *code, stmtblock_t *pblock,
>  stmt = build3_v (BIND_EXPR, NULL, stmt, NULL_TREE);
>if (flag_openmp)
>  {
> -  stmt = build2_loc (input_location, OMP_PARALLEL, void_type_node, stmt,
> -  omp_clauses);
> +  stmt = build2_loc (gfc_get_location (&code->loc), OMP_PARALLEL,
> +  void_type_node, stmt, omp_clauses);
>OMP_PARALLEL_COMBINED (stmt) = 1;
>  }
>gfc_add_expr_to_block (&block, stmt);
> @@ -5421,

[committed] libstdc++: Fix __numeric_traits_integer<__int20> [PR 97798]

2020-11-12 Thread Jonathan Wakely via Gcc-patches
The expression used to calculate the maximum value for an integer type
assumes that the number of bits in the value representation is always
sizeof(T) * CHAR_BIT. This is not true for the __int20 type on msp430,
which has only 20 bits in the value representation but 32 bits in the
object representation. This causes an integer overflow in a constant
expression, which is ill-formed.

This problem was already solved by DJ for std::numeric_limits<__int20>
by generalizing the helper macros to use a specified number of bits
instead of assuming sizeof(T) * CHAR_BIT. Then the INT_N_n types can
specify the number of bits using the __GLIBCXX_BITSIZE_INT_N_n macros
that the compiler defines.

I'm using a slightly different approach here. I've replaced the helper
macros entirely, and just expanded the calculations in the initializers
for the static data members. By reordering the data members we can reuse
__is_signed and __digits in the other initializers. This removes the
repetition of expanding __glibcxx_signed(T) and __glibcxx_digits(T)
multiple times in each initializer.

The __is_integer_nonstrict trait now defines a new constant, __width,
which is sizeof(T) * CHAR_BIT by default (defined as an enumerator so
that no storage is needed for a static data member). By specializing
__is_integer_nonstrict for the INT_N types that have padding bits, we
can provide the correct width via the __GLIBCXX_BITSIZE_INT_N_n macros.
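
A simplified sketch of the approach, using made-up names rather than the libstdc++ ones: the width is a template parameter that defaults to sizeof(T) * CHAR_BIT, and the other constants are derived from it in order, so a type with padding bits only has to supply a different width. The 20-bit case below uses int as the carrier type purely for illustration.

```cpp
#include <climits>

// Illustrative only.  'Width' stands in for a value such as the one a
// __GLIBCXX_BITSIZE_INT_N_n macro would provide; the struct and member
// names are hypothetical.
template<typename T, int Width = int (sizeof (T) * CHAR_BIT)>
struct int_traits_sketch
{
  static constexpr bool is_signed = (T)(-1) < 0;
  // Number of value bits, excluding the sign bit.
  static constexpr int digits = Width - is_signed;
  // Built in two halves so that no intermediate value exceeds the maximum.
  static constexpr T max = ((((T)1 << (digits - 1)) - 1) << 1) + 1;
  static constexpr T min = is_signed ? -max - 1 : (T)0;
};
```

With the default width this reproduces the usual limits; with a width of 20 the maximum is 2^19 - 1 and no shift ever reaches the padding bits.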

libstdc++-v3/ChangeLog:

PR libstdc++/97798
* include/ext/numeric_traits.h (__glibcxx_signed)
(__glibcxx_digits, __glibcxx_min, __glibcxx_max): Remove
macros.
(__is_integer_nonstrict::__width): Define new constant.
(__numeric_traits_integer): Define constants in terms of each
other and __is_integer_nonstrict::__width, rather than the
removed macros.
(_GLIBCXX_INT_N_TRAITS): Macro to define explicit
specializations for non-standard integer types.

Bootstrapped a msp430-elf cross-compiler. Tested on powerpc64le-linux.

Committed to trunk.

commit 7f851c33411fc39982c62a91fa93ec02981fd956
Author: Jonathan Wakely 
Date:   Thu Nov 12 10:29:21 2020

libstdc++: Fix __numeric_traits_integer<__int20> [PR 97798]

diff --git a/libstdc++-v3/include/ext/numeric_traits.h b/libstdc++-v3/include/ext/numeric_traits.h
index 585ecc0ba9f5..c29f9f21d1aa 100644
--- a/libstdc++-v3/include/ext/numeric_traits.h
+++ b/libstdc++-v3/include/ext/numeric_traits.h
@@ -39,31 +39,23 @@ namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // Compile time constants for builtin types.
-  // In C++98 std::numeric_limits member functions cannot be used for this.
-#define __glibcxx_signed(_Tp) ((_Tp)(-1) < 0)
-#define __glibcxx_digits(_Tp) \
-  (sizeof(_Tp) * __CHAR_BIT__ - __glibcxx_signed(_Tp))
-
-#define __glibcxx_min(_Tp) \
-  (__glibcxx_si

[PATCH] IBM Z: Define vec_vfees instruction pattern

2020-11-12 Thread Stefan Schulze Frielinghaus via Gcc-patches
Bootstrapped and regtested on IBM Z.  Ok for master?

gcc/ChangeLog:

* config/s390/vector.md ("vec_vfees"): New insn pattern.
---
 gcc/config/s390/vector.md | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 31d323930b2..4333a2191ae 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1798,6 +1798,32 @@
   "vll\t%v0,%1,%2"
   [(set_attr "op_type" "VRS")])
 
+; vfeebs, vfeehs, vfeefs
+; vfeezbs, vfeezhs, vfeezfs
+(define_insn "vec_vfees"
+  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
+   (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
+  (match_operand:VI_HW_QHS 2 "register_operand" "v")
+  (match_operand:QI 3 "const_mask_operand" "C")]
+ UNSPEC_VEC_VFEE))
+   (set (reg:CCRAW CC_REGNUM)
+   (unspec:CCRAW [(match_dup 1)
+  (match_dup 2)
+  (match_dup 3)]
+ UNSPEC_VEC_VFEECC))]
+  "TARGET_VX"
+{
+  unsigned HOST_WIDE_INT flags = UINTVAL (operands[3]);
+
+  gcc_assert (!(flags & ~(VSTRING_FLAG_ZS | VSTRING_FLAG_CS)));
+  flags &= ~VSTRING_FLAG_CS;
+
+  if (flags == VSTRING_FLAG_ZS)
+return "vfeezs\t%v0,%v1,%v2";
+  return "vfees\t%v0,%v1,%v2";
+}
+  [(set_attr "op_type" "VRR")])
+
 ; vfenebs, vfenehs, vfenefs
 ; vfenezbs, vfenezhs, vfenezfs
 (define_insn "vec_vfenes"
-- 
2.28.0



[PATCH] IBM Z: Fix output template for "*vfees"

2020-11-12 Thread Stefan Schulze Frielinghaus via Gcc-patches
Bootstrapped and regtested on IBM Z.  Ok for master?

gcc/ChangeLog:

* config/s390/vx-builtins.md ("*vfees"): Fix output
  template.
---
 gcc/config/s390/vx-builtins.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 010db4d1115..0c2e7170223 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -1395,7 +1395,7 @@
 
   if (flags == VSTRING_FLAG_ZS)
 return "vfeezs\t%v0,%v1,%v2";
-  return "vfees\t%v0,%v1,%v2,%b3";
+  return "vfees\t%v0,%v1,%v2";
 }
   [(set_attr "op_type" "VRR")])
 
-- 
2.28.0



Re: Add support for copy specifier to fnspec

2020-11-12 Thread Jan Hubicka
Hi,
here is an updated patch that replaces 'C' by '1'...'9' so we still have
a place to specify the size.
As discussed on IRC, this seems the better alternative.
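
To illustrate how such a specifier can be decoded, here is a standalone sketch. It assumes the string layout visible in the builtins below (two characters describing the return value, then two characters per argument) and uses a free function with a made-up name instead of the attr_fnspec member.

```cpp
#include <string>

// Hypothetical helper: returns true if argument I of SPEC is marked as being
// copied to another argument, and stores that argument's zero-based index in
// *ARG.  The 2 + 2 * i layout is an assumption made for this sketch.
bool arg_copied_to_arg (const std::string &spec, unsigned i, unsigned *arg)
{
  std::string::size_type idx = 2 + 2 * i;
  if (idx >= spec.size ())
    return false;
  char c = spec[idx];
  if (c < '1' || c > '9')
    return false;
  // '1' names the first argument, which has index 0.
  *arg = c - '1';
  return true;
}
```

With the memcpy-style string "1cO313" the second argument decodes to "copied to argument 0", and with the bcopy-style string ".c23O3" the first argument decodes to "copied to argument 1".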

Bootstrapped/regtested x86_64-linux, OK?

Honza

gcc/ChangeLog:

2020-11-12  Jan Hubicka  

* attr-fnspec.h: Update toplevel comment.
(attr_fnspec::arg_direct_p): Accept 1...9.
(attr_fnspec::arg_maybe_written_p): Reject 1...9.
(attr_fnspec::arg_copied_to_arg_p): New member function.
* builtins.c (builtin_fnspec): Update fnspec of block copy.
* tree-ssa-alias.c (attr_fnspec::verify): Update.

diff --git a/gcc/attr-fnspec.h b/gcc/attr-fnspec.h
index 28135328437..766414a2520 100644
--- a/gcc/attr-fnspec.h
+++ b/gcc/attr-fnspec.h
@@ -41,6 +41,9 @@
written and does not escape
  'w' or 'W' specifies that the memory pointed to by the parameter does not
escape
+ '1''9' specifies that the memory pointed to by the parameter is
+   copied to memory pointed to by different parameter
+   (as in memcpy).
  '.'   specifies that nothing is known.
The uppercase letter in addition specifies that the memory pointed to
by the parameter is not dereferenced.  For 'r' only read applies
@@ -51,8 +54,8 @@
  ' 'nothing is known
  't'   the size of value written/read corresponds to the size of
of the pointed-to type of the argument type
- '1'...'9'  the size of value written/read is given by the specified
-   argument
+ '1'...'9'  specifies the size of value written/read is given by the
+   specified argument
  */
 
 #ifndef ATTR_FNSPEC_H
@@ -122,7 +125,8 @@ public:
   {
 unsigned int idx = arg_idx (i);
 gcc_checking_assert (arg_specified_p (i));
-return str[idx] == 'R' || str[idx] == 'O' || str[idx] == 'W';
+return str[idx] == 'R' || str[idx] == 'O'
+  || str[idx] == 'W' || (str[idx] >= '1' && str[idx] <= '9');
   }
 
   /* True if argument is used.  */
@@ -161,6 +165,7 @@ public:
 unsigned int idx = arg_idx (i);
 gcc_checking_assert (arg_specified_p (i));
 return str[idx] != 'r' && str[idx] != 'R'
+  && (str[idx] < '1' || str[idx] > '9')
   && str[idx] != 'x' && str[idx] != 'X';
   }
 
@@ -190,6 +195,21 @@ public:
 return str[idx + 1] == 't';
   }
 
+  /* Return true if memory pointed to by argument is copied to a memory
+ pointed to by a different argument (as in memcpy).
+ In this case set ARG.  */
+  bool
+  arg_copied_to_arg_p (unsigned int i, unsigned int *arg)
+  {
+unsigned int idx = arg_idx (i);
+gcc_checking_assert (arg_specified_p (i));
+if (str[idx] < '1' || str[idx] > '9')
+  return false;
+*arg = str[idx] - '1';
+return true;
+  }
+
+
   /* True if the argument does not escape.  */
   bool
   arg_noescape_p (unsigned int i)
@@ -230,7 +250,7 @@ public:
 return str[1] != 'c' && str[1] != 'C';
   }
 
-  /* Return true if all memory written by the function 
+  /* Return true if all memory written by the function
  is specified by fnspec.  */
   bool
   global_memory_written_p ()
diff --git a/gcc/builtins.c b/gcc/builtins.c
index da25343beb1..4ec1766cffd 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -12939,16 +12939,16 @@ builtin_fnspec (tree callee)
 argument.  */
   case BUILT_IN_STRCAT:
   case BUILT_IN_STRCAT_CHK:
-   return "1cW R ";
+   return "1cW 1 ";
   case BUILT_IN_STRNCAT:
   case BUILT_IN_STRNCAT_CHK:
-   return "1cW R3";
+   return "1cW 13";
   case BUILT_IN_STRCPY:
   case BUILT_IN_STRCPY_CHK:
-   return "1cO R ";
+   return "1cO 1 ";
   case BUILT_IN_STPCPY:
   case BUILT_IN_STPCPY_CHK:
-   return ".cO R ";
+   return ".cO 1 ";
   case BUILT_IN_STRNCPY:
   case BUILT_IN_MEMCPY:
   case BUILT_IN_MEMMOVE:
@@ -12957,15 +12957,15 @@ builtin_fnspec (tree callee)
   case BUILT_IN_STRNCPY_CHK:
   case BUILT_IN_MEMCPY_CHK:
   case BUILT_IN_MEMMOVE_CHK:
-   return "1cO3R3";
+   return "1cO313";
   case BUILT_IN_MEMPCPY:
   case BUILT_IN_MEMPCPY_CHK:
-   return ".cO3R3";
+   return ".cO313";
   case BUILT_IN_STPNCPY:
   case BUILT_IN_STPNCPY_CHK:
-   return ".cO3R3";
+   return ".cO313";
   case BUILT_IN_BCOPY:
-   return ".cR3O3";
+   return ".c23O3";
   case BUILT_IN_BZERO:
return ".cO2";
   case BUILT_IN_MEMCMP:
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index e64011d04df..b1e8e5b5352 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -3797,6 +3797,8 @@ attr_fnspec::verify ()
   default:
err = true;
 }
+  if (err)
+internal_error ("invalid fn spec attribute \"%s\"", str);
 
   /* Now check all parameters.  */
   for (unsigned int i = 0; arg_specified_p (i); i++)
@@ -3813,21 +3815,28 @@ attr_fnspec::verify ()
  case 'w':
  case 'W':
  case '.

Re: [PATCH] ipa-cp: Work with time benefits and frequencies in sreals

2020-11-12 Thread Jan Hubicka
> Hi,
> 
> this patch converts the variables that hold time benefits and
> frequencies in IPA-CP from plain integers to sreals, avoiding the need
> to cap them to avoid overflows and also fixing potential underflows.
> 
> Size costs corresponding to individual constants are left as ints so
> that they do not take up too much space.  Care must be taken that
> adding them up does not overflow, especially in the case of
> prop_size_cost, because in cases of extremely long chains of lattice
> dependencies it can overflow (e.g. in testsuite/gcc.dg/ipa/pr50744.c).
> The overall size is already tracked in long ints.
> 
> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2020-11-11  Martin Jambor  
> 
>   * ipa-cp.c (class ipcp_value_base): Change the type of
>   local_time_benefit and prop_time_benefit to sreal.  Adjust the
>   constructor initializer.
>   (ipcp_lattice::print): Dump sreals.
>   (struct caller_statistics): Change the type of freq_sum to sreal.
>   (gather_caller_stats): Work with sreal freq_sum.
>   (incorporate_penalties): Work with sreal evaluation.
>   (good_cloning_opportunity_p): Adjusted for sreal time_benefit
>   and freq_sum.  Bail out if size_cost is INT_MAX.
>   (perform_estimation_of_a_value): Work with sreal time_benefit.  Avoid
>   unnecessary capping.
>   (estimate_local_effects): Pass sreal time benefit to
>   good_cloning_opportunity_p without capping it.  Adjust dumping.
>   (safe_add): If there can be overflow, return INT_MAX.
>   (propagate_effects): Work with sreal times.
>   (get_info_about_necessary_edges): Work with sreal frequencies.
>   (decide_about_value): Likewise and with sreal time benefits.

OK, thanks!
It is a bit odd that we work hard enough to overflow sizes, but I guess it
is just an extreme case.  Definitely capping there makes sense since we
do not want such large duplication.

Honza


Re: [PATCH] IBM Z: Fix output template for "*vfees"

2020-11-12 Thread Andreas Krebbel via Gcc-patches
On 12.11.20 13:25, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested on IBM Z.  Ok for master?
> 
> gcc/ChangeLog:
> 
>   * config/s390/vx-builtins.md ("*vfees"): Fix output
> template.
> ---
>  gcc/config/s390/vx-builtins.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
> index 010db4d1115..0c2e7170223 100644
> --- a/gcc/config/s390/vx-builtins.md
> +++ b/gcc/config/s390/vx-builtins.md
> @@ -1395,7 +1395,7 @@
>  
>if (flags == VSTRING_FLAG_ZS)
>  return "vfeezs\t%v0,%v1,%v2";
> -  return "vfees\t%v0,%v1,%v2,%b3";
> +  return "vfees\t%v0,%v1,%v2";
>  }
>[(set_attr "op_type" "VRR")])
>  
> 

Ok. Thanks!

Andreas


Re: [PATCH] IBM Z: Define vec_vfees instruction pattern

2020-11-12 Thread Andreas Krebbel via Gcc-patches
On 12.11.20 13:21, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested on IBM Z.  Ok for master?
> 
> gcc/ChangeLog:
> 
>   * config/s390/vector.md ("vec_vfees"): New insn pattern.
> ---
>  gcc/config/s390/vector.md | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index 31d323930b2..4333a2191ae 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -1798,6 +1798,32 @@
>"vll\t%v0,%1,%2"
>[(set_attr "op_type" "VRS")])
>  
> +; vfeebs, vfeehs, vfeefs
> +; vfeezbs, vfeezhs, vfeezfs
> +(define_insn "vec_vfees"
> +  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
> + (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
> +(match_operand:VI_HW_QHS 2 "register_operand" "v")
> +(match_operand:QI 3 "const_mask_operand" "C")]
> +   UNSPEC_VEC_VFEE))
> +   (set (reg:CCRAW CC_REGNUM)
> + (unspec:CCRAW [(match_dup 1)
> +(match_dup 2)
> +(match_dup 3)]
> +   UNSPEC_VEC_VFEECC))]
> +  "TARGET_VX"
> +{
> +  unsigned HOST_WIDE_INT flags = UINTVAL (operands[3]);
> +
> +  gcc_assert (!(flags & ~(VSTRING_FLAG_ZS | VSTRING_FLAG_CS)));
> +  flags &= ~VSTRING_FLAG_CS;
> +
> +  if (flags == VSTRING_FLAG_ZS)
> +return "vfeezs\t%v0,%v1,%v2";
> +  return "vfees\t%v0,%v1,%v2";
> +}
> +  [(set_attr "op_type" "VRR")])
> +
>  ; vfenebs, vfenehs, vfenefs
>  ; vfenezbs, vfenezhs, vfenezfs
>  (define_insn "vec_vfenes"
> 

Since this is mostly a copy of the pattern in vx-builtins.md I think we should
remove the other version then.

I also would prefer this to be committed together with the code making use of
the expander. So far this would be dead code - right?

Andreas


Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Richard Biener
On Thu, 12 Nov 2020, Jan Hubicka wrote:

> Hi,
> with ipa-icf we often run into the problem that operand_equal_p does not
> match ADDR_EXPRs that take the address of fields from two different instances
> of the same class (at identical offsets).  A similar problem can also happen
> for record types with LTO if they did not get tree merged.
> This patch makes fold-const compare offsets rather than pointer
> equality of FIELD_DECLs. This is done in OEP_ADDRESS_OF mode only since
> it is not TBAA safe and the TBAA side should be correctly solved by my
> ICF patch.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
> 
>   * fold-const.c (operand_compare::operand_equal_p): When comparing
>   addresses look into field offsets for COMPONENT_REFs.
>   (operand_compare::hash_operand): Likewise.
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index c47557daeba..a4e8cccb1b7 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -3312,9 +3312,41 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
>   case COMPONENT_REF:
> /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
>may be NULL when we're called to compare MEM_EXPRs.  */
> -   if (!OP_SAME_WITH_NULL (0)
> -   || !OP_SAME (1))
> +   if (!OP_SAME_WITH_NULL (0))
>   return false;
> +   /* Most of time we only need to compare FIELD_DECLs for equality.
> +  However when determining address look into actual offsets.
> +  These may match for unions and unshared record types.  */
> +   if (!OP_SAME (1))
> + {
> +   if (flags & OEP_ADDRESS_OF)
> + {

actually if OP2 is not NULL for both you can just compare that (and that's
more correct then).

> +   tree field0 = TREE_OPERAND (arg0, 1);
> +   tree field1 = TREE_OPERAND (arg1, 1);
> +   tree type0 = DECL_CONTEXT (field0);
> +   tree type1 = DECL_CONTEXT (field1);
> +
> +   if (TREE_CODE (type0) == RECORD_TYPE
> +   && DECL_BIT_FIELD_REPRESENTATIVE (field0))
> + field0 = DECL_BIT_FIELD_REPRESENTATIVE (field0);
> +   if (TREE_CODE (type1) == RECORD_TYPE
> +   && DECL_BIT_FIELD_REPRESENTATIVE (field1))
> + field1 = DECL_BIT_FIELD_REPRESENTATIVE (field1);

Why does the representative matter?  For a 32-bit bitfield, two
different addresses at 8-bit boundaries would be made equal this way.
So ...

> +   /* Assume that different FIELD_DECLs never overlap within a
> +  RECORD_TYPE.  */
> +   if (type0 == type1 && TREE_CODE (type0) == RECORD_TYPE)
> + return false;

This isn't really about "overlap"; OEP_ADDRESS_OF is just about
the address (not its extent).

> +   if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
> + DECL_FIELD_OFFSET (field1),
> + flags & ~OEP_ADDRESS_OF)
> +   || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
> +DECL_FIELD_BIT_OFFSET (field1),
> +flags & ~OEP_ADDRESS_OF))
> + return false;

So this should suffice (on the original fields).
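
In other words, roughly the following check, shown here on a deliberately simplified model. FieldSketch and its members are hypothetical stand-ins for a FIELD_DECL with its DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET, and plain integers replace the tree operands that the real code compares with operand_equal_p.

```cpp
// Hypothetical, simplified model of a FIELD_DECL: only the two offset parts.
struct FieldSketch
{
  long byte_offset;  // stand-in for DECL_FIELD_OFFSET
  long bit_offset;   // stand-in for DECL_FIELD_BIT_OFFSET
};

// Two distinct field declarations designate the same address when both
// offset components agree; declaration identity is not required.
bool same_field_address (const FieldSketch &a, const FieldSketch &b)
{
  return a.byte_offset == b.byte_offset && a.bit_offset == b.bit_offset;
}
```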

> + }
> +   else
> + return false;
> + }
> flags &= ~OEP_ADDRESS_OF;
> return OP_SAME_WITH_NULL (2);
>  
> @@ -3787,9 +3819,28 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
> sflags = flags;
> break;
>  
> + case COMPONENT_REF:
> +   if (flags & OEP_ADDRESS_OF)
> + {
> +   tree field = TREE_OPERAND (t, 1);
> +   tree type = DECL_CONTEXT (field);
> +
> +   if (TREE_CODE (type) == RECORD_TYPE
> +   && DECL_BIT_FIELD_REPRESENTATIVE (field))
> + field = DECL_BIT_FIELD_REPRESENTATIVE (field);

see above.

> +   hash_operand (TREE_OPERAND (t, 0), hstate, flags);
> +   hash_operand (DECL_FIELD_OFFSET (field),
> + hstate, flags & ~OEP_ADDRESS_OF);
> +   hash_operand (DECL_FIELD_BIT_OFFSET (field),
> + hstate, flags & ~OEP_ADDRESS_OF);
> +   hash_operand (TREE_OPERAND (t, 2), hstate,
> + flags & ~OEP_ADDRESS_OF);

otherwise this looks ok.

> +   return;
> + }
> +   break;
>   case ARRAY_REF:
>   case ARRAY_RANGE_REF:
> - case COMPONENT_REF:
>   case BIT_FIELD_REF:
> sflags &= ~OEP_ADDRESS_OF;
> break;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
> > * fold-const.c (operand_compare::operand_equal_p): When comparing 
> > addresses
> > look into field offsets for COMPONENT_REFs.
> > (operand_compare::hash_operand): Likewise.
> > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> > index c47557daeba..a4e8cccb1b7 100644
> > --- a/gcc/fold-const.c
> > +++ b/gcc/fold-const.c
> > @@ -3312,9 +3312,41 @@ operand_compare::operand_equal_p (const_tree arg0, 
> > const_tree arg1,
> > case COMPONENT_REF:
> >   /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
> >  may be NULL when we're called to compare MEM_EXPRs.  */
> > - if (!OP_SAME_WITH_NULL (0)
> > - || !OP_SAME (1))
> > + if (!OP_SAME_WITH_NULL (0))
> > return false;
> > + /* Most of time we only need to compare FIELD_DECLs for equality.
> > +However when determining address look into actual offsets.
> > +These may match for unions and unshared record types.  */
> > + if (!OP_SAME (1))
> > +   {
> > + if (flags & OEP_ADDRESS_OF)
> > +   {
> 
> actually if OP2 is not NULL for both you can just compare that (and that's
> more correct then).
> 
> > + tree field0 = TREE_OPERAND (arg0, 1);
> > + tree field1 = TREE_OPERAND (arg1, 1);
> > + tree type0 = DECL_CONTEXT (field0);
> > + tree type1 = DECL_CONTEXT (field1);
> > +
> > + if (TREE_CODE (type0) == RECORD_TYPE
> > + && DECL_BIT_FIELD_REPRESENTATIVE (field0))
> > +   field0 = DECL_BIT_FIELD_REPRESENTATIVE (field0);
> > + if (TREE_CODE (type1) == RECORD_TYPE
> > + && DECL_BIT_FIELD_REPRESENTATIVE (field1))
> > +   field1 = DECL_BIT_FIELD_REPRESENTATIVE (field1);
> 
> Why does the representative matter?  For a 32bit bitfield if you'd
> have two addresses at 8bit boundary but different you'd make them
> equal this way.  Soo ...
> 
> > + /* Assume that different FIELD_DECLs never overlap within a
> > +RECORD_TYPE.  */
> > + if (type0 == type1 && TREE_CODE (type0) == RECORD_TYPE)
> > +   return false;
> 
> This isn't really about "overlap"; OEP_ADDRESS_OF is just about
> the address (not its extent).

We discussed this with Jakub, so I have already tested a version dropping
both of these, but indeed I should check operand 2.  Will do that
now.

Thanks!
Honza
> 
> > + if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
> > +   DECL_FIELD_OFFSET (field1),
> > +   flags & ~OEP_ADDRESS_OF)
> > + || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
> > +  DECL_FIELD_BIT_OFFSET (field1),
> > +  flags & ~OEP_ADDRESS_OF))
> > +   return false;
> 
> So this should suffice (on the original fields).
> 
> > +   }
> > + else
> > +   return false;
> > +   }
> >   flags &= ~OEP_ADDRESS_OF;
> >   return OP_SAME_WITH_NULL (2);
> >  
> > @@ -3787,9 +3819,28 @@ operand_compare::hash_operand (const_tree t, 
> > inchash::hash &hstate,
> >   sflags = flags;
> >   break;
> >  
> > +   case COMPONENT_REF:
> > + if (flags & OEP_ADDRESS_OF)
> > +   {
> > + tree field = TREE_OPERAND (t, 1);
> > + tree type = DECL_CONTEXT (field);
> > +
> > + if (TREE_CODE (type) == RECORD_TYPE
> > + && DECL_BIT_FIELD_REPRESENTATIVE (field))
> > +   field = DECL_BIT_FIELD_REPRESENTATIVE (field);
> 
> see above.
> 
> > + hash_operand (TREE_OPERAND (t, 0), hstate, flags);
> > + hash_operand (DECL_FIELD_OFFSET (field),
> > +   hstate, flags & ~OEP_ADDRESS_OF);
> > + hash_operand (DECL_FIELD_BIT_OFFSET (field),
> > +   hstate, flags & ~OEP_ADDRESS_OF);
> > + hash_operand (TREE_OPERAND (t, 2), hstate,
> > +   flags & ~OEP_ADDRESS_OF);
> 
> otherwise this looks ok.
> 
> > + return;
> > +   }
> > + break;
> > case ARRAY_REF:
> > case ARRAY_RANGE_REF:
> > -   case COMPONENT_REF:
> > case BIT_FIELD_REF:
> >   sflags &= ~OEP_ADDRESS_OF;
> >   break;
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imend


Re: Add support for copy specifier to fnspec

2020-11-12 Thread Richard Biener
On Thu, 12 Nov 2020, Jan Hubicka wrote:

> Hi,
> here is an updated patch that replaces 'C' by '1'...'9' so we still have
> room to specify the size.
> As discussed on IRC, this seems the better alternative.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Richard.

> Honza
> 
> gcc/ChangeLog:
> 
> 2020-11-12  Jan Hubicka  
> 
>   * attr-fnspec.h: Update toplevel comment.
>   (attr_fnspec::arg_direct_p): Accept 1...9.
>   (attr_fnspec::arg_maybe_written_p): Reject 1...9.
>   (attr_fnspec::arg_copied_to_arg_p): New member function.
>   * builtins.c (builtin_fnspec): Update fnspec of block copy.
>   * tree-ssa-alias.c (attr_fnspec::verify): Update.
> 
> diff --git a/gcc/attr-fnspec.h b/gcc/attr-fnspec.h
> index 28135328437..766414a2520 100644
> --- a/gcc/attr-fnspec.h
> +++ b/gcc/attr-fnspec.h
> @@ -41,6 +41,9 @@
>   written and does not escape
>   'w' or 'W' specifies that the memory pointed to by the parameter does not
>   escape
> + '1'...'9' specifies that the memory pointed to by the parameter is
> + copied to memory pointed to by different parameter
> + (as in memcpy).
>   '.' specifies that nothing is known.
> The uppercase letter in addition specifies that the memory pointed to
> by the parameter is not dereferenced.  For 'r' only read applies
> @@ -51,8 +54,8 @@
>   ' 'nothing is known
>   't' the size of value written/read corresponds to the size of
>   of the pointed-to type of the argument type
> - '1'...'9'  the size of value written/read is given by the specified
> - argument
> + '1'...'9'  specifies the size of value written/read is given by the
> + specified argument
>   */
>  
>  #ifndef ATTR_FNSPEC_H
> @@ -122,7 +125,8 @@ public:
>{
>  unsigned int idx = arg_idx (i);
>  gcc_checking_assert (arg_specified_p (i));
> -return str[idx] == 'R' || str[idx] == 'O' || str[idx] == 'W';
> +return str[idx] == 'R' || str[idx] == 'O'
> +|| str[idx] == 'W' || (str[idx] >= '1' && str[idx] <= '9');
>}
>  
>/* True if argument is used.  */
> @@ -161,6 +165,7 @@ public:
>  unsigned int idx = arg_idx (i);
>  gcc_checking_assert (arg_specified_p (i));
>  return str[idx] != 'r' && str[idx] != 'R'
> +&& (str[idx] < '1' || str[idx] > '9')
>  && str[idx] != 'x' && str[idx] != 'X';
>}
>  
> @@ -190,6 +195,21 @@ public:
>  return str[idx + 1] == 't';
>}
>  
> +  /* Return true if memory pointed to by argument is copied to a memory
> + pointed to by a different argument (as in memcpy).
> + In this case set ARG.  */
> +  bool
> +  arg_copied_to_arg_p (unsigned int i, unsigned int *arg)
> +  {
> +unsigned int idx = arg_idx (i);
> +gcc_checking_assert (arg_specified_p (i));
> +if (str[idx] < '1' || str[idx] > '9')
> +  return false;
> +*arg = str[idx] - '1';
> +return true;
> +  }
> +
> +
>/* True if the argument does not escape.  */
>bool
>arg_noescape_p (unsigned int i)
> @@ -230,7 +250,7 @@ public:
>  return str[1] != 'c' && str[1] != 'C';
>}
>  
> -  /* Return true if all memory written by the function 
> +  /* Return true if all memory written by the function
>   is specified by fnspec.  */
>bool
>global_memory_written_p ()
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index da25343beb1..4ec1766cffd 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -12939,16 +12939,16 @@ builtin_fnspec (tree callee)
>argument.  */
>case BUILT_IN_STRCAT:
>case BUILT_IN_STRCAT_CHK:
> - return "1cW R ";
> + return "1cW 1 ";
>case BUILT_IN_STRNCAT:
>case BUILT_IN_STRNCAT_CHK:
> - return "1cW R3";
> + return "1cW 13";
>case BUILT_IN_STRCPY:
>case BUILT_IN_STRCPY_CHK:
> - return "1cO R ";
> + return "1cO 1 ";
>case BUILT_IN_STPCPY:
>case BUILT_IN_STPCPY_CHK:
> - return ".cO R ";
> + return ".cO 1 ";
>case BUILT_IN_STRNCPY:
>case BUILT_IN_MEMCPY:
>case BUILT_IN_MEMMOVE:
> @@ -12957,15 +12957,15 @@ builtin_fnspec (tree callee)
>case BUILT_IN_STRNCPY_CHK:
>case BUILT_IN_MEMCPY_CHK:
>case BUILT_IN_MEMMOVE_CHK:
> - return "1cO3R3";
> + return "1cO313";
>case BUILT_IN_MEMPCPY:
>case BUILT_IN_MEMPCPY_CHK:
> - return ".cO3R3";
> + return ".cO313";
>case BUILT_IN_STPNCPY:
>case BUILT_IN_STPNCPY_CHK:
> - return ".cO3R3";
> + return ".cO313";
>case BUILT_IN_BCOPY:
> - return ".cR3O3";
> + return ".c23O3";
>case BUILT_IN_BZERO:
>   return ".cO2";
>case BUILT_IN_MEMCMP:
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index e64011d04df..b1e8e5b5352 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -3797,6 +3797,8 @@ attr_fnspec::verify ()
>default:
>   err = true;
>  
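
For readers following the fnspec encoding, here is a minimal sketch of how the new '1'...'9' descriptor can be decoded. It works on a plain C string rather than on the attr_fnspec class, and it assumes (this is not shown in the quoted hunks) that two characters describe the return value and two characters describe each argument. The free functions arg_idx and arg_copied_to_arg_p below are stand-ins for the member functions in the patch, not the GCC implementation.

```cpp
#include <cstring>

// Assumed layout, not taken from the quoted hunks: two descriptor characters
// for the return value, followed by two characters per argument.
constexpr unsigned return_desc_size = 2;
constexpr unsigned arg_desc_size = 2;

// Index of the first descriptor character of zero-based argument I.
inline unsigned arg_idx(unsigned i)
{
  return return_desc_size + arg_desc_size * i;
}

// Stand-in for attr_fnspec::arg_copied_to_arg_p operating on a C string.
// Returns true and stores the zero-based destination argument in *ARG when
// the descriptor of argument I is one of '1'...'9'.
inline bool arg_copied_to_arg_p(const char *str, unsigned i, unsigned *arg)
{
  unsigned idx = arg_idx(i);
  if (idx >= std::strlen(str))
    return false;                       // argument I is not specified
  if (str[idx] < '1' || str[idx] > '9')
    return false;                       // not a copy descriptor
  *arg = str[idx] - '1';                // '1' names the first argument
  return true;
}
```

With the memcpy spec "1cO313" from the patch, argument 1 (the source) decodes to destination argument 0; with the bcopy spec ".c23O3", argument 0 decodes to destination argument 1.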

Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
Hi,
this is the updated patch I am re-testing and plan to commit if it succeeds.

* fold-const.c (operand_compare::operand_equal_p): Compare
offsets of fields in component_refs when comparing addresses.
(operand_compare::hash_operand): Likewise.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index c47557daeba..273ee25ceda 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3312,11 +3312,36 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
case COMPONENT_REF:
  /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
 may be NULL when we're called to compare MEM_EXPRs.  */
- if (!OP_SAME_WITH_NULL (0)
- || !OP_SAME (1))
+ if (!OP_SAME_WITH_NULL (0))
return false;
- flags &= ~OEP_ADDRESS_OF;
- return OP_SAME_WITH_NULL (2);
+ /* Most of time we only need to compare FIELD_DECLs for equality.
+However when determining address look into actual offsets.
+These may match for unions and unshared record types.  */
+ if (!OP_SAME (1))
+   {
+ if (flags & OEP_ADDRESS_OF)
+   {
+ if (TREE_OPERAND (arg0, 2)
+ || TREE_OPERAND (arg1, 2))
+   {
+ flags &= ~OEP_ADDRESS_OF;
+ return OP_SAME_WITH_NULL (2);
+   }
+ tree field0 = TREE_OPERAND (arg0, 1);
+ tree field1 = TREE_OPERAND (arg1, 1);
+
+ if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
+   DECL_FIELD_OFFSET (field1),
+   flags & ~OEP_ADDRESS_OF)
+ || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
+  DECL_FIELD_BIT_OFFSET (field1),
+  flags & ~OEP_ADDRESS_OF))
+   return false;
+   }
+ else
+   return false;
+   }
+ return true;
 
case BIT_FIELD_REF:
  if (!OP_SAME (0))
@@ -3787,9 +3812,26 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
  sflags = flags;
  break;
 
+   case COMPONENT_REF:
+ if (sflags & OEP_ADDRESS_OF)
+   {
+ hash_operand (TREE_OPERAND (t, 0), hstate, flags);
+ if (TREE_OPERAND (t, 2))
+   hash_operand (TREE_OPERAND (t, 2), hstate,
+ flags & ~OEP_ADDRESS_OF);
+ else
+   {
+ tree field = TREE_OPERAND (t, 1);
+ hash_operand (DECL_FIELD_OFFSET (field),
+   hstate, flags & ~OEP_ADDRESS_OF);
+ hash_operand (DECL_FIELD_BIT_OFFSET (field),
+   hstate, flags & ~OEP_ADDRESS_OF);
+   }
+ return;
+   }
+ break;
case ARRAY_REF:
case ARRAY_RANGE_REF:
-   case COMPONENT_REF:
case BIT_FIELD_REF:
  sflags &= ~OEP_ADDRESS_OF;
  break;
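
To illustrate the comment "These may match for unions and unshared record types", here is a small sketch outside of GCC. The types U, S0 and S1 are made up for this example; the point is only that distinct fields (distinct FIELD_DECLs in GCC terms) can sit at the same offset, so comparing the offsets is what decides whether two addresses are equal.

```cpp
#include <cstddef>

// Hypothetical types for illustration only; they do not appear in GCC.
union U { int a; float b; };        // both members start at offset 0
struct S0 { int pad; int x; };      // two layout-identical but distinct
struct S1 { int pad; int y; };      // record types

// True when two field offsets denote the same address relative to a common
// base.  This models only the offset comparison, not the tree comparison
// done by operand_equal_p.
constexpr bool same_field_address(std::size_t off0, std::size_t off1)
{
  return off0 == off1;
}

// Different fields, same address.
static_assert(same_field_address(offsetof(U, a), offsetof(U, b)),
              "union members share their starting offset");
static_assert(same_field_address(offsetof(S0, x), offsetof(S1, y)),
              "layout-identical records place the second field identically");
```

Within a single record type, different fields such as S0::pad and S0::x still compare unequal, which matches the behaviour for the non-address case.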


Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Richard Biener via Gcc-patches
On Thu, Nov 12, 2020 at 10:23 AM Hongtao Liu  wrote:
>
> On Thu, Nov 12, 2020 at 5:15 PM Hongtao Liu  wrote:
> >
> > On Thu, Nov 12, 2020 at 5:12 PM Hongtao Liu  wrote:
> > >
> > > On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu  wrote:
> > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > PR target/97194
> > > > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New 
> > > > > > > function.
> > > > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New 
> > > > > > > Decl.
> > > > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > > > * config/i386/sse.md (vec_set): Support both constant
> > > > > > > and variable index vec_set.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > > >
> > > > > > +;; True for registers, or const_int_operand, used to vec_setm 
> > > > > > expander.
> > > > > > +(define_predicate "vec_setm_operand"
> > > > > > +  (ior (and (match_operand 0 "register_operand")
> > > > > > +(match_test "TARGET_AVX2"))
> > > > > > +   (match_code "const_int")))
> > > > > > +
> > > > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word 
> > > > > > shifts.
> > > > > >  (define_predicate "reg_or_pm1_operand"
> > > > > >(ior (match_operand 0 "register_operand")
> > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > index b153a87fb98..1798e5dea75 100644
> > > > > > --- a/gcc/config/i386/sse.md
> > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > > > >  (define_expand "vec_set"
> > > > > >[(match_operand:V 0 "register_operand")
> > > > > > (match_operand: 1 "register_operand")
> > > > > > -   (match_operand 2 "const_int_operand")]
> > > > > > +   (match_operand 2 "vec_setm_operand")]
> > > > > >
> > > > > > You need to specify a mode, otherwise a register of any mode can 
> > > > > > pass here.
> > > > > >
> > > > > Yes, theoretically, we only accept integer types. But in 
> > > > > can_vec_set_var_idx_p
> > > > > cut
> > > > > ---
> > > > > bool
> > > > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > > > {
> > > > >   if (!VECTOR_MODE_P (vec_mode))
> > > > > return false;
> > > > >
> > > > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > > > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > > > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > > > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > >
> > > > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > > > >
> > > > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, 
> > > > > reg1)
> > > > >  && insn_operand_matches (icode, 1, reg2)
> > > > >  && insn_operand_matches (icode, 2, reg3);
> > > > > }
> > > > > ---
> > > > >
> > > > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > > > fail insn_operand_matches (icode, 2, reg3)
> > > > > ---
> > > > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > > > $5 = false
> > > > > (gdb)
> > > > > ---
> > > > >
> > > > > Maybe we need to change
> > > > >
> > > > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > >
> > > > > to
> > > > >
> > > > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > > > >
> > > > > cc Richard Biener, any thoughts?
> > > >
> > > > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > > > specify SImode for operand 2 in vec_setM pattern and allow register
> > > > operands. I wonder if and how they manage to generate the pattern.
> > > >
> > > > Uros.
> > >
> > > Variable index vec_set is enabled by r11-3486, about two months ago in
> > > [1]. But for the upper two targets, the codes are already there since
> > > GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> > > those codes are for [1].
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> > Correct [1] 
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
> >
> > --
> > BR,
> > Hongtao
>
> in https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554592.html
>
> It says
>
> > >> +can_vec_set_var_idx_p (enum tree_code code, machine_mode vec_mode,
> > >> +  machine_mode value_mode, machine_mode idx_mode)
> > >
> > > toplevel comment

Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Richard Biener
On Thu, 12 Nov 2020, Jan Hubicka wrote:

> Hi,
> this is the updated patch I am re-testing and plan to commit if it succeeds.
> 
>   * fold-const.c (operand_compare::operand_equal_p): Compare
>   offsets of fields in component_refs when comparing addresses.
>   (operand_compare::hash_operand): Likewise.
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index c47557daeba..273ee25ceda 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -3312,11 +3312,36 @@ operand_compare::operand_equal_p (const_tree arg0, 
> const_tree arg1,
>   case COMPONENT_REF:
> /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
>may be NULL when we're called to compare MEM_EXPRs.  */
> -   if (!OP_SAME_WITH_NULL (0)
> -   || !OP_SAME (1))
> +   if (!OP_SAME_WITH_NULL (0))
>   return false;
> -   flags &= ~OEP_ADDRESS_OF;
> -   return OP_SAME_WITH_NULL (2);
> +   /* Most of time we only need to compare FIELD_DECLs for equality.
> +  However when determining address look into actual offsets.
> +  These may match for unions and unshared record types.  */

looks like you can simplify by doing

  flags &= ~OEP_ADDRESS_OF;

here.  Neither the FIELD_DECL compare nor the offsets need it

> +   if (!OP_SAME (1))
> + {
> +   if (flags & OEP_ADDRESS_OF)
> + {
> +   if (TREE_OPERAND (arg0, 2)
> +   || TREE_OPERAND (arg1, 2))
> + {
> +   flags &= ~OEP_ADDRESS_OF;
> +   return OP_SAME_WITH_NULL (2);
> + }
> +   tree field0 = TREE_OPERAND (arg0, 1);
> +   tree field1 = TREE_OPERAND (arg1, 1);
> +
> +   if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
> + DECL_FIELD_OFFSET (field1),
> + flags & ~OEP_ADDRESS_OF)
> +   || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
> +DECL_FIELD_BIT_OFFSET (field1),
> +flags & ~OEP_ADDRESS_OF))
> + return false;
> + }
> +   else
> + return false;
> + }

You elided

  flags &= ~OEP_ADDRESS_OF;
- return OP_SAME_WITH_NULL (2);

that was here for the case where OP_SAME (1) holds; please reinstate it.

> +   return true;
>  
>   case BIT_FIELD_REF:
> if (!OP_SAME (0))
> @@ -3787,9 +3812,26 @@ operand_compare::hash_operand (const_tree t, 
> inchash::hash &hstate,
> sflags = flags;
> break;
>  
> + case COMPONENT_REF:
> +   if (sflags & OEP_ADDRESS_OF)
> + {
> +   hash_operand (TREE_OPERAND (t, 0), hstate, flags);
> +   if (TREE_OPERAND (t, 2))
> + hash_operand (TREE_OPERAND (t, 2), hstate,
> +   flags & ~OEP_ADDRESS_OF);
> +   else
> + {
> +   tree field = TREE_OPERAND (t, 1);
> +   hash_operand (DECL_FIELD_OFFSET (field),
> + hstate, flags & ~OEP_ADDRESS_OF);
> +   hash_operand (DECL_FIELD_BIT_OFFSET (field),
> + hstate, flags & ~OEP_ADDRESS_OF);
> + }
> +   return;
> + }
> +   break;
>   case ARRAY_REF:
>   case ARRAY_RANGE_REF:
> - case COMPONENT_REF:
>   case BIT_FIELD_REF:
> sflags &= ~OEP_ADDRESS_OF;
> break;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH] pch: Specify reason of -Winvalid-pch warning [PR86674]

2020-11-12 Thread Jeff Law via Gcc-patches


On 3/9/20 2:55 AM, Nicholas Guriev wrote:
> gcc/c-family/ChangeLog:
>
>   PR pch/86674
>   * c-pch.c (c_common_valid_pch): Use cpp_warning with CPP_W_INVALID_PCH
>   reason to fix -Werror=invalid-pch and -Wno-error=invalid-pch switches.
> ---
>  gcc/c-family/ChangeLog |  6 ++
>  gcc/c-family/c-pch.c   | 40 +++-
>  libcpp/files.c |  2 +-
>  3 files changed, 22 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
> index 2e11b..1c83eeb0f 100644
> --- a/gcc/c-family/ChangeLog
> +++ b/gcc/c-family/ChangeLog
> @@ -1,3 +1,9 @@
> +2020-03-09  Nicholas Guriev 
> +
> + PR pch/86674
> + * c-pch.c (c_common_valid_pch): Use cpp_warning with CPP_W_INVALID_PCH
> + reason to fix -Werror=invalid-pch and -Wno-error=invalid-pch switches.

Thanks.  I've added a ChangeLog entry for the libcpp/files.c change,
re-tested and pushed this to the trunk.


Jeff




[PATCH] More PRE compile-time optimizations

2020-11-12 Thread Richard Biener
This fixes a bug in bitmap_list_view which could end up with
a NULL head->current which makes followup searches fail.  Oops.

It also further optimizes the PRE DFS walk by removing useless
stuff and special-casing bitmaps with just one element for
EXECUTE_IF_AND_IN_BITMAP which makes a quite big difference.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
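
A minimal sketch of the bitmap_list_view part, for illustration only. The structs below are heavily simplified stand-ins for GCC's bitmap head and element (only the fields touched by the fix are modelled), and restore_cache is a made-up name for the lines added at the end of bitmap_list_view.

```cpp
// Hypothetical, simplified stand-ins for GCC's bitmap element and head.
struct element
{
  element *next;
  unsigned indx;
};

struct head
{
  element *first = nullptr;     // first element of the list
  element *current = nullptr;   // search cache
  unsigned indx = 0;            // index of the cached element
  bool tree_form = true;
};

// Mirrors the added lines: after switching to list view, make sure the
// search cache points at an element whenever the list is not empty.
inline void restore_cache(head &h)
{
  h.tree_form = false;
  if (!h.current)
    {
      h.current = h.first;
      h.indx = h.current ? h.current->indx : 0;
    }
}
```

An empty bitmap keeps a null cache and index 0, so only the non-empty case changes.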

2020-11-12  Richard Biener  

* bitmap.c (bitmap_list_view): Restore head->current.
* tree-ssa-pre.c (pre_expr_DFS): Elide expr_visited bitmap.
Special-case value expression bitmaps with one element.
(bitmap_find_leader): Likewise.
(sorted_array_from_bitmap_set): Elide expr_visited bitmap.
---
 gcc/bitmap.c   |  5 +
 gcc/tree-ssa-pre.c | 40 +++-
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/gcc/bitmap.c b/gcc/bitmap.c
index 810b80be1ba..c849b0d22f5 100644
--- a/gcc/bitmap.c
+++ b/gcc/bitmap.c
@@ -678,6 +678,11 @@ bitmap_list_view (bitmap head)
 }
 
   head->tree_form = false;
+  if (!head->current)
+{
+  head->current = head->first;
+  head->indx = head->current ? head->current->indx : 0;
+}
 }
 
 /* Convert bitmap HEAD from linked-list view to splay-tree view.
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 9db1b0258f7..e25cec7ffa1 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -806,15 +806,15 @@ bitmap_set_free (bitmap_set_t set)
 }
 
 static void
-pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap expr_visited,
- bitmap val_visited, vec<pre_expr> &post);
+pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap val_visited,
+ vec<pre_expr> &post);
 
 /* DFS walk leaders of VAL to their operands with leaders in SET, collecting
expressions in SET in postorder into POST.  */
 
 static void
-pre_expr_DFS (unsigned val, bitmap_set_t set, bitmap expr_visited,
- bitmap val_visited, vec<pre_expr> &post)
+pre_expr_DFS (unsigned val, bitmap_set_t set, bitmap val_visited,
+ vec<pre_expr> &post)
 {
   unsigned int i;
   bitmap_iterator bi;
@@ -822,21 +822,25 @@ pre_expr_DFS (unsigned val, bitmap_set_t set, bitmap 
expr_visited,
   /* Iterate over all leaders and DFS recurse.  Borrowed from
  bitmap_find_leader.  */
   bitmap exprset = value_expressions[val];
+  if (!exprset->first->next)
+{
+  EXECUTE_IF_SET_IN_BITMAP (exprset, 0, i, bi)
+   if (bitmap_bit_p (&set->expressions, i))
+ pre_expr_DFS (expression_for_id (i), set, val_visited, post);
+  return;
+}
+
   EXECUTE_IF_AND_IN_BITMAP (exprset, &set->expressions, 0, i, bi)
-pre_expr_DFS (expression_for_id (i),
- set, expr_visited, val_visited, post);
+pre_expr_DFS (expression_for_id (i), set, val_visited, post);
 }
 
 /* DFS walk EXPR to its operands with leaders in SET, collecting
expressions in SET in postorder into POST.  */
 
 static void
-pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap expr_visited,
- bitmap val_visited, vec<pre_expr> &post)
+pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap val_visited,
+ vec<pre_expr> &post)
 {
-  if (!bitmap_set_bit (expr_visited, get_expression_id (expr)))
-return;
-
   switch (expr->kind)
 {
 case NARY:
@@ -851,7 +855,7 @@ pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap 
expr_visited,
   recursed already.  Avoid the costly bitmap_find_leader.  */
if (bitmap_bit_p (&set->values, op_val_id)
&& bitmap_set_bit (val_visited, op_val_id))
- pre_expr_DFS (op_val_id, set, expr_visited, val_visited, post);
+ pre_expr_DFS (op_val_id, set, val_visited, post);
  }
break;
   }
@@ -873,8 +877,7 @@ pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap 
expr_visited,
unsigned op_val_id = VN_INFO (op[n])->value_id;
if (bitmap_bit_p (&set->values, op_val_id)
&& bitmap_set_bit (val_visited, op_val_id))
- pre_expr_DFS (op_val_id,
-   set, expr_visited, val_visited, post);
+ pre_expr_DFS (op_val_id, set, val_visited, post);
  }
  }
break;
@@ -896,13 +899,11 @@ sorted_array_from_bitmap_set (bitmap_set_t set)
   /* Pre-allocate enough space for the array.  */
   result.create (bitmap_count_bits (&set->expressions));
 
-  auto_bitmap expr_visited (&grand_bitmap_obstack);
   auto_bitmap val_visited (&grand_bitmap_obstack);
-  bitmap_tree_view (expr_visited);
   bitmap_tree_view (val_visited);
   FOR_EACH_VALUE_ID_IN_SET (set, i, bi)
 if (bitmap_set_bit (val_visited, i))
-  pre_expr_DFS (i, set, expr_visited, val_visited, result);
+  pre_expr_DFS (i, set, val_visited, result);
 
   return result;
 }
@@ -1883,6 +1884,11 @@ bitmap_find_leader (bitmap_set_t set, unsigned int val)
   bitmap_iterator bi;
   bitmap exprset = value_expressions[val];
 
+  if (!exprset->first->next)
+   EXECUTE_

Re: [committed] libstdc++: Fix __numeric_traits_integer<__int20> [PR 97798]

2020-11-12 Thread Jonathan Wakely via Gcc-patches

Here's a small tweak to __numeric_traits that I decided to do after
the previous patch.

Tested on powerpc64le-linux. Committed to trunk.

commit d21776ef90361e66401cd99c8ff0d98b46d3b0d6
Author: Jonathan Wakely 
Date:   Thu Nov 12 13:31:02 2020

libstdc++: Simplify __numeric_traits definition

This changes the __numeric_traits primary template to assume its
argument is an integer type. For the three floating point types that are
supported by __numeric_traits_floating an explicit specialization of
__numeric_traits chooses the right base class.

This improves the failure mode for using __numeric_traits with an
unsupported type. Previously it would use __numeric_traits_floating as
the base class, and give somewhat obscure errors for trying to access
the static data members. Now it will use __numeric_traits_integer which
has a static_assert to check for supported types.

As a side effect of this change there is no need to instantiate
__conditional_type to decide which base class to use.

libstdc++-v3/ChangeLog:

* include/ext/numeric_traits.h (__numeric_traits): Change
primary template to always derive from __numeric_traits_integer.
(__numeric_traits<float>, __numeric_traits<double>)
(__numeric_traits<long double>): Add explicit specializations.

diff --git a/libstdc++-v3/include/ext/numeric_traits.h b/libstdc++-v3/include/ext/numeric_traits.h
index c29f9f21d1aa..2cac7f1d1edc 100644
--- a/libstdc++-v3/include/ext/numeric_traits.h
+++ b/libstdc++-v3/include/ext/numeric_traits.h
@@ -176,19 +176,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Value>
 const int __numeric_traits_floating<_Value>::__max_exponent10;
 
-  template<typename _Value>
-struct __numeric_traits
-: public __conditional_type<__is_integer_nonstrict<_Value>::__value,
-__numeric_traits_integer<_Value>,
-__numeric_traits_floating<_Value> >::__type
-{ };
-
-_GLIBCXX_END_NAMESPACE_VERSION
-} // namespace
-
 #undef __glibcxx_floating
 #undef __glibcxx_max_digits10
 #undef __glibcxx_digits10
 #undef __glibcxx_max_exponent10
 
+  template<typename _Value>
+    struct __numeric_traits
+    : public __numeric_traits_integer<_Value>
+    { };
+
+  template<>
+    struct __numeric_traits<float>
+    : public __numeric_traits_floating<float>
+    { };
+
+  template<>
+    struct __numeric_traits<double>
+    : public __numeric_traits_floating<double>
+    { };
+
+  template<>
+    struct __numeric_traits<long double>
+    : public __numeric_traits_floating<long double>
+    { };
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace
+
 #endif
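
The shape of this change can be sketched outside of libstdc++ as follows. The names traits, traits_integer and traits_floating are shortened stand-ins, and std::is_integral is used here in place of the library's own check, so this is an illustration of the pattern rather than the actual header.

```cpp
#include <type_traits>

// Hypothetical stand-ins for __numeric_traits_integer and
// __numeric_traits_floating; the member is only there to tell them apart.
template<typename T>
  struct traits_integer
  {
    // Simplification: the real header uses its own integer check.
    static_assert(std::is_integral<T>::value, "invalid specialization");
    static constexpr bool is_integer_base = true;
  };

template<typename T>
  struct traits_floating
  {
    static constexpr bool is_integer_base = false;
  };

// Primary template: assume an integer type, so an unsupported type trips
// the static_assert above instead of producing obscure member errors.
template<typename T>
  struct traits : traits_integer<T> { };

// The three supported floating-point types pick the other base explicitly.
template<> struct traits<float> : traits_floating<float> { };
template<> struct traits<double> : traits_floating<double> { };
template<> struct traits<long double> : traits_floating<long double> { };
```

No conditional type selection is instantiated here: the base class is fixed by the primary template or by one of the three explicit specializations.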


Re: [PATCH] Add a new pattern in 4-insn combine

2020-11-12 Thread Segher Boessenkool
On Wed, Nov 11, 2020 at 06:22:53PM -0600, Segher Boessenkool wrote:
> I'm running an all-arch comparison with this patch, just to see what it
> does, but [...]

Results: C0 is trunk, C1 with patch:

                    C0        C1
       alpha   6422312   99.971%
         arc   3783838  100.000%
         arm  10168277  100.000%
       arm64  20077721         0
       armhf  14886534  100.000%
         c6x   2509915  100.000%
        csky         0         0
       h8300   1229802  100.000%
        i386  12040952         0
        ia64  18555229  100.000%
        m68k   3868729  100.000%
  microblaze   5885763  100.000%
        mips   9158101  100.000%
      mips64   7402870  100.001%
       nds32   4833031  100.000%
       nios2   3917080  100.000%
    openrisc   4571561  100.000%
      parisc   7725308  100.000%
    parisc64         0         0
     powerpc  11004119  100.000%
   powerpc64  22618492  100.000%
 powerpc64le  19609678  100.000%
     riscv32   1639840  100.000%
     riscv64   7658668         0
        s390  15345481         0
          sh         0         0
     shnommu   1694176  100.000%
       sparc   4744809  100.000%
     sparc64   7205254  100.000%
      x86_64  19870124         0
      xtensa   2658455  100.002%

0 means it did not build...  So some targets newly ICE (x86, riscv, z).

It surprisingly only helps alpha a bit, and all other changes are in the
wrong direction (but very slightly).


Segher


Re: [PATCH][RFC] Make mingw-w64 printf/scanf attribute alias to ms_printf/ms_scanf only for C89

2020-11-12 Thread Liu Hao via Gcc-patches
On 2020/11/12 18:18, Jonathan Yong wrote:
> libgomp build fails because of the false -Wformat error, even though:
> 1. Correct C99 inttypes.h macros are used.
> 2. __mingw_* C99 wrappers are used.
> 3. The printf attribute is used, but it was aliased to ms_printf
> 
> The attached patch makes mingw-w64 printf attribute equivalent to other platforms on C99 or later.
> This allows libgomp to build again with -Werror on. This patch should not affect the original
> mingw.org distribution in any way.
> 

According to the conversation on IRC, I personally consider this inappropriate. Although the `ll`
modifier is specified by C99 for `long long`, there are many more that don't conform to C99 without
UCRT:

1. The `z` modifier for `size_t` is unrecognized.
2. The `t` modifier for `ptrdiff_t` is unrecognized.
3. The `L` modifier for `long double` is accepted but ignored due to
   the fact that MSABI uses an 8-byte type.
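
For reference, this is what the modifiers in question look like on a C99-conforming `printf` implementation. The sketch below is only an illustration; `format_c99` is a made-up helper name, and the result shown in the note applies to a conforming C library, not to the non-UCRT runtime discussed here.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats one value of each type with the C99 length modifiers `z`, `t`,
// `ll` and `L`.  On a runtime that does not recognize these modifiers the
// result is not reliable, which is the concern raised above.
inline std::string format_c99(std::size_t z, std::ptrdiff_t t,
                              long long ll, long double ld)
{
  char buf[128];
  std::snprintf(buf, sizeof buf, "%zu %td %lld %Lg", z, t, ll, ld);
  return buf;
}
```

On a conforming C library, `format_c99(42, -7, 1234567890123LL, 1.5L)` yields `42 -7 1234567890123 1.5`.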


> For C99 or later, the mingw-w64 headers already wrap printf/scanf properly, and inttypes.h also
> gives the correct C99 specifiers, so it makes sense to treat the printf attribute as C99 compliant.
> Under C89 mode, the headers would produce MS specific specifiers, so the printf attribute under C89
> reverts to the old behavior of being aliased to ms_printf.
> 
> This might break other code that assumes differently however. I don't think there is a solution to
> satisfy everyone, but at least this allows C99/C++11 compliant code to build again with -Werror.
> Comments?

My humble opinion is that people should have gotten used to the `ll` specifier, so I propose a
different patch that enables it unconditionally. As Jonathan Yong pointed out, it is impossible for
GCC to predict where the target executable will run. It may be reasonable to expect Vista+ rather
than an ancient system. Users who still code for XP- should probably handle such differentiation
themselves. In comparison, MSVC does not have such format checks at all.

I started bootstrapping GCC a few minutes ago. It's not going to finish very soon, so I am sending
this patch for your comments.



-- 
Best regards,
LH_Mouse
From 1d61adae0695e7067e35f36e607a754a7cf12796 Mon Sep 17 00:00:00 2001
From: Liu Hao 
Date: Thu, 12 Nov 2020 22:20:29 +0800
Subject: [PATCH] gcc: Add `ll` and `L` length modifiers for `ms_printf`

The previous code abused `FMT_LEN_L` for the `I` modifier. As `L` is a valid
modifier for `f`, `e`, `g`, etc. and `I` has the same semantics as the
C99 `z` modifier, `FMT_LEN_z` is now used.

First, in the Microsoft ABI, type `long double` has the same layout as
type `double`, so `%Lg` behaves identically to `%g`. Users should pass
`double`s instead of `long double`s, because GCC uses the 10-byte format
for `long double`.

Second, with a CRT that is recent enough (MSVCRT since Vista, MSVCR80,
UCRT, or mingw-w64 8.0), `printf`-family functions can handle the `ll`
length modifier correctly. This ability is assumed to be available
universally. A lot of libraries (such as libgomp) that use the
`format(printf, ...)` attribute used to suffer from warnings about
unknown format specifiers.

Reference: 
https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2008/tcxf1dw6(v=vs.90)
Reference: 
https://docs.microsoft.com/en-us/cpp/porting/visual-cpp-what-s-new-2003-through-2015#new-crt-features
Signed-off-by: Liu Hao 

gcc/:
* config/i386/msformat-c.c: Add more length modifiers
---
 gcc/config/i386/msformat-c.c | 45 ++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/gcc/config/i386/msformat-c.c b/gcc/config/i386/msformat-c.c
index 4ceec633a6e..1629b866976 100644
--- a/gcc/config/i386/msformat-c.c
+++ b/gcc/config/i386/msformat-c.c
@@ -32,10 +32,11 @@ along with GCC; see the file COPYING3.  If not see
 static format_length_info ms_printf_length_specs[] =
 {
   { "h", FMT_LEN_h, STD_C89, NULL, FMT_LEN_none, STD_C89, 0 },
-  { "l", FMT_LEN_l, STD_C89, NULL, FMT_LEN_none, STD_C89, 0 },
+  { "l", FMT_LEN_l, STD_C89, NULL, FMT_LEN_ll, STD_C9L, 0 },
+  { "L", FMT_LEN_L, STD_C89, NULL, FMT_LEN_none, STD_C89, 1 },
   { "I32", FMT_LEN_l, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
   { "I64", FMT_LEN_ll, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
-  { "I", FMT_LEN_L, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
+  { "I", FMT_LEN_z, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
   { NULL, FMT_LEN_none, STD_C89, NULL, FMT_LEN_none, STD_C89, 0 }
 };
 
@@ -90,33 +91,33 @@ static const format_flag_pair ms_strftime_flag_pairs[] =
 static const format_char_info ms_print_char_table[] =
 {
   /* C89 conversion specifiers.  */
-  { "di",  0, STD_C89, { T89_I,   BADLEN,  T89_S,   T89_L,   T9L_LL,  T99_SST,  BADLEN, BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "-wp0 +'",  "i",  NULL },
-  { "oxX", 0, STD_C89, { T89_UI,  BADLEN,  T89_US,  T89_UL,  T9L_ULL, T99_ST, BADLEN, BADLEN, BADLEN, BADLEN,  BADLEN,  BADLEN }, "-wp0#", "i",  NULL },
-  { "u",   0, STD_C89, { T89_UI,  BADLEN,  T89_US,  T89_UL, 

[PATCH 2/5] Refactor VRP threading code into vrp_jump_threader class.

2020-11-12 Thread Aldy Hernandez via Gcc-patches
Will push pending aarch64 tests.

gcc/ChangeLog:

* tree-vrp.c (identify_jump_threads): Refactor to...
(vrp_jump_threader::vrp_jump_threader): ...here
(vrp_jump_threader::~vrp_jump_threader): ...and here.
(vrp_jump_threader::after_dom_children): Rename vr_values to
m_vr_values.
(execute_vrp): Use vrp_jump_threader.
---
 gcc/tree-vrp.c | 144 -
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index d3816ab569e..6b77c357a8f 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4152,32 +4152,87 @@ vrp_prop::finalize ()
 }
 }
 
+/* Blocks which have more than one predecessor and more than
+   one successor present jump threading opportunities, i.e.,
+   when the block is reached from a specific predecessor, we
+   may be able to determine which of the outgoing edges will
+   be traversed.  When this optimization applies, we are able
+   to avoid conditionals at runtime and we may expose secondary
+   optimization opportunities.
+
+   This class is effectively a driver for the generic jump
+   threading code.  It basically just presents the generic code
+   with edges that may be suitable for jump threading.
+
+   Unlike DOM, we do not iterate VRP if jump threading was successful.
+   While iterating may expose new opportunities for VRP, it is expected
+   those opportunities would be very limited and the compile time cost
+   to expose those opportunities would be significant.
+
+   As jump threading opportunities are discovered, they are registered
+   for later realization.  */
+
 class vrp_jump_threader : public dom_walker
 {
 public:
-  vrp_jump_threader (cdi_direction direction,
-class const_and_copies *const_and_copies,
-class avail_exprs_stack *avail_exprs_stack)
-: dom_walker (direction, REACHABLE_BLOCKS),
-  m_const_and_copies (const_and_copies),
-  m_avail_exprs_stack (avail_exprs_stack),
-  m_dummy_cond (NULL) {}
-
-  virtual edge before_dom_children (basic_block);
-  virtual void after_dom_children (basic_block);
+  vrp_jump_threader (struct function *, vr_values *);
+  ~vrp_jump_threader ();
 
-  class vr_values *vr_values;
+  void thread_jumps ()
+  {
+walk (m_fun->cfg->x_entry_block_ptr);
+  }
 
 private:
   static tree simplify_stmt (gimple *stmt, gimple *within_stmt,
 avail_exprs_stack *, basic_block);
+  virtual edge before_dom_children (basic_block);
+  virtual void after_dom_children (basic_block);
 
-  class const_and_copies *m_const_and_copies;
-  class avail_exprs_stack *m_avail_exprs_stack;
-
+  function *m_fun;
+  vr_values *m_vr_values;
+  const_and_copies *m_const_and_copies;
+  avail_exprs_stack *m_avail_exprs_stack;
+  hash_table<expr_elt_hasher> *m_avail_exprs;
   gcond *m_dummy_cond;
 };
 
+vrp_jump_threader::vrp_jump_threader (struct function *fun, vr_values *v)
+  : dom_walker (CDI_DOMINATORS, REACHABLE_BLOCKS)
+{
+  /* Ugh.  When substituting values earlier in this pass we can wipe
+ the dominance information.  So rebuild the dominator information
+ as we need it within the jump threading code.  */
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  /* We do not allow VRP information to be used for jump threading
+ across a back edge in the CFG.  Otherwise it becomes too
+ difficult to avoid eliminating loop exit tests.  Of course
+ EDGE_DFS_BACK is not accurate at this time so we have to
+ recompute it.  */
+  mark_dfs_back_edges ();
+
+  /* Allocate our unwinder stack to unwind any temporary equivalences
+ that might be recorded.  */
+  m_const_and_copies = new const_and_copies ();
+
+  m_dummy_cond = NULL;
+  m_fun = fun;
+  m_vr_values = v;
+  m_avail_exprs = new hash_table<expr_elt_hasher> (1024);
+  m_avail_exprs_stack = new avail_exprs_stack (m_avail_exprs);
+}
+
+vrp_jump_threader::~vrp_jump_threader ()
+{
+  /* We do not actually update the CFG or SSA graphs at this point as
+ ASSERT_EXPRs are still in the IL and cfg cleanup code does not
+ yet handle ASSERT_EXPRs gracefully.  */
+  delete m_const_and_copies;
+  delete m_avail_exprs;
+  delete m_avail_exprs_stack;
+}
+
 /* Called before processing dominator children of BB.  We want to look
at ASSERT_EXPRs and record information from them in the appropriate
tables.
@@ -4295,7 +4350,7 @@ vrp_jump_threader::after_dom_children (basic_block bb)
  integer_zero_node, integer_zero_node,
  NULL, NULL);
 
-  x_vr_values = vr_values;
+  x_vr_values = m_vr_values;
   thread_outgoing_edges (bb, m_dummy_cond, m_const_and_copies,
 m_avail_exprs_stack, NULL,
 simplify_stmt);
@@ -4305,62 +4360,6 @@ vrp_jump_threader::after_dom_children (basic_block bb)
   m_const_and_copies->pop_to_marker ();
 }
 
-/* Blocks which have more than one predecessor and more than
-   one successor pre
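
The shape of this refactoring can be sketched outside of GCC. The following
is a toy, not the real dom_walker API: `toy_tree`, `toy_walker` and
`toy_threader` are made-up names, and the "work" is just logging block
numbers. It only tries to show the pattern the patch moves to: setup in the
constructor, cleanup in the destructor, the before/after hooks kept private,
and a single public entry point.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy dominator tree (assumption for this sketch): children[i] lists the
// blocks immediately dominated by block i, and block 0 is the entry.
struct toy_tree
{
  std::vector<std::vector<int> > children;
};

// Stand-in for a dominator walker: calls a hook before and after
// visiting the dominated children of each block.
class toy_walker
{
public:
  virtual ~toy_walker () {}

protected:
  void walk (const toy_tree &t, int bb)
  {
    before_children (bb);
    for (std::size_t i = 0; i < t.children[bb].size (); ++i)
      walk (t, t.children[bb][i]);
    after_children (bb);
  }

private:
  virtual void before_children (int bb) = 0;
  virtual void after_children (int bb) = 0;
};

// Driver class: the constructor acquires what the walk needs, the
// destructor releases it, and thread_jumps () is the only public action.
class toy_threader : public toy_walker
{
public:
  explicit toy_threader (const toy_tree &t)
    : m_tree (t), m_log (new std::string) {}
  ~toy_threader () { delete m_log; }
  toy_threader (const toy_threader &) = delete;
  toy_threader &operator= (const toy_threader &) = delete;

  std::string thread_jumps ()
  {
    walk (m_tree, 0);
    return *m_log;
  }

private:
  void before_children (int bb) override
  { *m_log += "<" + std::to_string (bb); }
  void after_children (int bb) override
  { *m_log += ">" + std::to_string (bb); }

  const toy_tree &m_tree;
  std::string *m_log;
};
```

With this layout a caller only constructs the object and calls
`thread_jumps ()`; nothing has to be torn down by hand afterwards.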

[PATCH 4/5] Move vr_values out of vrp_prop into execute_vrp so it can be shared.

2020-11-12 Thread Aldy Hernandez via Gcc-patches
vr_values is being shared among the propagator and the folder and
passed around.  I've pulled it out from the propagator so it can be
passed around to each, instead of being publicly accessible from the
propagator.

Will push pending aarch64 tests.

gcc/ChangeLog:

* tree-vrp.c (class vrp_prop): Rename vr_values to m_vr_values.
(vrp_prop::vrp_prop): New.
(vrp_prop::initialize): Rename vr_values to m_vr_values.
(vrp_prop::visit_stmt): Same.
(vrp_prop::visit_phi): Same.
(vrp_prop::finalize): Same.
(execute_vrp): Instantiate vrp_vr_values and pass it to folder
and propagator.
---
 gcc/tree-vrp.c | 53 +++---
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 15267e3d878..81bbaefd642 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3817,15 +3817,19 @@ vrp_asserts::remove_range_assertions ()
 class vrp_prop : public ssa_propagation_engine
 {
 public:
-  enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL OVERRIDE;
-  enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;
-
-  struct function *fun;
+  vrp_prop (vr_values *v)
+: ssa_propagation_engine (),
+  m_vr_values (v) { }
 
   void initialize (struct function *);
   void finalize ();
 
-  class vr_values vr_values;
+  enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL OVERRIDE;
+  enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;
+
+private:
+  struct function *fun;
+  vr_values *m_vr_values;
 };
 
 /* Initialization required by ssa_propagate engine.  */
@@ -3845,7 +3849,7 @@ vrp_prop::initialize (struct function *fn)
  if (!stmt_interesting_for_vrp (phi))
{
  tree lhs = PHI_RESULT (phi);
- vr_values.set_def_to_varying (lhs);
+ m_vr_values->set_def_to_varying (lhs);
  prop_set_simulate_again (phi, false);
}
  else
@@ -3864,7 +3868,7 @@ vrp_prop::initialize (struct function *fn)
prop_set_simulate_again (stmt, true);
  else if (!stmt_interesting_for_vrp (stmt))
{
- vr_values.set_defs_to_varying (stmt);
+ m_vr_values->set_defs_to_varying (stmt);
  prop_set_simulate_again (stmt, false);
}
  else
@@ -3887,11 +3891,11 @@ vrp_prop::visit_stmt (gimple *stmt, edge *taken_edge_p, tree *output_p)
 {
   tree lhs = gimple_get_lhs (stmt);
   value_range_equiv vr;
-  vr_values.extract_range_from_stmt (stmt, taken_edge_p, output_p, &vr);
+  m_vr_values->extract_range_from_stmt (stmt, taken_edge_p, output_p, &vr);
 
   if (*output_p)
 {
-  if (vr_values.update_value_range (*output_p, &vr))
+  if (m_vr_values->update_value_range (*output_p, &vr))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -3926,7 +3930,7 @@ vrp_prop::visit_stmt (gimple *stmt, edge *taken_edge_p, tree *output_p)
use_operand_p use_p;
enum ssa_prop_result res = SSA_PROP_VARYING;
 
-   vr_values.set_def_to_varying (lhs);
+   m_vr_values->set_def_to_varying (lhs);
 
FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
  {
@@ -3956,9 +3960,9 @@ vrp_prop::visit_stmt (gimple *stmt, edge *taken_edge_p, tree *output_p)
   {REAL,IMAG}PART_EXPR uses at all,
   return SSA_PROP_VARYING.  */
value_range_equiv new_vr;
-   vr_values.extract_range_basic (&new_vr, use_stmt);
+   m_vr_values->extract_range_basic (&new_vr, use_stmt);
const value_range_equiv *old_vr
- = vr_values.get_value_range (use_lhs);
+ = m_vr_values->get_value_range (use_lhs);
if (!old_vr->equal_p (new_vr, /*ignore_equivs=*/false))
  res = SSA_PROP_INTERESTING;
else
@@ -3980,7 +3984,7 @@ vrp_prop::visit_stmt (gimple *stmt, edge *taken_edge_p, tree *output_p)
 
   /* All other statements produce nothing of interest for VRP, so mark
  their outputs varying and prevent further simulation.  */
-  vr_values.set_defs_to_varying (stmt);
+  m_vr_values->set_defs_to_varying (stmt);
 
   return (*taken_edge_p) ? SSA_PROP_INTERESTING : SSA_PROP_VARYING;
 }
@@ -3994,8 +3998,8 @@ vrp_prop::visit_phi (gphi *phi)
 {
   tree lhs = PHI_RESULT (phi);
   value_range_equiv vr_result;
-  vr_values.extract_range_from_phi_node (phi, &vr_result);
-  if (vr_values.update_value_range (lhs, &vr_result))
+  m_vr_values->extract_range_from_phi_node (phi, &vr_result);
+  if (m_vr_values->update_value_range (lhs, &vr_result))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -4024,12 +4028,12 @@ vrp_prop::finalize ()
   size_t i;
 
   /* We have completed propagating through the lattice.  */
-  vr_values.set_lattice_propagation_complete ();
+  m_vr_values->set_lattice_propagation_complete ();
 
   if (dump_file)
 {
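
The ownership change in this patch can be sketched in isolation. This is a
toy under stated assumptions: `toy_ranges`, `toy_prop`, `toy_folder` and
`run_pass` are made-up names, and the "range" is a plain int. It only aims
to show the idea from the description above: the pass creates the shared
table once and hands a pointer to each consumer, instead of one consumer
owning it and exposing it publicly.

```cpp
#include <map>
#include <string>

// Shared state (stand-in for the value-range table).
struct toy_ranges
{
  std::map<std::string, int> known;
};

// Propagator: fills in the shared table through the pointer it was given.
class toy_prop
{
public:
  explicit toy_prop (toy_ranges *r) : m_ranges (r) {}
  void propagate () { m_ranges->known["x_1"] = 42; }

private:
  toy_ranges *m_ranges;
};

// Folder: reads from the same table; it never sees the propagator.
class toy_folder
{
public:
  explicit toy_folder (toy_ranges *r) : m_ranges (r) {}
  bool value_of (const std::string &name, int *out) const
  {
    std::map<std::string, int>::const_iterator it
      = m_ranges->known.find (name);
    if (it == m_ranges->known.end ())
      return false;
    *out = it->second;
    return true;
  }

private:
  toy_ranges *m_ranges;
};

// The pass driver owns the table and passes it to both objects.
int run_pass ()
{
  toy_ranges ranges;
  toy_prop prop (&ranges);
  prop.propagate ();
  toy_folder folder (&ranges);
  int v = -1;
  return folder.value_of ("x_1", &v) ? v : -1;
}
```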

[PATCH 3/5] Move vrp_prop before vrp_folder.

2020-11-12 Thread Aldy Hernandez via Gcc-patches
Will push pending aarch64 tests.

gcc/ChangeLog:

* tree-vrp.c (class vrp_prop): Move entire class...
(class vrp_folder): ...before here.
---
 gcc/tree-vrp.c | 200 -
 1 file changed, 100 insertions(+), 100 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 6b77c357a8f..15267e3d878 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3814,106 +3814,6 @@ vrp_asserts::remove_range_assertions ()
   }
 }
 
-class vrp_folder : public substitute_and_fold_engine
-{
- public:
-  vrp_folder (vr_values *v)
-: substitute_and_fold_engine (/* Fold all stmts.  */ true),
-  m_vr_values (v), simplifier (v)
-{  }
-  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
-
-  tree value_of_expr (tree name, gimple *stmt) OVERRIDE
-{
-  return m_vr_values->value_of_expr (name, stmt);
-}
-  class vr_values *m_vr_values;
-
-private:
-  bool fold_predicate_in (gimple_stmt_iterator *);
-  /* Delegators.  */
-  tree vrp_evaluate_conditional (tree_code code, tree op0,
-tree op1, gimple *stmt)
-{ return simplifier.vrp_evaluate_conditional (code, op0, op1, stmt); }
-  bool simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
-{ return simplifier.simplify (gsi); }
-
-  simplify_using_ranges simplifier;
-};
-
-/* If the statement pointed by SI has a predicate whose value can be
-   computed using the value range information computed by VRP, compute
-   its value and return true.  Otherwise, return false.  */
-
-bool
-vrp_folder::fold_predicate_in (gimple_stmt_iterator *si)
-{
-  bool assignment_p = false;
-  tree val;
-  gimple *stmt = gsi_stmt (*si);
-
-  if (is_gimple_assign (stmt)
-  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
-{
-  assignment_p = true;
-  val = vrp_evaluate_conditional (gimple_assign_rhs_code (stmt),
- gimple_assign_rhs1 (stmt),
- gimple_assign_rhs2 (stmt),
- stmt);
-}
-  else if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
-val = vrp_evaluate_conditional (gimple_cond_code (cond_stmt),
-   gimple_cond_lhs (cond_stmt),
-   gimple_cond_rhs (cond_stmt),
-   stmt);
-  else
-return false;
-
-  if (val)
-{
-  if (assignment_p)
-val = fold_convert (gimple_expr_type (stmt), val);
-
-  if (dump_file)
-   {
- fprintf (dump_file, "Folding predicate ");
- print_gimple_expr (dump_file, stmt, 0);
- fprintf (dump_file, " to ");
- print_generic_expr (dump_file, val);
- fprintf (dump_file, "\n");
-   }
-
-  if (is_gimple_assign (stmt))
-   gimple_assign_set_rhs_from_tree (si, val);
-  else
-   {
- gcc_assert (gimple_code (stmt) == GIMPLE_COND);
- gcond *cond_stmt = as_a <gcond *> (stmt);
- if (integer_zerop (val))
-   gimple_cond_make_false (cond_stmt);
- else if (integer_onep (val))
-   gimple_cond_make_true (cond_stmt);
- else
-   gcc_unreachable ();
-   }
-
-  return true;
-}
-
-  return false;
-}
-
-/* Callback for substitute_and_fold folding the stmt at *SI.  */
-
-bool
-vrp_folder::fold_stmt (gimple_stmt_iterator *si)
-{
-  if (fold_predicate_in (si))
-return true;
-
-  return simplify_stmt_using_ranges (si);
-}
-
 class vrp_prop : public ssa_propagation_engine
 {
 public:
@@ -4152,6 +4052,106 @@ vrp_prop::finalize ()
 }
 }
 
+class vrp_folder : public substitute_and_fold_engine
+{
+ public:
+  vrp_folder (vr_values *v)
+: substitute_and_fold_engine (/* Fold all stmts.  */ true),
+  m_vr_values (v), simplifier (v)
+{  }
+  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
+
+  tree value_of_expr (tree name, gimple *stmt) OVERRIDE
+{
+  return m_vr_values->value_of_expr (name, stmt);
+}
+  class vr_values *m_vr_values;
+
+private:
+  bool fold_predicate_in (gimple_stmt_iterator *);
+  /* Delegators.  */
+  tree vrp_evaluate_conditional (tree_code code, tree op0,
+tree op1, gimple *stmt)
+{ return simplifier.vrp_evaluate_conditional (code, op0, op1, stmt); }
+  bool simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
+{ return simplifier.simplify (gsi); }
+
+  simplify_using_ranges simplifier;
+};
+
+/* If the statement pointed by SI has a predicate whose value can be
+   computed using the value range information computed by VRP, compute
+   its value and return true.  Otherwise, return false.  */
+
+bool
+vrp_folder::fold_predicate_in (gimple_stmt_iterator *si)
+{
+  bool assignment_p = false;
+  tree val;
+  gimple *stmt = gsi_stmt (*si);
+
+  if (is_gimple_assign (stmt)
+  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
+{
+  assignment_p = true;
+  val = vrp_evaluate

[PATCH 5/5] Inline delegators in vrp_folder.

2020-11-12 Thread Aldy Hernandez via Gcc-patches
Will push pending aarch64 tests.

gcc/ChangeLog:

* tree-vrp.c (class vrp_folder): Make visit_stmt, visit_phi,
and m_vr_values private.
(vrp_folder::vrp_evaluate_conditional): Remove.
(vrp_folder::vrp_simplify_stmt_using_ranges): Remove.
(vrp_folder::fold_predicate_in): Inline
vrp_evaluate_conditional and vrp_simplify_stmt_using_ranges.
(vrp_folder::fold_stmt): Same.
---
 gcc/tree-vrp.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 81bbaefd642..54ce017e8b2 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3824,10 +3824,10 @@ public:
   void initialize (struct function *);
   void finalize ();
 
+private:
   enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL OVERRIDE;
   enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;
 
-private:
   struct function *fun;
   vr_values *m_vr_values;
 };
@@ -4063,23 +4063,16 @@ class vrp_folder : public substitute_and_fold_engine
 : substitute_and_fold_engine (/* Fold all stmts.  */ true),
   m_vr_values (v), simplifier (v)
 {  }
-  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
 
+private:
   tree value_of_expr (tree name, gimple *stmt) OVERRIDE
 {
   return m_vr_values->value_of_expr (name, stmt);
 }
-  class vr_values *m_vr_values;
-
-private:
+  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
   bool fold_predicate_in (gimple_stmt_iterator *);
-  /* Delegators.  */
-  tree vrp_evaluate_conditional (tree_code code, tree op0,
-tree op1, gimple *stmt)
-{ return simplifier.vrp_evaluate_conditional (code, op0, op1, stmt); }
-  bool simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
-{ return simplifier.simplify (gsi); }
 
+  vr_values *m_vr_values;
   simplify_using_ranges simplifier;
 };
 
@@ -4098,16 +4091,16 @@ vrp_folder::fold_predicate_in (gimple_stmt_iterator *si)
   && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
 {
   assignment_p = true;
-  val = vrp_evaluate_conditional (gimple_assign_rhs_code (stmt),
- gimple_assign_rhs1 (stmt),
- gimple_assign_rhs2 (stmt),
- stmt);
+  val = simplifier.vrp_evaluate_conditional (gimple_assign_rhs_code (stmt),
+gimple_assign_rhs1 (stmt),
+gimple_assign_rhs2 (stmt),
+stmt);
 }
   else if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
-val = vrp_evaluate_conditional (gimple_cond_code (cond_stmt),
-   gimple_cond_lhs (cond_stmt),
-   gimple_cond_rhs (cond_stmt),
-   stmt);
+val = simplifier.vrp_evaluate_conditional (gimple_cond_code (cond_stmt),
+  gimple_cond_lhs (cond_stmt),
+  gimple_cond_rhs (cond_stmt),
+  stmt);
   else
 return false;
 
@@ -4153,7 +4146,7 @@ vrp_folder::fold_stmt (gimple_stmt_iterator *si)
   if (fold_predicate_in (si))
 return true;
 
-  return simplify_stmt_using_ranges (si);
+  return simplifier.simplify (si);
 }
 
 /* Blocks which have more than one predecessor and more than
-- 
2.26.2



[PATCH 1/5] Group tree-vrp.c by functionality.

2020-11-12 Thread Aldy Hernandez via Gcc-patches
Earlier in this cycle there was some work by Giuliano Belinassi and
myself to refactor tree-vrp.c.  A lot of functions and globals were
moved into independent classes, but the haphazard layout remained.
Assertion methods were interspersed with the propagation code, and with
the jump threading code, etc etc.

This series of patches moves things around so that common
functionality is geographically close.  There is no change in
behavior.

I know this is all slated to go in the next release, but finding
things in the current code base, even if just to compare with the
ranger, is difficult.

Tested on x86-64 Linux.  Aarch64 tests are still going.

Since I keep getting bit by aarch64 regressions, I'll push when the
entire patchset finishes tests on aarch64.

gcc/ChangeLog:

* tree-vrp.c (struct assert_locus): Move.
(class vrp_insert): Rename to vrp_asserts.
(vrp_insert::build_assert_expr_for): Move to vrp_asserts.
(fp_predicate): Same.
(vrp_insert::dump): Same.
(vrp_insert::register_new_assert_for): Same.
(extract_code_and_val_from_cond_with_ops): Move.
(vrp_insert::finish_register_edge_assert_for): Move to vrp_asserts.
(maybe_set_nonzero_bits): Move.
(vrp_insert::find_conditional_asserts): Move to vrp_asserts.
(stmt_interesting_for_vrp): Move.
(struct case_info): Move.
(compare_case_labels): Move.
(lhs_of_dominating_assert): Move.
(find_case_label_index): Move.
(find_case_label_range): Move.
(class vrp_asserts): New.
(vrp_asserts::build_assert_expr_for): Rename from vrp_insert.
(vrp_asserts::dump): Same.
(vrp_asserts::register_new_assert_for): Same.
(vrp_asserts::finish_register_edge_assert_for): Same.
(vrp_asserts::find_conditional_asserts): Same.
(vrp_asserts::compare_case_labels): Same.
(vrp_asserts::find_switch_asserts): Same.
(vrp_asserts::find_assert_locations_in_bb): Same.
(vrp_asserts::find_assert_locations): Same.
(vrp_asserts::process_assert_insertions_for): Same.
(vrp_asserts::compare_assert_loc): Same.
(vrp_asserts::process_assert_insertions): Same.
(vrp_asserts::insert_range_assertions): Same.
(vrp_asserts::all_imm_uses_in_stmt_or_feed_cond): Same.
(vrp_asserts::remove_range_assertions): Same.
(class vrp_prop): Move.
(all_imm_uses_in_stmt_or_feed_cond): Move.
(vrp_prop::vrp_initialize): Move.
(class vrp_folder): Move.
(vrp_folder::fold_predicate_in): Move.
(vrp_folder::fold_stmt): Move.
(vrp_prop::initialize): Move.
(vrp_prop::visit_stmt): Move.
(enum ssa_prop_result): Move.
(vrp_prop::visit_phi): Move.
(vrp_prop::finalize): Move.
(class vrp_dom_walker): Rename to...
(class vrp_jump_threader): ...this.
(vrp_jump_threader::before_dom_children): Rename from
vrp_dom_walker.
(simplify_stmt_for_jump_threading): Rename to...
(vrp_jump_threader::simplify_stmt): ...here.
(vrp_jump_threader::after_dom_children): Same.
(identify_jump_threads): Move.
(vrp_prop::vrp_finalize): Move array bounds setup code to...
(execute_vrp): ...here.
---
 gcc/tree-vrp.c | 2127 
 1 file changed, 1057 insertions(+), 1070 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e00c034fee3..d3816ab569e 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -161,153 +161,6 @@ live_names::live_on_block_p (tree name, basic_block bb)
  && bitmap_bit_p (live[bb->index], SSA_NAME_VERSION (name)));
 }
 
-
-/* Location information for ASSERT_EXPRs.  Each instance of this
-   structure describes an ASSERT_EXPR for an SSA name.  Since a single
-   SSA name may have more than one assertion associated with it, these
-   locations are kept in a linked list attached to the corresponding
-   SSA name.  */
-struct assert_locus
-{
-  /* Basic block where the assertion would be inserted.  */
-  basic_block bb;
-
-  /* Some assertions need to be inserted on an edge (e.g., assertions
- generated by COND_EXPRs).  In those cases, BB will be NULL.  */
-  edge e;
-
-  /* Pointer to the statement that generated this assertion.  */
-  gimple_stmt_iterator si;
-
-  /* Predicate code for the ASSERT_EXPR.  Must be COMPARISON_CLASS_P.  */
-  enum tree_code comp_code;
-
-  /* Value being compared against.  */
-  tree val;
-
-  /* Expression to compare.  */
-  tree expr;
-
-  /* Next node in the linked list.  */
-  assert_locus *next;
-};
-
-class vrp_insert
-{
-public:
-  vrp_insert (struct function *fn) : fun (fn) { }
-
-  /* Traverse the flowgraph looking for conditional jumps to insert range
- expressions.  These range expressions are meant to provide information
- to optimizations that need to reason in terms of value ranges.  They
- will not be expanded into RTL.

Re: Compare field offsets in fold_const when checking addresses

2020-11-12 Thread Jan Hubicka
> On Thu, 12 Nov 2020, Jan Hubicka wrote:
> 
> > Hi,
> > this is the updated patch I am re-testing and plan to commit if it succeeds.
> > 
> > * fold-const.c (operand_compare::operand_equal_p): Compare
> > offsets of fields in component_refs when comparing addresses.
> > (operand_compare::hash_operand): Likewise.
> > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> > index c47557daeba..273ee25ceda 100644
> > --- a/gcc/fold-const.c
> > +++ b/gcc/fold-const.c
> > @@ -3312,11 +3312,36 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
> > case COMPONENT_REF:
> >   /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
> >  may be NULL when we're called to compare MEM_EXPRs.  */
> > - if (!OP_SAME_WITH_NULL (0)
> > - || !OP_SAME (1))
> > + if (!OP_SAME_WITH_NULL (0))
> > return false;
> > - flags &= ~OEP_ADDRESS_OF;
> > - return OP_SAME_WITH_NULL (2);
> > + /* Most of time we only need to compare FIELD_DECLs for equality.
> > +However when determining address look into actual offsets.
> > +These may match for unions and unshared record types.  */
> 
> looks like you can simplify by doing
> 
>   flags &= ~OEP_ADDRESS_OF;
> 
> here.  Neither the FIELD_DECL compare nor the offsets need it

Yep
> 
> You elided
> 
>   flags &= ~OEP_ADDRESS_OF;
> - return OP_SAME_WITH_NULL (2);
> 
> that was here when OP_SAME (1), please re-instantiate.
Sorry about that, I was not very careful.
Here is the updated patch, which I re-tested on x86_64-linux.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index c47557daeba..ddf18f27cb7 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3312,10 +3312,32 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
case COMPONENT_REF:
  /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
 may be NULL when we're called to compare MEM_EXPRs.  */
- if (!OP_SAME_WITH_NULL (0)
- || !OP_SAME (1))
+ if (!OP_SAME_WITH_NULL (0))
return false;
+ /* Most of time we only need to compare FIELD_DECLs for equality.
+However when determining address look into actual offsets.
+These may match for unions and unshared record types.  */
  flags &= ~OEP_ADDRESS_OF;
+ if (!OP_SAME (1))
+   {
+ if (flags & OEP_ADDRESS_OF)
+   {
+ if (TREE_OPERAND (arg0, 2)
+ || TREE_OPERAND (arg1, 2))
+   return OP_SAME_WITH_NULL (2);
+ tree field0 = TREE_OPERAND (arg0, 1);
+ tree field1 = TREE_OPERAND (arg1, 1);
+
+ if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
+   DECL_FIELD_OFFSET (field1), flags)
+ || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
+  DECL_FIELD_BIT_OFFSET (field1),
+  flags))
+   return false;
+   }
+ else
+   return false;
+   }
  return OP_SAME_WITH_NULL (2);
 
case BIT_FIELD_REF:
@@ -3787,9 +3809,26 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
  sflags = flags;
  break;
 
+   case COMPONENT_REF:
+ if (sflags & OEP_ADDRESS_OF)
+   {
+ hash_operand (TREE_OPERAND (t, 0), hstate, flags);
+ if (TREE_OPERAND (t, 2))
+   hash_operand (TREE_OPERAND (t, 2), hstate,
+ flags & ~OEP_ADDRESS_OF);
+ else
+   {
+ tree field = TREE_OPERAND (t, 1);
+ hash_operand (DECL_FIELD_OFFSET (field),
+   hstate, flags & ~OEP_ADDRESS_OF);
+ hash_operand (DECL_FIELD_BIT_OFFSET (field),
+   hstate, flags & ~OEP_ADDRESS_OF);
+   }
+ return;
+   }
+ break;
case ARRAY_REF:
case ARRAY_RANGE_REF:
-   case COMPONENT_REF:
case BIT_FIELD_REF:
  sflags &= ~OEP_ADDRESS_OF;
  break;
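
To illustrate what "offsets may match for unions" means in the comment
above, here is a small standalone sketch. It does not use any GCC
internals: `toy_field`, `same_field_address` and `hash_field_address` are
made-up stand-ins for comparing and hashing DECL_FIELD_OFFSET and
DECL_FIELD_BIT_OFFSET. The point is that two distinct fields can denote the
same address, and that the hash has to be built from the same data as the
equality test so that equal addresses hash equally.

```cpp
#include <cstddef>
#include <functional>

// Stand-in for the position of a field inside its containing type.
struct toy_field
{
  std::size_t byte_offset;
  std::size_t bit_offset;
};

// Address comparison: the fields may be different declarations, but they
// name the same address if both offsets agree.
bool same_field_address (const toy_field &a, const toy_field &b)
{
  return a.byte_offset == b.byte_offset && a.bit_offset == b.bit_offset;
}

// Hash built from exactly the data the comparison looks at.
std::size_t hash_field_address (const toy_field &f)
{
  std::hash<std::size_t> h;
  return h (f.byte_offset) * 31 + h (f.bit_offset);
}

union toy_union { int i; float f; };
struct toy_record { int a; int b; };

// Distinct members of a union share offset 0.
const toy_field union_i = { offsetof (toy_union, i), 0 };
const toy_field union_f = { offsetof (toy_union, f), 0 };

// Distinct members of a record do not.
const toy_field record_a = { offsetof (toy_record, a), 0 };
const toy_field record_b = { offsetof (toy_record, b), 0 };
```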


Re: [PATCH v3 1/2] generate EH info for volatile asm statements (PR93981)

2020-11-12 Thread Jeff Law via Gcc-patches


On 3/11/20 6:38 PM, J.W. Jagersma via Gcc-patches wrote:
> The following patch extends the generation of exception handling
> information, so that it is possible to catch exceptions thrown from
> volatile asm statements, when -fnon-call-exceptions is enabled.  Parts
> of the gcc code already suggested this should be possible, but it was
> never fully implemented.
>
> Two new test cases are added.  The target-dependent test should pass on
> platforms where throwing from a signal handler is allowed.  The only
> platform I am aware of where that is the case is *-linux-gnu, so it is
> set to XFAIL on all others.
>
> gcc/
> 2020-03-11  Jan W. Jagersma  
>
>   PR inline-asm/93981
>   * tree-cfg.c (make_edges_bb): Make EH edges for GIMPLE_ASM.
>   * tree-eh.c (lower_eh_constructs_2): Add case for GIMPLE_ASM.
>   Assign register output operands to temporaries.
>   * doc/extend.texi: Document that volatile asms can now throw.
>
> gcc/testsuite/
> 2020-03-11  Jan W. Jagersma  
>
>   PR inline-asm/93981
>   * g++.target/i386/pr93981.C: New test.
>   * g++.dg/eh/pr93981.C: New test.

Is this the final version of the patch?  Do we have agreement on the
semantics for output operands, particularly memory operands?  The last
few messages in the March thread lead me to believe that's still not
settled.


Jeff




Re: [PATCH] IBM Z: Define vec_vfees instruction pattern

2020-11-12 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Thu, Nov 12, 2020 at 02:18:13PM +0100, Andreas Krebbel wrote:
> On 12.11.20 13:21, Stefan Schulze Frielinghaus wrote:
> > Bootstrapped and regtested on IBM Z.  Ok for master?
> > 
> > gcc/ChangeLog:
> > 
> > * config/s390/vector.md ("vec_vfees"): New insn pattern.
> > ---
> >  gcc/config/s390/vector.md | 26 ++
> >  1 file changed, 26 insertions(+)
> > 
> > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> > index 31d323930b2..4333a2191ae 100644
> > --- a/gcc/config/s390/vector.md
> > +++ b/gcc/config/s390/vector.md
> > @@ -1798,6 +1798,32 @@
> >"vll\t%v0,%1,%2"
> >[(set_attr "op_type" "VRS")])
> >  
> > +; vfeebs, vfeehs, vfeefs
> > +; vfeezbs, vfeezhs, vfeezfs
> > +(define_insn "vec_vfees"
> > +  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
> > +   (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
> > +  (match_operand:VI_HW_QHS 2 "register_operand" "v")
> > +  (match_operand:QI 3 "const_mask_operand" "C")]
> > + UNSPEC_VEC_VFEE))
> > +   (set (reg:CCRAW CC_REGNUM)
> > +   (unspec:CCRAW [(match_dup 1)
> > +  (match_dup 2)
> > +  (match_dup 3)]
> > + UNSPEC_VEC_VFEECC))]
> > +  "TARGET_VX"
> > +{
> > +  unsigned HOST_WIDE_INT flags = UINTVAL (operands[3]);
> > +
> > +  gcc_assert (!(flags & ~(VSTRING_FLAG_ZS | VSTRING_FLAG_CS)));
> > +  flags &= ~VSTRING_FLAG_CS;
> > +
> > +  if (flags == VSTRING_FLAG_ZS)
> > +return "vfeezs\t%v0,%v1,%v2";
> > +  return "vfees\t%v0,%v1,%v2";
> > +}
> > +  [(set_attr "op_type" "VRR")])
> > +
> >  ; vfenebs, vfenehs, vfenefs
> >  ; vfenezbs, vfenezhs, vfenezfs
> >  (define_insn "vec_vfenes"
> > 
> 
> Since this is mostly a copy of the pattern in vx-builtins.md I think we
> should remove the other version then.
> 
> I also would prefer this to be committed together with the code making
> use of the expander. So far this would be dead code - right?

Ok, I will remove the dead code and commit this change in conjunction
with the user in a different patch.

Thanks,
Stefan
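
For anyone reading the quoted output template without the s390 headers at
hand, the flag handling can be sketched as a plain function. The two flag
values below are assumptions made for this sketch only, and the returned
strings are copied from the template as it appears in the quote above; the
real definitions live in the s390 backend.

```cpp
#include <cassert>
#include <string>

// Assumed values, for illustration only.
const unsigned TOY_FLAG_ZS = 2;  // also stop at a zero element
const unsigned TOY_FLAG_CS = 1;  // set the condition code

// Mirrors the C block of the quoted pattern: reject unknown bits, drop the
// CS bit (this variant always sets the condition code), then pick the
// mnemonic depending on whether zero search was requested.
std::string toy_vfee_mnemonic (unsigned flags)
{
  assert (!(flags & ~(TOY_FLAG_ZS | TOY_FLAG_CS)));
  flags &= ~TOY_FLAG_CS;
  return flags == TOY_FLAG_ZS ? "vfeezs" : "vfees";
}
```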


Re: [PATCH] system: Add WARN_UNUSED_RESULT

2020-11-12 Thread Jason Merrill via Gcc-patches

On 11/11/20 10:03 PM, Marek Polacek wrote:

I'd like to have the option of marking functions with
__attribute__ ((__warn_unused_result__)), so this patch adds a macro.
Use it for maybe_wrap_with_location: it's always a bug if the
return value is not used, which happened to me and got me confused.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/ChangeLog:

* system.h (WARN_UNUSED_RESULT): Define for GCC >= 3.4.
* tree.h (maybe_wrap_with_location): Add WARN_UNUSED_RESULT.
---
  gcc/system.h | 6 ++
  gcc/tree.h   | 2 +-
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/system.h b/gcc/system.h
index b0f3f1dd019..6f6ab616a61 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -789,6 +789,12 @@ extern void fancy_abort (const char *, int, const char *)
  #define ALWAYS_INLINE inline
  #endif
  
+#if GCC_VERSION >= 3004
+#define WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
+#else
+#define WARN_UNUSED_RESULT
+#endif
+
  /* Use gcc_unreachable() to mark unreachable locations (like an
 unreachable default case of a switch.  Do not use gcc_assert(0).  */
  #if (GCC_VERSION >= 4005) && !ENABLE_ASSERT_CHECKING
diff --git a/gcc/tree.h b/gcc/tree.h
index 684be10b440..9a713cdb0c7 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1214,7 +1214,7 @@ get_expr_source_range (tree expr)
  extern void protected_set_expr_location (tree, location_t);
  extern void protected_set_expr_location_if_unset (tree, location_t);
  
-extern tree maybe_wrap_with_location (tree, location_t);
+WARN_UNUSED_RESULT extern tree maybe_wrap_with_location (tree, location_t);
  
  extern int suppress_location_wrappers;
  


base-commit: 0f5f9ed5e5a041b636cc002451b1e8b2295f8e4f
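
A standalone sketch of the same idea, for trying it outside the GCC tree.
The `SKETCH_` macros and `wrap_value` are made up for this example; the
version test imitates how I believe GCC_VERSION is computed, but that is an
assumption here, not something taken from the patch.

```cpp
// Compute a GCC-style version number, or 0 for other compilers.
#if defined (__GNUC__)
# define SKETCH_GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__)
#else
# define SKETCH_GCC_VERSION 0
#endif

// Expand to the attribute only where it is known to be supported.
#if SKETCH_GCC_VERSION >= 3004
# define SKETCH_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
#else
# define SKETCH_WARN_UNUSED_RESULT
#endif

// Hypothetical function whose result must not be dropped: it returns the
// wrapped value instead of modifying its argument in place.
SKETCH_WARN_UNUSED_RESULT int wrap_value (int v)
{
  return v + 1;
}
```

Writing `wrap_value (x);` as a statement on its own should then draw a
-Wunused-result warning, while `x = wrap_value (x);` stays quiet.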





Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions

2020-11-12 Thread Jeff Law via Gcc-patches


On 4/7/20 2:45 PM, Pat Haugen via Gcc-patches wrote:
> The Power processor has the ability to fuse certain pairs of dependent
> instructions to improve their performance if they appear back-to-back in
> the instruction stream. In looking at the current support for
> instruction fusion in GCC I saw the following 2 options.
>
> 1) TARGET_SCHED_MACRO_FUSION target hooks: Only looks at existing
> back-to-back instructions and will ensure the scheduler keeps them together.
>
> 2) -fsched-fusion/TARGET_SCHED_FUSION_PRIORITY: Runs as a separate
> scheduling pass before peephole2. Operates independently on a single
> insn. Used by ARM backend to assign higher priorities to base/disp loads
> and stores so that the scheduling pass will schedule loads/stores to
> adjacent memory back-to-back. Later these insns will be transformed into
> load/store pair insns.
>
> Neither of these work for Power's purpose because they don't deal with
> fusion of dependent insns that may not already be back-to-back. The
> TARGET_SCHED_REORDER[2] hooks also don't work since the dependent insn
> more than likely gets queued for N cycles so wouldn't be on the ready
> list for the reorder hooks to process. We want the ability for the
> scheduler to schedule dependent insn pairs back-to-back when possible
> (i.e. other dependencies of both insns have been satisfied).
>
> I have coded up a proof of concept that implements our needs via a new
> target hook. The hook is passed a pair of dependent insns and returns if
> they are a fusion candidate. It is called while removing the forward
> dependencies of the just scheduled insn. If a dependent insn becomes
> available to schedule and it's a fusion candidate with the just
> scheduled insn, then the new code moves it to the ready list (if
> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
> candidate will be scheduled next. Following is the scheduling part of
> the diff. Does this sound like a feasible approach? I welcome any
> comments/discussion.

It looks fairly reasonable to me.   Do you plan on trying to take this
forward at all?


jeff
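
To make the proposal quoted above concrete, here is a toy sketch of the
idea, not the actual haifa-sched code. ToyInsn, FusionHook and
toy_schedule are hypothetical names; the real scheduler has priorities,
a queue and SCHED_GROUP handling that this ignores. It only shows the
core step: while removing forward dependencies of the just-scheduled
insn, a dependent that becomes ready and is a fusion candidate is forced
to issue next.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyInsn
{
  std::vector<int> succs;  // insns that depend on this one
  int unresolved = 0;      // number of not-yet-scheduled predecessors
};

// Hypothetical hook: true if CONSUMER should fuse with PRODUCER.
using FusionHook = bool (*) (int producer, int consumer);

std::vector<int> toy_schedule (std::vector<ToyInsn> insns, FusionHook fusible)
{
  std::vector<int> order;
  std::vector<bool> done (insns.size (), false);
  int forced = -1;  // insn that must be scheduled next, if any

  while (order.size () < insns.size ())
    {
      int pick = forced;
      forced = -1;
      if (pick < 0)
        // Otherwise take the first ready insn in program order.
        for (std::size_t i = 0; i < insns.size (); ++i)
          if (!done[i] && insns[i].unresolved == 0)
            {
              pick = static_cast<int> (i);
              break;
            }
      if (pick < 0)
        break;  // dependency cycle; cannot happen for a valid DAG

      done[pick] = true;
      order.push_back (pick);

      // Resolve forward dependencies of the insn just scheduled.
      for (int s : insns[pick].succs)
        if (--insns[s].unresolved == 0 && forced < 0 && fusible (pick, s))
          forced = s;
    }
  return order;
}
```

With three insns where insn 2 depends on insn 0 and insn 1 is
independent, a hook that accepts the pair (0, 2) yields the order
0, 2, 1 instead of the program order 0, 1, 2.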




Re: [PATCH] IBM Z: Fix output template for "*vfees"

2020-11-12 Thread Stefan Schulze Frielinghaus via Gcc-patches
As pointed out in
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558816.html
this instruction pattern will be removed anyway.  Thus we can ignore
this patch.

On Thu, Nov 12, 2020 at 01:25:35PM +0100, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested on IBM Z.  Ok for master?
> 
> gcc/ChangeLog:
> 
>   * config/s390/vx-builtins.md ("*vfees"): Fix output
> template.
> ---
>  gcc/config/s390/vx-builtins.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
> index 010db4d1115..0c2e7170223 100644
> --- a/gcc/config/s390/vx-builtins.md
> +++ b/gcc/config/s390/vx-builtins.md
> @@ -1395,7 +1395,7 @@
>  
>if (flags == VSTRING_FLAG_ZS)
>  return "vfeezs\t%v0,%v1,%v2";
> -  return "vfees\t%v0,%v1,%v2,%b3";
> +  return "vfees\t%v0,%v1,%v2";
>  }
>[(set_attr "op_type" "VRR")])
>  
> -- 
> 2.28.0
> 


Re: [RFC][PR target PR90000] (rs6000) Compile time hog w/impossible asm constraint lra loop

2020-11-12 Thread Jeff Law via Gcc-patches


On 4/23/20 9:48 AM, will schmidt wrote:
> On Wed, 2020-04-22 at 12:26 -0600, Jeff Law wrote:
>> On Fri, 2020-04-10 at 16:40 -0500, will schmidt via Gcc-patches
>> wrote:
>>> [RFC][PR target/90000] Compile time hog w/impossible asm constraint
>>> lra loop
>>> 
>>> Hi,
>>>   RFC for a bandaid/patch to partially address target PR/90000.
>>>
>>> This adds an escape condition from the forever loop where 
>>> LRA gets stuck while attempting to handle constraints from an 
>>> instruction that has previously suffered an impossible constraint
>>> error.
>>>
>>> This is somewhat inspired by MAX_RELOAD_INSNS_NUMBER as
>>> seen in lra-constraints.c lra_constraints().   This utilizes the
>>> existing counter variable lra_constraint_iter.
>>>
>>> More needs to be done here, as this does replace a spin-forever
>>> situation with an ICE.
>>>
>>> Thanks
>>> -Will
>>>
>>>
>>> gcc/
>>> 2020-04-10  Will Schmidt  
>>>
>>> * lra.c: Add include of rtl-error.h.
>>> (MAX_LRA_CONSTRAINT_PASSES): New define.
>>> (lra): Add check of lra_constraint_iter value.
>> Doesn't this argue that there's some other datastructure that needs
>> to be updated
>> when we removed the impossible asm?
> Yes, i think so.   I'm just not sure exactly what or where.
> The submitted patch is minimally allowing for manageable-in-size reload
> dumps for my continued debug.  :-)
>
> There is an old patch that addressed what looks like a similar issue,
> but i wasn't able to directly apply that to this situation without
> failing in other places. 
>
>> commit e86c0101ae59b32c3f10edcca78398cbf8848eaa
>> Author: Steven Bosscher 
>> Date:   Thu Jan 24 10:30:26 2013 +
>>re PR inline-asm/55934 (LRA inline asm error recovery)
> Which does a bit more, but at its core is this:
>
> + PATTERN (insn) = gen_rtx_USE (VOIDmode, const0_rtx);
> + lra_set_insn_deleted (insn);
>
>
> I suspect this particular scenario with the testcase is a dependency across
> several 'insns', so marking just one as deleted is not enough.
> (but i'm not sure,..
>
> void foo (void)
> {
>   register float __attribute__ ((mode(SD))) r31 __asm__ ("r31");
>   register float __attribute__ ((mode(SD))) fr1 __asm__ ("fr1");
>
>   __asm__ ("#" : "=d" (fr1));
>   r31 = fr1;
>   __asm__ ("#" : : "r" (r31));
> }

Looking at this again after many months away, I wonder if the real problem
is the reloads we have to generate for copies to/from the fr1 local
variable, which is bound to hard reg fr1, rather than the asm statements
themselves.  It's not clear to me from the BZ and I don't have a PPC
cross handy to look directly.


jeff
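
The escape condition described in the quoted RFC amounts to capping the
number of passes of a loop that is expected to converge. A generic
sketch follows; kToyMaxPasses and run_until_stable are hypothetical
names, the limit of 30 is arbitrary, and nothing here is taken from
lra.c. As the RFC itself notes, this only trades a hang for an error and
does not fix the underlying state problem.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical cap, in the spirit of MAX_LRA_CONSTRAINT_PASSES.
constexpr int kToyMaxPasses = 30;

// Calls STEP repeatedly until it reports that nothing changed.
// Returns the number of passes taken, or throws once the cap is hit.
template <typename Step>
int run_until_stable (Step step)
{
  for (int iter = 1;; ++iter)
    {
      if (!step ())
        return iter;  // converged
      if (iter >= kToyMaxPasses)
        throw std::runtime_error ("maximum number of passes exceeded");
    }
}
```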




Re: [24/32] module mapper

2020-11-12 Thread Nathan Sidwell

On 11/3/20 4:17 PM, Nathan Sidwell wrote:
this is the module mapper client and server pieces.  It features a 
default resolver that can read a text file, or generate default mappings 
from module name to cmi name.


Richard rightly suggested on IRC that the sample server for the module 
mapper shouldn't be in the gcc/cp dir.  It happened to be that way 
because it started out much more closely coupled, but then it grew legs.


So this patch creates a new c++tools toplevel directory and places the 
mapper-server and its default resolver there.  That means more changes 
to the toplevel Makefile.def and Makefile.tpl (I've not included the 
regenerated Makefile.in, nor other generated files in gcc/ and c++tools 
in this diff.)


We still need to build the default resolver when building cc1plus, and 
I've placed mapper-resolver.cc there, as a simple #include forwarder to 
the source in c++tools.  I also replace 'gcc/cp/mapper.h' with a 
client-specific 'gcc/cp/mapper-client.h'.  (mapper-client is only linked 
into cc1plus, so gcc/cp seems the right place for it.)


The sample server relies on gcc/version.o to pick up its version number, 
and I place it in the libexecsubdir where we place cc1plus.  I wasn't 
comfortable placing it in the install location of g++ itself.  I call it 
a sample server for a reason :)


I will of course provide changelog when committing.

nathan

--
Nathan Sidwell
diff --git c/Makefile.def w/Makefile.def
index 36fd26b0367..6e98d2d3340 100644
--- c/Makefile.def
+++ w/Makefile.def
@@ -125,12 +134,13 @@ host_modules= { module= libtermcap; no_check=true;
 missing=distclean;
 missing=maintainer-clean; };
 host_modules= { module= utils; no_check=true; };
+host_modules= { module= c++tools; };
 host_modules= { module= gnattools; };
+host_modules= { module= gotools; };
 host_modules= { module= lto-plugin; bootstrap=true;
 		extra_configure_flags='--enable-shared @extra_linker_plugin_flags@ @extra_linker_plugin_configure_flags@';
 		extra_make_flags='@extra_linker_plugin_flags@'; };
 host_modules= { module= libcc1; extra_configure_flags=--enable-shared; };
-host_modules= { module= gotools; };
 host_modules= { module= libctf; no_install=true; no_check=true;
 		bootstrap=true; };
 
@@ -381,6 +392,8 @@ dependencies = { module=all-lto-plugin; on=all-libiberty-linker-plugin; };
 dependencies = { module=configure-libcc1; on=configure-gcc; };
 dependencies = { module=all-libcc1; on=all-gcc; };
 
+// we want version.o from gcc, and implicitly depend on libcody
+dependencies = { module=all-c++tools; on=all-gcc; };
 dependencies = { module=all-gotools; on=all-target-libgo; };
 
 dependencies = { module=all-utils; on=all-libiberty; };
diff --git c/Makefile.tpl w/Makefile.tpl
index efed1511750..3b88f351d5b 100644
--- c/Makefile.tpl
+++ w/Makefile.tpl
@@ -864,8 +864,8 @@ local-distclean:
 	-rm -f texinfo/doc/Makefile texinfo/po/POTFILES
 	-rmdir texinfo/doc texinfo/info texinfo/intl texinfo/lib 2>/dev/null
 	-rmdir texinfo/makeinfo texinfo/po texinfo/util 2>/dev/null
-	-rmdir fastjar gcc gnattools gotools libcc1 libiberty 2>/dev/null
-	-rmdir texinfo zlib 2>/dev/null
+	-rmdir c++tools fastjar gcc gnattools gotools 2>/dev/null
+	-rmdir libcc1 libiberty texinfo zlib 2>/dev/null
 	-find . -name config.cache -exec rm -f {} \; \; 2>/dev/null
 
 local-maintainer-clean:
diff --git c/c++tools/configure.ac w/c++tools/configure.ac
new file mode 100644
index 00000000000..8d882e541df
--- /dev/null
+++ w/c++tools/configure.ac
@@ -0,0 +1,210 @@
+# Configure script for c++tools
+#   Copyright (C) 2020 Free Software Foundation, Inc.
+#   Written by Nathan Sidwell  while at FaceBook
+#
+# This file is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# C++ has grown a C++20 mapper server.  This may be used to provide
+# and/or learn and/or build required modules.  This sample server
+# shows how the protocol introduced by wg21.link/p1184 may be used.
+# By default g++ uses an in-process mapper.
+
+sinclude(../config/acx.m4)
+
+AC_INIT(c++tools)
+
+AC_CONFIG_SRCDIR([server.cc])
+
+# Determine the noncanonical names used for directories.
+ACX_NONCANONICAL_HOST
+
+AC_CANONICAL_SYSTEM
+AC_PROG_INSTALL
+
+AC_PROG_CXX
+MISSING=`cd $ac_aux_dir && ${PWDCMD-pwd}`/missing
+AC_CHECK_PROGS([AUTOCONF], [autoconf], [$MISSING autoconf])
+AC_CHECK_PROGS([AUTOHEADER], [autoheader], [$MISSING autoheader])
+
+dnl Enab

Re: [PATCH][RFC] Make mingw-w64 printf/scanf attribute alias to ms_printf/ms_scanf only for C89

2020-11-12 Thread Liu Hao via Gcc-patches
On 2020/11/12 23:12, Liu Hao wrote:
> 
> My humble opinion is that people should have gotten used to the `ll`
> specifier so I propose a different patch that enables it unconditionally.
> As Jonathan Yong pointed out, GCC is impossible to

The previous patch missed a `double_name` field. A revised version has been 
attached.
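
For context, the warnings this patch is about come from functions marked
with the printf format attribute. A small sketch of such a function,
using `%lld`, is below. TOY_PRINTF_LIKE, toy_format and toy_format_ll
are hypothetical names; whether `%lld` prints correctly at run time
still depends on the C runtime in use, which is the assumption the patch
description makes explicit.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

#if defined(__GNUC__)
#define TOY_PRINTF_LIKE __attribute__ ((format (printf, 1, 2)))
#else
#define TOY_PRINTF_LIKE
#endif

// A printf-style helper; the attribute lets the compiler check the
// format string against the arguments at each call site.
TOY_PRINTF_LIKE std::string toy_format (const char *fmt, ...)
{
  char buf[64];
  va_list ap;
  va_start (ap, fmt);
  std::vsnprintf (buf, sizeof buf, fmt, ap);
  va_end (ap);
  return std::string (buf);
}

// Uses the `ll` length modifier that the patch teaches ms_printf about.
std::string toy_format_ll (long long value)
{
  return toy_format ("%lld", value);
}
```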



-- 
Best regards,
LH_Mouse
From 1d61adae0695e7067e35f36e607a754a7cf12796 Mon Sep 17 00:00:00 2001
From: Liu Hao 
Date: Thu, 12 Nov 2020 22:20:29 +0800
Subject: [PATCH] gcc: Add `ll` and `L` length modifiers for `ms_printf`

Previous code abused `FMT_LEN_L` for the `I` modifier. As `L` is a valid
modifier for `f`, `e`, `g`, etc. and `I` has the same semantics as the
C99 `z` modifier, `FMT_LEN_z` is now used.

First, in the Microsoft ABI, type `long double` has the same layout as
type `double`, so `%Lg` behaves identically to `%g`. Users should pass
in `double`s instead of `long double`s, as GCC uses the 10-byte format
for `long double`.

Second, with a CRT that is recent enough (MSVCRT since Vista, MSVCR80,
UCRT, or mingw-w64 8.0), `printf`-family functions can handle the `ll`
length modifier correctly. This ability is assumed to be available
universally. A lot of libraries (such as libgomp) that use the
`format(printf, ...)` attribute used to suffer from warnings about
unknown format specifiers.

Reference: 
https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2008/tcxf1dw6(v=vs.90)
Reference: 
https://docs.microsoft.com/en-us/cpp/porting/visual-cpp-what-s-new-2003-through-2015#new-crt-features
Signed-off-by: Liu Hao 

gcc/:
* config/i386/msformat-c.c: Add more length modifiers
---
 gcc/config/i386/msformat-c.c | 45 ++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/gcc/config/i386/msformat-c.c b/gcc/config/i386/msformat-c.c
index 4ceec633a6e..1629b866976 100644
--- a/gcc/config/i386/msformat-c.c
+++ b/gcc/config/i386/msformat-c.c
@@ -32,10 +32,11 @@ along with GCC; see the file COPYING3.  If not see
 static format_length_info ms_printf_length_specs[] =
 {
   { "h", FMT_LEN_h, STD_C89, NULL, FMT_LEN_none, STD_C89, 0 },
-  { "l", FMT_LEN_l, STD_C89, NULL, FMT_LEN_none, STD_C89, 0 },
+  { "l", FMT_LEN_l, STD_C89, "ll", FMT_LEN_ll, STD_C89, 0 },
+  { "L", FMT_LEN_L, STD_C89, NULL, FMT_LEN_none, STD_C89, 1 },
   { "I32", FMT_LEN_l, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
   { "I64", FMT_LEN_ll, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
-  { "I", FMT_LEN_L, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
+  { "I", FMT_LEN_z, STD_EXT, NULL, FMT_LEN_none, STD_C89, 1 },
   { NULL, FMT_LEN_none, STD_C89, NULL, FMT_LEN_none, STD_C89, 0 }
 };
 
@@ -90,33 +91,33 @@ static const format_flag_pair ms_strftime_flag_pairs[] =
 static const format_char_info ms_print_char_table[] =
 {
   /* C89 conversion specifiers.  */
-  { "di",  0, STD_C89, { T89_I,   BADLEN,  T89_S,   T89_L,   T9L_LL,  T99_SST, BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0 +'",  "i",  NULL },
-  { "oxX", 0, STD_C89, { T89_UI,  BADLEN,  T89_US,  T89_UL,  T9L_ULL, T99_ST,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0#",    "i",  NULL },
-  { "u",   0, STD_C89, { T89_UI,  BADLEN,  T89_US,  T89_UL,  T9L_ULL, T99_ST,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0'",    "i",  NULL },
-  { "fgG", 0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0 +#'", "",   NULL },
-  { "eE",  0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0 +#",  "",   NULL },
-  { "c",   0, STD_C89, { T89_I,   BADLEN,  T89_S,   T94_WI,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-w",       "",   NULL },
-  { "s",   1, STD_C89, { T89_C,   BADLEN,  T89_S,   T94_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp",      "cR", NULL },
-  { "p",   1, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-w",       "c",  NULL },
-  { "n",   1, STD_C89, { T89_I,   BADLEN,  T89_S,   T89_L,   T9L_LL,  BADLEN,  BADLEN,  BADLEN,  T99_IM,  BADLEN,  BADLEN,  BADLEN }, "",         "W",  NULL },
+  { "di",  0, STD_C89, { T89_I,   BADLEN,  T89_S,   T89_L,   T9L_LL,  BADLEN,  T99_SST, BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0 +'",  "i",  NULL },
+  { "oxX", 0, STD_C89, { T89_UI,  BADLEN,  T89_US,  T89_UL,  T9L_ULL, BADLEN,  T99_ST,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0#",    "i",  NULL },
+  { "u",   0, STD_C89, { T89_UI,  BADLEN,  T89_US,  T89_UL,  T9L_ULL, BADLEN,  T99_ST,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0'",    "i",  NULL },
+  { "fgG", 0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_D,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0 +#'", "",   NULL },
+  { "eE",  0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   

Re: Move thunks out of cgraph_node

2020-11-12 Thread David Malcolm via Gcc-patches
On Fri, 2020-10-23 at 21:45 +0200, Jan Hubicka wrote:
> Hi,
> this patch moves thunk_info out of cgraph_node into a symbol summary.
> I also moved it to a separate header file since cgraph.h became really
> too fat.  I plan to continue with similar breakup in order to clean up
> interfaces and reduce WPA memory footprint (symbol table now consumes
> more memory than trees)
> 
> Bootstrapped/regtested x86_64-linux, plan to commit it shortly.

This seems to have broken libgccjit (specifically, code that makes
function calls).  Please can you test with --enable-languages=all,jit
(as "jit" isn't part of "all", since it needs --enable-host-shared).

[...snip...]

> +/* Return thunk_info possibly creating new one.  */
> +thunk_info *
> +thunk_info::get_create (cgraph_node *node)
> +{
> +  if (!symtab->m_thunks)
> +{
> +  symtab->m_thunks
> +  = new (ggc_alloc_no_dtor <thunk_infos_t> ())
> +  thunk_infos_t (symtab, true);
> +  symtab->m_thunks->disable_insertion_hook ();
> +}
> +  return symtab->m_thunks->get_create (node);
> +}

symtab->m_thunks is allocated via ggc_alloc_no_dtor here, thus
allocating it within the GC heap...

[...snip...]

> +/* Free thunk info summaries.  */
> +inline void
> +thunk_info::release ()
> +{
> +  if (symtab->m_thunks)
> +delete (symtab->m_thunks);
> +  symtab->m_thunks = NULL;
> +}

...but deallocated using plain "delete", attempting to release the GC-
allocated memory into the system heap, leading to an ICE.  This seems
to happen for any compilation of function calls in which
toplev::finalize is called, i.e. any compilation of function calls from
libgccjit.

Does it need to be in the GC heap (maybe for PCH?), or can it be simply
allocated in the system heap via regular "new"?

I hope to get some continuous integration of libgccjit going at some
point (but am focusing on finishing my stage 1 stuff right now, and am
hacking round this for now).

I wonder if it would be useful for debug builds to
call toplev::finalize, so that cc1/cc1plus exercise these cleanup code
paths and thus catch this kind of breakage much earlier.

Dave
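
The mismatch described above can be shown in miniature. In this sketch a
static buffer stands in for the GC heap, and ToySummary, toy_arena,
toy_get_create and toy_release are hypothetical names, not GCC
interfaces. The point is only that an object built with placement new in
storage that did not come from operator new must be torn down with an
explicit destructor call (or whatever the owning allocator provides),
never with plain `delete`.

```cpp
#include <cassert>
#include <new>

struct ToySummary
{
  static int live;  // number of currently constructed objects
  ToySummary () { ++live; }
  ~ToySummary () { --live; }
};
int ToySummary::live = 0;

// Stand-in for storage owned by another allocator (here: a static buffer).
alignas (ToySummary) static unsigned char toy_arena[sizeof (ToySummary)];
static ToySummary *toy_instance = nullptr;

// Lazily construct the summary inside the arena.
ToySummary *toy_get_create ()
{
  if (!toy_instance)
    toy_instance = new (toy_arena) ToySummary ();
  return toy_instance;
}

// Release must match the allocation: run the destructor explicitly and
// leave the storage to its owner.  `delete toy_instance` would hand
// memory to the system heap that it never allocated.
void toy_release ()
{
  if (toy_instance)
    toy_instance->~ToySummary ();
  toy_instance = nullptr;
}
```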



Re: [PATCH] libstdc++: Enable without gthreads

2020-11-12 Thread Jonathan Wakely via Gcc-patches

On 11/11/20 17:31 +, Jonathan Wakely wrote:

On 11/11/20 16:13 +, Jonathan Wakely wrote:

This makes it possible to use std::thread in single-threaded builds.
All member functions are available, but attempting to create a new
thread will throw an exception.

The main benefit for most targets is that other headers such as <future>
do not need to include the whole of <thread> just to be able to create a
std::thread. That avoids including <stop_token> and std::jthread where
not required.


I forgot to mention that this patch also reduces the size of the
<future> header, by only including <bits/std_thread.h> instead of the
whole of <thread>. That could be done separately from the rest of the
changes here.

It would be possible to split std::thread and this_thread::get_id()
into a new header without also making them work without gthreads.

That would still reduce the size of the <future> header, because it
wouldn't need the whole of <thread>. But it wouldn't get rid of
preprocessor checks for _GLIBCXX_HAS_GTHREADS in <stop_token>.

Allowing std::this_thread::get_id() and std::this_thread::yield() to
work without threads seems worth doing (we already make
std::this_thread::sleep_until and std::this_thread::sleep_for work
without threads).


Here's a slightly more conservative version of the patch. This moves
std::thread and this_thread::get_id() and this_thread::yield() to a
new header, and makes *most* of std::thread defined without gthreads
(because we need the nested thread::id type to be returned from
this_thread::get_id()). But it doesn't declare the std::thread
constructor that creates new threads.

That means std::thread is present, but you can't even try to create
new threads. This means we don't need to export the std::thread
symbols from libstdc++.so for a target where they are unusable and
just throw an exception.

This still has the main benefits of making <future> include a lot less
code, and removing some #if conditions in <stop_token>.

One other change from the previous patch worth mentioning is that I've
made <thread> include <bits/refwrap.h> so that
std::reference_wrapper (and std::ref and std::cref) are defined by
<thread>. That isn't required, but it is a tiny header and being able
to use std::ref to pass lvalues to new threads without including
all of <functional> seems like a kindness to users.

Both this and the previous patch require some GDB changes, because GDB
currently assumes that if std::thread is declared in <thread> then it
is usable and multiple threads are supported. That's no longer true,
because we would declare a useless std::thread after this patch. Tom
Tromey has patches to make GDB handle this though.

Tested powerpc64le-linux, --enable-threads and --disable-threads.

Thoughts?
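
As a rough illustration of the parts of the interface that remain
meaningful even when no second thread is ever created, the sketch below
uses only standard library calls. The function names are made up for
this example. It has only been reasoned about against the standard
behaviour, not tested on a --disable-threads build.

```cpp
#include <cassert>
#include <system_error>
#include <thread>

// The calling thread's id is stable across a yield.
bool toy_main_thread_id_is_stable ()
{
  std::thread::id before = std::this_thread::get_id ();
  std::this_thread::yield ();
  std::thread::id after = std::this_thread::get_id ();
  return before == after;
}

// A default-constructed std::thread represents no thread, so it is not
// joinable and join() reports an error by throwing std::system_error.
bool toy_join_on_empty_thread_throws ()
{
  std::thread t;
  if (t.joinable ())
    return false;
  try
    {
      t.join ();
    }
  catch (const std::system_error &)
    {
      return true;
    }
  return false;
}
```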


commit 68a99d44890957d6c5b128116a6af6bb4bcfaad3
Author: Jonathan Wakely 
Date:   Thu Nov 12 15:26:02 2020

libstdc++: Move std::thread to a new header

This makes it possible to use std::thread without including the whole of
<thread>. It also makes this_thread::get_id() and this_thread::yield()
available even when there is no gthreads support (e.g. when GCC is built
with --disable-threads or --enable-threads=single).

In order for the std::thread::id return type of this_thread::get_id() to
be defined, std::thread itself is defined unconditionally. However the
constructor that creates new threads is not defined for single-threaded
builds. The thread::join() and thread::detach() member functions are
defined inline for single-threaded builds and just throw an exception
(because we know the thread cannot be joinable if the constructor that
creates joinable threads doesn't exist).

The thread::hardware_concurrency() member function is also defined
inline and returns 0 (as suggested by the standard when the value "is
not computable or well-defined").

The main benefit for most targets is that other headers such as <future>
do not need to include the whole of <thread> just to be able to create a
std::thread. That avoids including <stop_token> and std::jthread where
not required.

This also means we can use this_thread::get_id() and this_thread::yield()
in <stop_token> instead of using the gthread functions directly. This
removes some preprocessor conditionals, simplifying the code.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new <bits/std_thread.h> header.
* include/Makefile.in: Regenerate.
* include/std/future: Include new header instead of <thread>.
* include/std/stop_token: Include new header instead of
<thread>.
(stop_token::_S_yield()): Use this_thread::yield().
(_Stop_state_t::_M_requester): Change type to std::thread::id.
(_Stop_state_t::_M_request_stop()): Use this_thread::get_id().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Likewise.
Use __is_single_threaded() to decide whether to synchronize.
* include/std/thread (thread, operator==, this_thread::get_id)
(this_thread::yield): Move to new header.
(operator<=>, operator!=, operator<, operator<=, 

[PATCH] cgraph: Avoid segfault when attempting to dump NULL clone_info

2020-11-12 Thread Martin Jambor
Hi,

cgraph_node::materialize_clone segfaulted when I tried compiling Tramp3D
with -fdump-ipa-all because there was no clone_info - IPA-CP created a
clone only for an aggregate constant, adding a note to its
transformation summary but not creating any tree_map nor
param_adjustments.

Fixed with the following obvious extra checks which I will commit after
an obligatory round of bootstrap and testing.

Thanks,

Martin


gcc/ChangeLog:

2020-11-12  Martin Jambor  

* cgraphclones.c (cgraph_node::materialize_clone): Check that clone
info is not NULL before attempting to dump it.
---
 gcc/cgraphclones.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index bc590819f78..712a54e8d0c 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -1107,7 +1107,7 @@ cgraph_node::materialize_clone ()
   fprintf (symtab->dump_file, "cloning %s to %s\n",
   clone_of->dump_name (),
   dump_name ());
-  if (info->tree_map)
+  if (info && info->tree_map)
 {
  fprintf (symtab->dump_file, "replace map:");
  for (unsigned int i = 0;
@@ -1123,7 +1123,7 @@ cgraph_node::materialize_clone ()
}
  fprintf (symtab->dump_file, "\n");
}
-  if (info->param_adjustments)
+  if (info && info->param_adjustments)
info->param_adjustments->dump (symtab->dump_file);
 }
   clear_stmts_in_references ();
-- 
2.29.2



Re: Move thunks out of cgraph_node

2020-11-12 Thread Jan Hubicka
> On Fri, 2020-10-23 at 21:45 +0200, Jan Hubicka wrote:
> > Hi,
> > this patch moves thunk_info out of cgraph_node into a symbol summary.
> > I also moved it to a separate header file since cgraph.h became really
> > too fat.  I plan to continue with similar breakup in order to clean up
> > interfaces and reduce WPA memory footprint (symbol table now consumes
> > more memory than trees)
> > 
> > Bootstrapped/regtested x86_64-linux, plan to commit it shortly.
> 
> This seems to have broken libgccjit (specifically, code that makes
> function calls).  Please can you test with --enable-languages=all,jit
> (as "jit" isn't part of "all", since it needs --enable-host-shared).

Sorry for that :(
I will try to keep that in mind and test JIT.

> 
> [...snip...]
> 
> > +/* Return thunk_info possibly creating new one.  */
> > +thunk_info *
> > +thunk_info::get_create (cgraph_node *node)
> > +{
> > +  if (!symtab->m_thunks)
> > +{
> > +  symtab->m_thunks
> > +  = new (ggc_alloc_no_dtor <thunk_infos_t> ())
> > +  thunk_infos_t (symtab, true);
> > +  symtab->m_thunks->disable_insertion_hook ();
> > +}
> > +  return symtab->m_thunks->get_create (node);
> > +}
> 
> symtab->m_thunks is allocated via ggc_alloc_no_dtor here, thus
> allocating it within the GC heap...
> 
> [...snip...]
> 
> > +/* Free thunk info summaries.  */
> > +inline void
> > +thunk_info::release ()
> > +{
> > +  if (symtab->m_thunks)
> > +delete (symtab->m_thunks);
> > +  symtab->m_thunks = NULL;
> > +}
> 
> ...but deallocated using plain "delete", attempting to release the GC-
> allocated memory into the system heap, leading to an ICE.  This seems
> to happen for any compilation of function calls in which
> toplev::finalize is called, i.e. any compilation of function calls from
> libgccjit.
> 
> Does it need to be in the GC heap (maybe for PCH?), or can it be simply
> allocated in the system heap via regular "new"?

It points to trees (alias decl) and thus it is in PCH so it walks
through it.  In fact I have a cleanup patch for this (which gets rid of
the alias decl that is, in fact, another PCH workaround hack).

> 
> I hope to get some continuous integration of libgccjit going at some
> point (but am focusing on finishing my stage 1 stuff right now, and am
> hacking round this for now).
> 
> I wonder if it would be useful for debug builds to
> call toplev::finalize, so that cc1/cc1plus exercise these cleanup code
> paths and thus catch this kind of breakage much earlier.

I think we could do that for checking compilers.
It would also make it easier to check for memory leaks...

Honza
> 
> Dave
> 


Re: [PATCH] cgraph: Avoid segfault when attempting to dump NULL clone_info

2020-11-12 Thread Jan Hubicka
> Hi,
> 
> cgraph_node::materialize_clone segfaulted when I tried compiling Tramp3D
> with -fdump-ipa-all because there was no clone_info - IPA-CP created a
> clone only for an aggregate constant, adding a note to its
> transformation summary but not creating any tree_map nor
> param_adjustments.
> 
> Fixed with the following obvious extra checks which I will commit after
> an obligatory round of bootstrap and testing.
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2020-11-12  Martin Jambor  
> 
>   * cgraphclones.c (cgraph_node::materialize_clone): Check that clone
>   info is not NULL before attempting to dump it.
OK, thanks!
Honza
> ---
>  gcc/cgraphclones.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index bc590819f78..712a54e8d0c 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -1107,7 +1107,7 @@ cgraph_node::materialize_clone ()
>fprintf (symtab->dump_file, "cloning %s to %s\n",
>  clone_of->dump_name (),
>  dump_name ());
> -  if (info->tree_map)
> +  if (info && info->tree_map)
>  {
> fprintf (symtab->dump_file, "replace map:");
> for (unsigned int i = 0;
> @@ -1123,7 +1123,7 @@ cgraph_node::materialize_clone ()
>   }
> fprintf (symtab->dump_file, "\n");
>   }
> -  if (info->param_adjustments)
> +  if (info && info->param_adjustments)
>   info->param_adjustments->dump (symtab->dump_file);
>  }
>clear_stmts_in_references ();
> -- 
> 2.29.2
> 


Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-12 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 10:55 -0800, Thomas Rodgers wrote:

--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -603,13 +603,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }

#if __cplusplus > 201703L
+  template<typename _Func>
+   _GLIBCXX_ALWAYS_INLINE void
+   _M_wait(__int_type __old, const _Func& __fn) const noexcept
+   { std::__atomic_wait(&_M_i, __old, __fn); }
+
 _GLIBCXX_ALWAYS_INLINE void
 wait(__int_type __old,
  memory_order __m = memory_order_seq_cst) const noexcept
 {
-   std::__atomic_wait(&_M_i, __old,
-  [__m, this, __old]
-  { return this->load(__m) != __old; });
+   _M_wait(__old,
+   [__m, this, __old]
+   { return this->load(__m) != __old; });
 }


This looks like it's not meant to be part of this patch.

It also looks wrong for any patch, because it adds _M_wait as a public
member.

Not sure what this piece is for :-)



It is used at include/std/barrier:197 to keep the implementation as close as 
possible to the libc++ version upon which it is based.


So the caller in <barrier> can't use __atomic_wait directly because it
can't access the _M_i member of the atomic.

Would it be possible to use atomic_ref instead of atomic, so that the
barrier code has access to the underlying object and can use it
directly with __atomic_wait?




Re: [PATCH] libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 95989]

2020-11-12 Thread Jonathan Wakely via Gcc-patches

On 11/11/20 19:08 +0100, Jakub Jelinek via Libstdc++ wrote:

On Wed, Nov 11, 2020 at 05:24:42PM +, Jonathan Wakely wrote:

--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -684,7 +684,14 @@ __gthread_equal (__gthread_t __t1, __gthread_t __t2)
 static inline __gthread_t
 __gthread_self (void)
 {
+#if __GLIBC_PREREQ(2, 27)


What if it is a non-glibc system where __GLIBC_PREREQ macro isn't defined?
I think you'd get then
error: missing binary operator before token "("
So I think you want
#if defined __GLIBC__ && defined __GLIBC_PREREQ
#if __GLIBC_PREREQ(2, 27)
 return pthread_self ();
#else
 return __gthrw_(pthread_self) ();
#endif
#else
 return __gthrw_(pthread_self) ();
#endif
or similar.



Here's a fixed version of the patch.

I've moved the glibc-specific code in this_thread::get_id() into a new
macro defined in config/os/gnu-linux/os_defines.h (where we already
know we are dealing with glibc). That means we don't do the
__GLIBC_PREREQ check directly in <thread>, it's hidden away in a
target-specific header.

Tested powerpc64le-linux (glibc 2.17 and 2.32), sparc-solaris2.11 and
powerpc-aix.




commit 822914f1f1f4710ff252764ee634aa07ac565d53
Author: Jonathan Wakely 
Date:   Wed Nov 11 19:26:00 2020

libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 95989]

Since glibc 2.27 the pthread_self symbol has been defined in libc rather
than libpthread. Because we only call pthread_self through a weak alias
it's possible for statically linked executables to end up without a
definition of pthread_self. This crashes when trying to call an
undefined weak symbol.

We can use the __GLIBC_PREREQ version check to detect the version of
glibc where pthread_self is no longer in libpthread, and call it
directly rather than through the weak reference.

It would be better to check for pthread_self in libc during configure
instead of hardcoding the __GLIBC_PREREQ check. That would be somewhat
complicated by the fact that prior to glibc 2.27 only libc.so.6
contained the pthread_self symbol. The configure checks would need to
try to link both statically and dynamically, and the result would depend
on whether the static libc.a happens to be installed during configure
(which could vary between different systems using the same version of
glibc). Doing it properly is left for a future date, as it will be
needed anyway after glibc moves all pthread symbols from libpthread to
libc. When that happens we should revisit the whole approach of using
weak symbols for pthread symbols.

For the purposes of std::this_thread::get_id() we create a fake non-zero
thread ID ((__gthread_t)1) when using glibc but not linked to
libpthread. When using glibc 2.27 or later pthread_self() never returns
zero so we don't need to use (__gthread_t)1 for new glibc.

An undesirable consequence of this change is that code compiled prior to
the change might inline the old definition of this_thread::get_id()
which always returns (__gthread_t)1 in a program that isn't linked to
libpthread. Code compiled after the change will use pthread_self() and
so get a real TID. That could result in the main thread having different
thread::id values in different translation units. This seems acceptable,
as there are not expected to be many uses of thread::id in programs
that aren't linked to libpthread.
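
The property that both the fake ID and the real one are meant to preserve
can be checked from user code. A small sketch, using only standard
library calls; the two function names are made up for this example:

```cpp
#include <thread>

// A default-constructed std::thread::id stands for "no thread", so the
// calling thread's ID should never compare equal to it.
bool main_thread_has_real_id()
{
  return std::this_thread::get_id() != std::thread::id();
}

// Within one translation unit, repeated queries should agree.
bool id_is_stable()
{
  return std::this_thread::get_id() == std::this_thread::get_id();
}
```

This does not exercise the cross-translation-unit mismatch described
above, which needs objects built before and after the change.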

libgcc/ChangeLog:

PR libstdc++/95989
* gthr-posix.h (__gthread_self) [__GLIBC_PREREQ(2, 27)]: Call
pthread_self directly rather than using weak alias.

libstdc++-v3/ChangeLog:

PR libstdc++/95989
* config/os/gnu-linux/os_defines.h (_GLIBCXX_NATIVE_THREAD_ID):
Define new macro to get reliable thread ID.
* include/std/thread (this_thread::get_id): Use new macro if
it's defined.

diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h
index 965247602acf..dc34645d1c52 100644
--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -684,7 +684,18 @@ __gthread_equal (__gthread_t __t1, __gthread_t __t2)
 static inline __gthread_t
 __gthread_self (void)
 {
+#if defined __GLIBC__ && defined __GLIBC_PREREQ
+# if __GLIBC_PREREQ(2, 27)
+  /* Since Glibc 2.27, pthread_self is defined in libc not libpthread.
+   * Call it directly so that we get a non-weak reference and won't call
+   * an undefined weak symbol when linked to the libc.a static lib.  */
+  return pthread_self ();
+# else
   return __gthrw_(pthread_self) ();
+# endif
+#else
+  return __gthrw_(pthread_self) ();
+#endif
 }
 
 static inline int
diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index f821486ec8f5..ca61ecf60f62 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -49,4 +49,14 @@
 // version dynamically in case it has ch

Re: Improve handling of memory operands in ipa-icf 2/4

2020-11-12 Thread Jan Hubicka
Hi,
this is an updated patch.  It fixes the comparison of bitfields: I now
check that their bitsizes and bitoffsets match (and OEP_ADDRESSOF is not
used for bitfield references).
I also noticed a problem with the dependence clique check in
ao_refs_may_alias that I copied here.  rbase should be used instead of base.

Finally I ran statistics on when access paths mismatch and noticed
that I do not really need to check that component_refs and array_refs
are semantically equivalent since this is implied by earlier tests.
This is described in an inline comment and simplifies the code.
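
The diff below reports why two memory references differ through a set of
flags. A standalone sketch of that reporting scheme follows; the flag
names come from the patch, but the bit values and the helper
first_mismatch_reason are assumptions made for illustration.

```cpp
#include <string>

// Assumed bit values: the hunks shown in this message name the flags but
// do not include their definitions.
enum ao_mismatch : unsigned {
  SEMANTICS         = 1u << 0,
  BASE_ALIAS_SET    = 1u << 1,
  REF_ALIAS_SET     = 1u << 2,
  ACCESS_PATH       = 1u << 3,
  DEPENDENCE_CLIQUE = 1u << 4
};

// Return the first applicable reason, testing flags in the same order as
// the patched compare_operand; an empty string means the refs are equal.
std::string first_mismatch_reason(unsigned flags)
{
  if (!flags)
    return "";
  if (flags & SEMANTICS)
    return "semantic difference";
  if (flags & BASE_ALIAS_SET)
    return "base alias set difference";
  if (flags & REF_ALIAS_SET)
    return "ref alias set difference";
  if (flags & ACCESS_PATH)
    return "access path difference";
  if (flags & DEPENDENCE_CLIQUE)
    return "dependence clique difference";
  return "unknown flag";
}
```

When several flags are set, only the highest-priority reason is reported,
which matches the early-return structure of the hunk.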

Bootstrapped/regtested x86_64-linux, OK?
Honza


* ipa-icf-gimple.c: Include tree-ssa-alias-compare.h.
(func_checker::func_checker): Initialize m_tbaa.
(func_checker::hash_operand): Use hash_ao_ref for memory accesses.
(func_checker::compare_operand): Use compare_ao_refs for memory
accesses.
(func_checker::compare_gimple_assign): Do not check LHS types
of memory stores.
* ipa-icf-gimple.h (func_checker): Derive from ao_compare;
add m_tbaa.
* ipa-icf.c: Include tree-ssa-alias-compare.h.
(sem_function::equals_private): Update call of
func_checker::func_checker.
* ipa-utils.h (lto_streaming_expected_p): New inline
predicate.
* tree-ssa-alias-compare.h: New file.
* tree-ssa-alias.c: Include tree-ssa-alias-compare.h
and builtins.h.
(view_converted_memref_p): New function.
(types_equal_for_same_type_for_tbaa_p): New function.
(ao_compare::compare_ao_refs): New member function.
(ao_compare::hash_ao_ref): New function

* c-c++-common/Wstringop-overflow-2.c: Disable ICF.
* g++.dg/warn/Warray-bounds-8.C: Disable ICF.

index f75951f7c49..26337dd7384 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "gimple-walk.h"
 
+#include "tree-ssa-alias-compare.h"
 #include "ipa-icf-gimple.h"
 
 namespace ipa_icf_gimple {
@@ -52,13 +53,13 @@ namespace ipa_icf_gimple {
of declarations that can be skipped.  */
 
 func_checker::func_checker (tree source_func_decl, tree target_func_decl,
-   bool ignore_labels,
+   bool ignore_labels, bool tbaa,
hash_set<symtab_node *> *ignored_source_nodes,
hash_set<symtab_node *> *ignored_target_nodes)
   : m_source_func_decl (source_func_decl), m_target_func_decl (target_func_decl),
 m_ignored_source_nodes (ignored_source_nodes),
 m_ignored_target_nodes (ignored_target_nodes),
-m_ignore_labels (ignore_labels)
+m_ignore_labels (ignore_labels), m_tbaa (tbaa)
 {
   function *source_func = DECL_STRUCT_FUNCTION (source_func_decl);
   function *target_func = DECL_STRUCT_FUNCTION (target_func_decl);
@@ -252,9 +253,16 @@ func_checker::hash_operand (const_tree arg, inchash::hash &hstate,
 
 void
 func_checker::hash_operand (const_tree arg, inchash::hash &hstate,
-   unsigned int flags, operand_access_type)
+   unsigned int flags, operand_access_type access)
 {
-  return hash_operand (arg, hstate, flags);
+  if (access == OP_MEMORY)
+{
+  ao_ref ref;
+  ao_ref_init (&ref, const_cast <tree> (arg));
+  return hash_ao_ref (&ref, lto_streaming_expected_p (), m_tbaa, hstate);
+}
+  else
+return hash_operand (arg, hstate, flags);
 }
 
 bool
@@ -314,18 +322,40 @@ func_checker::compare_operand (tree t1, tree t2, operand_access_type access)
 return true;
   else if (!t1 || !t2)
 return false;
-  if (operand_equal_p (t1, t2, OEP_MATCH_SIDE_EFFECTS))
-return true;
-  switch (access)
+  if (access == OP_MEMORY)
 {
-case OP_MEMORY:
-  return return_false_with_msg
-("operand_equal_p failed (access == memory)");
-case OP_NORMAL:
+  ao_ref ref1, ref2;
+  ao_ref_init (&ref1, const_cast <tree> (t1));
+  ao_ref_init (&ref2, const_cast <tree> (t2));
+  int flags = compare_ao_refs (&ref1, &ref2,
+  lto_streaming_expected_p (), m_tbaa);
+
+  if (!flags)
+   return true;
+  if (flags & SEMANTICS)
+   return return_false_with_msg
+   ("compare_ao_refs failed (semantic difference)");
+  if (flags & BASE_ALIAS_SET)
+   return return_false_with_msg
+   ("compare_ao_refs failed (base alias set difference)");
+  if (flags & REF_ALIAS_SET)
+   return return_false_with_msg
+("compare_ao_refs failed (ref alias set difference)");
+  if (flags & ACCESS_PATH)
+   return return_false_with_msg
+("compare_ao_refs failed (access path difference)");
+  if (flags & DEPENDENCE_CLIQUE)
+   return return_false_with_msg
+("compare_ao_refs failed (dependence clique difference)");
+  gcc_unreachable ();
+}
+  else
+{
+  if (op

Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 12, 2020 at 2:59 PM Richard Biener
 wrote:
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > PR target/97194
> > > > > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New 
> > > > > > > > function.
> > > > > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New 
> > > > > > > > Decl.
> > > > > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > > > true for const_int_operand or register_operand under 
> > > > > > > > TARGET_AVX2.
> > > > > > > > * config/i386/sse.md (vec_set): Support both constant
> > > > > > > > and variable index vec_set.
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > > > >
> > > > > > > +;; True for registers, or const_int_operand, used to vec_setm 
> > > > > > > expander.
> > > > > > > +(define_predicate "vec_setm_operand"
> > > > > > > +  (ior (and (match_operand 0 "register_operand")
> > > > > > > +(match_test "TARGET_AVX2"))
> > > > > > > +   (match_code "const_int")))
> > > > > > > +
> > > > > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word 
> > > > > > > shifts.
> > > > > > >  (define_predicate "reg_or_pm1_operand"
> > > > > > >(ior (match_operand 0 "register_operand")
> > > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > > index b153a87fb98..1798e5dea75 100644
> > > > > > > --- a/gcc/config/i386/sse.md
> > > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > > > > >  (define_expand "vec_set"
> > > > > > >[(match_operand:V 0 "register_operand")
> > > > > > > (match_operand: 1 "register_operand")
> > > > > > > -   (match_operand 2 "const_int_operand")]
> > > > > > > +   (match_operand 2 "vec_setm_operand")]
> > > > > > >
> > > > > > > You need to specify a mode, otherwise a register of any mode can 
> > > > > > > pass here.
> > > > > > >
> > > > > > Yes, theoretically, we only accept integer types. But in 
> > > > > > can_vec_set_var_idx_p
> > > > > > cut
> > > > > > ---
> > > > > > bool
> > > > > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > > > > {
> > > > > >   if (!VECTOR_MODE_P (vec_mode))
> > > > > > return false;
> > > > > >
> > > > > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > > > > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > > > > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > > > > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > > >
> > > > > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > > > > >
> > > > > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 
> > > > > > 0, reg1)
> > > > > >  && insn_operand_matches (icode, 1, reg2)
> > > > > >  && insn_operand_matches (icode, 2, reg3);
> > > > > > }
> > > > > > ---
> > > > > >
> > > > > > reg3 is assumed to be VOIDmode, so setting any mode in match_operand 2
> > > > > > will make insn_operand_matches (icode, 2, reg3) fail
> > > > > > ---
> > > > > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > > > > $5 = false
> > > > > > (gdb)
> > > > > > ---
> > > > > >
> > > > > > Maybe we need to change
> > > > > >
> > > > > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > > >
> > > > > > to
> > > > > >
> > > > > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > > > > >
> > > > > > cc Richard Biener, any thoughts?
> > > > >
> > > > > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > > > > specify SImode for operand 2 in vec_setM pattern and allow register
> > > > > operands. I wonder if and how they manage to generate the pattern.
> > > > >
> > > > > Uros.
> > > >
> > > > Variable index vec_set was enabled by r11-3486, about two months ago in
> > > > [1]. But for the two targets above, the code has been there since
> > > > GCC 10 (maybe earlier, I only looked at the gcc-10 branch), so I don't
> > > > think that code was written for [1].
> > > >
> > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > > Correct [1] 
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> > in https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554592.html
> >
> > It says
> >
> > > >> +can_vec_set_var_idx_p (enum tree_code code, machine_mode vec_mode,
> > > >> +  machine_mode value_mode, machine_mode idx_mode)
> > > >
> > > > toplevel co

Re: [gcc r9-8794] aarch64: Clear canary value after stack_protect_test [PR96191]

2020-11-12 Thread Sebastian Pop via Gcc-patches
Hi,

On Fri, Aug 7, 2020 at 6:18 AM Richard Sandiford  wrote:
>
> https://gcc.gnu.org/g:5380912a17ea09a8996720fb62b1a70c16c8f9f2
>
> commit r9-8794-g5380912a17ea09a8996720fb62b1a70c16c8f9f2
> Author: Richard Sandiford 
> Date:   Fri Aug 7 12:17:37 2020 +0100

could you please also apply this change to the gcc-8 branch?

Thanks,
Sebastian

>
> aarch64: Clear canary value after stack_protect_test [PR96191]
>
> The stack_protect_test patterns were leaving the canary value in the
> temporary register, meaning that it was often still in registers on
> return from the function.  An attacker might therefore have been
> able to use it to defeat stack-smash protection for a later function.
>
> gcc/
> PR target/96191
> * config/aarch64/aarch64.md (stack_protect_test_): Set the
> CC register directly, instead of a GPR.  Replace the original GPR
> destination with an extra scratch register.  Zero out operand 3
> after use.
> (stack_protect_test): Update accordingly.
>
> gcc/testsuite/
> PR target/96191
> * gcc.target/aarch64/stack-protector-1.c: New test.
> * gcc.target/aarch64/stack-protector-2.c: Likewise.
>
> (cherry picked from commit fe1a26429038d7cd17abc53f96a6f3e2639b605f)
>
> Diff:
> ---
>  gcc/config/aarch64/aarch64.md  | 34 -
>  .../gcc.target/aarch64/stack-protector-1.c | 89 
> ++
>  .../gcc.target/aarch64/stack-protector-2.c |  6 ++
>  3 files changed, 110 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index ed8cf8ecea1..9598bac387f 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6985,10 +6985,8 @@
> (match_operand 2)]
>""
>  {
> -  rtx result;
>machine_mode mode = GET_MODE (operands[0]);
>
> -  result = gen_reg_rtx(mode);
>if (aarch64_stack_protector_guard != SSP_GLOBAL)
>{
>  /* Generate access through the system register. The
> @@ -7013,29 +7011,27 @@
>  operands[1] = gen_rtx_MEM (mode, tmp_reg);
>}
>emit_insn ((mode == DImode
> - ? gen_stack_protect_test_di
> - : gen_stack_protect_test_si) (result,
> -   operands[0],
> -   operands[1]));
> -
> -  if (mode == DImode)
> -emit_jump_insn (gen_cbranchdi4 (gen_rtx_EQ (VOIDmode, result, 
> const0_rtx),
> -   result, const0_rtx, operands[2]));
> -  else
> -emit_jump_insn (gen_cbranchsi4 (gen_rtx_EQ (VOIDmode, result, 
> const0_rtx),
> -   result, const0_rtx, operands[2]));
> +? gen_stack_protect_test_di
> +: gen_stack_protect_test_si) (operands[0], operands[1]));
> +
> +  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> +  emit_jump_insn (gen_condjump (gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
> +   cc_reg, operands[2]));
>DONE;
>  })
>
> +;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
> +;; canary value does not live beyond the end of this sequence.
>  (define_insn "stack_protect_test_"
> -  [(set (match_operand:PTR 0 "register_operand" "=r")
> -   (unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")
> -(match_operand:PTR 2 "memory_operand" "m")]
> -UNSPEC_SP_TEST))
> +  [(set (reg:CC CC_REGNUM)
> +   (unspec:CC [(match_operand:PTR 0 "memory_operand" "m")
> +   (match_operand:PTR 1 "memory_operand" "m")]
> +  UNSPEC_SP_TEST))
> +   (clobber (match_scratch:PTR 2 "=&r"))
> (clobber (match_scratch:PTR 3 "=&r"))]
>""
> -  "ldr\t%3, %1\;ldr\t%0, %2\;eor\t%0, %3, %0"
> -  [(set_attr "length" "12")
> +  "ldr\t%2, %0\;ldr\t%3, %1\;subs\t%2, %2, %3\;mov\t%3, 0"
> +  [(set_attr "length" "16")
> (set_attr "type" "multiple")])
>
>  ;; Write Floating-point Control Register.
> diff --git a/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c 
> b/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c
> new file mode 100644
> index 000..73e83bc413f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c
> @@ -0,0 +1,89 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fstack_protector } */
> +/* { dg-options "-fstack-protector-all -O2" } */
> +
> +extern volatile long *stack_chk_guard_ptr;
> +
> +volatile long *
> +get_ptr (void)
> +{
> +  return stack_chk_guard_ptr;
> +}
> +
> +void __attribute__ ((noipa))
> +f (void)
> +{
> +  volatile int x;
> +  x = 1;
> +  x += 1;
> +}
> +
> +#define CHECK(REG) "\tcmp\tx0, " #REG "\n\tbeq\t1f\n"
> +
> +asm (
> +"  .pushsection .data\n"
> +"  .align  3\n"
> +"  .globl  stack_chk_guard_ptr\n"
> +"stack_chk_guard_ptr:\n"
> +#if __ILP32__
> +"  .word   __stack_chk_guard\n"
> +#else
> +"  

Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 12, 2020 at 6:51 PM Uros Bizjak  wrote:

> > > > Yes, removed 'code' and value_mode by checking VECTOR_MODE_P and use 
> > > > GET_MODE_INNER
> > > > for value_mode.  ".md expanders" shall support for integer constants 
> > > > index mode, but
> > > > I guess they shouldn't be expanded by IFN as this function is for 
> > > > variable index
> > > > insert only?  Anyway, the v3 patch used VOIDmode check...
> >
> > I'm not sure what best to do here, as said accepting "any" (integer) mode as
> > input is desirable (SImode, DImode but eventually also smaller modes).  How
> > that can be best achieved I don't know.
>
> I was expecting something similar to how extvM/extzvM operands are
> handled here. We have:
>
> Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
> target-specific mode.
>
> Please note operands 2 and 3 having a "target-specific" mode, handled
> in optabs-query.c as:
>
>   machine_mode struct_mode = data->operand[struct_op].mode;
>   if (struct_mode == VOIDmode)
> struct_mode = word_mode;
>   if (mode != struct_mode)
> return false;
>
> > Why's not specifying any mode in the pattern no good?  Just make sure you
> > appropriately extend/subreg it?  We can make sure it will be an integer
> > mode in the expander itself.
>
> IIRC, having known mode, expanders can use create_convert_operand_to,
> and the middle-end will do the above by itself. Also note that at
> least two targets specify SImode, so register operands are currently
> ineffective there.
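
The optabs-query.c check quoted above can be sketched in isolation. The
enum and function below are hypothetical stand-ins, not GCC's machine_mode
API; they only mirror the rule that an operand left without a mode in the
pattern is treated as word_mode, and that the modes must otherwise match
exactly.

```cpp
// Hypothetical stand-ins for a few machine modes (illustration only).
enum class mode { VOID, SI, DI };

// Mirror of the quoted logic: a pattern operand with no mode falls back
// to word_mode, then the requested mode must equal the resulting mode.
bool operand_mode_ok(mode requested, mode pattern_mode, mode word_mode)
{
  mode struct_mode = pattern_mode;
  if (struct_mode == mode::VOID)
    struct_mode = word_mode;
  return requested == struct_mode;
}
```

Under this rule a pattern that leaves the index operand without a mode
would accept only word_mode indices, while one that says SImode would
accept only SImode.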

On a related note, the pattern is currently expanded as (see
store_bit_field_1 in expmed.c):

  create_fixed_operand (&ops[0], op0);
  create_input_operand (&ops[1], value, innermode);
  create_integer_operand (&ops[2], pos);

I don't think calling create_integer_operand on register operand is
correct. The function comment says:

/* Make OP describe an input operand that has value INTVAL and that has
   no inherent mode.  This function should only be used for operands that
   are always expand-time constants.  The backend may request that INTVAL
   be copied into a different kind of rtx, but it must specify the mode
   of that rtx if so.  */

Uros.


[PATCH] c++: Don't form a templated TARGET_EXPR in finish_compound_literal

2020-11-12 Thread Patrick Palka via Gcc-patches
The atom_cache in normalize_atom relies on the assumption that two
equivalent (templated) trees (in the sense of cp_tree_equal) must use
the same template parameters (according to find_template_parameters).

This assumption unfortunately doesn't always hold for TARGET_EXPRs,
because cp_tree_equal ignores an artificial target of a TARGET_EXPR, but
find_template_parameters walks this target (and its DECL_CONTEXT).

Hence two TARGET_EXPRs built by force_target_expr with the same
initializer but under different settings of current_function_decl may
compare equal according to cp_tree_equal, but find_template_parameters
returns a different set of template parameters for them.  This breaks
the below testcase because during normalization we build two such
TARGET_EXPRs (one under current_function_decl=f and another under =g),
and then use the same ATOMIC_CONSTR for the two corresponding atoms,
leading to a crash during satisfaction of g's associated constraints.
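
The failure mode can be reproduced in miniature with any cache whose key
equality ignores a field that the cached value depends on. The sketch
below is not GCC code: expr, atom_cache_sketch and normalize are invented
names, with "context" standing in for the DECL_CONTEXT of the artificial
target.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a templated expression: `initializer` takes
// part in equality, `context` does not.
struct expr {
  std::string initializer;
  std::string context;
};

struct expr_hash {
  std::size_t operator()(const expr &e) const
  { return std::hash<std::string>()(e.initializer); }
};

struct expr_equal {
  bool operator()(const expr &a, const expr &b) const
  { return a.initializer == b.initializer; }
};

// Maps an expression to data that was derived from its context.
using atom_cache_sketch =
  std::unordered_map<expr, std::string, expr_hash, expr_equal>;

// Return the cached data for E, computing it from E's context on a miss.
std::string normalize(atom_cache_sketch &cache, const expr &e)
{
  auto it = cache.find(e);
  if (it != cache.end())
    return it->second;  // may have been derived from a different context
  std::string data = "params of " + e.context;
  cache.emplace(e, data);
  return data;
}
```

Looking up the same initializer first under context "f" and then under
"g" returns the data computed for "f" both times, which is the analogue
of reusing one ATOMIC_CONSTR for atoms with different template parameters.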

This patch works around this assumption violation by removing the source
of these templated TARGET_EXPRs.  The relevant call to get_target_expr was
added in r9-6043, but it seems it's no longer necessary (according to
https://gcc.gnu.org/pipermail/gcc-patches/2019-February/517323.html, the
call was added in order to avoid regressing on initlist109.C at the time).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* semantics.c (finish_compound_literal): Don't wrap the original
compound literal in a TARGET_EXPR when inside a template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-decltype3.C: New test.
---
 gcc/cp/semantics.c  |  7 +--
 gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C | 15 +++
 2 files changed, 16 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 33d715edaec..172286922e7 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3006,12 +3006,7 @@ finish_compound_literal (tree type, tree 
compound_literal,
 
   /* If we're in a template, return the original compound literal.  */
   if (orig_cl)
-{
-  if (!VECTOR_TYPE_P (type))
-   return get_target_expr_sfinae (orig_cl, complain);
-  else
-   return orig_cl;
-}
+return orig_cl;
 
   if (TREE_CODE (compound_literal) == CONSTRUCTOR)
 {
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C
new file mode 100644
index 000..837855ce8ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C
@@ -0,0 +1,15 @@
+// { dg-do compile { target c++20 } }
+
+template  concept C = requires(T t) { t; };
+
+template  using A = decltype((T{}, int{}));
+
+template  concept D = C>;
+
+template  void f() requires D;
+template  void g() requires D;
+
+void h() {
+  f();
+  g();
+}
-- 
2.29.2.260.ge31aba42fb



Re: [RFC][PR target PR90000] (rs6000) Compile time hog w/impossible asm constraint lra loop

2020-11-12 Thread Segher Boessenkool
On Thu, Nov 12, 2020 at 09:15:11AM -0700, Jeff Law wrote:
> > void foo (void)
> > {
> >   register float __attribute__ ((mode(SD))) r31 __asm__ ("r31");
> >   register float __attribute__ ((mode(SD))) fr1 __asm__ ("fr1");
> >
> >   __asm__ ("#" : "=d" (fr1));
> >   r31 = fr1;
> >   __asm__ ("#" : : "r" (r31));
> > }
> 
> Looking at this again after many months away, I wonder if the real problem
> is the reloads we have to generate for copies to/from the fr1 local
> variable, which is bound to hard reg fr1, rather than the asm statements
> themselves.  It's not clear to me from the BZ and I don't have a PPC
> cross handy to look directly.

We should never do a reload of a (local) register variable.
Unfortunately we cannot currently tell during reload that something is
one!

See also PR97708, and many more, going many years back.


Segher


Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-12 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 12, 2020 at 7:26 PM Uros Bizjak  wrote:
>
> On Thu, Nov 12, 2020 at 6:51 PM Uros Bizjak  wrote:
>
> > > > > Yes, removed 'code' and value_mode by checking VECTOR_MODE_P and use 
> > > > > GET_MODE_INNER
> > > > > for value_mode.  ".md expanders" shall support for integer constants 
> > > > > index mode, but
> > > > > I guess they shouldn't be expanded by IFN as this function is for 
> > > > > variable index
> > > > > insert only?  Anyway, the v3 patch used VOIDmode check...
> > >
> > > I'm not sure what best to do here, as said accepting "any" (integer) mode 
> > > as
> > > input is desirable (SImode, DImode but eventually also smaller modes).  
> > > How
> > > that can be best achieved I don't know.
> >
> > I was expecting something similar to how extvM/extzvM operands are
> > handled here. We have:
> >
> > Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
> > target-specific mode.
> >
> > Please note operands 2 and 3 having a "target-specific" mode, handled
> > in optabs-query.c as:
> >
> >   machine_mode struct_mode = data->operand[struct_op].mode;
> >   if (struct_mode == VOIDmode)
> > struct_mode = word_mode;
> >   if (mode != struct_mode)
> > return false;
> >
> > > Why's not specifying any mode in the pattern no good?  Just make sure you
> > > appropriately extend/subreg it?  We can make sure it will be an integer
> > > mode in the expander itself.
> >
> > IIRC, having known mode, expanders can use create_convert_operand_to,
> > and the middle-end will do the above by itself. Also note that at
> > least two targets specify SImode, so register operands are currently
> > ineffective there.
>
> On a related note, the pattern is currently expanded as (see
> store_bit_field_1 in expmed.c):
>
>   create_fixed_operand (&ops[0], op0);
>   create_input_operand (&ops[1], value, innermode);
>   create_integer_operand (&ops[2], pos);
>
> I don't think calling create_integer_operand on register operand is
> correct. The function comment says:
>
> /* Make OP describe an input operand that has value INTVAL and that has
>no inherent mode.  This function should only be used for operands that
>are always expand-time constants.  The backend may request that INTVAL
>be copied into a different kind of rtx, but it must specify the mode
>of that rtx if so.  */

Ah, sorry - variable vec_set takes a different path, please disregard
my last message.

Uros.


Re: SLS Mitigation patches backported for GCC9

2020-11-12 Thread Sebastian Pop via Gcc-patches
Hi,

could the SLS Mitigation patches be back-ported to the gcc-8 branch?

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=dc586a74922 aarch64:
Introduce SLS mitigation for RET and BR instructions
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=20da13e395b aarch64:
New Straight Line Speculation (SLS) mitigation flags

Thanks,
Sebastian

On Tue, Aug 4, 2020 at 3:34 AM Kyrylo Tkachov  wrote:
>
> Hi Matthew,
>
> > -Original Message-
> > From: Matthew Malcomson 
> > Sent: 24 July 2020 17:03
> > To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> > Cc: Richard Earnshaw ; Ross Burton
> > ; Richard Sandiford 
> > Subject: Re: SLS Mitigation patches backported for GCC9
> >
> > On 24/07/2020 12:01, Kyrylo Tkachov wrote:
> > > Hi Matthew,
> > >
> > >> -Original Message-
> > >> From: Matthew Malcomson 
> > >> Sent: 21 July 2020 16:16
> > >> To: gcc-patches@gcc.gnu.org
> > >> Cc: Richard Earnshaw ; Kyrylo Tkachov
> > >> ; Ross Burton 
> > >> Subject: SLS Mitigation patches backported for GCC9
> > >>
> > >> Hello,
> > >>
> > >> Eventually we will want to backport the SLS patches to older branches.
> > >>
> > >> When the GCC10 release is unfrozen we will work on getting the same
> > >> patches
> > >> already posted backported to that branch.  The patches already posted on
> > >> the
> > >> mailing list apply cleanly to the current releases/gcc-10 branch.
> > >>
> > >> I've heard interest in having the GCC 9 patches, so I'm posting the
> > modified
> > >> versions upstream sooner than otherwise.
> > >
> > > I'd say let's go ahead with the GCC 10 patches (assuming testing works out
> > well on there).
> > > For the GCC 9 patches it would be useful if you included a bit of text of 
> > > how
> > they differ from the GCC 10/11 patches.
> > > This would speed up the technical review.
> > > Thanks,
> > > Kyrill
> > >
> > >>
> > >> Cheers,
> > >> Matthew
> > >>
> > >> Entire patch series attached to cover letter.
> >
> > Below were the only two "interesting" hunks that failed to apply after
> > `patch -p1`.
> >
> > The differences causing these were:
> > - in GCC-9 the `retab` instruction wasn't in the "do_return" pattern.
> > - `simple_return` had "aarch64_use_simple_return_insn_p ()" as a
> > condition.
> >
> >
>
> Thanks, the backports to GCC 10 and GCC 9 are okay, let's go ahead with them.
> Kyrill
>
> >
> >
> > --- gcc/config/aarch64/aarch64.md
> > +++ gcc/config/aarch64/aarch64.md
> > @@ -863,18 +882,23 @@
> > [(return)]
> > ""
> > {
> > +const char *ret = NULL;
> >   if (aarch64_return_address_signing_enabled ()
> >  && TARGET_ARMV8_3
> >  && !crtl->calls_eh_return)
> > {
> >  if (aarch64_ra_sign_key == AARCH64_KEY_B)
> > - return "retab";
> > + ret = "retab";
> >  else
> > - return "retaa";
> > + ret = "retaa";
> > }
> > -return "ret";
> > +else
> > +  ret = "ret";
> > +output_asm_insn (ret, operands);
> > +return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
> > }
> > -  [(set_attr "type" "branch")]
> > +  [(set_attr "type" "branch")
> > +   (set_attr "sls_length" "retbr")]
> >   )
> >
> >   (define_expand "return"
> > @@ -886,8 +910,12 @@
> >   (define_insn "simple_return"
> > [(simple_return)]
> > ""
> > -  "ret"
> > -  [(set_attr "type" "branch")]
> > +  {
> > +output_asm_insn ("ret", operands);
> > +return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
> > +  }
> > +  [(set_attr "type" "branch")
> > +   (set_attr "sls_length" "retbr")]
> >   )
> >
> >   (define_insn "*cb1"


Re: [PATCH] c++: Don't form a templated TARGET_EXPR in finish_compound_literal

2020-11-12 Thread Marek Polacek via Gcc-patches
On Thu, Nov 12, 2020 at 01:27:23PM -0500, Patrick Palka wrote:
> The atom_cache in normalize_atom relies on the assumption that two
> equivalent (templated) trees (in the sense of cp_tree_equal) must use
> the same template parameters (according to find_template_parameters).
> 
> This assumption unfortunately doesn't always hold for TARGET_EXPRs,
> because cp_tree_equal ignores an artificial target of a TARGET_EXPR, but
> find_template_parameters walks this target (and its DECL_CONTEXT).
> 
> Hence two TARGET_EXPRs built by force_target_expr with the same
> initializer but under different settings of current_function_decl may
> compare equal according to cp_tree_equal, but find_template_parameters
> returns a different set of template parameters for them.  This breaks
> the below testcase because during normalization we build two such
> TARGET_EXPRs (one under current_function_decl=f and another under =g),
> and then use the same ATOMIC_CONSTR for the two corresponding atoms,
> leading to a crash during satisfaction of g's associated constraints.
> 
> This patch works around this assumption violation by removing the source
> of these templated TARGET_EXPRs.  The relevant call to get_target_expr was
> added in r9-6043, but it seems it's no longer necessary (according to
> https://gcc.gnu.org/pipermail/gcc-patches/2019-February/517323.html, the
> call was added in order to avoid regressing on initlist109.C at the time).
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?

Looks OK to me, thanks!

> gcc/cp/ChangeLog:
> 
>   * semantics.c (finish_compound_literal): Don't wrap the original
>   compound literal in a TARGET_EXPR when inside a template.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-decltype3.C: New test.
> ---
>  gcc/cp/semantics.c  |  7 +--
>  gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C | 15 +++
>  2 files changed, 16 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C
> 
> diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> index 33d715edaec..172286922e7 100644
> --- a/gcc/cp/semantics.c
> +++ b/gcc/cp/semantics.c
> @@ -3006,12 +3006,7 @@ finish_compound_literal (tree type, tree 
> compound_literal,
>  
>/* If we're in a template, return the original compound literal.  */
>if (orig_cl)
> -{
> -  if (!VECTOR_TYPE_P (type))
> - return get_target_expr_sfinae (orig_cl, complain);
> -  else
> - return orig_cl;
> -}
> +return orig_cl;
>  
>if (TREE_CODE (compound_literal) == CONSTRUCTOR)
>  {
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C
> new file mode 100644
> index 000..837855ce8ac
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C
> @@ -0,0 +1,15 @@
> +// { dg-do compile { target c++20 } }
> +
> +template  concept C = requires(T t) { t; };
> +
> +template  using A = decltype((T{}, int{}));
> +
> +template  concept D = C>;
> +
> +template  void f() requires D;
> +template  void g() requires D;
> +
> +void h() {
> +  f();
> +  g();
> +}
> -- 
> 2.29.2.260.ge31aba42fb
> 

Marek



Re: [PATCH][RFC] Make mingw-w64 printf/scanf attribute alias to ms_printf/ms_scanf only for C89

2020-11-12 Thread Joseph Myers
I'd expect these patches to include updates to the gcc.dg/format/ms_*.c 
tests to reflect the changed semantics (or new tests there if some of the 
changes don't result in any failures in the existing tests).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-11-12 Thread Antony Polukhin via Gcc-patches
Final bits for libstdc++/71579

The std::common_type assertions attempt to give a proper 'required from
here' hint for user code, make few changes to the implementation, and
check all the template parameters for completeness.  In some cases a
type may be checked for completeness more than once.  This seems
unavoidable because std::common_type can be specialized by the user, so
we have to call std::common_type recursively, potentially repeating the
check for the first type.

The std::common_reference assertions make sure that we detect incomplete
types even if the user has specialized std::basic_common_reference.
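
For readers unfamiliar with the technique, a completeness check guarded by
static_assert can be sketched roughly as follows.  This is only an
illustration under stated assumptions: is_complete and
checked_is_convertible are hypothetical names, not the libstdc++ internals
(the patch itself uses std::__is_complete_or_unbounded), and the sketch
ignores void, function types and arrays of unknown bound, which a real
helper has to treat specially.

```cpp
#include <type_traits>

// Hypothetical helper, not the libstdc++ internal.  The primary template is
// selected when sizeof(T) is ill-formed, i.e. when T is incomplete.
template <typename T, typename = void>
struct is_complete : std::false_type {};

// Selected when sizeof(T) is well-formed, i.e. T is complete at this point.
template <typename T>
struct is_complete<T, decltype(void(sizeof(T)))> : std::true_type {};

// Hypothetical wrapper showing where the assertions sit: in the body of the
// user-facing trait, so the diagnostic is tied to the user's instantiation.
template <typename From, typename To>
struct checked_is_convertible : std::is_convertible<From, To> {
  static_assert(is_complete<From>::value,
                "first template argument must be a complete type");
  static_assert(is_complete<To>::value,
                "second template argument must be a complete type");
};

struct Widget { int id; };  // complete
struct Opaque;              // declared but never defined: incomplete

static_assert(is_complete<Widget>::value, "Widget is complete");
static_assert(!is_complete<Opaque>::value, "Opaque is incomplete");
static_assert(checked_is_convertible<Widget, Widget>::value,
              "a complete type converts to itself");
// Instantiating checked_is_convertible<Opaque, int> would trip the first
// static_assert instead of silently yielding an answer.
```

Note that a check like this is memoized per type, so asking about a type
that is completed later in the same translation unit is best avoided.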

Changelog:

2020-11-12  Antony Polukhin  
PR libstdc++/71579
* include/std/type_traits (is_convertible, is_nothrow_convertible)
(common_type, common_reference): Add static_asserts
to make sure that the arguments of the type traits are not misused
with incomplete types.
* testsuite/20_util/common_reference/incomplete_basic_common_neg.cc:
New test.
* testsuite/20_util/common_reference/incomplete_neg.cc: New test.
* testsuite/20_util/common_type/incomplete_neg.cc: New test.
* testsuite/20_util/common_type/requirements/sfinae_friendly_1.cc: Remove
SFINAE tests on incomplete types.
* testsuite/20_util/is_convertible/incomplete_neg.cc: New test.
* testsuite/20_util/is_nothrow_convertible/incomplete_neg.cc: New test.



--
Best regards,
Antony Polukhin
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 34e068b..00fa7f5 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1406,12 +1406,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct is_convertible
 : public __is_convertible_helper<_From, _To>::type
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_From>{}),
+   "first template argument must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_To>{}),
+   "second template argument must be a complete class or an unbounded array");
+};
 
   // helper trait for unique_ptr, shared_ptr, and span
   template
 using __is_array_convertible
-  = is_convertible<_FromElementType(*)[], _ToElementType(*)[]>;
+  = typename __is_convertible_helper<
+   _FromElementType(*)[], _ToElementType(*)[]>::type;
 
   template, is_function<_To>,
@@ -1454,7 +1460,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct is_nothrow_convertible
 : public __is_nt_convertible_helper<_From, _To>::type
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_From>{}),
+   "first template argument must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_To>{}),
+   "second template argument must be a complete class or an unbounded array");
+};
 
   /// is_nothrow_convertible_v
   template
@@ -2239,7 +2250,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct common_type<_Tp1, _Tp2>
 : public __common_type_impl<_Tp1, _Tp2>::type
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp1>{}),
+   "each argument type must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp2>{}),
+   "each argument type must be a complete class or an unbounded array");
+};
 
   template
 struct __common_type_pack
@@ -2253,7 +2269,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct common_type<_Tp1, _Tp2, _Rp...>
 : public __common_type_fold,
__common_type_pack<_Rp...>>
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp1>{}),
+   "first argument type must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp2>{}),
+   "second argument type must be a complete class or an unbounded array");
+#ifdef __cpp_fold_expressions
+  static_assert((std::__is_complete_or_unbounded(
+   __type_identity<_Rp>{}) && ...),
+   "each argument type must be a complete class or an unbounded array");
+#endif
+};
 
   // Let C denote the same type, if any, as common_type_t.
   // If there is such a type C, type shall denote the same type, if any,
@@ -3315,9 +3341,10 @@ template 
 
   // If A and B are both rvalue reference types, ...
   template
-struct __common_ref_impl<_Xp&&, _Yp&&,
-  _Require>,
-  is_convertible<_Yp&&, __common_ref_C<_Xp, _Yp
+struct __common_ref_impl<_Xp&&, _Yp&&, _Require<
+  typename __is_convertible_helper<_Xp&&, __common_ref_C<_Xp, _Yp>>::type,
+  typename __is_convertible_helper<_Yp&&, __common_ref_C<_Xp, _Yp>>::type
+>>
 { using type = __common_ref_C<_Xp, _Yp>; };
 
   // let D be COMMON-REF(const X&, Y&)
@@ -3326,8 +33

Re: [Patch] Fortran: improve location data for OpenACC/OpenMP directives [PR97782]

2020-11-12 Thread Thomas Schwinge
Hi!

On 2020-11-12T12:45:24+0100, Tobias Burnus  wrote:
> For code like
>   !$acc kernels
>  ... a lot of loops and other code
>   !$acc end kernels
>
> gfortran generates
>#pragma ..._kernels
>  {
>... lot of code
>  }
>
> As the PR shows, the location associated with the #pragma
> is not the 'acc kernels' line but the one near the 'acc end kernel'
> line.
>
> The reason is that [...]

> This patch [...]

> In principle, it should also have an effect on warnings (if there are
> any)

..., and there are -- one, at least (and somewhat bogus, but still).  ;-)
I've thus pushed "Adjust 'libgomp.oacc-fortran/attach-descriptor-1.f90'
for improved location information" to master branch in commit
9106c51e57c06e88a0dddf994fb5432b4bbe68c0, see attached.  (Not (yet)
relevant for releases/gcc-10 branch; the commit introducing that testcase
isn't there yet -- that's to be discussed in a different thread.)


Grüße
 Thomas


From 9106c51e57c06e88a0dddf994fb5432b4bbe68c0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 12 Nov 2020 20:07:25 +0100
Subject: [PATCH] Adjust 'libgomp.oacc-fortran/attach-descriptor-1.f90' for
 improved location information

Fix-up for commit b71ff8c15f5a7d6b1cc1524b4d27843f0d88dbda "Fortran: improve
location data for OpenACC/OpenMP directives [PR97782]".

	libgomp/
	PR fortran/97782
	* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Adjust.
---
 libgomp/testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90
index 960b9f94507..2701192e37d 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90
@@ -42,9 +42,8 @@ subroutine test(variant)
  stop 1
   end if
 
-  ! FIXME: This warning is emitted on the wrong line number.
-  ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } 52 }
   !$acc serial present(myvar%arr2)
+  ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } .-1 }
   do i=1,10
 myvar%arr1(i) = i + variant
 myvar%arr2(i) = i - variant
-- 
2.17.1



Re: [PATCH,wwwdocs] gcc-11/changes: Mention Intel AVX-VNNI

2020-11-12 Thread Gerald Pfeifer
On Wed, 11 Nov 2020, Hongtao Liu via Gcc-patches wrote:
> +  New ISA extension support for Intel AVX-VNNI was added to GCC.

More for the future (i.e., no need to change that now): I suggest
to skip "to GCC" in cases like this, since this is our context to
begin with. 

Gerald


Re: PowerPC: Use __float128 instead of __ieee128 in tests.

2020-11-12 Thread Segher Boessenkool
Hi,

On Thu, Oct 22, 2020 at 06:12:31PM -0400, Michael Meissner wrote:
> Two of the tests used the __ieee128 keyword instead of __float128.  This
> patch changes those cases to use the official keyword.

What is "official" about that?

Why make this change at all?  __ieee128 should work as well!  Did you
see failures without this patch?  Those need fixing, then.


Segher


[committed] wwwdocs: Editorial changes around x86-64 ISA extensions

2020-11-12 Thread Gerald Pfeifer
Per our discussion on the list (plus a grammar improvement in a
section above).

One question: why are the ISA extension lists not alphabetically
sorted?  Wouldn't that be beneficial for users?  Easier to find
something and also easier to compare?

Gerald

---
 htdocs/gcc-11/changes.html | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index fc4c74f4..106db8e9 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -265,7 +265,8 @@ a work-in-progress.
   
   New ISA extension support for Intel AMX-TILE, AMX-INT8, AMX-BF16 was
   added to GCC. AMX-TILE, AMX-INT8, AMX-BF16 intrinsics are available
-  via the -mamx-tile, -mamx-int8, -mamx-bf16 compiler switch.
+  via the -mamx-tile, -mamx-int8, -mamx-bf16 compiler
+  switches.
   
   New ISA extension support for Intel AVX-VNNI was added to GCC.
   AVX-VNNI intrinsics are available via the -mavxvnni
@@ -273,14 +274,14 @@ a work-in-progress.
   
   GCC now supports the Intel CPU named Sapphire Rapids through
 -march=sapphirerapids.
-The switch enables the MOVDIRI MOVDIR64B AVX512VP2INTERSECT ENQCMD CLDEMOTE
-SERIALIZE PTWRITE WAITPKG TSXLDTRK AMT-TILE AMX-INT8 AMX-BF16 AVX-VNNI
-ISA extensions.
+The switch enables the MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, ENQCMD,
+CLDEMOTE, SERIALIZE, PTWRITE, WAITPKG, TSXLDTRK, AMT-TILE, AMX-INT8,
+AMX-BF16, and AVX-VNNI ISA extensions.
   
   GCC now supports the Intel CPU named Alderlake through
 -march=alderlake.
-The switch enables the CLDEMOTE PTWRITE WAITPKG SERIALIZE KEYLOCKER AVX-VNNI
-HRESET ISA extensions.
+The switch enables the CLDEMOTE, PTWRITE, WAITPKG, SERIALIZE, KEYLOCKER,
+AVX-VNNI, and HRESET ISA extensions.
   
 
 
-- 
2.29.2


RE: gcc-wwwdocs branch master updated. 88e29096c36837553fc841bd1fa5df6caa776b44

2020-11-12 Thread Gerald Pfeifer
On Fri, 6 Nov 2020, Liu, Hongtao wrote:
> I realize you're talking about the patch for gcc-wwwdocs.
> No, I didn't send out a patch, sorry for that, will do it in a further commit.

Thanks - saw that. Jeff just beat me to it. :-)

Gerald


[2/3][vect] Add widening add, subtract vect patterns

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds widening add and widening subtract patterns to 
tree-vect-patterns.
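
As an illustration of the kind of source loop such patterns are aimed at,
here is a minimal sketch.  The function names are hypothetical and the
loops are not taken from the patch's tests; whether a given compiler
actually emits widening add/subtract instructions for them depends on the
target and the options used.

```cpp
#include <cstddef>
#include <cstdint>

// Narrow inputs are extended to the wider type before the addition and the
// result is stored in the wider type: the shape of a "widening add".
void widen_add(std::int32_t *out, const std::int16_t *a,
               const std::int16_t *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<std::int32_t>(a[i]) + static_cast<std::int32_t>(b[i]);
}

// The same shape for subtraction: a "widening subtract".
void widen_sub(std::int32_t *out, const std::int16_t *a,
               const std::int16_t *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<std::int32_t>(a[i]) - static_cast<std::int32_t>(b[i]);
}
```

Because the operands are widened first, results such as 32767 + 32767 are
representable in the output type even though they overflow the input type.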

All 3 patches together bootstrapped and regression tested on aarch64.

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases
        * optabs-tree.c (optab_for_tree_code): Add optabs for widening adds, subtracts
        * optabs.def (OPTAB_D): Define vectorized widen add, subtracts
        * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts
        * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts
        * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts
        * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog pattern
        (vect_recog_widen_sub_pattern): New recog pattern
        (vect_recog_average_pattern): Update widened add code
        * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract
        * tree.def (WIDEN_ADD_EXPR): New tree code
        (WIDEN_SUB_EXPR): New tree code
        (VEC_WIDEN_ADD_HI_EXPR): New tree code
        (VEC_WIDEN_ADD_LO_EXPR): New tree code
        (VEC_WIDEN_SUB_HI_EXPR): New tree code
        (VEC_WIDEN_SUB_LO_EXPR): New tree code

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.


Ok for trunk?
From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:44:18 +
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns.
Add aarch64 tests for patterns.

fix sad
---
 gcc/expr.c|  6 ++
 gcc/optabs-tree.c | 17 
 gcc/optabs.def|  8 ++
 .../gcc.target/aarch64/vect-widen-add.c   | 90 +++
 .../gcc.target/aarch64/vect-widen-sub.c   | 90 +++
 gcc/tree-cfg.c|  8 ++
 gcc/tree-inline.c |  6 ++
 gcc/tree-vect-generic.c   |  4 +
 gcc/tree-vect-patterns.c  | 32 +--
 gcc/tree-vect-stmts.c | 15 +++-
 gcc/tree.def  |  6 ++
 11 files changed, 276 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c

diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f07775870792729e3805436d7f2debafb6ca..ffc8aed5296174066849d9e0d73b1c352c20fd9e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  target, unsignedp);
   return target;
 
+case WIDEN_ADD_EXPR:
+case WIDEN_SUB_EXPR:
 case WIDEN_MULT_EXPR:
   /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
@@ -9754,6 +9756,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
   }
 
+case VEC_WIDEN_ADD_HI_EXPR:
+case VEC_WIDEN_ADD_LO_EXPR:
+case VEC_WIDEN_SUB_HI_EXPR:
+case VEC_WIDEN_SUB_LO_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
 case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree type,
   return (TYPE_UNSIGNED (type)
 	  ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
 
+case VEC_WIDEN_ADD_LO_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
+
+case VEC_WIDEN_ADD_HI_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
+
+case VEC_WIDEN_SUB_LO_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
+
+case VEC_WIDEN_SUB_HI_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
+
+
 case VEC_UNPACK_HI_EXPR:
   return (TYPE_UNSIGNED (type)
 	  ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
 OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
 OPTAB_D (vec_widen_smult_lo_optab, "ve

[1/3][aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds backend patterns for vec_widen_add, vec_widen_sub on aarch64.

All 3 patches together bootstrapped and regression tested on aarch64.

Ok for stage 1?

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * config/aarch64/aarch64-simd.md: New patterns 
vec_widen_saddl_lo/hi_
From 3e47bc562b83417a048e780bcde52fb2c9617df3 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:35:57 +
Subject: [PATCH 1/3] [aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend.
---
 gcc/config/aarch64/aarch64-simd.md | 94 ++
 1 file changed, 94 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2cf6fe9154a2ee1b21ad9e8e2a6109805022be7f..b4f56a2295926f027bd53e7456eec729af0cd6df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3382,6 +3382,100 @@
   [(set_attr "type" "neon__long")]
 )
 
+(define_expand "vec_widen_saddl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_saddl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_ssubl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_ssubl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+(define_expand "vec_widen_saddl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_saddl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_ssubl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_ssubl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+(define_expand "vec_widen_uaddl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_uaddl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_usubl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_usubl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_uaddl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_uaddl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_usubl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_usubl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
 
 (define_expand "aarch64_saddl2"
   [(match_operand: 0 "register_operand")
-- 
2.17.1



[3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds support in the aarch64 backend for the vec_widen_shift 
vect-pattern and makes a minor mid-end fix to support it.

All 3 patches together bootstrapped and regression tested on aarch64.

Ok for stage 1?

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * config/aarch64/aarch64-simd.md: vec_widen_lshift_hi/lo patterns
        * tree-vect-stmts.c 
        (vectorizable_conversion): Fix for widen_lshift case

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  

        * gcc.target/aarch64/vect-widen-lshift.c: New test.
From 97af35b2d2a505dcefd8474cbd4bc3441b83ab02 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 12 Nov 2020 11:48:25 +
Subject: [PATCH 3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
mid-end.
---
 gcc/config/aarch64/aarch64-simd.md| 66 +++
 .../gcc.target/aarch64/vect-widen-lshift.c| 60 +
 gcc/tree-vect-stmts.c |  9 ++-
 3 files changed, 133 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b4f56a2295926f027bd53e7456eec729af0cd6df..2bb39c530a1a861cb9bd3df0c2943f62bd6153d7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4711,8 +4711,74 @@
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
+(define_expand "vec_widen_shiftl_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
+		 p, operands[2]));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_shiftl_hi_"
+   [(set (match_operand: 0 "register_operand")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "immediate_operand" "i")]
+			  VSHLL))]
+   "TARGET_SIMD"
+   {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
+		  p, operands[2]));
+DONE;
+   }
+)
+
 ;; vshll_n
 
+(define_insn "aarch64_shll_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %3";
+else
+  return "shll\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_shll2_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll2\\t%0., %1., %3";
+else
+  return "shll2\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
 (define_insn "aarch64_shll_n"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec: [(match_operand:VD_BHSI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
new file mode 100644
index ..23ed93d1dcbc3ca559efa6708b4ed5855fb6a050
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include 
+#include 
+
+#define ARR_SIZE 1024
+
+/* Should produce an shll,shll2 pair*/
+void sshll_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+{
+foo[i]   = a[i]   << 16;
+foo[i+1] = a[i+1] << 16;
+foo[i+2] = a[i+2] << 16;
+foo[i+3] = a[i+3] << 16;
+}
+}
+
+__attribute__((optimize (0)))
+void sshll_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+{
+foo[i]   = a[i]   << 16;
+foo[i+1] = a[i+1] << 16;
+foo[i+2] = a[i+2] << 16;
+foo[i+3] = a[i+3] << 16;
+}
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+for( int i = 0; i < ARR_SIZE;i++)
+{
+  a[i] = i;
+  b[i] = 2*i;
+}
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+uint32_t foo_arr[ARR_SIZE];
+uint32_t bar_arr[ARR_SIZE];
+uint16_t a[ARR_SIZE];
+uint16_t b[ARR_SIZE];
+
+init(a, b);
+sshll_opt(foo_arr, a, b);
+sshll_nonop

Re: [PATCH 1/3] C-family, Objective-C [1/3] : Implement Wobjc-root-class [PR77404].

2020-11-12 Thread Joseph Myers
On Thu, 12 Nov 2020, Iain Sandoe wrote:

> OK for the c-family parts?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Implementation of asm goto outputs

2020-11-12 Thread Vladimir Makarov via Gcc-patches

  The following patch implements asm goto with outputs.  Kernel
developers have several times expressed a wish to have this feature.  Asm
goto with outputs was implemented in LLVM recently.  This new feature
was presented at the 2020 Linux Plumbers Conference
(https://linuxplumbersconf.org/event/7/contributions/801/attachments/659/1212/asm_goto_w__Outputs.pdf)
and at the 2020 LLVM conference
(https://www.youtube.com/watch?v=vcPD490s-hE).

  The patch permits outputs in asm gotos only when LRA is used.
It is problematic to implement this in the old reload pass.  To be
honest, it was hard to implement in LRA too, until the global live info
update was added to LRA a few years ago.

  Unlike the LLVM asm goto output implementation, you can use
outputs on any path from the asm goto (not only on the fallthrough path as
in LLVM).
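
  A minimal sketch of what such source code can look like follows.  It
rests on several assumptions that are not part of the patch: the function
name pass_through is hypothetical, the asm template is deliberately empty
so that the example does not depend on any particular instruction set (it
therefore never actually jumps), and the version guard assumes the feature
is available in GCC 11 or later; any other compiler takes the plain
fallback.

```cpp
// Assumption: asm goto with outputs is only used with GCC 11 or later;
// everything else takes the plain fallback below.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
#define SKETCH_HAVE_ASM_GOTO_OUTPUTS 1
#else
#define SKETCH_HAVE_ASM_GOTO_OUTPUTS 0
#endif

// Hypothetical example function: returns `value` unchanged.
int pass_through(int value) {
#if SKETCH_HAVE_ASM_GOTO_OUTPUTS
  // "+r" makes `value` both an input and an output of the asm.  The label
  // `taken` is listed as a possible jump target; the empty template emits
  // no instructions, so control always falls through.
  asm goto ("" : "+r" (value) : : : taken);
  return value;   // fallthrough path: the output is usable here
taken:
  return -value;  // label path: the output may be used here as well
#else
  return value;
#endif
}
```

In real use the template would contain a conditional jump to %l[taken];
the point of the sketch is only the operand layout, with outputs first and
the label list last.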

  The patch removes critical edges on which asm output reloads could
potentially occur (this means you can have several asm gotos using the
same labels and the same outputs).  This is done in IRA as it is
difficult to create new BBs in LRA.  Most of the work (placement
of output reloads in the destination BBs of the asm goto basic block) is
done in LRA.  When that happens, LRA updates the global live info to
reflect that new pseudos live on the BB borders and the old ones no
longer live there.

  I also tried an approach that splits the live ranges of pseudos involved
in asm goto outputs to guarantee they get hard registers in IRA.  But
this approach did not work, as it is difficult to keep this assignment
through all of LRA.  It would also probably result in worse code, as move
insn coalescing is not guaranteed.

  Asm goto with outputs will not work for targets which have not been
converted to LRA (probably some outdated targets, as the old reload
pass is not supported anymore).  An error will be generated when the
old reload pass meets an asm goto with an output.  A precaution is taken
not to crash the compiler after this error.

  The patch is pretty small as all the necessary infrastructure was
already implemented in practically all of the compiler pipeline.  It did
not require adding new RTL insns, in contrast to what Google engineers
did to LLVM MIR.

  The patch could also be useful for implementing jump insns with
output reloads in the future (e.g. branch and count insns).

  I think asm gotos with outputs should be considered an experimental
feature as there is no real usage of it yet.  Early adoption of
this feature could help with debugging and hardening the
implementation.

  The patch was successfully bootstrapped and tested on x86-64, ppc64, 
and aarch64.


Are the non-RA changes in the patch ok?

2020-11-12  Vladimir Makarov 

    * c/c-parser.c (c_parser_asm_statement): Parse outputs for asm
    goto too.
    * c/c-typeck.c (build_asm_expr): Remove an assert checking output
    absence for asm goto.
    * cfgexpand.c (expand_asm_stmt): Output asm goto with outputs too.
    Place insns after asm goto on edges.
    * cp/parser.c (cp_parser_asm_definition): Parse outputs for asm
    goto too.
    * doc/extend.texi: Reflect the changes in asm goto documentation.
    * gcc/gimple.c (gimple_build_asm_1): Remove an assert checking output
    absence for asm goto.
    * gimple.h (gimple_asm_label_op, gimple_asm_set_label_op): Take
    possible asm goto outputs into account.
    * ira.c (ira): Remove critical edges for potential asm goto output
    reloads.
    (ira_nullify_asm_goto): New function.
    * ira.h (ira_nullify_asm_goto): New prototype.
    * lra-assigns.c (lra_split_hard_reg_for): Use ira_nullify_asm_goto.
    Check that splitting is done inside a basic block.
    * lra-constraints.c (curr_insn_transform): Permit output reloads
    for any jump insn.
    * lra-spills.c (lra_final_code_change): Remove USEs added in ira
    for asm gotos.
    * lra.c (lra_process_new_insns): Place output reload insns after
    jumps in the beginning of destination BBs.
    * reload.c (find_reloads): Report error for asm gotos with
    outputs.  Modify them to keep CFG consistency to avoid crashes.
    * tree-into-ssa.c (rewrite_stmt): Don't put debug stmt after asm
    goto.


2020-11-12  Vladimir Makarov  

    * c-c++-common/asmgoto-2.c: Permit output in asm goto.
    * gcc.c-torture/compile/asmgoto-[2345].c: New tests.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index ecc3d2119fa..db719fad58c 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7144,10 +7144,7 @@ c_parser_asm_statement (c_parser *parser)
 	switch (section)
 	  {
 	  case 0:
-	/* For asm goto, we don't allow output operands, but reserve
-	   the slot for a future extension that does allow them.  */
-	if (!is_goto)
-	  outputs = c_parser_asm_operands (parser);
+	outputs = c_parser_asm_operands (parser);
 	break;
 	  case 1:
 	inputs = c_parser_asm_operands (parser);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 968403

Re: [PATCH 2/2] loops: Invoke lim after successful loop interchange

2020-11-12 Thread Martin Jambor
Hi,

On Wed, Nov 11 2020, Richard Biener wrote:
> On Mon, 9 Nov 2020, Martin Jambor wrote:
>
>> this patch modifies the loop invariant pass so that it can operate
>> only on a single requested loop and its sub-loops and ignore the rest
>> of the function, much like it currently ignores basic blocks that are
>> not in any real loop.  It then invokes it from within the loop
>> interchange pass when it successfully swaps two loops.  This avoids
>> the non-LTO -Ofast run-time regressions of 410.bwaves and 503.bwaves_r
>> (which are 19% and 15% faster than current master on an AMD zen2
>> machine) while not introducing a full LIM pass into the pass pipeline.
>> 
>> I have not modified the LIM data structures, this means that it still
>> contains vectors indexed by loop->num even though only a single loop
>> nest is actually processed.  I also did not replace the uses of
>> pre_and_rev_post_order_compute_fn with a function that would count a
>> postorder only for a given loop.  I can of course do so if the
>> approach is otherwise deemed viable.
>> 
>> The patch adds one additional global variable requested_loop to the
>> pass and then at various places behaves differently when it is set.  I
>> was considering storing the fake root loop into it for normal
>> operation, but since this loop often requires special handling anyway,
>> I came to the conclusion that the code would actually end up less
>> straightforward.
>> 
>> I have bootstrapped and tested the patch on x86_64-linux and a very
>> similar one on aarch64-linux.  I have also tested it by modifying the
>> tree_ssa_lim function to run loop_invariant_motion_from_loop on each
>> real outermost loop in a function and this variant also passed
>> bootstrap and all tests, including dump scans, of all languages.
>> 
>> I have built the entire SPEC 2006 FPrate monitoring the activity of
>> the LIM pass without and with the patch (on top of commit b642fca1c31
>> with which 526.blender_r and 538.imagick_r seemed to be failing) and
>> it only examined 0.2% more loops, 0.02% more BBs and even fewer
>> percent of statements because it is invoked only in a rather special
>> circumstance.  But the patch allows for more such need-based uses at
>> hopefully reasonable cost.
>> 
>> Since I do not have much experience with loop optimizers, I expect
>> that there will be requests to adjust the patch during the review.
>> Still, it fixes a performance regression against GCC 9 and so I hope
>> to address the concerns in time to get it into GCC 11.
>> 

[...]

>
> That said, in the way it's currently structured I think it's
> "better" to export tree_ssa_lim () and call it from interchange
> if any loop was interchanged (thus run a full pass but conditional
> on interchange done).  You can make it cheaper by adding a flag
> to tree_ssa_lim whether to do store-motion (I guess this might
> be an interesting user-visible flag as well and a possibility
> to make select lim passes cheaper via a pass flag) and not do
> store-motion from the interchange call.  I think that's how we should
> fix the regression, refactoring LIM properly requires more work
> that doesn't seem to fit the stage1 deadline.
>

So just like this?  Bootstrapped and tested on x86_64-linux and I have
verified it fixes the bwaves reduction.
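
As an illustration of why invariant motion can be worth running right after interchange, here is a minimal sketch in plain C. It is not taken from bwaves or from GCC internals; the array shape, the function names and the values are assumptions made up for the example. It only shows that interchanging a nest can turn an inner-loop computation into an invariant, which can then be hoisted without any store motion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ROWS = 4, COLS = 3 };

/* Original nest: the inner loop runs over i, so s[i] changes on every
   inner iteration and a[i][j] is walked with a stride of COLS.  */
static void
nest_original (double a[ROWS][COLS], const double s[ROWS], double t)
{
  for (size_t j = 0; j < COLS; j++)
    for (size_t i = 0; i < ROWS; i++)
      a[i][j] += s[i] * t;
}

/* After interchange: the inner loop runs over j, so s[i] * t no longer
   depends on the inner induction variable.  */
static void
nest_interchanged (double a[ROWS][COLS], const double s[ROWS], double t)
{
  for (size_t i = 0; i < ROWS; i++)
    for (size_t j = 0; j < COLS; j++)
      a[i][j] += s[i] * t;
}

/* After invariant motion without store motion: the product is computed
   once per outer iteration; the store to a[i][j] stays in the loop.  */
static void
nest_hoisted (double a[ROWS][COLS], const double s[ROWS], double t)
{
  for (size_t i = 0; i < ROWS; i++)
    {
      const double scaled = s[i] * t;
      for (size_t j = 0; j < COLS; j++)
        a[i][j] += scaled;
    }
}

/* Fill with small integers so every result is exactly representable.  */
static void
fill (double a[ROWS][COLS])
{
  for (size_t i = 0; i < ROWS; i++)
    for (size_t j = 0; j < COLS; j++)
      a[i][j] = (double) (i * COLS + j);
}

/* Return 1 if all three variants compute the same array.  */
static int
nests_agree (void)
{
  const double s[ROWS] = { 1.0, 2.0, 3.0, 4.0 };
  double a0[ROWS][COLS], a1[ROWS][COLS], a2[ROWS][COLS];

  fill (a0);
  fill (a1);
  fill (a2);
  nest_original (a0, s, 2.0);
  nest_interchanged (a1, s, 2.0);
  nest_hoisted (a2, s, 2.0);
  return memcmp (a0, a1, sizeof a0) == 0 && memcmp (a0, a2, sizeof a0) == 0;
}
```

Calling nests_agree () from a test driver checks that the three forms are equivalent; the third form is roughly what a LIM run with store motion disabled would be expected to produce from the second.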

Thanks,

Martin



gcc/ChangeLog:

2020-11-12  Martin Jambor  

PR tree-optimization/94406
* tree-ssa-loop-im.c (tree_ssa_lim): Renamed to
loop_invariant_motion_in_fun, added a parameter to control store
motion.
(pass_lim::execute): Adjust call to tree_ssa_lim, now
loop_invariant_motion_in_fun.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Declare.
* gimple-loop-interchange.cc (pass_linterchange::execute): Call
loop_invariant_motion_in_fun if any interchange has been done.
---
 gcc/gimple-loop-interchange.cc |  9 +++--
 gcc/tree-ssa-loop-im.c | 12 +++-
 gcc/tree-ssa-loop-manip.h  |  2 +-
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 1656004ecf0..a36dbb49b1f 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2085,8 +2085,13 @@ pass_linterchange::execute (function *fun)
 }
 
   if (changed_p)
-scev_reset ();
-  return changed_p ? (TODO_update_ssa_only_virtuals) : 0;
+{
+  unsigned todo = TODO_update_ssa_only_virtuals;
+  todo |= loop_invariant_motion_in_fun (cfun, false);
+  scev_reset ();
+  return todo;
+}
+  return 0;
 }
 
 } // anon namespace
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 6bb07e133cd..3c7412737f0 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3089,10 +3089,11 @@ tree_ssa_lim_finalize (void)
 }
 
 /* Moves invariants from loops.  Only "expensive" invariants are moved out --
-   i.e. those that are likely to be win regardless of the register pressure.  
*/
+   i.

[committed] openmp: Implement allocate clause in omp lowering

2020-11-12 Thread Jakub Jelinek via Gcc-patches
Hi!

For now, task/taskloop constructs aren't handled and C/C++ array reductions
and reductions with task or inscan modifiers need further work.
Instead of calling omp_alloc/omp_free (where the former doesn't have an
alignment argument and omp_aligned_alloc is a 5.1-only feature), this calls
GOMP_alloc/GOMP_free, so that the library can fail if it would otherwise fall
back to returning NULL (the exception is zero-length allocations).
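
The contract described above (an alignment argument, failing instead of returning NULL, with zero-length requests as the exception) can be sketched roughly as follows. This is not the libgomp code: sketch_alloc and sketch_free are hypothetical names, the allocator handle is ignored, and the sketch assumes a power-of-two alignment and a C11 aligned_alloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocation entry point taking
   (alignment, size, allocator handle); not the libgomp implementation.  */
static void *
sketch_alloc (size_t alignment, size_t size, uintptr_t allocator)
{
  (void) allocator;   /* A real runtime would select a memory space here.  */

  /* Zero-length requests are the one case where NULL is a valid result.  */
  if (size == 0)
    return NULL;

  /* Use at least pointer alignment; the sketch assumes a power of two.  */
  if (alignment < sizeof (void *))
    alignment = sizeof (void *);
  assert ((alignment & (alignment - 1)) == 0);

  /* Round the size up to a multiple of the alignment, as C11 requires.  */
  size_t rounded = (size + alignment - 1) / alignment * alignment;
  void *ptr = aligned_alloc (alignment, rounded);
  if (ptr == NULL)
    {
      /* Fail loudly instead of handing NULL back to the lowered code.  */
      fprintf (stderr, "sketch_alloc: out of memory (%zu bytes)\n", size);
      abort ();
    }
  return ptr;
}

/* Matching release function; the allocator handle is again ignored.  */
static void
sketch_free (void *ptr, uintptr_t allocator)
{
  (void) allocator;
  free (ptr);
}
```

The point of the design is that compiler-generated code for privatized variables never has to test the result for NULL when the size is nonzero.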

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-11-12  Jakub Jelinek  

gcc/
* builtin-types.def (BT_FN_PTR_SIZE_SIZE_PTRMODE): New function type.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): Move earlier.
(BUILT_IN_GOMP_ALLOC, BUILT_IN_GOMP_FREE): New builtins.
* gimplify.c (gimplify_scan_omp_clauses): Force allocator into a
decl if it is not NULL, INTEGER_CST or decl.
(gimplify_adjust_omp_clauses): Clear GOVD_EXPLICIT on explicit clauses
which are being removed.  Remove allocate clauses for variables not seen
if they are private, firstprivate or linear too.  Call
omp_notice_variable on the allocator otherwise.
(gimplify_omp_for): Handle iterator vars mentioned in allocate clauses
similarly to non-is_gimple_reg iterators.
* omp-low.c (struct omp_context): Add allocate_map field.
(delete_omp_context): Delete it.
(scan_sharing_clauses): Fill it from allocate clauses.  Remove it
if mentioned also in shared clause.
(lower_private_allocate): New function.
(lower_rec_input_clauses): Handle allocate clause for privatized
variables, except for task/taskloop, C/C++ array reductions for now
and task/inscan variables.
(lower_send_shared_vars): Don't consider variables in allocate_map
as shared.
* omp-expand.c (expand_omp_for_generic, expand_omp_for_static_nochunk,
expand_omp_for_static_chunk): Use expand_omp_build_assign instead of
gimple_build_assign + gsi_insert_after.
* builtins.c (builtin_fnspec): Handle BUILT_IN_GOMP_ALLOC and
BUILT_IN_GOMP_FREE.
* tree-ssa-ccp.c (evaluate_stmt): Handle BUILT_IN_GOMP_ALLOC.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Handle
BUILT_IN_GOMP_ALLOC.
(mark_all_reaching_defs_necessary_1): Handle BUILT_IN_GOMP_ALLOC
and BUILT_IN_GOMP_FREE.
(propagate_necessity): Likewise.
gcc/fortran/
* f95-lang.c (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Define.
(gfc_init_builtin_functions): Add alloc_size and warn_unused_result
attributes to __builtin_GOMP_alloc.
* types.def (BT_PTRMODE): New primitive type.
(BT_FN_VOID_PTR_PTRMODE, BT_FN_PTR_SIZE_SIZE_PTRMODE): New function
types.
libgomp/
* libgomp.map (GOMP_alloc, GOMP_free): Export at GOMP_5.0.1.
* omp.h.in (omp_alloc): Add malloc and alloc_size attributes.
* libgomp_g.h (GOMP_alloc, GOMP_free): Declare.
* allocator.c (omp_aligned_alloc): New for now static function,
add alignment argument and handle it.
(omp_alloc): Reimplement using omp_aligned_alloc.
(GOMP_alloc, GOMP_free): New functions.
(omp_free): Add ialias.
* testsuite/libgomp.c-c++-common/allocate-1.c: New test.
* testsuite/libgomp.c++/allocate-1.C: New test.

--- gcc/builtin-types.def.jj   2020-11-12 11:57:58.465562360 +0100
+++ gcc/builtin-types.def   2020-11-12 12:42:06.093029492 +0100
@@ -637,6 +637,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_SIZE_SIZ
 DEF_FUNCTION_TYPE_3 (BT_FN_UINT_UINT_PTR_PTR, BT_UINT, BT_UINT, BT_PTR, BT_PTR)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_CONST_SIZE_BOOL,
 BT_PTR, BT_PTR, BT_CONST_SIZE, BT_BOOL)
+DEF_FUNCTION_TYPE_3 (BT_FN_PTR_SIZE_SIZE_PTRMODE,
+BT_PTR, BT_SIZE, BT_SIZE, BT_PTRMODE)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
 BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
--- gcc/omp-builtins.def.jj 2020-11-12 11:57:58.470562304 +0100
+++ gcc/omp-builtins.def   2020-11-12 12:42:06.105029360 +0100
@@ -47,6 +47,8 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
   BT_FN_VOID_INT_INT_VAR,
   ATTR_NOTHROW_LIST)
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DECLARE, "GOACC_declare",
+  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 
 DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
@@ -444,5 +446,8 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASK_RED
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER,
  "GOMP_workshare_task_reduction_unregister",
  BT_FN_VOID_BOOL, ATTR_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DECLARE, "GOACC_declare",
-  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHR

Fix gimple_expr_code?

2020-11-12 Thread Andrew MacLeod via Gcc-patches
So I spent some time tracking down a ranger issue, and in the end, it 
boiled down to the range-op handler not being picked up properly.


The handler is picked up by:

  if ((gimple_code (s) == GIMPLE_ASSIGN) || (gimple_code (s) == GIMPLE_COND))
    return range_op_handler (gimple_expr_code (s), gimple_expr_type (s));

where it is indexing the table with the gimple_expr_code..
the stmt being processed was for a pointer assignment,
  _5 = _33
and it was coming back with a gimple_expr_code of  VAR_DECL instead of 
an SSA_NAME... which confused me greatly.



gimple_expr_code (const gimple *stmt)
{
  enum gimple_code code = gimple_code (stmt);
  if (code == GIMPLE_ASSIGN || code == GIMPLE_COND)
    return (enum tree_code) stmt->subcode;

A little more digging shows this:

static inline enum tree_code
gimple_assign_rhs_code (const gassign *gs)
{
  enum tree_code code = (enum tree_code) gs->subcode;
  /* While we initially set subcode to the TREE_CODE of the rhs for
 GIMPLE_SINGLE_RHS assigns we do not update that subcode to stay
     in sync when we rewrite stmts into SSA form or do SSA
     propagations.  */

  if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
    code = TREE_CODE (gs->op[1]);

  return code;
}

Fascinating comment.

But it means that gimple_expr_code() isn't returning the correct result 
for GIMPLE_SINGLE_RHS


Wouldn't it make sense that gimple_expr_code be changed to return 
gimple_assign_rhs_code() for GIMPLE_ASSIGN?
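
The mismatch can be shown with a small toy model. None of this is GCC code: the toy_* names and the three-value enum are assumptions for illustration, standing in for the cached subcode and for the tree code of the operand currently in the RHS slot.

```c
#include <assert.h>

/* Toy stand-ins for a few tree codes.  */
enum toy_code { TOY_VAR_DECL, TOY_SSA_NAME, TOY_PLUS_EXPR };

/* Toy assignment: a code cached at build time plus the code of the
   operand that currently sits in the right-hand-side slot.  */
struct toy_assign
{
  enum toy_code subcode;
  enum toy_code rhs1_code;
};

/* Codes that denote a plain copy of a single operand.  */
static int
toy_is_single_rhs (enum toy_code code)
{
  return code == TOY_VAR_DECL || code == TOY_SSA_NAME;
}

/* Mirrors the old accessor: trust the cached subcode.  */
static enum toy_code
toy_expr_code_cached (const struct toy_assign *stmt)
{
  return stmt->subcode;
}

/* Mirrors the rhs-code accessor: for single-operand copies, look at
   the operand itself because the cached code may be stale.  */
static enum toy_code
toy_assign_rhs_code (const struct toy_assign *stmt)
{
  enum toy_code code = stmt->subcode;
  if (toy_is_single_rhs (code))
    code = stmt->rhs1_code;
  return code;
}

/* Build "x = y" from a variable, then rewrite the operand into SSA
   form without touching the cached subcode.  */
static struct toy_assign
toy_copy_after_ssa_rewrite (void)
{
  struct toy_assign stmt = { TOY_VAR_DECL, TOY_VAR_DECL };
  stmt.rhs1_code = TOY_SSA_NAME;
  return stmt;
}
```

On the statement returned by toy_copy_after_ssa_rewrite, the cached accessor still reports TOY_VAR_DECL while the operand-based accessor reports TOY_SSA_NAME, which is the same shape as the VAR_DECL versus SSA_NAME surprise described above.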


I tested the attached patch, and it bootstraps and passes regression tests.

There aren't a lot of places where it's used, but I saw a suspicious bit 
in ipa-icf-gimple.c that looks like it is working around this?



   bool
   func_checker::compare_gimple_assign (gimple *s1, gimple *s2)
   {
  tree arg1, arg2;
  tree_code code1, code2;
  unsigned i;

  code1 = gimple_expr_code (s1);
  code2 = gimple_expr_code (s2);

  if (code1 != code2)
    return false;

  code1 = gimple_assign_rhs_code (s1);
  code2 = gimple_assign_rhs_code (s2);

  if (code1 != code2)
    return false;


and  there were one or two other places where SSA_NAME occurred in the 
cases of a switch after calling gimple_expr_code().


This seems like it should be the right thing?
Andrew
	* gimple.h (gimple_expr_code): Return gimple_assign_rhs_code
	for GIMPLE_ASSIGN.

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 62b5a8a6124..8ef2f83d412 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -2229,26 +2229,6 @@ gimple_set_modified (gimple *s, bool modifiedp)
 }
 
 
-/* Return the tree code for the expression computed by STMT.  This is
-   only valid for GIMPLE_COND, GIMPLE_CALL and GIMPLE_ASSIGN.  For
-   GIMPLE_CALL, return CALL_EXPR as the expression code for
-   consistency.  This is useful when the caller needs to deal with the
-   three kinds of computation that GIMPLE supports.  */
-
-static inline enum tree_code
-gimple_expr_code (const gimple *stmt)
-{
-  enum gimple_code code = gimple_code (stmt);
-  if (code == GIMPLE_ASSIGN || code == GIMPLE_COND)
-return (enum tree_code) stmt->subcode;
-  else
-{
-  gcc_gimple_checking_assert (code == GIMPLE_CALL);
-  return CALL_EXPR;
-}
-}
-
-
 /* Return true if statement STMT contains volatile operands.  */
 
 static inline bool
@@ -2889,6 +2869,29 @@ gimple_assign_cast_p (const gimple *s)
   return false;
 }
 
+
+/* Return the tree code for the expression computed by STMT.  This is
+   only valid for GIMPLE_COND, GIMPLE_CALL and GIMPLE_ASSIGN.  For
+   GIMPLE_CALL, return CALL_EXPR as the expression code for
+   consistency.  This is useful when the caller needs to deal with the
+   three kinds of computation that GIMPLE supports.  */
+
+static inline enum tree_code
+gimple_expr_code (const gimple *stmt)
+{
+  enum gimple_code code = gimple_code (stmt);
+  if (code == GIMPLE_ASSIGN)
+return gimple_assign_rhs_code (stmt);
+  else if (code == GIMPLE_COND)
+return (enum tree_code) stmt->subcode;
+  else
+{
+  gcc_gimple_checking_assert (code == GIMPLE_CALL);
+  return CALL_EXPR;
+}
+}
+
+
 /* Return true if S is a clobber statement.  */
 
 static inline bool


Re: Fix gimple_expr_code?

2020-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2020 9:43:52 PM GMT+01:00, Andrew MacLeod via Gcc-patches wrote:
>So I spent some time tracking down a ranger issue, and in the end, it 
>boiled down to the range-op handler not being picked up properly.
>
>The handler is picked up by:
>
>   if ((gimple_code (s) == GIMPLE_ASSIGN) || (gimple_code (s) == GIMPLE_COND))
>     return range_op_handler (gimple_expr_code (s), gimple_expr_type (s));

IMHO this should use more specific functions. gimple_expr_code should go away 
similar to gimple_expr_type. 

>where it is indexing the table with the gimple_expr_code..
>the stmt being processed was for a pointer assignment,
>   _5 = _33
>and it was coming back with a gimple_expr_code of  VAR_DECL instead of 
>an SSA_NAME... which confused me greatly.
>
>
>gimple_expr_code (const gimple *stmt)
>{
>   enum gimple_code code = gimple_code (stmt);
>   if (code == GIMPLE_ASSIGN || code == GIMPLE_COND)
>     return (enum tree_code) stmt->subcode;
>
>A little more digging shows this:
>
>static inline enum tree_code
>gimple_assign_rhs_code (const gassign *gs)
>{
>   enum tree_code code = (enum tree_code) gs->subcode;
>   /* While we initially set subcode to the TREE_CODE of the rhs for
>  GIMPLE_SINGLE_RHS assigns we do not update that subcode to stay
>      in sync when we rewrite stmts into SSA form or do SSA
>      propagations.  */
>   if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
>     code = TREE_CODE (gs->op[1]);
>
>   return code;
>}
>
>Fascinating comment.

... 😬 

>But it means that gimple_expr_code() isn't returning the correct result
>for GIMPLE_SINGLE_RHS

It depends. An SSA name isn't an expression code either. As said, the generic 
gimple_expr_code should be used with extreme care. 

>Wouldn't it make sense that gimple_expr_code be changed to return 
>gimple_assign_rhs_code() for GIMPLE_ASSIGN?
>
>I tested the attached patch, and it bootstraps and passes regression
>tests.
>
>There aren't a lot of places where it's used, but I saw a suspicious bit
>in ipa-icf-gimple.c that looks like it is working around this?
>
>
>bool
>func_checker::compare_gimple_assign (gimple *s1, gimple *s2)
>{
>   tree arg1, arg2;
>   tree_code code1, code2;
>   unsigned i;
>
>   code1 = gimple_expr_code (s1);
>   code2 = gimple_expr_code (s2);
>
>   if (code1 != code2)
>     return false;
>
>   code1 = gimple_assign_rhs_code (s1);
>   code2 = gimple_assign_rhs_code (s2);
>
>   if (code1 != code2)
>     return false;
>
>
>and  there were one or two other places where SSA_NAME occurred in the 
>cases of a switch after calling gimple_expr_code().
>
>This seems like it should be the right thing?
>Andrew



Re: Fix gimple_expr_code?

2020-11-12 Thread Andrew MacLeod via Gcc-patches

On 11/12/20 3:53 PM, Richard Biener wrote:

> On November 12, 2020 9:43:52 PM GMT+01:00, Andrew MacLeod via Gcc-patches wrote:
>
>> So I spent some time tracking down a ranger issue, and in the end, it
>> boiled down to the range-op handler not being picked up properly.
>>
>> The handler is picked up by:
>>
>>    if ((gimple_code (s) == GIMPLE_ASSIGN) || (gimple_code (s) == GIMPLE_COND))
>>      return range_op_handler (gimple_expr_code (s), gimple_expr_type (s));
>
> IMHO this should use more specific functions. gimple_expr_code should go away
> similar to gimple_expr_type.


gimple_expr_type is quite pervasive, and each consumer is going to have 
to roll their own version of it.  Why do we want to get rid of it?


If we are trying to save a few bytes by storing the information in 
different places, then we're going to need some sort of accessing 
function like that.



>> where it is indexing the table with the gimple_expr_code..
>> the stmt being processed was for a pointer assignment,
>>    _5 = _33
>> and it was coming back with a gimple_expr_code of  VAR_DECL instead of
>> an SSA_NAME... which confused me greatly.
>>
>> gimple_expr_code (const gimple *stmt)
>> {
>>    enum gimple_code code = gimple_code (stmt);
>>    if (code == GIMPLE_ASSIGN || code == GIMPLE_COND)
>>      return (enum tree_code) stmt->subcode;
>>
>> A little more digging shows this:
>>
>> static inline enum tree_code
>> gimple_assign_rhs_code (const gassign *gs)
>> {
>>    enum tree_code code = (enum tree_code) gs->subcode;
>>    /* While we initially set subcode to the TREE_CODE of the rhs for
>>       GIMPLE_SINGLE_RHS assigns we do not update that subcode to stay
>>       in sync when we rewrite stmts into SSA form or do SSA
>>       propagations.  */
>>    if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
>>      code = TREE_CODE (gs->op[1]);
>>
>>    return code;
>> }
>>
>> Fascinating comment.
>
> ... 😬
>
>> But it means that gimple_expr_code() isn't returning the correct result
>> for GIMPLE_SINGLE_RHS
>
> It depends. An SSA name isn't an expression code either. As said, the generic
> gimple_expr_code should be used with extreme care.


What is an expression code?  It seems like it's just a tree_code 
representing what is on the RHS.  I'm not sure I understand why one 
needs to be careful with it.  It only applies to COND, ASSIGN and CALL, 
and it's currently right for everything except GIMPLE_SINGLE_RHS?


If we don't fix gimple_expr_code, then I'm basically going to be 
reimplementing it myself... which seems kind of pointless.


Andrew





c: C2x __has_c_attribute

2020-11-12 Thread Joseph Myers
C2x adds the __has_c_attribute preprocessor operator, similar to C++
__has_cpp_attribute.

GCC implements __has_cpp_attribute as exactly equivalent to
__has_attribute.  (The documentation says they differ regarding the
values returned for standard attributes, but that's actually only a
matter of the particular nonzero value returned not being specified in
the documentation for __has_attribute; the implementation makes no
distinction between the two.)

I don't think having them exactly equivalent is actually correct,
either for __has_cpp_attribute or for __has_c_attribute.
Specifically, I think it is only correct for __has_cpp_attribute or
__has_c_attribute to return nonzero if the given attribute is
supported, with the particular pp-tokens passed to __has_cpp_attribute
or __has_c_attribute, with [[]] syntax, not if it's only accepted in
__attribute__ or with gnu:: added in [[]].  For example, they should
return nonzero for gnu::packed, but zero for plain packed, because
[[gnu::packed]] is accepted but [[packed]] is ignored as not a
standard attribute.

This patch implements that for __has_c_attribute, leaving any changes
to __has_cpp_attribute for the C++ maintainers.  A new
BT_HAS_STD_ATTRIBUTE is added for __has_c_attribute (which I think,
based on the above, would actually be correct to use for
__has_cpp_attribute as well).  The code in c_common_has_attribute that
deals with scopes has its C++ conditional removed; instead, whether
the language is C or C++ is used only to determine the numeric values
returned for standard attributes (and which standard attributes are
handled there at all).  A new argument is passed to
c_common_has_attribute to distinguish BT_HAS_STD_ATTRIBUTE from
BT_HAS_ATTRIBUTE, and that argument is used to stop attributes with no
scope specified from being accepted with __has_c_attribute unless they
are one of the known standard attributes and so handled specially.

Although the standard specifies constants ending with 'L' as the values
for the standard attributes, there is no correctness issue with the
lack of code in GCC to add that 'L' to the expansion:
__has_c_attribute and __has_cpp_attribute are expanded in #if after
other macro expansion has occurred, with no semantics being specified
if they occur outside #if, so there is no way for a conforming program
to inspect the exact text of the expansion of those macros, only to
use the resulting pp-number in a #if expression, where long and int
have the same set of values.
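
The suffix argument can be checked directly in the preprocessor. In this small sketch the constant 201904 is only an illustrative value and the SKETCH_ macro and helper are hypothetical names; the point is that a #if expression compares the values, so the suffix makes no difference.

```c
#include <assert.h>

/* In a #if expression the suffixed and unsuffixed constants have the
   same value, so the comparison holds.  */
#if 201904L == 201904
# define SKETCH_SUFFIX_IRRELEVANT 1
#else
# define SKETCH_SUFFIX_IRRELEVANT 0
#endif

/* Expose the preprocessor result to ordinary code.  */
static int
sketch_suffix_irrelevant (void)
{
  return SKETCH_SUFFIX_IRRELEVANT;
}
```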

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/
2020-11-12  Joseph Myers  

* doc/cpp.texi (__has_attribute): Document when scopes are allowed
for C.
(__has_c_attribute): New.

gcc/c-family/
2020-11-12  Joseph Myers  

* c-lex.c (c_common_has_attribute): Take argument std_syntax.
Allow scope for C.  Handle standard attributes for C.  Do not
accept unscoped attributes if std_syntax and not handled as
standard attributes.
* c-common.h (c_common_has_attribute): Update prototype.

gcc/testsuite/
2020-11-12  Joseph Myers  

* gcc.dg/c2x-has-c-attribute-1.c, gcc.dg/c2x-has-c-attribute-2.c,
gcc.dg/c2x-has-c-attribute-3.c, gcc.dg/c2x-has-c-attribute-4.c:
New tests.

libcpp/
2020-11-12  Joseph Myers  

* include/cpplib.h (struct cpp_callbacks): Add bool argument to
has_attribute.
(enum cpp_builtin_type): Add BT_HAS_STD_ATTRIBUTE.
* init.c (builtin_array): Add __has_c_attribute.
(cpp_init_special_builtins): Handle BT_HAS_STD_ATTRIBUTE.
* macro.c (_cpp_builtin_macro_text): Handle BT_HAS_STD_ATTRIBUTE.
Update call to has_attribute for BT_HAS_ATTRIBUTE.
* traditional.c (fun_like_macro): Handle BT_HAS_STD_ATTRIBUTE.

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 94f4868915a..f47097442eb 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1042,7 +1042,7 @@ extern bool c_cpp_diagnostic (cpp_reader *, enum cpp_diagnostic_level,
  enum cpp_warning_reason, rich_location *,
  const char *, va_list *)
  ATTRIBUTE_GCC_DIAG(5,0);
-extern int c_common_has_attribute (cpp_reader *);
+extern int c_common_has_attribute (cpp_reader *, bool);
 extern int c_common_has_builtin (cpp_reader *);
 
 extern bool parse_optimize_options (tree, bool);
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e81e16ddc26..6cd3df7c96f 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -300,7 +300,7 @@ get_token_no_padding (cpp_reader *pfile)
 
 /* Callback for has_attribute.  */
 int
-c_common_has_attribute (cpp_reader *pfile)
+c_common_has_attribute (cpp_reader *pfile, bool std_syntax)
 {
   int result = 0;
   tree attr_name = NULL_TREE;
@@ -319,35 +319,37 @@ c_common_has_attribute (cpp_reader *pfile)
   attr_name = get_identifier ((const char *)
  cpp_token_as_tex

[PATCH] C-Family, Objective-C : Implement Objective-C nullability Part 1 [PR90707].

2020-11-12 Thread Iain Sandoe

Hi,

The PR notes that our inability to parse these keywords in GNU Objective-C
is one of the contributing factors to being unable to use some important
system headers (at least, on Darwin platforms).

tested on x86_64-darwin and x86_64-linux-gnu,
OK for the C-family changes?
thanks
Iain

— commit log

Part 1 of the implementation covers property nullability attributes
and includes the changes to common code. Follow-on changes will be needed
to cover Objective-C method definitions, but those are expected to be
local to the Objective-C front end.

The basis of the implementation is to translate the Objective-C-specific
keywords into an attribute (objc_nullability) which has the required
states to carry the attribute markup.

We introduce the keywords, and these are parsed and validated in the same
manner as other property attributes.  The resulting value is attached to
the property as an objc_nullability attribute.

gcc/c-family/ChangeLog:

PR objc/90707
* c-common.c (c_common_reswords): null_unspecified, nullable,
nonnull, null_resettable: New keywords.
* c-common.h (enum rid): RID_NULL_UNSPECIFIED, RID_NULLABLE,
RID_NONNULL, RID_NULL_RESETTABLE: New.
(OBJC_IS_PATTR_KEYWORD): Include nullability keywords in the
ranges accepted for property attributes.
* c-attribs.c (handle_objc_nullability_attribute): New.
* c-objc.h (enum objc_property_attribute_group): Add
OBJC_PROPATTR_GROUP_NULLABLE.
(enum objc_property_attribute_kind): Add
OBJC_PROPERTY_ATTR_NULL_UNSPECIFIED, OBJC_PROPERTY_ATTR_NULLABLE,
OBJC_PROPERTY_ATTR_NONNULL, OBJC_PROPERTY_ATTR_NULL_RESETTABLE.

gcc/objc/ChangeLog:

PR objc/90707
* objc-act.c (objc_prop_attr_kind_for_rid): Handle nullability.
(objc_add_property_declaration): Handle nullability attributes.
Check that these are applicable to the property type.
* objc-act.h (enum objc_property_nullability): New.

gcc/testsuite/ChangeLog:

PR objc/90707
* obj-c++.dg/property/at-property-4.mm: Add basic nullability
tests.
* objc.dg/property/at-property-4.m: Likewise.
* obj-c++.dg/attributes/nullability-00.mm: New test.
* obj-c++.dg/property/nullability-00.mm: New test.
* objc.dg/attributes/nullability-00.m: New test.
* objc.dg/property/nullability-00.m: New test.

gcc/ChangeLog:

PR objc/90707
* doc/extend.texi: Document the objc_nullability attribute.
---
 gcc/c-family/c-attribs.c  | 49 ++
 gcc/c-family/c-common.c   |  6 +++
 gcc/c-family/c-common.h   |  7 ++-
 gcc/c-family/c-objc.h |  5 ++
 gcc/doc/extend.texi   | 27 ++
 gcc/objc/objc-act.c   | 51 ++-
 gcc/objc/objc-act.h   | 10 
 .../obj-c++.dg/attributes/nullability-00.mm   | 20 
 .../obj-c++.dg/property/at-property-4.mm  | 20 +++-
 .../obj-c++.dg/property/nullability-00.mm | 21 
 .../objc.dg/attributes/nullability-00.m   | 20 
 .../objc.dg/property/at-property-4.m  | 18 +++
 .../objc.dg/property/nullability-00.m | 21 
 13 files changed, 272 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/obj-c++.dg/attributes/nullability-00.mm
 create mode 100644 gcc/testsuite/obj-c++.dg/property/nullability-00.mm
 create mode 100644 gcc/testsuite/objc.dg/attributes/nullability-00.m
 create mode 100644 gcc/testsuite/objc.dg/property/nullability-00.m

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 6718fff6efb..9c62508651c 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -161,6 +161,7 @@ static tree handle_patchable_function_entry_attribute (tree *, tree, tree,

 static tree handle_copy_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nsobject_attribute (tree *, tree, tree, int, bool *);
 static tree handle_objc_root_class_attribute (tree *, tree, tree, int, bool *);
+static tree handle_objc_nullability_attribute (tree *, tree, tree, int, bool *);


 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)  \
@@ -520,6 +521,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_nsobject_attribute, NULL },
   { "objc_root_class", 0, 0, true, false, false, false,
  handle_objc_root_class_attribute, NULL },
+  { "objc_nullability",1, 1, true, false, false, false,
+ handle_objc_nullability_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };

@@ -5251,6 +5254,52 @@ handle_objc_root_class_attribute (tree */*node*/, tree name, tree /*args*/,

   return NULL_TREE;
 }

+/* Handle an "objc_nu

Re: PowerPC: Add __float128 conversions to/from Decimal

2020-11-12 Thread Michael Meissner via Gcc-patches
On Thu, Oct 29, 2020 at 10:05:38PM +, Joseph Myers wrote:
> On Thu, 29 Oct 2020, Segher Boessenkool wrote:
> 
> > > Doing these conversions accurately is nontrivial.  Converting via strings 
> > > is the simple approach (i.e. the one that moves the complexity somewhere 
> > > else).  There are more complicated but more efficient approaches that can 
> > > achieve correct conversions with smaller bounds on resource usage (and 
> > > there are various papers published in this area), but those involve a lot 
> > > more code (and precomputed data, with a speed/space trade-off in how much 
> > > you precompute; the BID code in libgcc has several MB of precomputed data 
> > > for that purpose).
> > 
> > Does the printf code in libgcc handle things correctly for IEEE QP float
> > as long double, do you know?
> 
> As far as I know, the code in libgcc for conversions *from* decimal *to* 
> binary (so the direction that uses strtof128 as opposed to the one using 
> strfrom128, in the binary128 case) works correctly, if the underlying libc 
> has accurate string/numeric conversion operations.
> 
> Binary to decimal is another matter, even for cases such as float to 
> _Decimal64.  I've just filed bug 97635 for that.
> 
> Also note that if you want to use printf as opposed to strfromf128 for 
> IEEE binary128 you'll need to use __printfieee128 (the version that 
> expects long double to be IEEE binary128) which was introduced in glibc 
> 2.32, so that doesn't help with the glibc version dependencies.

My latest patches now switch to using GLIBC 2.32 and __sprintfieee128.
If we don't have glibc 2.32, it just calls abort, so we don't get linker
errors.  I hope to submit it tonight or tomorrow night.
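
For readers unfamiliar with the "convert via strings" approach discussed in this thread, here is a minimal sketch using plain double as a stand-in (the real code deals with IEEE binary128 and the decimal types, which need the glibc interfaces named above). The function name is hypothetical, and the sketch relies on the C library's printf/strtod conversions being correctly rounded.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert a binary floating-point value to decimal text and back.
   17 significant digits are enough to identify any IEEE double.  */
static double
sketch_roundtrip_via_string (double value)
{
  char text[64];

  snprintf (text, sizeof text, "%.17g", value);
  return strtod (text, NULL);
}
```

The accuracy of the whole conversion then rests on the string functions, which is why the libc version matters here.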

> When I investigated and reported several bugs in the conversion operations 
> in libdfp, I noted (e.g. https://github.com/libdfp/libdfp/issues/29 ) that 
> the libgcc versions were working correctly for those tests (and filed and 
> subsequently fixed one glibc strtod bug, missing inexact exceptions, that 
> I'd noticed while looking at such issues in libdfp).  But the specific 
> case I tested for badly rounded conversions was the case of conversions 
> from decimal to binary, not the case of conversions from binary to 
> decimal, which, as noted above, turn out to be buggy in libgcc.
> 
> Lots of bugs have been fixed in the glibc conversion code over the years 
> (more on the strtod side than in the code shared by printf and strfrom 
> functions).  That code uses multiple-precision operations from GMP, which 
> avoids some complications but introduces others (it also needs to e.g. 
> deal with locale issues, which are irrelevant for libgcc conversions).

Using the sprintf method, I see an error in

c-c++-common/dfp/convert-bfp-11.c

that I didn't see with the method used in the patches with strtof128 and
strfromf128 directly.  I need to track down exactly what the error is.

All of the other dfp conversion tests work fine.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: PowerPC: Use __float128 instead of __ieee128 in tests.

2020-11-12 Thread Michael Meissner via Gcc-patches
On Thu, Nov 12, 2020 at 01:26:32PM -0600, Segher Boessenkool wrote:
> Hi,
> 
> On Thu, Oct 22, 2020 at 06:12:31PM -0400, Michael Meissner wrote:
> > Two of the tests used the __ieee128 keyword instead of __float128.  This
> > patch changes those cases to use the official keyword.
> 
> What is "official" about that?
> 
> Why make this change at all?  __ieee128 should work as well!  Did you
> see failures without this patch?  Those need fixing, then.

We document '__float128'.  We don't document '__ieee128'.  As I said, using
'__ieee128' internally was due to some issues in the GCC 7 time frame,
particularly before we had the glibc changes.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] RISC-V: Enable ifunc if it was supported in the binutils for linux toolchain.

2020-11-12 Thread Jim Wilson
On Tue, Nov 10, 2020 at 7:33 PM Nelson Chu  wrote:

> gcc/
> * configure: Regenerated.
> * configure.ac: If ifunc was supported in the binutils for
> linux toolchain, then set enable_gnu_indirect_function to yes.
>

Looks good.  I committed and pushed it.

I see some extra ifunc related testsuite failures, but that is because we
don't have the glibc ifunc patches upstream yet.  It will be important to
get those done next.

Jim


Re: [PATCH] PR target/97682 - Fix to reuse t1 register between call address and epilogue.

2020-11-12 Thread Jim Wilson
On Mon, Nov 9, 2020 at 11:15 PM Monk Chiang  wrote:

> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 172c7ca7c98..3bd1993c4c9 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -342,9 +342,13 @@ extern const char *riscv_default_mtune (int argc, const char **argv);
> The epilogue temporary mustn't conflict with the return registers,
> the frame pointer, the EH stack adjustment, or the EH data registers.  */
>
> -#define RISCV_PROLOGUE_TEMP_REGNUM (GP_TEMP_FIRST + 1)
> +#define RISCV_PROLOGUE_TEMP_REGNUM (GP_TEMP_FIRST)
>  #define RISCV_PROLOGUE_TEMP(MODE) gen_rtx_REG (MODE, RISCV_PROLOGUE_TEMP_REGNUM)
>
>
> +#define RISCV_CALL_ADDRESS_TEMP_REGNUM (GP_TEMP_FIRST + 1)
> +#define RISCV_CALL_ADDRESS_TEMP(MODE) \
> +  gen_rtx_REG (MODE, RISCV_CALL_ADDRESS_TEMP_REGNUM)
>

This looks generally OK; however, there is a minor problem: we have code
in riscv_compute_frame_info to save t1 in an interrupt handler
with a large stack frame, as we know the prologue code will clobber t1 in
this case.  However, with this patch, the prologue now clobbers t0
instead.  So riscv_compute_frame_info needs to be fixed.  I'd suggest
changing the T1_REGNUM to RISCV_PROLOGUE_TEMP_REGNUM to prevent this from
happening again; that is probably my fault.  And the interrupt_save_t1
variable should be renamed, maybe to interrupt_save_prologue_temp.

You can see the problem with gcc/testsuite/gcc.target/riscv/interrupt-3.c
if you compile with -O0 and we get
foo:
        addi    sp,sp,-32
        sw      t1,28(sp)
        sw      s0,24(sp)
        addi    s0,sp,32
        li      t0,-4096
        addi    t0,t0,16
        add     sp,sp,t0
so we are saving t1 and then clobbering t0 with your patch.
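
To illustrate the suggestion, here is a minimal standalone sketch, not the
actual riscv.c code.  The function name, the mask representation and the
frame-size threshold are assumptions made for the example; only the
t0 = x5 / t1 = x6 numbering and the macro layout mirror the patch.  The
point is that the register saved by an interrupt handler is derived from
the same macro the prologue uses, so the two cannot drift apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* On RISC-V, t0 is x5 and t1 is x6.  */
enum { T0_REGNUM = 5, T1_REGNUM = 6 };

/* Hypothetical stand-ins for the riscv.h macros after the patch.  */
#define GP_TEMP_FIRST T0_REGNUM
#define PROLOGUE_TEMP_REGNUM (GP_TEMP_FIRST)
#define CALL_ADDRESS_TEMP_REGNUM (GP_TEMP_FIRST + 1)

/* Return a bit mask of extra registers an interrupt handler must save.
   With a large frame the prologue builds the stack adjustment in its
   temporary register, so that register has to be saved first.
   The 2047 threshold (largest positive 12-bit immediate) is only
   illustrative.  */
uint32_t
interrupt_extra_save_mask (bool interrupt_handler, long frame_size)
{
  bool save_prologue_temp = interrupt_handler && frame_size > 2047;
  /* Keyed off the macro, not a hard-coded T1_REGNUM.  */
  return save_prologue_temp ? (UINT32_C (1) << PROLOGUE_TEMP_REGNUM) : 0;
}
```

With the macro set to GP_TEMP_FIRST, the mask for a large interrupt
handler frame selects t0, matching the register the li/addi/add sequence
above actually clobbers.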

Otherwise this looks good.

Jim


Re: [PATCH] c++: Don't form a templated TARGET_EXPR in finish_compound_literal

2020-11-12 Thread Jason Merrill via Gcc-patches

On 11/12/20 1:27 PM, Patrick Palka wrote:

The atom_cache in normalize_atom relies on the assumption that two
equivalent (templated) trees (in the sense of cp_tree_equal) must use
the same template parameters (according to find_template_parameters).

This assumption unfortunately doesn't always hold for TARGET_EXPRs,
because cp_tree_equal ignores an artificial target of a TARGET_EXPR, but
find_template_parameters walks this target (and its DECL_CONTEXT).

Hence two TARGET_EXPRs built by force_target_expr with the same
initializer but under different settings of current_function_decl may
compare equal according to cp_tree_equal, but find_template_parameters
returns a different set of template parameters for them.  This breaks
the below testcase because during normalization we build two such
TARGET_EXPRs (one under current_function_decl=f and another under =g),
and then use the same ATOMIC_CONSTR for the two corresponding atoms,
leading to a crash during satisfaction of g's associated constraints.

This patch works around this assumption violation by removing the source
of these templated TARGET_EXPRs.  The relevant call to get_target_expr was
added in r9-6043, but it seems it's no longer necessary (according to
https://gcc.gnu.org/pipermail/gcc-patches/2019-February/517323.html, the
call was added in order to avoid regressing on initlist109.C at the time).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.  I wonder what else asserting !processing_template_decl in 
build_target_expr would find...



gcc/cp/ChangeLog:

* semantics.c (finish_compound_literal): Don't wrap the original
compound literal in a TARGET_EXPR when inside a template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-decltype3.C: New test.
---
  gcc/cp/semantics.c  |  7 +--
  gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C | 15 +++
  2 files changed, 16 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 33d715edaec..172286922e7 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3006,12 +3006,7 @@ finish_compound_literal (tree type, tree 
compound_literal,
  
/* If we're in a template, return the original compound literal.  */

if (orig_cl)
-{
-  if (!VECTOR_TYPE_P (type))
-   return get_target_expr_sfinae (orig_cl, complain);
-  else
-   return orig_cl;
-}
+return orig_cl;
  
if (TREE_CODE (compound_literal) == CONSTRUCTOR)

  {
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C
new file mode 100644
index 000..837855ce8ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-decltype3.C
@@ -0,0 +1,15 @@
+// { dg-do compile { target c++20 } }
+
+template  concept C = requires(T t) { t; };
+
+template  using A = decltype((T{}, int{}));
+
+template  concept D = C>;
+
+template  void f() requires D;
+template  void g() requires D;
+
+void h() {
+  f();
+  g();
+}





[committed] libgccjit.h: fix typo in comment

2020-11-12 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 8948a5715b00fe36d20c03b6c4c4397b74cc6282.

gcc/jit/ChangeLog:
* libgccjit.h: Fix typo in comment.
---
 gcc/jit/libgccjit.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
index 7134841bb07..7fbaa9f3162 100644
--- a/gcc/jit/libgccjit.h
+++ b/gcc/jit/libgccjit.h
@@ -1504,7 +1504,7 @@ gcc_jit_context_new_rvalue_from_vector (gcc_jit_context 
*ctxt,
 
 #define LIBGCCJIT_HAVE_gcc_jit_version
 
-/* Functions to retrive libgccjit version.
+/* Functions to retrieve libgccjit version.
Analogous to __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ in C code.
 
These API entrypoints were added in LIBGCCJIT_ABI_13; you can test for their
-- 
2.26.2



[committed] jit: fix string escaping

2020-11-12 Thread David Malcolm via Gcc-patches
This patch fixes a bug in recording::string::make_debug_string in which
'\t' and '\n' were "escaped" by simply prepending a '\', thus emitting
'\' then '\n', rather than '\' then 'n'.  It also removes a hack that
determined if a string is to be escaped by checking for a leading '"',
by instead adding a flag.
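
As a rough standalone illustration of the corrected escaping (a sketch in
plain C, not the libgccjit code; escape_for_debug is a hypothetical name
and malloc replaces the new char[] buffer used in jit-recording.c):

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly malloc'ed, double-quoted copy of TEXT with tab,
   newline, backslash and double-quote escaped, or NULL if allocation
   fails.  The caller frees the result.  */
char *
escape_for_debug (const char *text)
{
  size_t len = strlen (text);
  /* Worst case: every char becomes two, plus two quotes and the NUL.  */
  char *out = malloc (2 * len + 3);
  if (!out)
    return NULL;

  size_t n = 0;
  out[n++] = '"';               /* opening quote */
  for (size_t i = 0; i < len; i++)
    {
      char ch = text[i];
      switch (ch)
        {
        case '\t':              /* emit backslash then 't' */
          out[n++] = '\\';
          out[n++] = 't';
          break;
        case '\n':              /* emit backslash then 'n' */
          out[n++] = '\\';
          out[n++] = 'n';
          break;
        case '\\':
        case '"':               /* these keep their own character */
          out[n++] = '\\';
          out[n++] = ch;
          break;
        default:
          out[n++] = ch;
          break;
        }
    }
  out[n++] = '"';               /* closing quote */
  out[n] = '\0';
  return out;
}
```

The old behaviour corresponds to emitting ch itself after the backslash
in the '\t' and '\n' cases, which leaves a raw tab or newline in the
output.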

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as fec573408310139e1ffc42741fbe46b4f2947592.

gcc/jit/ChangeLog:
* jit-recording.c (recording::context::new_string): Add "escaped"
param and use it when creating the new recording::string instance.
(recording::string::string): Add "escaped" param and use it to
initialize m_escaped.
(recording::string::make_debug_string): Replace check that first
char is double-quote with use of m_escaped.  Fix escaping of
'\t' and '\n'.  Set "escaped" on the result.
* jit-recording.h (recording::context::new_string): Add "escaped"
param.
(recording::string::string): Add "escaped" param.
(recording::string::m_escaped): New field.

gcc/testsuite/ChangeLog:
* jit.dg/test-debug-strings.c (create_code): Add tests of
string literal escaping.
---
 gcc/jit/jit-recording.c   | 39 ---
 gcc/jit/jit-recording.h   |  9 --
 gcc/testsuite/jit.dg/test-debug-strings.c | 20 
 3 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index 3cbeba0f371..3a84c1fc5c0 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -724,12 +724,12 @@ recording::context::disassociate_from_playback ()
This creates a fresh copy of the given 0-terminated buffer.  */
 
 recording::string *
-recording::context::new_string (const char *text)
+recording::context::new_string (const char *text, bool escaped)
 {
   if (!text)
 return NULL;
 
-  recording::string *result = new string (this, text);
+  recording::string *result = new string (this, text, escaped);
   record (result);
   return result;
 }
@@ -1954,8 +1954,9 @@ recording::memento::write_to_dump (dump &d)
 /* Constructor for gcc::jit::recording::string::string, allocating a
copy of the given text using new char[].  */
 
-recording::string::string (context *ctxt, const char *text)
-  : memento (ctxt)
+recording::string::string (context *ctxt, const char *text, bool escaped)
+: memento (ctxt),
+  m_escaped (escaped)
 {
   m_len = strlen (text);
   m_buffer = new char[m_len + 1];
@@ -2005,9 +2006,9 @@ recording::string::from_printf (context *ctxt, const char 
*fmt, ...)
 recording::string *
 recording::string::make_debug_string ()
 {
-  /* Hack to avoid infinite recursion into strings when logging all
- mementos: don't re-escape strings:  */
-  if (m_buffer[0] == '"')
+  /* Avoid infinite recursion into strings when logging all mementos:
+ don't re-escape strings:  */
+  if (m_escaped)
 return this;
 
   /* Wrap in quotes and do escaping etc */
@@ -2024,15 +2025,31 @@ recording::string::make_debug_string ()
   for (size_t i = 0; i < m_len ; i++)
 {
   char ch = m_buffer[i];
-  if (ch == '\t' || ch == '\n' || ch == '\\' || ch == '"')
-   APPEND('\\');
-  APPEND(ch);
+  switch (ch)
+   {
+   default:
+ APPEND(ch);
+ break;
+   case '\t':
+ APPEND('\\');
+ APPEND('t');
+ break;
+   case '\n':
+ APPEND('\\');
+ APPEND('n');
+ break;
+   case '\\':
+   case '"':
+ APPEND('\\');
+ APPEND(ch);
+ break;
+   }
 }
   APPEND('"'); /* closing quote */
 #undef APPEND
   tmp[len] = '\0'; /* nil termintator */
 
-  string *result = m_ctxt->new_string (tmp);
+  string *result = m_ctxt->new_string (tmp, true);
 
   delete[] tmp;
   return result;
diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
index 30e37aff387..9a43a7bf33a 100644
--- a/gcc/jit/jit-recording.h
+++ b/gcc/jit/jit-recording.h
@@ -74,7 +74,7 @@ public:
   void disassociate_from_playback ();
 
   string *
-  new_string (const char *text);
+  new_string (const char *text, bool escaped = false);
 
   location *
   new_location (const char *filename,
@@ -414,7 +414,7 @@ private:
 class string : public memento
 {
 public:
-  string (context *ctxt, const char *text);
+  string (context *ctxt, const char *text, bool escaped);
   ~string ();
 
   const char *c_str () { return m_buffer; }
@@ -431,6 +431,11 @@ private:
 private:
   size_t m_len;
   char *m_buffer;
+
+  /* Flag to track if this string is the result of string::make_debug_string,
+ to avoid infinite recursion when logging all mementos: don't re-escape
+ such strings.  */
+  bool m_escaped;
 };
 
 class location : public memento
diff --git a/gcc/testsuite/jit.dg/test-debug-strings.c 
b/gcc/testsuite/jit.dg/test-debug-strings.c
index e515a176257..03ef3370d94 100644
--- a/gcc/testsuite/jit.dg/test-debu

[committed] jit: add support for inline asm [PR87291]

2020-11-12 Thread David Malcolm via Gcc-patches
This patch adds various entrypoints to libgccjit for directly embedding
asm statements into a compile, analogous to inline asm in the C frontend:
  gcc_jit_block_add_extended_asm
  gcc_jit_block_end_with_extended_asm_goto
  gcc_jit_extended_asm_as_object
  gcc_jit_extended_asm_set_volatile_flag
  gcc_jit_extended_asm_set_inline_flag
  gcc_jit_extended_asm_add_output_operand
  gcc_jit_extended_asm_add_input_operand
  gcc_jit_extended_asm_add_clobber
  gcc_jit_context_add_top_level_asm

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 421d0d0f54294a7bf2872b3b2ac521ce0fa9869e.

gcc/jit/ChangeLog:
PR jit/87291
* docs/cp/topics/asm.rst: New file.
* docs/cp/topics/index.rst (Topic Reference): Add it.
* docs/topics/asm.rst: New file.
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_15): New.
* docs/topics/functions.rst (Statements): Add link to extended
asm.
* docs/topics/index.rst (Topic Reference): Add asm.rst.
* docs/topics/objects.rst: Add gcc_jit_extended_asm to ASCII art.
* jit-common.h (gcc::jit::recording::extended_asm): New forward
decl.
(gcc::jit::recording::top_level_asm): Likewise.
* jit-playback.c: Include "stmt.h".
(build_string): New.
(gcc::jit::playback::context::new_string_literal): Disambiguate
build_string call.
(gcc::jit::playback::context::add_top_level_asm): New.
(build_operand_chain): New.
(build_clobbers): New.
(build_goto_operands): New.
(gcc::jit::playback::block::add_extended_asm): New.
* jit-playback.h (gcc::jit::playback::context::add_top_level_asm):
New decl.
(struct gcc::jit::playback::asm_operand): New struct.
(gcc::jit::playback::block::add_extended_asm): New decl.
* jit-recording.c (gcc::jit::recording::context::dump_to_file):
Dump top-level asms.
(gcc::jit::recording::context::add_top_level_asm): New.
(gcc::jit::recording::block::add_extended_asm): New.
(gcc::jit::recording::block::end_with_extended_asm_goto): New.
(gcc::jit::recording::asm_operand::asm_operand): New.
(gcc::jit::recording::asm_operand::print): New.
(gcc::jit::recording::asm_operand::make_debug_string): New.
(gcc::jit::recording::output_asm_operand::write_reproducer): New.
(gcc::jit::recording::output_asm_operand::print): New.
(gcc::jit::recording::input_asm_operand::write_reproducer): New.
(gcc::jit::recording::input_asm_operand::print): New.
(gcc::jit::recording::extended_asm::add_output_operand): New.
(gcc::jit::recording::extended_asm::add_input_operand): New.
(gcc::jit::recording::extended_asm::add_clobber): New.
(gcc::jit::recording::extended_asm::replay_into): New.
(gcc::jit::recording::extended_asm::make_debug_string): New.
(gcc::jit::recording::extended_asm::write_flags): New.
(gcc::jit::recording::extended_asm::write_clobbers): New.
(gcc::jit::recording::extended_asm_simple::write_reproducer): New.
(gcc::jit::recording::extended_asm::maybe_populate_playback_blocks):
New.
(gcc::jit::recording::extended_asm_goto::extended_asm_goto): New.
(gcc::jit::recording::extended_asm_goto::replay_into): New.
(gcc::jit::recording::extended_asm_goto::write_reproducer): New.
(gcc::jit::recording::extended_asm_goto::get_successor_blocks):
New.
(gcc::jit::recording::extended_asm_goto::maybe_print_gotos): New.

(gcc::jit::recording::extended_asm_goto::maybe_populate_playback_blocks):
New.
(gcc::jit::recording::top_level_asm::top_level_asm): New.
(gcc::jit::recording::top_level_asm::replay_into): New.
(gcc::jit::recording::top_level_asm::make_debug_string): New.
(gcc::jit::recording::top_level_asm::write_to_dump): New.
(gcc::jit::recording::top_level_asm::write_reproducer): New.
* jit-recording.h
(gcc::jit::recording::context::add_top_level_asm): New decl.
(gcc::jit::recording::context::m_top_level_asms): New field.
(gcc::jit::recording::block::add_extended_asm): New decl.
(gcc::jit::recording::block::end_with_extended_asm_goto): New
decl.
(gcc::jit::recording::asm_operand): New class.
(gcc::jit::recording::output_asm_operand): New class.
(gcc::jit::recording::input_asm_operand): New class.
(gcc::jit::recording::extended_asm): New class.
(gcc::jit::recording::extended_asm_simple): New class.
(gcc::jit::recording::extended_asm_goto): New class.
(gcc::jit::recording::top_level_asm): New class.
* libgccjit++.h (gccjit::extended_asm): New forward decl.
(gccjit::context::add_top_level_asm): New.
(gccjit::block::add_extended_asm): New.
(gccjit::block::end_with_extended_asm_goto): New.
(

[PATCH] Use SHF_GNU_RETAIN to preserve symbol definitions

2020-11-12 Thread H.J. Lu via Gcc-patches
In assembly code, the section flag 'R' sets the SHF_GNU_RETAIN flag to
indicate that the section must be preserved by the linker.

Add SECTION_RETAIN to indicate a section should be retained by the linker
and set SECTION_RETAIN on section for the preserved symbol if assembler
supports SHF_GNU_RETAIN.  All retained symbols are placed in separate
sections with

.section .data.rel.local.preserved_symbol,"awR"
preserved_symbol:
...
.section .data.rel.local,"aw"
not_preserved_symbol:
...

to avoid

.section .data.rel.local,"awR"
preserved_symbol:
...
not_preserved_symbol:
...

which places not_preserved_symbol definition in the SHF_GNU_RETAIN
section.
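
For reference, a small C source of the kind this affects might look as
follows.  This is only a sketch: the variable names are borrowed from the
assembly above, read_not_preserved is a hypothetical helper, and the
section a given target actually picks depends on the target and options.

```c
/* "used" asks the compiler to emit this definition even though nothing
   in the translation unit references it.  With this patch and an
   assembler that accepts the 'R' flag, it is expected to be placed in
   its own section marked "awR".  */
static int preserved_symbol __attribute__ ((used)) = 1;

/* An ordinary definition; it should stay in a section without 'R'.  */
static int not_preserved_symbol = 2;

/* Hypothetical helper that keeps not_preserved_symbol referenced.  */
int
read_not_preserved (void)
{
  return not_preserved_symbol;
}
```

Compiling such a file with -S and looking at the .section directives is
one way to check which symbols ended up in a retained section.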

gcc/

2020-11-XX  H.J. Lu  

* configure.ac (HAVE_GAS_SHF_GNU_RETAIN): New.  Define 1 if
the assembler supports marking sections with SHF_GNU_RETAIN flag.
* output.h (SECTION_RETAIN): New.  Defined as 0x400.
(SECTION_MACH_DEP): Changed from 0x400 to 0x800.
(default_unique_section): Add a bool argument.
* varasm.c (get_section): Set SECTION_RETAIN for the preserved
symbol with HAVE_GAS_SHF_GNU_RETAIN.
(resolve_unique_section): Use named section for the preserved
symbol if assembler supports SHF_GNU_RETAIN.
(get_variable_section): Handle the preserved common symbol with
HAVE_GAS_SHF_GNU_RETAIN.
(default_elf_asm_named_section): Require the full declaration and
use the 'R' flag for SECTION_RETAIN.
* config.in: Regenerated.
* configure: Likewise.

gcc/testsuite/

2020-11-XX  H.J. Lu  
Jozef Lawrynowicz  

* c-c++-common/attr-used.c: Check the 'R' flag.
* c-c++-common/attr-used-2.c: Likewise.
* c-c++-common/attr-used-3.c: New test.
* c-c++-common/attr-used-4.c: Likewise.
* gcc.c-torture/compile/attr-used-retain-1.c: Likewise.
* gcc.c-torture/compile/attr-used-retain-2.c: Likewise.
* lib/target-supports.exp
(check_effective_target_R_flag_in_section): New proc.
---
 gcc/config.in |  7 +++
 gcc/configure | 51 +++
 gcc/configure.ac  | 20 
 gcc/output.h  |  6 ++-
 gcc/testsuite/c-c++-common/attr-used-2.c  |  1 +
 gcc/testsuite/c-c++-common/attr-used-3.c  |  7 +++
 gcc/testsuite/c-c++-common/attr-used-4.c  |  7 +++
 gcc/testsuite/c-c++-common/attr-used.c|  1 +
 .../compile/attr-used-retain-1.c  | 32 
 .../compile/attr-used-retain-2.c  | 15 ++
 gcc/testsuite/lib/target-supports.exp | 40 +++
 gcc/varasm.c  | 17 +--
 12 files changed, 200 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-3.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-4.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/attr-used-retain-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/attr-used-retain-2.c

diff --git a/gcc/config.in b/gcc/config.in
index b7c3107bfe3..23ae2f9bc1b 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1352,6 +1352,13 @@
 #endif
 
 
+/* Define 0/1 if your assembler supports marking sections with SHF_GNU_RETAIN
+   flag. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GAS_SHF_GNU_RETAIN
+#endif
+
+
 /* Define 0/1 if your assembler supports marking sections with SHF_MERGE flag.
*/
 #ifndef USED_FOR_TARGET
diff --git a/gcc/configure b/gcc/configure
index dbda4415a17..a925a6e5efb 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24272,6 +24272,57 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
+# Test if the assembler supports the section flag 'R' for specifying
+# section with SHF_GNU_RETAIN.
+case "${target}" in
+  # Solaris may use GNU assembler with Solairs ld.  Even if GNU
+  # assembler supports the section flag 'R', it doesn't mean that
+  # Solairs ld supports it.
+  *-*-solaris2*)
+gcc_cv_as_shf_gnu_retain=no
+;;
+  *)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for section 
'R' flag" >&5
+$as_echo_n "checking assembler for section 'R' flag... " >&6; }
+if ${gcc_cv_as_shf_gnu_retain+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_shf_gnu_retain=no
+if test $in_tree_gas = yes; then
+if test $in_tree_gas_is_elf = yes \
+  && test $gcc_cv_gas_vers -ge `expr \( \( 2 \* 1000 \) + 36 \) \* 1000 + 0`
+  then gcc_cv_as_shf_gnu_retain=yes
+fi
+  elif test x$gcc_cv_as != x; then
+$as_echo '.section .foo,"awR",%progbits
+.byte 0' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags --fatal-warnings -o conftest.o 
conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_shf_gnu_retain=yes
+el

Re: [PATCH] openmp: Retire nest-var ICV

2020-11-12 Thread Kwok Cheung Yeung

On 10/11/2020 6:01 pm, Jakub Jelinek wrote:

One thing is that max-active-levels-var in 5.0 is per-device,
but in 5.1 per-data environment.  The question is if we should implement
the problematic 5.0 way or the 5.1 one.  E.g.:
#include <omp.h>
#include <stdio.h>

int
main ()
{
   #pragma omp parallel
   {
 omp_set_nested (1);
 #pragma omp parallel num_threads(2)
 printf ("Hello, world!\n");
   }
}
which used to be valid in 4.5 (where nest-var used to be per-data
environment) is in 5.0 racy (and in 5.1 will not be racy again).
Though, as these are deprecated APIs, perhaps we can just do the 5.0 way for
now.


Since max-active-levels-var is still current in 5.1, I guess we might as well do 
it properly :-). I have now placed max-active-levels-var into gomp_task_icv. The 
definition of omp_get_nested in 5.1 refers to the active-level-var ICV which is 
currently not implemented, so the comparison is against omp_get_active_level() 
instead.
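
To make the intended semantics concrete, here is a toy model in plain C.
It is a sketch only, not the code in icv.c: struct task_icv, the model_*
names and the SUPPORTED_ACTIVE_LEVELS value are assumptions for
illustration, and the active level is stored in the struct rather than
queried from the runtime.

```c
#include <stdbool.h>

/* Hypothetical cap on nesting depth; the real runtime has its own.  */
#define SUPPORTED_ACTIVE_LEVELS 255

/* Toy per-data-environment ICV block.  */
struct task_icv
{
  unsigned max_active_levels_var;
  unsigned active_level;  /* enclosing active parallel regions */
};

/* Enabling nesting raises the limit to the supported maximum;
   disabling it clamps the limit to a single active level.  */
void
model_set_nested (struct task_icv *icv, bool val)
{
  if (val)
    icv->max_active_levels_var = SUPPORTED_ACTIVE_LEVELS;
  else if (icv->max_active_levels_var > 1)
    icv->max_active_levels_var = 1;
}

/* Nesting counts as enabled if another active level is still allowed.  */
bool
model_get_nested (const struct task_icv *icv)
{
  return icv->max_active_levels_var > 1
         && icv->max_active_levels_var > icv->active_level;
}
```

Because each data environment carries its own copy of the struct, a
change made inside one parallel region does not leak into its parent,
which is the per-data-environment behaviour discussed above.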



--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -489,8 +489,11 @@ represent their language-specific counterparts.
  
  Nested parallel regions may be initialized at startup by the

  @env{OMP_NESTED} environment variable or at runtime using
-@code{omp_set_nested}.  If undefined, nested parallel regions are
-disabled by default.
+@code{omp_set_nested}.  Setting the maximum number of nested
+regions to above one using the @env{OMP_MAX_ACTIVE_LEVELS}
+environment variable or @code{omp_set_max_active_levels} will
+also enable nesting.  If undefined, nested parallel regions
+are disabled by default.


This doesn't really describe what env.c does.  If undefined, then
if OMP_NESTED is defined, it will be followed, and if neither is
defined, the code sets the default based on
"OMP_NUM_THREADS or OMP_PROC_BIND is set to a
comma-separated list of more than one value"
as the spec says and only is disabled otherwise.
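
The precedence described here could be sketched roughly as below.  This is
a simplified model, not env.c: the function name and the
SUPPORTED_ACTIVE_LEVELS value are assumptions, the arguments stand for the
raw environment strings (NULL when unset), and validation and
case-insensitive parsing are left out.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on nesting depth.  */
#define SUPPORTED_ACTIVE_LEVELS 255

/* True if S is set and contains a comma, i.e. lists several values.  */
static bool
is_list (const char *s)
{
  return s != NULL && strchr (s, ',') != NULL;
}

/* Pick the initial max-active-levels value from the environment.  */
unsigned long
model_initial_max_active_levels (const char *max_levels, const char *nested,
                                 const char *num_threads,
                                 const char *proc_bind)
{
  if (max_levels)               /* OMP_MAX_ACTIVE_LEVELS wins */
    return strtoul (max_levels, NULL, 10);
  if (nested)                   /* otherwise OMP_NESTED is followed */
    return strcmp (nested, "true") == 0 ? SUPPORTED_ACTIVE_LEVELS : 1;
  if (is_list (num_threads) || is_list (proc_bind))
    return SUPPORTED_ACTIVE_LEVELS;  /* lists imply nesting */
  return 1;                     /* nesting disabled */
}
```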



Similarly.



Again.


I have changed these to more accurately describe what is happening. The 
descriptions are starting to get rather verbose though...



--- a/libgomp/testsuite/libgomp.c/target-5.c
+++ b/libgomp/testsuite/libgomp.c/target-5.c


Why does this testcase need updates?
It doesn't seem to use omp_[sg]et_max_active_levels and so I don't see
why it couldn't use omp_[sg]et_nested.



The problem is with max-active-levels-var (which nesting is now in terms of) 
being per device rather than per data environment. The test expects the nested 
setting to go back to its previous value after leaving a DE that sets it to 
something else.


Anyway, with max-active-levels-var now being per data environment, that is all 
moot now, and the test can remain unchanged.


Is this version okay for trunk? Bootstrapped on x86_64 and libgomp tested with 
no regressions with nvptx offloading.


Thanks

Kwok
commit bcaa3dbf1f130e3a2c7e6033a10be3f61221a951
Author: Kwok Cheung Yeung 
Date:   Thu Nov 12 13:42:28 2020 -0800

openmp: Retire nest-var ICV for OpenMP 5.1

This removes the nest-var ICV, expressing nesting in terms of the
max-active-levels-var ICV instead.  The max-active-levels-var ICV
is now per data environment rather than per device.

2020-11-12  Kwok Cheung Yeung  

libgomp/
* env.c (gomp_global_icv): Remove nest_var field.  Add
max_active_levels_var field.
(gomp_max_active_levels_var): Remove.
(parse_boolean): Return true on success.
(handle_omp_display_env): Express OMP_NESTED in terms of
max_active_levels_var.
(initialize_env): Set max_active_levels_var from
OMP_MAX_ACTIVE_LEVELS, OMP_NESTED, OMP_NUM_THREADS and
OMP_PROC_BIND.
* icv.c (omp_set_nested): Express in terms of
max_active_levels_var.
(omp_get_nested): Likewise.
(omp_set_max_active_levels): Use max_active_levels_var field instead
of gomp_max_active_levels_var.
(omp_get_max_active_levels): Likewise.
* libgomp.h (struct gomp_task_icv): Remove nest_var field.  Add
max_active_levels_var field.
(gomp_max_active_levels_var): Delete.
* libgomp.texi (omp_get_nested): Update documentation.
(omp_set_nested): Likewise.
(OMP_MAX_ACTIVE_LEVELS): Likewise.
(OMP_NESTED): Likewise.
(OMP_NUM_THREADS): Likewise.
(OMP_PROC_BIND): Likewise.
* parallel.c (gomp_resolve_num_threads): Replace reference
to nest_var with max_active_levels_var.  Use max_active_levels_var
field instead of gomp_max_active_levels_var.

diff --git a/libgomp/env.c b/libgomp/env.c
index ab22525..b8ed1bd 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -68,12 +68,11 @@ struct gomp_task_icv gomp_global_icv = {
   .run_sched_chunk_size = 1,
   .default_device_var = 0,
   .dyn_var = false,
-  .nest_var = false,
+  .max_active_levels_var = 1,
   .bind_var = omp_proc_bind_false,
   .target_data = NULL
 };
 
-unsigned long gomp_max_active_levels_var = gomp_supported_active_levels;
 bool gomp_cancel_var
