Re: [Patch] x86: Enable GCC support for Intel AVX-VNNI extension

2020-11-10 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 9, 2020 at 8:26 PM Uros Bizjak  wrote:
>
> On Mon, Nov 9, 2020 at 11:31 AM Hongtao Liu  wrote:
> >
> > >
> > > +  /* Support unified builtin.  */
> > > +  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
> > >
> > > I don't think we gain anything with unified builtins. Better, just
> > > introduce separate builtins, e.g for
> > >
> >
> > Unified builtins are used for unified intrinsics, intrinsics users may 
> > prefer
> > same interface and let compiler decide encoding version. Separate
> > buitins may cause
> > some defination ambiguous when target attribute is used, see avx-vnni-2.c.
> > We also provide separate intrinsics interface for compatibility with
> > different compilers(llvm/msvc/icc).
>
> Hm, the new intrinsics file introduces:
>
> +#ifdef __AVXVNNI__
> +#define _mm256_dpbusd_avx_epi32(A, B, C) \
> +  _mm256_dpbusd_epi32((A), (B), (C))
> ...
> +#endif /* __AVXVNNI__ */
> +
> +#define _mm256_dpbusd_epi32(A, B, C)\
> +  ((__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) (A),\
> +   (__v8si) (B),\
> +   (__v8si) (C)))
> +
>
> And there are two versions of intrinsics:
>
> _mm256_dpbusd_avx_epi32
> _mm256_dpbusd_epi32
>
> So, is _mm256_dpusb_epi32 active for either
>
> OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL
>
> or
>
> OPTION_MASK_ISA2_AVXVNNI ?
>
Yes.
> Is _mm265_dpbusb_avx_epi32 the "compatibility intrinsics"?
>
> In case the above is correct, please expand the comment
>
> +  /* Support unified builtin.  */
> +  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
>
> with the above information, what kind of unified builtin is this.
>

To explain this, I want to introduce some background first. the first
version of avx-vnni intrinsics are like below(same as you mentioned in
the former email)
extern __inline __m256i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm256_dpwssds_avx_epi32(__m256i __A,__m256i __B,__m256i __C)
{
  return (__m256i) __builtin_ia32_vpdpwssds_avx_v8si ((__v8si) __A,
  (__v8si) __B,
  (__v8si) __C);
}

The implementation is normal as before. Then some intrinsics users ask
us to provide an unified intrinsics interface for the same
instruction. The request is reasonable, but the problem is avx-vnni is
independent of avx512-vnni, which means avx512-vnni doesn't imply
avx-vnni. It causes some difficulties in implementation.
If we only change patterns in sse.md(to generate vex encoding when
avx-vnni exists), it would fail gcc.target/i386/avx-vnni-2.c since
_mm_dpwssds_epi32 is define with
target_attribute("avx512vl,avx512vnni").
if we remove target_attribute and still define _mm_dpwssds_epi32 as
"extern __inline" function, gcc would fail to build since
__builtin_ia32_vpdpwssds_v4si request target avx512vl and avx512vnni.
Finally we decided to use macros. To do this, we need to provide
unified builtins, or else there would be some issue in mapping
intrinsics to builtins. i.e: we can't do something like below, it
would still fail gcc.target/i386/avx-vnni-2.c.

#if defined(__AVX512VNNI__) && defined(__AVX512VL__)
define _mm_dpwssds_epi32 (A, B, C)\
 ((__m128i) __builtin_ia32_vpdpwssds_v4si (A,B,C))
#endif

#ifdef __AVXVNNI__
define _mm_dpwssds_epi32 (A, B, C)\
 ((__m128i) __builtin_ia32_vpdpwssds_avx_v4si (A,B,C))
#endif

Also we hack some code to support unified builtin.

+  if bisa & (OPTION_MASK_ISA_AVX512VNNI |
OPTION_MASK_ISA_AVX512VL))
+== (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL))
+   || (bisa2 & OPTION_MASK_ISA2_AVXVNNI) != 0)
+  && (((isa & (OPTION_MASK_ISA_AVX512VNNI |
OPTION_MASK_ISA_AVX512VL))
+   == (OPTION_MASK_ISA_AVX512VNNI |
OPTION_MASK_ISA_AVX512VL))
+  || (isa2 & OPTION_MASK_ISA2_AVXVNNI) != 0))
+{
+  isa |= OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL;
+  isa2 |= OPTION_MASK_ISA2_AVXVNNI;
+}
+

This for

> OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL
>
> or
>
> OPTION_MASK_ISA2_AVXVNNI ?
>

And this

+  /* Support unified builtin.  */
+  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)

for gcc/testsuite/lib/target-supports.exp where only builtin is
available to check toolchain existence(we may use inline assembly
instead, but it expose an issue),  def_builtin should define
__builtin_ia32_vpdpwssds_v4si  whenever avx512-vnni && avx512vl or
avx-vnni exist, see gcc/testsuite/gcc.target/i386/avxvnni-builtin.c

> Please also note that #defines won't be tested in e.g. sse-13.c, where:
>
> --q--
>   Defining away "extern" and "__inline" results in all of them being
>   compiled as proper functions.  */
>
> #define extern
> #define __inline
> --/q--
>
> so these defines should be reimplemented as extern inline functions.
>

I reimplemented _mm_dpbusd_avx_epi32 as extern inline functions, but
still keep _mm_dpbusd_epi32 as macros. Maybe we should also add some
explicit O0 tests for _mm_d

Re: Enable MOVDIRI, MOVDIR64B, CLDEMOTE and WAITPKG for march=tremont

2020-11-10 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 10, 2020 at 3:22 AM Jason Merrill via Gcc-patches
 wrote:
>
> This patch was also applied to the GCC 9 and 10 branches and breaks those
> builds, because PTA_CLDEMOTE is not defined.
>
Mine, let me fix it, sorry for that.

-- 
BR,
Hongtao


[PATCH] Normalize VARYING for -fstrict-enums.

2020-11-10 Thread Aldy Hernandez via Gcc-patches
The problem here is that the representation for VARYING in
-fstrict-enums is different between value_range and irange.

The helper function irange::normalize_min_max() will normalize to
VARYING only if setting the range to the entire domain of the
underlying type.  That is, [0, 0xff..ff], not the domain as defined by
-fstrict-enums.  This causes problems because the multi-range version
of varying_p() will return true if the range is the domain as defined
by -fstrict-enums.  Thus, normalize_min_max and varying_p have
different concepts of varying for multi-ranges.

(BTW, legacy ranges are different because they never look at the
extremes of a range to determine varying-ness.  They only look at the
kind field.)

One approach is to change all the code to limit ranges to the domain
in the -fstrict-enums world, but this won't work because there are
various instances of gimple where the values assigned or compared are
beyond the limits of TYPE_{MIN,MAX}_VALUE.  One example is the
addition of 0x to represent subtraction.

This patch fixes multi-range varying_p() and set_varying() to agree
with the normalization code, using the extremes of the underlying type,
to represent varying.

Pushed.

gcc/ChangeLog:

PR tree-optimization/97767
* value-range.cc (dump_bound_with_infinite_markers): Use
wi::min_value and wi::max_value.
(range_tests_strict_enum): New.
(range_tests): Call range_tests_strict_enum.
* value-range.h (irange::varying_p): Use wi::min_value
and wi::max_value.
(irange::set_varying): Same.
(irange::normalize_min_max): Remove comment.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr97767.C: New test.
---
 gcc/testsuite/g++.dg/opt/pr97767.C | 10 +++
 gcc/value-range.cc | 42 --
 gcc/value-range.h  | 23 +++-
 3 files changed, 60 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr97767.C

diff --git a/gcc/testsuite/g++.dg/opt/pr97767.C 
b/gcc/testsuite/g++.dg/opt/pr97767.C
new file mode 100644
index 000..da0879d0567
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr97767.C
@@ -0,0 +1,10 @@
+// { dg-do compile }
+// { dg-options "-O -fstrict-enums -ftree-vrp -w" }
+
+enum { E0 = 0, E1 = 1, E2 = 2 } e;
+
+int
+foo (void)
+{
+  return __builtin_popcount ((int) e);
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 61f7da278d6..f83a824a982 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1866,12 +1866,17 @@ static void
 dump_bound_with_infinite_markers (FILE *file, tree bound)
 {
   tree type = TREE_TYPE (bound);
+  wide_int type_min = wi::min_value (TYPE_PRECISION (type), TYPE_SIGN (type));
+  wide_int type_max = wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type));
+
   if (INTEGRAL_TYPE_P (type)
   && !TYPE_UNSIGNED (type)
-  && vrp_val_is_min (bound)
+  && TREE_CODE (bound) == INTEGER_CST
+  && wi::to_wide (bound) == type_min
   && TYPE_PRECISION (type) != 1)
 fprintf (file, "-INF");
-  else if (vrp_val_is_max (bound)
+  else if (TREE_CODE (bound) == INTEGER_CST
+  && wi::to_wide (bound) == type_max
   && TYPE_PRECISION (type) != 1)
 fprintf (file, "+INF");
   else
@@ -2241,6 +2246,38 @@ range_tests_legacy ()
   }
 }
 
+// Simulate -fstrict-enums where the domain of a type is less than the
+// underlying type.
+
+static void
+range_tests_strict_enum ()
+{
+  // The enum can only hold [0, 3].
+  tree rtype = copy_node (unsigned_type_node);
+  TYPE_MIN_VALUE (rtype) = build_int_cstu (rtype, 0);
+  TYPE_MAX_VALUE (rtype) = build_int_cstu (rtype, 3);
+
+  // Test that even though vr1 covers the strict enum domain ([0, 3]),
+  // it does not cover the domain of the underlying type.
+  int_range<1> vr1 (build_int_cstu (rtype, 0), build_int_cstu (rtype, 1));
+  int_range<1> vr2 (build_int_cstu (rtype, 2), build_int_cstu (rtype, 3));
+  vr1.union_ (vr2);
+  ASSERT_TRUE (vr1 == int_range<1> (build_int_cstu (rtype, 0),
+   build_int_cstu (rtype, 3)));
+  ASSERT_FALSE (vr1.varying_p ());
+
+  // Test that copying to a multi-range does not change things.
+  int_range<2> ir1 (vr1);
+  ASSERT_TRUE (ir1 == vr1);
+  ASSERT_FALSE (ir1.varying_p ());
+
+  // The same test as above, but using TYPE_{MIN,MAX}_VALUE instead of [0,3].
+  vr1 = int_range<1> (TYPE_MIN_VALUE (rtype), TYPE_MAX_VALUE (rtype));
+  ir1 = vr1;
+  ASSERT_TRUE (ir1 == vr1);
+  ASSERT_FALSE (ir1.varying_p ());
+}
+
 static void
 range_tests_misc ()
 {
@@ -2442,6 +2479,7 @@ range_tests ()
   range_tests_legacy ();
   range_tests_irange3 ();
   range_tests_int_range_max ();
+  range_tests_strict_enum ();
   range_tests_misc ();
 }
 
diff --git a/gcc/value-range.h b/gcc/value-range.h
index a483fc802dd..7428c91ea57 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -280,12 +280,14 @@ irange::varying_p () const
   tree l = m_base[0];
   tree u = m_base[1];
   tree t = TRE

Re: [PATCH 1/2] IBM Z: Store long doubles in vector registers when possible

2020-11-10 Thread Andreas Krebbel via Gcc-patches
On 09.11.20 20:54, Ilya Leoshkevich wrote:
> On z14+, there are instructions for working with 128-bit floats (long
> doubles) in vector registers.  It's beneficial to use them instead of
> instructions that operate on floating point register pairs, because it
> allows to store 4 times more data in registers at a time, relieving
> register pressure.  The raw performance of the new instructions is
> almost the same as that of the new ones.
> 
> Implement by storing TFmode values in vector registers on z14+.  Since
> not all operations are available with the new instructions, keep the
> old ones available using the new FPRX2 mode, and convert between it and
> TFmode when necessary (this is called "forwarder" expanders below).
> Change the existing TFmode expanders to call either new- or old-style
> ones depending on whether we are on z14+ or older machines
> ("dispatcher" expanders).
> 
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * config/s390/s390-modes.def (FPRX2): New mode.
>   * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
>   * config/s390/s390.c (s390_fma_allowed_p): Likewise.
>   (s390_build_signbit_mask): Support 128-bit masks.
>   (print_operand): Support printing the second word of a TFmode
>   operand as vector register.
>   (constant_modes): Add FPRX2mode.
>   (s390_class_max_nregs): Return 1 for TFmode on z14+.
>   (s390_is_fpr128): New function.
>   (s390_is_vr128): Likewise.
>   (s390_can_change_mode_class): Use s390_is_fpr128 and
>   s390_is_vr128 in order to determine whether mode refers to a FPR
>   pair or to a VR.
>   (s390_emit_compare): Force TFmode operands into registers on
>   z14+.
>   * config/s390/s390.h (HAVE_TF): New macro.
>   (EXPAND_MOVTF): New macro.
>   (EXPAND_TF): Likewise.
>   * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
>   alias.
>   (ALL): Add FPRX2.
>   (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
>   (FP): Likewise.
>   (FP_ANYTF): New mode iterator.
>   (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
>   (TD_TF): Likewise.
>   (xde): Add FPRX2.
>   (nBFP): Likewise.
>   (nDFP): Likewise.
>   (DSF): Likewise.
>   (DFDI): Likewise.
>   (SFSI): Likewise.
>   (DF): Likewise.
>   (SF): Likewise.
>   (fT0): Likewise.
>   (bt): Likewise.
>   (_d): Likewise.
>   (HALF_TMODE): Likewise.
>   (tf_fpr): New mode_attr.
>   (type): New mode_attr.
>   (*cmp_ccz_0): Use type instead of mode with fsimp.
>   (*cmp_ccs_0_fastmath): Likewise.
>   (*cmptf_ccs): New pattern for wfcxb.
>   (*cmptf_ccsfps): New pattern for wfkxb.
>   (mov): Rename to mov.
>   (signbit2): Rename to signbit2.
>   (isinf2): Renamed to isinf2.
>   (*TDC_insn_): Use type instead of mode with fsimp.
>   (fixuns_trunc2): Rename to
>   fixuns_trunc2.
>   (fix_trunctf2): Rename to fix_trunctf2_fpr.
>   (floatdi2): Rename to floatdi2, use type
>   instead of mode with itof.
>   (floatsi2): Rename to floatsi2, use type
>   instead of mode with itof.
>   (*floatuns2): Use type instead of mode for
>   itof.
>   (floatuns2): Rename to
>   floatuns2.
>   (trunctf2): Rename to trunctf2_fpr, use type instead
>   of mode with fsimp.
>   (extend2): Rename to
>   extend2.
>   (2): Rename to
>   2, use type instead of
>   mode with fsimp.
>   (rint2): Rename to rint2, use
>   type instead of mode with fsimp.
>   (2): Use type instead of mode for
>   fsimp.
>   (rint2): Likewise.
>   (trunc2): Rename to
>   trunc2.
>   (trunc2): Rename to
>   trunc2.
>   (extend2): Rename to
>   extend2.
>   (extend2): Rename to
>   extend2.
>   (add3): Rename to add3, use type instead of
>   mode with fsimp.
>   (*add3_cc): Use type instead of mode with fsimp.
>   (*add3_cconly): Likewise.
>   (sub3): Rename to sub3, use type instead of
>   mode with fsimp.
>   (*sub3_cc): Use type instead of mode with fsimp.
>   (*sub3_cconly): Likewise.
>   (mul3): Rename to mul3, use type instead of
>   mode with fsimp.
>   (fma4): Restrict using s390_fma_allowed_p.
>   (fms4): Restrict using s390_fma_allowed_p.
>   (div3): Rename to div3, use type instead of
>   mode with fdiv.
>   (neg2): Rename to neg2.
>   (*neg2_cc): Use type instead of mode with fsimp.
>   (*neg2_cconly): Likewise.
>   (*neg2_nocc): Likewise.
>   (*neg2): Likeiwse.
>   (abs2): Rename to abs2, use type instead of
>   mode with fdiv.
>   (*abs2_cc): Use type instead of mode with fsimp.
>   (*abs2_cconly): Likewise.
>   (*abs2_nocc): Likewise.
>   (*abs2): Likewise.
>   (*negabs2_cc): Likewise.
>   (*negabs2_cconly): Likewise.
>   (*negabs2_nocc): Likewise.
>   (*negabs2): Likewise.
>   (sqrt2): Ren

Re: [PATCH 2/2] IBM Z: Test long doubles in vector registers

2020-11-10 Thread Andreas Krebbel via Gcc-patches
On 09.11.20 20:54, Ilya Leoshkevich wrote:
> gcc/testsuite/ChangeLog:
> 
> 2020-11-05  Ilya Leoshkevich  
> 
>   * gcc.target/s390/vector/long-double-callee-abi-scan.c: New test.
>   * gcc.target/s390/vector/long-double-caller-abi-run.c: New test.
>   * gcc.target/s390/vector/long-double-caller-abi-scan.c: New test.
>   * gcc.target/s390/vector/long-double-copysign.c: New test.
>   * gcc.target/s390/vector/long-double-fprx2-constant.c: New test.
>   * gcc.target/s390/vector/long-double-from-double.c: New test.
>   * gcc.target/s390/vector/long-double-from-float.c: New test.
>   * gcc.target/s390/vector/long-double-from-i16.c: New test.
>   * gcc.target/s390/vector/long-double-from-i32.c: New test.
>   * gcc.target/s390/vector/long-double-from-i64.c: New test.
>   * gcc.target/s390/vector/long-double-from-i8.c: New test.
>   * gcc.target/s390/vector/long-double-from-u16.c: New test.
>   * gcc.target/s390/vector/long-double-from-u32.c: New test.
>   * gcc.target/s390/vector/long-double-from-u64.c: New test.
>   * gcc.target/s390/vector/long-double-from-u8.c: New test.
>   * gcc.target/s390/vector/long-double-to-double.c: New test.
>   * gcc.target/s390/vector/long-double-to-float.c: New test.
>   * gcc.target/s390/vector/long-double-to-i16.c: New test.
>   * gcc.target/s390/vector/long-double-to-i32.c: New test.
>   * gcc.target/s390/vector/long-double-to-i64.c: New test.
>   * gcc.target/s390/vector/long-double-to-i8.c: New test.
>   * gcc.target/s390/vector/long-double-to-u16.c: New test.
>   * gcc.target/s390/vector/long-double-to-u32.c: New test.
>   * gcc.target/s390/vector/long-double-to-u64.c: New test.
>   * gcc.target/s390/vector/long-double-to-u8.c: New test.
>   * gcc.target/s390/vector/long-double-vec-duplicate.c: New test.
>   * gcc.target/s390/vector/long-double-wf.h: New test.
>   * gcc.target/s390/vector/long-double-wfaxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfcxb-0001.c: New test.
>   * gcc.target/s390/vector/long-double-wfcxb-0111.c: New test.
>   * gcc.target/s390/vector/long-double-wfcxb-1011.c: New test.
>   * gcc.target/s390/vector/long-double-wfcxb-1101.c: New test.
>   * gcc.target/s390/vector/long-double-wfdxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfixb.c: New test.
>   * gcc.target/s390/vector/long-double-wfkxb-0111.c: New test.
>   * gcc.target/s390/vector/long-double-wfkxb-1011.c: New test.
>   * gcc.target/s390/vector/long-double-wfkxb-1101.c: New test.
>   * gcc.target/s390/vector/long-double-wflcxb.c: New test.
>   * gcc.target/s390/vector/long-double-wflpxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfmaxb-2.c: New test.
>   * gcc.target/s390/vector/long-double-wfmaxb-3.c: New test.
>   * gcc.target/s390/vector/long-double-wfmaxb-disabled.c: New test.
>   * gcc.target/s390/vector/long-double-wfmaxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfmsxb-disabled.c: New test.
>   * gcc.target/s390/vector/long-double-wfmsxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfmxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfnmaxb-disabled.c: New test.
>   * gcc.target/s390/vector/long-double-wfnmaxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfnmsxb-disabled.c: New test.
>   * gcc.target/s390/vector/long-double-wfnmsxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfsqxb.c: New test.
>   * gcc.target/s390/vector/long-double-wfsxb-1.c: New test.
>   * gcc.target/s390/vector/long-double-wfsxb.c: New test.
>   * gcc.target/s390/vector/long-double-wftcixb-1.c: New test.
>   * gcc.target/s390/vector/long-double-wftcixb.c: New test.

Ok. Thanks!

Andreas



Re: Improve handling of memory operands in ipa-icf 1/4

2020-11-10 Thread Richard Biener
On Sun, 8 Nov 2020, Jan Hubicka wrote:

> Hello,
> I would like to finally address the problem with ipa-icf being overly
> conservating on memory operand equality and having overly weak hash
> function. This leads to code size, compile time and compile time memory
> use regressions sompared to earlier versions where we merged more
> aggressively because of weaker TBAA.
> 
> Basically comparing memory operand should match for memory accesses that
> does same base+offset computations and have same behaviour WRT TBAA.
> When optimizing for size we would like to discard TBAA info in the case
> of TBAA mismatch if they perform same actual load.
> 
> From past discussion we conlcuded that operand_equal_p (in fold-const)
> is not good place to address this since it is used for generic trees as
> well.  The TBAA matching/hashing is best done in tree-ssa-alias since it
> is inherently tied to its internals.
> 
> I would like to do following
>  1) keep track what parameters are memory operands.  This is implemented
> in the attached patch.
>  2) Add ao_ref interface for hasing and comparing references inheriting
> fold-const.h:operand_compare which then can feed ICF equivalences
> adding:
> 
> int ao_ref_compare (ao_ref ref1, ao_ref ref2)
> 
> It will return mask how refs differ:
> 
> AO_REF_SEMANTICS if the actual access is different
> (i.e. different base, offset or size of the access)
> AO_REF_BASE_ALIAS_SET if base alias set differs.
> AO_REF_ALIAS_SET if base ref set differs.
> AO_REF_ACCESS_PATH if access path differs.
> 
> and 
> 
> ao_ref_hash (ao_ref hash, int flags)
> 
> which will compute hashtable where flags are same AO_REF as above.
> 
> Initial implementation can compare alias sets for !LTO and for LTO
> compare TYPE_MAIN_VARIANTS.  We currently have no way to hash access
> types for LTO, I plan to add that incremntally.
> 
>  3) Make ipa_icf use ao_ref_compare == 0 to do current merging and
> !(ao_ref_compare & AO_REF_SEMANTICS) for size optimized code.
> 
>  4) Add way of hading alias sets for LTO.  This can be done by moving
> canonical type hash out of lto that I think would be nice cleanup
> anyway.  I have older patch for that (it needs to be untied from
> TYPE_CANONICAL computation by adding extra hashtable but it has no
> performance impact)
> 
> Other alternative is to simply collect list of types relevant and
> compare them at WPA time as done by earlier Martin's patch.
> 
> Later we could play with merging TBAA mismatches memory references with
> -O2+ and clearing them only for uninlined clones, but this is tricky and
> can wait for next stage1.
> 
> Does this make sense?

Well, guess we have to see if it works out and how ugly it will be.

Definitely not sth to rush for this stage1 ...

> Attached patch implements set 1.  I use walk_stmt_load_store_ops to
> discover memor operands by storing them to pointer set. This has the
> expense of extra hashing.  An alternative would be to embed the logic
> into icf walkers, but it seems to be fast enough and I do not see
> that as a good motivation to duplicate the logic into comparsion and
> hashers.
> 
> I have bootstrapped/regtstested on x86_64 and would like to commit it
> after some discussion tomorrow.
> 
> Honza
> 
> 
> 2020-11-08  Jan Hubicka  
> 
>   * ipa-icf-gimple.c: Include gimple-walk.h.
>   (func_checker::compare_ssa_name): Update call of compare_operand.
>   (func_checker::hash_operand): Fix comment and add variant taking
>   operand_access_type parameter.
>   (func_checker::compare_operand): Add operand_access_type parameter.
>   (func_checker::compare_asm_inputs_outputs): Add
>   operand_access_type_map parameter; update use of
>   func_checker::compare_operand.
>   (func_checker::compare_gimple_call): Update use of
>   func_checker::compare_operand.
>   (func_checker::compare_gimple_assign): Likewise.
>   (func_checker::compare_gimple_cond): Likewise.
>   (func_checker::compare_gimple_switch): Likewise.
>   (func_checker::compare_gimple_return): Likewise.
>   (func_checker::compare_gimple_goto): Likewise.
>   (func_checker::compare_gimple_asm): Likewise.
>   (visit_load_store): New static functio.
>   (func_checker::classify_operands): New member function.
>   (func_checker::get_operand_access_type): New member function.
>   * ipa-icf-gimple.h (func_checker::operand_access_type): New enum
>   (func_checker::operand_access_type_map): New typedef.
>   (func_checker::compare_operand): Update prototype.
>   (func_checker::compare_asm_inputs_outputs): Likewise.
>   (func_checker::cleassify_operands): Declare.
>   (func_checker::get_operand_access_type): Declare.
>   (func_checker::hash_operand): New variant with operand_access_type.
>   * ipa-icf.c (sem_function::hash_stmt): Update uses of hash_operand.
>   (sem_function::comp

Re: [PATCH] c, c++: Fix up -Wunused-value on COMPLEX_EXPRs [PR97748]

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 09, 2020 at 10:34:00PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Nov 09, 2020 at 02:35:58PM -0500, Jason Merrill wrote:
> > How about calling warn_if_unused_value instead of the new
> > warn_if_unused_value_p?
> 
> That seems to work if I just replace the warning_at call with
> warn_if_unused_value call (at least no regression in check-c++-all and
> libstdc++ testsuite).
> Initially I've tried just calling warn_if_unused_value without the NOP_EXPR
> stripping and code/tclass checks, but that regressed a few tests, e.g.
> g++.dg/warn/Wunused-14.C or c-c++-common/Wunused-var-9.c or 3 lines
> in the new test, e.g. because STATEMENT_LIST or CLEANUP_POINT_EXPRs would
> make it through and resulted in bogus warnings.

Bootstrapped/regtested on x86_64-linux and i686-linux successfully now.
If we wanted to simplify, I think the && !(code == ) part could be
dropped too, i.e. just
  if (tclass == tcc_comparison
  || tclass == tcc_unary
  || tclass == tcc_binary
  || code == VEC_PERM_EXPR
  || code == VEC_COND_EXPR)
warn_if_unused_value (e, loc);
because warn_if_unused_value already returns false on those codes.

> +   enum tree_code code = TREE_CODE (e);
> +   enum tree_code_class tclass = TREE_CODE_CLASS (code);
> +   if (tclass == tcc_comparison
> +   || tclass == tcc_unary
> +   || (tclass == tcc_binary
> +   && !(code == MODIFY_EXPR
> +|| code == INIT_EXPR
> +|| code == PREDECREMENT_EXPR
> +|| code == PREINCREMENT_EXPR
> +|| code == POSTDECREMENT_EXPR
> +|| code == POSTINCREMENT_EXPR))
> +   || code == VEC_PERM_EXPR
> +   || code == VEC_COND_EXPR)
> + warn_if_unused_value (e, loc);
>   }
>   }
>expr = build1 (CONVERT_EXPR, void_type_node, expr);

Jakub



[PATCH] sccvn: Fix up push_partial_def little-endian bitfield handling [PR97764]

2020-11-10 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch fixes a thinko in the left-endian push_partial_def path.
As the testcase shows, we have 3 bitfields in the struct,
bitoff  bitsize
0   3
3   28
31  1
the corresponding read is the byte at offset 3 (i.e. 24 bits)
and push_partial_def first handles the full store ({}) to all bits
and then is processing the store to the middle bitfield with value of -1.
Here are the interesting spots:
  pd.offset -= offseti;
this adjusts the pd to { -21, 28 }, the (for little-endian lowest) 21
bits aren't interesting to us, we only care about the upper 7.
  len = native_encode_expr (pd.rhs, this_buffer, bufsize,
MAX (0, -pd.offset) / BITS_PER_UNIT);
native_encode_expr has the offset parameter in bytes and we tell it
that we aren't interested in the first (lowest) two bytes of the number.
It encodes 0xff, 0xff with len == 2 then.
  HOST_WIDE_INT size = pd.size;
  if (pd.offset < 0)
size -= ROUND_DOWN (-pd.offset, BITS_PER_UNIT);
we get 28 - 16, i.e. 12 - the 16 is subtracting those 2 bytes that we
omitted in native_encode_expr.
  size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
needed_len is how many bytes the read at most needs, and that is 1,
so we get size 8 and copy all 8 bits (i.e. a single byte plus nothing)
from the native_encode_expr filled this_buffer; this incorrectly sets
the byte to 0xff when we want 0x7f.  The above line is correct for the
pd.offset >= 0 case when we don't skip anything, but for the pd.offset < 0
case we need to subtract also the remainder of the bits we aren't interested
in (the code shifts the bytes by that number of bits).
If it weren't for the big-endian path, we could as well do
  if (pd.offset < 0)
size += pd.offset;
but the big-endian path needs it differently.
With the following patch, amnt is 3 and we subtract from 12 the (8 - 3)
bits and thus get the 7 which is the value we want.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/10?

2020-11-10  Jakub Jelinek  

PR tree-optimization/97764
* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): For
little-endian stores with negative pd.offset, subtract
BITS_PER_UNIT - amnt from size if amnt is non-zero.

* gcc.c-torture/execute/pr97764.c: New test.

--- gcc/tree-ssa-sccvn.c.jj 2020-11-06 20:31:34.298252809 +0100
+++ gcc/tree-ssa-sccvn.c2020-11-09 14:45:00.454445644 +0100
@@ -2039,12 +2039,12 @@ vn_walk_cb_data::push_partial_def (pd_da
}
   else
{
- size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
  if (pd.offset >= 0)
{
  /* LSB of this_buffer[0] byte should be at pd.offset bits
 in buffer.  */
  unsigned int msk;
+ size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
  amnt = pd.offset % BITS_PER_UNIT;
  if (amnt)
shift_bytes_in_array_left (this_buffer, len + 1, amnt);
@@ -2074,6 +2074,9 @@ vn_walk_cb_data::push_partial_def (pd_da
{
  amnt = (unsigned HOST_WIDE_INT) pd.offset % BITS_PER_UNIT;
  if (amnt)
+   size -= BITS_PER_UNIT - amnt;
+ size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
+ if (amnt)
shift_bytes_in_array_left (this_buffer, len + 1, amnt);
}
  memcpy (p, this_buffer + (amnt != 0), size / BITS_PER_UNIT);
--- gcc/testsuite/gcc.c-torture/execute/pr97764.c.jj2020-11-09 
14:36:20.974258322 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr97764.c   2020-11-09 
14:36:56.108865088 +0100
@@ -0,0 +1,14 @@
+/* PR tree-optimization/97764 */
+/* { dg-require-effective-target int32plus } */
+
+struct S { int b : 3; int c : 28; int d : 1; };
+
+int
+main ()
+{
+  struct S e = {};
+  e.c = -1;
+  if (e.d)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: Detect EAF flags in ipa-modref

2020-11-10 Thread Jan Hubicka
> 
> Alias oracle query stats:
>   refs_may_alias_p: 65808750 disambiguations, 75664890 queries
>   ref_maybe_used_by_call_p: 153485 disambiguations, 66711204 queries
>   call_may_clobber_ref_p: 22816 disambiguations, 28889 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 36846 queries
>   nonoverlapping_refs_since_match_p: 27271 disambiguations, 58917 must 
> overlaps, 86958 queries
>   aliasing_component_refs_p: 65808 disambiguations, 2067256 queries
>   TBAA oracle: 25929211 disambiguations 60395141 queries
>12391384 are in alias set 0
>10783783 queries asked about the same object
>126 queries asked about the same alias set
>0 access volatile
>9598698 are dependent in the DAG
>1691939 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 14284 disambiguations, 53336 queries
>   modref clobber: 1660281 disambiguations, 2130440 queries
>   4311165 tbaa queries (2.023603 per modref query)
>   685304 base compares (0.321673 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 959190 disambiguations, 13169678 queries
>   pt_solutions_intersect: 1050969 disambiguations, 13246686 queries

I re-run the mainline withtout EAF flag propagation and the results
are correct, so it is not due to independent change pushed between my
tests.

Honza


[PATCH] tree-optimization/97760 - reduction paths with unhandled live stmt

2020-11-10 Thread Richard Biener
This makes sure we reject reduction paths with a live stmt that
is not the last one altering the value.  This is because we do not
handle this in the epilogue unless there's a scalar epilogue loop.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-09  Richard Biener  

PR tree-optimization/97760
* tree-vect-loop.c (check_reduction_path): Reject
reduction paths we do not handle in epilogue generation.

* gcc.dg/vect/pr97760.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr97760.c | 26 ++
 gcc/tree-vect-loop.c|  9 ++---
 2 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97760.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr97760.c 
b/gcc/testsuite/gcc.dg/vect/pr97760.c
new file mode 100644
index 000..da5ac937a43
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97760.c
@@ -0,0 +1,26 @@
+#include "tree-vect.h"
+
+int b=1;
+static int *g = &b;
+
+void __attribute__((noipa))
+h (unsigned int n)
+{
+  int i = 3;
+  int f = 3;
+  for (; f <= 50; f += 4) {
+i += 4;
+*g = i;
+i += n;
+  }
+}
+
+int main ()
+{
+  check_vect ();
+
+  h (9);
+  if (*g != 150 || b != 150)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 977633a3ce3..39b7319e825 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3326,14 +3326,17 @@ pop:
  fail = true;
  break;
}
-  /* Check there's only a single stmt the op is used on inside
- of the loop.  */
+  /* Check there's only a single stmt the op is used on.  For the
+not value-changing tail and the last stmt allow out-of-loop uses.
+???  We could relax this and handle arbitrary live stmts by
+forcing a scalar epilogue for example.  */
   imm_use_iterator imm_iter;
   gimple *op_use_stmt;
   unsigned cnt = 0;
   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op)
if (!is_gimple_debug (op_use_stmt)
-   && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt)))
+   && (*code != ERROR_MARK
+   || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
  {
/* We want to allow x + x but not x < 1 ? x : 2.  */
if (is_gimple_assign (op_use_stmt)
-- 
2.26.2


[PATCH] fortran: Fix up gfc_typename CHARACTER length handling [PR97768]

2020-11-10 Thread Jakub Jelinek via Gcc-patches
Hi!

The first testcase below ICEs when f951 is 32-bit (or 64-bit big-endian).
The problem is that ex->ts.u.cl && ex->ts.u.cl->length are both non-NULL,
but ex->ts.u.cl->length->expr_type is not EXPR_CONSTANT, but EXPR_FUNCTION.
value.function.actual and value.function.name are in that case pointers,
but value._mp_alloc and value._mp_size are 4 byte integers no matter what.
So, in 64-bit little-endian the function returns most of the time incorrect
CHARACTER(0) because the most significant 32 bits of the
value.function.actual pointer are likely 0.
Anyway, the following patch is an attempt to get all the cases right.
Uses ex->value.character.length only for ex->expr_type == EXPR_CONSTANT
(i.e. CHARACTER literals), handles the deferred lengths, assumed lengths,
known constant lengths and finally if the length is something other,
just doesn't print it, i.e. prints just CHARACTER (for default kind)
or CHARACTER(KIND=4) (for e.g. kind 4).

The earlier version of the patch has been successfully bootstrapped
and regtested (only with a few regressions) on x86_64-linux and i686-linux,
this version passed all the new and earlier problematic tests, ok for trunk
if it passes another bootstrap/regtest?

2020-11-10  Jakub Jelinek  

PR fortran/97768
* misc.c (gfc_typename): Use ex->value.character.length only if
ex->expr_type == EXPR_CONSTANT.  If ex->ts.deferred, print : instead
of length.  If ex->ts.u.cl && ex->ts.u.cl->length == NULL, print
* instead of length.  Otherwise if character length is non-constant,
print just CHARACTER or CHARACTER(KIND=N).

* gfortran.dg/pr97768_1.f90: New test.
* gfortran.dg/pr97768_2.f90: New test.

--- gcc/fortran/misc.c.jj   2020-11-09 23:01:02.978826528 +0100
+++ gcc/fortran/misc.c  2020-11-10 10:41:22.087850720 +0100
@@ -224,10 +224,32 @@ gfc_typename (gfc_expr *ex)
 
   if (ex->ts.type == BT_CHARACTER)
 {
-  if (ex->ts.u.cl && ex->ts.u.cl->length)
-   length = gfc_mpz_get_hwi (ex->ts.u.cl->length->value.integer);
-  else
+  if (ex->expr_type == EXPR_CONSTANT)
length = ex->value.character.length;
+  else if (ex->ts.deferred)
+   {
+ if (ex->ts.kind == gfc_default_character_kind)
+   return "CHARACTER(:)";
+ sprintf (buffer, "CHARACTER(:,%d)", ex->ts.kind);
+ return buffer;
+   }
+  else if (ex->ts.u.cl && ex->ts.u.cl->length == NULL)
+   {
+ if (ex->ts.kind == gfc_default_character_kind)
+   return "CHARACTER(*)";
+ sprintf (buffer, "CHARACTER(*,%d)", ex->ts.kind);
+ return buffer;
+   }
+  else if (ex->ts.u.cl == NULL
+  || ex->ts.u.cl->length->expr_type != EXPR_CONSTANT)
+   {
+ if (ex->ts.kind == gfc_default_character_kind)
+   return "CHARACTER";
+ sprintf (buffer, "CHARACTER(KIND=%d)", ex->ts.kind);
+ return buffer;
+   }
+  else
+   length = gfc_mpz_get_hwi (ex->ts.u.cl->length->value.integer);
   if (ex->ts.kind == gfc_default_character_kind)
sprintf (buffer, "CHARACTER(" HOST_WIDE_INT_PRINT_DEC ")", length);
   else
--- gcc/testsuite/gfortran.dg/pr97768_1.f90.jj  2020-11-10 10:22:26.053445061 
+0100
+++ gcc/testsuite/gfortran.dg/pr97768_1.f90 2020-11-10 10:22:26.053445061 
+0100
@@ -0,0 +1,25 @@
+! PR fortran/97768
+! { dg-do compile }
+
+module pr97768_1
+  interface operator(.in.)
+module procedure substr_in_str
+  end interface
+contains
+  pure function to_upper (in_str) result (string)
+character(len=*), intent(in) :: in_str
+character(len=len(in_str)) :: string
+string = in_str
+  end function to_upper
+  logical pure function substr_in_str (substring, string)
+character(len=*), intent(in) :: string, substring
+substr_in_str=.false.
+  end function
+end module
+function foo ()
+  use pr97768_1, only : to_upper, operator(.in.)
+  logical :: foo
+  character(len=8) :: str
+  str = 'abcde'
+  foo = 'b' .in. to_upper (str)
+end function foo
--- gcc/testsuite/gfortran.dg/pr97768_2.f90.jj  2020-11-10 10:22:26.053445061 
+0100
+++ gcc/testsuite/gfortran.dg/pr97768_2.f90 2020-11-10 10:46:15.104602757 
+0100
@@ -0,0 +1,53 @@
+! PR fortran/97768
+! { dg-do compile }
+
+module pr97768_2
+  interface operator(.in.)
+module procedure substr_in_str
+  end interface
+contains
+  pure function to_upper (in_str) result (string)
+character(len=*), intent(in) :: in_str
+character(len=len(in_str)) :: string
+string = in_str
+  end function to_upper
+  logical pure function substr_in_str (substring, string)
+character(len=*), intent(in) :: string, substring
+substr_in_str=.false.
+  end function
+end module
+function foo ()
+  use pr97768_2, only : to_upper, operator(.in.)
+  logical :: foo
+  character(len=8) :: str
+  str = 'abcde'
+  foo = to_upper (str) .in. 32! { dg-error "are CHARACTER/INTEGER" }
+end function foo
+function bar (str)
+  use pr97768_2, only : operator

Re: [PATCH] sccvn: Fix up push_partial_def little-endian bitfield handling [PR97764]

2020-11-10 Thread Richard Biener
On Tue, 10 Nov 2020, Jakub Jelinek wrote:

> Hi!
> 
> This patch fixes a thinko in the left-endian push_partial_def path.
> As the testcase shows, we have 3 bitfields in the struct,
> bitoffbitsize
> 0 3
> 3 28
> 311
> the corresponding read is the byte at offset 3 (i.e. 24 bits)
> and push_partial_def first handles the full store ({}) to all bits
> and then is processing the store to the middle bitfield with value of -1.
> Here are the interesting spots:
>   pd.offset -= offseti;
> this adjusts the pd to { -21, 28 }, the (for little-endian lowest) 21
> bits aren't interesting to us, we only care about the upper 7.
>   len = native_encode_expr (pd.rhs, this_buffer, bufsize,
> MAX (0, -pd.offset) / BITS_PER_UNIT);
> native_encode_expr has the offset parameter in bytes and we tell it
> that we aren't interested in the first (lowest) two bytes of the number.
> It encodes 0xff, 0xff with len == 2 then.
>   HOST_WIDE_INT size = pd.size;
>   if (pd.offset < 0)
> size -= ROUND_DOWN (-pd.offset, BITS_PER_UNIT);
> we get 28 - 16, i.e. 12 - the 16 is subtracting those 2 bytes that we
> omitted in native_encode_expr.
>   size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
> needed_len is how many bytes the read at most needs, and that is 1,
> so we get size 8 and copy all 8 bits (i.e. a single byte plus nothing)
> from the native_encode_expr filled this_buffer; this incorrectly sets
> the byte to 0xff when we want 0x7f.  The above line is correct for the
> pd.offset >= 0 case when we don't skip anything, but for the pd.offset < 0
> case we need to subtract also the remainder of the bits we aren't interested
> in (the code shifts the bytes by that number of bits).
> If it weren't for the big-endian path, we could as well do
>   if (pd.offset < 0)
>   size += pd.offset;
> but the big-endian path needs it differently.
> With the following patch, amnt is 3 and we subtract from 12 the (8 - 3)
> bits and thus get the 7 which is the value we want.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/10?

OK.

Thanks,
Richard.

> 2020-11-10  Jakub Jelinek  
> 
>   PR tree-optimization/97764
>   * tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): For
>   little-endian stores with negative pd.offset, subtract
>   BITS_PER_UNIT - amnt from size if amnt is non-zero.
> 
>   * gcc.c-torture/execute/pr97764.c: New test.
> 
> --- gcc/tree-ssa-sccvn.c.jj   2020-11-06 20:31:34.298252809 +0100
> +++ gcc/tree-ssa-sccvn.c  2020-11-09 14:45:00.454445644 +0100
> @@ -2039,12 +2039,12 @@ vn_walk_cb_data::push_partial_def (pd_da
>   }
>else
>   {
> -   size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
> if (pd.offset >= 0)
>   {
> /* LSB of this_buffer[0] byte should be at pd.offset bits
>in buffer.  */
> unsigned int msk;
> +   size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
> amnt = pd.offset % BITS_PER_UNIT;
> if (amnt)
>   shift_bytes_in_array_left (this_buffer, len + 1, amnt);
> @@ -2074,6 +2074,9 @@ vn_walk_cb_data::push_partial_def (pd_da
>   {
> amnt = (unsigned HOST_WIDE_INT) pd.offset % BITS_PER_UNIT;
> if (amnt)
> + size -= BITS_PER_UNIT - amnt;
> +   size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
> +   if (amnt)
>   shift_bytes_in_array_left (this_buffer, len + 1, amnt);
>   }
> memcpy (p, this_buffer + (amnt != 0), size / BITS_PER_UNIT);
> --- gcc/testsuite/gcc.c-torture/execute/pr97764.c.jj  2020-11-09 
> 14:36:20.974258322 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr97764.c 2020-11-09 
> 14:36:56.108865088 +0100
> @@ -0,0 +1,14 @@
> +/* PR tree-optimization/97764 */
> +/* { dg-require-effective-target int32plus } */
> +
> +struct S { int b : 3; int c : 28; int d : 1; };
> +
> +int
> +main ()
> +{
> +  struct S e = {};
> +  e.c = -1;
> +  if (e.d)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [Patch] x86: Enable GCC support for Intel AVX-VNNI extension

2020-11-10 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 10, 2020 at 9:08 AM Hongtao Liu  wrote:
>
> On Mon, Nov 9, 2020 at 8:26 PM Uros Bizjak  wrote:
> >
> > On Mon, Nov 9, 2020 at 11:31 AM Hongtao Liu  wrote:
> > >
> > > >
> > > > +  /* Support unified builtin.  */
> > > > +  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
> > > >
> > > > I don't think we gain anything with unified builtins. Better, just
> > > > introduce separate builtins, e.g for
> > > >
> > >
> > > Unified builtins are used for unified intrinsics, intrinsics users may 
> > > prefer
> > > same interface and let compiler decide encoding version. Separate
> > > buitins may cause
> > > some defination ambiguous when target attribute is used, see avx-vnni-2.c.
> > > We also provide separate intrinsics interface for compatibility with
> > > different compilers(llvm/msvc/icc).
> >
> > Hm, the new intrinsics file introduces:
> >
> > +#ifdef __AVXVNNI__
> > +#define _mm256_dpbusd_avx_epi32(A, B, C) \
> > +  _mm256_dpbusd_epi32((A), (B), (C))
> > ...
> > +#endif /* __AVXVNNI__ */
> > +
> > +#define _mm256_dpbusd_epi32(A, B, C)\
> > +  ((__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) (A),\
> > +   (__v8si) (B),\
> > +   (__v8si) (C)))
> > +
> >
> > And there are two versions of intrinsics:
> >
> > _mm256_dpbusd_avx_epi32
> > _mm256_dpbusd_epi32
> >
> > So, is _mm256_dpusb_epi32 active for either
> >
> > OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL
> >
> > or
> >
> > OPTION_MASK_ISA2_AVXVNNI ?
> >
> Yes.
> > Is _mm265_dpbusb_avx_epi32 the "compatibility intrinsics"?
> >
> > In case the above is correct, please expand the comment
> >
> > +  /* Support unified builtin.  */
> > +  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
> >
> > with the above information, what kind of unified builtin is this.
> >
>
> To explain this, I want to introduce some background first. the first
> version of avx-vnni intrinsics are like below(same as you mentioned in
> the former email)
> extern __inline __m256i
> __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> _mm256_dpwssds_avx_epi32(__m256i __A,__m256i __B,__m256i __C)
> {
>   return (__m256i) __builtin_ia32_vpdpwssds_avx_v8si ((__v8si) __A,
>   (__v8si) __B,
>   (__v8si) __C);
> }
>
> The implementation is normal as before. Then some intrinsics users ask
> us to provide an unified intrinsics interface for the same
> instruction. The request is reasonable, but the problem is avx-vnni is
> independent of avx512-vnni, which means avx512-vnni doesn't imply
> avx-vnni. It causes some difficulties in implementation.
> If we only change patterns in sse.md(to generate vex encoding when
> avx-vnni exists), it would fail gcc.target/i386/avx-vnni-2.c since
> _mm_dpwssds_epi32 is define with
> target_attribute("avx512vl,avx512vnni").
> if we remove target_attribute and still define _mm_dpwssds_epi32 as
> "extern __inline" function, gcc would fail to build since
> __builtin_ia32_vpdpwssds_v4si request target avx512vl and avx512vnni.
> Finally we decided to use macros. To do this, we need to provide
> unified builtins, or else there would be some issue in mapping
> intrinsics to builtins. i.e: we can't do something like below, it
> would still fail gcc.target/i386/avx-vnni-2.c.
>
> #if defined(__AVX512VNNI__) && defined(__AVX512VL__)
> define _mm_dpwssds_epi32 (A, B, C)\
>  ((__m128i) __builtin_ia32_vpdpwssds_v4si (A,B,C))
> #endif
>
> #ifdef __AVXVNNI__
> define _mm_dpwssds_epi32 (A, B, C)\
>  ((__m128i) __builtin_ia32_vpdpwssds_avx_v4si (A,B,C))
> #endif
>
> Also we hack some code to support unified builtin.
>
> +  if bisa & (OPTION_MASK_ISA_AVX512VNNI |
> OPTION_MASK_ISA_AVX512VL))
> +== (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL))
> +   || (bisa2 & OPTION_MASK_ISA2_AVXVNNI) != 0)
> +  && (((isa & (OPTION_MASK_ISA_AVX512VNNI |
> OPTION_MASK_ISA_AVX512VL))
> +   == (OPTION_MASK_ISA_AVX512VNNI |
> OPTION_MASK_ISA_AVX512VL))
> +  || (isa2 & OPTION_MASK_ISA2_AVXVNNI) != 0))
> +{
> +  isa |= OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL;
> +  isa2 |= OPTION_MASK_ISA2_AVXVNNI;
> +}
> +
>
> This for
>
> > OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL
> >
> > or
> >
> > OPTION_MASK_ISA2_AVXVNNI ?
> >
>
> And this
>
> +  /* Support unified builtin.  */
> +  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
>
> for gcc/testsuite/lib/target-supports.exp where only builtin is
> available to check toolchain existence(we may use inline assembly
> instead, but it expose an issue),  def_builtin should define
> __builtin_ia32_vpdpwssds_v4si  whenever avx512-vnni && avx512vl or
> avx-vnni exist, see gcc/testsuite/gcc.target/i386/avxvnni-builtin.c
>
> > Please also note that #defines won't be tested in e.g. sse-13.c, where:
> >
> > --q--
> >   Defining away "extern" and "__inline" results in all of them being
> >   co

Re: Detect EAF flags in ipa-modref

2020-11-10 Thread Jan Hubicka
> Bootstrapped/regtested x86_64-linux, OK?
> (I will analyze more the t2.c failure)

I have found independent reproducer that is now in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97775

Honza


[PATCH] Use extremes of underlying type when building VR_ANTI_RANGEs.

2020-11-10 Thread Aldy Hernandez via Gcc-patches
[Andrew, this doesn't fix a PR per se.  It's just something I stumbled
upon in my other -fstrict-enums fixes.  It passes tests.  What do you
think?  Should we push it?]

Imagine an enum in an -fstrict-enums world:

enum foo { zero, one, two, three };

Building the inverse of [0,3] currently yields VARYING (instead of
[4,+INF]).  This is because the setter sees 0 and 3 as the extremes of
the type, and per the current code, calculates the inverse to VARYING.
BTW, it really should be UNDEFINED, but this is legacy code I'm afraid
to touch:

  if (is_min && is_max)
{
  /* We cannot deal with empty ranges, drop to varying.
 ???  This could be VR_UNDEFINED instead.  */
  set_varying (type);
  return;
}

However, the extremes of a VARYING are the extremes of the underlying
type ([0, 0xff..ff]), even in the presence of -fstrict-enums.

This fixes the setter to use the extremes of the underlying type.

gcc/ChangeLog:

* value-range.cc (irange::set): Use wi::{min,max}_value
instead of TYPE_{MIN,MAX}_VALUE when building anti-ranges.
(range_tests_strict_enum): New tests.
---
 gcc/value-range.cc | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index f83a824a982..bbd7d69a766 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -274,11 +274,13 @@ irange::set (tree min, tree max, value_range_kind kind)
   // Anti-ranges that can be represented as ranges should be so.
   if (kind == VR_ANTI_RANGE)
 {
-  /* For -fstrict-enums we may receive out-of-range ranges so consider
- values < -INF and values > INF as -INF/INF as well.  */
-  bool is_min = vrp_val_is_min (min);
-  bool is_max = vrp_val_is_max (max);
   tree type = TREE_TYPE (min);
+  unsigned prec = TYPE_PRECISION (type);
+  signop sign = TYPE_SIGN (type);
+  wide_int type_max = wi::max_value (prec, sign);
+  wide_int type_min = wi::min_value (prec, sign);
+  bool is_min = wi::to_wide (min) == type_min;
+  bool is_max = wi::to_wide (max) == type_max;
 
   if (is_min && is_max)
{
@@ -302,14 +304,14 @@ irange::set (tree min, tree max, value_range_kind kind)
 {
  tree one = build_int_cst (TREE_TYPE (max), 1);
  min = int_const_binop (PLUS_EXPR, max, one);
- max = vrp_val_max (TREE_TYPE (max));
+ max = wide_int_to_tree (type, type_max);
  kind = VR_RANGE;
 }
   else if (is_max)
 {
  tree one = build_int_cst (TREE_TYPE (min), 1);
  max = int_const_binop (MINUS_EXPR, min, one);
- min = vrp_val_min (TREE_TYPE (min));
+ min = wide_int_to_tree (type, type_min);
  kind = VR_RANGE;
 }
 }
@@ -2276,6 +2278,19 @@ range_tests_strict_enum ()
   ir1 = vr1;
   ASSERT_TRUE (ir1 == vr1);
   ASSERT_FALSE (ir1.varying_p ());
+
+  // Legacy inversion of [0, 3] should be [4, MAX].
+  vr1 = int_range<1> (build_int_cstu (rtype, 0), build_int_cstu (rtype, 3));
+  vr1.invert ();
+  tree type_max = wide_int_to_tree (rtype,
+   wi::max_value (TYPE_PRECISION (rtype),
+  TYPE_SIGN (rtype)));
+  ASSERT_TRUE (vr1 == int_range<1> (build_int_cstu (rtype, 4), type_max));
+
+  // Multi-range inversion of [0, 3] should be [4, MAX].
+  ir1 = int_range<2> (build_int_cstu (rtype, 0), build_int_cstu (rtype, 3));
+  ir1.invert ();
+  ASSERT_TRUE (vr1 == int_range<2> (build_int_cstu (rtype, 4), type_max));
 }
 
 static void
-- 
2.26.2



[committed] doc: Fix grammar in description of earlyclobber

2020-11-10 Thread Alex Coplan via Gcc-patches
Committing this as obvious.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b462869437..813875b973b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1634,7 +1634,7 @@ constraints with multiple alternatives, sometimes one 
alternative
 requires @samp{&} while others do not.  See, for example, the
 @samp{movdf} insn of the 68000.
 
-A operand which is read by the instruction can be tied to an earlyclobber
+An operand which is read by the instruction can be tied to an earlyclobber
 operand if its only use as an input occurs before the early result is
 written.  Adding alternatives of this form often allows GCC to produce
 better code when only some of the read operands can be affected by the

---

gcc/ChangeLog:

* doc/md.texi (Modifiers): Fix grammar in description of
earlyclobber constraint modifier.


Re: Detect EAF flags in ipa-modref

2020-11-10 Thread Jan Hubicka
> > Bootstrapped/regtested x86_64-linux, OK?
> > (I will analyze more the t2.c failure)
> 
> I have found independent reproducer that is now in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97775
... and with Jakub's fix the testcase works now.
Honza
> 
> Honza


Re: Detect EAF flags in ipa-modref

2020-11-10 Thread Richard Biener
On Tue, 10 Nov 2020, Jan Hubicka wrote:

> Hi,
> this is updated patch for autodetection of EAF flags.  Still the goal is
> to avoid fancy stuff and get besic logic in place (so no dataflow, no IPA
> propagation, no attempts to handle trickier cases).  There is one new failure
> 
> ./gcc/testsuite/gcc/gcc.sum:FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order 
> -O1 -fno-inline  output pattern test
> ./gcc/testsuite/gcc/gcc.sum:FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order 
> -O2  output pattern test
> ./gcc/testsuite/gcc/gcc.sum:FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order 
> -Os  output pattern test
> 
> Which I blieve is bug exposed by detecting dump function to be EAF_DIRECT and
> NOESCAPE (which it is) and packing/updacking the bitfields leads in one bit
> difference. Still no idea why.
> 
> Patch seems to be quite effective on cc1plus turning:
> 
> Alias oracle query stats:
>   refs_may_alias_p: 65808750 disambiguations, 75664890 queries
>   ref_maybe_used_by_call_p: 153485 disambiguations, 66711204 queries
>   call_may_clobber_ref_p: 22816 disambiguations, 28889 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 36846 queries
>   nonoverlapping_refs_since_match_p: 27271 disambiguations, 58917 must 
> overlaps, 86958 queries
>   aliasing_component_refs_p: 65808 disambiguations, 2067256 queries
>   TBAA oracle: 25929211 disambiguations 60395141 queries
>12391384 are in alias set 0
>10783783 queries asked about the same object
>126 queries asked about the same alias set
>0 access volatile
>9598698 are dependent in the DAG
>1691939 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 14284 disambiguations, 53336 queries
>   modref clobber: 1660281 disambiguations, 2130440 queries
>   4311165 tbaa queries (2.023603 per modref query)
>   685304 base compares (0.321673 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 959190 disambiguations, 13169678 queries
>   pt_solutions_intersect: 1050969 disambiguations, 13246686 queries
> 
> Alias oracle query stats:
>   refs_may_alias_p: 66914578 disambiguations, 76692648 queries
>   ref_maybe_used_by_call_p: 244077 disambiguations, 67732086 queries
>   call_may_clobber_ref_p: 111475 disambiguations, 116613 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 37091 queries
>   nonoverlapping_refs_since_match_p: 27267 disambiguations, 59019 must 
> overlaps, 87056 queries
>   aliasing_component_refs_p: 65870 disambiguations, 2063394 queries
>   TBAA oracle: 26024415 disambiguations 60579490 queries
>12450910 are in alias set 0
>10806673 queries asked about the same object
>125 queries asked about the same alias set
>0 access volatile
>9605837 are dependent in the DAG
>1691530 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 14272 disambiguations, 277680 queries
>   modref clobber: 1669753 disambiguations, 7849135 queries
>   4330162 tbaa queries (0.551674 per modref query)
>   699241 base compares (0.089085 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 1833920 disambiguations, 13846032 queries
>   pt_solutions_intersect: 1093785 disambiguations, 13309954 queries
> 
> So almost twice as many pt_solution_includes disambiguations.
> I will re-run the stats overnight to be sure that it is not independent
> change (but both build was from almost same checkout).
> 
> Bootstrapped/regtested x86_64-linux, OK?
> (I will analyze more the t2.c failure)
> 
> Honza
> 
> gcc/ChangeLog:
> 
> 2020-11-10  Jan Hubicka  
> 
>   * gimple.c: Include ipa-modref-tree.h and ipa-modref.h.
>   (gimple_call_arg_flags): Use modref to determine flags.
>   * ipa-modref.c: Include gimple-ssa.h, tree-phinodes.h,
>   tree-ssa-operands.h, stringpool.h and tree-ssanames.h.
>   (analyze_ssa_name_flags): Declare.
>   (modref_summary::useful_p): Summary is also useful if arg flags are
>   known.
>   (dump_eaf_flags): New function.
>   (modref_summary::dump): Use it.
>   (get_modref_function_summary): Be read for current_function_decl
>   being NULL.
>   (memory_access_to): New function.
>   (deref_flags): New function.
>   (call_lhs_flags): New function.
>   (analyze_parms): New function.
>   (analyze_function): Use it.
>   * ipa-modref.h (struct modref_summary): Add arg_flags.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-11-10  Jan Hubicka  
> 
>   * gcc.dg/torture/pta-ptrarith-1.c: Escape parametrs.
> 
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 1afed88e1f1..da90716aa23 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "asan.h"
>  #include "langhooks.h"
>  #include "attr-fnspec.h"
> +#include "ipa-modref-tree.h"
> +#include "ipa-modref.h"
>  

Re: [PATCH][AArch64] Skip arm targets in vq*shr*n_high_n intrinsic tests

2020-11-10 Thread Richard Sandiford via Gcc-patches
David Candler  writes:
> Hi,
>
> These tests should be skipped for arm targets as the instrinsics
> are only supported on aarch64.
>
> Tested on aarch64 and aarch32
>
> gcc/testsuite/ChangeLog
>
> 2020-11-09  David Candler  
>
> * gcc.target/aarch64/advsimd-intrinsics/vqrshrn_high_n.c: Added skip 
> directive.
> * gcc.target/aarch64/advsimd-intrinsics/vqrshrun_high_n.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vqshrn_high_n.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vqshrun_high_n.c: Likewise.

Thanks, pushed to trunk.

Richard

>
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_high_n.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_high_n.c
> index d9add2908d1..6ebe0743cc4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_high_n.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_high_n.c
> @@ -1,3 +1,6 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { arm*-*-* } } */
> +
>  #include 
>  #include "arm-neon-ref.h"
>  #include "compute-ref-data.h"
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_high_n.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_high_n.c
> index 1a3788cd14a..49d319d0181 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_high_n.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_high_n.c
> @@ -1,3 +1,6 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { arm*-*-* } } */
> +
>  #include 
>  #include "arm-neon-ref.h"
>  #include "compute-ref-data.h"
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrn_high_n.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrn_high_n.c
> index 72aecc15ba2..8d06f113dc8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrn_high_n.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrn_high_n.c
> @@ -1,3 +1,6 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { arm*-*-* } } */
> +
>  #include 
>  #include "arm-neon-ref.h"
>  #include "compute-ref-data.h"
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrun_high_n.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrun_high_n.c
> index 4885c029d1a..e8235fe9693 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrun_high_n.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrun_high_n.c
> @@ -1,3 +1,6 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { arm*-*-* } } */
> +
>  #include 
>  #include "arm-neon-ref.h"
>  #include "compute-ref-data.h"


[PATCH] More PRE TLC

2020-11-10 Thread Richard Biener
This makes get_expr_value_id cheap and completes the
constant value-id simplification by turning the constant_value_expressions
into a direct map instead of a set of pre_exprs for the value.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-10  Richard Biener  

* tree-ssa-pre.c (pre_expr_d::value_id): Add.
(constant_value_expressions): Turn into an array of pre_expr.
(get_or_alloc_expr_for_nary): New function.
(get_or_alloc_expr_for_reference): Likewise.
(add_to_value): For constant values only ever add a single
CONSTANT.
(get_expr_value_id): Return the new value_id member.
(vn_valnum_from_value_id): Split out and simplify constant
value id handling.
(get_or_alloc_expr_for_constant): Set the value_id member.
(phi_translate_1): Use get_or_alloc_expr_for_*.
(compute_avail): Likewise.
(bitmap_find_leader): Simplify constant value id handling.
---
 gcc/tree-ssa-pre.c | 183 ++---
 1 file changed, 89 insertions(+), 94 deletions(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index fec3b2f80f1..160f3b4593a 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -256,6 +256,7 @@ typedef struct pre_expr_d : nofree_ptr_hash 
 {
   enum pre_expr_kind kind;
   unsigned int id;
+  unsigned value_id;
   location_t loc;
   pre_expr_union u;
 
@@ -422,11 +423,65 @@ get_or_alloc_expr_for_name (tree name)
   result = pre_expr_pool.allocate ();
   result->kind = NAME;
   result->loc = UNKNOWN_LOCATION;
+  result->value_id = VN_INFO (name)->value_id;
   PRE_EXPR_NAME (result) = name;
   alloc_expression_id (result);
   return result;
 }
 
+/* Given an NARY, get or create a pre_expr to represent it.  */
+
+static pre_expr
+get_or_alloc_expr_for_nary (vn_nary_op_t nary,
+   location_t loc = UNKNOWN_LOCATION)
+{
+  struct pre_expr_d expr;
+  pre_expr result;
+  unsigned int result_id;
+
+  expr.kind = NARY;
+  expr.id = 0;
+  PRE_EXPR_NARY (&expr) = nary;
+  result_id = lookup_expression_id (&expr);
+  if (result_id != 0)
+return expression_for_id (result_id);
+
+  result = pre_expr_pool.allocate ();
+  result->kind = NARY;
+  result->loc = loc;
+  result->value_id = nary->value_id;
+  PRE_EXPR_NARY (result) = nary;
+  alloc_expression_id (result);
+  return result;
+}
+
+/* Given an REFERENCE, get or create a pre_expr to represent it.  */
+
+static pre_expr
+get_or_alloc_expr_for_reference (vn_reference_t reference,
+location_t loc = UNKNOWN_LOCATION)
+{
+  struct pre_expr_d expr;
+  pre_expr result;
+  unsigned int result_id;
+
+  expr.kind = REFERENCE;
+  expr.id = 0;
+  PRE_EXPR_REFERENCE (&expr) = reference;
+  result_id = lookup_expression_id (&expr);
+  if (result_id != 0)
+return expression_for_id (result_id);
+
+  result = pre_expr_pool.allocate ();
+  result->kind = REFERENCE;
+  result->loc = loc;
+  result->value_id = reference->value_id;
+  PRE_EXPR_REFERENCE (result) = reference;
+  alloc_expression_id (result);
+  return result;
+}
+
+
 /* An unordered bitmap set.  One bitmap tracks values, the other,
expressions.  */
 typedef class bitmap_set
@@ -444,9 +499,9 @@ public:
 
 /* Mapping from value id to expressions with that value_id.  */
 static vec value_expressions;
-/* ???  We want to just record a single expression for each constant
-   value, one of kind CONSTANT.  */
-static vec constant_value_expressions;
+/* We just record a single expression for each constant value,
+   one of kind CONSTANT.  */
+static vec constant_value_expressions;
 
 
 /* This structure is used to keep track of statistics on what
@@ -640,35 +695,33 @@ phi_trans_add (expr_pred_trans_t *entry, pre_expr e, 
basic_block pred)
 static void
 add_to_value (unsigned int v, pre_expr e)
 {
-  bitmap set;
-
   gcc_checking_assert (get_expr_value_id (e) == v);
 
   if (value_id_constant_p (v))
 {
+  if (e->kind != CONSTANT)
+   return;
+
   if (-v >= constant_value_expressions.length ())
constant_value_expressions.safe_grow_cleared (-v + 1);
 
-  set = constant_value_expressions[-v];
-  if (!set)
-   {
- set = BITMAP_ALLOC (&grand_bitmap_obstack);
- constant_value_expressions[-v] = set;
-   }
+  pre_expr leader = constant_value_expressions[-v];
+  if (!leader)
+   constant_value_expressions[-v] = e;
 }
   else
 {
   if (v >= value_expressions.length ())
value_expressions.safe_grow_cleared (v + 1);
 
-  set = value_expressions[v];
+  bitmap set = value_expressions[v];
   if (!set)
{
  set = BITMAP_ALLOC (&grand_bitmap_obstack);
  value_expressions[v] = set;
}
+  bitmap_set_bit (set, get_or_alloc_expression_id (e));
 }
-  bitmap_set_bit (set, get_or_alloc_expression_id (e));
 }
 
 /* Create a new bitmap set and return it.  */
@@ -687,29 +740,10 @@ bitmap_set_new (void)

Re: [PATCH] C-family : Add attribute 'unavailable'.

2020-11-10 Thread Iain Sandoe

Hi Richard,

Richard Biener  wrote:


On Mon, 9 Nov 2020, Iain Sandoe wrote:


Hi,

I?ve been carrying this patch around on my Darwin branches for a very long
time?

tested across the Darwin patch and on x86_64-linux-gnu,
OK for master?
thanks
Iain

= commit message

If an interface is marked 'deprecated' then, presumably, at some point it
will be withdrawn and no longer available. The 'unavailable' attribute
makes it possible to mark up interfaces to indicate this status. It is  
used

quite extensively in some codebases where a single set of headers can be
used
to permit code generation for multiple system versions.

From a configuration perspective, it also allows a compile test to  
determine

that an interface is missing - rather than requiring a link test.

The implementation follows the pattern of attribute deprecated, but  
produces

an error (where deprecation produces a warning).

This attribute has been implemented in clang for some years.

gcc/c-family/ChangeLog:

* c-attribs.c (handle_unavailable_attribute): New.

gcc/c/ChangeLog:

* c-decl.c (enum deprecated_states): Add unavailable state.
(merge_decls): Copy unavailability.
(quals_from_declspecs): Handle unavailable case.
(start_decl): Amend the logic handling suppression of nested
deprecation states to include unavailability.
(smallest_type_quals_location): Amend comment.
(grokdeclarator): Handle the unavailable deprecation state.
(declspecs_add_type): Set TREE_UNAVAILABLE from the decl specs.
* c-tree.h (struct c_declspecs): Add unavailable_p.
* c-typeck.c (build_component_ref): Handle unavailability.
(build_external_ref): Likewise.

gcc/cp/ChangeLog:

* call.c (build_over_call): Handle unavailable state in addition to
deprecation.
* class.c (type_build_ctor_call): Likewise.
(type_build_dtor_call): Likewise.
* cp-tree.h: Rename cp_warn_deprecated_use to
cp_handle_deprecated_or_unavailable.
* decl.c (duplicate_decls): Merge unavailability.
(grokdeclarator): Handle unavailability in addition to deprecation.
(type_is_unavailable): New.
(grokparms): Handle unavailability in addition to deprecation.
* decl.h (enum deprecated_states): Add
UNAVAILABLE_DEPRECATED_SUPPRESS.
* decl2.c (cplus_decl_attributes): Propagate unavailability to
templates.
(cp_warn_deprecated_use): Rename to ...
(cp_handle_deprecated_or_unavailable): ... this and amend to handle
the unavailable case. It remains a warning in the case of deprecation
but becomes an error in the case of unavailability.
(cp_warn_deprecated_use_scopes): Handle unavailability.
(mark_used): Likewise.
* parser.c (cp_parser_template_name): Likewise.
(cp_parser_template_argument): Likewise.
(cp_parser_parameter_declaration_list): Likewise.
* typeck.c (build_class_member_access_expr): Likewise.
(finish_class_member_access_expr): Likewise.
* typeck2.c (build_functional_cast_1): Likewise.

gcc/ChangeLog:

* lto-streamer-out.c (hash_tree): Stream TREE_UNAVAILABLE.
* print-tree.c (print_node): Handle unavailable attribute.
* tree-core.h (struct tree_base): Add a bit to carry unavailability.
* tree.c (error_unavailable_use): New.
* tree.h (TREE_UNAVAILABLE): New.
(error_unavailable_use): New.


Why'd you need to stream this in LTO, more specifically only handle
it in hashing?  It should be all frontend operation.


I followed the existing implementation for TREE_DEPRECATED ().

Presumably, that entry should also be removed - I will remove and re-test  
(it wasn’t
obvious to me why the entry was present, the deprecation/unavailable stuff  
should

be done by end of gimplification).


You're targeting only DECLs can you use a bit from decl_common where decl
specific bits exist instead?


.. except that we are also targeting types.

We could use a bit in decl_common for decls and in tree_type_common for  
types,
I suppose, but that would then mean that every test for unavailable would  
grow an

additional check to discriminate.

(if we do that, then presumably, as a separate patch, we might as well move
 TREE_DEPRECATED in the same way - since otherwise the impl. is different for
 no obvious reason).

thanks
Iain




Thanks,
Richard.


gcc/objc/ChangeLog:

* objc-act.c (objc_add_property_declaration): Register unavailable
attribute.
(maybe_make_artificial_property_decl): Set available.
(objc_maybe_build_component_ref): Generalise to the method prototype
to count availability.
(objc_build_class_component_ref): Likewise.
(build_private_template): Likewise.
(objc_decl_method_attributes): Handle unavailable attribute.
(lookup_method_in_hash_lists): Amend comments.
(objc_finish_message_expr): Handle unavailability in addition to
deprecation.
(start_class): Likewise.
(finish_class): Likewise.
(lookup_protocol): Likewise.
(objc_declare_protocol): Likewise.
(start_protocol): Register unavailable attribute.
(really_start_method): Likewise.
(objc_gimplify_property_ref): Emit error on encountering an
unavailable entity (and a warning for a deprecated one).

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-unavailable-1.C

Re: testsuite: Adjust pr96789.c to exclude vect_load_lanes

2020-11-10 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi,
>
> As Lyon pointed out, the newly introduced test case
> gcc.dg/tree-ssa/pr96789.c fails on arm-none-linux-gnueabihf.
> Loop vectorizer is able to vectorize the two loops which
> operate on array tmp with load_lanes feature support.  It
> makes dse3 get unexpected inputs and do nothing.
>
> This patch is to teach the case to respect vect_load_lanes,
> meanwhile to guard the check only under vect_int.

I'm not sure this is the right check.  The test passes on aarch64,
which also has load lanes, but apparently doesn't use them for this
test.  I think the way the loop vectoriser handles the loops will
depend a lot on target costs, which can vary in unpredictable ways.

Does it work if you instead change -ftree-vectorize to -ftree-slp-vectorize?
Or does that defeat the purpose of the test?

Thanks,
Richard

>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/tree-ssa/pr96789.c: Adjusted by excluding vect_load_lanes.
>
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
> index d6139a014d8..1b89f8b7a6a 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
> @@ -55,4 +55,7 @@ bar (int16_t res[16], uint8_t *val1, uint8_t *val2)
>  }
>  }
>
> -/* { dg-final { scan-tree-dump {Deleted dead store:.*tmp} "dse3" } } */
> +/* Exclude targets which support load_lanes since loop vectorizer
> +   can vectorize those two loops that operate tmp array so that
> +   subsequent dse3 will not eliminate tmp stores.  */
> +/* { dg-final { scan-tree-dump {Deleted dead store:.*tmp} "dse3" { target { 
> vect_int && { ! vect_load_lanes } } } } } */


[PATCH][MIPS] PR target/77510 Reduce size of MIPS core automaton

2020-11-10 Thread Xu Chenghua


Hi:

This patch reduce reservation of model do not more than 10 cycles. The 
memory of genautomata down to 1GB.


ChangeLog

gcc/
    PR target/77510
    * config/mips/gs464.md: Reduce reservation
    duration to 10 cycles.
    * config/mips/i6400.md: Likewise.
    * config/mips/m5100.md: Likewise.
    * config/mips/p5600.md: Likewise.
    * config/mips/p6600.md: Likewise.
    * config/mips/xlp.md: Likewise.
    * config/mips/xlr.md: Likewise.

>From 29fa1d943a30d97fb888980b1dcfdf38b12b67d9 Mon Sep 17 00:00:00 2001
From: Chenghua Xu 
Date: Tue, 10 Nov 2020 19:04:48 +0800
Subject: [PATCH] [MIPS] Reduce size of MIPS core automaton.

gcc/
	PR target/77510
	* config/mips/gs464.md: Reduce reservation
	duration to 10 cycles.
	* config/mips/i6400.md: Likewise.
	* config/mips/m5100.md: Likewise.
	* config/mips/p5600.md: Likewise.
	* config/mips/p6600.md: Likewise.
	* config/mips/xlp.md: Likewise.
	* config/mips/xlr.md: Likewise.
---
 gcc/config/mips/gs464.md |  8 
 gcc/config/mips/i6400.md | 16 
 gcc/config/mips/m5100.md | 10 +-
 gcc/config/mips/p5600.md |  2 +-
 gcc/config/mips/p6600.md |  2 +-
 gcc/config/mips/xlp.md   | 20 ++--
 gcc/config/mips/xlr.md   |  4 ++--
 7 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/gcc/config/mips/gs464.md b/gcc/config/mips/gs464.md
index c936c74c8c0..3cd51a5c3bf 100644
--- a/gcc/config/mips/gs464.md
+++ b/gcc/config/mips/gs464.md
@@ -71,13 +71,13 @@
   (and (eq_attr "cpu" "gs464")
(and (eq_attr "type" "idiv")
 	(eq_attr "mode" "SI")))
-  "gs464_alu2 * 12")
+  "gs464_alu2 * 10")
 
 (define_insn_reservation "gs464_idiv_di" 25
   (and (eq_attr "cpu" "gs464")
(and (eq_attr "type" "idiv")
 	(eq_attr "mode" "DI")))
-  "gs464_alu2 * 25")
+  "gs464_alu2 * 10")
 
 (define_insn_reservation "gs464_load" 3
   (and (eq_attr "cpu" "gs464")
@@ -120,13 +120,13 @@
   (and (eq_attr "cpu" "gs464")
(and (eq_attr "type" "fdiv,frdiv,fsqrt,frsqrt")
 	(eq_attr "mode" "SF")))
-  "gs464_falu1 * 12")
+  "gs464_falu1 * 10")
 
 (define_insn_reservation "gs464_fdiv_df" 19
   (and (eq_attr "cpu" "gs464")
(and (eq_attr "type" "fdiv,frdiv,fsqrt,frsqrt")
 	(eq_attr "mode" "DF")))
-  "gs464_falu1 * 19")
+  "gs464_falu1 * 10")
 
 ;; Force single-dispatch for unknown or multi.
 (define_insn_reservation "gs464_unknown" 1
diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md
index 5db9f7098c1..5a3c907110e 100644
--- a/gcc/config/mips/i6400.md
+++ b/gcc/config/mips/i6400.md
@@ -99,21 +99,21 @@
   (and (eq_attr "cpu" "i6400")
(and (eq_attr "mode" "V2DI")
 	(eq_attr "type" "simd_div")))
-  "i6400_fpu_short+i6400_fpu_div*36")
+  "i6400_fpu_short+i6400_fpu_div*10")
 
 ;; div.w, mod.w (non-pipelined)
 (define_insn_reservation "i6400_msa_div_w" 20
   (and (eq_attr "cpu" "i6400")
(and (eq_attr "mode" "V4SI")
 	(eq_attr "type" "simd_div")))
-  "i6400_fpu_short+i6400_fpu_div*20")
+  "i6400_fpu_short+i6400_fpu_div*10")
 
 ;; div.h, mod.h (non-pipelined)
 (define_insn_reservation "i6400_msa_div_h" 12
   (and (eq_attr "cpu" "i6400")
(and (eq_attr "mode" "V8HI")
 	(eq_attr "type" "simd_div")))
-  "i6400_fpu_short+i6400_fpu_div*12")
+  "i6400_fpu_short+i6400_fpu_div*10")
 
 ;; div.b, mod.b (non-pipelined)
 (define_insn_reservation "i6400_msa_div_b" 8
@@ -201,13 +201,13 @@
   (and (eq_attr "cpu" "i6400")
(and (eq_attr "mode" "V2DF")
 	(eq_attr "type" "simd_fdiv")))
-  "i6400_fpu_long+i6400_fpu_float_l*30")
+  "i6400_fpu_long+i6400_fpu_float_l*10")
 
 ;; fdiv.w
 (define_insn_reservation "i6400_msa_fdiv_sf" 22
   (and (eq_attr "cpu" "i6400")
(eq_attr "type" "simd_fdiv"))
-  "i6400_fpu_long+i6400_fpu_float_l*22")
+  "i6400_fpu_long+i6400_fpu_float_l*10")
 
 ;;
 ;; FPU pipe
@@ -236,13 +236,13 @@
   (and (eq_attr "cpu" "i6400")
(and (eq_attr "mode" "DF")
 	(eq_attr "type" "fdiv,frdiv,fsqrt,frsqrt")))
-  "i6400_fpu_long+i6400_fpu_apu*30")
+  "i6400_fpu_long+i6400_fpu_apu*10")
 
 ;; div, sqrt (Single Precision)
 (define_insn_reservation "i6400_fpu_div_sf" 22
   (and (eq_attr "cpu" "i6400")
(eq_attr "type" "fdiv,frdiv,fsqrt,frsqrt"))
-  "i6400_fpu_long+i6400_fpu_apu*22")
+  "i6400_fpu_long+i6400_fpu_apu*10")
 
 ;; sdc1, swc1
 (define_insn_reservation "i6400_fpu_store" 1
@@ -312,7 +312,7 @@
 (define_insn_reservation "i6400_int_div" 32
   (and (eq_attr "cpu" "i6400")
(eq_attr "type" "idiv,idiv3"))
-  "i6400_gpdiv*32")
+  "i6400_gpdiv*10")
 
 ;; Load lb, lbu, lh, lhu, lq, lw, lw_i2f, lwxs
 (define_insn_reservation "i6400_int_load" 3
diff --git a/gcc/config/mips/m5100.md b/gcc/config/mips/m5100.md
index 3e1c2e7303f..58534fac817 100644
--- a/gcc/config/mips/m5100.md
+++ b/gcc/config/mips/m5100.md
@@ -109,7 +109,7 @@
 (define_insn_reservation "m51_int_div_si" 34
   (and (eq_attr "cpu" "m5100")
(eq_attr "type" "idiv"))
-  "m51_alu+m51_mul*34")
+  "m51_alu+m51_mul*10")
 
 ;; --

Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-11-10 Thread Richard Sandiford via Gcc-patches
xiezhiheng  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Tuesday, November 3, 2020 9:57 PM
>> To: xiezhiheng 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> Thanks, I pushed both patches to trunk.
>> 
>
> Thanks.  And I made two separate patches for these two groups, tbl/tbx 
> intrinsics and
> the rest of the arithmetic operation intrinsics.
>
> Note: It does not matter which patch is applied first.

I pushed the TBL/TBX one, but on the other patch:

> @@ -297,7 +297,7 @@
>BUILTIN_VSDQ_I (USHIFTIMM, uqshl_n, 0, ALL)
>  
>/* Implemented by aarch64_reduc_plus_.  */
> -  BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, ALL)
> +  BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, FP)

This is defined for integer and FP modes, so I think it should be
NONE instead of FP.  We'll automatically add FLAGS_FP based on the
mode where necessary.

Otherwise it looks good, thanks.

Richard


Re: [PATCH] C-family : Add attribute 'unavailable'.

2020-11-10 Thread Richard Biener
On Tue, 10 Nov 2020, Iain Sandoe wrote:

> Hi Richard,
> 
> Richard Biener  wrote:
> 
> >On Mon, 9 Nov 2020, Iain Sandoe wrote:
> >
> >>Hi,
> >>
> >>I?ve been carrying this patch around on my Darwin branches for a very long
> >>time?
> >>
> >>tested across the Darwin patch and on x86_64-linux-gnu,
> >>OK for master?
> >>thanks
> >>Iain
> >>
> >>= commit message
> >>
> >>If an interface is marked 'deprecated' then, presumably, at some point it
> >>will be withdrawn and no longer available. The 'unavailable' attribute
> >>makes it possible to mark up interfaces to indicate this status. It is used
> >>quite extensively in some codebases where a single set of headers can be
> >>used
> >>to permit code generation for multiple system versions.
> >>
> >>From a configuration perspective, it also allows a compile test to determine
> >>that an interface is missing - rather than requiring a link test.
> >>
> >>The implementation follows the pattern of attribute deprecated, but produces
> >>an error (where deprecation produces a warning).
> >>
> >>This attribute has been implemented in clang for some years.
> >>
> >>gcc/c-family/ChangeLog:
> >>
> >>* c-attribs.c (handle_unavailable_attribute): New.
> >>
> >>gcc/c/ChangeLog:
> >>
> >>* c-decl.c (enum deprecated_states): Add unavailable state.
> >>(merge_decls): Copy unavailability.
> >>(quals_from_declspecs): Handle unavailable case.
> >>(start_decl): Amend the logic handling suppression of nested
> >>deprecation states to include unavailability.
> >>(smallest_type_quals_location): Amend comment.
> >>(grokdeclarator): Handle the unavailable deprecation state.
> >>(declspecs_add_type): Set TREE_UNAVAILABLE from the decl specs.
> >>* c-tree.h (struct c_declspecs): Add unavailable_p.
> >>* c-typeck.c (build_component_ref): Handle unavailability.
> >>(build_external_ref): Likewise.
> >>
> >>gcc/cp/ChangeLog:
> >>
> >>* call.c (build_over_call): Handle unavailable state in addition to
> >>deprecation.
> >>* class.c (type_build_ctor_call): Likewise.
> >>(type_build_dtor_call): Likewise.
> >>* cp-tree.h: Rename cp_warn_deprecated_use to
> >>cp_handle_deprecated_or_unavailable.
> >>* decl.c (duplicate_decls): Merge unavailability.
> >>(grokdeclarator): Handle unavailability in addition to deprecation.
> >>(type_is_unavailable): New.
> >>(grokparms): Handle unavailability in addition to deprecation.
> >>* decl.h (enum deprecated_states): Add
> >>UNAVAILABLE_DEPRECATED_SUPPRESS.
> >>* decl2.c (cplus_decl_attributes): Propagate unavailability to
> >>templates.
> >>(cp_warn_deprecated_use): Rename to ...
> >>(cp_handle_deprecated_or_unavailable): ... this and amend to handle
> >>the unavailable case. It remains a warning in the case of deprecation
> >>but becomes an error in the case of unavailability.
> >>(cp_warn_deprecated_use_scopes): Handle unavailability.
> >>(mark_used): Likewise.
> >>* parser.c (cp_parser_template_name): Likewise.
> >>(cp_parser_template_argument): Likewise.
> >>(cp_parser_parameter_declaration_list): Likewise.
> >>* typeck.c (build_class_member_access_expr): Likewise.
> >>(finish_class_member_access_expr): Likewise.
> >>* typeck2.c (build_functional_cast_1): Likewise.
> >>
> >>gcc/ChangeLog:
> >>
> >>* lto-streamer-out.c (hash_tree): Stream TREE_UNAVAILABLE.
> >>* print-tree.c (print_node): Handle unavailable attribute.
> >>* tree-core.h (struct tree_base): Add a bit to carry unavailability.
> >>* tree.c (error_unavailable_use): New.
> >>* tree.h (TREE_UNAVAILABLE): New.
> >>(error_unavailable_use): New.
> >
> >Why'd you need to stream this in LTO, more specifically only handle
> >it in hashing?  It should be all frontend operation.
> 
> I followed the existing implementation for TREE_DEPRECATED ().
> 
> Presumably, that entry should also be removed - I will remove and re-test (it
> wasn?t
> obvious to me why the entry was present, the deprecation/unavailable stuff
> should
> be done by end of gimplification).

deprecated_flag is overloaded, we likely use this accessor because
it isn't guarded by any tree code check.  We definitely need the bit
for SSA_NAMEs.

> >You're targeting only DECLs can you use a bit from decl_common where decl
> >specific bits exist instead?
> 
> .. except that we are also targeting types.

Ah, I see, using a bit in tree-base is fine then.  Please document
its use in near to the other flags like

   deprecated_flag:

   TREE_DEPRECATED in
   all decls
   all types

so overloads can be added for other tree kinds.

Richard.

> We could use a bit in decl_common for decls and in tree_type_common for types,
> I suppose, but that would then mean that every test for unavailable would grow
> an
> additional check to discriminate.
> 
> (if we do that, then presumably, as a separate patch, we might as well move
> TREE_DEPRECATED in the same way - since otherwise the impl. is different for
> no obvious reason).
>
> thanks
> Iain
> 
> 
> >
> >Thanks,
> >Richard.
> >
> >>gcc/objc/ChangeLog:
> >>
> >>* objc-act.c (objc_

Re: Enable MOVDIRI, MOVDIR64B, CLDEMOTE and WAITPKG for march=tremont

2020-11-10 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 10, 2020 at 4:17 PM Hongtao Liu  wrote:
>
> On Tue, Nov 10, 2020 at 3:22 AM Jason Merrill via Gcc-patches
>  wrote:
> >
> > This patch was also applied to the GCC 9 and 10 branches and breaks those
> > builds, because PTA_CLDEMOTE is not defined.
> >
> Mine, let me fix it, sorry for that.
>

Since CLDEMOTE already existed in gcc9 and later, define PTA_CLDEMOTE
instead of removing it.
Similar for MOVDIRI, MOVDIR64B.

GCC10-Fix.patch

gcc/ChangeLog
* config/i386/i386-options.c (ix86_option_override_internal):
Handle PTA_CLDEMOTE.
* config/i386/i386.h (PTA_CLDEMOTE): Define.

GCC9-Fix.patch

gcc/ChangeLog
* config/i386/i386.c (ix86_option_override_internal):
Handle PTA_CLDEMOTE, PTA_MOVDIRI, PTA_MOVDIR64B.
* config/i386/i386.h (PTA_CLDEMOTE, PTA_MOVDIRI,
PTA_MOVDIR64B.): Define.

Bootstrap and regression test for i386 backend is ok on releases/gcc-9
and releases/gcc-10 branch.

Sorry for such unnecessary inconvenience.

> --
> BR,
> Hongtao



--
BR,
Hongtao
From 7744c0f741e26123e63a90494efd4a6c989df7c6 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 10 Nov 2020 16:42:06 +0800
Subject: [PATCH] Fix missing defination of PTA_CLDEMOTE.

gcc/ChangeLog
	* config/i386/i386-options.c (ix86_option_override_internal):
	Handle PTA_CLDEMOTE.
	* config/i386/i386.h (PTA_CLDEMOTE): Define.
---
 gcc/config/i386/i386-options.c | 3 +++
 gcc/config/i386/i386.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 5c21fce06a4..14a57a17156 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2226,6 +2226,9 @@ ix86_option_override_internal (bool main_args_p,
 	if (((processor_alias_table[i].flags & PTA_PTWRITE) != 0)
 	&& !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_PTWRITE))
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_PTWRITE;
+	if (((processor_alias_table[i].flags & PTA_CLDEMOTE) != 0)
+	&& !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_CLDEMOTE))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_CLDEMOTE;
 
 	if ((processor_alias_table[i].flags
 	   & (PTA_PREFETCH_SSE | PTA_SSE)) != 0)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index d75a86bb8da..08f265fc8e0 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2437,6 +2437,7 @@ const wide_int_bitmask PTA_AVX512BF16 (0, HOST_WIDE_INT_1U << 11);
 const wide_int_bitmask PTA_WAITPKG (0, HOST_WIDE_INT_1U << 12);
 const wide_int_bitmask PTA_MOVDIRI(0, HOST_WIDE_INT_1U << 13);
 const wide_int_bitmask PTA_MOVDIR64B(0, HOST_WIDE_INT_1U << 14);
+const wide_int_bitmask PTA_CLDEMOTE(0, HOST_WIDE_INT_1U << 16);
 
 const wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
   | PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
-- 
2.18.1

From a1b774edeedadf69c56529fa52a9e2ef92cb7957 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 10 Nov 2020 16:48:08 +0800
Subject: [PATCH] Fix missing defination of PTA_CLDEMOTE, PTA_MOVDIRI,
 PTA_MOVDIR64B.

gcc/ChangeLog
	* config/i386/i386.c (ix86_option_override_internal):
	Handle PTA_CLDEMOTE, PTA_MOVDIRI, PTA_MOVDIR64B.
	* config/i386/i386.h (PTA_CLDEMOTE, PTA_MOVDIRI,
	PTA_MOVDIR64B.): Define.
---
 gcc/config/i386/i386.c | 9 +
 gcc/config/i386/i386.h | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f968d033972..bd61be61fdf 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4058,6 +4058,15 @@ ix86_option_override_internal (bool main_args_p,
 	if (((processor_alias_table[i].flags & PTA_PTWRITE) != 0)
 	&& !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_PTWRITE))
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_PTWRITE;
+	if (((processor_alias_table[i].flags & PTA_MOVDIRI) != 0)
+	&& !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MOVDIRI))
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MOVDIRI;
+	if (((processor_alias_table[i].flags & PTA_MOVDIR64B) != 0)
+	&& !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_MOVDIR64B))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_MOVDIR64B;
+	if (((processor_alias_table[i].flags & PTA_CLDEMOTE) != 0)
+	&& !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_CLDEMOTE))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_CLDEMOTE;
 
 	if ((processor_alias_table[i].flags
 	   & (PTA_PREFETCH_SSE | PTA_SSE)) != 0)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 17d6218e991..1cc9e3d9463 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2379,6 +2379,9 @@ const wide_int_bitmask PTA_PCONFIG (0, HOST_WIDE_INT_1U << 7);
 const wide_int_bitmask PTA_WBNOINVD (0, HOST_WIDE_INT_1U << 8);
 const wide_int_bitmask PTA_WAITPKG (0, HOST_WIDE_INT_1U << 9);
 const wide_int_bitmask PTA_PTWRITE (0, HOST_WIDE_INT_1U << 10);
+const wide_int_bitmask PTA_MOVDIRI(0, HOST_WIDE_INT_1U << 13);
+const wide_int_bitmask PTA_MOVDIR64B(0, HOST_WIDE_INT_1

[PATCH] tree-optimization/97780 - fix ICE in fini_pre

2020-11-10 Thread Richard Biener
This deals with blocks elimination added.

Pushed as obvious.

2020-11-10  Richard Biener  

PR tree-optimization/97780
* tree-ssa-pre.c (fini_pre): Deal with added basic blocks
when freeing PHI_TRANS_TABLE.
---
 gcc/tree-ssa-pre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 160f3b4593a..90877e3c68e 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -4196,7 +4196,7 @@ fini_pre ()
 
   basic_block bb;
   FOR_ALL_BB_FN (bb, cfun)
-if (PHI_TRANS_TABLE (bb))
+if (bb->aux && PHI_TRANS_TABLE (bb))
   delete PHI_TRANS_TABLE (bb);
   free_aux_for_blocks ();
 }
-- 
2.26.2


Re: [PATCH] Cleanup irange::set.

2020-11-10 Thread Christophe Lyon via Gcc-patches
Hi,

On Mon, 9 Nov 2020 at 15:43, Aldy Hernandez via Gcc-patches
 wrote:
>
> [This is actually part of a larger patch that actually changes
> behavior, but I thought I'd commit the non-invasive cleanups first
> which will simplify the upcoming work.]
>
> irange::set was doing more work than it should for legacy ranges.
> I cleaned up various unnecessary calls to swap_out_of_order_endpoints,
> as well as some duplicate code that could be done with normalize_min_max.
>
> I also removed an obsolete comment wrt sticky infinite overflows.
> Not only did the -INF/+INF(OVF) code get removed in 2017,
> but normalize_min_max() uses wide ints, which ignored overflows
> altogether.
>
> Pushed.
>
> gcc/ChangeLog:
>
> * value-range.cc (irange::swap_out_of_order_endpoints): Rewrite
> into static function.
> (irange::set): Cleanup redundant manipulations.
> * value-range.h (irange::normalize_min_max): Modify object
> in-place instead of modifying arguments.


This is causing regressions on aarch64, such as:
FAIL: gcc.target/aarch64/pr88838.c (internal compiler error)
FAIL: gcc.target/aarch64/pr88838.c (test for excess errors)
Excess errors:
during GIMPLE pass: dom
/gcc/testsuite/gcc.target/aarch64/pr88838.c:5:1: internal compiler
error: tree check: expected integer_cst, have poly_int_cst in to_wide,
at tree.h:5950
0x66741d tree_check_failed(tree_node const*, char const*, int, char const*, ...)
/gcc/tree.c:9754
0x6404e8 tree_check
/gcc/tree.h:3570
0x6404e8 to_wide
/gcc/tree.h:5950
0x7f0816 wi::to_wide(tree_node const*)
/gcc/tree.h:3505
0x1118bdc irange::legacy_upper_bound(unsigned int) const
/gcc/value-range.cc:447
0x1122cd4 irange::upper_bound(unsigned int) const
/gcc/value-range.h:510
0x19493be range_operator::fold_range(irange&, tree_node*, irange
const&, irange const&) const
/gcc/range-op.cc:159
0x19494f4 range_operator::fold_range(irange&, tree_node*, irange
const&, irange const&) const
/gcc/range-op.cc:74
0x10cfee6 range_fold_binary_expr(int_range<1u>*, tree_code,
tree_node*, int_range<1u> const*, int_range<1u> const*)
/gcc/tree-vrp.c:1243
0x116fda4 vr_values::extract_range_from_binary_expr(value_range_equiv*,
tree_code, tree_node*, tree_node*, tree_node*)
/gcc/vr-values.c:853
0x1178b3c vr_values::extract_range_from_assignment(value_range_equiv*, gassign*)
/gcc/vr-values.c:1564
0x185bcbe evrp_range_analyzer::record_ranges_from_stmt(gimple*, bool)
/gcc/gimple-ssa-evrp-analyze.c:304
0xee7f9b dom_opt_dom_walker::before_dom_children(basic_block_def*)
/gcc/tree-ssa-dom.c:1458
0x1817487 dom_walker::walk(basic_block_def*)
/gcc/domwalk.c:309
0xee4f8b execute
/gcc/tree-ssa-dom.c:724


(actually I can see 3245 ICEs on aarch64)

Can you fix it?

Thanks

> ---
>  gcc/value-range.cc | 70 --
>  gcc/value-range.h  | 28 +--
>  2 files changed, 37 insertions(+), 61 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 5827e812216..2124e229e0c 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -131,13 +131,14 @@ irange::copy_to_legacy (const irange &src)
>  set (src.tree_lower_bound (), src.tree_upper_bound ());
>  }
>
> -// Swap min/max if they are out of order.  Return TRUE if further
> -// processing of the range is necessary, FALSE otherwise.
> +// Swap MIN/MAX if they are out of order and adjust KIND appropriately.
>
> -bool
> -irange::swap_out_of_order_endpoints (tree &min, tree &max,
> - value_range_kind &kind)
> +static void
> +swap_out_of_order_endpoints (tree &min, tree &max, value_range_kind &kind)
>  {
> +  gcc_checking_assert (kind != VR_UNDEFINED);
> +  if (kind == VR_VARYING)
> +return;
>/* Wrong order for min and max, to swap them and the VR type we need
>   to adjust them.  */
>if (tree_int_cst_lt (max, min))
> @@ -149,8 +150,8 @@ irange::swap_out_of_order_endpoints (tree &min, tree &max,
>  for VR_ANTI_RANGE empty range, so drop to varying as well.  */
>if (TYPE_PRECISION (TREE_TYPE (min)) == 1)
> {
> - set_varying (TREE_TYPE (min));
> - return false;
> + kind = VR_VARYING;
> + return;
> }
>
>one = build_int_cst (TREE_TYPE (min), 1);
> @@ -163,12 +164,11 @@ irange::swap_out_of_order_endpoints (tree &min, tree 
> &max,
>  to varying in this case.  */
>if (tree_int_cst_lt (max, min))
> {
> - set_varying (TREE_TYPE (min));
> - return false;
> + kind = VR_VARYING;
> + return;
> }
>kind = kind == VR_RANGE ? VR_ANTI_RANGE : VR_RANGE;
>  }
> -  return true;
>  }
>
>  void
> @@ -253,13 +253,6 @@ irange::set (tree min, tree max, value_range_kind kind)
>&& (POLY_INT_CST_P (min) || POLY_INT_CST_P (max)))
>  kind = VR_VARYING;
>
> -  if (kind == VR_VARYING)

Re: [Patch] Fortran: OpenMP 5.0 (in_,task_)reduction clause extensions

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 12:40:08AM +0100, Tobias Burnus wrote:
> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -762,6 +762,10 @@ enum omp_mask1
>OMP_CLAUSE_SHARED,
>OMP_CLAUSE_COPYIN,
>OMP_CLAUSE_REDUCTION,
> +  OMP_CLAUSE_REDUCTION_DEFAULT,
> +  OMP_CLAUSE_REDUCTION_MODIFIER,

I don't understand the need for OMP_CLAUSE_REDUCTION_DEFAULT,
default modifier is allowed in all OpenMP reduction clauses and
your gfc_match_omp_clause_reduction function has the openacc argument.
So why can't you keep using OMP_CLAUSE_REDUCTION in place where
you use OMP_CLAUSE_REDUCTION_DEFAULT now and just decide based on !openacc
whether to parse any reduction modifiers?

One could probably get away even without OMP_CLAUSE_REDUCTION_MODIFIER,
just allow all the modifiers first and during resolving complain if
[OMP_LIST_REDUCTION_INSCAN] and/or [OMP_LIST_REDUCTION_TASK] is non-NULL
on constructs where it shouldn't.

When splitting clauses, OMP_LIST_REDUCTION_TASK applies only on the innermost
construct that accepts it (others should treat it as OMP_LIST_REDUCTION),
and inscan is severely limited to which constructs it can appear on.
For task modifier, do you diagnose somewhere what c_split_omp_clauses
diagnoses?
E.g. if task modifier appears with innermost combined construct simd/loop,
or if task modifier appears and not combined with parallel/do/sections?
Inscan is allowed only on do, simd, do simd, parallel do and parallel do sim.d

Also, I think there should be an error or sorry on the inscan modifier
somewhere until the rest of the scan support is implemented (in particular
the changes in the parsing of do/simd body with inscan reductions
including the parsing of the inclusive/exclusive directives in there).

>  #define OMP_DECLARE_SIMD_CLAUSES \
>(omp_mask (OMP_CLAUSE_SIMDLEN) | OMP_CLAUSE_LINEAR \
> | OMP_CLAUSE_UNIFORM  | OMP_CLAUSE_ALIGNED | OMP_CLAUSE_INBRANCH  
> \

I wonder why we have the occassional tabs in the middle of the OMP_*_CLAUSES
defines.
> +   | OMP_CLAUSE_SAFELEN | OMP_CLAUSE_LINEAR | OMP_CLAUSE_ALIGNED \
> +   | OMP_CLAUSE_SIMDLEN  | OMP_CLAUSE_IF | OMP_CLAUSE_ORDER  
> \

Here too.

> +  for (int i = OMP_LIST_REDUCTION; i <= OMP_LIST_REDUCTION_TASK; i++)
> + {
> +   if (mask & GFC_OMP_MASK_TEAMS)
> + clausesa[GFC_OMP_SPLIT_TEAMS].lists[i]
> +   = code->ext.omp_clauses->lists[i];
> +   if (mask & GFC_OMP_MASK_PARALLEL)
> + clausesa[GFC_OMP_SPLIT_PARALLEL].lists[i]
> +   = code->ext.omp_clauses->lists[i];
> +   else if (mask & GFC_OMP_MASK_DO)
> + clausesa[GFC_OMP_SPLIT_DO].lists[i]
> +   = code->ext.omp_clauses->lists[i];
> +   if (mask & GFC_OMP_MASK_SIMD)
> + clausesa[GFC_OMP_SPLIT_SIMD].lists[i]
> +   = code->ext.omp_clauses->lists[i];
> + }

Seems in the middle-end we just ignore OMP_CLAUSE_REDUCTION_TASK on anything
but parallel/{for,do}/sections, so guess this is ok.

Jakub



Re: [PATCH] Cleanup irange::set.

2020-11-10 Thread Andreas Schwab
This breaks aarch64.

spawn -ignore SIGHUP /opt/gcc/test/Build/gcc/xgcc -B/opt/gcc/test/Build/gcc/ 
-mabi=lp64 -fdiagnostics-plain-output -march=armv8.2-a+sve -O3 --save-temps 
-ffat-lto-objects -fno-ident -c -o abs_1.o 
/opt/gcc/test/gcc/testsuite/gcc.target/aarch64/sve/abs_1.c
during GIMPLE pass: dom
/opt/gcc/test/gcc/testsuite/gcc.target/aarch64/sve/abs_1.c: In function 
'vneg_int8_t':
/opt/gcc/test/gcc/testsuite/gcc.target/aarch64/sve/abs_1.c:13:6: internal 
compiler error: tree check: expected integer_cst, have poly_int_cst in to_wide, 
at tree.h:5950
0x6743ab tree_check_failed(tree_node const*, char const*, int, char const*, ...)
../../gcc/tree.c:9752
0x674f3b tree_int_cst_elt_check(tree_node const*, int, char const*, int, char 
const*)
../../gcc/tree.h:3502
0x1033d67 tree_int_cst_elt_check(tree_node const*, int, char const*, int, char 
const*)
../../gcc/tree.h:3437
0x1033d67 wi::to_wide(tree_node const*)
../../gcc/tree.h:5951
0x1033d67 irange::legacy_lower_bound(unsigned int) const
../../gcc/value-range.cc:420
0x103ad23 irange::lower_bound(unsigned int) const
../../gcc/value-range.h:497
0x183154f range_operator::fold_range(irange&, tree_node*, irange const&, irange 
const&) const
../../gcc/range-op.cc:159
0xfeae4b range_fold_binary_expr(int_range<1u>*, tree_code, tree_node*, 
int_range<1u> const*, int_range<1u> const*)
../../gcc/tree-vrp.c:1243
0x107dcaf vr_values::extract_range_from_binary_expr(value_range_equiv*, 
tree_code, tree_node*, tree_node*, tree_node*)
../../gcc/vr-values.c:853
0x1086917 vr_values::extract_range_from_assignment(value_range_equiv*, gassign*)
../../gcc/vr-values.c:1561
0x1747eb3 evrp_range_analyzer::record_ranges_from_stmt(gimple*, bool)
../../gcc/gimple-ssa-evrp-analyze.c:304
0xe3b047 dom_opt_dom_walker::before_dom_children(basic_block_def*)
../../gcc/tree-ssa-dom.c:1458
0x170cf7f dom_walker::walk(basic_block_def*)
../../gcc/domwalk.c:309
0xe38b03 execute
../../gcc/tree-ssa-dom.c:724

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: Enable MOVDIRI, MOVDIR64B, CLDEMOTE and WAITPKG for march=tremont

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 08:03:29PM +0800, Hongtao Liu via Gcc-patches wrote:
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2437,6 +2437,7 @@ const wide_int_bitmask PTA_AVX512BF16 (0, 
> HOST_WIDE_INT_1U << 11);
>  const wide_int_bitmask PTA_WAITPKG (0, HOST_WIDE_INT_1U << 12);
>  const wide_int_bitmask PTA_MOVDIRI(0, HOST_WIDE_INT_1U << 13);
>  const wide_int_bitmask PTA_MOVDIR64B(0, HOST_WIDE_INT_1U << 14);
> +const wide_int_bitmask PTA_CLDEMOTE(0, HOST_WIDE_INT_1U << 16);

Formatting, there should be a space between PTA_* and (.
Please fix this up on both branches not just for the entries you've added
but also for the older ones, and please fix it on the trunk too:
const wide_int_bitmask PTA_MOVDIRI(0, HOST_WIDE_INT_1U << 13);
const wide_int_bitmask PTA_MOVDIR64B(0, HOST_WIDE_INT_1U << 14);
...
const wide_int_bitmask PTA_AMX_TILE(0, HOST_WIDE_INT_1U << 19);
const wide_int_bitmask PTA_AMX_INT8(0, HOST_WIDE_INT_1U << 20);
const wide_int_bitmask PTA_AMX_BF16(0, HOST_WIDE_INT_1U << 21);
...
const wide_int_bitmask PTA_HRESET(0, HOST_WIDE_INT_1U << 23);
on the trunk suffer from this.

Ok for branches and the above change to trunk is preapproved too,
but please bootstrap/regtest all backports next time.

Jakub



[PATCH] tree-optimization/97769 - fix assert in peeling for alignment

2020-11-10 Thread Richard Biener
The following removes an assert that can not easily be adjusted to
cover the additional cases we now handle after the removal of
the same-align DRs vector.

Tested on x86_64-unknown-linux and aarch64, pushed.

2020-11-10  Richard Biener  

PR tree-optimization/97769
* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
Remove assert.

* gcc.dg/vect/pr97769.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr97769.c | 32 +
 gcc/tree-vect-data-refs.c   |  7 +--
 2 files changed, 33 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97769.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr97769.c 
b/gcc/testsuite/gcc.dg/vect/pr97769.c
new file mode 100644
index 000..127f91aa8fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97769.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+typedef struct {
+int alg;
+int h1[8];
+unsigned d1[1];
+} tmp;
+typedef struct {
+tmp itmp;
+tmp otmp;
+} h1;
+h1 c;
+
+static void
+fn1(char *p1, int p2)
+{
+  int i = 0;
+  for (; i < 4; i++)
+*p1++ = p2;
+}
+
+static void
+fn2(tmp *p1)
+{
+  char *d = (char *)p1->d1;
+  int *b = p1->h1;
+  for (int a; a; a++, d += 4)
+fn1(d, *b++);
+}
+
+void fn3() { fn2(&(&c)->otmp); }
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8afd3044461..0efab495407 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1186,14 +1186,9 @@ static void
 vect_update_misalignment_for_peel (dr_vec_info *dr_info,
   dr_vec_info *dr_peel_info, int npeel)
 {
-  /* It can be assumed that if dr_info has the same alignment as dr_peel,
- it is aligned in the vector loop.  */
+  /* If dr_info is aligned of dr_peel_info is, then mark it so.  */
   if (vect_dr_aligned_if_peeled_dr_is (dr_info, dr_peel_info))
 {
-  gcc_assert (!known_alignment_for_access_p (dr_info)
- || !known_alignment_for_access_p (dr_peel_info)
- || (DR_MISALIGNMENT (dr_info)
- == DR_MISALIGNMENT (dr_peel_info)));
   SET_DR_MISALIGNMENT (dr_info, 0);
   return;
 }
-- 
2.26.2


Re: Detect EAF flags in ipa-modref

2020-11-10 Thread Jan Hubicka
> > +  tree callee = gimple_call_fndecl (stmt);
> > +  if (callee)
> > +{
> > +  cgraph_node *node = cgraph_node::get (callee);
> > +  modref_summary *summary = node ? get_modref_function_summary (node)
> > +   : NULL;
> > +
> > +  if (summary && summary->arg_flags.length () > arg)
> 
> So could we make modref "transform" push this as fnspec attribute or
> would that not really be an optimization?

It was my original plan to synthetize fnspecs, but I think it is not
very good idea: we have the summary readily available and we can
represent information that fnspecs can't
(do not have artificial limits on number of parameters or counts)

I would preffer fnspecs to be used only for in-compiler declarations.
> > +
> > +/* Analyze EAF flags for SSA name NAME.
> > +   KNOWN_FLAGS is a cache for flags we already determined.
> > +   DEPTH is a recursion depth used to make debug output prettier.  */
> > +
> > +static int
> > +analyze_ssa_name_flags (tree name, vec *known_flags, int depth)
> 
> C++ has references which makes the access to known_flags nicer ;)

Yay, will chang that :)
> 
> > +{
> > +  imm_use_iterator ui;
> > +  gimple *use_stmt;
> > +  int flags = EAF_DIRECT | EAF_NOCLOBBER | EAF_NOESCAPE | EAF_UNUSED;
> > +
> > +  /* See if value is already computed.  */
> > +  if ((*known_flags)[SSA_NAME_VERSION (name)])
> > +{
> > +  /* Punt on cycles for now, so we do not need dataflow.  */
> > +  if ((*known_flags)[SSA_NAME_VERSION (name)] == 1)
> > +   {
> > + if (dump_file)
> > +   fprintf (dump_file,
> > +"%*sGiving up on a cycle in SSA graph\n", depth * 4, "");
> > + return 0;
> > +   }
> > +  return (*known_flags)[SSA_NAME_VERSION (name)] - 2;
> > +}
> > +  /* Recursion guard.  */
> > +  (*known_flags)[SSA_NAME_VERSION (name)] = 1;
> 
> This also guards against multiple evaluations of the same stmts
> but only in some cases?  Consider
> 
>   _1 = ..;
>   _2 = _1 + _3;
>   _4 = _1 + _5;
>   _6 = _2 + _4;
> 
> where we visit _2 = and _4 = from _1 but from both are going
> to visit _6.

Here we first push _6, then we go for _2 then for _1 evaluate _1,
evalueate _2, go for _4 and evaluate _4, and evaluate _6.
It is DFS and you need backward edge in DFS (comming from a PHI).

Cycles seems to somewhat matter for GCC: we do have a lot of functions
that walk linked lists that we could track otherwise.
> 
> Maybe I'm blind but you're not limiting depth?  Guess that asks
> for problems, esp. as you are recursing rather than using a
> worklist or so?
> 
> I see you try to "optimize" the walk by only visiting def->use
> links from parameters but then a RPO walk over all stmts would
> be simpler iteration-wise ...
We usually evaluate just small part of bigger functions (since we lose
track quite easily after hitting first memory store).  My plan was to
change this to actual dataflow once we have it well defined 
(this means after discussing EAF flags with you and adding the logic to
track callsites for true IPA pass that midly complicated things - for
every ssa name I track callsite/arg pair where it is passed to
either directly or indirectly.  Then this is translaed into call summary
and used by IPA pass to compute final flags)

I guess I can add --param ipa-modref-walk-depth for now and handle
dataflow incremntally?
In particular I am not sure if I should just write iterated RPO myself
or use tree-ssa-propagate.h (the second may be overkill).
> 
> > +  if (dump_file)
> > +{
> > +  fprintf (dump_file,
> > +  "%*sAnalyzing flags of ssa name: ", depth * 4, "");
> > +  print_generic_expr (dump_file, name);
> > +  fprintf (dump_file, "\n");
> > +}
> > +
> > +  FOR_EACH_IMM_USE_STMT (use_stmt, ui, name)
> > +{
> > +  if (flags == 0)
> > +   {
> > + BREAK_FROM_IMM_USE_STMT (ui);
> > +   }
> > +  if (is_gimple_debug (use_stmt))
> > +   continue;
> > +  if (dump_file)
> > +   {
> > + fprintf (dump_file, "%*s  Analyzing stmt:", depth * 4, "");
> > + print_gimple_stmt (dump_file, use_stmt, 0);
> > +   }
> > +
> > +  /* Gimple return may load the return value.  */
> > +  if (gimple_code (use_stmt) == GIMPLE_RETURN)
> 
>  if (greturn *ret = dyn_cast  (use_stmt))
> 
> makes the as_a below not needed, similar for the other cases (it
> also makes accesses cheaper, avoiding some checking code).

Looks cleaner indeed.
> 
> > +   {
> > + if (memory_access_to (gimple_return_retval
> > +  (as_a  (use_stmt)), name))
> > +   flags &= ~EAF_UNUSED;
> > +   }
> > +  /* Account for LHS store, arg loads and flags from callee function.  
> > */
> > +  else if (is_gimple_call (use_stmt))
> > +   {
> > + tree callee = gimple_call_fndecl (use_stmt);
> > +
> > + /* Recursion would require bit of propagation; give up for now.  */
> > + if (callee && recursive_call_p (current_function_decl, callee))
> > +   flags = 0;
> > + else
> > + 

[Patch, fortran] PR83118 - [8/9/10/11 Regression] Bad intrinsic assignment of class(*) array component of derived type

2020-11-10 Thread Paul Richard Thomas via Gcc-patches
Hi Everyone,

I am afraid that this is a rather long sad story, mainly due to my efforts
with gfortran being interrupted by daytime work. I posted the first version
of the patch nearly a year ago but this was derailed by Tobias's question
at: https://gcc.gnu.org/legacy-ml/fortran/2019-11/msg00098.html

My recent attempt to post this patch were disrupted by the patch itself
disappearing from the posting. Thanks to Andre and Thomas for pointing this
out. Since then, I have been working on downstream PRs and this has led to
a reworking of the unposted version.

(i) The attached fixes the original problem and is tested by
gfortran.dg/unlimited_polymorphic_32.f03.
(ii) In fixing the original problem, a fair amount of effort was required
to get the element length correct for class temporaries produced by
dependencies in class assignment. This is reflected in the changes to
trans_array.c(gfc_alloc_allocatable_for_assignment) and the new function
get_class_info_from_ss.
(iii) Tobias's testcase in the above posting to the list didn't address
itself to class arrays of the original problem. However, it revealed that
reallocation was not occuring at all for scalar assignments.  This is fixed
by the large chunk in trans-expr.c(trans_class_assignment). The array case
is 'fixed' by testing for unequal element sizes between lhs and rhs before
reallocation in gfc_alloc_allocatable_for_assignment. This is difficult to
test for since, in most cases, the system returns that same address after
reallocation.
(iv) dependency_57.f90 segfaulted at runtime. The other work in
trans_class_assignment was required to fix this.
(v) A number of minor tidy ups were done including the new function
gfc_resize_class_size_with_len to eliminate some repeated code.

Note: Chunks of code are coming within scalarization loops that should be
outside:
  x->_vptr = (struct __vtype__STAR * {ref-all})
&__vtab_INTEGER_4_;
  x->_len = 0;
  D.3977 = x->_vptr->_size;
  D.3978 = x->_len;
  D.3979 = D.3978 > 0 ? D.3977 * D.3978 : D.3977;
also in many cases of class assignment, the lhs vptr is being set more than
once outside the loop when temporaries are involved. I will try to iron out
these issues later on.

This all bootstraps and regtests on FC31/x86_64 - OK for master?

Cheers

Paul

As well as the PR this patch fixes problems in handling class objects;
most importantly class array temporaries, required when dependences
occur in class assignment, and a correct implementation of reallocation
on assignment.

2020-11-10  Paul Thomas  

gcc/fortran
PR fortran/83118
* resolve.c (resolve_ordinary_assign): Generate a vtable if
necessary for scalar non-polymorphic rhs's to unlimited lhs's.
* trans-array.c (get_class_info_from_ss): New function.
(gfc_trans_allocate_array_storage): Defer obtaining class
element type until all sources of class exprs are tried. Use
class API rather than TREE_OPERAND. Look for class expressions
in ss->info by calling get_class_info_from_ss. After obtain
the element size for class descriptors. Where the element type
is unknown, cast the data as character(len=size) to overcome
unlimited polymorphic problems.
(structure_alloc_comps): Replace code that replicates the new
function gfc_resize_class_size_with_len.
(gfc_alloc_allocatable_for_assignment): Obtain element size
for lhs in cases of deferred characters and class enitities.
Move code for the element size of rhs to start of block. Clean
up extraction of class parameters throughout this function.
After the shape check test whether or not the lhs and rhs
element sizes are the same. Use earlier evaluation of
'cond_null'. Reallocation of lhs only to happen if size changes
or element size changes.
* trans-expr.c (gfc_resize_class_size_with_len): New function.
(gfc_conv_procedure_call): Ensure the vtable is present for
passing a non-class actual to an unlimited formal.
(trans_class_vptr_len_assignment): For expressions of type
BT_CLASS, extract the class expression if necessary. Use a
statement block outside the loop body. Ensure that 'rhs' is
of the correct type. Obtain rhs vptr in all circumstances.
(gfc_trans_assignment_1): Simplify some of the logic with
'realloc_flag'. Set 'vptr_copy' for all array assignments to
unlimited polymorphic lhs.
* trans-c (gfc_build_array_ref): Call gfc_resize_class_size_
with_len to correct span for unlimited polymorphic decls.
* trans.h : Add prototype for gfc_resize_class_size_with_len.

gcc/testsuite/
PR fortran/83118
* gfortran.dg/dependency_57.f90: Change to dg-run and test for correct
result.
* gfortran.dg/unlimited_polymorphic_32.f03: New test.


unlimited_polymorphic_32.f03
Description: Binary data
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 1641eb6ca10..daa947af9d1 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -11054,7 +11054,7 @@ resolve_ordinary_assign (gfc_code *code, gfc_namespace *ns)
 
   /* Make sure there is a vtable 

[PATCH] aarch64: Make use of RTL predicates

2020-11-10 Thread Andrea Corallo via Gcc-patches
Hi all,

I'd like to propose this patch to make use of RTL predicates into the
AArch64 back-end where possible.

Bootstrapped and regtested on aarch64-unknown-linux-gnu.

Okay for trunk?

Thanks

  Andrea

>From 9a027963f228b5c11aba30537cf888bba09bd6c2 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Tue, 10 Nov 2020 11:23:15 +
Subject: [PATCH] aarch64: Make use of RTL predicates

2020-11-10  Andrea Corallo  

* config/aarch64/aarch64.c (tls_symbolic_operand_type)
(aarch64_load_symref_appropriately, aarch64_mov128_immediate)
(aarch64_expand_mov_immediate)
(aarch64_maybe_expand_sve_subreg_move)
(aarch64_tls_referenced_p, aarch64_cannot_force_const_mem)
(aarch64_base_register_rtx_p, aarch64_classify_index)
(aarch64_classify_address, aarch64_symbolic_address_p)
(aarch64_reinterpret_float_as_int, aarch64_float_const_rtx_p)
(aarch64_can_const_movi_rtx_p, aarch64_select_cc_mode)
(aarch64_print_operand, aarch64_label_mentioned_p)
(aarch64_secondary_reload, aarch64_preferred_reload_class)
(aarch64_address_cost, aarch64_tls_symbol_p)
(aarch64_classify_symbol, aarch64_legitimate_pic_operand_p)
(aarch64_legitimate_constant_p)
(aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p, aarch64_mov_operand_p)
(fusion_load_store): Use RTL predicates where possible.
---
 gcc/config/aarch64/aarch64.c | 80 ++--
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 97cb68980e9..26cac5f788b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2985,7 +2985,7 @@ tls_symbolic_operand_type (rtx addr)
   enum tls_model tls_kind = TLS_MODEL_NONE;
   poly_int64 offset;
   addr = strip_offset_and_salt (addr, &offset);
-  if (GET_CODE (addr) == SYMBOL_REF)
+  if (SYMBOL_REF_P (addr))
 tls_kind = SYMBOL_REF_TLS_MODEL (addr);
 
   return tls_kind;
@@ -3126,7 +3126,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
/* The operand is expected to be MEM.  Whenever the related insn
   pattern changed, above code which calculate mem should be
   updated.  */
-   gcc_assert (GET_CODE (mem) == MEM);
+   gcc_assert (MEM_P (mem));
MEM_READONLY_P (mem) = 1;
MEM_NOTRAP_P (mem) = 1;
emit_insn (insn);
@@ -3169,7 +3169,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0);
  }
 
-   gcc_assert (GET_CODE (mem) == MEM);
+   gcc_assert (MEM_P (mem));
MEM_READONLY_P (mem) = 1;
MEM_NOTRAP_P (mem) = 1;
emit_insn (insn);
@@ -4235,7 +4235,7 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
generate,
 bool
 aarch64_mov128_immediate (rtx imm)
 {
-  if (GET_CODE (imm) == CONST_INT)
+  if (CONST_INT_P (imm))
 return true;
 
   gcc_assert (CONST_WIDE_INT_NUNITS (imm) == 2);
@@ -5099,8 +5099,8 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 
   /* Check on what type of symbol it is.  */
   scalar_int_mode int_mode;
-  if ((GET_CODE (imm) == SYMBOL_REF
-   || GET_CODE (imm) == LABEL_REF
+  if ((SYMBOL_REF_P (imm)
+   || LABEL_REF_P (imm)
|| GET_CODE (imm) == CONST
|| GET_CODE (imm) == CONST_POLY_INT)
   && is_a  (mode, &int_mode))
@@ -5390,9 +5390,9 @@ bool
 aarch64_maybe_expand_sve_subreg_move (rtx dest, rtx src)
 {
   gcc_assert (BYTES_BIG_ENDIAN);
-  if (GET_CODE (dest) == SUBREG)
+  if (SUBREG_P (dest))
 dest = SUBREG_REG (dest);
-  if (GET_CODE (src) == SUBREG)
+  if (SUBREG_P (src))
 src = SUBREG_REG (src);
 
   /* The optimization handles two single SVE REGs with different element
@@ -8533,7 +8533,7 @@ aarch64_tls_referenced_p (rtx x)
   FOR_EACH_SUBRTX (iter, array, x, ALL)
 {
   const_rtx x = *iter;
-  if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x) != 0)
+  if (SYMBOL_REF_P (x) && SYMBOL_REF_TLS_MODEL (x) != 0)
return true;
   /* Don't recurse into UNSPEC_TLS looking for TLS symbols; these are
 TLS offsets, not real symbol references.  */
@@ -8759,7 +8759,7 @@ aarch64_cannot_force_const_mem (machine_mode mode 
ATTRIBUTE_UNUSED, rtx x)
 
   poly_int64 offset;
   rtx base = strip_offset_and_salt (x, &offset);
-  if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
+  if (SYMBOL_REF_P (base) || LABEL_REF_P (base))
 {
   /* We checked for POLY_INT_CST offsets above.  */
   if (aarch64_classify_symbol (base, offset.to_constant ())
@@ -8845,7 +8845,7 @@ static bool
 aarch64_base_register_rtx_p (rtx x, bool strict_p)
 {
   if (!strict_p
-  && GET_CODE (x) == SUBREG
+  && SUBREG_P (x)
   && contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (x))])
 x = SUBREG_REG (x);
 
@@ -8864,7 +8864,7 @@ aarch64_classify_index (struct aarch64_address_info 
*info, rtx x,

Re: [PATCH] Cleanup irange::set.

2020-11-10 Thread Aldy Hernandez via Gcc-patches

(actually I can see 3245 ICEs on aarch64)

Can you fix it?


Sure can.

Richard, I seem to have incorrectly removed the early exit for varying, 
and that affected the changes you made for poly ints.  Is there any 
reason we can't just exit and set varying without checking for kind != 
VR_VARYING?


The attached patch fixes the problem for aarch64.

Testing in progress...
Aldy

Early exit from irange::set for poly ints.

My previous cleanups to irange::set moved the early exit when
VARYING.  This caused poly int varyings to be created with
incorrect min/max.

We can just set varying and exit for all poly ints.

gcc/ChangeLog:

* value-range.cc (irange::set): Early exit for poly ints.

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 61f7da278d6..178ef1b0a4f 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -249,9 +249,11 @@ irange::set (tree min, tree max, value_range_kind kind)
   return;
 }

-  if (kind != VR_VARYING
-  && (POLY_INT_CST_P (min) || POLY_INT_CST_P (max)))
-kind = VR_VARYING;
+  if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
+{
+  set_varying (TREE_TYPE (min));
+  return;
+}

   // Nothing to canonicalize for symbolic ranges.
   if (TREE_CODE (min) != INTEGER_CST



Re: RFC: Monitoring old PRs, new dg directives [v2]

2020-11-10 Thread Thomas Schwinge
Hi!

On 2020-08-10T17:30:21-0400, Marek Polacek via Gcc-patches 
 wrote:
> This patch adds a new DejaGNU directive, dg-ice, as outlined in the
> proposal here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550913.html
>
> It means that it's expected that the compiler crashes with an internal
> compiler error when compiling test with such a directive.

Thanks, I find this useful.


So I have a testcase that currently ICEs...  ;-)

spawn -ignore SIGHUP [xgcc]
during GIMPLE pass: omplower
In file included from [...]:4:
[...]: In function 'f_data':
[...]:41:10: internal compiler error: in lower_omp_target, at 
omp-low.c:11890
[backtrace]
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1

I put '{ dg-ice }' into it, and do get the expected:

XFAIL: [...] (internal compiler error)'

But I also get an unexpected:

FAIL: [...] (test for excess errors)'

That's because of:

Excess errors:
during GIMPLE pass: omplower

I can easily '{ dg-prune-output "during GIMPLE pass: omplower" }', of
course, or should this lead-in be pruned internally, automatically?


And then, this is a '{ dg-do run }' testcase, I see:

UNRESOLVED: 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  
compilation failed to produce executable

Given '{ dg-ice }', it's obvious that we don't get an executable to run.
I thought that maybe '{ dg-ice }' testcases should automatically be
demoted into '{ dg-do compile }', but that would be wrong for link-time
ICEs.  But can we "expect" the UNRESOLVED, and turn that into another
XFAIL?  Aha, actually I see that for '{ dg-do link }', this seems to
behave as expected (no UNRESOLVED), so we might demote all '{ dg-ice }'
testcases from '{ dg-do run }' to '{ dg-do link }', if that makes sense?


Grüße
 Thomas


> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 63216a0daba..967cb135cb4 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1172,6 +1172,16 @@ Expect the execute step of a test to fail if the 
> conditions (which are
>  the same as for @code{dg-skip-if}) are met.
>  @end table
>
> +@subsubsection Expect the compiler to crash
> +
> +@table @code
> +@item  @{ dg-ice @var{comment} [@{ @var{selector} @} [@{ @var{include-opts} 
> @} [@{ @var{exclude-opts} @}]]] @}
> +Expect the compiler to crash with an internal compiler error and return
> +a nonzero exit status if the conditions (which are the same as for
> +@code{dg-skip-if}) are met.  Used for tests that test bugs that have not been
> +fixed yet.
> +@end table
> +
>  @subsubsection Expect the test executable to fail
>
>  @table @code
> diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> index 45d97024883..e8ad3052657 100644
> --- a/gcc/testsuite/lib/gcc-dg.exp
> +++ b/gcc/testsuite/lib/gcc-dg.exp
> @@ -308,13 +308,27 @@ proc gcc-dg-test-1 { target_compile prog do_what 
> extra_tool_flags } {
>  verbose "$target_compile $prog $output_file $compile_type $options" 4
>  set comp_output [$target_compile "$prog" "$output_file" "$compile_type" 
> $options]
>
> +global expect_ice
>  # Look for an internal compiler error, which sometimes masks the fact
>  # that we didn't get an expected error message.  XFAIL an ICE via
>  # dg-xfail-if and use { dg-prune-output ".*internal compiler error.*" }
>  # to avoid a second failure for excess errors.
> -if [string match "*internal compiler error*" $comp_output] {
> +# "Error reporting routines re-entered" ICE says "Internal" rather than
> +# "internal", so match that too.
> +if [string match {*[Ii]nternal compiler error*} $comp_output] {
>   upvar 2 name name
> - fail "$name (internal compiler error)"
> + if { $expect_ice == 0 } {
> +   fail "$name (internal compiler error)"
> + } else {
> +   # We expected an ICE and we got it.
> +   xfail "$name (internal compiler error)"
> +   # Prune the ICE from the output.
> +   set comp_output [prune_ices $comp_output]
> + }
> +} elseif { $expect_ice == 1 } {
> + upvar 2 name name
> + # We expected an ICE but we didn't get it.
> + xpass "$name (internal compiler error)"
>  }
>
>  if { $do_what == "repo" } {
> @@ -939,6 +953,7 @@ if { [info procs saved-dg-test] == [list] } {
>   global additional_prunes
>   global compiler_conditional_xfail_data
>   global shouldfail
> + global expect_ice
>   global testname_with_flags
>   global set_target_env_var
>   global set_compiler_env_var
> @@ -954,6 +969,7 @@ if { [info procs saved-dg-test] == [list] } {
>   set additional_sources_used ""
>   set additional_prunes ""
>   set shouldfail 0
>

Re: [PATCH] Use extremes of underlying type when building VR_ANTI_RANGEs.

2020-11-10 Thread Andrew MacLeod via Gcc-patches

On 11/10/20 5:41 AM, Aldy Hernandez wrote:

[Andrew, this doesn't fix a PR per se.  It's just something I stumbled
upon in my other -fstrict-enums fixes.  It passes tests.  What do you
think?  Should we push it?]

Imagine an enum in an -fstrict-enums world:

enum foo { zero, one, two, three };

Building the inverse of [0,3] currently yields VARYING (instead of
[4,+INF]).  This is because the setter sees 0 and 3 as the extremes of
the type, and per the current code, calculates the inverse to VARYING.
BTW, it really should be UNDEFINED, but this is legacy code I'm afraid
to touch:


Didnt we conclude at one point that neither VARYING nor UNDEFINED can be 
inverted?  There was too much ambiguity about what it meant, so callers 
need to check for undefined and varying before they can use invert.  
This makes sure they get their expected results.


in fact, I see this in invert():

gcc_assert (!undefined_p () && !varying_p ());


but it comes after the legacy check..  so maybe there is some attempt to 
preserve legacy behaviour..  dunno.





   if (is_min && is_max)
{
  /* We cannot deal with empty ranges, drop to varying.
 ???  This could be VR_UNDEFINED instead.  */
  set_varying (type);
  return;
}

However, the extremes of a VARYING are the extremes of the underlying
type ([0, 0xff..ff]), even in the presence of -fstrict-enums.

This fixes the setter to use the extremes of the underlying type.

gcc/ChangeLog:

* value-range.cc (irange::set): Use wi::{min,max}_value
instead of TYPE_{MIN,MAX}_VALUE when building anti-ranges.
(range_tests_strict_enum): New tests.
---
  gcc/value-range.cc | 27 +--
  1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index f83a824a982..bbd7d69a766 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -274,11 +274,13 @@ irange::set (tree min, tree max, value_range_kind kind)
// Anti-ranges that can be represented as ranges should be so.
if (kind == VR_ANTI_RANGE)
  {
-  /* For -fstrict-enums we may receive out-of-range ranges so consider
- values < -INF and values > INF as -INF/INF as well.  */
-  bool is_min = vrp_val_is_min (min);
-  bool is_max = vrp_val_is_max (max);
tree type = TREE_TYPE (min);
+  unsigned prec = TYPE_PRECISION (type);
+  signop sign = TYPE_SIGN (type);
+  wide_int type_max = wi::max_value (prec, sign);
+  wide_int type_min = wi::min_value (prec, sign);
+  bool is_min = wi::to_wide (min) == type_min;
+  bool is_max = wi::to_wide (max) == type_max;
  
if (is_min && is_max)

{
@@ -302,14 +304,14 @@ irange::set (tree min, tree max, value_range_kind kind)
  {
  tree one = build_int_cst (TREE_TYPE (max), 1);
  min = int_const_binop (PLUS_EXPR, max, one);
- max = vrp_val_max (TREE_TYPE (max));
+ max = wide_int_to_tree (type, type_max);
  kind = VR_RANGE;
  }
else if (is_max)
  {
  tree one = build_int_cst (TREE_TYPE (min), 1);
  max = int_const_binop (MINUS_EXPR, min, one);
- min = vrp_val_min (TREE_TYPE (min));
+ min = wide_int_to_tree (type, type_min);
  kind = VR_RANGE;
  }
  }
@@ -2276,6 +2278,19 @@ range_tests_strict_enum ()
ir1 = vr1;
ASSERT_TRUE (ir1 == vr1);
ASSERT_FALSE (ir1.varying_p ());
+
+  // Legacy inversion of [0, 3] should be [4, MAX].
+  vr1 = int_range<1> (build_int_cstu (rtype, 0), build_int_cstu (rtype, 3));
+  vr1.invert ();
+  tree type_max = wide_int_to_tree (rtype,
+   wi::max_value (TYPE_PRECISION (rtype),
+  TYPE_SIGN (rtype)));
+  ASSERT_TRUE (vr1 == int_range<1> (build_int_cstu (rtype, 4), type_max));
+
+  // Multi-range inversion of [0, 3] should be [4, MAX].
+  ir1 = int_range<2> (build_int_cstu (rtype, 0), build_int_cstu (rtype, 3));
+  ir1.invert ();
+  ASSERT_TRUE (vr1 == int_range<2> (build_int_cstu (rtype, 4), type_max));
  }
  
  static void




Re: Detect EAF flags in ipa-modref

2020-11-10 Thread Jan Hubicka
Hi,
here is updaed patch.

Honza

Bootstrapped/regtested x86_64-linux, OK (after the fnspec fixes)?

2020-11-10  Jan Hubicka  

* gimple.c: Include ipa-modref-tree.h and ipa-modref.h.
(gimple_call_arg_flags): Use modref to determine flags.
* ipa-modref.c: Include gimple-ssa.h, tree-phinodes.h,
tree-ssa-operands.h, stringpool.h and tree-ssanames.h.
(analyze_ssa_name_flags): Declare.
(modref_summary::useful_p): Summary is also useful if arg flags are
known.
(dump_eaf_flags): New function.
(modref_summary::dump): Use it.
(get_modref_function_summary): Be read for current_function_decl
being NULL.
(memory_access_to): New function.
(deref_flags): New function.
(call_lhs_flags): New function.
(analyze_parms): New function.
(analyze_function): Use it.
* ipa-modref.h (struct modref_summary): Add arg_flags.
* doc/invoke.texi (ipa-modref-max-depth): Document.
* params.opt (ipa-modref-max-depth): New param.

gcc/testsuite/ChangeLog:

2020-11-10  Jan Hubicka  

* gcc.dg/torture/pta-ptrarith-1.c: Escape parametrs.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d2a188d7c75..0bd76d2841e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12953,6 +12953,10 @@ memory locations using the mod/ref information.  This 
parameter ought to be
 bigger than @option{--param ipa-modref-max-bases} and @option{--param
 ipa-modref-max-refs}.
 
+@item ipa-modref-max-depth
+Specified the maximum depth of DFS walk used by modref escape analysis.
+Setting to 0 disables the analysis completely.
+
 @item profile-func-internal-id
 A parameter to control whether to use function internal id in profile
 database lookup. If the value is 0, the compiler uses an id that
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 1afed88e1f1..da90716aa23 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "langhooks.h"
 #include "attr-fnspec.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 
 /* All the tuples have their operand vector (if present) at the very bottom
@@ -1532,24 +1534,45 @@ int
 gimple_call_arg_flags (const gcall *stmt, unsigned arg)
 {
   attr_fnspec fnspec = gimple_call_fnspec (stmt);
-
-  if (!fnspec.known_p ())
-return 0;
-
   int flags = 0;
 
-  if (!fnspec.arg_specified_p (arg))
-;
-  else if (!fnspec.arg_used_p (arg))
-flags = EAF_UNUSED;
-  else
+  if (fnspec.known_p ())
 {
-  if (fnspec.arg_direct_p (arg))
-   flags |= EAF_DIRECT;
-  if (fnspec.arg_noescape_p (arg))
-   flags |= EAF_NOESCAPE;
-  if (fnspec.arg_readonly_p (arg))
-   flags |= EAF_NOCLOBBER;
+  if (!fnspec.arg_specified_p (arg))
+   ;
+  else if (!fnspec.arg_used_p (arg))
+   flags = EAF_UNUSED;
+  else
+   {
+ if (fnspec.arg_direct_p (arg))
+   flags |= EAF_DIRECT;
+ if (fnspec.arg_noescape_p (arg))
+   flags |= EAF_NOESCAPE;
+ if (fnspec.arg_readonly_p (arg))
+   flags |= EAF_NOCLOBBER;
+   }
+}
+  tree callee = gimple_call_fndecl (stmt);
+  if (callee)
+{
+  cgraph_node *node = cgraph_node::get (callee);
+  modref_summary *summary = node ? get_modref_function_summary (node)
+   : NULL;
+
+  if (summary && summary->arg_flags.length () > arg)
+   {
+ int modref_flags = summary->arg_flags[arg];
+
+ /* We have possibly optimized out load.  Be conservative here.  */
+ if (!node->binds_to_current_def_p ())
+   {
+ if ((modref_flags & EAF_UNUSED) && !(flags & EAF_UNUSED))
+   modref_flags &= ~EAF_UNUSED;
+ if ((modref_flags & EAF_DIRECT) && !(flags & EAF_DIRECT))
+   modref_flags &= ~EAF_DIRECT;
+   }
+ flags |= modref_flags;
+   }
 }
   return flags;
 }
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 3f46bebed3c..30e76580fb0 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -61,6 +61,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "attr-fnspec.h"
 #include "symtab-clones.h"
+#include "gimple-ssa.h"
+#include "tree-phinodes.h"
+#include "tree-ssa-operands.h"
+#include "ssa-iterators.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
+
+static int analyze_ssa_name_flags (tree name,
+  vec &known_flags, int depth);
 
 /* We record fnspec specifiers for call edges since they depends on actual
gimple statements.  */
@@ -186,6 +195,8 @@ modref_summary::useful_p (int ecf_flags)
 {
   if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
 return false;
+  if (arg_flags.length ())
+return true;
   if (loads && !loads->every_base)
 return true;
   if (ecf_flags & ECF_PURE)
@@ -355,6 +366,22 @@ dump_lto_records (modref_records_lto *tt, FIL

Re: [PATCH] c, c++: Fix up -Wunused-value on COMPLEX_EXPRs [PR97748]

2020-11-10 Thread Jason Merrill via Gcc-patches

On 11/10/20 3:48 AM, Jakub Jelinek wrote:

On Mon, Nov 09, 2020 at 10:34:00PM +0100, Jakub Jelinek via Gcc-patches wrote:

On Mon, Nov 09, 2020 at 02:35:58PM -0500, Jason Merrill wrote:

How about calling warn_if_unused_value instead of the new
warn_if_unused_value_p?


That seems to work if I just replace the warning_at call with
warn_if_unused_value call (at least no regression in check-c++-all and
libstdc++ testsuite).
Initially I've tried just calling warn_if_unused_value without the NOP_EXPR
stripping and code/tclass checks, but that regressed a few tests, e.g.
g++.dg/warn/Wunused-14.C or c-c++-common/Wunused-var-9.c or 3 lines
in the new test, e.g. because STATEMENT_LIST or CLEANUP_POINT_EXPRs would
make it through and resulted in bogus warnings.


Bootstrapped/regtested on x86_64-linux and i686-linux successfully now.
If we wanted to simplify, I think the && !(code == ) part could be
dropped too, i.e. just
  if (tclass == tcc_comparison
  || tclass == tcc_unary
  || tclass == tcc_binary
  || code == VEC_PERM_EXPR
  || code == VEC_COND_EXPR)
warn_if_unused_value (e, loc);
because warn_if_unused_value already returns false on those codes.


OK with that change.

Jason



Re: [PATCH] Cleanup irange::set.

2020-11-10 Thread Richard Sandiford via Gcc-patches
Aldy Hernandez  writes:
>> (actually I can see 3245 ICEs on aarch64)
>> 
>> Can you fix it?
>
> Sure can.
>
> Richard, I seem to have incorrectly removed the early exit for varying, 
> and that affected the changes you made for poly ints.  Is there any 
> reason we can't just exit and set varying without checking for kind != 
> VR_VARYING?

No reason, it seemed more natural to drop to a lower kind with the old
code :-)  (But not with the new code.)

But it isn't obvious to me why the code is now structed the way it is.

  if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
{
  set_varying (TREE_TYPE (min));
  return;
}

  // Nothing to canonicalize for symbolic ranges.
  if (TREE_CODE (min) != INTEGER_CST
  || TREE_CODE (max) != INTEGER_CST)
{
  m_kind = kind;
  m_base[0] = min;
  m_base[1] = max;
  m_num_ranges = 1;
  return;
}

  swap_out_of_order_endpoints (min, max, kind);
  if (kind == VR_VARYING)
{
  set_varying (TREE_TYPE (min));
  return;
}

Why do we want to check “min” and “max” being INTEGER_CST before “kind”
being VR_VARYING, and the potentially record VR_VARYING with specific
bounds?  And why do we want to swap the “min” and “max” before checking
whether “kind” is VR_VARYING (when we'll then drop the min and max anyway)?
I think this would benefit from a bit more commentary at least.

Thanks,
Richard


>
> The attached patch fixes the problem for aarch64.
>
> Testing in progress...
> Aldy
>
>  Early exit from irange::set for poly ints.
>
>  My previous cleanups to irange::set moved the early exit when
>  VARYING.  This caused poly int varyings to be created with
>  incorrect min/max.
>
>  We can just set varying and exit for all poly ints.
>
>  gcc/ChangeLog:
>
>  * value-range.cc (irange::set): Early exit for poly ints.
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 61f7da278d6..178ef1b0a4f 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -249,9 +249,11 @@ irange::set (tree min, tree max, value_range_kind kind)
> return;
>   }
>
> -  if (kind != VR_VARYING
> -  && (POLY_INT_CST_P (min) || POLY_INT_CST_P (max)))
> -kind = VR_VARYING;
> +  if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
> +{
> +  set_varying (TREE_TYPE (min));
> +  return;
> +}
>
> // Nothing to canonicalize for symbolic ranges.
> if (TREE_CODE (min) != INTEGER_CST


Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-10 Thread Jozef Lawrynowicz
On Mon, Nov 09, 2020 at 12:31:09PM -0800, H.J. Lu via Gcc-patches wrote:
> On Mon, Nov 9, 2020 at 11:56 AM Jozef Lawrynowicz
>  wrote:
> >
> > On Mon, Nov 09, 2020 at 10:36:07AM -0800, H.J. Lu via Gcc-patches wrote:
> > > On Mon, Nov 9, 2020 at 9:41 AM Jozef Lawrynowicz
> > >  wrote:
> > > >
> > > > On Fri, Nov 06, 2020 at 04:39:33PM -0800, H.J. Lu via Gcc-patches wrote:
> > > > > On Fri, Nov 6, 2020 at 4:17 PM Jeff Law  wrote:
> > > > > >
> > > > > >
> > > > > > On 11/6/20 5:13 PM, H.J. Lu wrote:
> > > > > > > On Fri, Nov 6, 2020 at 4:01 PM Jeff Law  wrote:
> > > > > > >>
> > > > > > >> On 11/6/20 4:45 PM, H.J. Lu wrote:
> > > > > > >>> On Fri, Nov 6, 2020 at 3:37 PM Jeff Law  wrote:
> > > > > >  On 11/6/20 4:29 PM, H.J. Lu wrote:
> > > > > > > On Fri, Nov 6, 2020 at 3:22 PM Jeff Law  
> > > > > > > wrote:
> > > > > > >> On 11/5/20 7:34 AM, H.J. Lu via Gcc-patches wrote:
> > > > > > >>> On Thu, Nov 5, 2020 at 3:37 AM Jozef Lawrynowicz
> > > > > > >>>  wrote:
> > > > > >  On Thu, Nov 05, 2020 at 06:21:21AM -0500, Hans-Peter 
> > > > > >  Nilsson wrote:
> > > > > > > On Wed, 4 Nov 2020, H.J. Lu wrote:
> > > > > > >> .retain is ill-defined.   For example,
> > > > > > >>
> > > > > > >> [hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c
> > > > > > >> static int xyzzy __attribute__((__used__));
> > > > > > >> [hjl@gnu-cfl-2 gcc]$ ./xgcc -B./ -S /tmp/x.c -fcommon
> > > > > > >> [hjl@gnu-cfl-2 gcc]$ cat x.s
> > > > > > >> .file "x.c"
> > > > > > >> .text
> > > > > > >> .retain xyzzy  < What does it do?
> > > > > > >> .local xyzzy
> > > > > > >> .comm xyzzy,4,4
> > > > > > >> .ident "GCC: (GNU) 11.0.0 20201103 (experimental)"
> > > > > > >> .section .note.GNU-stack,"",@progbits
> > > > > > >> [hjl@gnu-cfl-2 gcc]$
> > > > > > > To answer that question: it's up to the assembler, but 
> > > > > > > for ELF
> > > > > > > and SHF_GNU_RETAIN, it seems obvious it'd tell the 
> > > > > > > assembler to
> > > > > > > set SHF_GNU_RETAIN for the section where the symbol ends 
> > > > > > > up.
> > > > > > > We both know this isn't rocket science with binutils.
> > > > > >  Indeed, and my patch handles it trivially:
> > > > > >  https://sourceware.org/pipermail/binutils/2020-November/113993.html
> > > > > > 
> > > > > >    +void
> > > > > >    +obj_elf_retain (int arg ATTRIBUTE_UNUSED)
> > > > > >     snip 
> > > > > >    +  sym = get_sym_from_input_line_and_check ();
> > > > > >    +  symbol_get_obj (sym)->retain = 1;
> > > > > > 
> > > > > >    @@ -2624,6 +2704,9 @@ elf_frob_symbol (symbolS *symp, 
> > > > > >  int *puntp)
> > > > > >  }
> > > > > > }
> > > > > > 
> > > > > >    +  if (symbol_get_obj (symp)->retain)
> > > > > >    +elf_section_flags (S_GET_SEGMENT (symp)) |= 
> > > > > >  SHF_GNU_RETAIN;
> > > > > >    +
> > > > > >   /* Double check weak symbols.  */
> > > > > >   if (S_IS_WEAK (symp))
> > > > > > {
> > > > > > 
> > > > > >  We could check that the symbol named in the .retain 
> > > > > >  directive has
> > > > > >  already been defined, however this isn't compatible with 
> > > > > >  GCC
> > > > > >  mark_decl_preserved handling, since mark_decl_preserved is 
> > > > > >  called
> > > > > >  emitted before the local symbols are defined in the 
> > > > > >  assembly output
> > > > > >  file.
> > > > > > 
> > > > > >  GAS should at least validate that the symbol named in the 
> > > > > >  .retain
> > > > > >  directive does end up as a symbol though.
> > > > > > 
> > > > > > >>> Don't add .retain.
> > > > > > >> Why?  I don't see why you find it so objectionable.
> > > > > > >>
> > > > > > > An ELF symbol directive should operate on symbol table:
> > > > > > >
> > > > > > > http://www.sco.com/developers/gabi/latest/ch4.symtab.html
> > > > > > >
> > > > > > > not the section flags where the symbol is defined.
> > > > > >  I agree in general, but I think this is one of those cases 
> > > > > >  where it's
> > > > > >  not so clear.  And what you're talking about is an 
> > > > > >  implementation detail.
> > > > > > >>> There is no need for such a hack.  The proper thing to do in 
> > > > > > >>> ELF is
> > > > > > >>> to place such a symbol in a section with SHF_GNU_RETAIN flag.   
> > > > > > >>> This
> > > > > > >>> also avoids the question what to do with SHN_COMMON.
> > > > > > >> I'm not sure that's a good idea either.  Moving symbols into a 
> > > > > > >> section
> > > > > > >> other than they'd normally live doesn't seem all t

Re: [PATCH] generalized range_query class for multiple contexts

2020-11-10 Thread Martin Sebor via Gcc-patches

On 11/6/20 2:54 PM, Andrew MacLeod wrote:

On 11/6/20 11:13 AM, Martin Sebor wrote:

On 11/5/20 8:08 PM, Andrew MacLeod wrote:

On 11/5/20 7:50 PM, Martin Sebor wrote:

On 11/5/20 5:02 PM, Andrew MacLeod wrote:

On 11/5/20 4:02 PM, Martin Sebor wrote:

On 11/5/20 12:29 PM, Martin Sebor wrote:

On 10/1/20 11:25 AM, Martin Sebor wrote:

On 10/1/20 9:34 AM, Aldy Hernandez wrote:



On 10/1/20 3:22 PM, Andrew MacLeod wrote:
 > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
 >>> Thanks for doing all this!  There isn't anything I don't 
understand
 >>> in the sprintf changes so no questions from me (well, 
almost none).

 >>> Just some comments:
 >> Thanks for your comments on the sprintf/strlen API conversion.
 >>
 >>> The current call statement is available in all functions 
that take
 >>> a directive argument, as dir->info.callstmt.  There should 
be no need
 >>> to also add it as a new argument to the functions that now 
need it.

 >> Fixed.
 >>
 >>> The change adds code along these lines in a bunch of places:
 >>>
 >>> + value_range vr;
 >>> + if (!query->range_of_expr (vr, arg, stmt))
 >>> +   vr.set_varying (TREE_TYPE (arg));
 >>>
 >>> I thought under the new Ranger APIs when a range couldn't be
 >>> determined it would be automatically set to the maximum for
 >>> the type.  I like that and have been moving in that direction
 >>> with my code myself (rather than having an API fail, have it
 >>> set the max range and succeed).
 >> I went through all the above idioms and noticed all are 
being used on
 >> supported types (integers or pointers). So range_of_expr 
will always

 >> return true.  I've removed the if() and the set_varying.
 >>
 >>> Since that isn't so in this case, I think it would still 
be nice
 >>> if the added code could be written as if the range were 
set to
 >>> varying in this case and (ideally) reduced to just 
initialization:

 >>>
 >>> value_range vr = some-function (query, stmt, arg);
 >>>
 >>> some-function could be an inline helper defined just for 
the sprintf
 >>> pass (and maybe also strlen which also seems to use the 
same pattern),
 >>> or it could be a value_range AKA irange ctor, or it could 
be a member

 >>> of range_query, whatever is the most appropriate.
 >>>
 >>> (If assigning/copying a value_range is thought to be too 
expensive,
 >>> declaring it first and then passing it to that helper to 
set it

 >>> would work too).
 >>>
 >>> In strlen, is the removed comment no longer relevant? 
(I.e., does

 >>> the ranger solve the problem?)
 >>>
 >>> -  /* The range below may be "inaccurate" if a 
constant has been
 >>> -    substituted earlier for VAL by this pass that 
hasn't been
 >>> -    propagated through the CFG. This shoud be fixed 
by the new
 >>> -    on-demand VRP if/when it becomes available 
(hopefully in

 >>> -    GCC 11).  */
 >> It should.
 >>
 >>> I'm wondering about the comment added to 
get_range_strlen_dynamic

 >>> and other places:
 >>>
 >>> + // FIXME: Use range_query instead of global ranges.
 >>>
 >>> Is that something you're planning to do in a followup or 
should

 >>> I remember to do it at some point?
 >> I'm not planning on doing it.  It's just a reminder that it 
would be

 >> beneficial to do so.
 >>
 >>> Otherwise I have no concern with the changes.
 >> It's not cleared whether Andrew approved all 3 parts of the 
patchset
 >> or just the valuation part.  I'll wait for his nod before 
committing

 >> this chunk.
 >>
 >> Aldy
 >>
 > I have no issue with it, so OK.

Pushed all 3 patches.

 >
 > Just an observation that should be pointed out, I believe 
Aldy has all
 > the code for converting to a ranger, but we have not pursued 
that any
 > further yet since there is a regression due to our lack of 
equivalence
 > processing I think?  That should be resolved in the coming 
month, but at
 > the moment is a holdback/concern for converting these 
passes... iirc.


Yes.  Martin, the take away here is that the strlen/sprintf 
pass has been converted to the new API, but ranger is still not 
up and running on it (even on the branch).


With the new API, all you have to do is remove all instances of 
evrp_range_analyzer and replace them with a ranger. That's it.
Below is an untested patch that would convert you to a ranger 
once it's contributed.


IIRC when I enabled the ranger for your pass a while back, 
there was one or two regressions due to missing equivalences, 
and the rest were because the tests were expecting an actual 
specific range, and the ranger returned a slightly 
different/better one. You'll need to adjust your tests.


Ack.  I'll be on the lookout for the ranger commit (if you hppen
to remember and CC me on it just in case I might miss it that would
be great).


I have applied the patch and ran some tests.  There are quite
a few failures (see the list below).  I have only looked at
a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
boils down to the fol

Re: [PATCH] Use extremes of underlying type when building VR_ANTI_RANGEs.

2020-11-10 Thread Aldy Hernandez via Gcc-patches




On 11/10/20 3:26 PM, Andrew MacLeod wrote:

On 11/10/20 5:41 AM, Aldy Hernandez wrote:

[Andrew, this doesn't fix a PR per se.  It's just something I stumbled
upon in my other -fstrict-enums fixes.  It passes tests.  What do you
think?  Should we push it?]

Imagine an enum in an -fstrict-enums world:

enum foo { zero, one, two, three };

Building the inverse of [0,3] currently yields VARYING (instead of
[4,+INF]).  This is because the setter sees 0 and 3 as the extremes of
the type, and per the current code, calculates the inverse to VARYING.
BTW, it really should be UNDEFINED, but this is legacy code I'm afraid
to touch:


Didnt we conclude at one point that neither VARYING nor UNDEFINED can be 
inverted?  There was too much ambiguity about what it meant, so callers 
need to check for undefined and varying before they can use invert. This 
makes sure they get their expected results.


Right, but this is in the setter.  We aren't inverting a varying, but 
creating the inverse of a range that happens to behave as a varying 
under the old rules.  Per the last set of fixes (or even the 
normalization code we had before), [0,3] is not varying.  [0,MAX] is 
varying.  What I'm trying to avoid is ~[0,3] be VARYING.  It should be 
[4,MAX].



in fact, I see this in invert():

gcc_assert (!undefined_p () && !varying_p ());


but it comes after the legacy check..  so maybe there is some attempt to 
preserve legacy behaviour..  dunno.


But you are correct that this is legacy behavior.  I don't know if 
having legacy and multi-range behavior different in this regard is good, 
so I proposed this fix.


I'm a bit indifferent.  We could leave it as is.

Aldy



libgo patch committed: Update to Go 1.15.4 release

2020-11-10 Thread Ian Lance Taylor via Gcc-patches
This patch updates libgo to the 1.15.4 release.  Bootstrapped and ran
Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
8815a8e97f8d0ee4066e737152c8198a80e9fe32
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 149f20077e3..e62578fc781 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-ae20684902b82883d3d65f2cde0894c7cb3b995b
+893fa057e36ae6c9b2ac5ffdf74634c35b3489c6
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/MERGE b/libgo/MERGE
index 4b158c05c6e..ad239c92fa9 100644
--- a/libgo/MERGE
+++ b/libgo/MERGE
@@ -1,4 +1,4 @@
-1984ee00048b63eacd2155cd6d74a2d13e998272
+0e953add9656c32a788e06438cd7b533e968b7f8
 
 The first line of this file holds the git revision number of the
 last merge done from the master library sources.
diff --git a/libgo/VERSION b/libgo/VERSION
index 93c7b2d3d78..baff2224d62 100644
--- a/libgo/VERSION
+++ b/libgo/VERSION
@@ -1 +1 @@
-go1.15.3
+go1.15.4
diff --git a/libgo/go/cmd/go/internal/modfetch/coderepo_test.go 
b/libgo/go/cmd/go/internal/modfetch/coderepo_test.go
index f69c193b86c..9a0cd7dba69 100644
--- a/libgo/go/cmd/go/internal/modfetch/coderepo_test.go
+++ b/libgo/go/cmd/go/internal/modfetch/coderepo_test.go
@@ -655,11 +655,6 @@ var codeRepoVersionsTests = []struct {
path: "swtch.com/testmod",
versions: []string{"v1.0.0", "v1.1.1"},
},
-   {
-   vcs:  "git",
-   path: "gopkg.in/russross/blackfriday.v2",
-   versions: []string{"v2.0.0", "v2.0.1"},
-   },
{
vcs:  "git",
path: "gopkg.in/natefinch/lumberjack.v2",
diff --git a/libgo/go/compress/flate/deflate_test.go 
b/libgo/go/compress/flate/deflate_test.go
index 3362d256cf4..8a22b8e6f92 100644
--- a/libgo/go/compress/flate/deflate_test.go
+++ b/libgo/go/compress/flate/deflate_test.go
@@ -11,6 +11,7 @@ import (
"internal/testenv"
"io"
"io/ioutil"
+   "math/rand"
"reflect"
"runtime/debug"
"sync"
@@ -896,6 +897,62 @@ func TestBestSpeedMaxMatchOffset(t *testing.T) {
}
 }
 
+func TestBestSpeedShiftOffsets(t *testing.T) {
+   // Test if shiftoffsets properly preserves matches and resets 
out-of-range matches
+   // seen in https://github.com/golang/go/issues/4142
+   enc := newDeflateFast()
+
+   // testData may not generate internal matches.
+   testData := make([]byte, 32)
+   rng := rand.New(rand.NewSource(0))
+   for i := range testData {
+   testData[i] = byte(rng.Uint32())
+   }
+
+   // Encode the testdata with clean state.
+   // Second part should pick up matches from the first block.
+   wantFirstTokens := len(enc.encode(nil, testData))
+   wantSecondTokens := len(enc.encode(nil, testData))
+
+   if wantFirstTokens <= wantSecondTokens {
+   t.Fatalf("test needs matches between inputs to be generated")
+   }
+   // Forward the current indicator to before wraparound.
+   enc.cur = bufferReset - int32(len(testData))
+
+   // Part 1 before wrap, should match clean state.
+   got := len(enc.encode(nil, testData))
+   if wantFirstTokens != got {
+   t.Errorf("got %d, want %d tokens", got, wantFirstTokens)
+   }
+
+   // Verify we are about to wrap.
+   if enc.cur != bufferReset {
+   t.Errorf("got %d, want e.cur to be at bufferReset (%d)", 
enc.cur, bufferReset)
+   }
+
+   // Part 2 should match clean state as well even if wrapped.
+   got = len(enc.encode(nil, testData))
+   if wantSecondTokens != got {
+   t.Errorf("got %d, want %d token", got, wantSecondTokens)
+   }
+
+   // Verify that we wrapped.
+   if enc.cur >= bufferReset {
+   t.Errorf("want e.cur to be < bufferReset (%d), got %d", 
bufferReset, enc.cur)
+   }
+
+   // Forward the current buffer, leaving the matches at the bottom.
+   enc.cur = bufferReset
+   enc.shiftOffsets()
+
+   // Ensure that no matches were picked up.
+   got = len(enc.encode(nil, testData))
+   if wantFirstTokens != got {
+   t.Errorf("got %d, want %d tokens", got, wantFirstTokens)
+   }
+}
+
 func TestMaxStackSize(t *testing.T) {
// This test must not run in parallel with other tests as 
debug.SetMaxStack
// affects all goroutines.
diff --git a/libgo/go/compress/flate/deflatefast.go 
b/libgo/go/compress/flate/deflatefast.go
index 24f8be9d5db..6aa439f13d9 100644
--- a/libgo/go/compress/flate/deflatefast.go
+++ b/libgo/go/compress/flate/deflatefast.go
@@ -270,6 +270,7 @@ func (e *deflateFast) matchLen(s, t int32, src []byte) 
int32 {
 func (e *deflateFast) reset() {
e.prev = e.prev[:0]
// Bump the offset, so all matches will fail distance check.
+   // Nothing should be >= e.cur in 

Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-11-10 Thread Jeff Law via Gcc-patches


On 10/5/20 10:49 AM, Stefan Kanthak wrote:
> The implementation of the functions __absv?i2(), __addv?i3() etc. for
> trapping integer overflow provided in libgcc2.c is rather bad.
> Same for __cmp?i2() and __ucmp?i2()
>
> At least for AMD64 and i386 processors GCC creates awful to horrible
> code for them: see 
> for some examples as well as the expected assembly.
>
> The attached diff/patch provides better implementations.

So you didn't indicate how this was tested.   The addition, subtraction,
negation and mulvsi3 are reasonable.  The mulvsi3 change introduced a
failure on m32r, but I'm highly confident it's an pre-existing simulator
bug.


But the abs changes and the other other multiply changes have undesired
LIBGCC2_BAD_CODE ifdefs that you need to clean up before we could accept
those changes.


What I don't understand is why the [u]cmpdi/cmpdi2 changes don't cause
infinite recursion.   I would expect the comparisons you're using to
compute the return value to themselves trigger calls to [u]cmpdi2.  I
can speculate that the expanders are open-coding the comparison, but in
that case I'd expect the libgcc2 routines to never be used, so
optimizing them is pointless :-)   But I put them through my tester and
couldn't see any evidence that they're causing problems though and
reviewed the generated code on m32r since it was convenient to do so.


Also note the ucmpdi changes require updating the prototype in
libgcc2.h, otherwise you get build failures building gcc itself.  I
fixed that minor omission.


So with all that in mind, I installed everything except the bits which
have the LIBGCC2_BAD_CODE ifdefs after testing on the various crosses. 
If you could remove the ifdefs on the abs/mult changes and resubmit them
it'd be appreciated.

Jeff



Re: [PATCH] C-family : Add attribute 'unavailable'.

2020-11-10 Thread Joseph Myers
This patch seems to be missing documentation for the new attribute in 
extend.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][MIPS] PR target/77510 Reduce size of MIPS core automaton

2020-11-10 Thread Paul Koning via Gcc-patches



> On Nov 10, 2020, at 6:42 AM, Xu Chenghua  wrote:
> 
> 
> Hi:
> 
> This patch reduce reservation of model do not more than 10 cycles. The memory 
> of genautomata down to 1GB.

Does this make any significant difference in performance of the generated code? 
 The original cycle counts are from the CPU description, so trimming them to a 
max of 10 means the model no longer reflects the real implementation.  It may 
be the difference is too small to worry about.  Can you comment on this?

paul




Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 08:29:58AM -0700, Jeff Law via Gcc-patches wrote:
> On 10/5/20 10:49 AM, Stefan Kanthak wrote:
> > The implementation of the functions __absv?i2(), __addv?i3() etc. for
> > trapping integer overflow provided in libgcc2.c is rather bad.
> > Same for __cmp?i2() and __ucmp?i2()
> >
> > At least for AMD64 and i386 processors GCC creates awful to horrible
> > code for them: see 
> > for some examples as well as the expected assembly.
> >
> > The attached diff/patch provides better implementations.
> 
> So you didn't indicate how this was tested.   The addition, subtraction,
> negation and mulvsi3 are reasonable.  The mulvsi3 change introduced a
> failure on m32r, but I'm highly confident it's an pre-existing simulator
> bug.

The formatting is wrong too (e.g. no space before ( when calling functions).

> What I don't understand is why the [u]cmpdi/cmpdi2 changes don't cause
> infinite recursion.   I would expect the comparisons you're using to
> compute the return value to themselves trigger calls to [u]cmpdi2.  I
> can speculate that the expanders are open-coding the comparison, but in
> that case I'd expect the libgcc2 routines to never be used, so
> optimizing them is pointless :-)

Yeah, I don't really understand these changes either.

Jakub



Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 8:57 AM, Jakub Jelinek wrote:
> On Tue, Nov 10, 2020 at 08:29:58AM -0700, Jeff Law via Gcc-patches wrote:
>> On 10/5/20 10:49 AM, Stefan Kanthak wrote:
>>> The implementation of the functions __absv?i2(), __addv?i3() etc. for
>>> trapping integer overflow provided in libgcc2.c is rather bad.
>>> Same for __cmp?i2() and __ucmp?i2()
>>>
>>> At least for AMD64 and i386 processors GCC creates awful to horrible
>>> code for them: see 
>>> for some examples as well as the expected assembly.
>>>
>>> The attached diff/patch provides better implementations.
>> So you didn't indicate how this was tested.   The addition, subtraction,
>> negation and mulvsi3 are reasonable.  The mulvsi3 change introduced a
>> failure on m32r, but I'm highly confident it's an pre-existing simulator
>> bug.
> The formatting is wrong too (e.g. no space before ( when calling functions).

Ugh.  I fixed that in one tree, but not in the tree I committed from. 
I'll adjust momentarily.

jeff



[PATCH/RFC v2] Add -fdiagnostics-path-format=html [v2]

2020-11-10 Thread David Malcolm via Gcc-patches
Here's an updated version of the HTML output idea from:
  https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556848.html

I reworked the HTML path output to show stack frames as well as just
runs of events, using drop-shadows to give a 3D look.  The idea is to try
to highlight the stack of frames as if it were an actual stack of
overlapping cards.

Updated HTML for the example from the earlier patch can be seen here:
  
https://dmalcolm.fedorapeople.org/gcc/2020-11-05/html-examples/test.c.path-1.html
As before, other examples can be seen in that directory, such as:
  Signal handler issue:

https://dmalcolm.fedorapeople.org/gcc/2020-11-05/html-examples/signal-1.c.path-1.html
  Leak due to longjmp past a "free":

https://dmalcolm.fedorapeople.org/gcc/2020-11-05/html-examples/setjmp-7.c.path-1.html

Other changes in v2:
* switched j and k in keyboard navigation so that j is "next event"
and k is "previous event"
* fixed event element IDs, fixing a bug where it assumed they were in
  ascending order
* moved HTML printing code out of path_summary and event_range
* more selftest coverage

As before, this approach merely emits the path information; it doesn't
capture the associated diagnostic.  I'm working on an alternate approach
for v3 of the patch that does that; doing that requires reworking
pretty_printer.

I'm not sure on exactly the correct approach here; for v3 I'm
experimenting with an "html" option for -fdiagnostics-format= which
when selected would supplement the existing text output, by additionally
writing out an HTML file for each diagnostic group, with path
information captured there.  Doing it per group means that the classic
"warning"/"note" pairs would be grouped together in that output.

I'm not sure what that ought to do with paths though; part of the point
of doing it is to have less verbose output on stderr whilst capturing
the pertinent information "on the side" in the HTML file.

Thoughts?

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, FWIW.

gcc/ChangeLog:
* common.opt (diagnostic_path_format): Add DPF_HTML.
* diagnostic-format-json.cc: Include "selftest.h".
* diagnostic-show-locus.c (colorizer::m_context): Replace with...
(colorizer::m_pp): ...this new field.
(layout::m_context): Make const.
(layout::m_is_html): New field.
(layout::m_html_label_writer): New field.
(colorizer::colorizer): Update for use of pp rather than context.
(colorizer::begin_state): Likewise.
(colorizer::finish_state): Likewise.
(colorizer::get_color_by_name): Likewise.
(layout::layout): Make "context" param const.  Add "pp", "is_html"
and "label_writer" params, with defaults.  Use them to initialize
new fields.  Update for change to m_colorizer.
(layout::print_source_line): Add HTML support.
(layout::start_annotation_line): Likewise.
(layout::print_annotation_line): Likewise.
(line_label::line_label): Make context const.  Add
"original_range_idx" param and use it to initialize...
(line_label::m_original_range_idx): ...this new field.
(layout::print_any_labels): Pass original index of range to
line_label ctor.  Add HTML support.
(get_affected_range): Make context const.
(get_printed_columns): Likewise.
(line_corrections::line_corrections): Likewise.
(line_corrections::m_context): Likewise.
(layout::print_line): Don't print fix-it hints for HTML support.
(diagnostic_show_locus_as_html): New.
(selftest::assert_html_eq): New.
(ASSERT_HTML_EQ): New.
(selftest::test_one_liner_simple_caret): Verify the HTML output.
(selftest::test_diagnostic_show_locus_fixit_lines): Likewise.
(selftest::test_html): New.
(selftest::diagnostic_show_locus_c_tests): Call it.
* diagnostic.c (diagnostic_show_any_path): Pass the location to
the print_path callback.
* diagnostic.h (enum diagnostic_path_format): Add DPF_HTML.
(diagnostic_context::num_html_paths): New field.
(diagnostic_context::print_path): Add location_t param.
(class html_label_writer): New.
(diagnostic_show_locus_as_html): New decl.
* doc/invoke.texi (Diagnostic Message Formatting Options): Add
"html" to -fdiagnostics-path-format=.
(-fdiagnostics-path-format=): Add html.
* pretty-print.c (pp_write_text_as_html_to_stream): New.
* pretty-print.h (pp_write_text_as_html_to_stream): New decl.
* selftest-diagnostic.c (selftest::html_printer::html_printer):
New ctor.
(selftest::html_printer::get_html_output): New.
* selftest-diagnostic.h (class selftest::html_printer): New.
* tree-diagnostic-path.cc: Define GCC_DIAG_STYLE where possible
and necessary.  Include "options.h" for dump_base_name.
(path_label::path_label): Add "colorize" param an

[PATCH] [PR target/97727] arm: [testsuite] fix some simd tests on armbe

2020-11-10 Thread Andrea Corallo via Gcc-patches
Hi all,

I'd like to submit this patch to fix three testcases reported to be
failing on arm big endian on PR target/97727.

Okay for trunk?

Thanks

  Andrea
  
>From c2d5faf49647381edbcdce6a709b5ed046ce13a4 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Tue, 10 Nov 2020 15:15:27 +0100
Subject: [PATCH] [PR target/97727] arm: [testsuite] fix some simd tests on
 armbe

2020-11-10  Andrea Corallo  

PR target/97727
* gcc.target/arm/simd/bf16_vldn_1.c: Relax regexps not to fail on
big endian.
* gcc.target/arm/simd/vldn_lane_bf16_1.c: Likewise
* gcc.target/arm/simd/vmmla_1.c: Add -mfloat-abi=hard flag.
---
 .../gcc.target/arm/simd/bf16_vldn_1.c | 48 +--
 .../gcc.target/arm/simd/vldn_lane_bf16_1.c| 30 +++-
 gcc/testsuite/gcc.target/arm/simd/vmmla_1.c   |  2 +-
 3 files changed, 43 insertions(+), 37 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c 
b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
index 222e7af9453..663e76984c3 100644
--- a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
@@ -10,8 +10,8 @@
 /*
 **test_vld2_bf16:
 ** ...
-** vld2.16 {d0-d1}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+-d[0-9]+}, \[r[0-9]+\]
+** ...
 */
 bfloat16x4x2_t
 test_vld2_bf16 (bfloat16_t * ptr)
@@ -22,8 +22,8 @@ test_vld2_bf16 (bfloat16_t * ptr)
 /*
 **test_vld2q_bf16:
 ** ...
-** vld2.16 {d0-d3}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+-d[0-9]+}, \[r[0-9]+\]
+** ...
 */
 bfloat16x8x2_t
 test_vld2q_bf16 (bfloat16_t * ptr)
@@ -34,8 +34,8 @@ test_vld2q_bf16 (bfloat16_t * ptr)
 /*
 **test_vld2_dup_bf16:
 ** ...
-** vld2.16 {d0\[\], d1\[\]}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+\[\], d[0-9]+\[\]}, \[r[0-9]+\]
+** ...
 */
 bfloat16x4x2_t
 test_vld2_dup_bf16 (bfloat16_t * ptr)
@@ -46,8 +46,8 @@ test_vld2_dup_bf16 (bfloat16_t * ptr)
 /*
 **test_vld2q_dup_bf16:
 ** ...
-** vld2.16 {d0, d1, d2, d3}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+, d[0-9]+, d[0-9]+, d[0-9]+}, \[r[0-9]+\]
+** ...
 */
 bfloat16x8x2_t
 test_vld2q_dup_bf16 (bfloat16_t * ptr)
@@ -58,8 +58,8 @@ test_vld2q_dup_bf16 (bfloat16_t * ptr)
 /*
 **test_vld3_bf16:
 ** ...
-** vld3.16 {d0-d2}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+-d[0-9]+}, \[r[0-9]+\]
+** ...
 */
 bfloat16x4x3_t
 test_vld3_bf16 (bfloat16_t * ptr)
@@ -70,8 +70,8 @@ test_vld3_bf16 (bfloat16_t * ptr)
 /*
 **test_vld3q_bf16:
 ** ...
-** vld3.16 {d1, d3, d5}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+, d[0-9]+, d[0-9]+}, \[r[0-9]+\]
+** ...
 */
 bfloat16x8x3_t
 test_vld3q_bf16 (bfloat16_t * ptr)
@@ -82,8 +82,8 @@ test_vld3q_bf16 (bfloat16_t * ptr)
 /*
 **test_vld3_dup_bf16:
 ** ...
-** vld3.16 {d0\[\], d1\[\], d2\[\]}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+\[\], d[0-9]+\[\], d[0-9]+\[\]}, \[r[0-9]+\]
+** ...
 */
 bfloat16x4x3_t
 test_vld3_dup_bf16 (bfloat16_t * ptr)
@@ -94,8 +94,8 @@ test_vld3_dup_bf16 (bfloat16_t * ptr)
 /*
 **test_vld3q_dup_bf16:
 ** ...
-** vld3.16 {d0\[\], d1\[\], d2\[\]}, \[r0\]
-** bx  lr
+** vld[0-9]+.16{d[0-9]+\[\], d[0-9]+\[\], d[0-9]+\[\]}, \[r[0-9]+\]
+** ...
 */
 bfloat16x8x3_t
 test_vld3q_dup_bf16 (bfloat16_t * ptr)
@@ -106,8 +106,8 @@ test_vld3q_dup_bf16 (bfloat16_t * ptr)
 /*
 **test_vld4_bf16:
 ** ...
-** vld4.16 {d0-d3}, \[r0\]
-** bx  lr
+** vld4.16 {d[0-9]+-d[0-9]+}, \[r[0-9]+\]
+** ...
 */
 bfloat16x4x4_t
 test_vld4_bf16 (bfloat16_t * ptr)
@@ -118,8 +118,8 @@ test_vld4_bf16 (bfloat16_t * ptr)
 /*
 **test_vld4q_bf16:
 ** ...
-** vld4.16 {d1, d3, d5, d7}, \[r0\]
-** bx  lr
+** vld4.16 {d[0-9]+, d[0-9]+, d[0-9]+, d[0-9]+}, \[r[0-9]+\]
+** ...
 */
 bfloat16x8x4_t
 test_vld4q_bf16 (bfloat16_t * ptr)
@@ -130,8 +130,8 @@ test_vld4q_bf16 (bfloat16_t * ptr)
 /*
 **test_vld4_dup_bf16:
 ** ...
-** vld4.16 {d0\[\], d1\[\], d2\[\], d3\[\]}, \[r0\]
-** bx  lr
+** vld4.16 {d[0-9]+\[\], d[0-9]+\[\], d[0-9]+\[\], d[0-9]+\[\]}, 
\[r[0-9]+\]
+** ...
 */
 bfloat16x4x4_t
 test_vld4_dup_bf16 (bfloat16_t * ptr)
@@ -142,8 +142,8 @@ test_vld4_dup_bf16 (bfloat16_t * ptr)
 /*
 **test_vld4q_dup_bf16:
 ** ...
-** vld4.16 {d0\[\], d1\[\], d2\[\], d3\[\]}, \[r0\]
-** bx  lr
+** vld4.16 {d[0-9]+\[\], d[0-9]+\[\], d[0-9]+\[\], d[0-9]+\[\]}, 
\[r[0-9]+\]
+** ...
 */
 bfloat16x8x4_t
 test_vld4q_dup_bf16 (bfloat16_t * ptr)
diff --git a/gcc/testsuite/gcc.target/arm/simd/vldn_lane_bf16_1.c 
b/gcc/testsuite/gcc.target/arm/simd/vldn_lane_bf16_1.c
index 58153eddf62..b235b1f33b9 100644
--- a/gcc/testsuite/gcc.target/arm/simd/vldn_lane_bf16_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/vldn_lane_bf16_1.c
@@ -8,8 +8,9 @@
 
 /*
 **test_vld2_lane_bf16:
-** vld2.16 {d0\[2\], d1\[2\]}, \[r0\]
-** bx  lr
+** ...
+**

Re: [PATCH 1/3] Refactor copying decl section names

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/12/19 11:28 PM, Strager Neds wrote:
> Sometimes, we need to copy a section name from one decl or symtab node
> to another. Currently, this is done by getting the source's section
> name and setting the destination's section name. For example:
>
> set_decl_section_name (dest, DECL_SECTION_NAME (source));
> dest->set_section (source->get_section ());
>
> This code could be more efficient. Section names are stored in an
> interning hash table, but the current interfaces of
> set_decl_section_name and symtab_node::set_section force unnecessary
> indirections (to get the section name) and hashing (to find the section
> name in the hash table).
>
> Overload set_decl_section_name and symtab_node::set_section to accept an
> existing symtab_node to copy the section name from:
>
> set_decl_section_name (dest, source);
> dest->set_section (*source);
>
> For now, implement these new functions as a simple wrapper around the
> existing functions. In the future, these functions can be implemented
> using just a pointer copy and an increment (for the reference count).
>
> This patch should not change behavior.
>
> Testing: Bootstrap on x86_64-linux-gnu with --disable-multilib
> --enable-checking=release --enable-languages=c,c++. Observe no change in
> test results.
>
> 2019-11-12  Matthew Glazar 
>
> * gcc/cgraph.h (symtab_node::get_section): Constify.
> (symtab_node::set_section): Declare new overload.
> * gcc/symtab.c (symtab_node::set_section): Define new overload.
> (symtab_node::copy_visibility_from): Use new overload of
> symtab_node::set_section.
> (symtab_node::resolve_alias): Same.
> * gcc/tree.h (set_decl_section_name): Declare new overload.
> * gcc/tree.c (set_decl_section_name): Define new overload.
> * gcc/c/c-decl.c (merge_decls): Use new overload of
> set_decl_section_name.
> * gcc/cp/decl.c (duplicate_decls): Same.
> * gcc/cp/method.c (use_thunk): Same.
> * gcc/cp/optimize.c (maybe_clone_body): Same.
> * gcc/d/decl.cc (finish_thunk): Same.
> * gcc/tree-emutls.c (get_emutls_init_templ_addr): Same.
> * gcc/cgraphclones.c (cgraph_node::create_virtual_clone): Use new
> overload of symtab_node::set_section.
> (cgraph_node::create_version_clone_with_body): Same.
> * gcc/trans-mem.c (ipa_tm_create_version): Same.

So only a year after the original post 


Presumably given that you're just adding an overload, there is no need
or intent to change all the callers to set_decl_section_name.   
Right?   I scanned the various remaining calls, some just use a string,
or build up a string based on a decl or something else.  I did find one
other call that should be converted in cp/coroutines.cc which almost
certainly went onto the trunk after your patch was submitted.  I've
adjusted that one.

Also note, your mailer seemed to mangle whitespace in your message.  It
makes it slightly harder to deal with, so if you plan on making regular
contributions, we should probably work to get that fixed.  If the
refactoring + subsequent bugfixes are all you're planning to do, then
I'll fix the whitespace issues manually.


Given the age of the patch, the need to manually apply bits (due to
whitespace issues) and the desire to fix coroutines.cc, I'm giving it a
fresh test spin. 


Jeff





Re: [PATCH] c++: Improve static_assert diagnostic [PR97518]

2020-11-10 Thread Jason Merrill via Gcc-patches

On 11/9/20 7:21 PM, Marek Polacek wrote:

Currently, when a static_assert fails, we only say "static assertion failed".
It would be more useful if we could also print the expression that
evaluated to false; this is especially useful when the condition uses
template parameters.  Consider the motivating example, in which we have
this line:

   static_assert(is_same::value);

if this fails, the user has to play dirty games to get the compiler to
print the template arguments.  With this patch, we say:

   static assertion failed due to requirement 'is_same::value'


I'd rather avoid the word "requirement" here, to avoid confusion with 
Concepts.


Maybe have the usual failed error, and if we're showing the expression, 
have a second inform to say e.g. "%qE evaluates to false"?


Also, if we've narrowed the problem down to a particular subexpression, 
let's only print that one.



which I think is much better.  However, always printing the condition that
evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
always parsed to true/false, so we would say "static assertion failed due
to requirement 'false'" which doesn't help.  So I wound up only printing
the condition when it was instantiation-dependent, that is, we called
finish_static_assert from tsubst_expr.

Moreover, this patch also improves the diagnostic when the condition
consists of a logical AND.  Say you have something like this:

   static_assert(fn1() && fn2() && fn3() && fn4() && fn5());

where fn4() evaluates to false and the other ones to true.  Highlighting
the whole thing is not that helpful because it won't say which clause
evaluated to false.  With the find_failing_clause tweak in this patch
we emit:

   error: static assertion failed
 6 | static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
   |  ~~~^~

so you know right away what's going on.  Unfortunately, when you combine
both things, that is, have an instantiation-dependent expr and && in
a static_assert, we can't yet quite point to the clause that failed.  It
is because when we tsubstitute something like is_same::value, we
generate a VAR_DECL that doesn't have any location.  It would be awesome
if we could wrap it with a location wrapper, but I didn't see anything
obvious.


Hmm, I vaguely remember that at first we were using location wrappers 
less in templates, but I thought that was fixed.  I don't see anything 
setting suppress_location_wrappers, and it looks like 
tsubst_copy_and_build should preserve a location wrapper.



Further tweak could be to print the failing clause of the condition if
possible, whether or not it was instantiation-dependent.


If not instantiation-dependent, underlining the clause seems sufficient.


In passing, I've cleaned up some things:
* use iloc_sentinel when appropriate,
* it's nicer to call contextual_conv_bool instead of the rather verbose
   perform_implicit_conversion_flags,
* no need to check for INTEGER_CST before calling integer_zerop.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97518
* cp-tree.h (finish_static_assert): Adjust declaration.
* parser.c (cp_parser_static_assert): Pass false to
finish_static_assert.
* pt.c (tsubst_expr): Pass true to finish_static_assert.
* semantics.c (find_failing_clause_r): New function.
(find_failing_clause): New function.
(finish_static_assert): Add a bool parameter.  Use
iloc_sentinel.  Call contextual_conv_bool instead of
perform_implicit_conversion_flags.  Don't check for INTEGER_CST before
calling integer_zerop.  Call find_failing_clause and maybe use its
location.  Print the original condition if SHOW_EXPR_P.

gcc/testsuite/ChangeLog:

PR c++/97518
* g++.dg/diagnostic/pr87386.C: Adjust expected output.
* g++.dg/diagnostic/static_assert1.C: New test.
* g++.dg/diagnostic/static_assert2.C: New test.

libcc1/ChangeLog:

PR c++/97518
* libcp1plugin.cc (plugin_add_static_assert): Pass false to
finish_static_assert.

libstdc++-v3/ChangeLog:

* testsuite/20_util/scoped_allocator/69293_neg.cc: Adjust dg-error.
* testsuite/20_util/uses_allocator/69293_neg.cc: Likewise.
* testsuite/20_util/uses_allocator/cons_neg.cc: Likewise.
* testsuite/26_numerics/random/pr60037-neg.cc: Likewise.
---
  gcc/cp/cp-tree.h  |  2 +-
  gcc/cp/parser.c   |  3 +-
  gcc/cp/pt.c   |  6 +-
  gcc/cp/semantics.c| 79 ---
  gcc/testsuite/g++.dg/diagnostic/pr87386.C |  2 +-
  .../g++.dg/diagnostic/static_assert1.C| 20 +
  .../g++.dg/diagnostic/static_assert2.C| 68 
  libcc1/libcp1plugin.cc|  2 +-
  .../20_util/scoped_allocator/69293_neg.cc |  2 +-
  .../20_uti

Re: [PATCH] c-family: Avoid unnecessary work when -Wpragmas is being ignored

2020-11-10 Thread David Malcolm via Gcc-patches
On Mon, 2020-11-09 at 10:38 -0500, Patrick Palka wrote:
> This speeds up handle_pragma_diagnostic by avoiding computing a
> spelling
> suggestion for an unrecognized option inside a #pragma directive when
> -Wpragmas warnings are being suppressed.
> 
> In the range-v3 library, which contains many instances of
> 
>   #pragma GCC diagnostic push
>   #pragma GCC diagnostic ignored "-Wpragmas"
>   #pragma GCC diagnostic ignored "-Wfoo"
>   ...
>   #pragma GCC diagnostic pop
> 
> compile time is reduced by 33% in some of its tests.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> for
> trunk?

Looks good to me.

Dave



Re: [Patch] Fortran: OpenMP 5.0 (in_,task_)reduction clause extensions

2020-11-10 Thread Tobias Burnus

Hi Jakub,

thanks for the comments. I have now removed
OMP_CLAUSE_REDUCTION{_DEFAULT,_MODIFIER} and use 'openacc' as flag for OpenACC.
The only 'default:' check is now moved to resolution stage.

If I gathered it correctly, the splitting is complex but the
way it is implemented, it happens to be fine.

We discussed that the _LIST part gets too long (but that's something
for later) and that 'omp scan' best get implemented during stage1.
(Additionally, I believe that the ME always complains about 'inscan'
not being in 'omp scan' – and it can't as 'omp scan' is not yet
supported.)

On 10.11.20 13:16, Jakub Jelinek wrote:


+  for (int i = OMP_LIST_REDUCTION; i <= OMP_LIST_REDUCTION_TASK; i++)
+{
+  if (mask & GFC_OMP_MASK_TEAMS)
+clausesa[GFC_OMP_SPLIT_TEAMS].lists[i]
+  = code->ext.omp_clauses->lists[i];
+  if (mask & GFC_OMP_MASK_PARALLEL)
+clausesa[GFC_OMP_SPLIT_PARALLEL].lists[i]
+  = code->ext.omp_clauses->lists[i];
+  else if (mask & GFC_OMP_MASK_DO)
+clausesa[GFC_OMP_SPLIT_DO].lists[i]
+  = code->ext.omp_clauses->lists[i];
+  if (mask & GFC_OMP_MASK_SIMD)
+clausesa[GFC_OMP_SPLIT_SIMD].lists[i]
+  = code->ext.omp_clauses->lists[i];
+}

Seems in the middle-end we just ignore OMP_CLAUSE_REDUCTION_TASK on anything
but parallel/{for,do}/sections, so guess this is ok.



Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
Fortran: OpenMP 5.0 (in_,task_)reduction clause extensions

gcc/fortran/ChangeLog:

	PR fortran/95847
	* dump-parse-tree.c (show_omp_clauses): Handle new reduction enums.
	* gfortran.h (OMP_LIST_REDUCTION_INSCAN, OMP_LIST_REDUCTION_TASK,
	OMP_LIST_IN_REDUCTION, OMP_LIST_TASK_REDUCTION): Add enums.
	* openmp.c (enum omp_mask1): Add OMP_CLAUSE_IN_REDUCTION
	and OMP_CLAUSE_TASK_REDUCTION.
	(gfc_match_omp_clause_reduction): Extend reduction handling;
	moved from ...
	(gfc_match_omp_clauses): ... here. Add calls to it.
	(OMP_TASK_CLAUSES, OMP_TARGET_CLAUSES, OMP_TASKLOOP_CLAUSES):
	Add OMP_CLAUSE_IN_REDUCTION.
	(gfc_match_omp_taskgroup): Add task_reduction matching.
	(resolve_omp_clauses): Update for new reduction clause changes;
	remove removed nonmonotonic-schedule restrictions.
	(gfc_resolve_omp_parallel_blocks): Add new enums to switch.
	* trans-openmp.c (gfc_omp_clause_default_ctor,
	gfc_trans_omp_reduction_list, gfc_trans_omp_clauses,
	gfc_split_omp_clauses): Handle updated reduction clause.

gcc/ChangeLog:

	PR fortran/95847
	* gimplify.c (gimplify_scan_omp_clauses, gimplify_omp_loop): Use 'do'
	instead of 'for' in error messages for Fortran.
	* omp-low.c (check_omp_nesting_restrictions): Likewise

gcc/testsuite/ChangeLog:

	PR fortran/95847
	* gfortran.dg/gomp/schedule-modifiers-2.f90: Remove some dg-error.
	* gfortran.dg/gomp/reduction4.f90: New test.
	* gfortran.dg/gomp/reduction5.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-1.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-2.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-3.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-4.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-5.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-6.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-7.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-8.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-9.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-10.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-11.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-12.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-13.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-14.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-15.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-16.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-17.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-18.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-19.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-20.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-21.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-22.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-23.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-24.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-25.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-26.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-27.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-28.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-29.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-30.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-31.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-32.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-33.f90: New test.
	* gfortran.dg/gomp/workshare-reduction-34.f90: New test.
	* gfortran.dg/gomp/workshare-reductio

Re: [PATCH] analyzer: remove dead code

2020-11-10 Thread David Malcolm via Gcc-patches
On Fri, 2020-10-23 at 17:26 +0200, Martin Liška wrote:
> Hey.
> 
> I've noticed that when building GCC with Clang.
> David what do you think about it?

Thanks for the patch.

Looks good to me, assuming it goes through normal bootstrap and
regression-testing.

That said, technically I'm not a maintainer for the analyzer, unless
you count it as part of "diagnostic messages"; I've been making very
liberal use of Jeff's permission to commit followup work when he
approved the initial patch kit last year (in the hope that the steering
committee will make it official at some point).

Dave



Re: [PATCH] analyzer: remove dead code

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 9:57 AM, David Malcolm via Gcc-patches wrote:
> On Fri, 2020-10-23 at 17:26 +0200, Martin Liška wrote:
>> Hey.
>>
>> I've noticed that when building GCC with Clang.
>> David what do you think about it?
> Thanks for the patch.
>
> Looks good to me, assuming it goes through normal bootstrap and
> regression-testing.
>
> That said, technically I'm not a maintainer for the analyzer, unless
> you count it as part of "diagnostic messages"; I've been making very
> liberal use of Jeff's permission to commit followup work when he
> approved the initial patch kit last year (in the hope that the steering
> committee will make it official at some point).

I think you should be able to ack stuff in the analyzer, I'd actually
deleted this from my queue on the assumption you'd take care of it. :-)


But just to be safe, Martin, the patch is fine :-)


jeff



Re: [PATCH] fortran: Fix up gfc_typename CHARACTER length handling [PR97768]

2020-11-10 Thread Tobias Burnus

On 10.11.20 10:58, Jakub Jelinek via Gcc-patches wrote:


The earlier version of the patch has been successfully bootstrapped
and regtested (only with a few regressions) on x86_64-linux and i686-linux,
this version passed all the new and earlier problematic tests, ok for trunk
if it passes another bootstrap/regtest?


LGTM. Thanks!

Tobias


2020-11-10  Jakub Jelinek  

  PR fortran/97768
  * misc.c (gfc_typename): Use ex->value.character.length only if
  ex->expr_type == EXPR_CONSTANT.  If ex->ts.deferred, print : instead
  of length.  If ex->ts.u.cl && ex->ts.u.cl->length == NULL, print
  * instead of length.  Otherwise if character length is non-constant,
  print just CHARACTER or CHARACTER(KIND=N).

  * gfortran.dg/pr97768_1.f90: New test.
  * gfortran.dg/pr97768_2.f90: New test.

--- gcc/fortran/misc.c.jj 2020-11-09 23:01:02.978826528 +0100
+++ gcc/fortran/misc.c2020-11-10 10:41:22.087850720 +0100
@@ -224,10 +224,32 @@ gfc_typename (gfc_expr *ex)

if (ex->ts.type == BT_CHARACTER)
  {
-  if (ex->ts.u.cl && ex->ts.u.cl->length)
- length = gfc_mpz_get_hwi (ex->ts.u.cl->length->value.integer);
-  else
+  if (ex->expr_type == EXPR_CONSTANT)
  length = ex->value.character.length;
+  else if (ex->ts.deferred)
+ {
+   if (ex->ts.kind == gfc_default_character_kind)
+ return "CHARACTER(:)";
+   sprintf (buffer, "CHARACTER(:,%d)", ex->ts.kind);
+   return buffer;
+ }
+  else if (ex->ts.u.cl && ex->ts.u.cl->length == NULL)
+ {
+   if (ex->ts.kind == gfc_default_character_kind)
+ return "CHARACTER(*)";
+   sprintf (buffer, "CHARACTER(*,%d)", ex->ts.kind);
+   return buffer;
+ }
+  else if (ex->ts.u.cl == NULL
+|| ex->ts.u.cl->length->expr_type != EXPR_CONSTANT)
+ {
+   if (ex->ts.kind == gfc_default_character_kind)
+ return "CHARACTER";
+   sprintf (buffer, "CHARACTER(KIND=%d)", ex->ts.kind);
+   return buffer;
+ }
+  else
+ length = gfc_mpz_get_hwi (ex->ts.u.cl->length->value.integer);
if (ex->ts.kind == gfc_default_character_kind)
  sprintf (buffer, "CHARACTER(" HOST_WIDE_INT_PRINT_DEC ")", length);
else
--- gcc/testsuite/gfortran.dg/pr97768_1.f90.jj2020-11-10 
10:22:26.053445061 +0100
+++ gcc/testsuite/gfortran.dg/pr97768_1.f90   2020-11-10 10:22:26.053445061 
+0100
@@ -0,0 +1,25 @@
+! PR fortran/97768
+! { dg-do compile }
+
+module pr97768_1
+  interface operator(.in.)
+module procedure substr_in_str
+  end interface
+contains
+  pure function to_upper (in_str) result (string)
+character(len=*), intent(in) :: in_str
+character(len=len(in_str)) :: string
+string = in_str
+  end function to_upper
+  logical pure function substr_in_str (substring, string)
+character(len=*), intent(in) :: string, substring
+substr_in_str=.false.
+  end function
+end module
+function foo ()
+  use pr97768_1, only : to_upper, operator(.in.)
+  logical :: foo
+  character(len=8) :: str
+  str = 'abcde'
+  foo = 'b' .in. to_upper (str)
+end function foo
--- gcc/testsuite/gfortran.dg/pr97768_2.f90.jj2020-11-10 
10:22:26.053445061 +0100
+++ gcc/testsuite/gfortran.dg/pr97768_2.f90   2020-11-10 10:46:15.104602757 
+0100
@@ -0,0 +1,53 @@
+! PR fortran/97768
+! { dg-do compile }
+
+module pr97768_2
+  interface operator(.in.)
+module procedure substr_in_str
+  end interface
+contains
+  pure function to_upper (in_str) result (string)
+character(len=*), intent(in) :: in_str
+character(len=len(in_str)) :: string
+string = in_str
+  end function to_upper
+  logical pure function substr_in_str (substring, string)
+character(len=*), intent(in) :: string, substring
+substr_in_str=.false.
+  end function
+end module
+function foo ()
+  use pr97768_2, only : to_upper, operator(.in.)
+  logical :: foo
+  character(len=8) :: str
+  str = 'abcde'
+  foo = to_upper (str) .in. 32! { dg-error "are CHARACTER/INTEGER" }
+end function foo
+function bar (str)
+  use pr97768_2, only : operator(.in.)
+  logical :: bar
+  character(len=*) :: str
+  foo = str .in. 32   ! { dg-error "are 
CHARACTER\\(\\*\\)/INTEGER" }
+end function bar
+function baz (lenstr)
+  use pr97768_2, only : operator(.in.)
+  logical :: baz
+  integer :: lenstr
+  character(len=lenstr) :: str
+  str = 'abc'
+  foo = str .in. 32   ! { dg-error "are CHARACTER/INTEGER" }
+end function baz
+function qux ()
+  use pr97768_2, only : operator(.in.)
+  logical :: qux
+  character(len=8) :: str
+  str = 'def'
+  foo = str .in. 32   ! { dg-error "are CHARACTER\\(8\\)/INTEGER" }
+end function qux
+function corge ()
+  use pr97768_2, only : operator(.in.)
+  logical :: corge
+  character(len=:), allocatable :: str
+  str = 'ghijk'
+  foo = str .in. 32   ! { dg-error "are CHARACTER\\(:\\)/INTEGER" }
+end function corge

  Jakub


-
Mentor Graphics (Deutschland) G

Re: [PATCH] c-family: Avoid unnecessary work when -Wpragmas is being ignored

2020-11-10 Thread Patrick Palka via Gcc-patches
On Tue, 10 Nov 2020, David Malcolm wrote:

> On Mon, 2020-11-09 at 10:38 -0500, Patrick Palka wrote:
> > This speeds up handle_pragma_diagnostic by avoiding computing a
> > spelling
> > suggestion for an unrecognized option inside a #pragma directive when
> > -Wpragmas warnings are being suppressed.
> > 
> > In the range-v3 library, which contains many instances of
> > 
> >   #pragma GCC diagnostic push
> >   #pragma GCC diagnostic ignored "-Wpragmas"
> >   #pragma GCC diagnostic ignored "-Wfoo"
> >   ...
> >   #pragma GCC diagnostic pop
> > 
> > compile time is reduced by 33% in some of its tests.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > for
> > trunk?
> 
> Looks good to me.

Thanks David & Jeff.  Patch committed to trunk as r11-4849.

> 
> Dave
> 
> 



Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-11-10 Thread Stefan Kanthak
Jeff Law  wrote:

> On 10/5/20 10:49 AM, Stefan Kanthak wrote:
>> The implementation of the functions __absv?i2(), __addv?i3() etc. for
>> trapping integer overflow provided in libgcc2.c is rather bad.
>> Same for __cmp?i2() and __ucmp?i2()
>>
>> At least for AMD64 and i386 processors GCC creates awful to horrible
>> code for them: see 
>> for some examples as well as the expected assembly.
>>
>> The attached diff/patch provides better implementations.
> 
> So you didn't indicate how this was tested.

I inspected the assembly generated for i386 and AMD64 and found it to be
correct. I also ran both the current and the patched functions in parallel
and compared their results, without detecting any deviation (except for
the greater speed of the patched functions).

> The addition, subtraction, negation and mulvsi3 are reasonable. The
> mulvsi3 change introduced a failure on m32r, but I'm highly confident
> it's an pre-existing simulator bug.

I only have AMD64 processors, so my tests are limited to these platforms.

> But the abs changes and the other other multiply changes have undesired
> LIBGCC2_BAD_CODE ifdefs that you need to clean up before we could accept
> those changes.

I included these alternative #ifdef'ed implementations only to demonstrate
that GCC generates bad^Wnot the best code there, especially for __mulvDI3(),
which I implemented using __umulsidi3() plus 2 __builtin_mul_overflow() and
2 __builtin_add_overflow() for Wtype instead of just a single
__builtin_mul_overflow() for DWtype. 

> What I don't understand is why the [u]cmpdi/cmpdi2 changes don't cause
> infinite recursion. I would expect the comparisons you're using to
> compute the return value to themselves trigger calls to [u]cmpdi2.

Why should the compiler generate calls to [u]cmpdi2() there?
It doesn't generate calls to the missing functions [u]adddi2() or
[u]subdi2() for addition and subtraction of DWtypes either!

JFTR: this also holds for the __ashlDI3(),  __ashrDI3() and __lshrDI3()
  functions, which should better be written as

__ashldi3 (DWtype u, shift_count_type b)
{
  return u << b;
}

__ashrdi3 (DWtype u, shift_count_type b)
{
  return u >> b;
}

__lshrdi3 (UDWtype u, shift_count_type b)
{
  return u >> b;
}

> I can speculate that the expanders are open-coding the comparison, but
> in that case I'd expect the libgcc2 routines to never be used, so
> optimizing them is pointless :-)

Both are documented, so users might call them directly.
Besides that, their source smells -- nobody with a sane mind would NOT
write (a > b) - (a < b) + 1, what every self-respecting compiler/optimiser
should translate into branch-free code.

> But I put them through my tester and couldn't see any evidence that
> they're causing problems though and reviewed the generated code on m32r
> since it was convenient to do so.
> 
> 
> Also note the ucmpdi changes require updating the prototype in libgcc2.h,
> otherwise you get build failures building gcc itself. I fixed that minor
> omission.

Argh, I missed that indeed.

> So with all that in mind, I installed everything except the bits which
> have the LIBGCC2_BAD_CODE ifdefs after testing on the various crosses. 
> If you could remove the ifdefs on the abs/mult changes and resubmit them
> it'd be appreciated.

Done.

regards
Stefan

libgcc2.patch
Description: Binary data


Re: [Patch] Fortran: OpenMP 5.0 (in_,task_)reduction clause extensions

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 05:46:17PM +0100, Tobias Burnus wrote:
> Fortran: OpenMP 5.0 (in_,task_)reduction clause extensions
> 
> gcc/fortran/ChangeLog:
> 
>   PR fortran/95847
>   * dump-parse-tree.c (show_omp_clauses): Handle new reduction enums.
>   * gfortran.h (OMP_LIST_REDUCTION_INSCAN, OMP_LIST_REDUCTION_TASK,
>   OMP_LIST_IN_REDUCTION, OMP_LIST_TASK_REDUCTION): Add enums.
>   * openmp.c (enum omp_mask1): Add OMP_CLAUSE_IN_REDUCTION
>   and OMP_CLAUSE_TASK_REDUCTION.
>   (gfc_match_omp_clause_reduction): Extend reduction handling;
>   moved from ...
>   (gfc_match_omp_clauses): ... here. Add calls to it.
>   (OMP_TASK_CLAUSES, OMP_TARGET_CLAUSES, OMP_TASKLOOP_CLAUSES):
>   Add OMP_CLAUSE_IN_REDUCTION.
>   (gfc_match_omp_taskgroup): Add task_reduction matching.
>   (resolve_omp_clauses): Update for new reduction clause changes;
>   remove removed nonmonotonic-schedule restrictions.
>   (gfc_resolve_omp_parallel_blocks): Add new enums to switch.
>   * trans-openmp.c (gfc_omp_clause_default_ctor,
>   gfc_trans_omp_reduction_list, gfc_trans_omp_clauses,
>   gfc_split_omp_clauses): Handle updated reduction clause.
> 
> gcc/ChangeLog:
> 
>   PR fortran/95847
>   * gimplify.c (gimplify_scan_omp_clauses, gimplify_omp_loop): Use 'do'
>   instead of 'for' in error messages for Fortran.
>   * omp-low.c (check_omp_nesting_restrictions): Likewise
> 
> gcc/testsuite/ChangeLog:
> 
>   PR fortran/95847
>   * gfortran.dg/gomp/schedule-modifiers-2.f90: Remove some dg-error.
>   * gfortran.dg/gomp/reduction4.f90: New test.
>   * gfortran.dg/gomp/reduction5.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-1.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-2.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-3.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-4.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-5.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-6.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-7.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-8.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-9.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-10.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-11.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-12.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-13.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-14.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-15.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-16.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-17.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-18.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-19.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-20.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-21.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-22.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-23.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-24.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-25.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-26.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-27.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-28.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-29.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-30.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-31.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-32.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-33.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-34.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-35.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-36.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-37.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-38.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-39.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-40.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-41.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-42.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-43.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-44.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-45.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-46.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-47.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-48.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-49.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-50.f90: New test.
>   * gfortran.dg/gomp/workshare-reduction-51.f90: New test.
>   

[PATCH] IBM Z: Fix bootstrap breakage due to HAVE_TF macro

2020-11-10 Thread Ilya Leoshkevich via Gcc-patches
Bootstrap and regtest running on s390x-redhat-linux with --enable-shared
--with-system-zlib --enable-threads=posix --enable-__cxa_atexit
--enable-checking=yes,rtl --enable-gnu-indirect-function
--disable-werror --enable-languages=c,c++,fortran,objc,obj-c++
--with-arch=arch13.  Ok for master?



Commit e627cda56865 ("IBM Z: Store long doubles in vector registers
when possible") introduced HAVE_TF macro which expands to a logical
"or" of HAVE_ constants.  Not all of these constants are available in
GENERATOR_FILE context, so a hack was used: simply expand to true in
this case, because the actual value matters only during compiler
runtime and not during generation.

However, one aspect of this value matters during generation after all:
whether or not it's a constant, which in this case it appears to be.
This results in incorrect values in insn-flags.h and broken bootstrap
for some configurations.

Fix by using a dummy value that is not a constant.

gcc/ChangeLog:

2020-11-10  Ilya Leoshkevich  

* config/s390/s390.h (HAVE_TF): Use opaque value when
GENERATOR_FILE is defined.
---
 gcc/config/s390/s390.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 8c028317b6b..bc579a3dadd 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -1187,8 +1187,9 @@ struct GTY(()) machine_function
 #define TARGET_INDIRECT_BRANCH_TABLE s390_indirect_branch_table
 
 #ifdef GENERATOR_FILE
-/* gencondmd.c is built before insn-flags.h.  */
-#define HAVE_TF(icode) true
+/* gencondmd.c is built before insn-flags.h.  Use an arbitrary opaque value
+   that cannot be optimized away by gen_insn.  */
+#define HAVE_TF(icode) TARGET_HARD_FLOAT
 #else
 #define HAVE_TF(icode) (HAVE_##icode##_fpr || HAVE_##icode##_vr)
 #endif
-- 
2.25.4



Re: [PATCH] IBM Z: Fix bootstrap breakage due to HAVE_TF macro

2020-11-10 Thread Andreas Krebbel via Gcc-patches
On 10.11.20 18:43, Ilya Leoshkevich wrote:
> Bootstrap and regtest running on s390x-redhat-linux with --enable-shared
> --with-system-zlib --enable-threads=posix --enable-__cxa_atexit
> --enable-checking=yes,rtl --enable-gnu-indirect-function
> --disable-werror --enable-languages=c,c++,fortran,objc,obj-c++
> --with-arch=arch13.  Ok for master?
> 
> 
> 
> Commit e627cda56865 ("IBM Z: Store long doubles in vector registers
> when possible") introduced HAVE_TF macro which expands to a logical
> "or" of HAVE_ constants.  Not all of these constants are available in
> GENERATOR_FILE context, so a hack was used: simply expand to true in
> this case, because the actual value matters only during compiler
> runtime and not during generation.
> 
> However, one aspect of this value matters during generation after all:
> whether or not it's a constant, which in this case it appears to be.
> This results in incorrect values in insn-flags.h and broken bootstrap
> for some configurations.
> 
> Fix by using a dummy value that is not a constant.
> 
> gcc/ChangeLog:
> 
> 2020-11-10  Ilya Leoshkevich  
> 
>   * config/s390/s390.h (HAVE_TF): Use opaque value when
>   GENERATOR_FILE is defined.

Ok. Thanks!

Andreas


Re: [PATCH] openmp: Retire nest-var ICV

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 09, 2020 at 05:43:20PM +, Kwok Cheung Yeung wrote:
> commit b4feb16f3c84b8f82163a4cbba6a31d55fbb8e5b
> Author: Kwok Cheung Yeung 
> Date:   Mon Nov 9 09:34:39 2020 -0800
> 
> openmp: Retire nest-var ICV for OpenMP 5.0
> 
> This removes the nest-var ICV, expressing nesting in terms of the
> max-active-levels-var ICV instead.
> 
> 2020-11-09  Kwok Cheung Yeung  
> 
>   libgomp/
>   * env.c (gomp_global_icv): Remove nest_var field.
>   (gomp_max_active_levels_var): Initialize to 1.
>   (parse_boolean): Return true on success.
>   (handle_omp_display_env): Express OMP_NESTED in terms of
>   gomp_max_active_levels_var.
>   (initialize_env): Set gomp_max_active_levels_var from
>   OMP_MAX_ACTIVE_LEVELS, OMP_NESTED, OMP_NUM_THREADS and
>   OMP_PROC_BIND.
>   * icv.c (omp_set_nested): Express in terms of
>   gomp_max_active_levels_var.
>   (omp_get_nested): Likewise.
>   * libgomp.h (struct gomp_task_icv): Remove nest_var field.
>   * libgomp.texi (omp_get_nested): Update documentation.
>   (omp_set_nested): Likewise.
>   (OMP_MAX_ACTIVE_LEVELS): Likewise.
>   (OMP_NESTED): Likewise.
>   (OMP_NUM_THREADS): Likewise.
>   (OMP_PROC_BIND): Likewise.
>   * parallel.c (gomp_resolve_num_threads): Replace reference
>   to nest_var with gomp_max_active_levels_var.
>   * testsuite/libgomp.c/target-5.c: Remove additional options.
>   (main): Remove references to omp_get_nested and omp_set_nested.

One thing is that max-active-levels-var in 5.0 is per-device,
but in 5.1 per-data environment.  The question is if we should implement
the problematic 5.0 way or the 5.1 one.  E.g.:
#include 
#include 

int
main ()
{
  #pragma omp parallel
  {
omp_set_nested (1);
#pragma omp parallel num_threads(2)
printf ("Hello, world!\n");
  }
}
which used to be valid in 4.5 (where nest-var used to be per-data
environment) is in 5.0 racy (and in 5.1 will not be racy again).
Though, as these are deprecated APIs, perhaps we can just do the 5.0 way for
now.

> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -489,8 +489,11 @@ represent their language-specific counterparts.
>  
>  Nested parallel regions may be initialized at startup by the 
>  @env{OMP_NESTED} environment variable or at runtime using
> -@code{omp_set_nested}.  If undefined, nested parallel regions are
> -disabled by default.
> +@code{omp_set_nested}.  Setting the maximum number of nested
> +regions to above one using the @env{OMP_MAX_ACTIVE_LEVELS}
> +environment variable or @code{omp_set_max_active_levels} will
> +also enable nesting.  If undefined, nested parallel regions
> +are disabled by default.

This doesn't really describe what env.c does.  If undefined, then
if OMP_NESTED is defined, it will be folloed, and if neither is
defined, the code sets the default based on
"OMP_NUM_THREADS or OMP_PROC_BIND is set to a
comma-separated list of more than one value"
as the spec says and only is disabled otherwise.

> @@ -1502,10 +1511,11 @@ disabled by default.
>  @item @emph{Description}:
>  Specifies the initial value for the maximum number of nested parallel
>  regions.  The value of this variable shall be a positive integer.
> -If undefined, the number of active levels is unlimited.
> +If undefined, the number of active levels is one, which effectively
> +disables nested regions.

Similarly.
>  
>  @item @emph{See also}:
> -@ref{omp_set_max_active_levels}
> +@ref{omp_set_max_active_levels}, @ref{OMP_NESTED}
>  
>  @item @emph{Reference}: 
>  @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
> @@ -1541,11 +1551,13 @@ integer, and zero is allowed.  If undefined, the 
> default priority is
>  @item @emph{Description}:
>  Enable or disable nested parallel regions, i.e., whether team members
>  are allowed to create new teams.  The value of this environment variable 
> -shall be @code{TRUE} or @code{FALSE}.  If undefined, nested parallel 
> -regions are disabled by default.
> +shall be @code{TRUE} or @code{FALSE}.  If set to @code{TRUE}, the number
> +of maximum active nested regions supported will by default be set to the
> +maximum supported, otherwise it will be set to one.  If undefined, nested
> +parallel regions are disabled by default.

Again.

> --- a/libgomp/testsuite/libgomp.c/target-5.c
> +++ b/libgomp/testsuite/libgomp.c/target-5.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-Wno-deprecated-declarations" } */
> -
>  #include 
>  #include 
>  
> @@ -7,17 +5,14 @@ int
>  main ()
>  {
>int d_o = omp_get_dynamic ();
> -  int n_o = omp_get_nested ();
>omp_sched_t s_o;
>int c_o;
>omp_get_schedule (&s_o, &c_o);
>int m_o = omp_get_max_threads ();
>omp_set_dynamic (1);
> -  omp_set_nested (1);
>omp_set_schedule (omp_sched_static, 2);
>omp_set_num_threads (4);
>int d = omp_get_dynamic ();
> -  int n = omp_get_nested ();
>omp_sched_t s;
>int c;
>omp_get_sc

[PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Stefan Kanthak via Gcc-patches
The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
is rather bad, it yields bad machine code at least on i386 and AMD64.
Since GCC knows how to shift integers twice the register size these functions
can be written as one-liners.

The implementation of the __bswapsi2() function uses SIGNED instead of
unsigned mask values; cf. __bswapdi2()

Stefan Kanthak

libgcc2.diff
Description: Binary data


Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Eric Botcazou
> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
> GCC knows how to shift integers twice the register size these functions can
> be written as one-liners.

These functions are precisely meant to be used when GCC cannot do that.

-- 
Eric Botcazou




Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 06:59:30PM +0100, Stefan Kanthak via Gcc-patches wrote:
> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64.
> Since GCC knows how to shift integers twice the register size these functions
> can be written as one-liners.

This looks wrong.  If gcc knows how to do that inline, it will not call the
out of line function at all.  The functions are there for the cases where
gcc can't do that.
So, your patch will shorten the routines on targets where those are never
called, and completely break them on any other (resulting in infinite
recursion there).

Jakub



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 06:59:30PM +0100, Stefan Kanthak via Gcc-patches wrote:
> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64.
> Since GCC knows how to shift integers twice the register size these functions
> can be written as one-liners.
> 
> The implementation of the __bswapsi2() function uses SIGNED instead of
> unsigned mask values; cf. __bswapdi2()

The 0xff00 constant is unsigned, the others are signed, but that doesn't
really matter as the high bits are masked off.

Jakub



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 11:09 AM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Nov 10, 2020 at 06:59:30PM +0100, Stefan Kanthak via Gcc-patches 
> wrote:
>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>> is rather bad, it yields bad machine code at least on i386 and AMD64.
>> Since GCC knows how to shift integers twice the register size these functions
>> can be written as one-liners.
> This looks wrong.  If gcc knows how to do that inline, it will not call the
> out of line function at all.  The functions are there for the cases where
> gcc can't do that.
> So, your patch will shorten the routines on targets where those are never
> called, and completely break them on any other (resulting in infinite
> recursion there).

There's a way to test that.  It's possible the expanders know how to
handle these cases now.  I'd be surprised (as I was with the [u]cmpdi2),
but it's possible.  I can throw it into the tester which will spin all
those pesky embedded targets.  If we get infinite recursion it should
show up quite clearly.


Of course, if the expanders now handle all the cases, then these
routines in libgcc2 are dead code and this is a fairly pointless exercise.


jeff



Merge from trunk to gccgo branch

2020-11-10 Thread Ian Lance Taylor via Gcc-patches
I merged trunk revision cf392dbdf17e38026f8e3c0e9af7f5b87f63be56 to
the gccgo branch.

Ian


Re: [PATCH 1/3] Refactor copying decl section names

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/12/19 11:28 PM, Strager Neds wrote:
> * gcc/cgraph.h (symtab_node::get_section): Constify.
> (symtab_node::set_section): Declare new overload.
> * gcc/symtab.c (symtab_node::set_section): Define new overload.
> (symtab_node::copy_visibility_from): Use new overload of
> symtab_node::set_section.
> (symtab_node::resolve_alias): Same.
> * gcc/tree.h (set_decl_section_name): Declare new overload.
> * gcc/tree.c (set_decl_section_name): Define new overload.
> * gcc/c/c-decl.c (merge_decls): Use new overload of
> set_decl_section_name.
> * gcc/cp/decl.c (duplicate_decls): Same.
> * gcc/cp/method.c (use_thunk): Same.
> * gcc/cp/optimize.c (maybe_clone_body): Same.
> * gcc/d/decl.cc (finish_thunk): Same.
> * gcc/tree-emutls.c (get_emutls_init_templ_addr): Same.
> * gcc/cgraphclones.c (cgraph_node::create_virtual_clone): Use new
> overload of symtab_node::set_section.
> (cgraph_node::create_version_clone_with_body): Same.
> * gcc/trans-mem.c (ipa_tm_create_version): Same.

I adjusted the ChangeLog, added an entry for the coroutines.cc addition
I made and pushed this to the trunk.


Thanks for your patience.

Jeff



Re: [PATCH] c++: Improve static_assert diagnostic [PR97518]

2020-11-10 Thread Marek Polacek via Gcc-patches
On Tue, Nov 10, 2020 at 11:32:04AM -0500, Jason Merrill wrote:
> On 11/9/20 7:21 PM, Marek Polacek wrote:
> > Currently, when a static_assert fails, we only say "static assertion 
> > failed".
> > It would be more useful if we could also print the expression that
> > evaluated to false; this is especially useful when the condition uses
> > template parameters.  Consider the motivating example, in which we have
> > this line:
> > 
> >static_assert(is_same::value);
> > 
> > if this fails, the user has to play dirty games to get the compiler to
> > print the template arguments.  With this patch, we say:
> > 
> >static assertion failed due to requirement 'is_same::value'
> 
> I'd rather avoid the word "requirement" here, to avoid confusion with
> Concepts.
> 
> Maybe have the usual failed error, and if we're showing the expression, have
> a second inform to say e.g. "%qE evaluates to false"?

Works for me.

> Also, if we've narrowed the problem down to a particular subexpression,
> let's only print that one.

Done.

> > which I think is much better.  However, always printing the condition that
> > evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
> > always parsed to true/false, so we would say "static assertion failed due
> > to requirement 'false'" which doesn't help.  So I wound up only printing
> > the condition when it was instantiation-dependent, that is, we called
> > finish_static_assert from tsubst_expr.
> > 
> > Moreover, this patch also improves the diagnostic when the condition
> > consists of a logical AND.  Say you have something like this:
> > 
> >static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
> > 
> > where fn4() evaluates to false and the other ones to true.  Highlighting
> > the whole thing is not that helpful because it won't say which clause
> > evaluated to false.  With the find_failing_clause tweak in this patch
> > we emit:
> > 
> >error: static assertion failed
> >  6 | static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
> >|  ~~~^~
> > 
> > so you know right away what's going on.  Unfortunately, when you combine
> > both things, that is, have an instantiation-dependent expr and && in
> > a static_assert, we can't yet quite point to the clause that failed.  It
> > is because when we tsubstitute something like is_same::value, we
> > generate a VAR_DECL that doesn't have any location.  It would be awesome
> > if we could wrap it with a location wrapper, but I didn't see anything
> > obvious.
> 
> Hmm, I vaguely remember that at first we were using location wrappers less
> in templates, but I thought that was fixed.  I don't see anything setting
> suppress_location_wrappers, and it looks like tsubst_copy_and_build should
> preserve a location wrapper.

The problem here is that tsubst_qualified_id produces a VAR_DECL and for those
CAN_HAVE_LOCATION_P is false.  If the user writes (bool) is_same::value,
this works well.  Maybe we could create an artificial (bool) for clauses of
the logical AND operator; decay_conversion in cp_build_binary_op will likely
do it anyway, it just doesn't have any location: cp_build_binary_op takes
an op_location_t, but that doesn't track locations of the operands.  Maybe
some hacks in tsubst_copy_and_build/TRUTH_AND_EXPR would help, I'm not sure.
 
> > Further tweak could be to print the failing clause of the condition if
> > possible, whether or not it was instantiation-dependent.
> 
> If not instantiation-dependent, underlining the clause seems sufficient.

Ack.
 
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Currently, when a static_assert fails, we only say "static assertion failed".
It would be more useful if we could also print the expression that
evaluated to false; this is especially useful when the condition uses
template parameters.  Consider the motivating example, in which we have
this line:

  static_assert(is_same::value);

if this fails, the user has to play dirty games to get the compiler to
print the template arguments.  With this patch, we say:

  error: static assertion failed
  note: 'is_same::value' evaluates to false

which I think is much better.  However, always printing the condition that
evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
always parsed to true/false, so we would say "'false' evaluates to false"
which doesn't help.  So I wound up only printing the condition when it was
instantiation-dependent, that is, we called finish_static_assert from
tsubst_expr.

Moreover, this patch also improves the diagnostic when the condition
consists of a logical AND.  Say you have something like this:

  static_assert(fn1() && fn2() && fn3() && fn4() && fn5());

where fn4() evaluates to false and the other ones to true.  Highlighting
the whole thing is not that helpful because it won't say which clause
evaluated to false.  With the find_failing_clause tweak in this patch
we emit:

  error: static asserti

Re: RFC: Monitoring old PRs, new dg directives [v2]

2020-11-10 Thread Marek Polacek via Gcc-patches
On Tue, Nov 10, 2020 at 03:15:00PM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On 2020-08-10T17:30:21-0400, Marek Polacek via Gcc-patches 
>  wrote:
> > This patch adds a new DejaGNU directive, dg-ice, as outlined in the
> > proposal here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550913.html
> >
> > It means that it's expected that the compiler crashes with an internal
> > compiler error when compiling test with such a directive.
> 
> Thanks, I find this useful.

Glad you find it useful!

> So I have a testcase that currently ICEs...  ;-)
> 
> spawn -ignore SIGHUP [xgcc]
> during GIMPLE pass: omplower
> In file included from [...]:4:
> [...]: In function 'f_data':
> [...]:41:10: internal compiler error: in lower_omp_target, at 
> omp-low.c:11890
> [backtrace]
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> compiler exited with status 1
> 
> I put '{ dg-ice }' into it, and do get the expected:
> 
> XFAIL: [...] (internal compiler error)'
> 
> But I also get an unexpected:
> 
> FAIL: [...] (test for excess errors)'
> 
> That's because of:
> 
> Excess errors:
> during GIMPLE pass: omplower
> 
> I can easily '{ dg-prune-output "during GIMPLE pass: omplower" }', of
> course, or should this lead-in be pruned internally, automatically?

I think prune_ices should prune this automatically.  Want me to submit a patch
for that?

> And then, this is a '{ dg-do run }' testcase, I see:
> 
> UNRESOLVED: 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  
> compilation failed to produce executable
> 
> Given '{ dg-ice }', it's obvious that we don't get an executable to run.

Yup.

> I thought that maybe '{ dg-ice }' testcases should automatically be
> demoted into '{ dg-do compile }', but that would be wrong for link-time
> ICEs.  But can we "expect" the UNRESOLVED, and turn that into another
> XFAIL?  Aha, actually I see that for '{ dg-do link }', this seems to
> behave as expected (no UNRESOLVED), so we might demote all '{ dg-ice }'
> testcases from '{ dg-do run }' to '{ dg-do link }', if that makes sense?

I think the dg-ice directive shouldn't change whether the test is
dg-do compile/run/link/assemble or something else.  Probably best to
use { dg-do link } if that works, and leave a comment that after fixing
the ICE the test should be turned into dg-do run, if that is what's
desired.

Marek



Re: [PATCH] c++: Improve static_assert diagnostic [PR97518]

2020-11-10 Thread Jason Merrill via Gcc-patches

On 11/10/20 1:59 PM, Marek Polacek wrote:

On Tue, Nov 10, 2020 at 11:32:04AM -0500, Jason Merrill wrote:

On 11/9/20 7:21 PM, Marek Polacek wrote:

Currently, when a static_assert fails, we only say "static assertion failed".
It would be more useful if we could also print the expression that
evaluated to false; this is especially useful when the condition uses
template parameters.  Consider the motivating example, in which we have
this line:

static_assert(is_same::value);

if this fails, the user has to play dirty games to get the compiler to
print the template arguments.  With this patch, we say:

static assertion failed due to requirement 'is_same::value'


I'd rather avoid the word "requirement" here, to avoid confusion with
Concepts.

Maybe have the usual failed error, and if we're showing the expression, have
a second inform to say e.g. "%qE evaluates to false"?


Works for me.


Also, if we've narrowed the problem down to a particular subexpression,
let's only print that one.


Done.


which I think is much better.  However, always printing the condition that
evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
always parsed to true/false, so we would say "static assertion failed due
to requirement 'false'" which doesn't help.  So I wound up only printing
the condition when it was instantiation-dependent, that is, we called
finish_static_assert from tsubst_expr.

Moreover, this patch also improves the diagnostic when the condition
consists of a logical AND.  Say you have something like this:

static_assert(fn1() && fn2() && fn3() && fn4() && fn5());

where fn4() evaluates to false and the other ones to true.  Highlighting
the whole thing is not that helpful because it won't say which clause
evaluated to false.  With the find_failing_clause tweak in this patch
we emit:

error: static assertion failed
  6 | static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
|  ~~~^~

so you know right away what's going on.  Unfortunately, when you combine
both things, that is, have an instantiation-dependent expr and && in
a static_assert, we can't yet quite point to the clause that failed.  It
is because when we tsubstitute something like is_same::value, we
generate a VAR_DECL that doesn't have any location.  It would be awesome
if we could wrap it with a location wrapper, but I didn't see anything
obvious.


Hmm, I vaguely remember that at first we were using location wrappers less
in templates, but I thought that was fixed.  I don't see anything setting
suppress_location_wrappers, and it looks like tsubst_copy_and_build should
preserve a location wrapper.


The problem here is that tsubst_qualified_id produces a VAR_DECL and for those
CAN_HAVE_LOCATION_P is false.


Ah, then perhaps tsubst_qualified_id should call 
maybe_wrap_with_location to preserve the location of the SCOPE_REF.



If the user writes (bool) is_same::value,
this works well.  Maybe we could create an artificial (bool) for clauses of
the logical AND operator; decay_conversion in cp_build_binary_op will likely
do it anyway, it just doesn't have any location: cp_build_binary_op takes
an op_location_t, but that doesn't track locations of the operands.  Maybe
some hacks in tsubst_copy_and_build/TRUTH_AND_EXPR would help, I'm not sure.
  

Further tweak could be to print the failing clause of the condition if
possible, whether or not it was instantiation-dependent.


If not instantiation-dependent, underlining the clause seems sufficient.


Ack.
  
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


-- >8 --
Currently, when a static_assert fails, we only say "static assertion failed".
It would be more useful if we could also print the expression that
evaluated to false; this is especially useful when the condition uses
template parameters.  Consider the motivating example, in which we have
this line:

   static_assert(is_same::value);

if this fails, the user has to play dirty games to get the compiler to
print the template arguments.  With this patch, we say:

   error: static assertion failed
   note: 'is_same::value' evaluates to false

which I think is much better.  However, always printing the condition that
evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
always parsed to true/false, so we would say "'false' evaluates to false"
which doesn't help.  So I wound up only printing the condition when it was
instantiation-dependent, that is, we called finish_static_assert from
tsubst_expr.

Moreover, this patch also improves the diagnostic when the condition
consists of a logical AND.  Say you have something like this:

   static_assert(fn1() && fn2() && fn3() && fn4() && fn5());

where fn4() evaluates to false and the other ones to true.  Highlighting
the whole thing is not that helpful because it won't say which clause
evaluated to false.  With the find_failing_clause tweak in this patch
we emit:

   error: static assertion failed
   

Re: [PATCH] c++: Improve static_assert diagnostic [PR97518]

2020-11-10 Thread Marek Polacek via Gcc-patches
On Tue, Nov 10, 2020 at 02:15:56PM -0500, Jason Merrill wrote:
> On 11/10/20 1:59 PM, Marek Polacek wrote:
> > On Tue, Nov 10, 2020 at 11:32:04AM -0500, Jason Merrill wrote:
> > > On 11/9/20 7:21 PM, Marek Polacek wrote:
> > > > Currently, when a static_assert fails, we only say "static assertion 
> > > > failed".
> > > > It would be more useful if we could also print the expression that
> > > > evaluated to false; this is especially useful when the condition uses
> > > > template parameters.  Consider the motivating example, in which we have
> > > > this line:
> > > > 
> > > > static_assert(is_same::value);
> > > > 
> > > > if this fails, the user has to play dirty games to get the compiler to
> > > > print the template arguments.  With this patch, we say:
> > > > 
> > > > static assertion failed due to requirement 'is_same > > > int>::value'
> > > 
> > > I'd rather avoid the word "requirement" here, to avoid confusion with
> > > Concepts.
> > > 
> > > Maybe have the usual failed error, and if we're showing the expression, 
> > > have
> > > a second inform to say e.g. "%qE evaluates to false"?
> > 
> > Works for me.
> > 
> > > Also, if we've narrowed the problem down to a particular subexpression,
> > > let's only print that one.
> > 
> > Done.
> > 
> > > > which I think is much better.  However, always printing the condition 
> > > > that
> > > > evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
> > > > always parsed to true/false, so we would say "static assertion failed 
> > > > due
> > > > to requirement 'false'" which doesn't help.  So I wound up only printing
> > > > the condition when it was instantiation-dependent, that is, we called
> > > > finish_static_assert from tsubst_expr.
> > > > 
> > > > Moreover, this patch also improves the diagnostic when the condition
> > > > consists of a logical AND.  Say you have something like this:
> > > > 
> > > > static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
> > > > 
> > > > where fn4() evaluates to false and the other ones to true.  Highlighting
> > > > the whole thing is not that helpful because it won't say which clause
> > > > evaluated to false.  With the find_failing_clause tweak in this patch
> > > > we emit:
> > > > 
> > > > error: static assertion failed
> > > >   6 | static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
> > > > |  ~~~^~
> > > > 
> > > > so you know right away what's going on.  Unfortunately, when you combine
> > > > both things, that is, have an instantiation-dependent expr and && in
> > > > a static_assert, we can't yet quite point to the clause that failed.  It
> > > > is because when we tsubstitute something like is_same::value, we
> > > > generate a VAR_DECL that doesn't have any location.  It would be awesome
> > > > if we could wrap it with a location wrapper, but I didn't see anything
> > > > obvious.
> > > 
> > > Hmm, I vaguely remember that at first we were using location wrappers less
> > > in templates, but I thought that was fixed.  I don't see anything setting
> > > suppress_location_wrappers, and it looks like tsubst_copy_and_build should
> > > preserve a location wrapper.
> > 
> > The problem here is that tsubst_qualified_id produces a VAR_DECL and for 
> > those
> > CAN_HAVE_LOCATION_P is false.
> 
> Ah, then perhaps tsubst_qualified_id should call maybe_wrap_with_location to
> preserve the location of the SCOPE_REF.

The SCOPE_REF's location is fine, we just throw it away and return a VAR_DECL.

I think I'll have to extend op_location_t and pass something reasonable from
tsubst_copy_and_build/TRUTH_AND_EXPR to cp_build_binary_op, and use it to set
the operands' locations after decay_conversion :(.

> > + /* See if we can find which clause was failing (for logical AND).  */
> > + tree bad = find_failing_clause (orig_condition);
> > + /* If not, or its location is unusable, fall back to the previous
> > +location.  */
> > + location_t cloc = location;
> > + if (cp_expr_location (bad) != UNKNOWN_LOCATION)
> > +   cloc = cp_expr_location (bad);
> > +
> > /* Report the error. */
> >   if (len == 0)
> > -error ("static assertion failed");
> > +   error_at (cloc, "static assertion failed");
> >   else
> > -error ("static assertion failed: %s",
> > -  TREE_STRING_POINTER (message));
> > +   error_at (cloc, "static assertion failed: %s",
> > + TREE_STRING_POINTER (message));
> > + if (show_expr_p)
> > +   inform (cloc, "%qE evaluates to false",
> > +   /* Nobody wants to see the artifical (bool) cast.  */
> 
> "artificial"
> 
> OK with that typo fixed.

Thanks a lot.

Marek



Re: [PATCH] c++: Improve static_assert diagnostic [PR97518]

2020-11-10 Thread Jason Merrill via Gcc-patches

On 11/10/20 2:28 PM, Marek Polacek wrote:

On Tue, Nov 10, 2020 at 02:15:56PM -0500, Jason Merrill wrote:

On 11/10/20 1:59 PM, Marek Polacek wrote:

On Tue, Nov 10, 2020 at 11:32:04AM -0500, Jason Merrill wrote:

On 11/9/20 7:21 PM, Marek Polacek wrote:

Currently, when a static_assert fails, we only say "static assertion failed".
It would be more useful if we could also print the expression that
evaluated to false; this is especially useful when the condition uses
template parameters.  Consider the motivating example, in which we have
this line:

 static_assert(is_same::value);

if this fails, the user has to play dirty games to get the compiler to
print the template arguments.  With this patch, we say:

 static assertion failed due to requirement 'is_same::value'


I'd rather avoid the word "requirement" here, to avoid confusion with
Concepts.

Maybe have the usual failed error, and if we're showing the expression, have
a second inform to say e.g. "%qE evaluates to false"?


Works for me.


Also, if we've narrowed the problem down to a particular subexpression,
let's only print that one.


Done.


which I think is much better.  However, always printing the condition that
evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
always parsed to true/false, so we would say "static assertion failed due
to requirement 'false'" which doesn't help.  So I wound up only printing
the condition when it was instantiation-dependent, that is, we called
finish_static_assert from tsubst_expr.

Moreover, this patch also improves the diagnostic when the condition
consists of a logical AND.  Say you have something like this:

 static_assert(fn1() && fn2() && fn3() && fn4() && fn5());

where fn4() evaluates to false and the other ones to true.  Highlighting
the whole thing is not that helpful because it won't say which clause
evaluated to false.  With the find_failing_clause tweak in this patch
we emit:

 error: static assertion failed
   6 | static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
 |  ~~~^~

so you know right away what's going on.  Unfortunately, when you combine
both things, that is, have an instantiation-dependent expr and && in
a static_assert, we can't yet quite point to the clause that failed.  It
is because when we tsubstitute something like is_same::value, we
generate a VAR_DECL that doesn't have any location.  It would be awesome
if we could wrap it with a location wrapper, but I didn't see anything
obvious.


Hmm, I vaguely remember that at first we were using location wrappers less
in templates, but I thought that was fixed.  I don't see anything setting
suppress_location_wrappers, and it looks like tsubst_copy_and_build should
preserve a location wrapper.


The problem here is that tsubst_qualified_id produces a VAR_DECL and for those
CAN_HAVE_LOCATION_P is false.


Ah, then perhaps tsubst_qualified_id should call maybe_wrap_with_location to
preserve the location of the SCOPE_REF.


The SCOPE_REF's location is fine, we just throw it away and return a VAR_DECL.


Yes, I'm saying that tsubst_qualified_id should extract the location 
from the SCOPE_REF and pass it to maybe_wrap_with_location to give the 
VAR_DECL a location wrapper.


Jason



[PATCH v2] C-family : Add attribute 'unavailable'.

2020-11-10 Thread Iain Sandoe
Hi Jospeh,

Joseph Myers  wrote:

> This patch seems to be missing documentation for the new attribute in 
> extend.texi.

Apologies, for that omission, revised patch includes the documentation and
also addresses Richi’s comments.

documentation patch tested with “make pdf” and visual inspection of the output.

OK now?
thanks
Iain

—— commit message.

If an interface is marked 'deprecated' then, presumably, at some point it
will be withdrawn and no longer available.  The 'unavailable' attribute
makes it possible to mark up interfaces to indicate this status.  It is used
quite extensively in some codebases where a single set of headers can be used
to permit code generation for multiple system versions.

>From a configuration perspective, it also allows a compile test to determine
that an interface is missing - rather than requiring a link test.

The implementation follows the pattern of attribute deprecated, but produces
an error (where deprecation produces a warning).

This attribute has been implemented in clang for some years.

gcc/c-family/ChangeLog:

* c-attribs.c (handle_unavailable_attribute): New.

gcc/c/ChangeLog:

* c-decl.c (enum deprecated_states): Add unavailable state.
(merge_decls): Copy unavailability.
(quals_from_declspecs): Handle unavailable case.
(start_decl): Amend the logic handling suppression of nested
deprecation states to include unavailability.
(smallest_type_quals_location): Amend comment.
(grokdeclarator): Handle the unavailable deprecation state.
(declspecs_add_type): Set TREE_UNAVAILABLE from the decl specs.
* c-tree.h (struct c_declspecs): Add unavailable_p.
* c-typeck.c (build_component_ref): Handle unavailability.
(build_external_ref): Likewise.

gcc/cp/ChangeLog:

* call.c (build_over_call): Handle unavailable state in addition to
deprecation.
* class.c (type_build_ctor_call): Likewise.
(type_build_dtor_call): Likewise.
* cp-tree.h: Rename cp_warn_deprecated_use to
cp_handle_deprecated_or_unavailable.
* decl.c (duplicate_decls): Merge unavailability.
(grokdeclarator): Handle unavailability in addition to deprecation.
(type_is_unavailable): New.
(grokparms): Handle unavailability in addition to deprecation.
* decl.h (enum deprecated_states): Add
UNAVAILABLE_DEPRECATED_SUPPRESS.
* decl2.c (cplus_decl_attributes): Propagate unavailability to
templates.
(cp_warn_deprecated_use): Rename to ...
(cp_handle_deprecated_or_unavailable): ... this and amend to handle
the unavailable case. It remains a warning in the case of deprecation
but becomes an error in the case of unavailability.
(cp_warn_deprecated_use_scopes): Handle unavailability.
(mark_used): Likewise.
* parser.c (cp_parser_template_name): Likewise.
(cp_parser_template_argument): Likewise.
(cp_parser_parameter_declaration_list): Likewise.
* typeck.c (build_class_member_access_expr): Likewise.
(finish_class_member_access_expr): Likewise.
* typeck2.c (build_functional_cast_1): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document unavailable attribute.
* print-tree.c (print_node): Handle unavailable attribute.
* tree-core.h (struct tree_base): Add a bit to carry unavailability.
* tree.c (error_unavailable_use): New.
* tree.h (TREE_UNAVAILABLE): New.
(error_unavailable_use): New.

gcc/objc/ChangeLog:

* objc-act.c (objc_add_property_declaration): Register unavailable
attribute.
(maybe_make_artificial_property_decl): Set available.
(objc_maybe_build_component_ref): Generalise to the method prototype
to count availability.
(objc_build_class_component_ref): Likewise.
(build_private_template): Likewise.
(objc_decl_method_attributes): Handle unavailable attribute.
(lookup_method_in_hash_lists): Amend comments.
(objc_finish_message_expr): Handle unavailability in addition to
deprecation.
(start_class): Likewise.
(finish_class): Likewise.
(lookup_protocol): Likewise.
(objc_declare_protocol): Likewise.
(start_protocol): Register unavailable attribute.
(really_start_method): Likewise.
(objc_gimplify_property_ref): Emit error on encountering an
unavailable entity (and a warning for a deprecated one).

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-unavailable-1.C: New test.
* g++.dg/ext/attr-unavailable-2.C: New test.
* g++.dg/ext/attr-unavailable-3.C: New test.
* g++.dg/ext/attr-unavailable-4.C: New test.
* g++.dg/ext/attr-unavailable-5.C: New test.
* g++.dg/ext/attr-unavailable-6.C: New test.
* g++.dg/ext/attr-unavailable-7.C: New test.
* g++.dg/ext/attr-unavailable-8.C: New tes

[committed] libstdc++: Reorder constructors in

2020-11-10 Thread Jonathan Wakely via Gcc-patches
This groups all the constructors together, consistent with the synopses
in the C++20 standard.

libstdc++-v3/ChangeLog:

* include/std/sstream (basic_stringbug, basic_istringstream)
(basic_ostringstream, basic_stringstream): Reorder C++20
constructors to be declared next to other constructors.

Tested powerpc64le-linux. Committed to trunk.

commit f7c41c572bbc16d852104515e506936d447debbe
Author: Jonathan Wakely 
Date:   Tue Nov 10 19:12:03 2020

libstdc++: Reorder constructors in 

This groups all the constructors together, consistent with the synopses
in the C++20 standard.

libstdc++-v3/ChangeLog:

* include/std/sstream (basic_stringbug, basic_istringstream)
(basic_ostringstream, basic_stringstream): Reorder C++20
constructors to be declared next to other constructors.

diff --git a/libstdc++-v3/include/std/sstream b/libstdc++-v3/include/std/sstream
index 8cddda297017..d7200ab6ed88 100644
--- a/libstdc++-v3/include/std/sstream
+++ b/libstdc++-v3/include/std/sstream
@@ -149,37 +149,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   : basic_stringbuf(std::move(__rhs), __xfer_bufptrs(__rhs, this))
   { __rhs._M_sync(const_cast(__rhs._M_string.data()), 0, 0); }
 
-  // 27.8.2.2 Assign and swap:
-
-  basic_stringbuf&
-  operator=(const basic_stringbuf&) = delete;
-
-  basic_stringbuf&
-  operator=(basic_stringbuf&& __rhs)
-  {
-   __xfer_bufptrs __st{__rhs, this};
-   const __streambuf_type& __base = __rhs;
-   __streambuf_type::operator=(__base);
-   this->pubimbue(__rhs.getloc());
-   _M_mode = __rhs._M_mode;
-   _M_string = std::move(__rhs._M_string);
-   __rhs._M_sync(const_cast(__rhs._M_string.data()), 0, 0);
-   return *this;
-  }
-
-  void
-  swap(basic_stringbuf& __rhs) noexcept(_Noexcept_swap::value)
-  {
-   __xfer_bufptrs __l_st{*this, std::__addressof(__rhs)};
-   __xfer_bufptrs __r_st{__rhs, this};
-   __streambuf_type& __base = __rhs;
-   __streambuf_type::swap(__base);
-   __rhs.pubimbue(this->pubimbue(__rhs.getloc()));
-   std::swap(_M_mode, __rhs._M_mode);
-   std::swap(_M_string, __rhs._M_string); // XXX not exception safe
-  }
-#endif // C++11
-
 #if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
   explicit
   basic_stringbuf(const allocator_type& __a)
@@ -226,7 +195,38 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   allocator_type get_allocator() const noexcept
   { return _M_string.get_allocator(); }
-#endif
+#endif // C++20
+
+  // 27.8.2.2 Assign and swap:
+
+  basic_stringbuf&
+  operator=(const basic_stringbuf&) = delete;
+
+  basic_stringbuf&
+  operator=(basic_stringbuf&& __rhs)
+  {
+   __xfer_bufptrs __st{__rhs, this};
+   const __streambuf_type& __base = __rhs;
+   __streambuf_type::operator=(__base);
+   this->pubimbue(__rhs.getloc());
+   _M_mode = __rhs._M_mode;
+   _M_string = std::move(__rhs._M_string);
+   __rhs._M_sync(const_cast(__rhs._M_string.data()), 0, 0);
+   return *this;
+  }
+
+  void
+  swap(basic_stringbuf& __rhs) noexcept(_Noexcept_swap::value)
+  {
+   __xfer_bufptrs __l_st{*this, std::__addressof(__rhs)};
+   __xfer_bufptrs __r_st{__rhs, this};
+   __streambuf_type& __base = __rhs;
+   __streambuf_type::swap(__base);
+   __rhs.pubimbue(this->pubimbue(__rhs.getloc()));
+   std::swap(_M_mode, __rhs._M_mode);
+   std::swap(_M_string, __rhs._M_string); // XXX not exception safe
+  }
+#endif // C++11
 
   // Getters and setters:
 
@@ -282,7 +282,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
else
  return _M_string;
   }
-#endif
+#endif // C++20
 
   /**
*  @brief  Setting a new buffer.
@@ -513,7 +513,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _M_mode(__rhs._M_mode), _M_string(std::move(__rhs._M_string), __a)
   { }
 #endif
-#endif
+#endif // C++11
 };
 
 
@@ -623,27 +623,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _M_stringbuf(std::move(__rhs._M_stringbuf))
   { __istream_type::set_rdbuf(&_M_stringbuf); }
 
-  // 27.8.3.2 Assign and swap:
-
-  basic_istringstream&
-  operator=(const basic_istringstream&) = delete;
-
-  basic_istringstream&
-  operator=(basic_istringstream&& __rhs)
-  {
-   __istream_type::operator=(std::move(__rhs));
-   _M_stringbuf = std::move(__rhs._M_stringbuf);
-   return *this;
-  }
-
-  void
-  swap(basic_istringstream& __rhs)
-  {
-   __istream_type::swap(__rhs);
-   _M_stringbuf.swap(__rhs._M_stringbuf);
-  }
-#endif
-
 #if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
   basic_istringstream(ios_base::openmode __mode, const allocator_type& __a)
   : __istream_type(), _M_stringbuf(__mode | ios_base::in, __a)
@@ -674,7 +653,28 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
ios_base::openmode __mode = ios_base::in)
   

[committed] libstdc++: Fix more unspecified comparisons to null pointer [PR 97415]

2020-11-10 Thread Jonathan Wakely via Gcc-patches
This adds some more null checks to avoid a relational comparison with a
null pointer, similar to 78198b6021a9695054dab039340202170b88423c.

libstdc++-v3/ChangeLog:

PR libstdc++/97415
* include/std/sstream (basic_stringbuf::_M_update_egptr)
(basic_stringbuf::__xfer_bufptrs::__xfer_bufptrs): Check for
null before comparing pointers.

Tested powerpc64le-linux. Committed to trunk.

commit ced70ebaa372945ec8d73703d81e4a10d6d51c9b
Author: Jonathan Wakely 
Date:   Tue Nov 10 15:46:02 2020

libstdc++: Fix more unspecified comparisons to null pointer [PR 97415]

This adds some more null checks to avoid a relational comparison with a
null pointer, similar to 78198b6021a9695054dab039340202170b88423c.

libstdc++-v3/ChangeLog:

PR libstdc++/97415
* include/std/sstream (basic_stringbuf::_M_update_egptr)
(basic_stringbuf::__xfer_bufptrs::__xfer_bufptrs): Check for
null before comparing pointers.

diff --git a/libstdc++-v3/include/std/sstream b/libstdc++-v3/include/std/sstream
index 437e2ba2a5f8..9c50e4e83281 100644
--- a/libstdc++-v3/include/std/sstream
+++ b/libstdc++-v3/include/std/sstream
@@ -357,13 +357,16 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   void
   _M_update_egptr()
   {
-   const bool __testin = _M_mode & ios_base::in;
-   if (this->pptr() && this->pptr() > this->egptr())
+   if (char_type* __pptr = this->pptr())
  {
-   if (__testin)
- this->setg(this->eback(), this->gptr(), this->pptr());
-   else
- this->setg(this->pptr(), this->pptr(), this->pptr());
+   char_type* __egptr = this->egptr();
+   if (!__egptr || __pptr > __egptr)
+ {
+   if (_M_mode & ios_base::in)
+ this->setg(this->eback(), this->gptr(), __pptr);
+   else
+ this->setg(__pptr, __pptr, __pptr);
+ }
  }
   }
 
@@ -396,7 +399,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  _M_poff[0] = __from.pbase() - __str;
  _M_poff[1] = __from.pptr() - __from.pbase();
  _M_poff[2] = __from.epptr() - __str;
- if (__from.pptr() > __end)
+ if (!__end || __from.pptr() > __end)
__end = __from.pptr();
}
 


[committed] libstdc++: Add remaining C++20 additions to [P0408R7]

2020-11-10 Thread Jonathan Wakely via Gcc-patches
This adds the new overloads of basic_stringbuf::str, and the
corresponding overloads to basic_istringstream, basic_ostringstream and
basic_stringstream.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Tighten patterns.
(GLIBCXX_3.4.29): Export new symbols.
* include/bits/alloc_traits.h (__allocator_like): New concept.
* include/std/sstream (basic_stringbuf::swap): Add exception
specification.
(basic_stringbuf::str() const): Add ref-qualifier. Use new
_M_high_mark function.
(basic_stringbuf::str(const SAlloc&) const): Define new function.
(basic_stringbuf::str() &&): Likewise.
(basic_stringbuf::str(const basic_string&)):
Likewise.
(basic_stringbuf::str(basic_string&&)): Likewise.
(basic_stringbuf::view() const): Use _M_high_mark.
(basic_istringstream::str, basic_ostringstream::str)
(basic_stringstream::str): Define new overloads.
* src/c++20/sstream-inst.cc (basic_stringbuf::str)
(basic_istringstream::str, basic_ostringstream::str)
(basic_stringstream::str): Explicit instantiation definitions
for new overloads.
* testsuite/27_io/basic_istringstream/view/char/1.cc: Add more
checks.
* testsuite/27_io/basic_istringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_ostringstream/view/char/1.cc:
Likewise.
* testsuite/27_io/basic_ostringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_stringstream/view/char/1.cc:
Likewise.
* testsuite/27_io/basic_stringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_istringstream/str/char/2.cc: New test.
* testsuite/27_io/basic_istringstream/str/wchar_t/2.cc: New test.
* testsuite/27_io/basic_ostringstream/str/char/3.cc: New test.
* testsuite/27_io/basic_ostringstream/str/wchar_t/3.cc: New test.
* testsuite/27_io/basic_stringbuf/str/char/4.cc: New test.
* testsuite/27_io/basic_stringbuf/str/wchar_t/4.cc: New test.
* testsuite/27_io/basic_stringstream/str/char/5.cc: New test.
* testsuite/27_io/basic_stringstream/str/wchar_t/5.cc.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 95cb0fc8c51841cc6a0e51490cb3769eb80fa34c
Author: Jonathan Wakely 
Date:   Tue Nov 10 15:57:04 2020

libstdc++: Add remaining C++20 additions to  [P0408R7]

This adds the new overloads of basic_stringbuf::str, and the
corresponding overloads to basic_istringstream, basic_ostringstream and
basic_stringstream.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Tighten patterns.
(GLIBCXX_3.4.29): Export new symbols.
* include/bits/alloc_traits.h (__allocator_like): New concept.
* include/std/sstream (basic_stringbuf::swap): Add exception
specification.
(basic_stringbuf::str() const): Add ref-qualifier. Use new
_M_high_mark function.
(basic_stringbuf::str(const SAlloc&) const): Define new function.
(basic_stringbuf::str() &&): Likewise.
(basic_stringbuf::str(const basic_string&)):
Likewise.
(basic_stringbuf::str(basic_string&&)): Likewise.
(basic_stringbuf::view() const): Use _M_high_mark.
(basic_istringstream::str, basic_ostringstream::str)
(basic_stringstream::str): Define new overloads.
* src/c++20/sstream-inst.cc (basic_stringbuf::str)
(basic_istringstream::str, basic_ostringstream::str)
(basic_stringstream::str): Explicit instantiation definitions
for new overloads.
* testsuite/27_io/basic_istringstream/view/char/1.cc: Add more
checks.
* testsuite/27_io/basic_istringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_ostringstream/view/char/1.cc:
Likewise.
* testsuite/27_io/basic_ostringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_stringstream/view/char/1.cc:
Likewise.
* testsuite/27_io/basic_stringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_istringstream/str/char/2.cc: New test.
* testsuite/27_io/basic_istringstream/str/wchar_t/2.cc: New test.
* testsuite/27_io/basic_ostringstream/str/char/3.cc: New test.
* testsuite/27_io/basic_ostringstream/str/wchar_t/3.cc: New test.
* testsuite/27_io/basic_stringbuf/str/char/4.cc: New test.
* testsuite/27_io/basic_stringbuf/str/wchar_t/4.cc: New test.
* testsuite/27_io/basic_stringstream/str/char/5.cc: New test.
* testsuite/27_io/basic_stringstream/str/wchar_t/5.cc.cc: New test.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/co

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Stefan Kanthak
Eric Botcazou  wrote:

>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>> GCC knows how to shift integers twice the register size these functions can
>> be written as one-liners.
> 
> These functions are precisely meant to be used when GCC cannot do that.

On which processor(s) is GCC unable to generate code for DWtype shifts?

JFTR: if GCC were not able to generate code for DWtype addition and subtraction
  it would also need __[u]addDI3() and __[u]subDI3() functions ... which
  are but missing from libgcc.a

Stefan Kanthak


[pushed] c++: Add 5 unfixed tests.

2020-11-10 Thread Marek Polacek via Gcc-patches
A couple of dg-ice tests.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/testsuite/ChangeLog:

PR c++/52830
PR c++/88982
PR c++/90799
PR c++/87765
PR c++/89565
* g++.dg/cpp0x/constexpr-52830.C: New test.
* g++.dg/cpp0x/vt-88982.C: New test.
* g++.dg/cpp1z/class-deduction76.C: New test.
* g++.dg/cpp1z/constexpr-lambda26.C: New test.
* g++.dg/cpp2a/nontype-class39.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C  | 37 +++
 gcc/testsuite/g++.dg/cpp0x/vt-88982.C | 14 +++
 .../g++.dg/cpp1z/class-deduction76.C  | 25 +
 .../g++.dg/cpp1z/constexpr-lambda26.C | 13 +++
 gcc/testsuite/g++.dg/cpp2a/nontype-class39.C  | 12 ++
 5 files changed, 101 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/vt-88982.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction76.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-lambda26.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class39.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
new file mode 100644
index 000..2c9d2f9b329
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
@@ -0,0 +1,37 @@
+// PR c++/52830
+// { dg-do compile { target c++11 } }
+// { dg-ice "comptypes" }
+
+template struct eif { typedef void type; };
+template<>   struct eif {};
+
+template struct same
+{
+  static constexpr bool value = false;
+};
+template
+struct same
+{
+  static constexpr bool value = true;
+};
+
+
+struct foo {
+  template
+  void func(T && a,
+typename eif::value>::type * = 0);
+};
+
+template
+void
+foo::
+func(T && a,
+ typename eif::value>::type * )
+{
+}
+
+void do_stuff()
+{
+  foo f;
+  f.func(12);
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/vt-88982.C 
b/gcc/testsuite/g++.dg/cpp0x/vt-88982.C
new file mode 100644
index 000..cb9530dcee1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/vt-88982.C
@@ -0,0 +1,14 @@
+// PR c++/88982
+// { dg-do compile { target c++11 } }
+// { dg-ice "tsubst_pack_expansion" }
+
+template struct A {
+  template class ...Cs, Cs ...Vs> struct B {
+B() {
+}
+  };
+};
+
+template using Int = int;
+template using Char = char;
+A::B b;
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction76.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction76.C
new file mode 100644
index 000..23bb6e8fa9a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction76.C
@@ -0,0 +1,25 @@
+// PR c++/90799
+// { dg-do compile { target c++17 } }
+// { dg-ice "unify" }
+
+template
+void foo() noexcept(T::value);
+
+struct S {
+static constexpr const bool value = true;
+
+template
+void bar() noexcept(T::value);
+};
+
+template
+constexpr bool is_noexcept_function(void(Args...) noexcept(is_noexcept)) 
noexcept {
+return is_noexcept;
+}
+
+template
+constexpr bool is_noexcept_member_function(void(S::*)(Args...) 
noexcept(is_noexcept)) noexcept {
+return is_noexcept;
+}
+
+static_assert(is_noexcept_function(foo));
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda26.C 
b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda26.C
new file mode 100644
index 000..d6c8bae525f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda26.C
@@ -0,0 +1,13 @@
+// PR c++/87765
+// { dg-do compile { target c++17 } }
+// { dg-ice "cxx_eval_constant_expression" }
+
+template 
+using foo = int;
+
+struct A {
+  constexpr int bar() const { return 42; }
+};
+
+void baz(A a) {
+  [=](auto c) { return foo { }; }; }
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
new file mode 100644
index 000..f5f79a71ec2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
@@ -0,0 +1,12 @@
+// PR c++/89565
+// { dg-do compile { target c++20 } }
+// { dg-ice "resolve_args" }
+
+template 
+struct N{};
+
+template 
+struct S {};
+
+template 
+using NS = S;

base-commit: 8b9a92f794b8ad8011e6beb11a609efa635c4600
-- 
2.28.0



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
> Eric Botcazou  wrote:
> 
> >> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() 
> >> functions
> >> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
> >> GCC knows how to shift integers twice the register size these functions can
> >> be written as one-liners.
> > 
> > These functions are precisely meant to be used when GCC cannot do that.
> 
> On which processor(s) is GCC unable to generate code for DWtype shifts?

E.g. avr-none, msp430-elf, pdp11-aout.
And I see recursive __cmpdi2 calls on avr-none too.

Jakub



Re: [PATCH] Practical Improvement to libgcc Complex Divide

2020-11-10 Thread Patrick McGehearty via Gcc-patches

2nd ping - submitted Sept 8, 2020, last comment Sept 9, 2020
This patch is a correctness bug fix and not a change to any user
visible interface.

It makes a major improvement to complex divide accuracy, including
algorithm improvements known for 8 years. It is highly desirable
to get these changes into gcc11 and not wait another year.

In recent years there have been substantial improvements in mathematical
functions in glibc and gcc. I'm hoping to continue those contributions
with the goal of making glibc/gcc the "go to" solution for portable
mathematical software development. Substantial delays in getting
reviews makes it more difficult to get management backing for
additional mathematical improvement projects.

Also, if reviewer has any unexpected questions about design decisions,
it would be helpful to ask them while I still remember why I made those
choices.

- Patrick McGehearty
patrick.mcgehea...@oracle.com


On 10/13/2020 9:54 AM, Patrick McGehearty via Gcc-patches wrote:

Ping - still need review of version 4 of this patch.
It has been over a month since the last comment.

- patrick



On 9/9/2020 2:13 AM, Richard Biener wrote:

On Tue, Sep 8, 2020 at 8:50 PM Patrick McGehearty via Gcc-patches
 wrote:

(Version 4)

(Added in version 4)
Fixed Changelog entry to include __divsc3, __divdc3, __divxc3, 
__divtc3.

Revised description to avoid incorrect use of "ulp (units last place)".
Modified float precison case to use double precision when double
precision hardware is available. Otherwise float uses the new 
algorithm.

Added code to scale subnormal numerator arguments when appropriate.
This change reduces 16 bit errors in double precision by a factor of 
140.

Revised results charts to match current version of code.
Added background of tuning approach.

Summary of Purpose

The following patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small 
exponents.


The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://urldefense.com/v3/__https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714__;!!GqivPVa7Brio!NjjEhnQQ38VNyP_v8nlAm9uVjvZldUnobfY5hZdq22cMMVauop64MFw3nOHIQXUmy8PToRw$ 
.  Study of Beebe[1]

provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two sets of test set by randomly distributing values over
a restricted range and the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test for "ratio" equals zero reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers. Adding
debug print out to cases with substantial errors allowed me to see the
intermediate calculations for test values that failed. I noted that
for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from check for zero to check for subnormal reduced the 16
bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).

Continued examination of remaining errors and their intermediate
computations led to the various tests of input value tests and scaling
to avoid under/overflow. The current patch does not handle some of the
rarest and most extreme combinations of input values, but the random
test data is only showing 1 case in 10 million that has an error of
greater than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all ar

Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-11-10 Thread Qing Zhao via Gcc-patches
Jakub and Jeff,

PING^7 on the following patch proposed 8 months ago for gcc11:

https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542198.html 


The deadline for gcc11 stage 1 is approaching.  The pinged patch is one that 
has been sent for review 8 months ago in order to 
Make into gcc11. 

And this is an important feature that our company is waiting for a long time. 

Could you please take a look at this patch and let us know whether it’s ready 
for commit into gcc11? 

Thanks a lot.

Qing




> On Oct 27, 2020, at 5:56 AM, Martin Liška  wrote:
> 
> PING^6
> 
> The patch is in review process for more than 6 months, can please any global
> reviewer take a look at it?
> 
> Thanks,
> Martin
> 
> On 9/25/20 4:55 PM, Martin Liška wrote:
>> PING^5
> 



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
>> Eric Botcazou  wrote:
>>
 The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() 
 functions
 is rather bad, it yields bad machine code at least on i386 and AMD64. Since
 GCC knows how to shift integers twice the register size these functions can
 be written as one-liners.
>>> These functions are precisely meant to be used when GCC cannot do that.
>> On which processor(s) is GCC unable to generate code for DWtype shifts?
> E.g. avr-none, msp430-elf, pdp11-aout.
> And I see recursive __cmpdi2 calls on avr-none too.

ACK.  I'll pull those [u]cmpdi changes.  They were iffy at best, this
confirms the concerns we both had.

jeff



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 10, 2020 at 02:09:20PM -0700, Jeff Law wrote:
> 
> On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
> >> Eric Botcazou  wrote:
> >>
>  The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() 
>  functions
>  is rather bad, it yields bad machine code at least on i386 and AMD64. 
>  Since
>  GCC knows how to shift integers twice the register size these functions 
>  can
>  be written as one-liners.
> >>> These functions are precisely meant to be used when GCC cannot do that.
> >> On which processor(s) is GCC unable to generate code for DWtype shifts?
> > E.g. avr-none, msp430-elf, pdp11-aout.
> > And I see recursive __cmpdi2 calls on avr-none too.
> 
> ACK.  I'll pull those [u]cmpdi changes.  They were iffy at best, this
> confirms the concerns we both had.

To be precise, I haven't tried to build libgcc for all of those, just
checked what code is produced for
unsigned long long foo (unsigned long long x) { return x << 3; }
etc. by all trunk cross-compilers I have lying around.
It might be that their __ashlDI3 etc. are actually __ashlsi3 and they
have some other implementation of __ashldi3 etc.

Anyway, I'm not sure if we really need to optimize too much functions which
are never used; and the -ftrapv stuff should one day be reimplemented using
-fsanitize=signed-integer-overflow with an abort behavior on overflows.

libgcc functions which are used heavily of course should be optimized as
much as possible.

Jakub



PING: Re: [PATCH] Add analyzer plugin support and CPython GIL example

2020-11-10 Thread David Malcolm via Gcc-patches
I'd like to ping this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556214.html
Are the non-analyzer parts OK for master?

Thanks
Dave

On Wed, 2020-10-14 at 21:21 -0400, David Malcolm wrote:
> This patch adds a new GCC plugin event: PLUGIN_ANALYZER_INIT, called
> when -fanalyzer is starting, allowing for GCC plugins to register
> additional state-machine-based checks within -fanalyzer.  The idea
> is that 3rd-party code might want to add domain-specific checks for
> its own APIs - with the caveat that the analyzer is itself still
> rather experimental.
> 
> As an example, the patch adds a proof-of-concept plugin to the
> testsuite
> for checking CPython code: verifying that code that relinquishes
> CPython's global interpreter lock doesn't attempt to do anything with
> PyObjects in the sections where the lock isn't held.  It also adds a
> warning about nested releases of the lock, which is forbidden.
> For example:
> 
> demo.c: In function 'foo':
> demo.c:11:3: warning: use of PyObject '*(obj)' without the GIL
>11 |   Py_INCREF (obj);
>   |   ^
>   'test': events 1-3
> |
> |   15 | void test (PyObject *obj)
> |  |  ^~~~
> |  |  |
> |  |  (1) entry to 'test'
> |   16 | {
> |   17 |   Py_BEGIN_ALLOW_THREADS
> |  |   ~~
> |  |   |
> |  |   (2) releasing the GIL here
> |   18 |   foo (obj);
> |  |   ~
> |  |   |
> |  |   (3) calling 'foo' from 'test'
> |
> +--> 'foo': events 4-5
>|
>|9 | foo (PyObject *obj)
>|  | ^~~
>|  | |
>|  | (4) entry to 'foo'
>|   10 | {
>|   11 |   Py_INCREF (obj);
>|  |   ~
>|  |   |
>|  |   (5) PyObject '*(obj)' used here without the GIL
>|
> 
> Doing so requires adding some logic for ignoring macro expansions in
> analyzer diagnostics, since the insides of Py_INCREF and
> Py_BEGIN_ALLOW_THREADS are not of interest to the user for these
> cases.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> Are the non-analyzer parts OK for master?
> 
> gcc/analyzer/ChangeLog:
>   * analyzer-pass.cc (pass_analyzer::execute): Move sorry call
> to...
>   (sorry_no_analyzer): New.
>   * analyzer.h (class state_machine): New forward decl.
>   (class logger): New forward decl.
>   (class plugin_analyzer_init_iface): New.
>   (sorry_no_analyzer): New decl.
>   * checker-path.cc (checker_path::fixup_locations): New.
>   * checker-path.h (checker_event::set_location): New.
>   (checker_path::fixup_locations): New decl.
>   * diagnostic-manager.cc
>   (diagnostic_manager::emit_saved_diagnostic): Call
>   checker_path::fixup_locations, and call fixup_location
>   on the primary location.
>   * engine.cc: Include "plugin.h".
>   (class plugin_analyzer_init_impl): New.
>   (impl_run_checkers): Invoke PLUGIN_ANALYZER_INIT callbacks.
>   * pending-diagnostic.h (pending_diagnostic::fixup_location):
> New
>   vfunc.
> 
> gcc/ChangeLog:
>   * doc/plugins.texi (Plugin callbacks): Add
> PLUGIN_ANALYZER_INIT.
>   * plugin.c (register_callback): Likewise.
>   (invoke_plugin_callbacks_full): Likewise.
>   * plugin.def (PLUGIN_ANALYZER_INIT): New event.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/plugin/analyzer_gil_plugin.c: New test.
>   * gcc.dg/plugin/gil-1.c: New test.
>   * gcc.dg/plugin/gil.h: New header.
>   * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the new
> plugin
>   and test.
> ---
>  gcc/analyzer/analyzer-pass.cc |  18 +-
>  gcc/analyzer/analyzer.h   |  15 +
>  gcc/analyzer/checker-path.cc  |   9 +
>  gcc/analyzer/checker-path.h   |   4 +
>  gcc/analyzer/diagnostic-manager.cc|   9 +-
>  gcc/analyzer/engine.cc|  31 ++
>  gcc/analyzer/pending-diagnostic.h |   8 +
>  gcc/doc/plugins.texi  |   4 +
>  gcc/plugin.c  |   2 +
>  gcc/plugin.def|   4 +
>  .../gcc.dg/plugin/analyzer_gil_plugin.c   | 436
> ++
>  gcc/testsuite/gcc.dg/plugin/gil-1.c   |  90 
>  gcc/testsuite/gcc.dg/plugin/gil.h |  32 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 +
>  14 files changed, 660 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_gil_plugin.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/gil-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/gil.h
> 
> diff --git a/gcc/analyzer/analyzer-pass.cc b/gcc/analyzer/analyzer-
> pass.cc
> index a27421e46d4..1f65bf8b154 100644
> --- a/gcc/analyzer/analyzer-pass.cc
> +++ b/gcc/analyzer/analyzer-pass.cc
> @@ -83,9 +8

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 12:44 PM, Stefan Kanthak wrote:
> Eric Botcazou  wrote:
>
>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>> GCC knows how to shift integers twice the register size these functions can
>>> be written as one-liners.
>> These functions are precisely meant to be used when GCC cannot do that.
> On which processor(s) is GCC unable to generate code for DWtype shifts?

We'd have to look at them all, and it's just not worth the headache.  
Just to pick one off the top of my head that I know well would be the H8
which only has QI, HI and SI shifts/rotates.  I'd expect inherent
support for double-word shifts to be the exception rather than the rule
for 32bit embedded targets.


>
> JFTR: if GCC were not able to generate code for DWtype addition and 
> subtraction
>   it would also need __[u]addDI3() and __[u]subDI3() functions ... which
>   are but missing from libgcc.a

Umm, no.  Non-overflow trapping addition/subtraction needs just one
version, signedness doesn't matter.


jeff



[r11-4883 Regression] FAIL: gfortran.dg/gomp/workshare-reduction-57.f90 -O scan-tree-dump-times optimized "__builtin_GOMP_loop_dynamic_next " 1 on Linux/x86_64

2020-11-10 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

e929ef532ad52cde873dfc0849907b020ffc5afd is the first bad commit
commit e929ef532ad52cde873dfc0849907b020ffc5afd
Author: Tobias Burnus 
Date:   Tue Nov 10 18:28:18 2020 +0100

Fortran: OpenMP 5.0 (in_,task_)reduction clause extensions

caused

FAIL: gfortran.dg/gomp/workshare-reduction-26.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_maybe_nonmonotonic_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-26.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 0, 0, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-27.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-27.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, (?:2147483648|-2147483648), 0, " 
1
FAIL: gfortran.dg/gomp/workshare-reduction-28.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-28.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 4, 0, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-36.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-36.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 2, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-37.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-37.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, (?:2147483650|-2147483646), 1, " 
1
FAIL: gfortran.dg/gomp/workshare-reduction-38.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-38.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 2, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-39.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-39.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 2, 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-3.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-3.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 4, 0, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-40.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-40.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, (?:2147483650|-2147483646), 3, " 
1
FAIL: gfortran.dg/gomp/workshare-reduction-41.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-41.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 2, 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-42.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-42.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 3, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-43.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-43.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, (?:2147483651|-2147483645), 1, " 
1
FAIL: gfortran.dg/gomp/workshare-reduction-44.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-44.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 3, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-45.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-45.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 3, 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-46.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-46.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, (?:2147483651|-2147483645), 3, " 
1
FAIL: gfortran.dg/gomp/workshare-reduction-47.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-47.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop_start [^\n\r]*, 3, 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-56.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_doacross_post " 1
FAIL: gfortran.dg/gomp/workshare-reduction-56.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_doacross_wait " 1
FAIL: gf

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
>> Eric Botcazou  wrote:
>>
 The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() 
 functions
 is rather bad, it yields bad machine code at least on i386 and AMD64. Since
 GCC knows how to shift integers twice the register size these functions can
 be written as one-liners.
>>> These functions are precisely meant to be used when GCC cannot do that.
>> On which processor(s) is GCC unable to generate code for DWtype shifts?
> E.g. avr-none, msp430-elf, pdp11-aout.
> And I see recursive __cmpdi2 calls on avr-none too.

FWIW, the tester only builds libgcc for avr-elf as there's no newlib
port.  msp430-elf builds newlib and will run the testsuite using the
simulator from binutils/gdb.  I don't test pdp11 at all.

While the avr port may advertise DImode shifts and such, under the hood
they're actually implemented as calls into hand written assembly in
libgcc and probably implement an internal ABI, particularly WRT register
usage.


jeff



Re: [Patch, fortran] PR83118 - [8/9/10/11 Regression] Bad intrinsic assignment of class(*) array component of derived type

2020-11-10 Thread Thomas Koenig via Gcc-patches

Hi Paul,


This all bootstraps and regtests on FC31/x86_64 - OK for master?


This is a sizable patch, and from what I can see, it all looks
plausible.  So, I's say OK for master (with one nit, below),
but maybe you could wait a day or so to give others the chance
to look it over, too.

The nit:


PR fortran/83118
* gfortran.dg/dependency_57.f90: Change to dg-run and test for correct
result.


I'd rather not change a test case unless it is needed; if something
breaks it, it is better to leave it as is for bisection.

Could you just make a new test from the run-time version?

Thanks a lot for tackling this thorny issue!

Best regards

Thomas




Re: [PATCH][MIPS] PR target/77510 Reduce size of MIPS core automaton

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 4:42 AM, Xu Chenghua wrote:
>
> Hi:
>
> This patch reduce reservation of model do not more than 10 cycles. The
> memory of genautomata down to 1GB.

How bad is the memory consumption before this change?


>
> ChangeLog
>
> gcc/
>     PR target/77510
>     * config/mips/gs464.md: Reduce reservation
>     duration to 10 cycles.
>     * config/mips/i6400.md: Likewise.
>     * config/mips/m5100.md: Likewise.
>     * config/mips/p5600.md: Likewise.
>     * config/mips/p6600.md: Likewise.
>     * config/mips/xlp.md: Likewise.
>     * config/mips/xlr.md: Likewise.

I doubt there's a measurable performance penalty for not fully modeling
the latency of these instructions, but I'm quite hesitant to ACK this
change without hearing from others in the MIPS space.   Alternately, if
the affected models aren't being sold anymore, then I wouldn't worry
about getting feedback on them.


Jeff




Re: [PATCH] Objective-C : Implement NSObject attribute.

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/6/20 1:18 PM, Iain Sandoe wrote:
> Hi
>
> Originally, I had this as a Darwin-only patch, however GNUStep
> also makes use of NSObject and similar constructs, so this really
> needs to be available to linux-gnu as well.
>
> tested across several Darwin versions and on x86_64-linux-gnu.
>
> OK for the c-family changes?
> thanks
> Iain
>
> =
>
> This attribute allows pointers to be marked as pointers to
> an NSObject-compatible object.  This allows for additional
> checking of assignment etc. when referring to pointers to
> opaque types (or type-erased via a void * cast).
>
> gcc/c-family/ChangeLog:
>
>   * c-attribs.c (handle_nsobject_attribute): New.
>   * c.opt: Add WNSObject-attribute.
>
> gcc/objc/ChangeLog:
>
>   * objc-act.c (objc_compare_types): Handle NSObject type
>   attributes.
>   (objc_type_valid_for_messaging): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * obj-c++.dg/attributes/nsobject-01.mm: New test.
>   * objc.dg/attributes/nsobject-01.m: New test.

OK with a suitable update to documentation.  I have no idea the
attribute is supposed to do.


jeff




Re: [07/32] cpp main

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/3/20 2:14 PM, Nathan Sidwell wrote:
> Here's the changes to starting the main file. I have added the ability
> to search the user or system include paths for the main file.  That's
> real helpful to users attempting to build header-units.  I'll discuss
> the CLI with the options patch.
>
> Also recording the location at which the main file starts is helpful
> for whole-file diagnostics.
>
>
>
> 07-cpp-main.diff
>
OK with ChangeLog entry.

jeff




Re: [08/32] cpp mkdeps

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/3/20 2:14 PM, Nathan Sidwell wrote:
> The final bit of preprocessor change is that to mkdeps.  We have a new
> kind of thing to be dependent upon -- a compiled module interface.
> That's also an additional target.
>
> This adds the logic to mkdeps to keep those dependencies separate from
> include file dependencies.  I made use of this when demonstrating a
> module-aware Make POC.
>
>
>
> 08-cpp-mkdeps.diff
>
OK with ChangeLog entry.

jeff




Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 2:17 PM, Jakub Jelinek wrote:
> On Tue, Nov 10, 2020 at 02:09:20PM -0700, Jeff Law wrote:
>> On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
>>> On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
 Eric Botcazou  wrote:

>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() 
>> functions
>> is rather bad, it yields bad machine code at least on i386 and AMD64. 
>> Since
>> GCC knows how to shift integers twice the register size these functions 
>> can
>> be written as one-liners.
> These functions are precisely meant to be used when GCC cannot do that.
 On which processor(s) is GCC unable to generate code for DWtype shifts?
>>> E.g. avr-none, msp430-elf, pdp11-aout.
>>> And I see recursive __cmpdi2 calls on avr-none too.
>> ACK.  I'll pull those [u]cmpdi changes.  They were iffy at best, this
>> confirms the concerns we both had.
> To be precise, I haven't tried to build libgcc for all of those, just
> checked what code is produced for
> unsigned long long foo (unsigned long long x) { return x << 3; }
> etc. by all trunk cross-compilers I have lying around.
> It might be that their __ashlDI3 etc. are actually __ashlsi3 and they
> have some other implementation of __ashldi3 etc.

Well for avr libgcc's cmpdi compiles into something reasonable, but
ucmpdi calls cmpdi and I can't convince myself it's right.  Regardless,
the [u]cmpdi stuff was very much borderline from a correctness
standpoint and I'm going to revert that part.



>
> Anyway, I'm not sure if we really need to optimize too much functions which
> are never used; and the -ftrapv stuff should one day be reimplemented using
> -fsanitize=signed-integer-overflow with an abort behavior on overflows.
>
> libgcc functions which are used heavily of course should be optimized as
> much as possible.

Agreed.  The improvements to use the overflow intrinsics are somewhat
useful -- they make the code clearer and that's a win.  The potential
self-recursive stuff in [u]cmpdi and the shifts probably aren't useful
(I haven't looked closely at the shifts yet).  Stefan has a fundamental
mis-understanding of how those routines are used and more importantly
the constraints they have to work within.

Jeff

Jeff

>
>   Jakub
>



Re: [PATCH] Objective-C : Implement NSObject attribute.

2020-11-10 Thread Iain Sandoe

Hi Jeff,

Jeff Law  wrote:


This attribute allows pointers to be marked as pointers to
an NSObject-compatible object.  This allows for additional




OK with a suitable update to documentation.  I have no idea the
attribute is supposed to do.


Joseph ack’d this and I applied it to master ..

.. this was intentionally undocumented.

I originally wrote documentation for it.  However, it turns out that it
is undocumented in both clang (and as far as I can tell Apple’s)
material.

This is perhaps because it’s not really intended (or recommended)
for end-user code, but for “internal” cases when writing libraries etc.

WDYT? (I’m happy to ressurect the documentation I wrote and
post/apply it).

thanks
Iain



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Jeff Law via Gcc-patches


On 11/10/20 10:59 AM, Stefan Kanthak via Gcc-patches wrote:
> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64.
> Since GCC knows how to shift integers twice the register size these functions
> can be written as one-liners.
>
> The implementation of the __bswapsi2() function uses SIGNED instead of
> unsigned mask values; cf. __bswapdi2()
>
> Stefan Kanthak
>
> libgcc2.diff
>
> --- -/libgcc/libgcc2.h
> +++ +/libgcc/libgcc2.h
> @@ -391,7 +391,7 @@
>  extern DWtype __negdi2 (DWtype);
>  #endif
>  
> -extern DWtype __lshrdi3 (DWtype, shift_count_type);
> +extern UDWtype __lshrdi3 (UDWtype, shift_count_type);
>  extern DWtype __ashldi3 (DWtype, shift_count_type);
>  extern DWtype __ashrdi3 (DWtype, shift_count_type);
>  
> --- -/libgcc/libgcc2.c
> +++ +/libgcc/libgcc2.c
> @@ -398,30 +398,10 @@
>  /* Unless shift functions are defined with full ANSI prototypes,
> parameter b will be promoted to int if shift_count_type is smaller than 
> an int.  */
>  #ifdef L_lshrdi3
> -DWtype
> -__lshrdi3 (DWtype u, shift_count_type b)
> +UDWtype
> +__lshrdi3 (UDWtype u, shift_count_type b)

As has been pointed out, you can't implement these routines by doing the
operation in the same mode as the argument.  It's potentially
self-recursive.


THe whole point of these routines is to provide double-word capabilities
on targets that don't have intrinsic double-word capabilities. That's
why they're written in a non-obvious way using word sized operations.


>  
> @@ -486,10 +425,10 @@
>  SItype
>  __bswapsi2 (SItype u)
>  {
> -  return u) & 0xff00) >> 24)
> -   | (((u) & 0x00ff) >>  8)
> -   | (((u) & 0xff00) <<  8)
> -   | (((u) & 0x00ff) << 24));
> +  return u) & 0xff00u) >> 24)
> +   | (((u) & 0x00ffu) >>  8)
> +   | (((u) & 0xff00u) <<  8)
> +   | (((u) & 0x00ffu) << 24));

What's the point of this change?  I'm not sure how the signedness of the
constant really matters here.

jeff



  1   2   >