date:20240731

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Hongtao Liu

On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak  wrote:
>
> On Tue, Jul 30, 2024 at 3:00 PM Richard Biener  wrote:
> >
> > On Tue, 30 Jul 2024, Alexander Monakov wrote:
> >
> > >
> > > On Tue, 30 Jul 2024, Richard Biener wrote:
> > >
> > > > > Oh, and please add a small comment why we don't use XFmode here.
> > > >
> > > > Will do.
> > > >
> > > > /* Do not enable XFmode, there is padding in it and it suffers
> > > >    from normalization upon load like SFmode and DFmode when
> > > >    not using SSE.  */
> > >
> > > Is it really true? I have no evidence of FLDT performing normalization
> > > (as mentioned in PR 114659, if it did, there would be no way to
> > > spill/reload x87 registers).
> >
> > What mangling fld performs depends on the contents of the FP control
> > word which is awkward.  IIRC there's at least a bugreport that it
> > turns sNaN into a qNaN, it seems I was wrong about denormals
> > (when DM is not masked).  And yes, IIRC x87 instability is also
> > related to spills (IIRC we spill in the actual mode of the reg, not in
> > XFmode), but -fexcess-precision=standard should hopefully avoid that.
> > It's also not clear whether all implementations conformed to the
> > specs wrt extended-precision format loads.
>
> FYI, FLDT does not mangle long-double values and does not generate
> exceptions. Please see [1], but ignore shadowed text and instead read
> the "Floating-Point Exceptions" section. So, as far as hardware is
> concerned, it *can* be used to transfer 10-byte values, but I don't
> want to judge from the compiler PoV if this is the way to go. We can
> enable it, perhaps temporarily to experiment a bit - it is easy to
> disable if it causes problems.
>
> Let's CC Intel folks for their opinion, if it is worth using an aging
> x87 to transfer 80-bit data.
I prefer not, in another hook ix86_can_change_mode_class, we have

  /* x87 registers can't do subreg at all, as all values are reformatted
     to extended precision.  */
  if (MAYBE_FLOAT_CLASS_P (regclass))
    return false;

I guess it eventually needs reload for XFmode.
>
> [1] https://www.felixcloutier.com/x86/fld
>
> Uros.



-- 
BR,
Hongtao

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak

On Wed, Jul 31, 2024 at 9:11 AM Hongtao Liu  wrote:
>
> On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak  wrote:
> >
> > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener  wrote:
> > >
> > > On Tue, 30 Jul 2024, Alexander Monakov wrote:
> > >
> > > >
> > > > On Tue, 30 Jul 2024, Richard Biener wrote:
> > > >
> > > > > > Oh, and please add a small comment why we don't use XFmode here.
> > > > >
> > > > > Will do.
> > > > >
> > > > > > /* Do not enable XFmode, there is padding in it and it suffers
> > > > > >    from normalization upon load like SFmode and DFmode when
> > > > > >    not using SSE.  */
> > > >
> > > > > Is it really true? I have no evidence of FLDT performing normalization
> > > > > (as mentioned in PR 114659, if it did, there would be no way to
> > > > > spill/reload x87 registers).
> > >
> > > What mangling fld performs depends on the contents of the FP control
> > > word which is awkward.  IIRC there's at least a bugreport that it
> > > turns sNaN into a qNaN, it seems I was wrong about denormals
> > > (when DM is not masked).  And yes, IIRC x87 instability is also
> > > related to spills (IIRC we spill in the actual mode of the reg, not in
> > > XFmode), but -fexcess-precision=standard should hopefully avoid that.
> > > It's also not clear whether all implementations conformed to the
> > > specs wrt extended-precision format loads.
> >
> > FYI, FLDT does not mangle long-double values and does not generate
> > exceptions. Please see [1], but ignore shadowed text and instead read
> > the "Floating-Point Exceptions" section. So, as far as hardware is
> > concerned, it *can* be used to transfer 10-byte values, but I don't
> > want to judge from the compiler PoV if this is the way to go. We can
> > enable it, perhaps temporarily to experiment a bit - it is easy to
> > disable if it causes problems.
> >
> > Let's CC Intel folks for their opinion, if it is worth using an aging
> > x87 to transfer 80-bit data.
> I prefer not, in another hook ix86_can_change_mode_class, we have
>
>   /* x87 registers can't do subreg at all, as all values are reformatted
>      to extended precision.  */
>   if (MAYBE_FLOAT_CLASS_P (regclass))
>     return false;

No, the above applies to SFmode subreg of XFmode value, which is a
no-go. My question refers to the plain XFmode (80-bit) moves, where
x87 is used simply to:

fldt mem1
...
fstp mem2

where x87 is used to perform a move from one 80-bit location to the other.

> I guess it eventually needs reload for XFmode.

There are no reloads, as we would like to perform bit-exact 80-bit
move, e.g. array of 10 chars.

Uros.
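
As a source-level illustration of what "bit-exact 80-bit move" means in the
message above, here is a minimal sketch in C. The helper names copy_80bit and
same_80bit are hypothetical, and the sketch uses memcpy as the reference
behaviour; whether GCC should ever lower such a copy to an fldt/fstp pair is
exactly the open question in this thread, so nothing below claims it does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of an x87 extended-precision value without padding. */
enum { X87_EXT_BYTES = 10 };

/* Hypothetical helper: copy one 80-bit object as raw bytes.
   memcpy never interprets the bytes, so every bit pattern,
   including NaN payloads, arrives unchanged. */
void copy_80bit(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, X87_EXT_BYTES);
}

/* Returns 1 when both 10-byte objects hold identical bit patterns. */
int same_80bit(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, X87_EXT_BYTES) == 0;
}
```

Any replacement for the byte copy, x87-based or not, would have to keep
same_80bit true for every input, including encodings that look like a
signaling NaN when read as an extended-precision value.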

Re: [PATCH 1/2] Remove MMX code path in lexer

2024-07-31 Thread Richard Biener

On Tue, Jul 30, 2024 at 5:43 PM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Host systems with only MMX and no SSE2 should be really rare now.
> Let's remove the MMX code path to keep the number of custom
> implementations the same.
>
> The SSE2 code path is also somewhat dubious now (nearly everything
> should have SSE4.2, which is >15 years old now), but the SSE2
> code path is used as fallback for others and also apparently
> Solaris uses it due to tool chain deficiencies.

OK if nobody objects this week.

Thanks,
Richard.

> libcpp/ChangeLog:
>
> * lex.cc (search_line_mmx): Remove function.
> (init_vectorized_lexer): Remove search_line_mmx.
> ---
>  libcpp/lex.cc | 75 ---
>  1 file changed, 75 deletions(-)
>
> diff --git a/libcpp/lex.cc b/libcpp/lex.cc
> index 16f2c23af1e1..1591dcdf151a 100644
> --- a/libcpp/lex.cc
> +++ b/libcpp/lex.cc
> @@ -290,71 +290,6 @@ static const char repl_chars[4][16] 
> __attribute__((aligned(16))) = {
>  '?', '?', '?', '?', '?', '?', '?', '?' },
>  };
>
> -/* A version of the fast scanner using MMX vectorized byte compare insns.
> -
> -   This uses the PMOVMSKB instruction which was introduced with "MMX2",
> -   which was packaged into SSE1; it is also present in the AMD MMX
> -   extension.  Mark the function as using "sse" so that we emit a real
> -   "emms" instruction, rather than the 3dNOW "femms" instruction.  */
> -
> -static const uchar *
> -#ifndef __SSE__
> -__attribute__((__target__("sse")))
> -#endif
> -search_line_mmx (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
> -{
> -  typedef char v8qi __attribute__ ((__vector_size__ (8)));
> -  typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
> -
> -  const v8qi repl_nl = *(const v8qi *)repl_chars[0];
> -  const v8qi repl_cr = *(const v8qi *)repl_chars[1];
> -  const v8qi repl_bs = *(const v8qi *)repl_chars[2];
> -  const v8qi repl_qm = *(const v8qi *)repl_chars[3];
> -
> -  unsigned int misalign, found, mask;
> -  const v8qi *p;
> -  v8qi data, t, c;
> -
> -  /* Align the source pointer.  While MMX doesn't generate unaligned data
> - faults, this allows us to safely scan to the end of the buffer without
> - reading beyond the end of the last page.  */
> -  misalign = (uintptr_t)s & 7;
> -  p = (const v8qi *)((uintptr_t)s & -8);
> -  data = *p;
> -
> -  /* Create a mask for the bytes that are valid within the first
> - 16-byte block.  The Idea here is that the AND with the mask
> - within the loop is "free", since we need some AND or TEST
> - insn in order to set the flags for the branch anyway.  */
> -  mask = -1u << misalign;
> -
> -  /* Main loop processing 8 bytes at a time.  */
> -  goto start;
> -  do
> -{
> -  data = *++p;
> -  mask = -1;
> -
> -start:
> -  t = __builtin_ia32_pcmpeqb(data, repl_nl);
> -  c = __builtin_ia32_pcmpeqb(data, repl_cr);
> -  t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
> -  c = __builtin_ia32_pcmpeqb(data, repl_bs);
> -  t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
> -  c = __builtin_ia32_pcmpeqb(data, repl_qm);
> -  t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
> -  found = __builtin_ia32_pmovmskb (t);
> -  found &= mask;
> -}
> -  while (!found);
> -
> -  __builtin_ia32_emms ();
> -
> -  /* FOUND contains 1 in bits for which we matched a relevant
> - character.  Conversion to the byte index is trivial.  */
> -  found = __builtin_ctz(found);
> -  return (const uchar *)p + found;
> -}
>
>  /* A version of the fast scanner using SSE2 vectorized byte compare insns.  
> */
>
> @@ -509,8 +444,6 @@ init_vectorized_lexer (void)
>minimum = 3;
>  #elif defined(__SSE2__)
>minimum = 2;
> -#elif defined(__SSE__)
> -  minimum = 1;
>  #endif
>
>if (minimum == 3)
> @@ -521,14 +454,6 @@ init_vectorized_lexer (void)
>  impl = search_line_sse42;
>else if (minimum == 2 || (edx & bit_SSE2))
> impl = search_line_sse2;
> -  else if (minimum == 1 || (edx & bit_SSE))
> -   impl = search_line_mmx;
> -}
> -  else if (__get_cpuid (0x80000001, &dummy, &dummy, &dummy, &edx))
> -{
> -  if (minimum == 1
> - || (edx & (bit_MMXEXT | bit_CMOV)) == (bit_MMXEXT | bit_CMOV))
> -   impl = search_line_mmx;
>  }
>
>search_line_fast = impl;
> --
> 2.45.2
>
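
For readers who have not looked at the routine being removed: going by the
repl_nl, repl_cr, repl_bs and repl_qm constants in the diff, the search_line_*
functions look for the first newline, carriage return, backslash or question
mark. A minimal scalar sketch of that search in C follows. The name
search_line_scalar is hypothetical, and returning END when nothing matches is
an assumption of this sketch rather than something the patch states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference: return a pointer to the first
   '\n', '\r', '\\' or '?' in [s, end), or END if none is found.
   The vectorized versions test several bytes per step instead. */
const unsigned char *
search_line_scalar (const unsigned char *s, const unsigned char *end)
{
  assert (s != NULL && end != NULL && s <= end);
  for (; s < end; s++)
    switch (*s)
      {
      case '\n':
      case '\r':
      case '\\':
      case '?':
        return s;
      default:
        break;
      }
  return end;
}
```

The SSE2 and SSE4.2 paths that remain after this patch answer the same
question, just with wider compares.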

Re: [PATCH] c: Add support for unsequenced and reproducible attributes

2024-07-31 Thread Richard Biener

On Tue, Jul 30, 2024 at 7:05 PM Jakub Jelinek  wrote:
>
> Hi!
>
> C23 added in N2956 ( https://open-std.org/JTC1/SC22/WG14/www/docs/n2956.htm )
> two new attributes, which are described as similar to GCC const and pure
> attributes, but they aren't really same and it seems that even the paper
> is missing some of the differences.
> The paper says unsequenced is the same as const on functions without pointer
> arguments and reproducible is the same as pure on such functions (except
> that they are function type attributes rather than function
> declaration ones), but it seems the paper doesn't consider the finiteness GCC
> relies on (aka non-DECL_LOOPING_CONST_OR_PURE_P) - the paper only talks
> about using the attributes for CSE etc., not for DCE.
>
> The following patch introduces (for now limited) support for those
> attributes, both as standard C23 attributes and as GNU extensions (the
> difference is that the patch is then less strict on where it allows them,
> like other function type attributes they can be specified on function
> declarations as well and apply to the type, while C23 standard ones must
> go on the function declarators (i.e. after closing paren after function
> parameters) or in type specifiers of function type.
>
> If function doesn't have any pointer/reference arguments (I wasn't sure
> whether it must be really just pure pointer arguments or whether say
> struct S { int s; int *p; } passed by value, or unions, or perhaps just
> transparent unions count, and whether variadic functions which can take
> pointer va_arg count too, so the check punts on all of those), the patch
> adds additional internal attribute with " noptr" suffix which then is used
> by flags_from_decl_or_type to handle those easy cases as
> ECF_CONST|ECF_LOOPING_CONST_OR_PURE or
> ECF_PURE|ECF_LOOPING_CONST_OR_PURE
> The harder cases aren't handled right now, I'd hope they can be handled
> incrementally.
>
> I wonder whether we shouldn't emit a warning for the
> gcc.dg/c23-attr-{reproducible,unsequenced}-5.c cases.  While the standard
> clearly specifies that composite types should union the attributes, and that
> is what GCC has implemented for decades, for ?: that feels dangerous for the
> new attributes; it would be much better to be conservative on, say,
> (cond ? unsequenced_function : normal_function) (args)
>
> There are no diagnostics on incorrect [[unsequenced]] or [[reproducible]]
> function definitions.  While I think diagnosing non-const static/TLS
> declarations in the former could be easy, the rest feels hard.  E.g. the
> const/pure discovery can just punt on everything it doesn't understand,
> but complete diagnostics would need to understand it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if

int foo (uintptr_t x) { *(int *)x = 1; return 1; }

is considered "noptr" by the standard, but making a pointer out of
'x' then invokes UB?

One more comment below.

> 2024-07-30  Jakub Jelinek  
>
> PR c/116130
> gcc/
> * doc/extend.texi (unsequenced, reproducible): Document new function
> type attributes.
> * calls.cc (flags_from_decl_or_type): Handle "unsequenced noptr" and
> "reproducible noptr" attributes.
> gcc/c-family/
> * c-attribs.cc (c_common_gnu_attributes): Add entries for
> "unsequenced", "reproducible", "unsequenced noptr" and
> "reproducible noptr" attributes.
> (c_maybe_contains_pointers_p): New function.
> (handle_unsequenced_attribute): Likewise.
> (handle_reproducible_attribute): Likewise.
> * c-common.h (handle_unsequenced_attribute): Declare.
> (handle_reproducible_attribute): Likewise.
> * c-lex.cc (c_common_has_attribute): Return 202311 for standard
> unsequenced and reproducible attributes.
> gcc/c/
> * c-decl.cc (handle_std_unsequenced_attribute): New function.
> (handle_std_reproducible_attribute): Likewise.
> (std_attributes): Add entries for "unsequenced" and "reproducible"
> attributes.
> (c_warn_type_attributes): Add TYPE argument.  Allow unsequenced
> or reproducible attributes if it is FUNCTION_TYPE.
> (groktypename): Adjust c_warn_type_attributes caller.
> (grokdeclarator): Likewise.
> (finish_declspecs): Likewise.
> * c-parser.cc (c_parser_declaration_or_fndef): Likewise.
> * c-tree.h (c_warn_type_attributes): Add TYPE argument.
> gcc/testsuite/
> * c-c++-common/attr-reproducible-1.c: New test.
> * c-c++-common/attr-reproducible-2.c: New test.
> * c-c++-common/attr-unsequenced-1.c: New test.
> * c-c++-common/attr-unsequenced-2.c: New test.
> * gcc.dg/c23-attr-reproducible-1.c: New test.
> * gcc.dg/c23-attr-reproducible-2.c: New test.
> * gcc.dg/c23-attr-reproducible-3.c: New test.
> * gcc.dg/c23-attr-reproducible-4.c: New test.
> * gcc.dg/c23-attr-reproducible-5.c: New test.

Re: [PATCH v4 1/3] aarch64: Add march flags for +fp8 arch extensions

2024-07-31 Thread Kyrylo Tkachov

Hi Claudio,

> On 31 Jul 2024, at 08:29, Claudio Bantaloukas  
> wrote:
> 
> This introduces the relevant flags to enable access to the fpmr register and 
> fp8 intrinsics, which will be added subsequently.
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64-option-extensions.def (fp8): New.
>* config/aarch64/aarch64.h (TARGET_FP8): Likewise.
>* doc/invoke.texi (AArch64 Options): Document new -march flags
>and extensions.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/aarch64/acle/fp8.c: New test.

Thank you for the explanations and the respins.
This is ok.
Thanks,
Kyrill


> ---
> .../aarch64/aarch64-option-extensions.def |  2 ++
> gcc/config/aarch64/aarch64.h  |  3 +++
> gcc/doc/invoke.texi   |  2 ++
> gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 20 +++
> 4 files changed, 27 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> 
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index 42ec0eec31e..6998627f377 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
> 
> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
> 
> +AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8")
> +
> #undef AARCH64_OPT_FMV_EXTENSION
> #undef AARCH64_OPT_EXTENSION
> #undef AARCH64_FMV_FEATURE
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index b7e330438d9..2e75c6b81e2 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -463,6 +463,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE 
> ATTRIBUTE_UNUSED
> && (aarch64_tune_params.extra_tuning_flags \
> & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
> 
> +/* fp8 instructions are enabled through +fp8.  */
> +#define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
> +
> /* Standard register usage.  */
> 
> /* 31 64-bit general purpose registers R0-R30:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 86f9b5d1fe5..ef2213b4e84 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21849,6 +21849,8 @@ Enable support for Armv9.4-a Guarded Control Stack 
> extension.
> Enable support for Armv8.9-a/9.4-a translation hardening extension.
> @item rcpc3
> Enable the RCpc3 (Release Consistency) extension.
> +@item fp8
> +Enable the fp8 (8-bit floating point) extension.
> 
> @end table
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> new file mode 100644
> index 000..459442be155
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> @@ -0,0 +1,20 @@
> +/* Test the fp8 ACLE intrinsics family.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -march=armv8-a" } */
> +
> +#include 
> +
> +#ifdef __ARM_FEATURE_FP8
> +#error "__ARM_FEATURE_FP8 feature macro defined."
> +#endif
> +
> +#pragma GCC push_options
> +#pragma GCC target("arch=armv9.4-a+fp8")
> +
> +/* We do not define __ARM_FEATURE_FP8 until all
> +   relevant features have been added. */
> +#ifdef __ARM_FEATURE_FP8
> +#error "__ARM_FEATURE_FP8 feature macro defined."
> +#endif
> +
> +#pragma GCC pop_options

Re: [PATCH v4 2/3] aarch64: Add support for moving fpm system register

2024-07-31 Thread Kyrylo Tkachov

Hi Claudio,

> On 31 Jul 2024, at 08:29, Claudio Bantaloukas  
> wrote:
> 
> Unlike most system registers, fpmr can be heavily written to in code that
> exercises the fp8 functionality. That is because every fp8 intrinsic call
> can potentially change the value of fpmr.
> Rather than just use an unspec, we treat the fpmr system register like
> all other registers and use a move operation to read and write to it.
> 
> We introduce a new class of moveable system registers that, currently,
> only accepts fpmr and a new constraint, Umv, that allows us to
> selectively use mrs and msr instructions when expanding rtl for them.
> Given that there is code that depends on "real" registers coming before
> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
> existing value and renumber registers below that.
> This requires us to update the bitmaps that describe which registers
> belong to each register class.

So I like the approach though I’ll let Richard review the implementation 
details here.
My only slight concern here is compatibility with LLVM. I notice that LLVM 
doesn’t accept the test case you’ve included as it doesn’t understand “fpmr” in 
its inline assembly. It also doesn’t support the new constraint, of course.
Do you know if there are plans to teach LLVM these inline assembly constructs 
to avoid creating GCC-only sources for fp8?
Thanks,
Kyrill


> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
>support for MOVEABLE_SYSREGS class.
>(aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
>(aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
>(aarch64_class_max_nregs): Likewise.
>* config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
>(CALL_REALLY_USED_REGISTERS): Likewise.
>(REGISTER_NAMES): Likewise.
>(enum reg_class): Add MOVEABLE_SYSREGS class.
>(REG_CLASS_NAMES): Likewise.
>(REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
>the new MOVEABLE_REGS class and renumbering of registers.
>* config/aarch64/aarch64.md: (FPM_REGNUM): added new register
>number, reusing old value.
>(FFR_REGNUM): Renumber.
>(FFRT_REGNUM): Likewise.
>(LOWERING_REGNUM): Likewise.
>(TPIDR2_BLOCK_REGNUM): Likewise.
>(SME_STATE_REGNUM): Likewise.
>(TPIDR2_SETUP_REGNUM): Likewise.
>(ZA_FREE_REGNUM): Likewise.
>(ZA_SAVED_REGNUM): Likewise.
>(ZA_REGNUM): Likewise.
>(ZT0_REGNUM): Likewise.
>(*mov_aarch64): Add support for moveable sysregs.
>(*movsi_aarch64): Likewise.
>(*movdi_aarch64): Likewise.
>* config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/aarch64/acle/fp8.c: New tests.
> ---
> gcc/config/aarch64/aarch64.cc   |   8 ++
> gcc/config/aarch64/aarch64.h|  14 ++-
> gcc/config/aarch64/aarch64.md   |  30 --
> gcc/config/aarch64/constraints.md   |   3 +
> gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 101 
> 5 files changed, 142 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e0cf382998c..9810f2c0390 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2018,6 +2018,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode 
> mode)
> case PR_HI_REGS:
>   return mode == VNx32BImode ? 2 : 1;
> 
> +case MOVEABLE_SYSREGS:
> case FFR_REGS:
> case PR_AND_FFR_REGS:
> case FAKE_REGS:
> @@ -2045,6 +2046,9 @@ aarch64_hard_regno_mode_ok (unsigned regno, 
> machine_mode mode)
> /* This must have the same size as _Unwind_Word.  */
> return mode == DImode;
> 
> +  if (regno == FPM_REGNUM)
> +return mode == QImode || mode == HImode || mode == SImode || mode == 
> DImode;
> +
>   unsigned int vec_flags = aarch64_classify_vector_mode (mode);
>   if (vec_flags == VEC_SVE_PRED)
> return pr_or_ffr_regnum_p (regno);
> @@ -12680,6 +12684,9 @@ aarch64_regno_regclass (unsigned regno)
>   if (PR_REGNUM_P (regno))
> return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
> 
> +  if (regno == FPM_REGNUM)
> +return MOVEABLE_SYSREGS;
> +
>   if (regno == FFR_REGNUM || regno == FFRT_REGNUM)
> return FFR_REGS;
> 
> @@ -13068,6 +13075,7 @@ aarch64_class_max_nregs (reg_class_t regclass, 
> machine_mode mode)
> case PR_HI_REGS:
>   return mode == VNx32BImode ? 2 : 1;
> 
> +case MOVEABLE_SYSREGS:
> case STACK_REG:
> case FFR_REGS:
> case PR_AND_FFR_REGS:
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 2e75c6b81e2..2dfb999bea5 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -523,6 +523,7 @@ constexpr auto AARCH64_FL_DEFAUL

Re: [PATCH v4 3/3] aarch64: Add fpm register helper functions.

2024-07-31 Thread Kyrylo Tkachov

Hi Claudio,

> On 31 Jul 2024, at 08:29, Claudio Bantaloukas  
> wrote:
> 
> The ACLE declares several helper types and functions to facilitate 
> construction
> of `fpm` arguments. These are available when one of the arm_neon.h, arm_sve.h,
> or arm_sme.h headers is included. These helpers don't map to specific FP8
> instructions and there's no expectation that they will produce a given code
> sequence, they're just an abstraction and an aid to the programmer. Thus they 
> are
> implemented in a new header file arm_private_fp8.h
> Users are not expected to include this file, as it is a mere implementation 
> detail,
> subject to change. A check is included to guard against direct inclusion.

This is ok.
Thanks,
Kyrill
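
As a rough picture of what helpers of this kind do, here is a minimal sketch
in C of a value-returning field setter. Every name below (demo_fpm_t,
demo_fpm_init, demo_set_fpm_src1_format, the DEMO_* constants) is
hypothetical, and the field position, width and format encodings are
illustrative assumptions, not the architected fpmr layout or the contents of
arm_private_fp8.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mode value the helpers build up. */
typedef uint64_t demo_fpm_t;

/* Illustrative format encodings (assumed, not taken from the patch). */
enum demo_fpm_format { DEMO_FPM_E5M2 = 0, DEMO_FPM_E4M3 = 1 };

/* Assumed position and width of the source-1 format field. */
#define DEMO_SRC1_SHIFT 0
#define DEMO_SRC1_MASK  UINT64_C (0x7)

/* Start from an all-zero mode value. */
demo_fpm_t demo_fpm_init (void)
{
  return 0;
}

/* Return FPM with the source-1 format field replaced by FORMAT;
   all other bits are left untouched. */
demo_fpm_t
demo_set_fpm_src1_format (demo_fpm_t fpm, enum demo_fpm_format format)
{
  assert (((uint64_t) format & ~DEMO_SRC1_MASK) == 0);
  fpm &= ~(DEMO_SRC1_MASK << DEMO_SRC1_SHIFT);
  return fpm | ((uint64_t) format << DEMO_SRC1_SHIFT);
}
```

Because each setter takes and returns a plain integer, the calls can be
chained and the compiler is free to fold them to a constant, which fits the
description of the helpers as an abstraction rather than a fixed code
sequence.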


> 
> gcc/ChangeLog:
> 
>* config.gcc (extra_headers): Install arm_private_fp8.h.
>* config/aarch64/arm_neon.h: Include arm_private_fp8.h.
>* config/aarch64/arm_sve.h: Likewise.
>* config/aarch64/arm_private_fp8.h: New file
>(fpm_t): New type representing fpmr values.
>(enum __ARM_FPM_FORMAT): New enum representing valid fp8 formats.
>(enum __ARM_FPM_OVERFLOW): New enum representing how some fp8
>calculations work.
>(__arm_fpm_init): New.
>(__arm_set_fpm_src1_format): Likewise.
>(__arm_set_fpm_src2_format): Likewise.
>(__arm_set_fpm_dst_format): Likewise.
>(__arm_set_fpm_overflow_cvt): Likewise.
>(__arm_set_fpm_overflow_mul): Likewise.
>(__arm_set_fpm_lscale): Likewise.
>(__arm_set_fpm_lscale2): Likewise.
>(__arm_set_fpm_nscale): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/aarch64/acle/fp8-helpers-neon.c: New test of fpmr helper
>functions.
>* gcc.target/aarch64/acle/fp8-helpers-sve.c: New test of fpmr helper
>functions presence.
>* gcc.target/aarch64/acle/fp8-helpers-sme.c: New test of fpmr helper
>functions presence.
> ---
> gcc/config.gcc|  2 +-
> gcc/config/aarch64/arm_neon.h |  1 +
> gcc/config/aarch64/arm_private_fp8.h  | 80 +++
> gcc/config/aarch64/arm_sve.h  |  1 +
> .../aarch64/acle/fp8-helpers-neon.c   | 53 
> .../gcc.target/aarch64/acle/fp8-helpers-sme.c | 12 +++
> .../gcc.target/aarch64/acle/fp8-helpers-sve.c | 12 +++
> 7 files changed, 160 insertions(+), 1 deletion(-)
> create mode 100644 gcc/config/aarch64/arm_private_fp8.h
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 7453ade0782..a36dd1bcbc6 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -347,7 +347,7 @@ m32c*-*-*)
> ;;
> aarch64*-*-*)
> cpu_type=aarch64
> - extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h 
> arm_sme.h arm_neon_sve_bridge.h"
> + extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h 
> arm_sme.h arm_neon_sve_bridge.h arm_private_fp8.h"
> c_target_objs="aarch64-c.o"
> cxx_target_objs="aarch64-c.o"
> d_target_objs="aarch64-d.o"
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index c4a09528ffd..e376685489d 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -30,6 +30,7 @@
> #pragma GCC push_options
> #pragma GCC target ("+nothing+simd")
> 
> +#include <arm_private_fp8.h>
> #pragma GCC aarch64 "arm_neon.h"
> 
> #include 
> diff --git a/gcc/config/aarch64/arm_private_fp8.h 
> b/gcc/config/aarch64/arm_private_fp8.h
> new file mode 100644
> index 000..5668cc24c99
> --- /dev/null
> +++ b/gcc/config/aarch64/arm_private_fp8.h
> @@ -0,0 +1,80 @@
> +/* AArch64 FP8 helper functions.
> +   Do not include this file directly. Use one of arm_neon.h
> +   arm_sme.h arm_sve.h instead.
> +
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Excepti

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Hongtao Liu

On Wed, Jul 31, 2024 at 3:17 PM Uros Bizjak  wrote:
>
> On Wed, Jul 31, 2024 at 9:11 AM Hongtao Liu  wrote:
> >
> > On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak  wrote:
> > >
> > > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener  wrote:
> > > >
> > > > On Tue, 30 Jul 2024, Alexander Monakov wrote:
> > > >
> > > > >
> > > > > On Tue, 30 Jul 2024, Richard Biener wrote:
> > > > >
> > > > > > > Oh, and please add a small comment why we don't use XFmode here.
> > > > > >
> > > > > > Will do.
> > > > > >
> > > > > > > /* Do not enable XFmode, there is padding in it and it suffers
> > > > > > >    from normalization upon load like SFmode and DFmode when
> > > > > > >    not using SSE.  */
> > > > >
> > > > > > Is it really true? I have no evidence of FLDT performing normalization
> > > > > > (as mentioned in PR 114659, if it did, there would be no way to
> > > > > > spill/reload x87 registers).
> > > >
> > > > What mangling fld performs depends on the contents of the FP control
> > > > word which is awkward.  IIRC there's at least a bugreport that it
> > > > turns sNaN into a qNaN, it seems I was wrong about denormals
> > > > (when DM is not masked).  And yes, IIRC x87 instability is also
> > > > related to spills (IIRC we spill in the actual mode of the reg, not in
> > > > XFmode), but -fexcess-precision=standard should hopefully avoid that.
> > > > It's also not clear whether all implementations conformed to the
> > > > specs wrt extended-precision format loads.
> > >
> > > FYI, FLDT does not mangle long-double values and does not generate
> > > exceptions. Please see [1], but ignore shadowed text and instead read
> > > the "Floating-Point Exceptions" section. So, as far as hardware is
> > > concerned, it *can* be used to transfer 10-byte values, but I don't
> > > want to judge from the compiler PoV if this is the way to go. We can
> > > enable it, perhaps temporarily to experiment a bit - it is easy to
> > > disable if it causes problems.
> > >
> > > Let's CC Intel folks for their opinion, if it is worth using an aging
> > > x87 to transfer 80-bit data.
> > I prefer not, in another hook ix86_can_change_mode_class, we have
> >
> >   /* x87 registers can't do subreg at all, as all values are reformatted
> >      to extended precision.  */
> >   if (MAYBE_FLOAT_CLASS_P (regclass))
> >     return false;
>
> No, the above applies to SFmode subreg of XFmode value, which is a
> no-go. My question refers to the plain XFmode (80-bit) moves, where
> x87 is used simply to:
>
> fldt mem1
> ...
> fstp mem2
>
> where x87 is used to perform a move from one 80-bit location to the other.
>
> > I guess it eventually needs reload for XFmode.
>
> There are no reloads, as we would like to perform bit-exact 80-bit
> move, e.g. array of 10 chars.
Oh, it's a memory copy.
I suspect that the hardware doesn't enable memory renaming for x87
instructions, so I'd prefer not to.
>
> Uros.



-- 
BR,
Hongtao

Re: [RFH PATCH] c++: Implement C++26 P2963R3 - Ordering of constraints involving fold expressions [PR115746]

2024-07-31 Thread Jakub Jelinek

On Tue, Jul 30, 2024 at 07:51:34PM -0400, Jason Merrill wrote:
> Yeah.
> 
> In the paper a fold expanded constraint doesn't have a parameter mapping,
> only atomic constraints do.  Within the normal form of (__is_same (T, int)
> && ...) we have a single atomic constraint with parameter mapping T -> T,
> which only comes into play when we're checking satisfaction for each
> element.

> > > So, shall we file some https://github.com/cplusplus/CWG/ issue about this?
> > > Whether the packs [temp.constr.fold] talks about are the normalized ones
> > > only (in that case what happens if there are no packs), or all packs
> > > mentioned (in that case, whether there shouldn't be also template 
> > > parameter
> > > mappings on the fold expanded constraints like there are on the atomic
> > > constraints (for the unexpanded packs only)?
> 
> I think there should be parameter mappings for all parameter packs named in
> the fold-expression.  And I suppose for the other template parameters as
> well.

> > > Anyway, I'm afraid on the implementation side, ARGUMENT_PACK_SELECT
> > > didn't help almost at all.  The problem e.g. on fold-constr7.C testcase
> > > is that the ARGUMENT_PACK_SELECT is optimized away before it could be 
> > > used.
> > > tsubst_parameter_mapping (where I could remove the
> > >if (cxx_dialect >= cxx26 && ARGUMENT_PACK_P (arg))
> > > hack without any behavior change) just tsubsts it into int type.
> > > With the hack removed, it will go through
> > >if (ARGUMENT_PACK_P (arg))
> > >  new_arg = tsubst_argument_pack (arg, args, complain, in_decl);
> > > but that still sets new_arg to int INTEGER_TYPE; while if a pack is used
> > > in some nested pack expansion as well as outside of it, we'd need to 
> > > arrange
> > > to reconstruct ARGUMENT_PACK_SELECT in what tsubst_parameter_mapping
> > > arranges.
> > 
> > Ah right, because of the double substitution -- first satisfy_atom
> > substitutes into the parameter mapping, and then it substitutes this
> > substituted parameter mapping into the atomic constraint expression.
> > So after the first substitution the APS might already have gotten
> > "resolved", I think..
> > 
> > IIUC the normal form of the constraint in fold-constr7.C will have
> > the identity parameter mapping Ts -> {Ts...}.  And you'll be passing
> > Ts=APS<{int,int,...}, 0> etc to the recursive satisfy_constraint_r call
> > in satisfy_fold.
> > 
> > Does it work if you wrap the ARGUMENT_PACK_SELECT in a single-element
> > TYPE/NONTYPE_ARGUMENT_PACK?
> 
> I think trying to play games with APS in the normalized form is a mistake;
> I'd think we should only use it when substituting elements of the
> argument pack into the atomic constraint's parameter mapping.

Thanks to you both.

I'm still lost what to do exactly.
The building of parameter mappings for ATOMIC_CONSTR is done by
build_parameter_mapping but that already during the discover part
(find_template_parameters) uses ctx_parms, so I assume I can't just build
the parameter mappings for FOLD_CONSTR based on the already found unexpanded
parameter packs and just map_arguments the copy of the list or something
like that because that doesn't depend on ctx_parms.
Would satisfy_fold then need to tsubst_parameter_mappings somehow as well?

If satisfy_atom shouldn't use ARGUMENT_PACK_SELECT yet and it should be
only created during tsubst_parameter_mappings, I guess it needs some other
way to tell tsubst_parameter_mappings of the ATOMIC_CONSTR which of the
parameter packs should have ARGUMENT_PACK_SELECT.  A side argument
(like the packs TREE_LIST I had in the first version of the patch which
didn't really work), or something else?  In any case, it needs to be something
that e.g. the satisfaction cache hash and equal can take into account.
And I'm still not sure how that can work.  Consider
template  concept C = requires { U (1); };
template  struct A { A (int) {} };
template  requires ((C > && C  && (sizeof (V) + ...) < 80) 
&& ...)
constexpr bool foo () { return true; }
static_assert (foo > ());
For the C > case it needs to apply the ARGUMENT_PACK_SELECT right away
and just subst it into int (or later long and later A ), because
one needs to substitute A into the ATOM_CONSTR expression, in the C 
case either that or could just add ARGUMENT_PACK_SELECT afterwards, and
for the last case it needs to be ARGUMENT_PACK_SELECT so that the pack
expansion expands it.

Also, if FOLD_CONSTR has the mappings, I wonder if derive_fold_proof doesn't
need to change the way it checks whether the unexpanded packs are equal
(I'm worried we could e.g. try to compare for subsumption FOLD_CONSTRs from
different templates, one from some concept check and another from a
different one or whatever, and I'm not sure if template_args_equal will return
false if they are packs from different templates with the same index/level).

I've updated most of the testcases in the patch so that they actually use
the template argument (but left one with those).

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak

On Wed, Jul 31, 2024 at 10:02 AM Hongtao Liu  wrote:

> > > > > > On Tue, 30 Jul 2024, Richard Biener wrote:
> > > > > >
> > > > > > > > Oh, and please add a small comment why we don't use XFmode here.
> > > > > > >
> > > > > > > Will do.
> > > > > > >
> > > > > > > /* Do not enable XFmode, there is padding in it and it suffers
> > > > > > >    from normalization upon load like SFmode and DFmode when
> > > > > > >    not using SSE.  */
> > > > > >
> > > > > > Is it really true? I have no evidence of FLDT performing normalization
> > > > > > (as mentioned in PR 114659, if it did, there would be no way to spill/reload
> > > > > > x87 registers).
> > > > >
> > > > > What mangling fld performs depends on the contents of the FP control
> > > > > word which is awkward.  IIRC there's at least a bugreport that it
> > > > > turns sNaN into a qNaN, it seems I was wrong about denormals
> > > > > (when DM is not masked).  And yes, IIRC x87 instability is also
> > > > > related to spills (IIRC we spill in the actual mode of the reg, not in
> > > > > XFmode), but -fexcess-precision=standard should hopefully avoid that.
> > > > > It's also not clear whether all implementations conformed to the
> > > > > specs wrt extended-precision format loads.
> > > >
> > > > FYI, FLDT does not mangle long-double values and does not generate
> > > > exceptions. Please see [1], but ignore shadowed text and instead read
> > > > the "Floating-Point Exceptions" section. So, as far as hardware is
> > > > concerned, it *can* be used to transfer 10-byte values, but I don't
> > > > want to judge from the compiler PoV if this is the way to go. We can
> > > > enable it, perhaps temporarily to experiment a bit - it is easy to
> > > > disable if it causes problems.
> > > >
> > > > Let's CC Intel folks for their opinion, if it is worth using an aging
> > > > x87 to transfer 80-bit data.
> > > I prefer not, in another hook ix86_can_change_mode_class, we have
> > >
> > > >   /* x87 registers can't do subreg at all, as all values are
> > > >      reformatted to extended precision.  */
> > > >   if (MAYBE_FLOAT_CLASS_P (regclass))
> > > >     return false;
> >
> > No, the above applies to SFmode subreg of XFmode value, which is a
> > no-go. My question refers to the plain XFmode (80-bit) moves, where
> > x87 is used simply to:
> >
> > fldt mem1
> > ...
> > fstp mem2
> >
> > where x87 is used to perform a move from one 80-bit location to the other.
> >
> > > I guess it eventually needs reload for XFmode.
> >
> > There are no reloads, as we would like to perform bit-exact 80-bit
> > move, e.g. array of 10 chars.
> Oh, it's a memory copy.
> I suspect that the hardware doesn't enable memory renaming for x87 instructions.
> So I prefer not.

OK. Richard, can you please mention the above in the comment why
XFmode is rejected in the hook?

Later, we can perhaps benchmark XFmode move vs. generic memory copy to
get some hard data.

Thanks,
Uros.

[RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-07-31 Thread Mariam Arutunian

This patch adds a new compiler pass aimed at identifying naive CRC
implementations, characterized by the presence of a loop calculating a CRC
(polynomial long division).  Upon detection of a potential CRC, the pass
prints an informational message.

The pass performs the CRC optimization when the optimization level is >= 2,
except when optimizing for size or when -fno-gimple-crc-optimization is given.

The pass is added for the detection and optimization of naive CRC
implementations, improving the efficiency of CRC-related computations.

This patch includes only the initial fast checks for filtering out non-CRCs;
the verification of detected possible CRCs and the optimization parts will be
provided in subsequent patches.
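
For readers who have not seen one, a "naive CRC" loop of the kind described
above might look like the sketch below: one polynomial-division step per
input bit, next to a table-based form of the same computation.  The names
crc32_bitwise, crc32_table and crc32_self_check are illustrative and do not
come from the patch, and this message does not say whether the pass
recognizes exactly this loop shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive bit-reversed CRC-32: one polynomial long-division step per bit.
   0xEDB88320 is the bit-reversed form of the common CRC-32 polynomial.  */
uint32_t
crc32_bitwise (const unsigned char *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFu;
}

/* Table-based version of the same CRC.  The table is rebuilt on every call
   to keep the sketch short; real code would compute it once.  */
uint32_t
crc32_table (const unsigned char *data, size_t len)
{
  uint32_t table[256];
  for (uint32_t n = 0; n < 256; n++)
    {
      uint32_t c = n;
      for (int bit = 0; bit < 8; bit++)
        c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
      table[n] = c;
    }

  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = table[(crc ^ data[i]) & 0xFF] ^ (crc >> 8);
  return crc ^ 0xFFFFFFFFu;
}

/* Both forms must agree on the conventional "123456789" check input.  */
void
crc32_self_check (void)
{
  const unsigned char msg[] = "123456789";
  assert (crc32_bitwise (msg, 9) == crc32_table (msg, 9));
}
```

The inner eight-iteration loop is the part that a table lookup, a
carry-less multiplication sequence or a dedicated CRC instruction can stand in for.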

 gcc/

   * Makefile.in (OBJS): Add gimple-crc-optimization.o.
   * common.opt (fgimple-crc-optimization): New option.
   * common.opt.urls: Regenerate to add
   fgimple-crc-optimization.
   * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
   * gimple-crc-optimization.cc: New file.
   * gimple.cc (set_phi_stmts_not_visited): New function.
   (set_gimple_stmts_not_visited): Likewise.
   (set_bbs_stmts_not_visited): Likewise.
   * gimple.h (set_gimple_stmts_not_visited): New extern function
declaration.
   (set_phi_stmts_not_visited): New extern function declaration.
   (set_bbs_stmts_not_visited): New extern function declaration.
   * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
   (enable_fdo_optimizations): Enable gimple-crc-optimization.
   * passes.def (pass_crc_optimization): Add new pass.
   * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
   * tree-pass.h (make_pass_crc_optimization): New extern function
declaration.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f4bb4a88cf3..0238201981d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1721,6 +1721,7 @@ OBJS = \
 	tree-iterator.o \
 	tree-logical-location.o \
 	tree-loop-distribution.o \
+	gimple-crc-optimization.o \
 	tree-nested.o \
 	tree-nrv.o \
 	tree-object-size.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index c4bda3b7da8..ea44ad1757a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1757,6 +1757,16 @@ Common Var(flag_gcse_after_reload) Optimization
 Perform global common subexpression elimination after register allocation has
 finished.
 
+fgimple-crc-optimization
+Common Var(flag_gimple_crc_optimization) Optimization
+Detect loops calculating CRC and replace with faster implementation.
+If the target supports CRC instruction and the CRC loop uses the same
+polynomial as the one used in the CRC instruction, directly replace with the
+corresponding CRC instruction.
+Otherwise, if the target supports carry-less-multiplication instruction,
+generate CRC using it.
+If neither case applies, generate table-based CRC.
+
 Enum
 Name(dwarf_gnat_encodings) Type(int)
 
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index 0e71bce27c4..afb93d8147f 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -703,6 +703,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fgcse-las)
 fgcse-after-reload
 UrlSuffix(gcc/Optimize-Options.html#index-fgcse-after-reload)
 
+fgimple-crc-optimization
+UrlSuffix(gcc/Optimize-Options.html#index-fgimple-crc-optimization)
+
 fgraphite-identity
 UrlSuffix(gcc/Optimize-Options.html#index-fgraphite-identity)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4850c7379bf..cba90610027 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -567,8 +567,8 @@ Objective-C and Objective-C++ Dialects}.
 -ffast-math  -ffinite-math-only  -ffloat-store  -fexcess-precision=@var{style}
 -ffinite-loops
 -fforward-propagate  -ffp-contract=@var{style}  -ffunction-sections
--fgcse  -fgcse-after-reload  -fgcse-las  -fgcse-lm  -fgraphite-identity
--fgcse-sm  -fhoist-adjacent-loads  -fif-conversion
+-fgcse  -fgcse-after-reload  -fgcse-las  -fgcse-lm  -fgimple-crc-optimization
+-fgraphite-identity -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion
 -fif-conversion2  -findirect-inlining
 -finline-stringops[=@var{fn}]
 -finline-functions  -finline-functions-called-once  -finline-limit=@var{n}
@@ -12574,6 +12574,7 @@ also turns on the following optimization flags:
 -fexpensive-optimizations
 -ffinite-loops
 -fgcse  -fgcse-lm
+-fgimple-crc-optimization
 -fhoist-adjacent-loads
 -finline-functions
 -finline-small-functions
@@ -13768,6 +13769,19 @@ This flag is disabled by default.
 Note that @option{-flive-patching} is not supported with link-time optimization
 (@option{-flto}).
 
+@opindex fgimple-crc-optimization
+@item -fgimple-crc-optimization
+Detect loops calculating CRC (performing polynomial long division) and
+replace them with a faster implementation.  Detect 8, 16, 32, and 64 bit CRC,
+with a constant polynomial without the leading 1 bit,
+for both bit-forward and bit-reversed cases.
+If the target supports a CRC instruction and the polynomial used in the source
+

Re: [PATCH] c: Add support for unsequenced and reproducible attributes

2024-07-31 Thread Jakub Jelinek

On Wed, Jul 31, 2024 at 09:50:56AM +0200, Richard Biener wrote:
> I wonder if
> 
> int foo (uintptr_t x) { *(int *)x = 1; return 1; }
> 
> is considered "noptr" by the standard but then by making a pointer out of
> 'x' invokes UB?

I don't know.  The paper claims same behavior as const for functions without
pointer/array arguments (but arrays decay to pointers); that claim is
most likely false because of the infinite loops, but still, I'm not sure
about the
struct S { int *p; };
int bar (struct S x) [[unsequenced]] { x.p[0] = 1; return 1; }
or
int baz (...) [[unsequenced]] { va_list ap; va_start (ap); int *p = va_arg (ap, int *); va_end (ap); *p = 1; return 1; }
or
typedef union { int *p; long long *q; } U __attribute__((transparent_union));
int qux (int x, U y) [[unsequenced]] { if (x) y.p[0] = 1; else y.q[1] = 2; return 3; }
etc. cases too.

> > +/* Handle an "unsequenced" attribute; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +tree
> > +handle_unsequenced_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> > + int flags, bool *no_add_attrs)
> > +{
> > +  tree fntype = *node;
> > +  for (tree argtype = TYPE_ARG_TYPES (fntype); argtype;
> > +   argtype = TREE_CHAIN (argtype))
> > +if (argtype == void_list_node)
> 
> I think this warrants a comment that the attribute on variadic functions
> is treated as receiving pointers.

Ok (though, iff we actually need to handle variadic functions that way).

> > +  {
> > +   if (VOID_TYPE_P (TREE_TYPE (fntype)))
> > + warning (OPT_Wattributes, "%qE attribute on function type "
> > +  "without pointer arguments returning %<void%>", name);
> > +   const char *name2;
> > +   if (IDENTIFIER_LENGTH (name) == sizeof ("unsequenced") - 1)
> > + name2 = "unsequenced noptr";
> > +   else
> > + name2 = "reproducible noptr";
> > +   if (!lookup_attribute (name2, TYPE_ATTRIBUTES (fntype)))
> > + {
> > +   *no_add_attrs = true;
> 
> shouldn't you set *no_add_attrs also when the noptr attribute is
> already there?  Because otherwise you'll get the non-noptr attr added?

I think that isn't needed.

The reason for this (ugly) dance is to avoid building 2 separate
FUNCTION_TYPEs with build_type_attribute_variant, one with just the
"* noptr" attribute and another with both.
If "* noptr" is already there, then the handler just returns NULL_TREE without
setting *no_add_attrs = true; the caller will then try to look up the
attribute, and if it is found (with the same args; this one doesn't have any)
it will not add anything further, otherwise it will add it and call
build_type_attribute_variant.
I guess I should add a comment.

> > +   gcc_assert ((flags & (int) ATTR_FLAG_TYPE_IN_PLACE) == 0);
> > +   tree attr = tree_cons (get_identifier (name2), NULL_TREE,
> > +  TYPE_ATTRIBUTES (fntype));
> > +   if (!lookup_attribute (IDENTIFIER_POINTER (name),
> > +  TYPE_ATTRIBUTES (fntype)))
> > + attr = tree_cons (name, NULL_TREE, attr);
> > +   *node = build_type_attribute_variant (*node, attr);
> > + }
> > +   return NULL_TREE;
> > +  }
> > +else if (c_maybe_contains_pointers_p (TREE_VALUE (argtype)))
> > +  break;
> > +  return NULL_TREE;
> > +}

Jakub

Re: [PATCH 1/3][v2] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Sandiford

Paul Koning  writes:
>> On Jul 30, 2024, at 6:17 AM, Richard Biener  wrote:
>> 
>> The following adds a target hook to specify whether regs of MODE can be
>> used to transfer bits.  The hook is supposed to be used for value-numbering
>> to decide whether a value loaded in such mode can be punned to another
>> mode instead of re-loading the value in the other mode and for SRA to
>> decide whether MODE is suitable as container holding a value to be
>> used in different modes.
>> 
>> ...
>> 
>> +@deftypefn {Target Hook} bool TARGET_MODE_CAN_TRANSFER_BITS (machine_mode @var{mode})
>> +Define this to return false if the mode @var{mode} cannot be used
>> +for memory copying.  The default is to assume modes with the same
>> +precision as size are fine to be used.
>> +@end deftypefn
>> +
>
> I'm a bit confused about the meaning of this hook; the summary at the top 
> speaks of type punning while the documentation talks about memory copying.  
> Those seem rather different.
>
> I'm also wondering about this being tied to a mode rather than a register 
> class.  To give an example: on the PDP11 there are two main register 
> classes, "general" and "float".  General registers handle any bit pattern and 
> support arithmetic operations on integer modes; float registers do not 
> transparently transfer every bit pattern and support float modes.  So only 
> general registers are suitable for memory copies (though on a PDP-11 you 
> don't need registers to do memory copy).  And for type punning, you could 
> load an SF mode value into general registers (a pair) and type-pun them to 
> SImode without reloading.
>
> So what does that mean for this hook on that target?

I think that means that, at the mode level, float modes are not suitable
for bit transfer.  The hook describes the worst-case behaviour, rather
than best-case.

Presumably, the fact that integer modes in general have to be
value-preserving (and so suitable for the new hook) means that float
registers couldn't be used to store integer modes, even ignoring
performance concerns.  The principle is the same for floats.
Either float modes are value-preserving (and so can't be stored in
float registers :)) or they're not value-preserving, in which case
the new hook must return false.

Thanks,
Richard
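
The "worst case rather than best case" reading can be sketched with a toy
model of the PDP-11 example.  Everything below (the toy_* enums and
functions) is hypothetical and is not GCC's machine_mode, register class or
hook API; the model only encodes the assumptions stated in the quoted text:
general registers keep every bit pattern, float registers do not, and float
modes may end up in either class.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes and register classes.  */
enum toy_mode { TOY_HImode, TOY_SImode, TOY_SFmode, TOY_DFmode };
enum toy_reg_class { TOY_GENERAL_REGS, TOY_FLOAT_REGS, TOY_NUM_CLASSES };

static bool
toy_float_mode_p (enum toy_mode mode)
{
  return mode == TOY_SFmode || mode == TOY_DFmode;
}

/* Assumption: general registers can hold any mode, float registers can
   hold only float modes.  */
bool
toy_class_can_hold (enum toy_reg_class rclass, enum toy_mode mode)
{
  return rclass == TOY_GENERAL_REGS || toy_float_mode_p (mode);
}

/* Assumption: only general registers pass every bit pattern through.  */
bool
toy_class_preserves_bits (enum toy_reg_class rclass)
{
  return rclass == TOY_GENERAL_REGS;
}

/* Mode-level answer: true only if every class that might hold MODE
   preserves all bits, i.e. the worst case over the classes.  */
bool
toy_mode_can_transfer_bits (enum toy_mode mode)
{
  for (int c = 0; c < TOY_NUM_CLASSES; c++)
    {
      enum toy_reg_class rclass = (enum toy_reg_class) c;
      if (toy_class_can_hold (rclass, mode)
          && !toy_class_preserves_bits (rclass))
        return false;
    }
  return true;
}

/* Integer modes pass, float modes do not, under the assumptions above.  */
void
toy_hook_self_check (void)
{
  assert (toy_mode_can_transfer_bits (TOY_SImode));
  assert (!toy_mode_can_transfer_bits (TOY_SFmode));
}
```

In this model an SFmode value could be type-punned safely if it happened to
live in general registers, but the mode-level answer is still false because
nothing guarantees that choice.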

Re: [PATCH] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-31 Thread Hongtao Liu

On Tue, Jul 30, 2024 at 1:05 PM Hongyu Wang  wrote:
>
> Richard Biener  于2024年7月26日周五 19:45写道：
> >
> > On Fri, Jul 26, 2024 at 10:50 AM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > When introducing munroll-only-small-loops, the option was marked as
> > > Target Save and added to the -O2 defaults, which makes attribute(optimize)
> > > reset the target option and causes an error when the command line has O1
> > > and the function attribute has O2 and other target options. Mark this
> > > option as Optimization to fix.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > >
> > > Ok for trunk and backport down to gcc-13?
> >
> > Note this requires bumping LTO_minor_version on branches.
> >
>
> Yes, as the aarch64 fix was not backported I'd like to just fix it for trunk.
Ok for trunk only.
>
> > > gcc/ChangeLog
> > >
> > > PR target/116065
> > > * config/i386/i386.opt (munroll-only-small-loops): Mark as
> > > Optimization instead of Save.
> > >
> > > gcc/testsuite/ChangeLog
> > >
> > > PR target/116065
> > > * gcc.target/i386/pr116065.c: New test.
> > > ---
> > >  gcc/config/i386/i386.opt |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr116065.c | 24 
> > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr116065.c
> > >
> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > > index 353fffb2343..52054bc018a 100644
> > > --- a/gcc/config/i386/i386.opt
> > > +++ b/gcc/config/i386/i386.opt
> > > @@ -1259,7 +1259,7 @@ Target Mask(ISA2_RAOINT) Var(ix86_isa_flags2) Save
> > >  Support RAOINT built-in functions and code generation.
> > >
> > >  munroll-only-small-loops
> > > -Target Var(ix86_unroll_only_small_loops) Init(0) Save
> > > +Target Var(ix86_unroll_only_small_loops) Init(0) Optimization
> > >  Enable conservative small loop unrolling.
> > >
> > >  mlam=
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr116065.c b/gcc/testsuite/gcc.target/i386/pr116065.c
> > > new file mode 100644
> > > index 000..083e70f2413
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr116065.c
> > > @@ -0,0 +1,24 @@
> > > +/* PR target/116065  */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O1 -mno-avx" } */
> > > +
> > > +#ifndef __AVX__
> > > +#pragma GCC push_options
> > > +#pragma GCC target("avx")
> > > +#define __DISABLE_AVX__
> > > +#endif /* __AVX__ */
> > > +
> > > +extern inline double __attribute__((__gnu_inline__,__always_inline__))
> > > + foo (double x) { return x; }
> > > +
> > > +#ifdef __DISABLE_AVX__
> > > +#undef __DISABLE_AVX__
> > > +#pragma GCC pop_options
> > > +#endif /* __DISABLE_AVX__ */
> > > +
> > > +void __attribute__((target ("avx"), optimize(3)))
> > > +bar (double *p)
> > > +{
> > > +  *p = foo (*p);
> > > +}
> > > +
> > > --
> > > 2.31.1
> > >



-- 
BR,
Hongtao

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Jakub Jelinek

On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> OK. Richard, can you please mention the above in the comment why
> XFmode is rejected in the hook?
> 
> Later, we can perhaps benchmark XFmode move vs. generic memory copy to
> get some hard data.

My (limited) understanding was that the hook would be used only for cases
where we'd like to e.g. value number some SF/DF/XF etc. mode loads and some
subsequent loads from the same address with a different mode but the same size
the same, and replace say a later int or long long load with VIEW_CONVERT_EXPR
of the result of the SF/DF mode load.  That is what was incorrect, because
the load didn't preserve all the bits.  The patch would still keep doing
normal SF/DF/XF etc. mode copies if that is all that happens in the program:
load some floating point value and store it elsewhere, or as part of a larger
aggregate copy.

Jakub
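
The failure mode described above can be modelled on plain integers, without
touching the FPU.  The sketch below assumes, as mentioned earlier in the
thread, that a non-SSE SFmode load turns a signaling NaN into a quiet NaN;
the function names are made up for illustration, and the "load" functions
only model what happens to the bit pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a single-precision load that quiets signaling NaNs by setting
   the top mantissa bit.  All other bit patterns pass through unchanged.  */
uint32_t
model_normalizing_load (uint32_t bits)
{
  bool is_nan = (bits & 0x7F800000u) == 0x7F800000u
                && (bits & 0x007FFFFFu) != 0;
  return is_nan ? (bits | 0x00400000u) : bits;
}

/* Model of a load that preserves every bit, e.g. an integer load.  */
uint32_t
model_exact_load (uint32_t bits)
{
  return bits;
}

/* Replacing a later int load by a reinterpretation of an earlier float
   load is only valid if the float load returned the memory bits as-is.  */
bool
punning_is_safe (uint32_t (*load) (uint32_t), uint32_t memory_bits)
{
  return load (memory_bits) == memory_bits;
}

/* 0x7FA00000 is a signaling NaN in IEEE single format.  */
void
punning_self_check (void)
{
  assert (model_normalizing_load (0x7FA00000u) == 0x7FE00000u);
  assert (!punning_is_safe (model_normalizing_load, 0x7FA00000u));
  assert (punning_is_safe (model_exact_load, 0x7FA00000u));
}
```

Ordinary values such as 1.0f (0x3F800000) survive both kinds of load, which
is why a plain float copy keeps working while the punned int read does not.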

Re: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]

2024-07-31 Thread Richard Biener

On Tue, Jul 30, 2024 at 7:33 PM Tobias Burnus  wrote:
>
> Richard Biener wrote:
> > On Mon, Jul 29, 2024 at 9:26 PM Tobias Burnus  wrote:
> >> Inside pass_omp_target_link::execute, there is a call to
> >> gimple_regimplify_operands but the value expression is not
> >> expanded.[...]
> >>
> >> Where is_gimple_mem_ref_addr is defined as:
> >>
> >> /* Return true if T is a valid address operand of a MEM_REF.  */
> >>
> >> bool
> >> is_gimple_mem_ref_addr (tree t)
> >> {
> >> return (is_gimple_reg (t)
> >> || TREE_CODE (t) == INTEGER_CST
> >> || (TREE_CODE (t) == ADDR_EXPR
> >> && (CONSTANT_CLASS_P (TREE_OPERAND (t, 0))
> >> || decl_address_invariant_p (TREE_OPERAND (t, 0)))));
> >> }
> > I think iff then decl_address_invariant_p should be amended.
>
> This does not work - at least not for my use case of OpenMP
> link variables - due to ordering issues.
>
> For the device compilers, the VALUE_EXPR is added in lto_main
> or in do_whole_program_analysis (same file: lto/lto.cc) by
> calling offload_handle_link_vars. The value expression is then later expanded
> via pass_omp_target_link::execute, but in between the following happens:
>
> lto_main calls symbol_table::compile, which then calls
> cgraph_node::expand and that executes
>
> res |= verify_types_in_gimple_reference (lhs, true); for lhs being:
> MEM  [(c_char * {ref-all})&arr2]
> But when adding the has-value-expr check either directly to
> is_gimple_mem_ref_addr or to the decl_address_invariant_p it calls, the
> following condition becomes true in the called function in
> tree-cfg.cc:
>
>   if (!is_gimple_mem_ref_addr (TREE_OPERAND (expr, 0))
>       || (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR
>           && verify_address (TREE_OPERAND (expr, 0), false)))
>     {
>       error ("invalid address operand in %qs", code_name);
>
> * * * Thus, I am now back to the previous change, except for:
>
> > Why is the gimplify_addr_expr hunk needed?  It should get
> > to gimplifying the VAR_DECL/PARM_DECL by recursion?
>
> Indeed. I wonder why I had (thought to) need it before; possibly
> because it was needed or thought to be needed when trying to trace
> this down.
>
> Previous patch - except for that bit removed - attached.
>
> Thoughts, better ideas?

Looking at pass_omp_target_link::execute I wonder iff find_link_var_op
shouldn't simply do the substitution?  Aka

diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 35313c2ecf3..cf9e5b715ab 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2893,6 +2893,7 @@ find_link_var_op (tree *tp, int *walk_subtrees, void *)
   && is_global_var (t)
   && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
 {
+  *tp = unshare_expr (DECL_VALUE_EXPR (t));
   *walk_subtrees = 0;
   return t;
 }

which then makes the stmt obviously not gimple?

>
> Tobias

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak

On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  wrote:
>
> On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > OK. Richard, can you please mention the above in the comment why
> > XFmode is rejected in the hook?
> >
> > Later, we can perhaps benchmark XFmode move vs. generic memory copy to
> > get some hard data.
>
> My (limited) understanding was that the hook would be used only for cases
> where we'd like to e.g. value number some SF/DF/XF etc. mode loads and some
> subsequent loads from the same address with different mode but same size
> the same and replace say int or long long later load with VIEW_CONVERT_EXPR
> of the result of the SF/DF mode load.  That is what was incorrect, because
> the load didn't preserve all the bits.  The patch would still keep doing
> normal SF/DF/XF etc. mode copies if that is all that happens in the program,
> load some floating point value and store it elsewhere or as part of larger
> aggregate copy.

So, the hook should allow everything besides SF/DFmode, simply:


switch (GET_MODE_INNER (mode))
  {
  case SFmode:
  case DFmode:
    /* These suffer from normalization upon load when not using SSE.  */
    return !(ix86_fpmath & FPMATH_387);
  default:
    return true;
  }

Uros,

Re: [RFA][PR rtl-optimization/116136] Fix previously latent SUBREG simplification bug

2024-07-31 Thread Richard Biener

On Tue, Jul 30, 2024 at 10:24 PM Jeff Law  wrote:
>
>
> This fixes a testsuite regression seen on m68k after some of the recent
> ext-dce changes.  Ultimately Richard S and I have concluded the bug was
> a latent issue in subreg simplification.
>
> Essentially when simplifying something like
>
> (set (target:M1) (subreg:M1 (subreg:M2 (reg:M1) 0) 0))
>
> Where M1 > M2.  We'd simplify to:
>
> (set (target:M1) (reg:M1))
>
> The problem is on a big endian target that's wrong.  Consider if M1 is
> DI and M2 is SI.  The original should extract bits 32..63 from the
> source register and store them into bits 0..31 of the target register.
> In the simplified form it's just a copy, so bits 0..63 of the source end
> up bits 0..63 of the target.
>
> This shows up as the following regressions on the m68k:
>
> > Tests that now fail, but worked before (3 tests):
> >
> > gcc: gcc.c-torture/execute/960416-1.c   -O2  execution test
> > gcc: gcc.c-torture/execute/960416-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> > gcc: gcc.c-torture/execute/960416-1.c   -Os  execution test
>
>
>
> The fix is pretty trivial: instead of hardcoding "0" as the byte offset
> in the test for the simplification, we need to use the
> subreg_lowpart_offset.
>
>
> Anyway, bootstrapped and regression tested on m68k and x86_64 and tested
> on the other embedded targets as well without regressions.  Naturally it
> fixes the regression noted above.  I haven't seen other testsuite
> improvements when I spot checked some of the big endian crosses.
>
> OK for the trunk?

OK.

Thanks,
Richard.

> Jeff
>

Re: [PATCH] [x86] Mention _Float16 and __bf16 changes in GCC14.

2024-07-31 Thread Richard Biener

On Wed, Jul 31, 2024 at 6:32 AM liuhongt  wrote:
>
> Ok for trunk?

OK for www.

Richard.

> ---
>  htdocs/gcc-14/changes.html| 7 +++
>  htdocs/gcc-14/porting_to.html | 9 +
>  2 files changed, 16 insertions(+)
>
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index ca4cae0f..b023a4b9 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -982,6 +982,13 @@ __asm (".global __flmap_lock"  "\n\t"
>  AVX512VP2INTERSECT, AVXVNNI, MOVDIR64B, MOVDIRI, and PREFETCHI ISA
>  extensions.
>
> +   The _Float16 and __bf16 type are supported
> +independent of SSE2. W/o SSE2, these types are storage-only, compiler will
> +issue an error when they're used in conversion, unary operation,
> +binary operation, parameter passing or value return. Please use
> +__SSE2__ to detect arithmetic support of these types
> +instead of __FLT16_MAX__(or other similar Macros).
> +  
>  
>
>  MCore
> diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
> index 3de15d02..b4f87149 100644
> --- a/htdocs/gcc-14/porting_to.html
> +++ b/htdocs/gcc-14/porting_to.html
> @@ -490,6 +490,8 @@ in C23.
>  GCC will probably continue to support old-style function definitions
>  even once C23 is used as the default language dialect.
>
> +
> +
>  C++ language issues
>
>  Header dependency changes
> @@ -554,6 +556,13 @@ incorrect instruction set by GCC 14.
>  The fix in this case is to remember whether pop_options
>  needs to be performed in a new user-defined macro.
>
> +Type _Float16 and __bf16 are supported independent of SSE2 for IA-32/x86-64
> +W/o SSE2, these types are storage-only, compiler will issue an error when
> +  they're used in conversion, unary operation, binary operation, parameter
> +  passing or value return. Please use __SSE2__ to detect
> +  arithmetic support of these types instead of
> +  __FLT16_MAX__(or other similar Macros).
> +
>  
>
>  
> --
> 2.31.1
>

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener

On Wed, 31 Jul 2024, Uros Bizjak wrote:

> On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  wrote:
> >
> > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > > OK. Richard, can you please mention the above in the comment why
> > > XFmode is rejected in the hook?
> > >
> > > Later, we can perhaps benchmark XFmode move vs. generic memory copy to
> > > get some hard data.
> >
> > My (limited) understanding was that the hook would be used only for cases
> > where we'd like to e.g. value number some SF/DF/XF etc. mode loads and some
> > subsequent loads from the same address with different mode but same size
> > the same and replace say int or long long later load with VIEW_CONVERT_EXPR
> > of the result of the SF/DF mode load.  That is what was incorrect, because
> > the load didn't preserve all the bits.  The patch would still keep doing
> > normal SF/DF/XF etc. mode copies if that is all that happens in the program,
> > load some floating point value and store it elsewhere or as part of larger
> > aggregate copy.
> 
> So, the hook should allow everything besides SF/DFmode, simply:
> 
> 
> switch (GET_MODE_INNER (mode))
>   {
>   case SFmode:
>   case DFmode:
>     /* These suffer from normalization upon load when not using SSE.  */
>     return !(ix86_fpmath & FPMATH_387);
>   default:
>     return true;
>   }

OK, I think I'll go with this then.  I'm now unsure whether the
wrapper around the hook should reject modes with padding or if
the supposed users (value-numbering and SRA) should deal with that
issue separately.  I do wonder whether

ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
  ? &ieee_extended_intel_128_format
  : TARGET_96_ROUND_53_LONG_DOUBLE
  ? &ieee_extended_intel_96_round_53_format
  : &ieee_extended_intel_96_format));
ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);

unambiguously specifies where the padding is - m68k has

FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);

It's also not clear we can model an x87 10-byte memory copy in RTL since
a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
possible padding as unspecified and not "masked out" even if
the actual fstp will only store 10 bytes.

Richard.
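
The padding concern can be pictured at the byte level.  The sketch below is
only a model with made-up names, not x87 code: it treats a store as writing
the 10 value bytes into a 12- or 16-byte slot and shows that the remaining
bytes keep whatever they held before, so comparing or hashing the whole
slot would see stale padding.

```c
#include <assert.h>
#include <string.h>

enum { TOY_XF_VALUE_BYTES = 10 };

/* Model of a store that writes only the 10 value bytes, as the message
   above says the actual fstp does.  Padding bytes of DST are untouched.  */
void
toy_store_xf (unsigned char *dst, const unsigned char *src)
{
  memcpy (dst, src, TOY_XF_VALUE_BYTES);
}

/* Number of padding bytes in a SLOT_BYTES-sized slot still equal to FILL.  */
unsigned
toy_untouched_padding (const unsigned char *slot, unsigned slot_bytes,
                       unsigned char fill)
{
  unsigned count = 0;
  for (unsigned i = TOY_XF_VALUE_BYTES; i < slot_bytes; i++)
    if (slot[i] == fill)
      count++;
  return count;
}

/* A 16-byte slot keeps 6 stale bytes, a 12-byte slot keeps 2.  */
void
toy_padding_self_check (void)
{
  unsigned char src[16], dst[16];
  memset (src, 0x11, sizeof src);
  memset (dst, 0xAA, sizeof dst);
  toy_store_xf (dst, src);
  assert (toy_untouched_padding (dst, 16, 0xAA) == 6);
  assert (toy_untouched_padding (dst, 12, 0xAA) == 2);
}
```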

Re: [PATCH] RISC-V: Expand subreg move via slide if necessary [PR116086].

2024-07-31 Thread Richard Sandiford

"Robin Dapp"  writes:
>> > IMO, what ought to happen here is that the RA should spill
>> > the inner register to memory and load the V4SI back from there.
>> > (Or vice versa, for an lvalue.)  Obviously that's not very efficient,
>> > and so a patch like the above might be useful as an optimisation.[*]
>> > But it shouldn't be needed for correctness.  The target-independent
>> > code should already have the information it needs to realise that
>> > it can't predict the register index at compile time (at least for SVE).
>>
>> Or actually, for that case:
>>
>>   /* For pseudo registers, we want most of the same checks.  Namely:
>>
>>  Assume that the pseudo register will be allocated to hard registers
>>  that can hold REGSIZE bytes each.  If OSIZE is not a multiple of REGSIZE,
>>  the remainder must correspond to the lowpart of the containing hard
>>  register.  If BYTES_BIG_ENDIAN, the lowpart is at the highest offset,
>>  otherwise it is at the lowest offset.
>>
>>  Given that we've already checked the mode and offset alignment,
>>  we only have to check subblock subregs here.  */
>>   if (maybe_lt (osize, regsize)
>>   && ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))))
>> {
>>   /* It is invalid for the target to pick a register size for a mode
>>   that isn't ordered wrt to the size of that mode.  */
>>   poly_uint64 block_size = ordered_min (isize, regsize);
>>   unsigned int start_reg;
>>   poly_uint64 offset_within_reg;
>>   if (!can_div_trunc_p (offset, block_size, &start_reg, &offset_within_reg)
>>   ...
>>
>
> Like aarch64 we set REGMODE_NATURAL_SIZE for fixed-size modes to
> UNITS_PER_WORD.  Isn't that part of the problem?
>
> In extract_bit_field_as_subreg we check lowpart_bit_field_p (= true because
> 128 is a multiple of UNITS_PER_WORD).  This leads to the subreg expression.
>
> If I have REGMODE_NATURAL_SIZE return a VLA number this fails and we extract
> via memory - but that of course breaks almost everything else :)
>
> When you say the target-independent code should already have all information
> it needs, what are you referring to?  Something else than REGMODE_NATURAL_SIZE?

In the aarch64 example I mentioned, the REGMODE_NATURAL_SIZE of the inner
mode is variable.  (The REGMODE_NATURAL_SIZE of the outer mode is constant,
but it's the inner mode that matters here.)

Thanks,
Richard

Re: [PATCH] c: Add support for unsequenced and reproducible attributes

2024-07-31 Thread Jakub Jelinek

On Wed, Jul 31, 2024 at 10:19:06AM +0200, Jakub Jelinek wrote:
> On Wed, Jul 31, 2024 at 09:50:56AM +0200, Richard Biener wrote:
> > I wonder if
> > 
> > int foo (uintptr_t x) { *(int *)x = 1; return 1; }
> > 
> > is considered "noptr" by the standard but then by making a pointer out of
> > 'x' invokes UB?
> 
> I don't know.  The paper claims same behavior as const for functions without
> pointer/array arguments (but arrays decay to pointers); that claim is
> most likely false because of the infinite loops, but still, I'm not sure
> about the
> struct S { int *p; };
> int bar (struct S x) [[unsequenced]] { x.p[0] = 1; return 1; }
> or
> int baz (...) [[unsequenced]] { va_list ap; va_start (ap); int *p = va_arg (ap, int *); va_end (ap); *p = 1; return 1; }
> or
> typedef union { int *p; long long *q; } U __attribute__((transparent_union));
> int qux (int x, U y) [[unsequenced]] { if (x) y.p[0] = 1; else y.q[1] = 2; return 3; }
> etc. cases too.

Nor am I sure about pointers to pointers and the like, e.g.
int corge (int **x) [[unsequenced]] { *x[0] = 42; return 5; }

BTW, my reading of the idempotent property is that
int garply (int *restrict x) { *x += 2; return 1; }
would be invalid, because you can't then schedule another call immediately
after an existing one without changing the observable state of execution.
But
int freddy (int *restrict x) [[unsequenced]] { x[0] = 42; return x[1]; }
would be ok.  So, for the memory reachable from the passed in pointers, each
byte can be either stored or read but not both (at least when such store or
read has then observable side-effects).  Not really sure if we could use
that info in some alias oracle decisions.

Jakub

Re: [PATCH] LoongArch: Rework bswap{hi,si,di}2 definition

2024-07-31 Thread Lulu Cheng




On 2024/7/29 at 3:58 PM, Xi Ruoyao wrote:

Per a gcc-help thread we are generating sub-optimal code for
__builtin_bswap{32,64}.  To fix it:

- Use a single revb.d instruction for bswapdi2.
- Use a single revb.2w instruction for bswapsi2 for TARGET_64BIT,
   revb.2h + rotri.w for !TARGET_64BIT.
- Use a single revb.2h instruction for bswapsi2 (x) r>> 16, and a single
   revb.2w instruction for bswapdi2 (x) r>> 32.

Unfortunately I cannot figure out a way to make the compiler generate
revb.4h or revh.{2w,d} instructions.


This optimization is really ingenious and I have no objections.

I also haven't figured out how to generate revb.4h or revh.{2w,d}.
I think we can merge this patch first.

Thanks.



gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_REVB_2H, UNSPEC_REVB_4H,
UNSPEC_REVH_D): Remove UNSPECs.
(revb_4h, revh_d): Remove define_insn.
(revb_2h): Define as (rotatert:SI (bswap:SI x) 16) instead of
an UNSPEC.
(revb_2h_extend, revb_2w, *bswapsi2, bswapdi2): New define_insn.
(bswapsi2): Change to define_expand.  Only expand to revb.2h +
rotri.w if !TARGET_64BIT.
(bswapdi2): Change to define_insn of which the output is just a
revb.d instruction.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/revb.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.md | 79 ---
  gcc/testsuite/gcc.target/loongarch/revb.c | 61 +
  2 files changed, 104 insertions(+), 36 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/revb.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index ac94a22eafc..f166e834c56 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -20,11 +20,6 @@
  ;; .
  
  (define_c_enum "unspec" [

-  ;; Integer operations that are too cumbersome to describe directly.
-  UNSPEC_REVB_2H
-  UNSPEC_REVB_4H
-  UNSPEC_REVH_D
-
;; Floating-point moves.
UNSPEC_LOAD_LOW
UNSPEC_LOAD_HIGH
@@ -3155,55 +3150,67 @@ (define_insn "alslsi3_extend"
  
  ;; Reverse the order of bytes of operand 1 and store the result in operand 0.
  
-(define_insn "bswaphi2"

-  [(set (match_operand:HI 0 "register_operand" "=r")
-   (bswap:HI (match_operand:HI 1 "register_operand" "r")))]
+(define_insn "revb_2h"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (rotatert:SI (bswap:SI (match_operand:SI 1 "register_operand" "r"))
+(const_int 16)))]
""
"revb.2h\t%0,%1"
[(set_attr "type" "shift")])
  
-(define_insn_and_split "bswapsi2"

-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (bswap:SI (match_operand:SI 1 "register_operand" "r")))]
-  ""
-  "#"
-  ""
-  [(set (match_dup 0) (unspec:SI [(match_dup 1)] UNSPEC_REVB_2H))
-   (set (match_dup 0) (rotatert:SI (match_dup 0) (const_int 16)))]
-  ""
-  [(set_attr "insn_count" "2")])
-
-(define_insn_and_split "bswapdi2"
+(define_insn "revb_2h_extend"
[(set (match_operand:DI 0 "register_operand" "=r")
-   (bswap:DI (match_operand:DI 1 "register_operand" "r")))]
+   (sign_extend:DI
+ (rotatert:SI
+   (bswap:SI (match_operand:SI 1 "register_operand" "r"))
+   (const_int 16))))]
"TARGET_64BIT"
-  "#"
-  ""
-  [(set (match_dup 0) (unspec:DI [(match_dup 1)] UNSPEC_REVB_4H))
-   (set (match_dup 0) (unspec:DI [(match_dup 0)] UNSPEC_REVH_D))]
-  ""
-  [(set_attr "insn_count" "2")])
+  "revb.2h\t%0,%1"
+  [(set_attr "type" "shift")])
  
-(define_insn "revb_2h"

-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:SI 1 "register_operand" "r")] UNSPEC_REVB_2H))]
+(define_insn "bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (bswap:HI (match_operand:HI 1 "register_operand" "r")))]
""
"revb.2h\t%0,%1"
[(set_attr "type" "shift")])
  
-(define_insn "revb_4h"

+(define_insn "revb_2w"
[(set (match_operand:DI 0 "register_operand" "=r")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "r")] UNSPEC_REVB_4H))]
+   (rotatert:DI (bswap:DI (match_operand:DI 1 "register_operand" "r"))
+(const_int 32)))]
"TARGET_64BIT"
-  "revb.4h\t%0,%1"
+  "revb.2w\t%0,%1"
[(set_attr "type" "shift")])
  
-(define_insn "revh_d"

+(define_insn "*bswapsi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (bswap:SI (match_operand:SI 1 "register_operand" "r")))]
+  "TARGET_64BIT"
+  "revb.2w\t%0,%1"
+  [(set_attr "type" "shift")])
+
+(define_expand "bswapsi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (bswap:SI (match_operand:SI 1 "register_operand" "r")))]
+  ""
+{
+  if (!TARGET_64BIT)
+{
+  rtx t = gen_reg_rtx (SImode);
+  emit_insn (gen_revb_2h (t, operands[1]));
+  emit_insn (gen_rotrsi3 (operands[0], t, GEN_INT (16)));
+  DONE;
+}
+})
+
+(defi

[PATCH] middle-end/101478 - ICE with degenerate address during gimplification

2024-07-31 Thread Richard Biener

When we gimplify &MEM[0B + 4] we are re-folding the address in case
types are not canonical which ends up with a constant address that
recompute_tree_invariant_for_addr_expr ICEs on.  Properly guard
that call.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/101478
* gimplify.cc (gimplify_addr_expr): Check we still have an
ADDR_EXPR before calling recompute_tree_invariant_for_addr_expr.

* gcc.dg/pr101478.c: New testcase.
---
 gcc/gimplify.cc |  3 ++-
 gcc/testsuite/gcc.dg/pr101478.c | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101478.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 47c9913b55b..fab5532c54d 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -6984,7 +6984,8 @@ gimplify_addr_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
*expr_p = build_fold_addr_expr (op0);
 
   /* Make sure TREE_CONSTANT and TREE_SIDE_EFFECTS are set properly.  */
-  recompute_tree_invariant_for_addr_expr (*expr_p);
+  if (TREE_CODE (*expr_p) == ADDR_EXPR)
+   recompute_tree_invariant_for_addr_expr (*expr_p);
 
   /* If we re-built the ADDR_EXPR add a conversion to the original type
  if required.  */
diff --git a/gcc/testsuite/gcc.dg/pr101478.c b/gcc/testsuite/gcc.dg/pr101478.c
new file mode 100644
index 000..527620ea0f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101478.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+struct obj {
+  int n;
+  int l;
+};
+int main()
+{
+  (struct obj *)((char *)(__SIZE_TYPE__)({ 0; }) - (char *)&((struct obj *)0)->l);
+}
-- 
2.43.0

[PATCH] testsuite, rs6000: Make {vmx,vsx,p8vector}_hw check for altivec/vsx feature

2024-07-31 Thread Kewen.Lin

Hi,

Unlike p9vector_hw, the vmx_hw/vsx_hw/p8vector_hw checks
can still succeed without Altivec/VSX feature support.  We
have many runnable test cases that only check for these *_hw
targets without also checking whether the Altivec/VSX feature
is enabled, so they can fail when tested with Altivec/VSX
explicitly disabled.  I think it's reasonable to also check
that the Altivec/VSX feature is enabled when checking that the
testing environment is able to execute some instructions, since
these instructions rely on these features.  So, similar to what
we test for p9vector_hw, this patch modifies the C functions
used for vmx_hw, vsx_hw and p8vector_hw to use the corresponding
vector types and constraints.  Besides the VSX feature,
p8vector_hw also requires ISA 2.07 support.  A good thing is
that almost all of the test cases using p8vector_hw already
specify -mdejagnu-cpu=power8, either always or if !has_arch_pwr8.
Since checking _ARCH_PWR8 in p8vector_hw could stop test cases
from being tested even if the test case itself specifies
-mdejagnu-cpu=power8, this patch doesn't force p8vector_hw to
check _ARCH_PWR8; instead it updates all existing test cases
which use p8vector_hw but don't have -mdejagnu-cpu=power8.
By the way, all test cases using p9vector_hw are fine.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_vsx_hw_available): Modify C source
code used for testing with type vector long long and constraint wa
which require VSX feature.
(check_p8vector_hw_available): Likewise.
(check_vmx_hw_available): Modify C source code used for testing with
type vector int and constraint v which require Altivec feature.
* gcc.target/powerpc/divkc3-1.c: Specify -mdejagnu-cpu=power8 for
!has_arch_pwr8 to ensure power8 support.
* gcc.target/powerpc/mulkc3-1.c: Likewise.
* gcc.target/powerpc/pr96264.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/divkc3-1.c |  1 +
 gcc/testsuite/gcc.target/powerpc/mulkc3-1.c |  1 +
 gcc/testsuite/gcc.target/powerpc/pr96264.c  |  1 +
 gcc/testsuite/lib/target-supports.exp   | 24 -
 4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-1.c b/gcc/testsuite/gcc.target/powerpc/divkc3-1.c
index 89bf04f12a9..96fb5c21204 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target { powerpc64*-*-* && p8vector_hw } } } */
 /* { dg-options "-mfloat128 -mvsx" } */
+/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */

 void abort ();

diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c
index b975a91dbd7..1b0a1e24814 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target { powerpc64*-*-* && p8vector_hw } } } */
 /* { dg-options "-mfloat128 -mvsx" } */
+/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */

 void abort ();

diff --git a/gcc/testsuite/gcc.target/powerpc/pr96264.c b/gcc/testsuite/gcc.target/powerpc/pr96264.c
index 9f7d885daf2..906720fdcd1 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr96264.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr96264.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target { powerpc64le-*-* } } } */
 /* { dg-options "-Os -fno-forward-propagate -fschedule-insns -fno-tree-ter -Wno-psabi" } */
+/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
 /* { dg-require-effective-target p8vector_hw } */

 typedef unsigned char __attribute__ ((__vector_size__ (64))) v512u8;
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index daa0c75d2bc..2101e9c9c83 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2864,11 +2864,9 @@ proc check_p8vector_hw_available { } {
check_runtime_nocache p8vector_hw_available {
int main()
{
-   #ifdef __MACH__
- asm volatile ("xxlorc vs0,vs0,vs0");
-   #else
- asm volatile ("xxlorc 0,0,0");
-   #endif
+ vector long long v1 = {0x1, 0x2};
+ vector long long v2;
+ asm ("xxlorc %0,%1,%1" : "=wa" (v2) : "wa" (v1));
  return 0;
}
} $options
@@ -3165,11 +3163,9 @@ proc check_vsx_hw_available { } {
check_runtime_nocache vsx_hw_available {
int main()
{
-   #ifdef __MACH__
- asm volatile ("xxlor vs0,vs0,vs0");
-   #else
- asm volatile ("xxlo

[PATCH] testsuite, rs6000: Remove useless powerpc_{altivec,vsx}_ok

2024-07-31 Thread Kewen.Lin

Hi,

While checking the existing powerpc_{altivec,vsx}_ok test cases,
I found some that don't need the powerpc_{altivec,vsx}_ok
checks at all: some of them already have another effective
target check which covers powerpc_{altivec,vsx}_ok, and
some of them don't actually require the VSX/AltiVec feature
at all.  So this patch removes such useless checks.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


PR testsuite/114842

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/amo2.c: Remove powerpc_vsx_ok effective target
check as p9vector_hw already covers it.
* gcc.target/powerpc/p9-sign_extend-runnable.c: Likewise.
* gcc.target/powerpc/clone2.c: Remove powerpc_vsx_ok effective target
check as ppc_cpu_supports_hw already covers it.
* gcc.target/powerpc/pr47251.c: Remove powerpc_vsx_ok effective target
check as it doesn't need VSX.
* gcc.target/powerpc/pr60137.c: Likewise.
* gcc.target/powerpc/pr80098-1.c: Likewise.
* gcc.target/powerpc/pr80098-2.c: Likewise.
* gcc.target/powerpc/pr80098-3.c: Likewise.
* gcc.target/powerpc/sd-pwr6.c: Likewise.
* gcc.target/powerpc/pr57744.c: Remove powerpc_vsx_ok effective target
check and option -mvsx as it doesn't need VSX.
* gcc.target/powerpc/pr69548.c: Remove powerpc_vsx_ok effective target
check as it doesn't need VSX, remove lp64 and use int128 instead.
* gcc.target/powerpc/vec-cmpne-long.c: Remove powerpc_vsx_ok effective
target check as p8vector_hw already covers it.
* gcc.target/powerpc/darwin-save-world-1.c: Remove powerpc_altivec_ok
effective target check as vmx_hw already covers it.
---
 gcc/testsuite/gcc.target/powerpc/amo2.c| 1 -
 gcc/testsuite/gcc.target/powerpc/clone2.c  | 1 -
 gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c | 1 -
 gcc/testsuite/gcc.target/powerpc/pr47251.c | 1 -
 gcc/testsuite/gcc.target/powerpc/pr57744.c | 3 +--
 gcc/testsuite/gcc.target/powerpc/pr60137.c | 1 -
 gcc/testsuite/gcc.target/powerpc/pr69548.c | 6 +++---
 gcc/testsuite/gcc.target/powerpc/pr80098-1.c   | 1 -
 gcc/testsuite/gcc.target/powerpc/pr80098-2.c   | 1 -
 gcc/testsuite/gcc.target/powerpc/pr80098-3.c   | 1 -
 gcc/testsuite/gcc.target/powerpc/sd-pwr6.c | 1 -
 gcc/testsuite/gcc.target/powerpc/vec-cmpne-long.c  | 1 -
 13 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/amo2.c b/gcc/testsuite/gcc.target/powerpc/amo2.c
index 9cb493da53e..592f0fb3f92 100644
--- a/gcc/testsuite/gcc.target/powerpc/amo2.c
+++ b/gcc/testsuite/gcc.target/powerpc/amo2.c
@@ -1,5 +1,4 @@
 /* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2 -mvsx -mpower9-misc" } */
 /* { dg-additional-options "-mdejagnu-cpu=power9" { target { ! has_arch_pwr9 } } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/clone2.c b/gcc/testsuite/gcc.target/powerpc/clone2.c
index e64940b7952..4098e878c21 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone2.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone2.c
@@ -1,6 +1,5 @@
 /* { dg-do run { target { powerpc*-*-linux* } } } */
 /* { dg-options "-mvsx -O2" } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-require-effective-target ppc_cpu_supports_hw } */

 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c b/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c
index 3326765f4fb..27fc1d30a8b 100644
--- a/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c
@@ -1,7 +1,7 @@
 /* { dg-do run { target powerpc*-*-* } } */
 /* { dg-options "-maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
-/* { dg-skip-if "need to be able to execute AltiVec" { ! { powerpc_altivec_ok && vmx_hw } } } */
+/* { dg-skip-if "need to be able to execute AltiVec" { ! vmx_hw } } */

 /* With altivec turned on, Darwin wants to save the world but we did not mark 
lr as being saved any more
as saving the lr is not needed for saving altivec registers.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
index f0514993bc0..595aa4768cc 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -1,5 +1,4 @@
 /* { dg-do run { target { *-*-linux* && { lp64 && p9vector_hw } } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2

[PATCH] testsuite, rs6000: Replace powerpc_vsx_ok with powerpc_vsx

2024-07-31 Thread Kewen.Lin

Hi,

Following up on the previous r15-886, this patch cleans up
the remaining uses of powerpc_vsx_ok which should actually
use powerpc_vsx instead.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


PR testsuite/114842

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/error-1.c: Replace powerpc_vsx_ok check with
powerpc_vsx.
* gcc.target/powerpc/warn-2.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-ors-longlong.c: Likewise.
* gcc.target/powerpc/ppc-fortran/pr80108-1.f90: Replace powerpc_vsx_ok
check with powerpc_vsx and remove useless -mfloat128.
* gcc.target/powerpc/pragma_power8.c: Replace powerpc_vsx_ok check with
powerpc_vsx.
---
 gcc/testsuite/gcc.target/powerpc/error-1.c   | 2 +-
 .../gcc.target/powerpc/fold-vec-logical-ors-longlong.c   | 4 ++--
 gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90   | 4 ++--
 gcc/testsuite/gcc.target/powerpc/pragma_power8.c | 5 -
 gcc/testsuite/gcc.target/powerpc/warn-2.c| 2 +-
 5 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/error-1.c b/gcc/testsuite/gcc.target/powerpc/error-1.c
index d38eba8bb8a..9327076baf0 100644
--- a/gcc/testsuite/gcc.target/powerpc/error-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/error-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-options "-O -mvsx -mno-altivec" } */

 /* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
index 60af61a7f16..aae4694f551 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
@@ -4,7 +4,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mvsx -O2" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_vsx } */

 #include 

@@ -154,7 +154,7 @@ test6_nor (vector unsigned long long x, vector unsigned long long y)

 // The number of xxlor instructions generated varies between 6 and 24 for
 // older systems (power6,power7), as well as for 32-bit versus 64-bit targets.
-// For simplicity, this test now only targets "powerpc_vsx_ok" environments
+// For simplicity, this test now only targets "powerpc_vsx" environments
 // where the answer is expected to be 6.

 /* { dg-final { scan-assembler-times {\mxxlor\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90 b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90
index 00392b5fed9..e0e157bd245 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90
@@ -1,7 +1,7 @@
 ! Originally contributed by Tobias Burnas.
 ! { dg-do compile { target { powerpc*-*-* } } }
-! { dg-require-effective-target powerpc_vsx_ok }
-! { dg-options "-mdejagnu-cpu=405 -mpower9-minmax -mfloat128" }
+! { dg-require-effective-target powerpc_vsx }
+! { dg-options "-mdejagnu-cpu=405 -mpower9-minmax" }
 ! { dg-excess-errors "expect error due to conflicting target options" }
 ! Since the error message is not associated with a particular line
 ! number, we cannot use the dg-error directive and cannot specify a
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
index 8de815e5a9e..43ea6dd406e 100644
--- a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
+++ b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
@@ -1,6 +1,9 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* Ensure there is no explicit -mno-vsx etc., otherwise
+   the below bif __builtin_vec_vcmpeq_p replies on power8
+   vsx would fail.  */
+/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-options "-mdejagnu-cpu=power6 -maltivec -O2" } */

 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/warn-2.c b/gcc/testsuite/gcc.target/powerpc/warn-2.c
index 29c6ce50cd7..ba294cb52e5 100644
--- a/gcc/testsuite/gcc.target/powerpc/warn-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/warn-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-options "-O -mdejagnu-cpu=power7 -mno-altivec" } */

 /* { dg-warning "'-mno-altivec' disables vsx" "" { tar

[PATCH] testsuite, rs6000: Fix some run cases with appropriate *_hw

2024-07-31 Thread Kewen.Lin

Hi,

When cleaning up the remaining powerpc_{vsx,altivec}_ok test
cases, I found some dg-do run test cases which should check
for the appropriate {p8vector,vmx}_hw check instead.  This
patch is to adjust them accordingly.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


gcc/testsuite/ChangeLog:

* gcc.target/powerpc/swaps-p8-46.c: Check for p8vector_hw rather than
powerpc_vsx_ok.
* gcc.target/powerpc/ppc64-abi-2.c: Check for vmx_hw rather than
powerpc_altivec_ok.
* gcc.target/powerpc/pr96139-c.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pr96139-c.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c b/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
index b490fc3c2fd..2a5a7602004 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { { powerpc*-*-linux* && lp64 } && powerpc_altivec_ok } } } */
+/* { dg-do run { target { { powerpc*-*-linux* && lp64 } && vmx_hw } } } */
 /* { dg-options "-O2 -fprofile -mprofile-kernel -maltivec -mabi=altivec -mno-pcrel" } */
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
index 3ada2603428..b39c559ec0b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -Wall -maltivec" } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-require-effective-target vmx_hw } */

 /*
  * Based on test created by sjmunroe for pr96139
diff --git a/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c b/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c
index 3b5154b1231..d0392f25eee 100644
--- a/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c
+++ b/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target le } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2 " } */

 typedef __attribute__ ((__aligned__ (8))) unsigned long long __m64;
--
2.43.5

[PATCH] testsuite, rs6000: Replace powerpc_vsx_ok with powerpc_altivec etc.

2024-07-31 Thread Kewen.Lin

Hi,

This is a follow-up to the previous patch replacing
powerpc_vsx_ok with powerpc_vsx.  It focuses on those test cases
which don't really require the VSX feature but used powerpc_vsx_ok
before.  They actually require some other effective target check:
some of them just require the ALTIVEC feature, some of them
just require hard float support, and some of them just require
ISA 2.06, etc.

By the way, ppc-fpconv-4.c is the only one missing powerpc_fprs
among ppc-fpconv-*.c after this replacement, so I also fix it
here.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


PR testsuite/114842

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/bswap64-2.c: Replace powerpc_vsx_ok check with
has_arch_pwr7.
* gcc.target/powerpc/ppc-fpconv-2.c: Replace powerpc_vsx_ok check with
powerpc_fprs.
* gcc.target/powerpc/ppc-fpconv-6.c: Likewise.
* gcc.target/powerpc/ppc-pow.c: Likewise.
* gcc.target/powerpc/ppc-target-1.c: Likewise.
* gcc.target/powerpc/ppc-target-2.c: Likewise.
* gcc.target/powerpc/ppc-target-3.c: Likewise.
* gcc.target/powerpc/ppc-target-4.c: Likewise.
* gcc.target/powerpc/ppc-fpconv-4.c: Check for powerpc_fprs.
* gcc.target/powerpc/fold-vec-select-char.c: Replace powerpc_vsx_ok
with powerpc_altivec check and move it after dg-options line.
* gcc.target/powerpc/fold-vec-select-float.c: Likewise.
* gcc.target/powerpc/fold-vec-select-int.c: Likewise.
* gcc.target/powerpc/fold-vec-select-short.c: Likewise.
* gcc.target/powerpc/p9-novsx.c: Likewise.
* gcc.target/powerpc/p9-options-1.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/bswap64-2.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c | 6 +++---
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-short.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/p9-novsx.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/p9-options-1.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fpconv-2.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fpconv-4.c  | 1 +
 gcc/testsuite/gcc.target/powerpc/ppc-fpconv-6.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-pow.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-target-1.c  | 3 ++-
 gcc/testsuite/gcc.target/powerpc/ppc-target-2.c  | 3 ++-
 gcc/testsuite/gcc.target/powerpc/ppc-target-3.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-target-4.c  | 2 +-
 15 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/bswap64-2.c b/gcc/testsuite/gcc.target/powerpc/bswap64-2.c
index 6c3d8ca0528..70d872b5e30 100644
--- a/gcc/testsuite/gcc.target/powerpc/bswap64-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bswap64-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-options "-O2 -mpopcntd" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target has_arch_pwr7 } */
 /* { dg-final { scan-assembler "ldbrx" } } */
 /* { dg-final { scan-assembler "stdbrx" } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c
index e055c017536..17e28914aae 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c
@@ -2,8 +2,8 @@
inputs produce the right code.  */

 /* { dg-do compile } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target powerpc_altivec } */

 #include 

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c
index 1656fbff2ca..848bd750ff8 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c
@@ -1,9 +1,9 @@
-/* Verify that overloaded built-ins for vec_sel with float
-   inputs for VSX produce the right code.  */
+/* Verify that overloaded built-ins for vec_sel with float
+   inputs produce the right code.  */

 /* { dg-do compile } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target powerpc_altivec } */

 #include 

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c
index 510fc564370..f51d741d401 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c
@@ -2,8 +2,8 @@
inputs produce the right code.  */

 /* {

[PATCH] testsuite, rs6000: Adjust pr78056-[1357].c and remove pr78056-[246].c

2024-07-31 Thread Kewen.Lin

Hi,

When cleaning up the remaining powerpc_{vsx,altivec}_ok test
cases, I found some issues related to pr78056-*.c.
Firstly, the test points of pr78056-[246].c are no longer
available since r9-3164 dropped many HAVE_AS_* macros and the
expected warnings were dropped with them, so this patch removes
those tests.  Secondly, pr78056-1.c and pr78056-3.c include
altivec.h but don't use any builtins, so checking powerpc_altivec
is enough (no need to check powerpc_vsx).  pr78056-5.c doesn't
require any altivec/vsx feature, so powerpc_vsx_ok can be
removed.  Lastly, pr78056-7.c should just use powerpc_fprs
instead of dfp_hw as it only cares about the fcpsgn insn.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr78056-1.c: Check for powerpc_altivec rather than
powerpc_vsx.
* gcc.target/powerpc/pr78056-3.c: Likewise.
* gcc.target/powerpc/pr78056-5.c: Drop powerpc_vsx_ok check.
* gcc.target/powerpc/pr78056-7.c: Check for powerpc_fprs rather than
dfp_hw.
* gcc.target/powerpc/pr78056-2.c: Remove.
* gcc.target/powerpc/pr78056-4.c: Remove.
* gcc.target/powerpc/pr78056-6.c: Remove.
---
 gcc/testsuite/gcc.target/powerpc/pr78056-1.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/pr78056-2.c | 18 --
 gcc/testsuite/gcc.target/powerpc/pr78056-3.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/pr78056-4.c | 19 ---
 gcc/testsuite/gcc.target/powerpc/pr78056-5.c |  2 --
 gcc/testsuite/gcc.target/powerpc/pr78056-6.c | 25 
 gcc/testsuite/gcc.target/powerpc/pr78056-7.c |  2 --
 7 files changed, 4 insertions(+), 70 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr78056-2.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr78056-4.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr78056-6.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-1.c b/gcc/testsuite/gcc.target/powerpc/pr78056-1.c
index 72640007dbb..49ebafe39b6 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr78056-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-options "-mdejagnu-cpu=power8 -mvsx" } */
-/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_altivec } */

 /* This test should succeed on both 32- and 64-bit configurations.  */
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-2.c b/gcc/testsuite/gcc.target/powerpc/pr78056-2.c
deleted file mode 100644
index 5cda9d6193b..000
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-2.c
+++ /dev/null
@@ -1,18 +0,0 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-skip-if "" { powerpc_vsx_ok } } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-options "-mdejagnu-cpu=power8 -mvsx" } */
-
-/* This test should succeed on both 32- and 64-bit configurations.  */
-#include 
-
-/* Though the command line specifies power8 target, this function is
-   to support power9. Expect an error message here because this target
-   does not support power9.  */
-__attribute__((target("cpu=power9")))
-int get_random ()
-{ /* { dg-warning "lacks power9 support" } */
-  return __builtin_darn_32 (); /* { dg-warning "implicit declaration" } */
-}
-
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-3.c b/gcc/testsuite/gcc.target/powerpc/pr78056-3.c
index cf57d058e8b..745552b244d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr78056-3.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-options "-mdejagnu-cpu=power7" } */
-/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-options "-mdejagnu-cpu=power7" } */
+/* { dg-require-effective-target powerpc_altivec } */

 /* This test should succeed on both 32- and 64-bit configurations.  */
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-4.c b/gcc/testsuite/gcc.target/powerpc/pr78056-4.c
deleted file mode 100644
index 0bea0f895fa..000
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-4.c
+++ /dev/null
@@ -1,19 +0,0 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
-/* powerpc_vsx_ok represents power7 */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-skip-if "" { powerpc_vsx_ok } } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-options "-mdejagnu-cpu=power7" } */
-
-/* This test should succeed on both 32- and 64-bit configurations.  */
-#include 
-
-/* Though the command line specifies power7 target, this function is
-   to support power8, which will fail because this platform does not
-   support power8.  */
-__attribute_

Re: [PATCH, rs6000] Add const_vector into any_operand predicate

2024-07-31 Thread Kewen.Lin

Hi Haochen,

on 2024/7/25 11:34, HAO CHEN GUI wrote:
> Hi,
>   This patch adds const_vector into any_operand predicate. From my
> understanding, any_operand should include all kinds of operands.
> The const_vector should be included. As emit_move_insn doesn't check
> the predicate, the const_vector is actually supported by vector mode
> move expand. So it should be added into any_operand in case other gen
> function (for instance, maybe_gen_insn) checks the predicate.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Add const_vector into any_operand predicate
> 
> gcc/
>   * config/rs6000/predicates.md (any_operand): Add const_vector.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index d23ce9a77a3..12600368c43 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -19,7 +19,7 @@
> 
>  ;; Return 1 for anything except PARALLEL.
>  (define_predicate "any_operand"
> -  (match_code 
> "const_int,const_double,const_wide_int,const,symbol_ref,label_ref,subreg,reg,mem"))
> +  (match_code 
> "const_int,const_double,const_wide_int,const,symbol_ref,label_ref,subreg,reg,mem,const_vector"))

CC Mike since he added mov and movmisalign.

From the name, its associated comments and what it currently consists of,
this seems to be an oversight and looks reasonable to fix.  Maybe it
reads better to put "const_vector" after "const_wide_int".  OK for trunk
with this tweaked, but please give others two days or so to chime in,
thanks!

BR,
Kewen

Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-07-31 Thread Kewen.Lin

Hi Carl,

on 2024/7/27 06:56, Carl Love wrote:
> GCC maintainers:
> 
> Per a report from a user, the existing vec_test_lsbb_all_ones and
> vec_test_lsbb_all_zeros built-ins are not documented in the GCC
> documentation file.
> 
> The following patch adds missing documentation for the
> vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>     Carl
> 
> ---
> rs6000, document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
> 
> Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
> and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
> returns 1 if the least significant bit in each byte is a 1, returns
> 0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
> the least significant bit in each byte is a zero and 0 otherwise.
> 
> The test cases for the built-ins are in files:
>   gcc/testsuite/gcc.target/powerpc/lsbb.c
>   gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c
> 
> 
> gcc/ChangeLog:
>     * doc/extend.texi (vec_test_lsbb_all_ones,
>     vec_test_lsbb_all_zeros): Add documentation for the
>     existing built-ins.
> ---
>  gcc/doc/extend.texi | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 83ff168faf6..96e41c9a905 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23240,6 +23240,21 @@ signed long long will sign extend the rightmost byte of each doubleword.
>  The following additional built-in functions are also available for the
>  PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):
> 
> +@smallexample
> +@exdent int vec_test_lsbb_all_ones (vector char);

I think we need to specify "unsigned" char explicitly since we don't actually
allow vector "signed" char as the below testing shows:

int foo11 (vector signed char va)
{ 
  return vec_test_lsbb_all_ones (va);
}

:17:3: error: invalid parameter combination for AltiVec intrinsic '__builtin_vec_xvtlsbb_all_ones'
   17 |   return vec_test_lsbb_all_ones (va);


Now we make these two bifs overloaded, but there is only one instance for
each, both with "vector unsigned char" as the argument type, while the
corresponding instance prototype in the builtin table uses "vector signed
char".  It's inconsistent and weird.  I think we can just update the
prototype in the builtin table to "vector unsigned char" and remove the
entries in the overload table.  It can be a follow-up patch.

> +@end smallexample
> +@findex vec_test_lsbb_all_ones
> +
> +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
> +bit in each byte is a one.  It returns a zero otherwise.

May be better to use the wording "equal to 1" referred from ISA and "returns 0"
matches the preceding "returns 1", like:

“... in each byte is equal to 1.  It returns 0 otherwise.”

> +
> +@smallexample
> +@exdent int vec_test_lsbb_all_zeros (vector char);
> +@end smallexample
> +@findex vec_test_lsbb_all_zeros
> +
> +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
> +bit in each byte is a zero.  It returns a zero otherwise.

Likewise, "... in each byte is equal to 0.  It returns 0 otherwise."

OK with these nits tweaked, thanks!

BR,
Kewen
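
For readers without a Power10 toolchain at hand, the documented semantics can
be modelled in plain C.  This is only an illustrative sketch, not the built-in
itself: the names model_lsbb_all_ones and model_lsbb_all_zeros are made up
here, and a 16-element unsigned char array stands in for "vector unsigned
char".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model (not the GCC built-in): return 1 if the
   least significant bit of every byte is equal to 1, and 0 otherwise.  */
int
model_lsbb_all_ones (const unsigned char *bytes, size_t n)
{
  assert (bytes != NULL);
  for (size_t i = 0; i < n; i++)
    if ((bytes[i] & 1) == 0)
      return 0;
  return 1;
}

/* Hypothetical scalar model (not the GCC built-in): return 1 if the
   least significant bit of every byte is equal to 0, and 0 otherwise.  */
int
model_lsbb_all_zeros (const unsigned char *bytes, size_t n)
{
  assert (bytes != NULL);
  for (size_t i = 0; i < n; i++)
    if ((bytes[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that for an input mixing odd and even bytes both models return 0, which
is why the two built-ins are documented separately rather than as each
other's negation.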

Re: [PATCH] RISC-V: Expand subreg move via slide if necessary [PR116086].

2024-07-31 Thread Robin Dapp

> > Like aarch64 we set REGMODE_NATURAL_SIZE for fixed-size modes to
> > UNITS_PER_WORD.  Isn't that part of the problem?
> >
> > In extract_bit_field_as_subreg we check lowpart_bit_field_p (= true because
> > 128 is a multiple of UNITS_PER_WORD).  This leads to the subreg expression.
> >
> > If I have REGMODE_NATURAL_SIZE return a VLA number this fails and we extract
> > via memory - but that of course breaks almost everything else :)
> >
> > When you say the target-independent code should already have all
> > information it needs, what are you referring to?  Something else than
> > REGMODE_NATURAL_SIZE?
>
> In the aarch64 example I mentioned, the REGMODE_NATURAL_SIZE of the inner
> mode is variable.  (The REGMODE_NATURAL_SIZE of the outer mode is constant,
> but it's the inner mode that matters here.)

Yes, because in that example the inner mode is a VLA mode.

I meant, for the riscv case, wouldn't we need to make the REGMODE_NATURAL_SIZE
of the VLS mode variable as well? (As that is what it is actually going to be
in the end, some kind of vector move of a variable-sized vector)

Right now we only enable VLS modes whose size is not larger than the base
variable vector size (e.g. [128 128] bits) exactly to avoid the kind of
ambiguity between modes that arises here.

-- 
Regards
 Robin

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak

On Wed, Jul 31, 2024 at 10:48 AM Richard Biener  wrote:
>
> On Wed, 31 Jul 2024, Uros Bizjak wrote:
>
> > On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  wrote:
> > >
> > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > > > OK. Richard, can you please mention the above in the comment why
> > > > XFmode is rejected in the hook?
> > > >
> > > > Later, we can perhaps benchmark XFmode move vs. generic memory copy to
> > > > get some hard data.
> > >
> > > My (limited) understanding was that the hook would be used only for cases
> > > > where we'd like to e.g. value number some SF/DF/XF etc. mode loads and
> > > > some subsequent loads from the same address with different mode but same
> > > > size the same and replace say int or long long later load with
> > > > VIEW_CONVERT_EXPR of the result of the SF/SF mode load.  That is what was
> > > > incorrect, because the load didn't preserve all the bits.  The patch
> > > > would still keep doing normal SF/DF/XF etc. mode copies if that is all
> > > > that happens in the program, load some floating point value and store it
> > > > elsewhere or as part of larger aggregate copy.
> >
> > So, the hook should allow everything besides SF/DFmode, simply:
> >
> >
> > switch (GET_MODE_INNER (mode))
> >   {
> >   case SFmode:
> >   case DFmode:
> > /* These suffer from normalization upon load when not using SSE.  */
> > return !(ix86_fpmath & FPMATH_387);
> >   default:
> > return true;
> >   }
>
> OK, I think I'll go with this then.  I'm now unsure whether the
> wrapper around the hook should reject modes with padding or if
> the supposed users (value-numbering and SRA) should deal with that
> issue separately.  I do wonder whether
>
> ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
>   ? &ieee_extended_intel_128_format
>   : TARGET_96_ROUND_53_LONG_DOUBLE
>   ? &ieee_extended_intel_96_round_53_format
>   : &ieee_extended_intel_96_format));
> ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
>
> unambiguously specifies where the padding is - m68k has
>
> FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
>
> It's also not clear we can model a x87 10 byte memory copy in RTL since
> a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> possible padding as unspecified and not "masked out" even if
> the actual fstp will only store 10 bytes.

The hardware will never touch bytes outside 10 bytes range, the
padding is some artificial compiler thingy, so IMO it should be
handled before the hook is called. Please find attached the source I
have used to confirm that a) the copied bits will never be mangled and
b) there is no access outside the 10 bytes range. (BTW: these
particular values are to test the effect of leading bit 63, the
non-hidden normalized bit).

Thanks,
Uros.
int main ()
{
  volatile union cvt
  {
    short s[6];
    int i[3];
    long double d;
  } x, y;

  x.s[5] = 0x5a5a; // guard
  x.s[4] = 0x;
  x.s[3] = 0x4000;
  x.s[2] = 0x1;
  x.s[1] = 0x0;
  x.s[0] = 0x;

  __builtin_printf("%08x %08x %08x\n", x.i[2], x.i[1], x.i[0]);

  y.s[5] = 0xa5a5;  // guard
   
  asm ("" : "=t" (y.d) : "0" (x.d));

  __builtin_printf("%08x %08x %08x\n", y.i[2], y.i[1], y.i[0]);

  if (y.s[0] != x.s[0]
  || y.s[1] != x.s[1]
  || y.s[2] != x.s[2]
  || y.s[3] != x.s[3]
  || y.s[4] != x.s[4])
    __builtin_abort();

  return 0;
}

[PATCH] testsuite: Adjust fam-in-union-alone-in-struct-2.c to support BE [PR116148]

2024-07-31 Thread Kewen.Lin

Hi,

As Andrew pointed out in PR116148, fam-in-union-alone-in-struct-2.c
was designed for little-endian, the recent commit r15-2403 made it
be tested with running on BE and PR116148 got exposed.

This patch is to adjust the expected data for members in with_fam_2_v
and with_fam_3_v by considering endianness, also update with_fam_3_v.b[1]
from 0x5f6f7f7f to 0x5f6f7f8f to avoid two "7f"s.
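
To make the endianness dependence concrete, here is a small standalone
sketch (not part of the patch).  It assumes a 4-byte int and the GCC/Clang
predefined __BYTE_ORDER__ macros; the helper names fam_byte,
expected_first_byte and expected_last_byte are invented for illustration.
It reads individual bytes of the same two-int initializer that with_fam_3_v
uses after this change.

```c
#include <assert.h>
#include <string.h>

/* Illustration only: return byte INDEX of the object representation of
   { 0x1f2f3f4f, 0x5f6f7f8f }, assuming int is 4 bytes wide.  */
unsigned char
fam_byte (unsigned index)
{
  const unsigned int b[2] = { 0x1f2f3f4fu, 0x5f6f7f8fu };
  unsigned char a[sizeof b];

  assert (index < sizeof b);
  memcpy (a, b, sizeof b);
  return a[index];
}

/* Byte 0 is the low byte of b[0] on little-endian and the high byte on
   big-endian.  */
unsigned char
expected_first_byte (void)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return 0x4f;
#else
  return 0x1f;
#endif
}

/* Byte 7 is the high byte of b[1] on little-endian and the low byte on
   big-endian.  */
unsigned char
expected_last_byte (void)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return 0x5f;
#else
  return 0x8f;
#endif
}
```

With the old initializer 0x5f6f7f7f, bytes 5 and 6 of the array would both be
0x7f on either endianness, which is the duplication the updated value avoids.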

Tested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
-
PR testsuite/116148

gcc/testsuite/ChangeLog:

* c-c++-common/fam-in-union-alone-in-struct-2.c: Define macros
WITH_FAM_2_V_B[03] and WITH_FAM_3_V_A[07] as endianness, update the
checking with these macros and initialize with_fam_3_v.b[1] with
0x5f6f7f8f instead of 0x5f6f7f7f.
---
 .../fam-in-union-alone-in-struct-2.c  | 22 ++-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
index 93f9d5128f6..7845a7fbab3 100644
--- a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
+++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
@@ -16,7 +16,7 @@ union with_fam_2 {
 union with_fam_3 {
   char a[];
   int b[];
-} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f7f}};
+} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f8f}};

 struct only_fam {
   int b[];
@@ -28,16 +28,28 @@ struct only_fam_2 {
   int b[];
 } only_fam_2_v = {{7, 11}};

+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define WITH_FAM_2_V_B0 0x4f
+#define WITH_FAM_2_V_B3 0x1f
+#define WITH_FAM_3_V_A0 0x4f
+#define WITH_FAM_3_V_A7 0x5f
+#else
+#define WITH_FAM_2_V_B0 0x1f
+#define WITH_FAM_2_V_B3 0x4f
+#define WITH_FAM_3_V_A0 0x1f
+#define WITH_FAM_3_V_A7 0x8f
+#endif
+
 int main ()
 {
   if (with_fam_1_v.b[3] != 4
   || with_fam_1_v.b[0] != 1)
 __builtin_abort ();
-  if (with_fam_2_v.b[3] != 0x1f
-  || with_fam_2_v.b[0] != 0x4f)
+  if (with_fam_2_v.b[3] != WITH_FAM_2_V_B3
+  || with_fam_2_v.b[0] != WITH_FAM_2_V_B0)
 __builtin_abort ();
-  if (with_fam_3_v.a[0] != 0x4f
-  || with_fam_3_v.a[7] != 0x5f)
+  if (with_fam_3_v.a[0] != WITH_FAM_3_V_A0
+  || with_fam_3_v.a[7] != WITH_FAM_3_V_A7)
 __builtin_abort ();
   if (only_fam_v.b[0] != 7
   || only_fam_v.b[1] != 11)
--
2.45.2

Re: [PATCH 1/3][v2] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener

On Tue, 30 Jul 2024, Paul Koning wrote:

> 
> 
> > On Jul 30, 2024, at 6:17 AM, Richard Biener  wrote:
> > 
> > The following adds a target hook to specify whether regs of MODE can be
> > used to transfer bits.  The hook is supposed to be used for value-numbering
> > to decide whether a value loaded in such mode can be punned to another
> > mode instead of re-loading the value in the other mode and for SRA to
> > decide whether MODE is suitable as container holding a value to be
> > used in different modes.
> > 
> > ...
> > 
> > +@deftypefn {Target Hook} bool TARGET_MODE_CAN_TRANSFER_BITS (machine_mode @var{mode})
> > +Define this to return false if the mode @var{mode} cannot be used
> > +for memory copying.  The default is to assume modes with the same
> > +precision as size are fine to be used.
> > +@end deftypefn
> > +
> 
> I'm a bit confused about the meaning of this hook; the summary at the 
> top speaks of type punning while the documentation talks about memory 
> copying.  Those seem rather different.

I agree the wording needs improvement.  I'll try to update it based
on my answer below.

> I'm also wondering about this being tied to a mode rather than a 
> register class.  To give an example: on the PDP11 there are two main 
> register classes, "general" and "float".  General registers handle any 
> bit pattern and support arithmetic operations on integer modes; float 
> registers do not transparently transfer every bit pattern and support 
> float modes.  So only general registers are suitable for memory copies 
> (though on a PDP-11 you don't need registers to do memory copy).  And 
> for type punning, you could load an SF mode value into general registers 
> (a pair) and type-pun them to SImode without reloading.
> 
> So what does that mean for this hook on that target?

If non-float modes are ever allocated to float registers then that would
mean the hook should exempt integer modes.

You are right that this is fundamentally about register classes (or
even instructions).  But the middle-end up to RA has no way to
ensure a register is only placed in a single register class so the
hook asks the target to promise that a register of the specified
mode will always be allocated to a register class suitable for
transferring bits and that load and store instructions used will
transfer bits from memory unaltered (in case there would for example
be a FP load performing normalization and another load into FP
registers that does not).

It might be possible to re-implement the hook in terms of register
classes and then use the mode's register classes, but this doesn't
capture the x87 situation very well, where XFmode is fine but
DFmode is not even though they use the same register file (it's
because of the load instruction).

So I guess for PDP11 the hook wants to exclude all FP modes.

You should possibly see correctness issues with code like

union U { long x; double y; };
void foo(union U *u, union U *u2)
{
  double y = u->y;
  long x = u->x;
  u2->y = y;
  u->x = x;
}

where we perform a double load and pun that to long, possibly
resulting in a different bit-pattern than if loaded from the x
member.
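
As a point of comparison (an illustrative sketch, not something from the
patch), a transfer that goes through an integer container is bit-exact by
construction, which is the property the hook asks a mode to guarantee.  The
helper names object_bits, store_bits and copy_via_integer are invented here,
and the code assumes a 64-bit double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the object representation of *P without performing a
   floating-point load.  Assumes double is 64 bits wide.  */
uint64_t
object_bits (const double *p)
{
  uint64_t bits;

  assert (sizeof (double) == sizeof bits);
  memcpy (&bits, p, sizeof bits);
  return bits;
}

/* Write BITS as the object representation of *P.  */
void
store_bits (double *p, uint64_t bits)
{
  assert (sizeof (double) == sizeof bits);
  memcpy (p, &bits, sizeof bits);
}

/* Copy *SRC to *DST through an integer temporary, so every bit
   pattern, including signalling NaNs, is preserved.  */
void
copy_via_integer (double *dst, const double *src)
{
  uint64_t tmp = object_bits (src);

  store_bits (dst, tmp);
}
```

Whether a plain "double y = u->y" copy preserves the same pattern depends on
the load instruction the target uses, which is exactly what the hook is
meant to answer.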

Richard.

Re: arm: Prevent ICE when doloop dec_set is not PLUS_EXPR

2024-07-31 Thread Andre Vieira (lists)


Hi Christophe,

Thanks for the comments, attached new version for testcase, see below 
new cover letter:


This patch refactors and fixes an issue where arm_mve_dlstp_check_dec_counter
was making an assumption about the form of a candidate dec_insn.
This dec_insn is the instruction that decreases the loop counter inside a
decrementing loop and we expect it to have the following form:
(set (reg CONDCOUNT)
 (plus (reg CONDCOUNT)
   (const_int)))

Where CONDCOUNT is the loop counter, and const_int is the negative constant
used to decrement it.

This patch also improves our search for a valid dec_insn.  Before this patch
we'd only look for a dec_insn inside the loop header if the loop latch was
empty.  We now also search the loop header if the loop latch is not empty but
the last instruction is not a valid dec_insn.  This could potentially be
improved to search all instructions inside the loop latch.
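
The shape check described above can be sketched outside of GCC with a toy
expression type.  Everything below (toy_rtx, toy_set, toy_check_dec_insn) is
hypothetical and only mirrors the order of the tests; the real helper works
on GCC's rtx via single_set, GET_CODE, REG_P and friends, as in the diff
further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; not GCC's rtx_code.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_MINUS };

struct toy_rtx
{
  enum toy_code code;
  int regno;                    /* Valid for TOY_REG.  */
  long value;                   /* Valid for TOY_CONST_INT.  */
  const struct toy_rtx *op0;    /* Operands of TOY_PLUS / TOY_MINUS.  */
  const struct toy_rtx *op1;
};

struct toy_set
{
  const struct toy_rtx *dest;
  const struct toy_rtx *src;
};

/* Return true if SET has the form
     (set (reg a) (plus (reg a) (const_int)))
   where (reg a) is register CONDCOUNT_REGNO.  The code of the source
   is tested before its operands are inspected.  */
bool
toy_check_dec_insn (const struct toy_set *set, int condcount_regno)
{
  assert (condcount_regno >= 0);
  if (!set || !set->dest || !set->src)
    return false;
  if (set->dest->code != TOY_REG)
    return false;
  if (set->src->code != TOY_PLUS)
    return false;

  const struct toy_rtx *op0 = set->src->op0;
  const struct toy_rtx *op1 = set->src->op1;
  if (!op0 || !op1
      || op0->code != TOY_REG
      || op1->code != TOY_CONST_INT)
    return false;

  return set->dest->regno == op0->regno
         && set->dest->regno == condcount_regno;
}
```

Testing the code of the source first is the point of the fix: without it, a
non-PLUS source would have its operands read as if it were a PLUS.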

gcc/ChangeLog:

* config/arm/arm.cc (check_dec_insn): New helper function containing
code hoisted from...
(arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
check the validity of the candidate dec_insn.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-loop-form.c: New test.


On 30/07/2024 21:31, Christophe Lyon wrote:
I manually tried to exercise the testcase with a cross-compiler, and 
found the same issue as the Linaro CI should have reported (there was a 
temporary breakage).


You can find detailed logs from Linaro in gcc.log.1.xz under 
https://ci.linaro.org/job/tcwg_gcc_check--master-arm-precommit/8357/artifact/artifacts/artifacts.precommit/00-sumfiles/


Basically the testcase fails to compile with loads of
dlstp-loop-form.c:6:9: warning: 'pure' attribute on function returning 
'void' [-Wattributes]

then

dlstp-loop-form.c:7:37: error: unknown type name 'float16x8_t'; did you 
mean 'int16x8_t'?

dlstp-loop-form.c: In function 'n':
dlstp-loop-form.c:18:8: error: subscripted value is neither array nor 
pointer nor vector
dlstp-loop-form.c:21:13: error: passing 'e' {aka 'int'} to argument 2 of 
'vfmsq_m', which expects an MVE vector type


Why would the test pass for you?


Because I tested with a toolchain configured for cortex-m85, which has 
mve.fp enabled by default, which means I didn't realize the testcase 
required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.


Addressed that now.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 92cd168e65937ef7350477464e8b0becf85bceed..e3c3db5c816bfaedf3afb775a0436d4b7c984b51 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35214,6 +35214,30 @@ arm_mve_dlstp_check_inc_counter (loop *loop, rtx_insn* vctp_insn,
   return vctp_insn;
 }
 
+/* Helper function to 'arm_mve_dlstp_check_dec_counter' to make sure DEC_INSN
+   is of the expected form:
+   (set (reg a) (plus (reg a) (const_int)))
+   where (reg a) is the same as CONDCOUNT.  */
+
+static bool
+check_dec_insn (rtx_insn *dec_insn, rtx condcount)
+{
+  if (!NONDEBUG_INSN_P (dec_insn))
+return false;
+  rtx dec_set = single_set (dec_insn);
+  if (!dec_set
+  || !REG_P (SET_DEST (dec_set))
+  || GET_CODE (SET_SRC (dec_set)) != PLUS
+  || !REG_P (XEXP (SET_SRC (dec_set), 0))
+  || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
+  || REGNO (SET_DEST (dec_set))
+ != REGNO (XEXP (SET_SRC (dec_set), 0))
+  || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
+return false;
+
+  return true;
+}
+
 /* Helper function to `arm_mve_loop_valid_for_dlstp`.  In the case of a
counter that is decrementing, ensure that it is decrementing by the
right amount in each iteration and that the target condition is what
@@ -35232,30 +35256,19 @@ arm_mve_dlstp_check_dec_counter (loop *loop, rtx_insn* vctp_insn,
  modified.  */
   rtx_insn *dec_insn = BB_END (loop->latch);
   /* If not in the loop latch, try to find the decrement in the loop header.  */
-  if (!NONDEBUG_INSN_P (dec_insn))
+  if (!check_dec_insn (dec_insn, condcount))
   {
 df_ref temp = df_bb_regno_only_def_find (loop->header, REGNO (condcount));
 /* If we haven't been able to find the decrement, bail out.  */
 if (!temp)
   return NULL;
 dec_insn = DF_REF_INSN (temp);
-  }
 
-  rtx dec_set = single_set (dec_insn);
-
-  /* Next, ensure that it is a PLUS of the form:
- (set (reg a) (plus (reg a) (const_int)))
- where (reg a) is the same as condcount.  */
-  if (!dec_set
-  || !REG_P (SET_DEST (dec_set))
-  || !REG_P (XEXP (SET_SRC (dec_set), 0))
-  || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
-  || REGNO (SET_DEST (dec_set))
- != REGNO (XEXP (SET_SRC (dec_set), 0))
-  || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
-return NULL;
+if (!check_dec_insn (dec_insn, condcount))
+  return NULL;
+  }
 
-  decrementnum = INTVAL (XEXP (SET_SRC (dec_set), 1));
+  decrementnum = INTVAL (XEXP (SET_SRC (single_set (dec_insn)), 1));
 
   /* This decrementnum is th

Re: [PATCH v4 2/3] aarch64: Add support for moving fpm system register

2024-07-31 Thread Claudio Bantaloukas



On 31/07/2024 08:57, Kyrylo Tkachov wrote:
> Hi Claudio,
> 
>> On 31 Jul 2024, at 08:29, Claudio Bantaloukas  
>> wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Unlike most system registers, fpmr can be heavily written to in code that
>> exercises the fp8 functionality. That is because every fp8 intrinsic call
>> can potentially change the value of fpmr.
>> Rather than just use an unspec, we treat the fpmr system register like
>> all other registers and use a move operation to read and write to it.
>>
>> We introduce a new class of moveable system registers that, currently,
>> only accepts fpmr and a new constraint, Umv, that allows us to
>> selectively use mrs and msr instructions when expanding rtl for them.
>> Given that there is code that depends on "real" registers coming before
>> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
>> existing value and renumber registers below that.
>> This requires us to update the bitmaps that describe which registers
>> belong to each register class.
> 
> So I like the approach though I’ll let Richard review the implementation 
> details here.
> My only slight concern here is compatibility with LLVM. I notice that LLVM 
> doesn’t accept the test case you’ve included as it doesn’t understand “fpmr” 
> in its inline assembly. It also doesn’t support the new constraint, of course.
> Do you know if there are plans to teach LLVM these inline assembly constructs 
> to avoid creating GCC-only sources for fp8?

Hi Kyrill,
I asked and got confirmation that a patch to add fpmr as a register 
should land soon.

Cheers,
Claudio

> Thanks,
> Kyrill
> 
> 
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
>> support for MOVEABLE_SYSREGS class.
>> (aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
>> (aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
>> (aarch64_class_max_nregs): Likewise.
>> * config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
>> (CALL_REALLY_USED_REGISTERS): Likewise.
>> (REGISTER_NAMES): Likewise.
>> (enum reg_class): Add MOVEABLE_SYSREGS class.
>> (REG_CLASS_NAMES): Likewise.
>> (REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
>> the new MOVEABLE_REGS class and renumbering of registers.
>> * config/aarch64/aarch64.md: (FPM_REGNUM): added new register
>> number, reusing old value.
>> (FFR_REGNUM): Renumber.
>> (FFRT_REGNUM): Likewise.
>> (LOWERING_REGNUM): Likewise.
>> (TPIDR2_BLOCK_REGNUM): Likewise.
>> (SME_STATE_REGNUM): Likewise.
>> (TPIDR2_SETUP_REGNUM): Likewise.
>> (ZA_FREE_REGNUM): Likewise.
>> (ZA_SAVED_REGNUM): Likewise.
>> (ZA_REGNUM): Likewise.
>> (ZT0_REGNUM): Likewise.
>> (*mov_aarch64): Add support for moveable sysregs.
>> (*movsi_aarch64): Likewise.
>> (*movdi_aarch64): Likewise.
>> * config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/acle/fp8.c: New tests.
>> ---
>> gcc/config/aarch64/aarch64.cc   |   8 ++
>> gcc/config/aarch64/aarch64.h|  14 ++-
>> gcc/config/aarch64/aarch64.md   |  30 --
>> gcc/config/aarch64/constraints.md   |   3 +
>> gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 101 
>> 5 files changed, 142 insertions(+), 14 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index e0cf382998c..9810f2c0390 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -2018,6 +2018,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode 
>> mode)
>>  case PR_HI_REGS:
>>return mode == VNx32BImode ? 2 : 1;
>>
>> +case MOVEABLE_SYSREGS:
>>  case FFR_REGS:
>>  case PR_AND_FFR_REGS:
>>  case FAKE_REGS:
>> @@ -2045,6 +2046,9 @@ aarch64_hard_regno_mode_ok (unsigned regno, 
>> machine_mode mode)
>>  /* This must have the same size as _Unwind_Word.  */
>>  return mode == DImode;
>>
>> +  if (regno == FPM_REGNUM)
>> +return mode == QImode || mode == HImode || mode == SImode || mode == DImode;
>> +
>>unsigned int vec_flags = aarch64_classify_vector_mode (mode);
>>if (vec_flags == VEC_SVE_PRED)
>>  return pr_or_ffr_regnum_p (regno);
>> @@ -12680,6 +12684,9 @@ aarch64_regno_regclass (unsigned regno)
>>if (PR_REGNUM_P (regno))
>>  return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
>>
>> +  if (regno == FPM_REGNUM)
>> +return MOVEABLE_SYSREGS;
>> +
>>if (regno == FFR_REGNUM || regno == FFRT_REGNUM)
>>  return FFR_REGS;
>>
>> @@ -13068,6 +13075,7 @@ aarch64_class_max_nregs (reg_class_t regclass, 
>> machine_mode mode)
>>  case PR_HI_REGS:
>>return mode == VNx32BImode ? 2 : 1;
>>
>> +case MOVEABL

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener

On Wed, 31 Jul 2024, Uros Bizjak wrote:

> On Wed, Jul 31, 2024 at 10:48 AM Richard Biener  wrote:
> >
> > On Wed, 31 Jul 2024, Uros Bizjak wrote:
> >
> > > On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  wrote:
> > > >
> > > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > > > > OK. Richard, can you please mention the above in the comment why
> > > > > XFmode is rejected in the hook?
> > > > >
> > > > > Later, we can perhaps benchmark XFmode move vs. generic memory copy to
> > > > > get some hard data.
> > > >
> > > > > My (limited) understanding was that the hook would be used only for
> > > > > cases where we'd like to e.g. value number some SF/DF/XF etc. mode
> > > > > loads and some subsequent loads from the same address with different
> > > > > mode but same size the same and replace say int or long long later
> > > > > load with VIEW_CONVERT_EXPR of the result of the SF/SF mode load.
> > > > > That is what was incorrect, because the load didn't preserve all the
> > > > > bits.  The patch would still keep doing normal SF/DF/XF etc. mode
> > > > > copies if that is all that happens in the program, load some floating
> > > > > point value and store it elsewhere or as part of larger aggregate
> > > > > copy.
> > >
> > > So, the hook should allow everything besides SF/DFmode, simply:
> > >
> > >
> > > switch (GET_MODE_INNER (mode))
> > >   {
> > >   case SFmode:
> > >   case DFmode:
> > > > /* These suffer from normalization upon load when not using SSE.  */
> > > return !(ix86_fpmath & FPMATH_387);
> > >   default:
> > > return true;
> > >   }
> >
> > OK, I think I'll go with this then.  I'm now unsure whether the
> > wrapper around the hook should reject modes with padding or if
> > the supposed users (value-numbering and SRA) should deal with that
> > issue separately.  I do wonder whether
> >
> > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
> >   ? &ieee_extended_intel_128_format
> >   : TARGET_96_ROUND_53_LONG_DOUBLE
> >   ? &ieee_extended_intel_96_round_53_format
> >   : &ieee_extended_intel_96_format));
> > ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> >
> > unambiguously specifies where the padding is - m68k has
> >
> > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
> >
> > It's also not clear we can model a x87 10 byte memory copy in RTL since
> > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> > possible padding as unspecified and not "masked out" even if
> > the actual fstp will only store 10 bytes.
> 
> The hardware will never touch bytes outside 10 bytes range, the
> padding is some artificial compiler thingy, so IMO it should be
> handled before the hook is called. Please find attached the source I
> have used to confirm that a) the copied bits will never be mangled and
> b) there is no access outside the 10 bytes range. (BTW: these
> particular values are to test the effect of leading bit 63, the
> non-hidden normalized bit).

Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then,
mode_base_align[XFmode] seems to be correctly set to ensure
12 bytes / 16 bytes "effective" size.
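
To make the distinction concrete, here is a minimal sketch in plain C (no
x87 involved) of a copy that moves only the 10 value bytes of a 16-byte
slot and leaves the padding of the destination as it was.  XF_VALUE_BYTES,
XF_SLOT_BYTES and the helpers are made-up names for illustration, not
anything in GCC:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes: 10 value bytes inside a 16-byte storage slot,
   as with TARGET_128BIT_LONG_DOUBLE.  */
enum { XF_VALUE_BYTES = 10, XF_SLOT_BYTES = 16 };

/* Copy only the value bytes; the padding bytes of DST are not written.  */
static void
copy_value_bytes (unsigned char *dst, const unsigned char *src)
{
  memcpy (dst, src, XF_VALUE_BYTES);
}

/* Return 1 if every padding byte of SLOT equals FILL, else 0.  */
static int
padding_is (const unsigned char *slot, unsigned char fill)
{
  for (int i = XF_VALUE_BYTES; i < XF_SLOT_BYTES; i++)
    if (slot[i] != fill)
      return 0;
  return 1;
}

/* Usage: the value bytes match after the copy, the padding does not.  */
static void
demo_value_copy (void)
{
  unsigned char src[XF_SLOT_BYTES], dst[XF_SLOT_BYTES];

  memset (src, 0x11, sizeof src);
  memset (dst, 0xAA, sizeof dst);
  copy_value_bytes (dst, src);

  assert (memcmp (dst, src, XF_VALUE_BYTES) == 0);
  assert (padding_is (dst, 0xAA));
  assert (memcmp (dst, src, XF_SLOT_BYTES) != 0);
}
```

So a consumer that compares or reinterprets all GET_MODE_SIZE bytes would
see stale padding after such a copy, which is the concern with modes that
have padding.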

Richard.

Re: [PATCH v4 2/3] aarch64: Add support for moving fpm system register

2024-07-31 Thread Kyrylo Tkachov



> On 31 Jul 2024, at 11:31, Claudio Bantaloukas wrote:
> 
> 
> On 31/07/2024 08:57, Kyrylo Tkachov wrote:
>> Hi Claudio,
>> 
>>> On 31 Jul 2024, at 08:29, Claudio Bantaloukas wrote:
>>> 
>>> 
>>> Unlike most system registers, fpmr can be heavily written to in code that
>>> exercises the fp8 functionality. That is because every fp8 intrinsic call
>>> can potentially change the value of fpmr.
>>> Rather than just use an unspec, we treat the fpmr system register like
>>> all other registers and use a move operation to read and write to it.
>>> 
>>> We introduce a new class of moveable system registers that, currently,
>>> only accepts fpmr and a new constraint, Umv, that allows us to
>>> selectively use mrs and msr instructions when expanding rtl for them.
>>> Given that there is code that depends on "real" registers coming before
>>> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
>>> existing value and renumber registers below that.
>>> This requires us to update the bitmaps that describe which registers
>>> belong to each register class.
>> 
>> So I like the approach though I’ll let Richard review the implementation 
>> details here.
>> My only slight concern here is compatibility with LLVM. I notice that LLVM 
>> doesn’t accept the test case you’ve included as it doesn’t understand “fpmr” 
>> in its inline assembly. It also doesn’t support the new constraint, of 
>> course.
>> Do you know if there are plans to teach LLVM these inline assembly 
>> constructs to avoid creating GCC-only sources for fp8?
> 
> Hi Kyrill,
> I asked and got confirmation that a patch to add fpmr as a register
> should land soon.

Great, thanks for confirming.
Kyrill

> 
> Cheers,
> Claudio
> 
>> Thanks,
>> Kyrill
>> 
>> 
>>> 
>>> gcc/ChangeLog:
>>> 
>>>* config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
>>>support for MOVEABLE_SYSREGS class.
>>>(aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
>>>(aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
>>>(aarch64_class_max_nregs): Likewise.
>>>* config/aarch64/aarch64.h (FIXED_REGISTERS): Add fpmr.
>>>(CALL_REALLY_USED_REGISTERS): Likewise.
>>>(REGISTER_NAMES): Likewise.
>>>(enum reg_class): Add MOVEABLE_SYSREGS class.
>>>(REG_CLASS_NAMES): Likewise.
>>>(REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
>>>the new MOVEABLE_SYSREGS class and renumbering of registers.
>>>* config/aarch64/aarch64.md (FPM_REGNUM): Add new register
>>>number, reusing old value.
>>>(FFR_REGNUM): Renumber.
>>>(FFRT_REGNUM): Likewise.
>>>(LOWERING_REGNUM): Likewise.
>>>(TPIDR2_BLOCK_REGNUM): Likewise.
>>>(SME_STATE_REGNUM): Likewise.
>>>(TPIDR2_SETUP_REGNUM): Likewise.
>>>(ZA_FREE_REGNUM): Likewise.
>>>(ZA_SAVED_REGNUM): Likewise.
>>>(ZA_REGNUM): Likewise.
>>>(ZT0_REGNUM): Likewise.
>>>(*mov_aarch64): Add support for moveable sysregs.
>>>(*movsi_aarch64): Likewise.
>>>(*movdi_aarch64): Likewise.
>>>* config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>* gcc.target/aarch64/acle/fp8.c: New tests.
>>> ---
>>> gcc/config/aarch64/aarch64.cc   |   8 ++
>>> gcc/config/aarch64/aarch64.h|  14 ++-
>>> gcc/config/aarch64/aarch64.md   |  30 --
>>> gcc/config/aarch64/constraints.md   |   3 +
>>> gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 101 
>>> 5 files changed, 142 insertions(+), 14 deletions(-)
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>>> index e0cf382998c..9810f2c0390 100644
>>> --- a/gcc/config/aarch64/aarch64.cc
>>> +++ b/gcc/config/aarch64/aarch64.cc
>>> @@ -2018,6 +2018,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
>>> case PR_HI_REGS:
>>>   return mode == VNx32BImode ? 2 : 1;
>>> 
>>> +case MOVEABLE_SYSREGS:
>>> case FFR_REGS:
>>> case PR_AND_FFR_REGS:
>>> case FAKE_REGS:
>>> @@ -2045,6 +2046,9 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
>>> /* This must have the same size as _Unwind_Word.  */
>>> return mode == DImode;
>>> 
>>> +  if (regno == FPM_REGNUM)
>>> +return mode == QImode || mode == HImode || mode == SImode || mode == DImode;
>>> +
>>>   unsigned int vec_flags = aarch64_classify_vector_mode (mode);
>>>   if (vec_flags == VEC_SVE_PRED)
>>> return pr_or_ffr_regnum_p (regno);
>>> @@ -12680,6 +12684,9 @@ aarch64_regno_regclass (unsigned regno)
>>>   if (PR_REGNUM_P (regno))
>>> return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
>>> 
>>> +  if (regno == FPM_REGNUM)
>>> +return MOVEABLE_SYSREGS;
>>> +
>>>   i

[PATCH] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl

32bit x86 CPUs won't natively support the FFS operation on a 64 bit
type.  Therefore, the switch-exp-transform-3.c test will always fail
with a 32bit target.  I'm fixing my mistake.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-3.c: Remove code testing
  that the exponential index transform is able to handle long
  long int.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-3.c  | 51 +--
 1 file changed, 2 insertions(+), 49 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
index c8fae70692e..cd00071d0bc 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
@@ -4,8 +4,7 @@
 /* Checks that the exponential index transformation is done for all these types
of the index variable:
- (unsigned) int
-   - (unsigned) long
-   - (unsigned) long long  */
+   - (unsigned) long  */
 
 int unopt_int(int bit_position)
 {
@@ -99,50 +98,4 @@ int unopt_unsigned_long(unsigned long bit_position)
 }
 }
 
-int unopt_long_long(long long bit_position)
-{
-switch (bit_position)
-{
-case (1 << 0):
-return 0;
-case (1 << 1):
-return 1;
-case (1 << 2):
-return 2;
-case (1 << 3):
-return 3;
-case (1 << 4):
-return 4;
-case (1 << 5):
-return 5;
-case (1 << 6):
-return 6;
-default:
-return 0;
-}
-}
-
-int unopt_unsigned_long_long(unsigned long long bit_position)
-{
-switch (bit_position)
-{
-case (1 << 0):
-return 0;
-case (1 << 1):
-return 1;
-case (1 << 2):
-return 2;
-case (1 << 3):
-return 3;
-case (1 << 4):
-return 4;
-case (1 << 5):
-return 5;
-case (1 << 6):
-return 6;
-default:
-return 0;
-}
-}
-
-/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" } } */
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 "switchconv" } } */
-- 
2.45.2

Re: [PATCH] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Jakub Jelinek

On Wed, Jul 31, 2024 at 12:02:08PM +0200, Filip Kastl wrote:
> 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> type.  Therefore, the switch-exp-transform-3.c test will always fail
> with a 32bit target.  I'm fixing my mistake.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/switch-exp-transform-3.c: Remove code testing
> that the exponential index transform is able to handle long
> long int.

But for -m64 it does and it is good to test even that.
Can't you wrap the long long stuff with
#ifdef __x86_64__
and
do
/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 "switchconv" { target ia32 } } } */
/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" { target { ! ia32 } } } } */
or so?

Jakub

Re: [PATCH] LoongArch: Rework bswap{hi,si,di}2 definition

2024-07-31 Thread Xi Ruoyao

On Wed, 2024-07-31 at 16:57 +0800, Lulu Cheng wrote:
> 
> 在 2024/7/29 下午3:58, Xi Ruoyao 写道:
> > Per a gcc-help thread we are generating sub-optimal code for
> > __builtin_bswap{32,64}.  To fix it:
> > 
> > - Use a single revb.d instruction for bswapdi2.
> > - Use a single revb.2w instruction for bswapsi2 for TARGET_64BIT,
> >     revb.2h + rotri.w for !TARGET_64BIT.
> > - Use a single revb.2h instruction for bswapsi2 (x) r>> 16, and a single
> >     revb.2w instruction for bswapdi2 (x) r>> 32.
> > 
> > Unfortunately I cannot figure out a way to make the compiler generate
> > revb.4h or revh.{2w,d} instructions.
> 
> This optimization is really ingenious and I have no problem.
> 
> I also haven't figured out how to generate revb.4h or revh. {2w,d}.
> I think we can merge this patch first.

Pushed r15-2433.

FWIW I tried a naive pattern for revh.2w:

(set (match_operand:DI 0 "register_operand" "=r")
     (ior:DI
       (and:DI
         (ashift:DI (match_operand:DI 1 "register_operand" "r")
                    (const_int 16))
         (const_int 18446462603027742720))
       (and:DI
         (lshiftrt:DI (match_dup 1)
                      (const_int 16))
         (const_int 281470681808895))))

But it seems too complex to be recognized.
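
For reference, the operation that pattern tries to describe can be written
in plain C as below.  This is only a sketch of the intended semantics (swap
the two 16-bit halves inside each 32-bit word), with a made-up function
name; it says nothing about what the compiler actually recognizes:

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two 16-bit halfwords inside each 32-bit word of X, using the
   same shift-and-mask shape as the RTL: the masks are the two const_int
   values written in hexadecimal.  */
static uint64_t
swap_halfwords_in_words (uint64_t x)
{
  const uint64_t hi_mask = UINT64_C (0xffff0000ffff0000); /* 18446462603027742720 */
  const uint64_t lo_mask = UINT64_C (0x0000ffff0000ffff); /* 281470681808895 */

  return ((x << 16) & hi_mask) | ((x >> 16) & lo_mask);
}

/* Usage: 0x11223344 becomes 0x33441122 and 0x55667788 becomes 0x77885566.  */
static void
demo_swap_halfwords (void)
{
  assert (swap_halfwords_in_words (UINT64_C (0x1122334455667788))
	  == UINT64_C (0x3344112277885566));
}
```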

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak

On Wed, Jul 31, 2024 at 11:33 AM Richard Biener  wrote:
>
> On Wed, 31 Jul 2024, Uros Bizjak wrote:
>
> > On Wed, Jul 31, 2024 at 10:48 AM Richard Biener  wrote:
> > >
> > > On Wed, 31 Jul 2024, Uros Bizjak wrote:
> > >
> > > > On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  wrote:
> > > > >
> > > > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > > > > > OK. Richard, can you please mention the above in the comment why
> > > > > > XFmode is rejected in the hook?
> > > > > >
> > > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory
> > > > > > > copy to get some hard data.
> > > > >
> > > > > > My (limited) understanding was that the hook would be used only
> > > > > > for cases where we'd like to e.g. value number some SF/DF/XF etc.
> > > > > > mode loads and some subsequent loads from the same address with a
> > > > > > different mode but the same size, and replace say an int or long
> > > > > > long later load with a VIEW_CONVERT_EXPR of the result of the
> > > > > > SF/DF mode load.  That is what was incorrect, because the load
> > > > > > didn't preserve all the bits.  The patch would still keep doing
> > > > > > normal SF/DF/XF etc. mode copies if that is all that happens in
> > > > > > the program: load some floating point value and store it
> > > > > > elsewhere or as part of a larger aggregate copy.
> > > >
> > > > So, the hook should allow everything besides SF/DFmode, simply:
> > > >
> > > >
> > > > > switch (GET_MODE_INNER (mode))
> > > > >   {
> > > > >   case SFmode:
> > > > >   case DFmode:
> > > > >     /* These suffer from normalization upon load when not using SSE.  */
> > > > >     return !(ix86_fpmath & FPMATH_387);
> > > > >   default:
> > > > >     return true;
> > > > >   }
> > >
> > > OK, I think I'll go with this then.  I'm now unsure whether the
> > > wrapper around the hook should reject modes with padding or if
> > > the supposed users (value-numbering and SRA) should deal with that
> > > issue separately.  I do wonder whether
> > >
> > > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
> > >   ? &ieee_extended_intel_128_format
> > >   : TARGET_96_ROUND_53_LONG_DOUBLE
> > >   ? &ieee_extended_intel_96_round_53_format
> > >   : &ieee_extended_intel_96_format));
> > > ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > >
> > > unambiguously specifies where the padding is - m68k has
> > >
> > > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
> > >
> > > It's also not clear we can model a x87 10 byte memory copy in RTL since
> > > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> > > possible padding as unspecified and not "masked out" even if
> > > the actual fstp will only store 10 bytes.
> >
> > The hardware will never touch bytes outside 10 bytes range, the
> > padding is some artificial compiler thingy, so IMO it should be
> > handled before the hook is called. Please find attached the source I
> > have used to confirm that a) the copied bits will never be mangled and
> > b) there is no access outside the 10 bytes range. (BTW: these
> > particular values are to test the effect of leading bit 63, the
> > non-hidden normalized bit).
>
> Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then,
> mode_base_align[XFmode] seems to be correctly set to ensure
> 12 bytes / 16 bytes "effective" size.

Uh, this decision predates my involvement in GCC development by a long shot ;)

Uros.

[RFC/RFA] [PATCH v2 09/12] Add symbolic execution support.

2024-07-31 Thread Mariam Arutunian

Gives an opportunity to execute the code at the bit level,
   assigning symbolic values to the variables that don't have initial values.
   Supports only CRC-specific operations.

   Example:

   uint8_t crc;
   uint8_t pol = 1;
   crc = crc ^ pol;

   during symbolic execution crc's value will be:
   crc(7), crc(6), ... crc(1), crc(0) ^ 1
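
   The idea in that example can be sketched in a few lines of standalone C.
   This is not the code from the patch: struct sym_bit and the sym_* helpers
   are hypothetical names, the bits are assumed to be numbered 0 to 7, and
   only XOR with a known constant is modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { NBITS = 8 };

/* One symbolic bit: bit INDEX of an unknown variable, XORed with the
   constant FLIP (0 or 1).  */
struct sym_bit
{
  int index;
  int flip;
};

/* Make VAL fully symbolic: bit i is simply "bit i of the unknown".  */
static void
sym_init (struct sym_bit *val)
{
  for (int i = 0; i < NBITS; i++)
    {
      val[i].index = i;
      val[i].flip = 0;
    }
}

/* XOR VAL with the known constant C, one bit at a time.  */
static void
sym_xor_const (struct sym_bit *val, uint8_t c)
{
  for (int i = 0; i < NBITS; i++)
    val[i].flip ^= (c >> i) & 1;
}

/* Print VAL from the most significant bit down.  */
static void
sym_print (const struct sym_bit *val, const char *name)
{
  for (int i = NBITS - 1; i >= 0; i--)
    printf ("%s(%d)%s%s", name, val[i].index,
	    val[i].flip ? " ^ 1" : "", i ? ", " : "\n");
}

/* Usage: crc is uninitialized, pol is 1, so only bit 0 gets "^ 1".  */
static void
demo_sym_xor (void)
{
  struct sym_bit crc[NBITS];

  sym_init (crc);
  sym_xor_const (crc, 1);
  assert (crc[0].flip == 1);
  for (int i = 1; i < NBITS; i++)
    assert (crc[i].flip == 0);
  sym_print (crc, "crc");
}
```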

   Author: Matevos Mehrabyan 

 gcc/

   * Makefile.in (OBJS): Add sym-exec/expression.o,
   sym-exec/state.o, sym-exec/condition.o.
   * configure (sym-exec): New subdir.

 gcc/sym-exec/

   * condition.cc: New file.
   * condition.h: New file.
   * expression-is-a-helper.h: New file.
   * expression.cc: New file.
   * expression.h: New file.
   * state.cc: New file.
   * state.h: New file.

   Signed-off-by: Mariam Arutunian 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 0238201981d..1d10120baf3 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1722,6 +1722,9 @@ OBJS = \
 	tree-logical-location.o \
 	tree-loop-distribution.o \
 	gimple-crc-optimization.o \
+	sym-exec/expression.o \
+	sym-exec/state.o \
+	sym-exec/condition.o \
 	tree-nested.o \
 	tree-nrv.o \
 	tree-object-size.o \
diff --git a/gcc/configure b/gcc/configure
index 1335db2d4d2..68e905fb48e 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -36203,7 +36203,7 @@ $as_echo "$as_me: executing $ac_file commands" >&6;}
 "depdir":C) $SHELL $ac_aux_dir/mkinstalldirs $DEPDIR ;;
 "gccdepdir":C)
   ${CONFIG_SHELL-/bin/sh} $ac_aux_dir/mkinstalldirs build/$DEPDIR
-  for lang in $subdirs c-family common analyzer text-art rtl-ssa
+  for lang in $subdirs c-family common analyzer text-art rtl-ssa sym-exec
   do
   ${CONFIG_SHELL-/bin/sh} $ac_aux_dir/mkinstalldirs $lang/$DEPDIR
   done ;;
diff --git a/gcc/sym-exec/condition.cc b/gcc/sym-exec/condition.cc
new file mode 100644
index 000..5b558d1e315
--- /dev/null
+++ b/gcc/sym-exec/condition.cc
@@ -0,0 +1,53 @@
+#include "condition.h"
+
+bit_condition::bit_condition (value_bit *left, value_bit *right, tree_code code)
+{
+  this->m_left = left;
+  this->m_right = right;
+  this->m_code = code;
+  m_type = BIT_CONDITION;
+}
+
+
+bit_condition::bit_condition (const bit_condition &expr)
+{
+  bit_expression::copy (&expr);
+  m_code = expr.get_code ();
+}
+
+
+tree_code
+bit_condition::get_code () const
+{
+  return m_code;
+}
+
+
+value_bit *
+bit_condition::copy () const
+{
+  return new bit_condition (*this);
+}
+
+
+void
+bit_condition::print_expr_sign ()
+{
+  switch (m_code)
+{
+  case GT_EXPR:
+	fprintf (dump_file, " > ");
+	break;
+  case LT_EXPR:
+	fprintf (dump_file, " < ");
+	break;
+  case EQ_EXPR:
+	fprintf (dump_file, " == ");
+	break;
+  case NE_EXPR:
+	fprintf (dump_file, " != ");
+	break;
+  default:
+	fprintf (dump_file, " ? ");
+}
+}
\ No newline at end of file
diff --git a/gcc/sym-exec/condition.h b/gcc/sym-exec/condition.h
new file mode 100644
index 000..1882c6cfa91
--- /dev/null
+++ b/gcc/sym-exec/condition.h
@@ -0,0 +1,26 @@
+#ifndef SYM_EXEC_CONDITION_H
+#define SYM_EXEC_CONDITION_H
+
+#include "expression.h"
+
+enum condition_status {
+  CS_NO_COND,
+  CS_TRUE,
+  CS_FALSE,
+  CS_SYM
+};
+
+
+class bit_condition : public bit_expression {
+ private:
+  tree_code m_code;
+  void print_expr_sign ();
+
+ public:
+  bit_condition (value_bit *left, value_bit *right, tree_code type);
+  bit_condition (const bit_condition &expr);
+  tree_code get_code () const;
+  value_bit *copy () const;
+};
+
+#endif /* SYM_EXEC_CONDITION_H.  */
\ No newline at end of file
diff --git a/gcc/sym-exec/expression-is-a-helper.h b/gcc/sym-exec/expression-is-a-helper.h
new file mode 100644
index 000..9931254c36e
--- /dev/null
+++ b/gcc/sym-exec/expression-is-a-helper.h
@@ -0,0 +1,204 @@
+#ifndef SYM_EXEC_EXPRESSION_IS_A_HELPER_H
+#define SYM_EXEC_EXPRESSION_IS_A_HELPER_H
+
+#include "condition.h"
+
+/* Defining test functions for value conversion via dyn_cast.  */
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::SYMBOLIC_BIT;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::BIT;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  value_type type = ptr->get_type ();
+  return type == value_type::BIT_AND_EXPRESSION
+	 || type == value_type::BIT_OR_EXPRESSION
+	 || type == value_type::BIT_XOR_EXPRESSION
+	 || type == value_type::BIT_COMPLEMENT_EXPRESSION
+	 || type == value_type::SHIFT_RIGHT_EXPRESSION
+	 || type == value_type::SHIFT_LEFT_EXPRESSION
+	 || type == value_type::ADD_EXPRESSION
+	 || type == value_type::SUB_EXPRESSION
+	 || type == value_type::BIT_CONDITION;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::BIT_AND_EXPRESSION;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *

Re: [PATCH] testsuite: Adjust fam-in-union-alone-in-struct-2.c to support BE [PR116148]

2024-07-31 Thread Sam James

"Kewen.Lin"  writes:

> Hi,
>
> As Andrew pointed out in PR116148, fam-in-union-alone-in-struct-2.c
> was designed for little-endian, the recent commit r15-2403 made it
> be tested with running on BE and PR116148 got exposed.
>
> This patch is to adjust the expected data for members in with_fam_2_v
> and with_fam_3_v by considering endianness, also update with_fam_3_v.b[1]
> from 0x5f6f7f7f to 0x5f6f7f8f to avoid two "7f"s.
>
> Tested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?

I can't approve it but LGTM & thanks for doing it. Maybe give Qing at
least the rest of the day to comment given it's theirs.

>
> BR,
> Kewen

thanks,
sam



[PATCH] c++/coroutines: only defer expanding co_{await, return, yield} if dependent [PR112341]

2024-07-31 Thread Arsen Arsenović

Tested on x86_64-pc-linux-gnu.  OK for trunk?

TIA, have a lovely day.
-- >8 --
By doing so, we can get diagnostics in template decls when we know we
can.  For instance, in the following:

  awaitable g();
  template<typename>
  task f()
  {
    co_await g();
    co_yield 1;
    co_return "foo";
  }

... the coroutine promise type in each statement is always
std::coroutine_handle::promise_type, and all of the operands are
not type-dependent, so we can always compute the resulting types (and
expected types) of these expressions and statements.

Also, when we do not know the type of the CO_AWAIT_EXPR or
CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
unknown_type_node.  This is more correct, since the type is not unknown,
it just isn't determined yet.  This also means we can remove the
CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
type_dependent_expression_p.

PR c++/112341 - error: insufficient contextual information to determine type on co_await result in function template

gcc/cp/ChangeLog:

PR c++/112341
* coroutines.cc (struct coroutine_info): Also cache the
traits type.
(ensure_coro_initialized): New function.  Makes sure we have
initialized the coroutine state successfully, or informs the
caller should it fail to do so.  Extracted from
coro_promise_type_found_p.
(coro_get_traits_class): New function.  Gets the (cached)
coroutine traits type for a given coroutine.  Extracted from
coro_promise_type_found_p and refactored to cache the result.
(coro_promise_type_found_p): Use the two functions above.
(build_template_co_await_expr): New function.  Builds a
CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
declaration.
(build_co_await): Use the above if processing_template_decl, and
give it a proper type.
(defer_expansion_p): New function.  Returns true iff its
argument is a type-dependent expression OR the current function's
traits class is type dependent.
(finish_co_await_expr): Defer expansion only in the case
defer_expansion_p returns true.
(finish_co_yield_expr): Ditto.
(finish_co_return_stmt): Ditto.
* pt.cc (type_dependent_expression_p): Do not treat
CO_AWAIT/CO_YIELD specially.

gcc/testsuite/ChangeLog:

PR c++/112341
* g++.dg/coroutines/pr112341-2.C: New test.
* g++.dg/coroutines/pr112341-3.C: New test.
* g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
test.
* g++.dg/coroutines/pr112341.C: New test.
---
 gcc/cp/coroutines.cc  | 142 ++
 gcc/cp/pt.cc  |   5 -
 gcc/testsuite/g++.dg/coroutines/pr112341-2.C  |  25 +++
 gcc/testsuite/g++.dg/coroutines/pr112341-3.C  |  65 
 gcc/testsuite/g++.dg/coroutines/pr112341.C|  21 +++
 .../torture/co-yield-03-tmpl-nondependent.C   | 140 +
 6 files changed, 361 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-2.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-3.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 127a1c06b56e..9494cb499454 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -85,6 +85,7 @@ struct GTY((for_user)) coroutine_info
   tree actor_decl;/* The synthesized actor function.  */
   tree destroy_decl;  /* The synthesized destroy function.  */
   tree promise_type;  /* The cached promise type for this function.  */
+  tree traits_type;   /* The cached traits type for this function.  */
   tree handle_type;   /* The cached coroutine handle for this function.  */
   tree self_h_proxy;  /* A handle instance that is used as the proxy for the
 one that will eventually be allocated in the coroutine
@@ -429,11 +430,12 @@ find_promise_type (tree traits_class)
   return promise_type;
 }
 
+/* Perform initialization of the coroutine processor state, if not done
+   before.  */
+
 static bool
-coro_promise_type_found_p (tree fndecl, location_t loc)
+ensure_coro_initialized (location_t loc)
 {
-  gcc_assert (fndecl != NULL_TREE);
-
   if (!coro_initialized)
 {
   /* Trees we only need to create once.
@@ -466,6 +468,33 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
 
   coro_initialized = true;
 }
+  return true;
+}
+
+/* Try to get the coroutine traits class.  */
+static tree
+coro_get_traits_class (tree fndecl, location_t loc)
+{
+  gcc_assert (fndecl != NULL_TREE);
+
+  if (!ensure_coro_initialized (loc))
+/* We can't continue.  */
+return error_mark_node;
+
+  coroutine_info *coro_info = get_or_insert_coroutine_info (fndecl);
+  auto& traits_type = coro_info->traits_type

[Patch, v3] omp-offload.cc: Fix value-expr handling of 'declare target link' vars [PR115637] (was: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637])

2024-07-31 Thread Tobias Burnus


Hi Richard, hi all,

Richard Biener wrote:

> Looking at pass_omp_target_link::execute I wonder iff find_link_var_op
> shouldn't simply do the substitution?  Aka

This seems to work ...

> --- a/gcc/omp-offload.cc
> +++ b/gcc/omp-offload.cc
> @@ -2893,6 +2893,7 @@ find_link_var_op (tree *tp, int *walk_subtrees, void *)
> && is_global_var (t)
> && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
>   {
> +  *tp = unshare_expr (DECL_VALUE_EXPR (t));
> *walk_subtrees = 0;
> return t;
>   }
>
> which then makes the stmt obviously not gimple?


... except that 'return t' prevents updating other value-expr in the 
same stmt, but that can be fixed.
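
To illustrate the 'return t' point outside of GCC, here is a toy model in
plain C.  Nothing in it is GCC API: struct operand, walk_operands and the
two callbacks are hypothetical stand-ins for a walker that stops as soon as
its callback returns non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operand: NAME is what the statement refers to; VALUE_EXPR, if
   non-NULL, is what it should be rewritten to.  */
struct operand
{
  const char *name;
  const char *value_expr;
};

typedef struct operand *(*walk_fn) (struct operand *, void *);

/* Call FN on each operand; stop at the first non-NULL result.  */
static struct operand *
walk_operands (struct operand *ops, size_t n, walk_fn fn, void *data)
{
  for (size_t i = 0; i < n; i++)
    {
      struct operand *res = fn (&ops[i], data);
      if (res)
	return res;
    }
  return NULL;
}

/* Variant 1: substitute and return the operand, which ends the walk.  */
static struct operand *
replace_and_stop (struct operand *op, void *data)
{
  (void) data;
  if (!op->value_expr)
    return NULL;
  op->name = op->value_expr;
  op->value_expr = NULL;
  return op;
}

/* Variant 2: substitute, record the hit in *DATA and keep walking.  */
static struct operand *
replace_and_continue (struct operand *op, void *data)
{
  if (op->value_expr)
    {
      op->name = op->value_expr;
      op->value_expr = NULL;
      *(int *) data = 1;
    }
  return NULL;
}

/* Count operands that still await substitution.  */
static size_t
pending (const struct operand *ops, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (ops[i].value_expr)
      count++;
  return count;
}

/* Usage: one statement with two operands that need substitution.  */
static void
demo_walk (void)
{
  struct operand a[] = { { "arr", "*arr_linkptr" }, { "i", NULL },
			 { "arr2", "*arr2_linkptr" } };
  struct operand b[] = { { "arr", "*arr_linkptr" }, { "i", NULL },
			 { "arr2", "*arr2_linkptr" } };
  int found = 0;

  walk_operands (a, 3, replace_and_stop, NULL);
  assert (pending (a, 3) == 1);	/* The walk ended before arr2.  */

  walk_operands (b, 3, replace_and_continue, &found);
  assert (found && pending (b, 3) == 0);
}
```

With the second variant the caller learns from the flag that something was
replaced, instead of from the walker's return value.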


Updated patch attached.

Thanks for the suggestion!

Tobias
omp-offload.cc: Fix value-expr handling of 'declare target link' vars

As the PR and included testcase show, replacing 'arr2' by its value expression
'*arr2$13$linkptr' failed for
  MEM  [(c_char * {ref-all})&arr2]
which left 'arr2' in the code as unknown symbol. Now expand the value expression
already in pass_omp_target_link::execute's process_link_var_op walk_gimple_stmt
walk - and don't rely on gimple_regimplify_operands.

PR middle-end/115637

gcc/ChangeLog:

	* gimplify.cc (gimplify_body): Fix macro name in the comment.
	* omp-offload.cc (found_link_var): New global var.
	(find_link_var_op): Rename to ...
	(process_link_var_op): ... this. Replace value expr; set
	found_link_var.
	(pass_omp_target_link::execute): Update walk_gimple_stmt call.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: Uncomment
	now working code.

Co-authored-by: Richard Biener  PR115637
-! if (res /= -11436) stop 5
-if (res /= -11546) stop 5 ! FIXME
+! print *, res
+if (res /= -11436) stop 5
   end
   integer function run_device1()
 !$omp declare target
 integer :: i
 run_device1 = -99
-! FIXME: arr2 not link mapped -> PR115637
-!   arr2 = [11,22,33,44]
+arr2 = [11,22,33,44]
 if (any (arr(10:50) /= [(i, i=10,50)])) then
   run_device1 = arr(11)
   return
 end if
-! FIXME: -> PR115637
-! run_device1 = sum(arr(10:13) + arr2)
-run_device1 = sum(arr(10:13) ) ! FIXME
+run_device1 = sum(arr(10:13) + arr2)
 do i = 10, 50
   arr(i) = 3 - 10 * arr(i)
 end do

[PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl

On Wed 2024-07-31 12:18:34, Jakub Jelinek wrote:
> On Wed, Jul 31, 2024 at 12:02:08PM +0200, Filip Kastl wrote:
> > 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> > type.  Therefore, the switch-exp-transform-3.c test will always fail
> > with a 32bit target.  I'm fixing my mistake.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/switch-exp-transform-3.c: Remove code testing
> >   that the exponential index transform is able to handle long
> >   long int.
> 
> But for -m64 it does and it is good to test even that.
> Can't you wrap the long long stuff with
> #ifdef __x86_64__
> and
> do
> /* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 "switchconv" { target ia32 } } } */
> /* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" { target { ! ia32 } } } } */
> or so?
> 
>   Jakub
> 

Thanks for the feedback!  Here is a second version of the patch.  I've tested
this version with

make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c --target_board='unix{-m32}'"

and

make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"

on a x86_64 machine and in both cases the test didn't produce any errors and
scan-tree-dump-times was successful.

Is this version ok?

Thanks,
Filip Kastl


-- 8< --


testsuite: Adjust switch-exp-transform-3.c for 32bit

32bit x86 CPUs won't natively support the FFS operation on a 64 bit
type.  Therefore, I'm setting the long long int part of the
switch-exp-transform-3.c test to only execute with 64bit targets.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-3.c: Set the long long
  int test to only execute with 64bit targets.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
index c8fae70692e..64a7b146172 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
@@ -99,6 +99,8 @@ int unopt_unsigned_long(unsigned long bit_position)
 }
 }
 
+#ifdef __x86_64__
+
 int unopt_long_long(long long bit_position)
 {
 switch (bit_position)
@@ -145,4 +147,7 @@ int unopt_unsigned_long_long(unsigned long long bit_position)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" } } */
+#endif
+
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 "switchconv" { target ia32 } } } */
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" { target { ! ia32 } } } } */
-- 
2.45.2

Re: [PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Jakub Jelinek

On Wed, Jul 31, 2024 at 01:32:06PM +0200, Filip Kastl wrote:
> Thanks for the feedback!  Here is a second version of the patch.  I've tested
> this version with
> 
> make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c --target_board='unix{-m32}'"
> 
> and
> 
> make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"

You can just use
make check RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} i386.exp=switch-exp-transform-3.c"

> testsuite: Adjust switch-exp-transform-3.c for 32bit
> 
> 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> type.  Therefore, I'm setting the long long int part of the
> switch-exp-transform-3.c test to only execute with 64bit targets.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/switch-exp-transform-3.c: Set the long long
> int test to only execute with 64bit targets.

There should be just tab, not tab + 2 spaces before int test.
Ok with that nit changed.

Jakub

Re: [PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl

On Wed 2024-07-31 13:34:28, Jakub Jelinek wrote:
> On Wed, Jul 31, 2024 at 01:32:06PM +0200, Filip Kastl wrote:
> > Thanks for the feedback!  Here is a second version of the patch.  I've
> > tested this version with
> > 
> > make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c --target_board='unix{-m32}'"
> > 
> > and
> > 
> > make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"
> 
> You can just use
> make check RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} i386.exp=switch-exp-transform-3.c"
> 
> > testsuite: Adjust switch-exp-transform-3.c for 32bit
> > 
> > 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> > type.  Therefore, I'm setting the long long int part of the
> > switch-exp-transform-3.c test to only execute with 64bit targets.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/switch-exp-transform-3.c: Set the long long
> >   int test to only execute with 64bit targets.
> 
> There should be just tab, not tab + 2 spaces before int test.
> Ok with that nit changed.
> 
>   Jakub
> 

Ok, removed the 2 spaces and pushed the patch.

Thanks,
Filip Kastl

Re: [PATCH 1/2] match: Fix types matching for `(?:) !=/== (?:)` [PR116134]

2024-07-31 Thread Richard Biener

On Tue, Jul 30, 2024 at 5:25 PM Andrew Pinski  wrote:
>
> The problem here is that in generic types of comparisons don't need
> to be boolean types (or vector boolean types). And fixes that by making
> sure the types of the conditions match before doing the optimization.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK

> PR middle-end/116134
>
> gcc/ChangeLog:
>
> * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): Check that
> a and b types match.
> (`(a ? x : y) eq/ne (b ? y : x)`): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr116134-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd  | 10 ++
>  gcc/testsuite/gcc.dg/torture/pr116134-1.c |  9 +
>  2 files changed, 15 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116134-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 1c8601229e3..881a827860f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5640,12 +5640,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (for eqne (eq ne)
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> -(cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> - { constant_boolean_node (eqne != NE_EXPR, type); }))
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> + (cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> +  { constant_boolean_node (eqne != NE_EXPR, type); })))
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> -(cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> - { constant_boolean_node (eqne == NE_EXPR, type); }
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> + (cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> +  { constant_boolean_node (eqne == NE_EXPR, type); })
>
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> types are compatible.  */
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116134-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr116134-1.c
> new file mode 100644
> index 000..ab595f99680
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116134-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +
> +/* This used to ICE as comparisons on generic can be different types. */
> +/* PR middle-end/116134  */
> +
> +int a;
> +int b;
> +int d;
> +void c() { 1UL <= (d < b) != (1UL & (0 < a | 0L)); }
> --
> 2.43.0
>

Re: [PATCH 2/2] match: Fix wrong code due to `(a ? e : f) !=/== (b ? e : f)` patterns [PR116120]

2024-07-31 Thread Richard Biener

On Tue, Jul 30, 2024 at 5:26 PM Andrew Pinski  wrote:
>
> When this pattern was converted from being only dealing with 0/-1, we missed 
> that if `e == f` is true
> then the optimization is wrong and needs an extra check for that.
>
> This changes the patterns to be:
> /* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> /* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> /* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> /* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
>
> This still produces better code than the original case and in many cases (x 
> != y) will
> still reduce to either false or true.
>
> With this change we also need to make sure `a`, `b` and the resulting types 
> are all
> the same for the same reason as the previous patch.
>
> I updated (well added) to the testcases to make sure there are the right 
> amount of
> comparisons left.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK
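
The four rewritten patterns quoted above can be checked by brute force over
small inputs.  The following is only a sketch in plain C++ rather than
match.pd syntax, and every function name in it is made up for illustration.
It compares the original comparisons against the rewritten forms from the
patch description, and shows that the pre-patch `(a^b)` form disagrees as
soon as `x == y`.

```cpp
// Original comparisons, written out directly.
static bool same_order_ne(bool a, bool b, int x, int y) {
  return (a ? x : y) != (b ? x : y);
}
static bool swapped_eq(bool a, bool b, int x, int y) {
  return (a ? x : y) == (b ? y : x);
}

// Rewritten forms taken from the patch description.
static bool same_order_ne_fixed(bool a, bool b, int x, int y) {
  return (a != b) && (x != y);  // (a^b & (x != y))
}
static bool swapped_eq_fixed(bool a, bool b, int x, int y) {
  return (a != b) || (x == y);  // (a^b | (x == y))
}

// Pre-patch rewrite of the first form: (a^b) alone, ignoring x and y.
static bool same_order_ne_old(bool a, bool b, int, int) {
  return a != b;
}

// True if both fixed rewrites match the originals on every input tried.
bool fixed_rewrites_agree() {
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      for (int x = 0; x < 3; ++x)
        for (int y = 0; y < 3; ++y) {
          if (same_order_ne(a, b, x, y) != same_order_ne_fixed(a, b, x, y))
            return false;
          if (swapped_eq(a, b, x, y) != swapped_eq_fixed(a, b, x, y))
            return false;
        }
  return true;
}

// True if the pre-patch rewrite matches the original on every input tried.
bool old_rewrite_agrees() {
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      for (int x = 0; x < 3; ++x)
        for (int y = 0; y < 3; ++y)
          if (same_order_ne(a, b, x, y) != same_order_ne_old(a, b, x, y))
            return false;
  return true;
}
```

The failing input for the old form is any case with `a != b` and `x == y`:
both selections then yield the same value, so `!=` is false while `a^b` is
true.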

> PR tree-optimization/116120
>
> gcc/ChangeLog:
>
> * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): Add test for `x != y`
> in result.
> (`(a ? x : y) eq/ne (b ? y : x)`): Add test for `x == y` in result.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/pr50.C: Add extra checks on the test.
> * gcc.dg/tree-ssa/pr50-1.c: Likewise.
> * gcc.dg/tree-ssa/pr50.c: Likewise.
> * g++.dg/torture/pr116120-1.c: New test.
> * g++.dg/torture/pr116120-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd   | 20 -
>  gcc/testsuite/g++.dg/torture/pr116120-1.c  | 32 
>  gcc/testsuite/g++.dg/torture/pr116120-2.c  | 35 ++
>  gcc/testsuite/g++.dg/tree-ssa/pr50.C   | 10 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr50-1.c |  9 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr50.c   |  1 +
>  6 files changed, 99 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-1.c
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 881a827860f..4d3ee578371 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5632,21 +5632,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
>  #endif
>
> -/* (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE */
> -/* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
> -/* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
> -/* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
> +/* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> +/* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> +/* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> +/* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
>  (for cnd (cond vec_cond)
>   (for eqne (eq ne)
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> + && types_match (type, TREE_TYPE (@0)))
> + (cnd (bit_and (bit_xor @0 @3) (ne:type @1 @2))
> +  { constant_boolean_node (eqne == NE_EXPR, type); }
>{ constant_boolean_node (eqne != NE_EXPR, type); })))
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> + && types_match (type, TREE_TYPE (@0)))
> + (cnd (bit_ior (bit_xor @0 @3) (eq:type @1 @2))
> +  { constant_boolean_node (eqne != NE_EXPR, type); }
>{ constant_boolean_node (eqne == NE_EXPR, type); })
>
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> diff --git a/gcc/testsuite/g++.dg/torture/pr116120-1.c 
> b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> new file mode 100644
> index 000..cffb7fbdc5b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> @@ -0,0 +1,32 @@
> +// { dg-run }
> +// PR tree-optimization/116120
> +
> +// The optimization for `(a ? x : y) != (b ? x : y)`
> +// missed that x and y could be the same value.
> +
> +typedef int v4si __attribute((__vector_size__(1 * sizeof(int))));
> +v4si f1(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> +  v4si X = a == b ? e : f;
> +  v4si Y = c == d ? e : f;
> +  return (X != Y); // ~(X == Y ? -1 : 0) (x ^ Y)
> +}
> +
> +int f2(int a, int b, int c, int d, int e, int f) {
> +  int X = a == b ? e : f;
> +  int Y = c == d ? e : f;
> +  return (X != Y) ? -1 : 0; // ~(X == Y ? -1 : 0) (x ^ Y)
> +}
> +
> +int main()
> +{
> +  v4si a = {0};
> +  v4si b = {0}; // a == b, true
> +  v4si c = {2};
> +  v4si d = {3

Re: [PATCH 2/2] match: Fix wrong code due to `(a ? e : f) !=/== (b ? e : f)` patterns [PR116120]

2024-07-31 Thread Sam James

Andrew Pinski  writes:

> When this pattern was converted from being only dealing with 0/-1, we missed 
> that if `e == f` is true
> then the optimization is wrong and needs an extra check for that.
>
> This changes the patterns to be:
> /* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> /* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> /* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> /* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
>
> This still produces better code than the original case and in many cases (x 
> != y) will
> still reduce to either false or true.
>
> With this change we also need to make sure `a`, `b` and the resulting types 
> are all
> the same for the same reason as the previous patch.
>
> I updated (well added) to the testcases to make sure there are the right 
> amount of
> comparisons left.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
>   PR tree-optimization/116120
>
> [...]
> diff --git a/gcc/testsuite/g++.dg/torture/pr116120-1.c 
> b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> new file mode 100644
> index 000..cffb7fbdc5b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> @@ -0,0 +1,32 @@
> +// { dg-run }

dg-do run! Ditto elsewhere.

> [...]

thanks,
sam



Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-31 Thread Jennifer Schmitz

Thanks for the feedback! I updated the patch based on your comments; more
detailed comments are inline below. The updated version was bootstrapped and
tested again, with no regressions.
Best,
Jennifer




> On 25 Jul 2024, at 14:49, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 25 Jul 2024, at 13:58, Richard Sandiford  
>> wrote:
>> 
>> Jennifer Schmitz  writes:
>>> Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the 
>>> reg operands of the compare and if-then-else expressions. As it is not 
>>> legal to have different modes in the operand registers, I only added one 
>>> check for each of the expressions.
>>> The updated patch was bootstrapped and tested again.
>>> Best,
>>> Jennifer
>>> 
>>> From 8da609be99fece8130cf1429bd938b2a26c6672b Mon Sep 17 00:00:00 2001
>>> From: Jennifer Schmitz 
>>> Date: Wed, 24 Jul 2024 06:13:59 -0700
>>> Subject: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2
>>> 
>>> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
>>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>>> implemented so far. This patch implements and tests the two fusion pairs.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> There was also no non-noise impact on SPEC CPU2017 benchmark.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>>> 
>>> gcc/
>>> 
>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>>> fusion logic.
>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>>> (cmp+cset): Likewise.
>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>>> field fusible_ops.
>>> 
>>> gcc/testsuite/
>>> 
>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>> 
>> Thanks for the update.
>> 
>> It looks from a quick scan like the main three instructions associated
>> with single-set integer COMPAREs are CMP, CMN and TST.  TST could be
>> distinguished from CMP and CMN based on get_attr_type (), although it
>> looks like:
>> 
>> (define_insn "*and_compare0"
>> [(set (reg:CC_Z CC_REGNUM)
>>   (compare:CC_Z
>>(match_operand:SHORT 0 "register_operand" "r")
>>(const_int 0)))]
>> ""
>> "tst\\t%0, "
>> [(set_attr "type" "alus_imm")]
>> )
> 
> We can change that independently.
I submitted a small patch to fix that.
> 
>> 
>> should use logics_imm instead of alus_imm.
>> 
>> Alternatively, we could add a new attribute for "compare_type" and use
>> that.  That would make the test in aarch_macro_fusion_pair_p slightly
>> simpler, since it could use get_attr_compare_type without having to
>> look at the pattern of prev_set.  But there's a danger that we'd
>> forget to add the new attribute for new comparison instructions.
>> 
>> I did wonder whether we could simply punt on CC_Zmode, but that's
>> not a reliable test.
>> 
>> But I suppose the counter-argument to my questions above is: how bad
>> would it be if we fused CMN and TST?  They are at least plausible
>> fusions, so it probably doesn't matter if we include them too.
> 
> CMN and TST can be fused with conditional branches, but not with CSEL 
> according to my reading of the SWOG so I guess we’d want to keep them 
> separate in principle. In practice, I can’t imagine the performance 
> difference will be measurable in real workloads if they are kept together.
> Jennifer’s benchmarking of this patch didn’t show any negative performance 
> consequences of the more aggressive fusion, and even a slight improvement.
As suggested, I used get_attr_type to distinguish TST from CMP/CMN.
> 
> 
>> 
>> So:
>> 
>>> ---
>>> gcc/config/aarch64/aarch64-fusion-pairs.def   |  2 ++
>>> gcc/config/aarch64/aarch64.cc | 22 +
>>> gcc/config/aarch64/tuning_models/neoversev2.h |  5 ++-
>>> .../gcc.target/aarch64/fuse_cmp_csel.c| 33 +++
>>> .../gcc.target/aarch64/fuse_cmp_cset.c| 31 +
>>> 5 files changed, 92 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
>>> b/gcc/config/aarch64/aarch64-fusion-pairs.def
>>> index 9a43b0c8065..bf5e85ba8fe 100644
>>> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
>>> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
>>> @@ -37,5 +37,7 @@ AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
>>> AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
>>> AARCH64_FUSION_PAIR ("alu+cbz", ALU_CBZ)
>>> AARCH64_FUSION_PAIR ("addsub_2reg_const1", ADDSUB_2REG_CONST1)
>>> +AARCH64_FUSION_PAIR ("cmp+csel", CMP_CSEL)
>>> +AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
>>> 
>>> #undef AARCH64

Re: [Patch,v3] omp-offload.cc: Fix value-expr handling of 'declare target link' vars [PR115637] (was: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637])

2024-07-31 Thread Richard Biener

On Wed, Jul 31, 2024 at 1:21 PM Tobias Burnus  wrote:
>
> Hi Richard, hi all,
>
> Richard Biener wrote:
>
> Looking at pass_omp_target_link::execute I wonder iff find_link_var_op
> shouldn't simply do the substitution?  Aka
>
> This seems to work ...
>
> --- a/gcc/omp-offload.cc
> +++ b/gcc/omp-offload.cc
> @@ -2893,6 +2893,7 @@ find_link_var_op (tree *tp, int *walk_subtrees, void *)
>&& is_global_var (t)
>&& lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
>  {
> +  *tp = unshare_expr (DECL_VALUE_EXPR (t));
>*walk_subtrees = 0;
>return t;
>  }
>
> which then makes the stmt obviously not gimple?
>
> ... except that 'return t' prevents updating other value-expr in the same 
> stmt, but that can be fixed.
>
> Updated patch attached.

You can pass a

  walk_stmt_info wi;
  wi->data = NULL;

to walk_gimple_stmt and set wi->data instead of using a global
variable (or make wi->data point
to a local variable for some more indirection).

OK as-is or with cleanup as suggested above.

Thanks,
Richard.

> Thanks for the suggestion!
>
> Tobias
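
The suggestion above, carrying state through the walker's info argument
instead of a global variable, can be sketched outside of GCC as follows.
This is a minimal illustration and not the omp-offload.cc code: `WalkInfo`,
`walk_operands`, `replace_negative` and `rewrite_all` are stand-ins invented
here, and negative integers merely play the role of operands that need
substitution.  The callback keeps walking after a match, so every such
operand in the same statement is rewritten.

```cpp
#include <vector>

// Stand-in for a walker info struct; NOT GCC's real walk_stmt_info.
struct WalkInfo {
  void *data = nullptr;  // caller-owned state, visible to the callback
};

using OperandCallback = void (*)(int &operand, WalkInfo *wi);

// Visit every operand; never stops early, so all matches are handled.
static void walk_operands(std::vector<int> &ops, OperandCallback cb,
                          WalkInfo *wi) {
  for (int &op : ops)
    cb(op, wi);
}

// Substitute a "link variable" (here: a negative number) in place and
// record through wi->data that the statement was modified.
static void replace_negative(int &operand, WalkInfo *wi) {
  if (operand < 0) {
    operand = -operand;
    *static_cast<bool *>(wi->data) = true;
  }
}

// Returns true if any operand was rewritten; no global state involved.
bool rewrite_all(std::vector<int> &ops) {
  bool changed = false;
  WalkInfo wi;
  wi.data = &changed;  // point the info at a local variable
  walk_operands(ops, replace_negative, &wi);
  return changed;
}
```

Pointing `data` at a local keeps the "did anything change" flag scoped to one
walk, which is the indirection the message above mentions.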

[PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Jonathan Wakely

I doubt we want the @euro suffix anywhere except Glibc-based targets. We
certainly don't want to append "@euro" on Solaris, where this change
flips some tests from UNSUPPORTED to PASS, e.g.
21_strings/basic_string/numeric_conversions/char/to_string_float.cc
It will probably also cause some to flip from UNSUPPORTED to FAIL, which
we'll need to address.

Let's restrict it to Glibc.

Tested x86_64-linux and sparc-solaris11.4.

-- >8 --

The testsuite automatically appends "@euro" to "xx.ISO8859-15" locale
names on all targets except FreeBSD, DragonflyBSD, and NetBSD. It should
only be for Glibc, not all non-BSD targets.

libstdc++-v3/ChangeLog:

* testsuite/lib/libstdc++.exp (check_v3_target_namedlocale):
Only append "@euro" to ".ISO8859-15" locales for Glibc.
---
 libstdc++-v3/testsuite/lib/libstdc++.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 18331c80bc2..2510c7f4cbb 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -1032,7 +1032,7 @@ proc check_v3_target_namedlocale { args } {
puts $f "strcpy(result, name);"
puts $f "#if defined __FreeBSD__ || defined __DragonFly__ || defined __NetBSD__"
puts $f "/* fall-through */"
-   puts $f "#else"
+   puts $f "#elif defined __GLIBC__"
puts $f "if (strstr(result, \"ISO8859-15\")) {"
puts $f "strcat(result, \"@euro\");"
puts $f "}"
-- 
2.45.2
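
For readers who want the effect of the one-line change spelled out, here is a
rough C++ rendering of the name adjustment that the generated test program
performs.  It is a sketch only: the function name `adjust_locale_name` and
the `glibc_target` flag are assumptions made for illustration (the real check
is the preprocessor test on `__GLIBC__` shown in the diff), chosen so the
logic can be exercised on any host.

```cpp
#include <string>

// Append "@euro" to ISO8859-15 locale names, but only for Glibc targets.
// glibc_target stands in for "#elif defined __GLIBC__" in the real code.
std::string adjust_locale_name(const std::string &name, bool glibc_target) {
  std::string result = name;
  if (glibc_target && result.find("ISO8859-15") != std::string::npos)
    result += "@euro";
  return result;
}
```

With the flag false, which models Solaris or any other non-Glibc target, the
name is passed through unchanged.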

Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Jonathan Wakely

On Wed, 31 Jul 2024 at 13:27, Jonathan Wakely  wrote:
>
> I doubt we want the @euro suffix anywhere except Glibc-based targets. We
> certainly don't want to append "@euro" on Solaris, where this change
> flips some tests from UNSUPPORTED to PASS, e.g.
> 21_strings/basic_string/numeric_conversions/char/to_string_float.cc
> It will probably also cause some to flip from UNSUPPORTED to FAIL, which
> we'll need to address.

Oh, I've just realised that the UNSUPPORTED -> PASS I observed on
Solaris was a build using my patch for PR 57585, which is not pushed
yet. I think without that all uses of dg-require-namedlocale might
fail on Solaris, so this change won't actually change anything ...
yet.

It still seems worth doing now though.

>
> Let's restrict it to Glibc.
>
> Tested x86_64-linux and sparc-solaris11.4.
>
> -- >8 --
>
> The testsuite automatically appends "@euro" to "xx.ISO8859-15" locale
> names on all targets except FreeBSD, DragonflyBSD, and NetBSD. It should
> only be for Glibc, not all non-BSD targets.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/lib/libstdc++.exp (check_v3_target_namedlocale):
> Only append "@euro" to ".ISO8859-15" locales for Glibc.
> ---
>  libstdc++-v3/testsuite/lib/libstdc++.exp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
> b/libstdc++-v3/testsuite/lib/libstdc++.exp
> index 18331c80bc2..2510c7f4cbb 100644
> --- a/libstdc++-v3/testsuite/lib/libstdc++.exp
> +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
> @@ -1032,7 +1032,7 @@ proc check_v3_target_namedlocale { args } {
> puts $f "strcpy(result, name);"
> puts $f "#if defined __FreeBSD__ || defined __DragonFly__ || defined 
> __NetBSD__"
> puts $f "/* fall-through */"
> -   puts $f "#else"
> +   puts $f "#elif defined __GLIBC__"
> puts $f "if (strstr(result, \"ISO8859-15\")) {"
> puts $f "strcat(result, \"@euro\");"
> puts $f "}"
> --
> 2.45.2
>

Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Rainer Orth

Hi Jonathan,

> On Wed, 31 Jul 2024 at 13:27, Jonathan Wakely  wrote:
>>
>> I doubt we want the @euro suffix anywhere except Glibc-based targets. We
>> certainly don't want to append "@euro" on Solaris, where this change
>> flips some tests from UNSUPPORTED to PASS, e.g.
>> 21_strings/basic_string/numeric_conversions/char/to_string_float.cc
>> It will probably also cause some to flip from UNSUPPORTED to FAIL, which
>> we'll need to address.
>
> Oh, I've just realised that the UNSUPPORTED -> PASS I observed on
> Solaris was a build using my patch for PR 57585, which is not pushed
> yet. I think without that all uses of dg-require-namedlocale might
> fail on Solaris, so this change won't actually change anything ...
> yet.
>
> It still seems worth doing now though.

agreed: while Solaris 11.4 does have a few *.ISO8859-15@euro locales

da_DK.ISO8859-15@euro
en_GB.ISO8859-15@euro
en_US.ISO8859-15@euro
sv_SE.ISO8859-15@euro

the majority (17) are not.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener

On Wed, 31 Jul 2024, Uros Bizjak wrote:

> On Wed, Jul 31, 2024 at 11:33 AM Richard Biener  wrote:
> >
> > On Wed, 31 Jul 2024, Uros Bizjak wrote:
> >
> > > On Wed, Jul 31, 2024 at 10:48 AM Richard Biener  wrote:
> > > >
> > > > On Wed, 31 Jul 2024, Uros Bizjak wrote:
> > > >
> > > > > On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > > > > > > OK. Richard, can you please mention the above in the comment why
> > > > > > > XFmode is rejected in the hook?
> > > > > > >
> > > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory 
> > > > > > > copy to
> > > > > > > get some hard data.
> > > > > >
> > > > > > My (limited) understanding was that the hook would be used only for 
> > > > > > cases
> > > > > > where we'd like to e.g. value number some SF/DF/XF etc. mode loads 
> > > > > > and some
> > > > > > subsequent loads from the same address with different mode but same 
> > > > > > size
> > > > > > the same and replace say int or long long later load with 
> > > > > > VIEW_CONVERT_EXPR
> > > > > > of the result of the SF/SF mode load.  That is what was incorrect, 
> > > > > > because
> > > > > > the load didn't preserve all the bits.  The patch would still keep 
> > > > > > doing
> > > > > > normal SF/DF/XF etc. mode copies if that is all that happens in the 
> > > > > > program,
> > > > > > load some floating point value and store it elsewhere or as part of 
> > > > > > larger
> > > > > > aggregate copy.
> > > > >
> > > > > So, the hook should allow everything besides SF/DFmode, simply:
> > > > >
> > > > >
> > > > > switch (GET_MODE_INNER (mode))
> > > > >   {
> > > > >   case SFmode:
> > > > >   case DFmode:
> > > > > /* These suffer from normalization upon load when not using 
> > > > > SSE.  */
> > > > > return !(ix86_fpmath & FPMATH_387);
> > > > >   default:
> > > > > return true;
> > > > >   }
> > > >
> > > > OK, I think I'll go with this then.  I'm now unsure whether the
> > > > wrapper around the hook should reject modes with padding or if
> > > > the supposed users (value-numbering and SRA) should deal with that
> > > > issue separately.  I do wonder whether
> > > >
> > > > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
> > > >   ? &ieee_extended_intel_128_format
> > > >   : TARGET_96_ROUND_53_LONG_DOUBLE
> > > >   ? &ieee_extended_intel_96_round_53_format
> > > >   : &ieee_extended_intel_96_format));
> > > > ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > > > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > > >
> > > > unambiguously specifies where the padding is - m68k has
> > > >
> > > > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
> > > >
> > > > It's also not clear we can model a x87 10 byte memory copy in RTL since
> > > > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> > > > possible padding as unspecified and not "masked out" even if
> > > > the actual fstp will only store 10 bytes.
> > >
> > > The hardware will never touch bytes outside 10 bytes range, the
> > > padding is some artificial compiler thingy, so IMO it should be
> > > handled before the hook is called. Please find attached the source I
> > > have used to confirm that a) the copied bits will never be mangled and
> > > b) there is no access outside the 10 bytes range. (BTW: these
> > > particular values are to test the effect of leading bit 63, the
> > > non-hidden normalized bit).
> >
> > Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then,
> > mode_base_align[XFmode] seems to be correctly set to ensure
> > 12 bytes / 16 bytes "effective" size.
> 
> Uh, this decision predates my involvement in GCC development by a long shot ;)

diff --git a/gcc/config/i386/i386-modes.def 
b/gcc/config/i386/i386-modes.def
index 6d8f1946f3a..2cc03e30f13 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
XFmode is __float80 is IEEE extended; TFmode is __float128
is IEEE quad.  */
 
-FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
+FRACTIONAL_FLOAT_MODE (XF, 80, 10, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 FLOAT_MODE (HF, 2, ieee_half_format);
 FLOAT_MODE (BF, 2, 0);

bootstraps and tests (-m64/-m32) OK on x86_64-linux.

Richard.

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Jakub Jelinek

On Wed, Jul 31, 2024 at 02:43:36PM +0200, Richard Biener wrote:
> diff --git a/gcc/config/i386/i386-modes.def 
> b/gcc/config/i386/i386-modes.def
> index 6d8f1946f3a..2cc03e30f13 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
> XFmode is __float80 is IEEE extended; TFmode is __float128
> is IEEE quad.  */
>  
> -FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
> +FRACTIONAL_FLOAT_MODE (XF, 80, 10, ieee_extended_intel_96_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
>  FLOAT_MODE (HF, 2, ieee_half_format);
>  FLOAT_MODE (BF, 2, 0);
> 
> bootstraps and tests (-m64/-m32) OK on x86_64-linux.

And does it e.g. pass compat.exp / structure-layout-1.exp testing
against gcc without that patch (ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++)?

Jakub

Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Jonathan Wakely

On Wed, 31 Jul 2024 at 13:42, Rainer Orth  wrote:
>
> Hi Jonathan,
>
> > On Wed, 31 Jul 2024 at 13:27, Jonathan Wakely  wrote:
> >>
> >> I doubt we want the @euro suffix anywhere except Glibc-based targets. We
> >> certainly don't want to append "@euro" on Solaris, where this change
> >> flips some tests from UNSUPPORTED to PASS, e.g.
> >> 21_strings/basic_string/numeric_conversions/char/to_string_float.cc
> >> It will probably also cause some to flip from UNSUPPORTED to FAIL, which
> >> we'll need to address.
> >
> > Oh, I've just realised that the UNSUPPORTED -> PASS I observed on
> > Solaris was a build using my patch for PR 57585, which is not pushed
> > yet. I think without that all uses of dg-require-namedlocale might
> > fail on Solaris, so this change won't actually change anything ...
> > yet.
> >
> > It still seems worth doing now though.
>
> agreed: while Solaris 11.4 does have a few *.ISO8859-15@euro locales
>
> da_DK.ISO8859-15@euro
> en_GB.ISO8859-15@euro
> en_US.ISO8859-15@euro
> sv_SE.ISO8859-15@euro
>
> the majority (17) are not.

Ah interesting, I only saw en_US.ISO8859-15@euro on cfarm216, which is
an interesting one. US locale using Euro symbol for currency?!

Anyway, thanks for confirming, I'll push this.

[PATCH] libstdc++: Handle strerror returning null

2024-07-31 Thread Jonathan Wakely

As discussed a couple of weeks ago, I'm going to push this.

Tested x86_64-linux (where this #else isn't even used, but I checked it
does at least compile when the #if isn't true).

-- >8 --

The Linux man page for strerror says that some systems return NULL for
an unknown error number. That violates the C and POSIX standards, but we
can easily handle it to avoid a segfault.

libstdc++-v3/ChangeLog:

* src/c++11/system_error.cc (strerror_string): Handle
non-conforming NULL return from strerror.
---
 libstdc++-v3/src/c++11/system_error.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/c++11/system_error.cc 
b/libstdc++-v3/src/c++11/system_error.cc
index d01451ba1ef..38bc0446110 100644
--- a/libstdc++-v3/src/c++11/system_error.cc
+++ b/libstdc++-v3/src/c++11/system_error.cc
@@ -110,7 +110,11 @@ namespace
 #else
   string strerror_string(int err)
   {
-return strerror(err); // XXX Not thread-safe.
+auto str = strerror(err); // XXX Not thread-safe.
+if (str) [[__likely__]]
+  return str;
+// strerror should not return NULL, but some implementations do.
+return "Unknown error";
   }
 #endif
 
-- 
2.45.2
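
The null check in the patch can be tried in isolation with a small standalone
program.  This is a sketch under assumptions, not the libstdc++ source: the
helper `message_or_fallback` is a name invented here so that the NULL case
can be exercised without needing a libc that actually returns NULL.

```cpp
#include <cstring>
#include <string>

// Convert a possibly-null C string into a std::string, using the same
// fallback text as the patch when the pointer is null.
std::string message_or_fallback(const char *msg) {
  if (msg)
    return msg;
  // strerror should not return NULL, but some implementations do.
  return "Unknown error";
}

// Same shape as the patched function: never constructs a string from NULL.
std::string strerror_string(int err) {
  return message_or_fallback(std::strerror(err));  // still not thread-safe
}
```

Constructing a `std::string` directly from a null `const char*` is undefined
behaviour, which is why the check has to come before the conversion.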

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener

On Wed, 31 Jul 2024, Jakub Jelinek wrote:

> On Wed, Jul 31, 2024 at 02:43:36PM +0200, Richard Biener wrote:
> > diff --git a/gcc/config/i386/i386-modes.def 
> > b/gcc/config/i386/i386-modes.def
> > index 6d8f1946f3a..2cc03e30f13 100644
> > --- a/gcc/config/i386/i386-modes.def
> > +++ b/gcc/config/i386/i386-modes.def
> > @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
> > XFmode is __float80 is IEEE extended; TFmode is __float128
> > is IEEE quad.  */
> >  
> > -FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
> > +FRACTIONAL_FLOAT_MODE (XF, 80, 10, ieee_extended_intel_96_format);
> >  FLOAT_MODE (TF, 16, ieee_quad_format);
> >  FLOAT_MODE (HF, 2, ieee_half_format);
> >  FLOAT_MODE (BF, 2, 0);
> > 
> > bootstraps and tests (-m64/-m32) OK on x86_64-linux.
> 
> And does it e.g. pass compat.exp / structure-layout-1.exp testing
> against gcc without that patch (ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++)?

It doesn't.  I would expect differences at least for packed structs
since TYPE_SIZE changes with MODE_SIZE.

Richard.

[PATCH 1/3][v3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener

The following adds a target hook to specify whether regs of MODE can be
used to transfer bits.  The hook is meant to be used by value-numbering,
to decide whether a value loaded in such a mode can be punned to another
mode instead of re-loading the value in the other mode, and by SRA, to
decide whether MODE is suitable as a container holding a value that is
used in different modes.

Adjusted documentation in v3.

* target.def (mode_can_transfer_bits): New target hook.
* target.h (mode_can_transfer_bits): New function wrapping the
hook and providing default behavior.
* doc/tm.texi.in: Update.
* doc/tm.texi: Re-generate.
---
 gcc/doc/tm.texi| 11 +++
 gcc/doc/tm.texi.in |  2 ++
 gcc/target.def | 13 +
 gcc/target.h   | 16 
 4 files changed, 42 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c7535d07f4d..cc33084ed32 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4545,6 +4545,17 @@ is either a declaration of type int or accessed by 
dereferencing
 a pointer to int.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_MODE_CAN_TRANSFER_BITS (machine_mode @var{mode})
+Define this to return false if the mode @var{mode} cannot be used
+for memory copying of @code{GET_MODE_SIZE (mode)} units.  This might be
+because a register class allowed for @var{mode} has registers that do
+not transparently transfer every bit pattern or because the load or
+store patterns available for @var{mode} have this issue.
+
+The default is to assume modes with the same precision as size are fine
+to be used.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_TRANSLATE_MODE_ATTRIBUTE 
(machine_mode @var{mode})
 Define this hook if during mode attribute processing, the port should
 translate machine_mode @var{mode} to another mode.  For example, rs6000's
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 64cea3b1eda..8af3f414505 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3455,6 +3455,8 @@ stack.
 
 @hook TARGET_REF_MAY_ALIAS_ERRNO
 
+@hook TARGET_MODE_CAN_TRANSFER_BITS
+
 @hook TARGET_TRANSLATE_MODE_ATTRIBUTE
 
 @hook TARGET_SCALAR_MODE_SUPPORTED_P
diff --git a/gcc/target.def b/gcc/target.def
index 3de1aad4c84..1d0ea6f30ca 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3363,6 +3363,19 @@ a pointer to int.",
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+DEFHOOK
+(mode_can_transfer_bits,
+ "Define this to return false if the mode @var{mode} cannot be used\n\
+for memory copying of @code{GET_MODE_SIZE (mode)} units.  This might be\n\
+because a register class allowed for @var{mode} has registers that do\n\
+not transparently transfer every bit pattern or because the load or\n\
+store patterns available for @var{mode} have this issue.\n\
+\n\
+The default is to assume modes with the same precision as size are fine\n\
+to be used.",
+ bool, (machine_mode mode),
+ NULL)
+
 /* Support for named address spaces.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
diff --git a/gcc/target.h b/gcc/target.h
index c1f99b97b86..837651d273a 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -312,6 +312,22 @@ estimated_poly_value (poly_int64 x,
 return targetm.estimated_poly_value (x, kind);
 }
 
+/* Return true when MODE can be used to copy GET_MODE_BITSIZE bits
+   unchanged.  */
+
+inline bool
+mode_can_transfer_bits (machine_mode mode)
+{
+  if (mode == BLKmode)
+return true;
+  if (maybe_ne (GET_MODE_BITSIZE (mode),
+   GET_MODE_UNIT_PRECISION (mode) * GET_MODE_NUNITS (mode)))
+return false;
+  if (targetm.mode_can_transfer_bits)
+return targetm.mode_can_transfer_bits (mode);
+  return true;
+}
+
 #ifdef GCC_TM_H
 
 #ifndef CUMULATIVE_ARGS_MAGIC
-- 
2.43.0

[PATCH 2/3] [x86] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener

The following implements the hook, excluding x87 modes for scalar
and complex float modes.
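
To make the decision order easier to follow, here is a minimal sketch of how the generic wrapper from patch 1/3 and this x86 hook combine. Everything prefixed with toy_ is made up for illustration and is not GCC's machine_mode API; the XFmode figures assume x86-64 (80 value bits in 128 bits of storage).

```cpp
#include <cassert>

// Toy stand-ins for a few machine modes; this is NOT GCC's machine_mode enum.
enum class toy_mode { BLK, SI, SF, DF, XF };

struct toy_mode_info
{
  unsigned size_bits;       // storage size, in the role of GET_MODE_BITSIZE
  unsigned precision_bits;  // value bits, in the role of GET_MODE_PRECISION
};

// Assumed figures: XFmode keeps 80 value bits in 128 bits of storage.
constexpr toy_mode_info
toy_info (toy_mode m)
{
  switch (m)
    {
    case toy_mode::SI: return {32, 32};
    case toy_mode::SF: return {32, 32};
    case toy_mode::DF: return {64, 64};
    case toy_mode::XF: return {128, 80};
    default:           return {0, 0};
    }
}

// Hypothetical target part, mirroring the SFmode/DFmode case of the patch.
constexpr bool
toy_target_hook (toy_mode m, bool x87_math)
{
  return (m == toy_mode::SF || m == toy_mode::DF) ? !x87_math : true;
}

// Generic part: BLKmode is always fine, modes with padding never are,
// everything else is left to the target.
constexpr bool
toy_mode_can_transfer_bits (toy_mode m, bool x87_math)
{
  if (m == toy_mode::BLK)
    return true;
  if (toy_info (m).size_bits != toy_info (m).precision_bits)
    return false;
  return toy_target_hook (m, x87_math);
}

// Usage: SFmode is only rejected when x87 math is in use.
inline void
toy_mode_demo ()
{
  assert (toy_mode_can_transfer_bits (toy_mode::SF, false));
  assert (!toy_mode_can_transfer_bits (toy_mode::SF, true));
}
```

In this model XFmode is already rejected by the padding check, so the target part only has to deal with SFmode and DFmode.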

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK this way?

Thanks,
Richard.

* i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
(ix86_mode_can_transfer_bits): New function.
---
 gcc/config/i386/i386.cc | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 12d15feb5e9..9869c44ee15 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26113,6 +26113,25 @@ ix86_have_ccmp ()
   return (bool) TARGET_APX_CCMP;
 }
 
+/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
+static bool
+ix86_mode_can_transfer_bits (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT
+  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
+switch (GET_MODE_INNER (mode))
+  {
+  case SFmode:
+  case DFmode:
+   /* These suffer from normalization upon load when not using SSE.  */
+   return !(ix86_fpmath & FPMATH_387);
+  default:
+   return true;
+  }
+
+  return true;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -26959,6 +26978,9 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_HAVE_CCMP
 #define TARGET_HAVE_CCMP ix86_have_ccmp
 
+#undef TARGET_MODE_CAN_TRANSFER_BITS
+#define TARGET_MODE_CAN_TRANSFER_BITS ix86_mode_can_transfer_bits
+
 static bool
 ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
 {
-- 
2.43.0

[PATCH 3/3][v3] tree-optimization/114659 - VN and FP to int punning

2024-07-31 Thread Richard Biener

The following addresses another case where x87 FP loads mangle the
bit representation and thus are not suitable for a representative
in other types.  VN was value-numbering a later integer load of 'x'
as the same as a former float load of 'x'.

We can use the new TARGET_MODE_CAN_TRANSFER_BITS hook to identify
problematic modes and enforce strict compatibility for those in
the reference comparison, improving the handling of modes with
padding in visit_reference_op_load.
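
For readers unfamiliar with why the bit pattern matters, the following is a small sketch of the binary32 NaN encoding that the testcase relies on. It works on integer bit patterns only, so it does not depend on how the FP unit treats loads; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bit 22 is the quiet bit of an IEEE 754 binary32 NaN.
constexpr std::uint32_t quiet_bit = std::uint32_t{1} << 22;
constexpr std::uint32_t exponent_mask = 0x7f800000u;
constexpr std::uint32_t mantissa_mask = 0x007fffffu;

// Same transformation as construct_memory_SNaNf in the testcase:
// flip the quiet bit and make sure the mantissa stays non-zero.
constexpr std::uint32_t
make_snan_bits (std::uint32_t qnan_bits)
{
  return (qnan_bits ^ quiet_bit) | 1u;
}

constexpr bool
is_nan_bits (std::uint32_t b)
{
  return (b & exponent_mask) == exponent_mask && (b & mantissa_mask) != 0;
}

constexpr bool
is_snan_bits (std::uint32_t b)
{
  return is_nan_bits (b) && (b & quiet_bit) == 0;
}

// Usage: the default positive quiet NaN becomes a signaling NaN.
inline void
snan_demo ()
{
  assert (make_snan_bits (0x7fc00000u) == 0x7f800001u);
  assert (is_snan_bits (0x7f800001u));
}
```

A load that quiets the NaN sets bit 22 again, which is exactly the difference an integer load of the same memory would then fail to see.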

PR tree-optimization/114659
* tree-ssa-sccvn.cc (visit_reference_op_load): Do not
prevent punning from modes with padding here, but ...
(vn_reference_eq): ... ensure this here, also honoring
types with modes that cannot act as bit container.

* gcc.target/i386/pr114659.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr114659.c | 62 
 gcc/tree-ssa-sccvn.cc| 11 ++---
 2 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114659.c

diff --git a/gcc/testsuite/gcc.target/i386/pr114659.c b/gcc/testsuite/gcc.target/i386/pr114659.c
new file mode 100644
index 00000000000..e1e24d55687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114659.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+my_totalorderf (float const *x, float const *y)
+{
+  int xs = __builtin_signbit (*x);
+  int ys = __builtin_signbit (*y);
+  if (!xs != !ys)
+return xs;
+
+  int xn = __builtin_isnan (*x);
+  int yn = __builtin_isnan (*y);
+  if (!xn != !yn)
+return !xn == !xs;
+  if (!xn)
+return *x <= *y;
+
+  unsigned int extended_sign = -!!xs;
+  union { unsigned int i; float f; } xu = {0}, yu = {0};
+  __builtin_memcpy (&xu.f, x, sizeof (float));
+  __builtin_memcpy (&yu.f, y, sizeof (float));
+  return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+}
+
+static float
+positive_NaNf ()
+{
+  float volatile nan = 0.0f / 0.0f;
+  return (__builtin_signbit (nan) ? - nan : nan);
+}
+
+typedef union { float value; unsigned int word[1]; } memory_float;
+
+static memory_float
+construct_memory_SNaNf (float quiet_value)
+{
+  memory_float m;
+  m.value = quiet_value;
+  m.word[0] ^= (unsigned int) 1 << 22;
+  m.word[0] |= (unsigned int) 1;
+  return m;
+}
+
+memory_float x[7] =
+  {
+{ 0 },
+{ 1e-5 },
+{ 1 },
+{ 1e37 },
+{ 1.0f / 0.0f },
+  };
+
+int
+main ()
+{
+  x[5] = construct_memory_SNaNf (positive_NaNf ());
+  x[6] = (memory_float) { positive_NaNf () };
+  if (! my_totalorderf (&x[5].value, &x[6].value))
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index dc377fa16ce..0639ba426ff 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -837,6 +837,9 @@ vn_reference_eq (const_vn_reference_t const vr1, const_vn_reference_t const vr2)
TYPE_VECTOR_SUBPARTS (vr2->type)))
return false;
 }
+  else if (TYPE_MODE (vr1->type) != TYPE_MODE (vr2->type)
+  && !mode_can_transfer_bits (TYPE_MODE (vr1->type)))
+return false;
 
   i = 0;
   j = 0;
@@ -5814,13 +5817,7 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
   if (result
   && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
 {
-  /* Avoid the type punning in case the result mode has padding where
-the op we lookup has not.  */
-  if (TYPE_MODE (TREE_TYPE (result)) != BLKmode
- && maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
-  GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
-   result = NULL_TREE;
-  else if (CONSTANT_CLASS_P (result))
+  if (CONSTANT_CLASS_P (result))
result = const_unop (VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
   else
{
-- 
2.43.0

Re: [PATCH v4 0/3] aarch64: Add initial support for +fp8 arch extensions

2024-07-31 Thread Richard Sandiford

Claudio Bantaloukas  writes:
> This series introduces initial flags and functionality for the fp8 feature.
>
> Specifically, the following are added:
> - functions that enable constructing valid fpm register values.
> - support for the '+fp8' -march modifier.
> - support for reading and writing the new system register FPMR (Floating Point Mode
>   Register) which configures the new FP8 features
>
> Tested against aarch64-unknown-linux-gnu.
>
> V1 of this patch series had "aarch64: Add march flags for +fp8 arch extensions" as
> cover letter title. Since then, changes in V2 are:
>
> aarch64: Add march flags for +fp8 arch extensions
> - Removed __ARM_FEATURE_FP8 define: will be added once the relevant features are in.
> - Some unnecessary whitespace changes were removed.
> - Helper function names now begin with __arm.
>
> aarch64: Add support for moving fpm system register
> - Removed a misleading comment.
> - Removed unnecessary modifier in .md
>
> aarch64: Add fpm register helper functions.
> - Helper functions and fpm_t types are available unconditionally when including arm_acle.h
>
> Changes in V3 are:
>
> aarch64: Add march flags for +fp8 arch extensions
> - removed unnecessary check-function-bodies check
>
> aarch64: Add support for moving fpm system register
> - added check-function-bodies check
>
> aarch64: Add fpm register helper functions.
> - moved fp8 types and helper functions into a new private header file arm_private_fp8.h
> - arm_neon.h and arm_sve.h now include the new header
> - added tests that check the helpers are available when including arm_neon.h,
>   arm_sve.h or arm_sme.h
>
> Changes in V4 are:
>
> aarch64: Add support for moving fpm system register
> - updated commit message
> - fixed length in .md
> - fixed tests to only exercise register moves for specific sizes
>
> aarch64: Add fpm register helper functions.
> - updated error message in arm_private_fp8.h
>
> Is this ok for master? I do not have merge permissions. Can someone merge this for me please?
>
> Thanks,
> Claudio Bantaloukas
>
>
>
> Claudio Bantaloukas (3):
>   aarch64: Add march flags for +fp8 arch extensions
>   aarch64: Add support for moving fpm system register
>   aarch64: Add fpm register helper functions.

LGTM, thanks.  Pushed to trunk.

Richard

>  gcc/config.gcc|   2 +-
>  .../aarch64/aarch64-option-extensions.def |   2 +
>  gcc/config/aarch64/aarch64.cc |   8 ++
>  gcc/config/aarch64/aarch64.h  |  17 ++-
>  gcc/config/aarch64/aarch64.md |  30 +++--
>  gcc/config/aarch64/arm_neon.h |   1 +
>  gcc/config/aarch64/arm_private_fp8.h  |  80 
>  gcc/config/aarch64/arm_sve.h  |   1 +
>  gcc/config/aarch64/constraints.md |   3 +
>  gcc/doc/invoke.texi   |   2 +
>  .../aarch64/acle/fp8-helpers-neon.c   |  53 
>  .../gcc.target/aarch64/acle/fp8-helpers-sme.c |  12 ++
>  .../gcc.target/aarch64/acle/fp8-helpers-sve.c |  12 ++
>  gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 121 ++
>  14 files changed, 329 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/config/aarch64/arm_private_fp8.h
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c

Re: [PATCH] tree-optimization/115825 - improve unroll estimates for volatile accesses

2024-07-31 Thread Richard Biener

On Wed, 10 Jul 2024, Richard Biener wrote:

> The loop unrolling code assumes that one third of all volatile accesses
> can be possibly optimized away which is of course not true.  This leads
> to excessive unrolling in some cases.  The following tracks the number
> of stmts with side-effects as those are not eliminatable later and
> only assumes one third of the other stmts can be further optimized.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> There's quite some testsuite fallout, mostly because of different rounding
> and a size of 8 now no longer is optimistically optimized to 5 but only 6.
> I can fix that by writing
> 
>   *est_eliminated = (unr_insns - not_elim) / 3;
> 
> as
> 
>   *est_eliminated = unr_insns - not_elim - (unr_insns - not_elim) * 2 / 3;
> 
> to preserve the old rounding behavior.  But for example
> 
> FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for warnings, line 56)
> 
> shows
> 
>   size:   3 C::C (_25, &MEM  [(void *)&_ZTT2D1 + 48B]);
> 
> which we now consider not being optimizable (correctly I think) and thus
> the optimistic size reduction isn't enough to get the loop unrolled.
> Previously the computed size of 20 was reduced to 13, exactly the size
> of the not unrolled body.
> 
> So the remaining fallout will be
> 
> +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for warnings, line 56)
> +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 note (test for warnings, line 66)
> ...
> +FAIL: c-c++-common/ubsan/unreachable-3.c  -std=gnu++14  scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
> ...
> +FAIL: c-c++-common/ubsan/unreachable-3.c   -O0   scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
> 
> for the latter the issue is __builtin___sanitizer_cov_trace_pc ()
> 
> Does this seem feasible overall?  I can fixup the testcases above
> with #pragma unroll ...

Honza - any comments?
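
For anyone following along, the two roundings quoted in this mail can be compared in isolation. This is only a sketch: the function names are made up here, and only the two expressions come from the quoted patch description.

```cpp
#include <cassert>

// New estimate as first proposed: a third of the eliminatable stmts,
// rounded down.
constexpr int
est_eliminated_floor (int unr_insns, int not_elim)
{
  return (unr_insns - not_elim) / 3;
}

// Variant that keeps the old behavior: round the *remaining* two thirds
// down instead, so the eliminated part is effectively rounded up.
constexpr int
est_eliminated_old_rounding (int unr_insns, int not_elim)
{
  return unr_insns - not_elim - (unr_insns - not_elim) * 2 / 3;
}

// Usage: a size of 8 with nothing pinned shrinks to 6 with the first
// variant and to 5 with the second.
inline void
rounding_demo ()
{
  assert (8 - est_eliminated_floor (8, 0) == 6);
  assert (8 - est_eliminated_old_rounding (8, 0) == 5);
}
```

With a size of 20 and nothing pinned the second variant leaves 13, matching the figure mentioned for Warray-bounds-20.C before the change.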

> Thanks,
> Richard.
> 
>   PR tree-optimization/115825
>   * tree-ssa-loop-ivcanon.cc (loop_size::not_eliminatable_after_peeling):
>   New.
>   (loop_size::last_iteration_not_eliminatable_after_peeling): Likewise.
>   (tree_estimate_loop_size): Count stmts with side-effects as
>   not optimistically eliminatable.
>   (estimated_unrolled_size): Compute the number of stmts that can
>   be optimistically eliminated by followup transforms.
>   (try_unroll_loop_completely): Adjust.
> 
>   * gcc.dg/tree-ssa/cunroll-17.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c | 11 +++
>  gcc/tree-ssa-loop-ivcanon.cc   | 35 +-
>  2 files changed, 38 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> new file mode 100644
> index 00000000000..282db99c883
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-optimized" } */
> +
> +char volatile v;
> +void for16 (void)
> +{
> +  for (char i = 16; i > 0; i -= 2)
> +v = i;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " ={v} " 1 "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
> index 5ef24a91917..dd941c31648 100644
> --- a/gcc/tree-ssa-loop-ivcanon.cc
> +++ b/gcc/tree-ssa-loop-ivcanon.cc
> @@ -139,11 +139,16 @@ struct loop_size
>   variable where induction variable starts at known constant.)  */
>int eliminated_by_peeling;
>  
> +  /* Number of instructions that cannot be further optimized in the
> + peeled loop, for example volatile accesses.  */
> +  int not_eliminatable_after_peeling;
> +
>/* Same statistics for last iteration of loop: it is smaller because
>   instructions after exit are not executed.  */
>int last_iteration;
>int last_iteration_eliminated_by_peeling;
> -  
> +  int last_iteration_not_eliminatable_after_peeling;
> +
>/* If some IV computation will become constant.  */
>bool constant_iv;
>  
> @@ -267,8 +272,10 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>  
>size->overall = 0;
>size->eliminated_by_peeling = 0;
> +  size->not_eliminatable_after_peeling = 0;
>size->last_iteration = 0;
>size->last_iteration_eliminated_by_peeling = 0;
> +  size->last_iteration_not_eliminatable_after_peeling = 0;
>size->num_pure_calls_on_hot_path = 0;
>size->num_non_pure_calls_on_hot_path = 0;
>size->non_call_stmts_on_hot_path = 0;
> @@ -292,6 +299,7 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>   {
> gimple *stmt = gsi_stmt (gsi);
> int num = estimate_num_insns (stmt, &eni_size_weights);
> +   bool not_eliminatable_after_peeling = false;
> bool likely_eliminated = false;
> bool like

Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Rainer Orth

Hi Jonathan,

>> agreed: while Solaris 11.4 does have a few *.ISO8859-15@euro locales
>>
>> da_DK.ISO8859-15@euro
>> en_GB.ISO8859-15@euro
>> en_US.ISO8859-15@euro
>> sv_SE.ISO8859-15@euro
>>
>> the majority (17) are not.
>
> Ah interesting, I only saw en_US.ISO8859-15@euro on cfarm216, which is
> an interesting one. US locale using Euro symbol for currency?!

don't ask me what they were thinking ;-)  Anyway, I found that both
Solaris cfarm systems only had a subset of the available locales
installed.  An artifact of the exact installation method, I suppose.
Whatever the case, that's fixed now.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[PATCH] middle-end/114563 - improve release_pages

2024-07-31 Thread Richard Biener

The following improves release_pages when using the madvise path
to sort the freelist to get more page entries contiguous and possibly
release them.  This populates the unused prev pointer so the reclaim
can then easily unlink from the freelist without re-ordering it.
The paths not having madvise do not keep the memory allocated, so
I left them untouched.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

I've CCed people messing with release_pages.  This doesn't really
address PR114563 but I thought I'd post this patch anyway - the
actual issue we run into for the PR is the linear search of
G.free_pages when that list becomes large but a requested allocation
cannot be served from it.
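
The effect of sorting before scanning for contiguous runs can be sketched outside of ggc-page.cc as follows. This is a minimal model under stated assumptions: toy_page and releasable_bytes are made-up names, addresses are plain integers, and nothing is actually unmapped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for page_entry; only the fields needed here.
struct toy_page
{
  std::uintptr_t addr;
  std::size_t bytes;
};

// Return how many bytes sit in address-contiguous runs of at least
// FREE_UNIT bytes.  With SORT_FIRST the list is ordered by address
// before scanning; without it the list is scanned in its given order.
inline std::size_t
releasable_bytes (std::vector<toy_page> pages, std::size_t free_unit,
                  bool sort_first)
{
  if (sort_first)
    std::sort (pages.begin (), pages.end (),
               [] (const toy_page &a, const toy_page &b)
               { return a.addr < b.addr; });

  std::size_t released = 0;
  for (std::size_t i = 0; i < pages.size ();)
    {
      std::uintptr_t start = pages[i].addr;
      std::size_t len = 0;
      // Extend the run while the next entry starts where this one ends.
      while (i < pages.size () && pages[i].addr == start + len)
        {
          len += pages[i].bytes;
          ++i;
        }
      if (len >= free_unit)
        released += len;
    }
  return released;
}

// Usage: three adjacent pages listed out of order only form a
// releasable run once the list is sorted.
inline void
release_demo ()
{
  std::vector<toy_page> pages
    = { {0x3000, 0x1000}, {0x1000, 0x1000}, {0x2000, 0x1000}, {0x9000, 0x1000} };
  assert (releasable_bytes (pages, 0x3000, false) == 0);
  assert (releasable_bytes (pages, 0x3000, true) == 0x3000);
}
```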

PR middle-end/114563
* ggc-page.cc (page_sort): New qsort comparator.
(release_pages): Sort the free_pages list entries after their
memory block virtual address to improve contiguous memory
chunk release.
---
 gcc/ggc-page.cc | 68 ++---
 1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/gcc/ggc-page.cc b/gcc/ggc-page.cc
index 4245f843a29..c9d8a8cd8e9 100644
--- a/gcc/ggc-page.cc
+++ b/gcc/ggc-page.cc
@@ -1010,6 +1010,19 @@ free_page (page_entry *entry)
   G.free_pages = entry;
 }
 
+/* Comparison function to sort page_entry after virtual address.  */
+
+static int
+page_sort (const void *pa_, const void *pb_)
+{
+  const page_entry *pa = *(const page_entry * const *)pa_;
+  const page_entry *pb = *(const page_entry * const *)pb_;
+  if ((uintptr_t)pa->page < (uintptr_t)pb->page)
+return -1;
+  else
+return 1;
+}
+
 /* Release the free page cache to the system.  */
 
 static void
@@ -1022,7 +1035,7 @@ release_pages (void)
   char *start;
   size_t len;
   size_t mapped_len;
-  page_entry *next, *prev, *newprev;
+  page_entry *prev;
   size_t free_unit = (GGC_QUIRE_SIZE/2) * G.pagesize;
 
   /* First free larger continuous areas to the OS.
@@ -1031,41 +1044,56 @@ release_pages (void)
  This does not always work because the free_pages list is only
  approximately sorted. */
 
-  p = G.free_pages;
+  auto_vec<page_entry *> pages;
   prev = NULL;
+  p = G.free_pages;
   while (p)
 {
+  p->prev = prev;
+  pages.safe_push (p);
+  prev = p;
+  p = p->next;
+}
+  pages.qsort (page_sort);
+
+  for (unsigned i = 0; i < pages.length ();)
+{
+  p = pages[i];
   start = p->page;
-  start_p = p;
+  unsigned start_i = i;
   len = 0;
   mapped_len = 0;
-  newprev = prev;
-  while (p && p->page == start + len)
+  while (i < pages.length () && pages[i]->page == start + len)
 {
+ p = pages[i];
   len += p->bytes;
  if (!p->discarded)
- mapped_len += p->bytes;
- newprev = p;
-  p = p->next;
+   mapped_len += p->bytes;
+ ++i;
 }
   if (len >= free_unit)
 {
-  while (start_p != p)
-{
-  next = start_p->next;
-  free (start_p);
-  start_p = next;
-}
+ for (unsigned j = start_i; j != i; ++j)
+   {
+ p = pages[j];
+ if (!p->prev)
+   {
+ G.free_pages = p->next;
+ if (p->next)
+   p->next->prev = NULL;
+   }
+ else
+   {
+ p->prev->next = p->next;
+ if (p->next)
+   p->next->prev = p->prev;
+   }
+ free (pages[j]);
+   }
   munmap (start, len);
- if (prev)
-   prev->next = p;
-  else
-G.free_pages = p;
   G.bytes_mapped -= mapped_len;
  n1 += len;
- continue;
 }
-  prev = newprev;
}
 
   /* Now give back the fragmented pages to the OS, but keep the address 
-- 
2.43.0

Re: arm: Prevent ICE when doloop dec_set is not PLUS_EXPR

2024-07-31 Thread Christophe Lyon


Hi Andre,

On 7/31/24 11:29, Andre Vieira (lists) wrote:

Hi Christophe,

Thanks for the comments, attached new version for testcase, see below new cover letter:


Thanks for the improved cover letter, it is indeed clearer.



This patch refactors and fixes an issue where arm_mve_dlstp_check_dec_counter
was making an assumption about the form of a candidate for a dec_insn.
This dec_insn is the instruction that decreases the loop counter inside a
decrementing loop and we expect it to have the following form:
(set (reg CONDCOUNT)
  (plus (reg CONDCOUNT)
    (const_int)))

Where CONDCOUNT is the loop counter, and const_int is the negative constant
used to decrement it.

This patch also improves our search for a valid dec_insn.  Before this patch
we'd only look for a dec_insn inside the loop header if the loop latch was
empty.  We now also search the loop header if the loop latch is not empty but
the last instruction is not a valid dec_insn.  This could potentially be
improved to search all instructions inside the loop latch.

gcc/ChangeLog:

     * config/arm/arm.cc (check_dec_insn): New helper function containing
     code hoisted from...
     (arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
     check the validity of the candidate dec_insn.

gcc/testsuite/ChangeLog:

     * gcc.target/arm/mve/dlstp-loop-form.c: New test.


On 30/07/2024 21:31, Christophe Lyon wrote:
I manually tried to exercise the testcase with a cross-compiler, and 
found the same issue as the Linaro CI should have reported (there was 
a temporary breakage).


You can find detailed logs from Linaro in gcc.log.1.xz under 
https://ci.linaro.org/job/tcwg_gcc_check--master-arm-precommit/8357/artifact/artifacts/artifacts.precommit/00-sumfiles/


Basically the testcase fails to compile with loads of
dlstp-loop-form.c:6:9: warning: 'pure' attribute on function returning 'void' [-Wattributes]

then

dlstp-loop-form.c:7:37: error: unknown type name 'float16x8_t'; did you mean 'int16x8_t'?

dlstp-loop-form.c: In function 'n':
dlstp-loop-form.c:18:8: error: subscripted value is neither array nor pointer nor vector
dlstp-loop-form.c:21:13: error: passing 'e' {aka 'int'} to argument 2 of 'vfmsq_m', which expects an MVE vector type


Why would the test pass for you?


Because I tested with a toolchain configured for cortex-m85, which has 
mve.fp enabled by default, which means I didn't realize the testcase 
required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.


Addressed that now.


Thanks, I thought you meant you ran the testsuite with -mcpu=cortex-m85 
in RUNTESTFLAGS.


Regarding the patch, did you consider making the new check_dec_insn 
helper return an rtx (NULL or dec_set) instead of bool?
I think it would save a call to single_set when computing decrementnum, 
but that's nitpicking.
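
To illustrate the suggestion, here is a rough sketch of a checker that hands back what it matched instead of a bool. It uses a toy struct rather than GCC's rtx, so toy_set and toy_check_dec_insn are hypothetical and only model the expected (set (reg) (plus (reg) (const_int))) form.

```cpp
#include <cassert>
#include <optional>

// Toy model of (set (reg DEST) (OP (reg SRC) (const_int ADDEND)));
// this is NOT GCC's rtx representation.
struct toy_set
{
  bool src_is_plus;  // true when OP is PLUS
  int dest_reg;
  int src_reg;
  long addend;
};

// Return the decrement amount when S has the expected dec_insn form for
// loop counter CONDCOUNT_REG, and std::nullopt otherwise.  The caller
// gets the matched value without having to decompose the insn again.
inline std::optional<long>
toy_check_dec_insn (const toy_set &s, int condcount_reg)
{
  if (!s.src_is_plus)
    return std::nullopt;
  if (s.dest_reg != condcount_reg || s.src_reg != condcount_reg)
    return std::nullopt;
  if (s.addend >= 0)
    return std::nullopt;
  return -s.addend;
}

// Usage: a counter decremented by 4 matches, a non-PLUS update does not.
inline void
dec_insn_demo ()
{
  assert (toy_check_dec_insn ({true, 3, 3, -4}, 3) == 4);
  assert (!toy_check_dec_insn ({false, 3, 3, -4}, 3));
}
```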


Thanks,

Christophe

Re: [PATCH 01/15] arm: [MVE intrinsics] improve comment for orrq shape

2024-07-31 Thread Christophe Lyon

ping for the series?


On Thu, 11 Jul 2024 at 23:43, Christophe Lyon
 wrote:
>
> Add a comment about the lack of "n" forms for floating-point and 8-bit
> integers, to make it clearer why we use build_16_32 for MODE_n.
>
> 2024-07-11  Christophe Lyon  
>
> gcc/
> * config/arm/arm-mve-builtins-shapes.cc (binary_orrq_def): Improve comment.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index ba20c6a8f73..e01939469e3 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -865,7 +865,12 @@ SHAPE (binary_opt_n)
> int16x8_t [__arm_]vorrq_m[_s16](int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
> int16x8_t [__arm_]vorrq_x[_s16](int16x8_t a, int16x8_t b, mve_pred16_t p)
> int16x8_t [__arm_]vorrq[_n_s16](int16x8_t a, const int16_t imm)
> -   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, mve_pred16_t p)  */
> +   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, mve_pred16_t p)
> +
> +   No "_n" forms for floating-point, nor 8-bit integers:
> +   float16x8_t [__arm_]vorrq[_f16](float16x8_t a, float16x8_t b)
> +   float16x8_t [__arm_]vorrq_m[_f16](float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
> +   float16x8_t [__arm_]vorrq_x[_f16](float16x8_t a, float16x8_t b, mve_pred16_t p)  */
>  struct binary_orrq_def : public overloaded_base<0>
>  {
>bool
> --
> 2.34.1
>

Re: [PATCH] Fix overwriting files with fs::copy_file on windows

2024-07-31 Thread Björn Schäpers


Am 30.07.2024 um 11:13 schrieb Jonathan Wakely:

On Sun, 24 Mar 2024 at 21:34, Björn Schäpers  wrote:


From: Björn Schäpers 

This fixes e.g. https://github.com/msys2/MSYS2-packages/issues/1937
I don't know if I picked the right way to do it.

If acceptable, I think the declaration should be moved into
ops-common.h, since then we could use stat_type and also use that in the
commonly used function.

Manually tested on i686-w64-mingw32.
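
For context, this is the user-visible behavior the fix is about, as a small sketch using only the standard std::filesystem interface. The helper names (write_text, read_text, overwrite_copy) are made up for the example; on a POSIX system it already behaves as described.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Create or truncate P and fill it with TEXT.
inline void
write_text (const fs::path &p, const std::string &text)
{
  std::ofstream out (p, std::ios::binary | std::ios::trunc);
  out << text;
}

// Read the whole of P into a string.
inline std::string
read_text (const fs::path &p)
{
  std::ifstream in (p, std::ios::binary);
  return std::string (std::istreambuf_iterator<char> (in),
                      std::istreambuf_iterator<char> ());
}

// Copy FROM over TO, replacing TO if it exists.  Returns true when a
// copy was made; on failure EC is set and false is returned.
inline bool
overwrite_copy (const fs::path &from, const fs::path &to,
                std::error_code &ec)
{
  return fs::copy_file (from, to, fs::copy_options::overwrite_existing, ec);
}

// Usage: copying over a different existing file replaces its contents.
inline void
copy_demo (const fs::path &dir)
{
  write_text (dir / "src.txt", "new");
  write_text (dir / "dst.txt", "old");
  std::error_code ec;
  assert (overwrite_copy (dir / "src.txt", dir / "dst.txt", ec));
  assert (read_text (dir / "dst.txt") == "new");
}
```

copy_file is required to report an error when source and destination are the same file, which is why a broken equivalence check turns every overwrite into a failure.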

-- >8 --
libstdc++: Fix overwriting files on windows

Inodes have no meaning on Windows, so all files have an inode of 0.
As a result std::filesystem::copy_file did not honor
copy_options::overwrite_existing. Use a different approach to identify
equivalent files, factored out of std::filesystem::equivalent.

libstdc++-v3/Changelog:

 * include/bits/fs_ops.h: Add declaration of
   __detail::equivalent_win32.
 * src/c++17/fs_ops.cc (__detail::equivalent_win32): Implement it
 (fs::equivalent): Use __detail::equivalent_win32, factored the
 old test out.
 * src/filesystem/ops-common.h (_GLIBCXX_FILESYSTEM_IS_WINDOWS):
   Use the function.

Signed-off-by: Björn Schäpers 
---
  libstdc++-v3/include/bits/fs_ops.h   |  8 +++
  libstdc++-v3/src/c++17/fs_ops.cc | 79 +---
  libstdc++-v3/src/filesystem/ops-common.h | 10 ++-
  3 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/libstdc++-v3/include/bits/fs_ops.h b/libstdc++-v3/include/bits/fs_ops.h
index 90650c47b46..d10b78a4bdd 100644
--- a/libstdc++-v3/include/bits/fs_ops.h
+++ b/libstdc++-v3/include/bits/fs_ops.h
@@ -40,6 +40,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  namespace filesystem
  {
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+namespace __detail
+{
+  bool
+  equivalent_win32(const wchar_t* p1, const wchar_t* p2, error_code& ec);
+} // namespace __detail
+#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
+
/** @addtogroup filesystem
 *  @{
 */
diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 61df19753ef..3cc87d45237 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -67,6 +67,49 @@
  namespace fs = std::filesystem;
  namespace posix = std::filesystem::__gnu_posix;

+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+bool
+fs::__detail::equivalent_win32(const wchar_t* p1, const wchar_t* p2,
+  error_code& ec)
+{
+  struct auto_handle {
+explicit auto_handle(const path& p_)
+: handle(CreateFileW(p_.c_str(), 0,
+   FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
+   0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
+{ }
+
+~auto_handle()
+{ if (*this) CloseHandle(handle); }
+
+explicit operator bool() const
+{ return handle != INVALID_HANDLE_VALUE; }
+
+bool get_info()
+{ return GetFileInformationByHandle(handle, &info); }
+
+HANDLE handle;
+BY_HANDLE_FILE_INFORMATION info;
+  };
+  auto_handle h1(p1);
+  auto_handle h2(p2);
+  if (!h1 || !h2)
+{
+  if (!h1 && !h2)
+   ec = __last_system_error();
+  return false;
+}
+  if (!h1.get_info() || !h2.get_info())
+{
+  ec = __last_system_error();
+  return false;
+}
+  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
+&& h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
+&& h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+}
+#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
+
  fs::path
  fs::absolute(const path& p)
  {
@@ -858,41 +901,7 @@ fs::equivalent(const path& p1, const path& p2, error_code& ec) noexcept
if (st1.st_mode != st2.st_mode || st1.st_dev != st2.st_dev)
 return false;

-  struct auto_handle {
-   explicit auto_handle(const path& p_)
-   : handle(CreateFileW(p_.c_str(), 0,
- FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
- 0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
-   { }
-
-   ~auto_handle()
-   { if (*this) CloseHandle(handle); }
-
-   explicit operator bool() const
-   { return handle != INVALID_HANDLE_VALUE; }
-
-   bool get_info()
-   { return GetFileInformationByHandle(handle, &info); }
-
-   HANDLE handle;
-   BY_HANDLE_FILE_INFORMATION info;
-  };
-  auto_handle h1(p1);
-  auto_handle h2(p2);
-  if (!h1 || !h2)
-   {
- if (!h1 && !h2)
-   ec = __last_system_error();
- return false;
-   }
-  if (!h1.get_info() || !h2.get_info())
-   {
- ec = __last_system_error();
- return false;
-   }
-  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
-   && h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
-   && h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+  return __detail::equivalent_win32(p1.c_str(), p2.c_str(), ec);
  #else
return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
  #endif
diff --git a/libstdc++-v3/src/filesystem/

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-31 Thread Prathamesh Kulkarni



> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: Tuesday, July 30, 2024 4:44 PM
> To: Jakub Jelinek ; Richard Biener
> 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org
> Subject: RE: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> 
> > -Original Message-
> > From: Jakub Jelinek 
> > Sent: Tuesday, July 30, 2024 3:16 PM
> > To: Richard Biener 
> > Cc: Richard Sandiford ; Prathamesh
> Kulkarni
> > ; gcc-patches@gcc.gnu.org
> > Subject: Re: Support streaming of poly_int for offloading when it's
> > degree <= accel's NUM_POLY_INT_COEFFS
> >
> >
> > On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> > > Only "relevant" stuff should be streamed - the offload code and all
> > > trees referred to.
> >
> > Yeah.
> >
> > > > > I think all current issues are because of poly-* leaking in
> for
> > > > > cases where a non-poly would have worked fine, but I have not
> > had
> > > > > a look myself.
> > > >
> > > > One of the cases that Prathamesh mentions is streaming the mode
> > sizes.
> > > > Are those modes "offload target modes" or "host modes"?  It
> seems
> > > > like it shouldn't be an error for the host to have VLA modes per
> > se.
> > > > It's just that those modes can't be used in the host/offload
> > interface.
> > >
> > > There's a requirement that a mode mapping exists from the host to
> > > target enum machine_mode.  I don't remember exactly how we compute
> > > that mapping and whether streaming of some data (and thus poly-
> int)
> > > are part of this.
> >
> > During streaming out, the code records what machine modes are being
> > streamed (in streamer_mode_table).
> > For those modes (and their inner modes) then lto_write_mode_table
> > should stream a table with mode details like class, bits, size,
> inner
> > mode, nunits, real mode format if any, etc.
> > That table is then streamed in in the offloading compiler and it
> > attempts to find corresponding modes (and emits fatal_error if there
> > is no such mode; consider say x86_64 long double with XFmode being
> > used in offloading code which doesn't have XFmode support).
> > Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int
> > rather than int, this has been changed to use bp_pack_poly_value;
> but
> > that relies on the same number of coefficients for poly_int, which
> is
> > not the case when e.g. offloading aarch64 to gcn or nvptx.
> Indeed, for the minimal test:
> int main()
> {
>   int x;
>   #pragma omp target map (to: x)
>   {
> x = 0;
>   }
>   return x;
> }
> 
> Streaming out mode_table from AArch64 shows:
> mode = SI, mclass = 2, size = 4, prec = 32 mode = DI, mclass = 2, size
> = 8, prec = 64
> 
> While streaming-in for nvptx shows:
> mclass = 2, size = 4, prec = 0
> 
> The discrepancy happens because of differing value of
> NUM_POLY_INT_COEFFS between AArch64 and nvptx.
> From AArch64 it streams out size and prec as <4, 0> and <32, 0>
> respectively, where 0 comes from coeffs[1].
> While streaming-in from nvptx, since NUM_POLY_INT_COEFFS is 1, it
> incorrectly reads size as 4, and prec as 0.
> >
> > From what I can see, this mode table handling are the only uses of
> > bp_pack_poly_value.  So the options are either to stream at the
> start
> > of the mode table the NUM_POLY_INT_COEFFS value and in
> > bp_unpack_poly_value pass to it what we've read and fill in any
> > remaining coeffs with zeros, or in each bp_pack_poly_value stream
> the
> > number of coefficients and then stream that back in and fill in
> > remaining ones (and diagnose if it would try to read non-zero
> > coefficient which isn't stored).
> This is the approach taken in proposed patch (stream-out degree of
> poly_int followed by coeffs).
> 
> > I think streaming NUM_POLY_INT_COEFFS once would be more compact (at
> > least for non-aarch64/riscv targets).
> I will try implementing this, thanks.
Hi,
The attached patch streams out NUM_POLY_INT_COEFFS only once, at the beginning
of mode_table, which should make the LTO bytecode more compact for non-VLA
hosts. It also changes streaming-in of poly_int as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
  for (i = 0; i < host_num_poly_int_coeffs; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Set remaining coeffs to zero (like zero-extension).  */
  for (; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = 0;
}
else
{
  for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
  for (; i < host_num_poly_int_coeffs; i++)
    {
      val = stream_in coeff;
      if (val != 0)
        error ();
    }
}
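
For illustration only, the stream-in logic above can be written as standalone
C++. This is a sketch under assumptions, not the code from the patch:
read_coeffs, the std::vector standing in for the LTO input stream, and the
exception used in place of GCC's fatal_error are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for streaming in one poly_int on the accelerator.
// STREAM holds the coefficients the host wrote, POS is the read cursor,
// HOST_N is the host's NUM_POLY_INT_COEFFS and ACCEL_N plays the role of
// the accelerator's NUM_POLY_INT_COEFFS.
template <std::size_t ACCEL_N>
std::array<std::int64_t, ACCEL_N>
read_coeffs (const std::vector<std::int64_t> &stream, std::size_t &pos,
             std::size_t host_n)
{
  assert (pos + host_n <= stream.size ());

  // Value-initialized, so coefficients the host did not send stay zero.
  std::array<std::int64_t, ACCEL_N> coeffs{};

  // Always consume HOST_N values so the cursor stays in sync with the host.
  for (std::size_t i = 0; i < host_n; ++i)
    {
      std::int64_t val = stream[pos++];
      if (i < ACCEL_N)
        coeffs[i] = val;
      else if (val != 0)
        throw std::runtime_error ("poly_int degree exceeds accel coeffs");
    }
  return coeffs;
}
```

With the AArch64 data quoted earlier (size <4, 0> and prec <32, 0>, so
host_n = 2), two calls to read_coeffs<1> give 4 and 32, because each call
consumes both host coefficients instead of one.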

There are a couple of issues in the patch:
(1) The patch streams out NUM_POLY_INT_COEFFS at beginning of mode_table, which 
should work for bp_unpack_poly_value,
(since AFAI

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-31 Thread Prathamesh Kulkarni



> -----Original Message-----
> From: Tobias Burnus 
> Sent: Tuesday, July 30, 2024 6:08 PM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> Prathamesh Kulkarni wrote:
> > Thanks for your suggestions on RFC email, the attached patch adds
> > support for streaming of poly_int when its degree <= accel's
> > NUM_POLY_INT_COEFFS.
> 
> First, thanks a lot for your patch!
> 
> Secondly, it seems as if this patch is intended to fully or partially
> fix the following PRs.
> If so, can you add the PR to the commit log such that both "git log"
> will help finding the problem report and the commit will show up in
> the issue?
Hi Tobias,
Thanks for the pointers to relevant Bugzilla PRs! I have included them in my 
latest patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658866.html

Thanks,
Prathamesh
> 
> 
> https://gcc.gnu.org/PR111937
>PR ipa/111937
>offloading from x86_64-linux-gnu to riscv*-linux-gnu will have
> issues
> 
> https://gcc.gnu.org/PR96265
>PR ipa/96265
>offloading to nvptx-none from aarch64-linux-gnu (and
> riscv*-linux-gnu) does not work
> 
> And - marked as duplicate of the latter:
> 
> https://gcc.gnu.org/PR114174
>PR lto/114174
>[aarch64] Offloading to nvptx-none
> 
> Thanks,
> 
> Tobias

[COMMITTED PATCH 1/5] testsuite: libgomp: fix dg-do run typo

2024-07-31 Thread Sam James

'dg-run' is not a valid dejagnu directive, 'dg-do run' is needed here
for the test to be executed.

That said, it actually seems to be executed for me anyway, presumably
a default in the directory, but let's fix it to be consistent with
other uses in the tree and in that test directory even.

libgomp/ChangeLog:
* testsuite/libgomp.c++/declare-target-indirect-1.C: Fix 'dg-run' typo.
---
Committed as obvious.

 libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C 
b/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
index 1eac6b3fa96b..bd84b492feec 100644
--- a/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
+++ b/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
@@ -1,4 +1,4 @@
-// { dg-run }
+// { dg-do run }
 
 #pragma omp begin declare target indirect
 class C

-- 
2.45.2

[COMMITTED PATCH 2/5] testsuite: fix 'dg-do-compile' typos

2024-07-31 Thread Sam James

We want 'dg-do compile', not 'dg-do-compile'. Fix that.

PR target/69194
PR c++/92024
PR c++/110057
* c-c++-common/Wshadow-1.c: Fix 'dg-do compile' typo.
* g++.dg/tree-ssa/devirt-array-destructor-1.C: Likewise.
* g++.dg/tree-ssa/devirt-array-destructor-2.C: Likewise.
* gcc.target/arm/pr69194.c: Likewise.
---
Committed as obvious.

 gcc/testsuite/c-c++-common/Wshadow-1.c| 2 +-
 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C | 2 +-
 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C | 2 +-
 gcc/testsuite/gcc.target/arm/pr69194.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/Wshadow-1.c 
b/gcc/testsuite/c-c++-common/Wshadow-1.c
index 4d1edf07f002..3cd99e9087ec 100644
--- a/gcc/testsuite/c-c++-common/Wshadow-1.c
+++ b/gcc/testsuite/c-c++-common/Wshadow-1.c
@@ -1,4 +1,4 @@
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* { dg-additional-options "-Wshadow=local -Wno-shadow=compatible-local" } */
 int c;
 void foo(int *c, int *d)   /* { dg-bogus   "Wshadow" } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
index ce8dc2a57cd7..eed9a7c17698 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
@@ -1,5 +1,5 @@
 // PR c++/110057
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* Virtual calls should be devirtualized because we know dynamic type of 
object in array at compile time */
 /* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
 
diff --git a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
index 6b44dc1a4eea..448f3739700f 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
@@ -1,5 +1,5 @@
 // PR c++/110057
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* Virtual calls should be devirtualized because we know dynamic type of 
object in array at compile time */
 /* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
 
diff --git a/gcc/testsuite/gcc.target/arm/pr69194.c 
b/gcc/testsuite/gcc.target/arm/pr69194.c
index 477d5f92c8ec..dc1b0d306c2b 100644
--- a/gcc/testsuite/gcc.target/arm/pr69194.c
+++ b/gcc/testsuite/gcc.target/arm/pr69194.c
@@ -1,5 +1,5 @@
 /* PR target/69194 */
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* { dg-require-effective-target arm_neon_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_neon } */
-- 
2.45.2

[COMMITTED PATCH 3/5] testsuite: fix 'dg-do-preprocess' typo

2024-07-31 Thread Sam James

We want 'dg-do preprocess', not 'dg-do-preprocess'. Fix that.

PR target/106828
* g++.target/loongarch/pr106828.C: Fix 'dg-do preprocess' typo.
---
Committed as obvious.

 gcc/testsuite/g++.target/loongarch/pr106828.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/loongarch/pr106828.C 
b/gcc/testsuite/g++.target/loongarch/pr106828.C
index 190c1db715f4..0d13cbbd5153 100644
--- a/gcc/testsuite/g++.target/loongarch/pr106828.C
+++ b/gcc/testsuite/g++.target/loongarch/pr106828.C
@@ -1,4 +1,4 @@
-/* { dg-do-preprocess } */
+/* { dg-do preprocess } */
 /* { dg-options "-mabi=lp64d -fsanitize=address" } */
 
 /* Tests whether the compiler supports compile option '-fsanitize=address'.  */
-- 
2.45.2

[COMMITTED PATCH 4/5] testsuite: fix dg-require-effective-target order vs dg-additional-sources

2024-07-31 Thread Sam James

Per gccint, 'dg-require-effective-target' must come before any
'dg-additional-sources' directives. Fix a handful of deviant cases.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/aapcs64/func-ret-3.c: Fix 
dg-require-effective-target directive order.
* gcc.target/aarch64/aapcs64/func-ret-4.c: Likewise.
* gfortran.dg/PR100914.f90: Likewise.

libgomp/ChangeLog:
* testsuite/libgomp.c++/pr24455.C: Fix dg-require-effective-target 
directive order.
* testsuite/libgomp.c/pr24455.c: Likewise.
---
Committed as obvious.

 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c | 2 +-
 gcc/testsuite/gfortran.dg/PR100914.f90| 2 +-
 libgomp/testsuite/libgomp.c++/pr24455.C   | 2 +-
 libgomp/testsuite/libgomp.c/pr24455.c | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
index 1d35ebf14b4b..ebd2e8dd8791 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
@@ -4,9 +4,9 @@
in AAPCS64 \S 4.3.5.  */
 
 /* { dg-do run { target aarch64-*-* } } */
+/* { dg-require-effective-target aarch64_big_endian } */
 /* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
-/* { dg-require-effective-target aarch64_big_endian } */
 
 #ifndef IN_FRAMEWORK
 #define TESTFILE "func-ret-3.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
index 15e1408c62d7..03d42f3dd047 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
@@ -5,9 +5,9 @@
are treated as general composite types.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-require-effective-target aarch64_big_endian } */
 /* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
-/* { dg-require-effective-target aarch64_big_endian } */
 
 #ifndef IN_FRAMEWORK
 #define TESTFILE "func-ret-4.c"
diff --git a/gcc/testsuite/gfortran.dg/PR100914.f90 
b/gcc/testsuite/gfortran.dg/PR100914.f90
index 8588157e59c0..161f1265fa21 100644
--- a/gcc/testsuite/gfortran.dg/PR100914.f90
+++ b/gcc/testsuite/gfortran.dg/PR100914.f90
@@ -1,7 +1,7 @@
 ! Fails on x86 targets where sizeof(long double) == 16.
 ! { dg-do run }
-! { dg-additional-sources PR100914.c }
 ! { dg-require-effective-target fortran_real_c_float128 }
+! { dg-additional-sources PR100914.c }
 ! { dg-additional-options "-Wno-pedantic" }
 !
 ! Test the fix for PR100914
diff --git a/libgomp/testsuite/libgomp.c++/pr24455.C 
b/libgomp/testsuite/libgomp.c++/pr24455.C
index 8256b6693c8f..9816d37461a5 100644
--- a/libgomp/testsuite/libgomp.c++/pr24455.C
+++ b/libgomp/testsuite/libgomp.c++/pr24455.C
@@ -1,6 +1,6 @@
 // { dg-do run }
-// { dg-additional-sources pr24455-1.C }
 // { dg-require-effective-target tls_runtime }
+// { dg-additional-sources pr24455-1.C }
 // { dg-options "-fno-extern-tls-init" }
 
 extern "C" void abort (void);
diff --git a/libgomp/testsuite/libgomp.c/pr24455.c 
b/libgomp/testsuite/libgomp.c/pr24455.c
index 8af449e7b5c3..4284c1095293 100644
--- a/libgomp/testsuite/libgomp.c/pr24455.c
+++ b/libgomp/testsuite/libgomp.c/pr24455.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
-/* { dg-additional-sources pr24455-1.c } */
 /* { dg-require-effective-target tls_runtime } */
+/* { dg-additional-sources pr24455-1.c } */
 
 extern void abort (void);
 
-- 
2.45.2

[COMMITTED PATCH 5/5] testsuite: fix dg-require-* order vs dg-additional-sources

2024-07-31 Thread Sam James

Per gccint, 'dg-require-*' must come before any
'dg-additional-sources' directives. Fix a handful of deviant cases.

* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: Fix 
dg-require-profiling
directive order.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: Likewise.
---
Committed as obvious.

 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c | 2 +-
 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c 
b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
index b57d30f91637..f6ec71a9298d 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target lto } */
-/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate" } */
 
 #ifdef FOR_AUTOFDO_TESTING
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c 
b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
index 6b5ae93214a5..2ace3c3b9bf1 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target lto } */
-/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate" } */
 
 #ifdef FOR_AUTOFDO_TESTING
-- 
2.45.2

Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-31 Thread Jakub Jelinek

On Wed, Jul 31, 2024 at 02:58:34PM +, Prathamesh Kulkarni wrote:
> There are a couple of issues in the patch:
> (1) The patch streams out NUM_POLY_INT_COEFFS at beginning of mode_table, 
> which should work for bp_unpack_poly_value,
> (since AFAIK, it's only called by lto_input_mode_table). However, I am not 
> sure if we will always call lto_input_mode_table
> before streaming in poly_int64 / poly_uint64 ? Or should we stream out host 
> NUM_POLY_INT_COEFFS at a different place in LTO bytecode ?

The poly_ints unpacked in lto_input_mode_table obviously are done after
that.
If you use it for streaming in from other sections, you need to check if
they can't be read before the mode table.
And, you don't really need to stream out/in the number for non-offloading
LTO, that should use just NUM_POLY_INT_COEFFS.

> --- a/gcc/data-streamer-in.cc
> +++ b/gcc/data-streamer-in.cc
> @@ -183,9 +183,7 @@ poly_uint64
>  streamer_read_poly_uint64 (class lto_input_block *ib)
>  {
>poly_uint64 res;
> -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> -res.coeffs[i] = streamer_read_uhwi (ib);
> -  return res;
> +  POLY_INT_READ_COMMON(res, streamer_read_uhwi (ib))

Why is this macro and not an inline function or inline function template
or inline function calling a lambda?
Even if it has to be a macro (I don't see why), it should be defined such
that you need to add ; at the end, ideally not include the return res;
in there because it is just too weird if used like that (or make it return
what will be returned and use return POLY_INT_READ_COMMON...)
and there needs to be a space in between COMMON and (.
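
As a rough sketch of the function-template alternative (not taken from the
patch): poly_read_common, fake_poly, num_coeffs and read_poly_int64 below
are hypothetical names, and fake_poly is only a simplified stand-in for
poly_int64 / poly_uint64.

```cpp
#include <cstdint>

constexpr unsigned num_coeffs = 2;  // stand-in for NUM_POLY_INT_COEFFS

// Simplified stand-in for poly_int64 / poly_uint64.
template <typename C>
struct fake_poly
{
  C coeffs[num_coeffs];
};

// Shared loop: READ_COEFF is any callable returning the next coefficient.
template <typename C, typename ReadCoeff>
fake_poly<C>
poly_read_common (ReadCoeff read_coeff)
{
  fake_poly<C> res{};
  for (unsigned i = 0; i < num_coeffs; ++i)
    res.coeffs[i] = read_coeff ();
  return res;
}

// Hypothetical caller in the role of streamer_read_poly_int64: the lambda
// wraps whatever reads one signed coefficient from the input.
inline fake_poly<std::int64_t>
read_poly_int64 (const std::int64_t *&in)
{
  return poly_read_common<std::int64_t> ([&in] { return *in++; });
}
```

The caller keeps an explicit return statement, and the signed and unsigned
readers would differ only in the lambda they pass.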

> @@ -194,9 +192,7 @@ poly_int64
>  streamer_read_poly_int64 (class lto_input_block *ib)
>  {
>poly_int64 res;
> -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> -res.coeffs[i] = streamer_read_hwi (ib);
> -  return res;
> +  POLY_INT_READ_COMMON(res, streamer_read_hwi (ib))
>  }

Ditto.
> +   __typeof(x.coeffs[0]) val = streamer_read_coeff;  \

You certainly can't use a GCC extension like __typeof here.
Plus missing space.

> +   if (val != 0) \
> + fatal_error (input_location,\
> +  "Degree of % exceeds "  \

Diagnostics shouldn't start with uppercase letter.

> +  "% (%d)",\
> +  NUM_POLY_INT_COEFFS);  \
> + }   \
> +}
> \
> + \
> +  return x;  \
> +}
> +
> --- a/gcc/poly-int.h
> +++ b/gcc/poly-int.h
> @@ -354,6 +354,10 @@ struct poly_result
> ? (void) ((RES).coeffs[I] = VALUE) \
> : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>  
> +/* Number of bits needed to represent maximum value of
> +   NUM_POLY_INT_COEFFS defined by any target.  */
> +#define MAX_NUM_POLY_INT_COEFFS_BITS (2)

Why (2) and not just 2?
There should be some static_assert to make sure it is a maximum for any
target.
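
A minimal sketch of such a check, reusing the macro name from the quoted
hunk. The value given to NUM_POLY_INT_COEFFS here is only a stand-in for
what a target would define, and the exact condition is an assumption:

```cpp
// Stand-in: in GCC this value comes from the target configuration.
#define NUM_POLY_INT_COEFFS 2
#define MAX_NUM_POLY_INT_COEFFS_BITS 2

// The coefficient count must be representable in the bits reserved for it.
static_assert (NUM_POLY_INT_COEFFS < (1 << MAX_NUM_POLY_INT_COEFFS_BITS),
               "NUM_POLY_INT_COEFFS does not fit in "
               "MAX_NUM_POLY_INT_COEFFS_BITS");
```

With 2 bits the largest count that can be streamed is 3, so a target
defining a larger NUM_POLY_INT_COEFFS would fail to build instead of
silently truncating.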

> +   if (!integer_zerop (val))
> + fatal_error (input_location,
> +  "Degree of % exceeds "

Again.
> +  "%");
> + }
> +}
>  }

Jakub

Re: [PATCH] Fix overwriting files with fs::copy_file on windows

2024-07-31 Thread Jonathan Wakely

On Wed, 31 Jul 2024 at 15:42, Björn Schäpers  wrote:
>
> Am 30.07.2024 um 11:13 schrieb Jonathan Wakely:
> > On Sun, 24 Mar 2024 at 21:34, Björn Schäpers  wrote:
> >>
> >> From: Björn Schäpers 
> >>
> >> This fixes e.g. https://github.com/msys2/MSYS2-packages/issues/1937
> >> I don't know if I picked the right way to do it.
> >>
> >> When acceptable I think the declaration should be moved into
> >> ops-common.h, since then we could use stat_type and also use that in the
> >> commonly used function.
> >>
> >> Manually tested on i686-w64-mingw32.
> >>
> >> -- >8 --
> >> libstdc++: Fix overwriting files on windows
> >>
> >> The inodes have no meaning on windows, thus all files have an inode of
> >> 0. Use a different approach to identify equivalent files. As a result
> >> std::filesystem::copy_file did not honor
> >> copy_options::overwrite_existing. Factored the method out of
> >> std::filesystem::equivalent.
> >>
> >> libstdc++-v3/Changelog:
> >>
> >>  * include/bits/fs_ops.h: Add declaration of
> >>__detail::equivalent_win32.
> >>  * src/c++17/fs_ops.cc (__detail::equivalent_win32): Implement it
> >>  (fs::equivalent): Use __detail::equivalent_win32, factored the
> >>  old test out.
> >>  * src/filesystem/ops-common.h (_GLIBCXX_FILESYSTEM_IS_WINDOWS):
> >>Use the function.
> >>
> >> Signed-off-by: Björn Schäpers 
> >> ---
> >>   libstdc++-v3/include/bits/fs_ops.h   |  8 +++
> >>   libstdc++-v3/src/c++17/fs_ops.cc | 79 +---
> >>   libstdc++-v3/src/filesystem/ops-common.h | 10 ++-
> >>   3 files changed, 60 insertions(+), 37 deletions(-)
> >>
> >> diff --git a/libstdc++-v3/include/bits/fs_ops.h 
> >> b/libstdc++-v3/include/bits/fs_ops.h
> >> index 90650c47b46..d10b78a4bdd 100644
> >> --- a/libstdc++-v3/include/bits/fs_ops.h
> >> +++ b/libstdc++-v3/include/bits/fs_ops.h
> >> @@ -40,6 +40,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >>   namespace filesystem
> >>   {
> >> +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +namespace __detail
> >> +{
> >> +  bool
> >> +  equivalent_win32(const wchar_t* p1, const wchar_t* p2, error_code& ec);
> >> +} // namespace __detail
> >> +#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +
> >> /** @addtogroup filesystem
> >>  *  @{
> >>  */
> >> diff --git a/libstdc++-v3/src/c++17/fs_ops.cc 
> >> b/libstdc++-v3/src/c++17/fs_ops.cc
> >> index 61df19753ef..3cc87d45237 100644
> >> --- a/libstdc++-v3/src/c++17/fs_ops.cc
> >> +++ b/libstdc++-v3/src/c++17/fs_ops.cc
> >> @@ -67,6 +67,49 @@
> >>   namespace fs = std::filesystem;
> >>   namespace posix = std::filesystem::__gnu_posix;
> >>
> >> +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +bool
> >> +fs::__detail::equivalent_win32(const wchar_t* p1, const wchar_t* p2,
> >> +  error_code& ec)
> >> +{
> >> +  struct auto_handle {
> >> +explicit auto_handle(const path& p_)
> >> +: handle(CreateFileW(p_.c_str(), 0,
> >> +   FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
> >> +   0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
> >> +{ }
> >> +
> >> +~auto_handle()
> >> +{ if (*this) CloseHandle(handle); }
> >> +
> >> +explicit operator bool() const
> >> +{ return handle != INVALID_HANDLE_VALUE; }
> >> +
> >> +bool get_info()
> >> +{ return GetFileInformationByHandle(handle, &info); }
> >> +
> >> +HANDLE handle;
> >> +BY_HANDLE_FILE_INFORMATION info;
> >> +  };
> >> +  auto_handle h1(p1);
> >> +  auto_handle h2(p2);
> >> +  if (!h1 || !h2)
> >> +{
> >> +  if (!h1 && !h2)
> >> +   ec = __last_system_error();
> >> +  return false;
> >> +}
> >> +  if (!h1.get_info() || !h2.get_info())
> >> +{
> >> +  ec = __last_system_error();
> >> +  return false;
> >> +}
> >> +  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
> >> +&& h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
> >> +&& h1.info.nFileIndexLow == h2.info.nFileIndexLow;
> >> +}
> >> +#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +
> >>   fs::path
> >>   fs::absolute(const path& p)
> >>   {
> >> @@ -858,41 +901,7 @@ fs::equivalent(const path& p1, const path& p2, 
> >> error_code& ec) noexcept
> >> if (st1.st_mode != st2.st_mode || st1.st_dev != st2.st_dev)
> >>  return false;
> >>
> >> -  struct auto_handle {
> >> -   explicit auto_handle(const path& p_)
> >> -   : handle(CreateFileW(p_.c_str(), 0,
> >> - FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
> >> - 0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
> >> -   { }
> >> -
> >> -   ~auto_handle()
> >> -   { if (*this) CloseHandle(handle); }
> >> -
> >> -   explicit operator bool() const
> >> -   { return handle != INVALID_HANDLE_VALUE; }
> >> -
> >> -   bool get_info()
> >> -   { return GetFileInformationByHandle(handle, &info); }
> >> -
> >> -   HANDLE handle;
> >> -   BY_HANDLE_FILE_INFORMA

[PATCH] dir-locals: apply our C settings in C++ also

2024-07-31 Thread Arsen Arsenović

We haven't been applying our settings to our C++.  This patch fixes
that.

Sadly, it seems that the only documented way to apply settings to
multiple modes is to repeat them.  I thought that we can provide a list
of modes to apply, but that seems to not be the case (even though it
happened to work on my machine).

As a result, C-h C-v fill-column now shows:

  This variable’s value is directory-local, set by the file
  ‘/home/arsen/gcc/pristine/.dir-locals.el’.

As this could affect people's workflows, I'm posting as a heads-up and
sanity check.

OK for trunk?

TIA, have a lovely day.
-- >8 --
This also works with Emacs 30 Tree-Sitter C and C++ modes, as they are
submodes.

ChangeLog:

* .dir-locals.el: Change c-mode to a list of C, C++ and ObjC
modes that Emacs currently provides.
---
 .dir-locals.el | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/.dir-locals.el b/.dir-locals.el
index fa031cbded99..2c12b3866633 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -18,6 +18,10 @@
  (tcl-continued-indent-level . 4)
  (indent-tabs-mode . t)))
  (nil . ((bug-reference-url-format . "https://gcc.gnu.org/PR%s")))
+ ;; Please keep C and C++ in sync.
  (c-mode . ((c-file-style . "GNU")
(indent-tabs-mode . t)
-   (fill-column . 79))))
+   (fill-column . 79)))
+ (c++-mode . ((c-file-style . "GNU")
+ (indent-tabs-mode . t)
+ (fill-column . 79))))
-- 
2.45.2

Re: [PATCH] middle-end/114563 - improve release_pages

2024-07-31 Thread Andi Kleen

On Wed, Jul 31, 2024 at 04:02:22PM +0200, Richard Biener wrote:
> The following improves release_pages when using the madvise path
> to sort the freelist to get more page entries contiguous and possibly
> release them.  This populates the unused prev pointer so the reclaim
> can then easily unlink from the freelist without re-ordering it.
> The paths not having madvise do not keep the memory allocated, so
> I left them untouched.
> 
> Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> I've CCed people messing with release_pages;  This doesn't really
> address PR114563 but I thought I post this patch anyway - the
> actual issue we run into for the PR is the linear search of
> G.free_pages when that list becomes large but a requested allocation
> cannot be served from it.
> 
>   PR middle-end/114563
>   * ggc-page.cc (page_sort): New qsort comparator.
>   (release_pages): Sort the free_pages list entries after their
>   memory block virtual address to improve contiguous memory
>   chunk release.

I saw this in a profile some time ago and tried it with a slightly
different patch. Instead of a full sort it uses an array to keep
multiple free lists. But I couldn't find any speed ups in non checking
builds later.

My feeling is that an array is probably more efficient.

I guess we should compare both on that PR.


diff --git a/gcc/ggc-page.cc b/gcc/ggc-page.cc
index 4245f843a29f..af1627b002c6 100644
--- a/gcc/ggc-page.cc
+++ b/gcc/ggc-page.cc
@@ -234,6 +234,8 @@ static struct
 }
 inverse_table[NUM_ORDERS];
 
+struct free_list;
+
 /* A page_entry records the status of an allocation page.  This
structure is dynamically sized to fit the bitmap in_use_p.  */
 struct page_entry
@@ -251,6 +253,9 @@ struct page_entry
  of the host system page size.)  */
   size_t bytes;
 
+  /* Free list of this page size.  */
+  struct free_list *free_list;
+
   /* The address at which the memory is allocated.  */
   char *page;
 
@@ -368,6 +373,15 @@ struct free_object
 };
 #endif
 
+constexpr int num_free_list = 8;
+
+/* A free_list for pages with BYTES size.  */
+struct free_list
+{
+  size_t bytes;
+  page_entry *free_pages;
+};
+
 /* The rest of the global variables.  */
 static struct ggc_globals
 {
@@ -412,8 +426,8 @@ static struct ggc_globals
   int dev_zero_fd;
 #endif
 
-  /* A cache of free system pages.  */
-  page_entry *free_pages;
+  /* A cache of free system pages. Entry 0 is fallback.  */
+  struct free_list free_lists[num_free_list];
 
 #ifdef USING_MALLOC_PAGE_GROUPS
   page_group *page_groups;
@@ -754,6 +768,26 @@ clear_page_group_in_use (page_group *group, char *page)
 }
 #endif
 
+/* Find a free list for ENTRY_SIZE.  */
+
+static inline struct free_list *
+find_free_list (size_t entry_size)
+{
+  int i;
+  for (i = 1; i < num_free_list; i++)
+{
+  if (G.free_lists[i].bytes == entry_size)
+   return &G.free_lists[i];
+  if (G.free_lists[i].bytes == 0)
+   {
+ G.free_lists[i].bytes = entry_size;
+ return &G.free_lists[i];
+   }
+}
+  /* Fallback.  */
+  return &G.free_lists[0];
+}
+
 /* Allocate a new page for allocating objects of size 2^ORDER,
and return an entry for it.  The entry is not added to the
appropriate page_table list.  */
@@ -770,6 +804,7 @@ alloc_page (unsigned order)
 #ifdef USING_MALLOC_PAGE_GROUPS
   page_group *group;
 #endif
+  struct free_list *free_list;
 
   num_objects = OBJECTS_PER_PAGE (order);
   bitmap_size = BITMAP_SIZE (num_objects + 1);
@@ -782,8 +817,10 @@ alloc_page (unsigned order)
   entry = NULL;
   page = NULL;
 
+  free_list = find_free_list (entry_size);
+
   /* Check the list of free pages for one we can use.  */
-  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
+  for (pp = &free_list->free_pages, p = *pp; p; pp = &p->next, p = *pp)
 if (p->bytes == entry_size)
   break;
 
@@ -816,7 +853,7 @@ alloc_page (unsigned order)
   /* We want just one page.  Allocate a bunch of them and put the
 extras on the freelist.  (Can only do this optimization with
 mmap for backing store.)  */
-  struct page_entry *e, *f = G.free_pages;
+  struct page_entry *e, *f = free_list->free_pages;
   int i, entries = GGC_QUIRE_SIZE;
 
   page = alloc_anon (NULL, G.pagesize * GGC_QUIRE_SIZE, false);
@@ -833,12 +870,13 @@ alloc_page (unsigned order)
  e = XCNEWVAR (struct page_entry, page_entry_size);
  e->order = order;
  e->bytes = G.pagesize;
+ e->free_list = free_list;
  e->page = page + (i << G.lg_pagesize);
  e->next = f;
  f = e;
}
 
-  G.free_pages = f;
+  free_list->free_pages = f;
 }
   else
 page = alloc_anon (NULL, entry_size, true);
@@ -904,12 +942,13 @@ alloc_page (unsigned order)
  e = XCNEWVAR (struct page_entry, page_entry_size);
  e->order = order;
  e->bytes = G.pagesize;
+ e->free_list = free_list;

Re: [RFH PATCH] c++: Implement C++26 P2963R3 - Ordering of constraints involving fold expressions [PR115746]

2024-07-31 Thread Patrick Palka

On Tue, 30 Jul 2024, Jason Merrill wrote:

> On 7/29/24 5:32 PM, Patrick Palka wrote:
> > On Mon, 29 Jul 2024, Jakub Jelinek wrote:
> > 
> > > On Fri, Jul 26, 2024 at 06:00:12PM -0400, Patrick Palka wrote:
> > > > On Fri, 26 Jul 2024, Jakub Jelinek wrote:
> > > > 
> > > > > On Fri, Jul 26, 2024 at 04:42:36PM -0400, Patrick Palka wrote:
> > > > > > > // P2963R3 - Ordering of constraints involving fold expressions
> > > > > > > // { dg-do compile { target c++20 } }
> > > > > > > 
> > > > > > > template  concept C = (__is_same (T, int) && ...);
> > > > > > > template 
> > > > > > > struct S {
> > > > > > >template  requires (C)
> > > > > > >static constexpr bool foo () { return true; }
> > > > > > > };
> > > > > > > 
> > > > > > > static_assert (S::foo  ());
> > > > > > > 
> > > > > > > somehow the template parameter mapping needs to be remembered even
> > > > > > > for the
> > > > > > > fold expanded constraint, right now the patch will see the pack is
> > > > > > > T,
> > > > > > > which is level 1 index 0, but args aren't arguments of the C
> > > > > > > concept,
> > > > > > > but of the foo function template.
> > > > > > > One can also use requires (C) etc., no?
> > > > > > 
> > > > > > It seems the problem is FOLD_EXPR_PACKS is currently set to the
> > > > > > parameter packs used inside the non-normalized constraints, but I
> > > > > > think
> > > > > > what we really need are the packs used in the normalized
> > > > > > constraints,
> > > > > > specifically the packs used in the target of each parameter mapping
> > > > > > of
> > > > > > each atomic constraint?
> > > > > 
> > > > > But in that case there might be no packs at all.
> > > > > 
> > > > > template  C = true;
> > > > > template  requires (C && ...)
> > > > > constexpr bool foo () { return true; }
> > > > > 
> > > > > If normalized C is just true, it doesn't use any packs.
> > > > > But the [temp.constr.fold] wording assumes it is a pack expansion and
> > > > > that
> > > > > there is at least one pack expansion parameter, otherwise N wouldn't
> > > > > be
> > > > > defined.
> > > > 
> > > > Hmm yeah, I see what you mean.  That seems to be an edge case that's not
> > > > fully accounted for by the wording.
> 
> I agree the wording is unclear, but it seems necessary to me that T is a pack
> expansion parameter, even if it isn't mentioned by the normalized constraint.
> 
> > > > One thing that's unclear to me in that wording is what are the pack
> > > > expansion parameters of a fold expanded constraint.
> > > > 
> > > > In
> > > > 
> > > >template concept C = (__is_same (T, int) && ...);
> > > >template
> > > >void f() requires C;
> > > > 
> > > > is the pack expansion parameter T or V?  In
> > > > 
> > > >template concept C = (__is_same (T, int) && ...);
> > > >template
> > > >void g() requires C;
> > > > 
> > > > it must be T.  So I guess in both cases it must be T.  But then I reckon
> > > > when [temp.constr.fold] mentions "pack expansion parameter(s)" what it
> > > > really means is "target of each pack expansion parameter within the
> > > > parameter mapping"...
> 
> Yeah.
> 
> In the paper a fold expanded constraint doesn't have a parameter mapping, only
> atomic constraints do.  Within the normal form of (__is_same (T, int) && ...)
> we have a single atomic constraint with parameter mapping T -> T, which only
> comes into play when we're checking satisfaction for each element.
> 
> But that doesn't specify how the packs are established.  For many cases it's a
> simple matter of connecting one pack to another, so you could kind of handwave
> it, but it isn't that hard to come up with a testcase that isn't so simple,
> say
> 
> template concept C = (__is_same (T, int) && ...);
> template  struct A { };
> template 
> void g(A, A) requires C;
> 
> How is  expressed in the normalized constraints of g?

Couldn't the parameter mapping be just T -> {U..., V...}?  Ah but
then during satisfaction we somehow need to know to substitute the
elements of U and V serially instead of in parallel, i.e. not conflate
it with `requires (C) && ...)', while also respecting short
circuiting and all that...

> 
> > > So, shall we file some https://github.com/cplusplus/CWG/ issue about this?
> > > Whether the packs [temp.constr.fold] talks about are the normalized ones
> > > only (in that case what happens if there are no packs), or all packs
> > > mentioned (in that case, whether there shouldn't be also template
> > > parameter
> > > mappings on the fold expanded constraints like there are on the atomic
> > > constraints (for the unexpanded packs only)?
> 
> I think there should be parameter mappings for all parameter packs named in
> the fold-expression.  And I suppose for the other template parameters as well.
> 
> > Seems worth submitting an issue, but I'm not 100% sure about my
> > understanding of the paper's wording..  I wonder what Jason thinks.
> > 
> > > 
> > > Interesting testcases could be also:
> > > struct A  {};
>

[PATCH] libstdc++: drop bogus 'dg_do run' directive

2024-07-31 Thread Sam James

We already have a valid 'dg-do run' (- vs _) directive, so drop the bogus
one.

libstdc++-v3/ChangeLog:
* testsuite/28_regex/traits/char/translate.cc: Drop bogus 'dg_do run'.
---
OK? No regressions in the logs but it's a bit weird that it's got a proper
directive with a target specifier so I thought I'd check rather than doing
it as obvious.

 libstdc++-v3/testsuite/28_regex/traits/char/translate.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
index e2552e3cbf05..65119e67e25b 100644
--- a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
+++ b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
@@ -1,4 +1,3 @@
-// { dg_do run }
 // { dg-do run { target c++11 } }
 // { dg-timeout-factor 2 }
 

-- 
2.45.2

Re: [PATCH 2/3] [x86] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak

On Wed, Jul 31, 2024 at 3:40 PM Richard Biener  wrote:
>
> The following implements the hook, excluding x87 modes for scalar
> and complex float modes.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK this way?
>
> Thanks,
> Richard.
>
> * i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
> (ix86_mode_can_transfer_bits): New function.

OK.

Thanks for your efforts and your patience to resolve this issue!

Uros.

> ---
>  gcc/config/i386/i386.cc | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 12d15feb5e9..9869c44ee15 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26113,6 +26113,25 @@ ix86_have_ccmp ()
>return (bool) TARGET_APX_CCMP;
>  }
>
> +/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
> +static bool
> +ix86_mode_can_transfer_bits (machine_mode mode)
> +{
> +  if (GET_MODE_CLASS (mode) == MODE_FLOAT
> +  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
> +switch (GET_MODE_INNER (mode))
> +  {
> +  case SFmode:
> +  case DFmode:
> +   /* These suffer from normalization upon load when not using SSE.  */
> +   return !(ix86_fpmath & FPMATH_387);
> +  default:
> +   return true;
> +  }
> +
> +  return true;
> +}
> +
>  /* Target-specific selftests.  */
>
>  #if CHECKING_P
> @@ -26959,6 +26978,9 @@ ix86_libgcc_floating_mode_supported_p
>  #undef TARGET_HAVE_CCMP
>  #define TARGET_HAVE_CCMP ix86_have_ccmp
>
> +#undef TARGET_MODE_CAN_TRANSFER_BITS
> +#define TARGET_MODE_CAN_TRANSFER_BITS ix86_mode_can_transfer_bits
> +
>  static bool
>  ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
>  {
> --
> 2.43.0
>
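
For readers outside the thread, a rough illustration of the property the hook is about (this is not GCC code; the helper names are made up for the sketch).  A transfer "preserves bits" when the object representation survives the copy.  Going through a same-sized integer always does; the thread's concern is that an SFmode/DFmode load through the x87 may not, e.g. for a signalling NaN.  The sketch only exercises the integer route.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the four bytes of a float-sized object
   through a same-sized integer.  An integer move never inspects the
   payload, so every bit pattern survives, signalling NaNs included.  */
static uint32_t
copy_bits_via_int (const void *src)
{
  uint32_t tmp;
  memcpy (&tmp, src, sizeof tmp);
  return tmp;
}

/* Hypothetical check: does the copy reproduce the pattern BITS?  */
static int
bits_preserved (uint32_t bits)
{
  unsigned char raw[sizeof bits];
  memcpy (raw, &bits, sizeof raw);
  return copy_bits_via_int (raw) == bits;
}
```

With 0x7fa00000 (a single-precision signalling NaN pattern) the check holds on any target.  The hook returns false for SFmode and DFmode under x87 math precisely because the same guarantee may not hold for a copy done with a float load and store.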

[PATCH] RISC-V: Correct mode_idx attribute for viwalu wx variants [PR116149].

2024-07-31 Thread Robin Dapp

Hi,

in PR116149 we choose a wrong vector length which causes wrong values in
a reduction.  The problem happens in avlprop where we choose the
number of units in the instruction's mode as vector length.  For the
non-scalar variants the respective operand has the correct non-widened
mode.  For the scalar variants, however, the same operand has a scalar
mode which obviously only has one unit.  This makes us choose VL = 1
leaving three elements undisturbed (so potentially -1).  Those end up
in the reduction causing the wrong result.

This patch adjusts the mode_idx just for the scalar variants of the
affected instruction patterns.
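
A toy model of the failure, not RVV code: the function names, the fixed four-element vector and the -1 fill are assumptions chosen to match the description above.  Only the first VL elements are written; the tail keeps its old contents and then leaks into the reduction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLMAX 4

/* Toy model of a widening add with a scalar operand (vwadd.wx):
   only the first VL elements of DST are written, the rest keep
   whatever they held before (tail undisturbed).  */
static void
widen_add_wx (int32_t dst[VLMAX], const int32_t wide[VLMAX],
              int16_t scalar, size_t vl)
{
  for (size_t i = 0; i < vl && i < VLMAX; i++)
    dst[i] = wide[i] + (int32_t) scalar;
}

/* Sum all VLMAX elements, as the later reduction does.  */
static int32_t
reduce_sum (const int32_t v[VLMAX])
{
  int32_t sum = 0;
  for (size_t i = 0; i < VLMAX; i++)
    sum += v[i];
  return sum;
}

/* Run the add with the given VL on a destination pre-filled with -1
   and return the reduced result.  */
static int32_t
run_with_vl (size_t vl)
{
  int32_t dst[VLMAX] = { -1, -1, -1, -1 };
  const int32_t wide[VLMAX] = { 0, 0, 0, 0 };
  widen_add_wx (dst, wide, 0, vl);
  return reduce_sum (dst);
}
```

With VL = 4 every element is overwritten and the sum is 0; with VL = 1 three stale elements remain and the sum is -3.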

Regards
 Robin

gcc/ChangeLog:

PR target/116149

* config/riscv/vector.md: Fix mode_idx attribute of scalar
widen add/sub variants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116149.c: New test.
---
 gcc/config/riscv/vector.md |  2 ++
 .../gcc.target/riscv/rvv/autovec/pr116149.c| 18 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index bcedf3d79e2..d4d9bd87e91 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4016,6 +4016,7 @@ (define_insn "@pred_single_widen_add_extended_scalar"
   "TARGET_VECTOR"
   "vwadd.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "viwalu")
+   (set_attr "mode_idx" "3")
(set_attr "mode" "")])
 
 (define_insn "@pred_single_widen_sub_extended_scalar"
@@ -4038,6 +4039,7 @@ (define_insn "@pred_single_widen_sub_extended_scalar"
   "TARGET_VECTOR"
   "vwsub.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "viwalu")
+   (set_attr "mode_idx" "3")
(set_attr "mode" "")])
 
 (define_insn "@pred_widen_mulsu"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c
new file mode 100644
index 000..4f5927b96fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl256b -mabi=lp64d -mrvv-vector-bits=zvl" } */
+
+long a;
+short b[6];
+short c[20];
+int main() {
+  for (short d = 0; d < 20; d += 3) {
+c[d] = 0;
+for (int e = 0; e < 20; e += 2)
+  for (int f = 1; f < 20; f += 2)
+a += (unsigned)b[f + e];
+  }
+  if (a != 0)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-assembler-times "vsetivli\tzero,1" 0 } } */
-- 
2.45.2

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak

On Wed, Jul 31, 2024 at 11:33 AM Richard Biener  wrote:

> > > > > > OK. Richard, can you please mention the above in the comment why
> > > > > > XFmode is rejected in the hook?
> > > > > >
> > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory copy
> > > > > > to get some hard data.
> > > > >
> > > > > My (limited) understanding was that the hook would be used only for
> > > > > cases where we'd like to e.g. value number some SF/DF/XF etc. mode
> > > > > loads and some subsequent loads from the same address with different
> > > > > mode but same size the same and replace say int or long long later
> > > > > load with VIEW_CONVERT_EXPR of the result of the SF/SF mode load.
> > > > > That is what was incorrect, because the load didn't preserve all the
> > > > > bits.  The patch would still keep doing normal SF/DF/XF etc. mode
> > > > > copies if that is all that happens in the program, load some floating
> > > > > point value and store it elsewhere or as part of larger aggregate
> > > > > copy.
> > > >
> > > > So, the hook should allow everything besides SF/DFmode, simply:
> > > >
> > > >
> > > > switch (GET_MODE_INNER (mode))
> > > >   {
> > > >   case SFmode:
> > > >   case DFmode:
> > > > /* These suffer from normalization upon load when not using SSE.  */
> > > > return !(ix86_fpmath & FPMATH_387);
> > > >   default:
> > > > return true;
> > > >   }
> > >
> > > OK, I think I'll go with this then.  I'm now unsure whether the
> > > wrapper around the hook should reject modes with padding or if
> > > the supposed users (value-numbering and SRA) should deal with that
> > > issue separately.  I do wonder whether
> > >
> > > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
> > >   ? &ieee_extended_intel_128_format
> > >   : TARGET_96_ROUND_53_LONG_DOUBLE
> > >   ? &ieee_extended_intel_96_round_53_format
> > >   : &ieee_extended_intel_96_format));
> > > ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > >
> > > unambiguously specifies where the padding is - m68k has
> > >
> > > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
> > >
> > > It's also not clear we can model a x87 10 byte memory copy in RTL since
> > > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> > > possible padding as unspecified and not "masked out" even if
> > > the actual fstp will only store 10 bytes.
> >
> > The hardware will never touch bytes outside 10 bytes range, the
> > padding is some artificial compiler thingy, so IMO it should be
> > handled before the hook is called. Please find attached the source I
> > have used to confirm that a) the copied bits will never be mangled and
> > b) there is no access outside the 10 bytes range. (BTW: these
> > particular values are to test the effect of leading bit 63, the
> > non-hidden normalized bit).
>
> Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then,
> mode_base_align[XFmode] seems to be correctly set to ensure
> 12 bytes / 16 bytes "effective" size.

FTR, "long double" AKA __float80 is defined as a fundamental type in the psABI as:

sizeof 12, alignment 4 for i386 [1] and
sizeof 16, alignment 16 for x86_64 [2].

These values are thus set by ABI despite the fact that hardware
handles only 10 bytes.

[1] Table 2.1, page 8 of https://www.uclibc.org/docs/psABI-i386.pdf
[2] Figure 3.1, page 12 of
https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

Uros.
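
A small sketch for checking this on a given target, under the assumption that LDBL_MANT_DIG == 64 identifies the x87 extended format (it also matches the m68k format mentioned earlier).  The macro and function names are made up for the example.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Bytes that carry value bits in the x87 extended format: a 64-bit
   significand plus 16 bits of sign and exponent.  */
#define X87_VALUE_BYTES 10

/* Return how many padding bytes the ABI adds to long double, or 0 if
   long double does not have a 64-bit significand on this target.  */
static size_t
long_double_padding (void)
{
  if (LDBL_MANT_DIG != 64)
    return 0;
  return sizeof (long double) - X87_VALUE_BYTES;
}
```

Per the psABI figures above this would give 2 on i386 and 6 on x86_64; targets whose long double is a different format give 0.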

Re: [PATCH] libstdc++: drop bogus 'dg_do run' directive

2024-07-31 Thread Jonathan Wakely

On Wed, 31 Jul 2024 at 16:45, Sam James  wrote:
>
> We already have a valid 'dg-do run' (- vs _) directive, so drop the bogus
> one.
>
> libstdc++-v3/ChangeLog:
> * testsuite/28_regex/traits/char/translate.cc: Drop bogus 'dg_do run'.
> ---
> OK? No regressions in the logs but it's a bit weird that it's got a proper
> directive with a target specifier so I thought I'd check rather than doing
> it as obvious.

Definitely OK. Dejagnu will ignore it because it doesn't start with
dg- so it is useless.

Even if it was used, it would be wrong because std::regex can't be
used in C++98 so the c++11 effective target is needed.
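
To make the "- vs _" point concrete, here is a deliberately simplified stand-in for the directive scan.  The real matching is done with a regular expression inside DejaGnu; the function below is a hypothetical reduction that keeps only the "dg-" prefix requirement.

```c
#include <assert.h>
#include <string.h>

/* Simplified model: a line is treated as a directive only if it
   contains "{ dg-".  Anything else, including "{ dg_do ... }", is
   an ordinary comment as far as the harness is concerned.  */
static int
looks_like_dg_directive (const char *line)
{
  return strstr (line, "{ dg-") != NULL;
}
```

Under this model "// { dg_do run }" is never seen, while "// { dg-do run { target c++11 } }" is picked up.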


>
>  libstdc++-v3/testsuite/28_regex/traits/char/translate.cc | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
> index e2552e3cbf05..65119e67e25b 100644
> --- a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
> +++ b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
> @@ -1,4 +1,3 @@
> -// { dg_do run }
>  // { dg-do run { target c++11 } }
>  // { dg-timeout-factor 2 }
>
>
> --
> 2.45.2
>

Re: arm: Prevent ICE when doloop dec_set is not PLUS_EXPR

2024-07-31 Thread Andre Vieira (lists)

This patch refactors and fixes an issue where arm_mve_dlstp_check_dec_counter
was making an assumption about the form of what a candidate for a dec_insn
should be, which caused an ICE.
This dec_insn is the instruction that decreases the loop counter inside a
decrementing loop and we expect it to have the following form:
(set (reg CONDCOUNT)
     (plus (reg CONDCOUNT)
           (const_int)))

Where CONDCOUNT is the loop counter, and const int is the negative constant
used to decrement it.

This patch also improves our search for a valid dec_insn.  Before this patch
we'd only look for a dec_insn inside the loop header if the loop latch was
empty.  We now also search the loop header if the loop latch is not empty but
the last instruction is not a valid dec_insn.  This could potentially be
improved to search all instructions inside the loop latch.
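
The shape of the check and the fallback search can be sketched without RTL.  Everything below is a toy: the struct, the enum and the toy_ functions are invented for illustration and only mirror the logic described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL; none of these types exist in GCC.  */
enum toy_code { TOY_PLUS, TOY_MINUS, TOY_OTHER };

struct toy_insn
{
  int is_real_insn;     /* models NONDEBUG_INSN_P */
  int dest_reg;         /* register being set */
  enum toy_code code;   /* operation on the right-hand side */
  int src_reg;          /* first operand */
  int has_const;        /* second operand is a constant */
};

/* Return INSN if it has the form dest = dest + const with dest equal
   to COUNTER, otherwise NULL.  */
static const struct toy_insn *
toy_check_dec_insn (const struct toy_insn *insn, int counter)
{
  if (insn == NULL || !insn->is_real_insn)
    return NULL;
  if (insn->code != TOY_PLUS
      || !insn->has_const
      || insn->dest_reg != insn->src_reg
      || insn->dest_reg != counter)
    return NULL;
  return insn;
}

/* Try the last insn of the latch first; if that is not a valid
   decrement, fall back to the candidate found in the header.  */
static const struct toy_insn *
toy_find_dec_insn (const struct toy_insn *latch_end,
                   const struct toy_insn *header_def, int counter)
{
  const struct toy_insn *found = toy_check_dec_insn (latch_end, counter);
  if (found == NULL)
    found = toy_check_dec_insn (header_def, counter);
  return found;
}
```

The explicit test on the operation code is the part that was missing before; without it a non-PLUS source would have been picked apart as if it were a PLUS.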

gcc/ChangeLog:

* config/arm/arm.cc (check_dec_insn): New helper function containing
code hoisted from...
(arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
check the validity of the candidate dec_insn.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-loop-form.c: New test.

On 31/07/2024 15:15, Christophe Lyon wrote:
Because I tested with a toolchain configured for cortex-m85, which has 
mve.fp enabled by default, which means I didn't realize the testcase 
required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.


Addressed that now.


Thanks, I thought you meant you ran the testsuite with -mcpu=cortex-m85 
in RUNTESTFLAGS.


To be fair, that's not a terrible assumption. But what I did was run them in a
build where the toolchain (and single multilib) was configured for
armv8.1-m.main+mve.fp+fp.dp and fpu=auto (and float-abi=hard).




Regarding the patch, did you consider making the new check_dec_insn 
helper return an rtx (NULL or dec_set) instead of bool?
I think it would save a call to single_set when computing decrementnum, 
but that's nitpicking.


Yeah I had also contemplated that, I'm OK either way, doesn't look too 
bad with the rtx return. See attached.




Thanks,

Christophe

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 92cd168e65937ef7350477464e8b0becf85bceed..363a972170b37275372bb8bf30d510876021c8c0 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35214,6 +35214,32 @@ arm_mve_dlstp_check_inc_counter (loop *loop, rtx_insn* vctp_insn,
   return vctp_insn;
 }
 
+/* Helper function to 'arm_mve_dlstp_check_dec_counter' to make sure DEC_INSN
+   is of the expected form:
+   (set (reg a) (plus (reg a) (const_int)))
+   where (reg a) is the same as CONDCOUNT.
+   Return a rtx with the set if it is in the right format or NULL_RTX
+   otherwise.  */
+
+static rtx
+check_dec_insn (rtx_insn *dec_insn, rtx condcount)
+{
+  if (!NONDEBUG_INSN_P (dec_insn))
+return NULL_RTX;
+  rtx dec_set = single_set (dec_insn);
+  if (!dec_set
+  || !REG_P (SET_DEST (dec_set))
+  || GET_CODE (SET_SRC (dec_set)) != PLUS
+  || !REG_P (XEXP (SET_SRC (dec_set), 0))
+  || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
+  || REGNO (SET_DEST (dec_set))
+ != REGNO (XEXP (SET_SRC (dec_set), 0))
+  || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
+return NULL_RTX;
+
+  return dec_set;
+}
+
 /* Helper function to `arm_mve_loop_valid_for_dlstp`.  In the case of a
counter that is decrementing, ensure that it is decrementing by the
right amount in each iteration and that the target condition is what
@@ -35230,30 +35256,19 @@ arm_mve_dlstp_check_dec_counter (loop *loop, rtx_insn* vctp_insn,
  loop latch.  Here we simply need to verify that this counter is the same
  reg that is also used in the vctp_insn and that it is not otherwise
  modified.  */
-  rtx_insn *dec_insn = BB_END (loop->latch);
+  rtx dec_set = check_dec_insn (BB_END (loop->latch), condcount);
   /* If not in the loop latch, try to find the decrement in the loop header.  */
-  if (!NONDEBUG_INSN_P (dec_insn))
+  if (dec_set == NULL_RTX)
   {
 df_ref temp = df_bb_regno_only_def_find (loop->header, REGNO (condcount));
 /* If we haven't been able to find the decrement, bail out.  */
 if (!temp)
   return NULL;
-dec_insn = DF_REF_INSN (temp);
-  }
-
-  rtx dec_set = single_set (dec_insn);
+dec_set = check_dec_insn (DF_REF_INSN (temp), condcount);
 
-  /* Next, ensure that it is a PLUS of the form:
- (set (reg a) (plus (reg a) (const_int)))
- where (reg a) is the same as condcount.  */
-  if (!dec_set
-  || !REG_P (SET_DEST (dec_set))
-  || !REG_P (XEXP (SET_SRC (dec_set), 0))
-  || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
-  || REGNO (SET_DEST (dec_set))
- != REGNO (XEXP (SET_SRC (dec_set), 0))
-  || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
-return NULL;
+if (dec_set == NULL_RTX)
+

[Patch, libgfortran] PR105361 Followup fix to test case

2024-07-31 Thread Jerry D

I plan to push this soon to hopefully fix some test breakage on some
architectures.  It is simple and obvious. I did not get any feedback on
this and I do not have access to the machines in question.


Regression tested on linux-x86-64.

Regards,

Jerry

commit bc4ee05dc7c60d534ef927ac5e679f67fb99d54b
Author: Jerry DeLisle 
Date:   Wed Jul 31 08:58:17 2024 -0700

Fortran: Add newline character to test input.

gcc/testsuite/ChangeLog:

PR libfortran/105361

* gfortran.dg/pr105361.f90: Add newline character to test
input to provide more compliant test.

diff --git a/gcc/testsuite/gfortran.dg/pr105361.f90 b/gcc/testsuite/gfortran.dg/pr105361.f90
index e2d3b07caca..62821c2802d 100644
--- a/gcc/testsuite/gfortran.dg/pr105361.f90
+++ b/gcc/testsuite/gfortran.dg/pr105361.f90
@@ -27,7 +27,7 @@ program main
   type(foo) :: a, b
   real :: c, d
   open(10, access="stream")
-  write(10) "1 2" ! // NEW_LINE('A')
+  write(10) "1 2" // NEW_LINE('A')
   close(10)
   open(10)
   read(10,*) c, d

Re: [PATCH] libstdc++: drop bogus 'dg_do run' directive

2024-07-31 Thread Sam James

Jonathan Wakely  writes:

> On Wed, 31 Jul 2024 at 16:45, Sam James  wrote:
>>
>> We already have a valid 'dg-do run' (- vs _) directive, so drop the bogus
>> one.
>>
>> libstdc++-v3/ChangeLog:
>> * testsuite/28_regex/traits/char/translate.cc: Drop bogus 'dg_do run'.
>> ---
>> OK? No regressions in the logs but it's a bit weird that it's got a proper
>> directive with a target specifier so I thought I'd check rather than doing
>> it as obvious.
>
> Definitely OK. Dejagnu will ignore it because it doesn't start with
> dg- so it is useless.

Thank you! Will push shortly.

>
> Even if it was used, it would be wrong because std::regex can't be
> used in C++98 so the c++11 effective target is needed.

That's the missing piece I was looking for -- I just didn't want to be
dropping the bogus directive and covering up if something else was wrong
there.

> [...]

thanks,
sam



Re: arm: Prevent ICE when doloop dec_set is not PLUS_EXPR

2024-07-31 Thread Christophe Lyon





On 7/31/24 18:06, Andre Vieira (lists) wrote:
This patch refactors and fixes an issue where arm_mve_dlstp_check_dec_counter
was making an assumption about the form of what a candidate for a dec_insn
should be, which caused an ICE.
This dec_insn is the instruction that decreases the loop counter inside a
decrementing loop and we expect it to have the following form:
(set (reg CONDCOUNT)
     (plus (reg CONDCOUNT)
           (const_int)))

Where CONDCOUNT is the loop counter, and const int is the negative constant
used to decrement it.

This patch also improves our search for a valid dec_insn.  Before this patch
we'd only look for a dec_insn inside the loop header if the loop latch was
empty.  We now also search the loop header if the loop latch is not empty but
the last instruction is not a valid dec_insn.  This could potentially be
improved to search all instructions inside the loop latch.

     gcc/ChangeLog:

* config/arm/arm.cc (check_dec_insn): New helper function containing
code hoisted from...
(arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
check the validity of the candidate dec_insn.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-loop-form.c: New test.

On 31/07/2024 15:15, Christophe Lyon wrote:
Because I tested with a toolchain configured for cortex-m85, which 
has mve.fp enabled by default, which means I didn't realize the 
testcase required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.


Addressed that now.


Thanks, I thought you meant you ran the testsuite with 
-mcpu=cortex-m85 in RUNTESTFLAGS.


To be fair, that's not a terrible assumption. But what I did was run them in a
build where the toolchain (and single multilib) was configured for
armv8.1-m.main+mve.fp+fp.dp and fpu=auto (and float-abi=hard).




Regarding the patch, did you consider making the new check_dec_insn 
helper return an rtx (NULL or dec_set) instead of bool?
I think it would save a call to single_set when computing 
decrementnum, but that's nitpicking.


Yeah I had also contemplated that, I'm OK either way, doesn't look too 
bad with the rtx return. See attached.




Thanks, LGTM.

Christophe



Thanks,

Christophe

[committed] testsuite: Fix for targets not passing argc/argv [PR116154]

2024-07-31 Thread Dimitar Dimitrov

PRU and other simulator targets do not pass any argv arguments
to main.  Rather than erroneously relying on argc==0, use a volatile
variable.

I reverted the fix for PR67947 in r6-3891-g8a18fcf4aa1d5c, and made sure
that the updated test case still fails for x86_64:

  $ make check-gcc-c RUNTESTFLAGS="dg-torture.exp=pr67947.c"
  ...
  FAIL: gcc.dg/torture/pr67947.c   -O1  execution test
  ...
  # of expected passes            8
  # of unexpected failures        8

Fix was suggested by Andrew Pinski in PR116154.  Committed as obvious.
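
The pattern in isolation looks like the sketch below.  The variable name t matches the patch; compute_b is a made-up stand-in for the start of the test's main.

```c
#include <assert.h>

/* Read at run time on every access; the compiler may not assume it
   still holds its initial value.  */
volatile int t = 1;

/* B must stay 0 at run time, but the compiler cannot prove that
   without reading T, so the branch is kept in the generated code.  */
static int
compute_b (void)
{
  int b = 0;
  if (t == 0)
    b = 1;
  return b;
}
```

Unlike argc, the value of t does not depend on how the target or simulator starts the program.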

PR testsuite/116154

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr67947.c: Use volatile variable instead of
argc.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/torture/pr67947.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr67947.c b/gcc/testsuite/gcc.dg/torture/pr67947.c
index 368a8b20cbf..1016f2579cb 100644
--- a/gcc/testsuite/gcc.dg/torture/pr67947.c
+++ b/gcc/testsuite/gcc.dg/torture/pr67947.c
@@ -11,11 +11,13 @@ __attribute__((noinline, noclone)) void foo (int x)
 c++;
 }
 
+volatile int t = 1;
+
 int
 main (int argc, char* argv[])
 {
   int j, k, b = 0;
-  if (argc == 0)
+  if (t == 0)
 b = 1;
   for (j = 0; j < 3; j++)
 for (k = 0; k < 1; k++)
-- 
2.45.2

[PATCH] Make may_trap_p_1 return false for constant pool references [PR116145]

2024-07-31 Thread Richard Sandiford

The testcase contains the constant:

  arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));

which was initially hoisted by hand, but which gimple optimisers later
propagated to each use (as expected).  The constant was then expanded
as a load-and-duplicate from the constant pool.  Normally that load
should then be hoisted back out of the loop, but may_trap_or_fault_p
stopped that from happening in this case.

The code responsible was:

  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
 reference; moving it out of context such as when moving code
 when optimizing, might cause its address to become invalid.  */
  code_changed
  || !MEM_NOTRAP_P (x))
{
  poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
  return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
GET_MODE (x), code_changed);
}

where code_changed is true.  (Arguably it doesn't need to be true in
this case, if we inserted invariants on the preheader edge, but it
would still need to be true for conditionally executed loads.)

Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1
would recognise that the address refers to the constant pool.
However, the SVE load-and-replicate instructions have a limited
offset range, so it isn't possible for them to have a LO_SUM address.
All we have is a plain pseudo base register.

MEM_READONLY_P is defined as:

/* 1 if RTX is a mem that is statically allocated in read-only memory.  */
  (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging)

and so I think it should be safe to move memory references if both
MEM_READONLY_P and MEM_NOTRAP_P are true.
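
Reduced to booleans, the change to the condition is the following.  The function names are invented for this sketch; "needs an address check" stands for falling through to rtx_addr_can_trap_p_1.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the patch: any code movement forces a fresh look
   at the address, even when the MEM is marked as not trapping.  */
static bool
needs_address_check_old (bool code_changed, bool mem_notrap)
{
  return code_changed || !mem_notrap;
}

/* Condition after the patch: code movement is ignored when the MEM
   is also known to be in statically allocated read-only memory.  */
static bool
needs_address_check_new (bool code_changed, bool mem_readonly,
                         bool mem_notrap)
{
  return (code_changed && !mem_readonly) || !mem_notrap;
}
```

The only input where the two differ is a moved reference that is both read-only and non-trapping, which is the constant pool load in the testcase.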

The testcase isn't a minimal reproducer, but I think it's good
to have a realistic full routine in the testsuite.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR rtl-optimization/116145
* rtlanal.cc (may_trap_p_1): Trust MEM_NOTRAP_P even for code
movement if MEM_READONLY_P is also true.

gcc/testsuite/
PR rtl-optimization/116145
* gcc.target/aarch64/sve/acle/general/pr116145.c: New test.
---
 gcc/rtlanal.cc| 14 --
 .../aarch64/sve/acle/general/pr116145.c   | 46 +++
 2 files changed, 56 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c

diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 4158a531bdd..893a6afbbc5 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -3152,10 +3152,16 @@ may_trap_p_1 (const_rtx x, unsigned flags)
  && MEM_VOLATILE_P (x)
  && XEXP (x, 0) == stack_pointer_rtx)
return true;
-  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
-reference; moving it out of context such as when moving code
-when optimizing, might cause its address to become invalid.  */
- code_changed
+  if (/* MEM_READONLY_P means that the memory is both statically
+allocated and readonly, so MEM_NOTRAP_P should remain true
+even if the memory reference is moved.  This is certainly
+true for the important case of force_const_mem.
+
+Otherwise, MEM_NOTRAP_P only relates to the actual position
+of the memory reference; moving it out of context such as
+when moving code when optimizing, might cause its address
+to become invalid.  */
+ (code_changed && !MEM_READONLY_P (x))
  || !MEM_NOTRAP_P (x))
{
  poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
new file mode 100644
index 000..a3d93d3e1c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
@@ -0,0 +1,46 @@
+// { dg-options "-O2" }
+
+#include 
+#include 
+
+#pragma GCC target "+sve2"
+
+typedef unsigned char uchar;
+
+const uchar *
+search_line_fast (const uchar *s, const uchar *end)
+{
+  size_t VL = svcntb();
+  svuint8_t arr1, arr2;
+  svbool_t pc, pg = svptrue_b8();
+
+  // This should not be loaded inside the loop every time.
+  arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
+
+  for (; s+VL <= end; s += VL) {
+arr1 = svld1_u8(pg, s);
+pc = svmatch_u8(pg, arr1, arr2);
+
+if (svptest_any(pg, pc)) {
+  pc = svbrkb_z(pg, pc);
+  return s+svcntp_b8(pg, pc);
+}
+  }
+
+  // Handle remainder.
+  if (s < end) {
+pg = svwhilelt_b8((size_t)s, (size_t)end);
+
+arr1 = svld1_u8(pg, s);
+pc = svmatch_u8(pg, arr1, arr2);
+
+if (svptest_any(pg, pc)) {
+  pc = svbrkb_z(pg, pc);
+  return s+svcntp_b8(pg, pc);
+}
+  }
+
+  return end;
+}
+
+// { dg-final { scan-assembler {:\n\tld1b\t[^\n]*\n\tmatch\t[^\n]*\n\tb\.} } }
-- 
2.25.1

RE: [PATCH 8/8]AArch64: take gather/scatter decode overhead into account

2024-07-31 Thread Tamar Christina

Hi Kyrill,

> >   /* True if the vector body contains a store to a decl and if the
> >  function is known to have a vld1 from the same decl.
> >
> > @@ -17291,6 +17297,17 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
> >stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
> >stmt_info, vectype,
> >where, stmt_cost);
> > +
> > +  /* Check if we've seen an SVE gather/scatter operation and which 
> > size.  */
> > +  if (kind == scalar_load
> > + && aarch64_sve_mode_p (TYPE_MODE (vectype))
> > + && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) ==
> VMAT_GATHER_SCATTER)
> > +   {
> > + if (GET_MODE_UNIT_BITSIZE (TYPE_MODE (vectype)) == 64)
> > +   m_sve_gather_scatter_x64 = true;
> > + else
> > +   m_sve_gather_scatter_x32 = true;
> 
> This is a bit academic at this stage but SVE2.1 adds quadword gather loads. I
> know we’re not vectoring for those yet, but maybe it’s worth explicitly
> checking for 32-bit size and gcc_unreachable () otherwise?

To be honest I'm not quite sure how to detect it.  Is it just
GET_MODE_UNIT_BITSIZE () == 128?
But do we want an assert in the cost model?  Happy to do so, but maybe a debug
print is more appropriate, i.e. make it a missed optimization?

> 
> 
> > +   }
> > }
> >
> >   /* Do any SVE-specific adjustments to the cost.  */
> > @@ -17676,6 +17693,18 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
> >   m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
> > m_costs[vect_body]);
> >   m_suggested_unroll_factor = determine_suggested_unroll_factor ();
> > +
> > +  /* For gather and scatters there's an additional overhead for the 
> > first
> > +iteration.  For low count loops they're not beneficial so model the
> > +overhead as loop prologue costs.  */
> > +  if (m_sve_gather_scatter_x32 || m_sve_gather_scatter_x64)
> > +   {
> > + const sve_vec_cost *sve_costs = 
> > aarch64_tune_params.vec_costs->sve;
> > + if (m_sve_gather_scatter_x32)
> > +   m_costs[vect_prologue] += sve_costs->gather_load_x32_init_cost;
> > + else
> > +   m_costs[vect_prologue] += sve_costs->gather_load_x64_init_cost;
> 
> Shouldn’t this not be an else but rather:
> if (m_sve_gather_scatter_x64)
>    m_costs[vect_prologue] += sve_costs->gather_load_x64_init_cost;
> 
> In case the loop has both 32-bit and 64-bit gather/scatter?
> 

This was an interesting comment.  After some discussion and more benchmarking
we've changed it to be an additive cost.
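
One reading of "additive", as a standalone sketch: each kind of gather/scatter seen in the loop contributes its own start-up cost.  The struct and function are made up for the example and the revised patch may be structured differently; only the two field names and the Cortex-X925 values (42 and 24) come from the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed-down stand-in for the two new fields of sve_vec_cost.  */
struct toy_sve_costs
{
  unsigned gather_load_x32_init_cost;
  unsigned gather_load_x64_init_cost;
};

/* Additive variant: a loop containing both 32-bit and 64-bit
   gathers/scatters pays both start-up costs in the prologue.  */
static unsigned
toy_prologue_cost (unsigned prologue, const struct toy_sve_costs *c,
                   bool seen_x32, bool seen_x64)
{
  if (seen_x32)
    prologue += c->gather_load_x32_init_cost;
  if (seen_x64)
    prologue += c->gather_load_x64_init_cost;
  return prologue;
}
```

With the quoted Cortex-X925 numbers a loop using both widths would add 66 to the prologue, against 42 under the original if/else.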

> 
> > +   }
> > }
> >
> >   /* Apply the heuristic described above m_stp_sequence_cost.  Prefer
> > diff --git a/gcc/config/aarch64/tuning_models/a64fx.h b/gcc/config/aarch64/tuning_models/a64fx.h
> > index 6091289d4c3c66f01d7e4dbf97a85c1f8c40bb0b..378a1b3889ee265859786c1ff6525fce2305b615 100644
> > --- a/gcc/config/aarch64/tuning_models/a64fx.h
> > +++ b/gcc/config/aarch64/tuning_models/a64fx.h
> > @@ -104,6 +104,8 @@ static const sve_vec_cost a64fx_sve_vector_cost =
> >   13, /* fadda_f64_cost  */
> >   64, /* gather_load_x32_cost  */
> >   32, /* gather_load_x64_cost  */
> > +  0, /* gather_load_x32_init_cost  */
> > +  0, /* gather_load_x64_init_cost  */
> >   1 /* scatter_store_elt_cost  */
> > };
> >
> > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h
> > index fb95e87526985b02410d54a5a3ec8539c1b0ba6d..c4206018a3ff707f89ff3300700ec7dc2a5bc6b0 100644
> > --- a/gcc/config/aarch64/tuning_models/cortexx925.h
> > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h
> > @@ -135,6 +135,8 @@ static const sve_vec_cost cortexx925_sve_vector_cost =
> >  operation more than a 64-bit gather.  */
> >   14, /* gather_load_x32_cost  */
> >   12, /* gather_load_x64_cost  */
> > +  42, /* gather_load_x32_init_cost  */
> > +  24, /* gather_load_x64_init_cost  */
> 
> 
> Can you comment on how these numbers are derived?

They were derived essentially from benchmarking.  I did a bunch of runs over
various cores to determine at which iteration count they become profitable.
From that, as you can probably tell, the costs are a multiple of the cost of
the operations for the specific core.

This is because that cost already keeps in mind things like VL differences.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (struct sve_vec_cost): Add
gather_load_x32_init_cost and gather_load_x64_init_cost.
* config/aarch64/aarch64.cc (aarch64_vector_costs): Add
m_sve_gather_scatter_init_cost.
(aarch64_vector_costs::add_stmt_cost): Use them.
(aarch64_vector_costs

Re: [committed] testsuite: Fix for targets not passing argc/argv [PR116154]

2024-07-31 Thread Jeff Law





On 7/31/24 10:39 AM, Dimitar Dimitrov wrote:

PRU and other simulator targets do not pass any argv arguments
to main.  Instead of erroneously relying on argc==0, use a volatile
variable instead.

I reverted the fix for PR67947 in r6-3891-g8a18fcf4aa1d5c, and made sure
that the updated test case still fails for x86_64:

   $ make check-gcc-c RUNTESTFLAGS="dg-torture.exp=pr67947.c"
   ...
   FAIL: gcc.dg/torture/pr67947.c   -O1  execution test
   ...
   # of expected passes            8
   # of unexpected failures        8

Fix was suggested by Andrew Pinski in PR116154.  Committed as obvious.

PR testsuite/116154

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr67947.c: Use volatile variable instead of
argc.

Thanks.  I'd noted this failing in various targets in my tester and
assumed it was the argc issue, but hadn't had the time to test a change.


jeff

[committed] pru: Enable section anchoring by default

2024-07-31 Thread Dimitar Dimitrov

Loading an arbitrary constant address in a register is expensive for
PRU.  So enable section anchoring by default to utilize the unsigned
byte constant offset operand of load/store instructions.
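
The anchor range in the patch (0 to 255) can be read as the following reachability test.  This is a sketch with made-up names, not target code: an object can share an anchor only if its offset from the anchor fits the unsigned byte offset field.

```c
#include <assert.h>
#include <stdbool.h>

/* Values taken from the TARGET_MIN_ANCHOR_OFFSET and
   TARGET_MAX_ANCHOR_OFFSET definitions in the patch.  */
#define MIN_ANCHOR_OFFSET 0
#define MAX_ANCHOR_OFFSET 255

/* Can ADDR be addressed as ANCHOR plus an in-range constant offset?  */
static bool
reachable_from_anchor (unsigned long anchor, unsigned long addr)
{
  long offset = (long) addr - (long) anchor;
  return offset >= MIN_ANCHOR_OFFSET && offset <= MAX_ANCHOR_OFFSET;
}
```

Two adjacent ints such as aa and bb in the new test fall well inside one anchor's range, so a single address load can serve both accesses.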

gcc/ChangeLog:

* common/config/pru/pru-common.cc
(TARGET_OPTION_OPTIMIZATION_TABLE): New definition.
* config/pru/pru.cc (TARGET_MIN_ANCHOR_OFFSET): Set minimal
anchor offset.
(TARGET_MAX_ANCHOR_OFFSET): Set maximum anchor offset.

gcc/testsuite/ChangeLog:

* gcc.target/pru/section-anchors-1.c: New test.
* gcc.target/pru/section-anchors-2.c: New test.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/common/config/pru/pru-common.cc              | 12 
 gcc/config/pru/pru.cc                            |  6 ++
 gcc/testsuite/gcc.target/pru/section-anchors-1.c | 14 ++
 gcc/testsuite/gcc.target/pru/section-anchors-2.c | 14 ++
 4 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/pru/section-anchors-1.c
 create mode 100644 gcc/testsuite/gcc.target/pru/section-anchors-2.c

diff --git a/gcc/common/config/pru/pru-common.cc b/gcc/common/config/pru/pru-common.cc
index e8dbf28b2d2..cdc31783dfd 100644
--- a/gcc/common/config/pru/pru-common.cc
+++ b/gcc/common/config/pru/pru-common.cc
@@ -33,4 +33,16 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_EXCEPT_UNWIND_INFO
 #define TARGET_EXCEPT_UNWIND_INFO sjlj_except_unwind_info
 
+#undef  TARGET_OPTION_OPTIMIZATION_TABLE
+#define TARGET_OPTION_OPTIMIZATION_TABLE pru_option_optimization_table
+
+/* Set default optimization options.  */
+static const struct default_options pru_option_optimization_table[] =
+  {
+    /* Enable section anchors by default at -O1 or higher.  */
+    { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
+
+    { OPT_LEVELS_NONE, 0, NULL, 0 }
+  };
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/pru/pru.cc b/gcc/config/pru/pru.cc
index 491f66432b3..d0700079187 100644
--- a/gcc/config/pru/pru.cc
+++ b/gcc/config/pru/pru.cc
@@ -3249,6 +3249,12 @@ pru_unwind_word_mode (void)
 #undef TARGET_PRINT_OPERAND_ADDRESS
 #define TARGET_PRINT_OPERAND_ADDRESS pru_print_operand_address
 
+#undef  TARGET_MIN_ANCHOR_OFFSET
+#define TARGET_MIN_ANCHOR_OFFSET  0
+
+#undef  TARGET_MAX_ANCHOR_OFFSET
+#define TARGET_MAX_ANCHOR_OFFSET  255
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE pru_option_override
 
diff --git a/gcc/testsuite/gcc.target/pru/section-anchors-1.c b/gcc/testsuite/gcc.target/pru/section-anchors-1.c
new file mode 100644
index 00000000000..4c8da5136c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/pru/section-anchors-1.c
@@ -0,0 +1,14 @@
+/* Ensure section anchors are enabled by default.  */
+
+/* { dg-do assemble } */
+/* { dg-options "-O1" } */
+/* { dg-final { object-size text == 24 } } */
+
+int aa;
+int bb;
+
+int
+test (void)
+{
+  return aa + bb;
+}
diff --git a/gcc/testsuite/gcc.target/pru/section-anchors-2.c b/gcc/testsuite/gcc.target/pru/section-anchors-2.c
new file mode 100644
index 00000000000..bd5467edad9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/pru/section-anchors-2.c
@@ -0,0 +1,14 @@
+/* Ensure section anchors are enabled by default.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int aa;
+int bb;
+
+int
+test (void)
+{
+  return aa + bb;
+  /* { dg-final { scan-assembler {\n\tldi32\tr\d+, \.LANCHOR\d+} } } */
+}
-- 
2.45.2
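
As a rough illustration of what the 0..255 anchor range in the patch means, here is a small stand-alone model.  It is not GCC's implementation: the helper reachable_from_anchor and the enum names are hypothetical, and only the two limits are taken from the TARGET_MIN_ANCHOR_OFFSET and TARGET_MAX_ANCHOR_OFFSET definitions above.  The point is that once an anchor address is in a register, any object whose distance from the anchor fits the unsigned byte offset can be accessed without loading a second full address.

```c
#include <stdbool.h>

/* Limits copied from the patch; everything else here is a
   hypothetical model for illustration only.  */
enum
{
  MIN_ANCHOR_OFFSET = 0,
  MAX_ANCHOR_OFFSET = 255
};

/* Return true if OBJECT_ADDR can be addressed as ANCHOR_ADDR plus an
   offset that lies within the configured anchor offset range.  */
static bool
reachable_from_anchor (long anchor_addr, long object_addr)
{
  long offset = object_addr - anchor_addr;
  return offset >= MIN_ANCHOR_OFFSET && offset <= MAX_ANCHOR_OFFSET;
}
```

Under this model two adjacent ints such as aa and bb in the new tests fall within the range of one anchor, which is consistent with section-anchors-2.c scanning for an ldi32 of a .LANCHOR symbol.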
