Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-20 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 20, 2022 at 8:54 AM Hongtao Liu  wrote:
>
> On Wed, Jul 20, 2022 at 2:18 PM Uros Bizjak  wrote:
> >
> > On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak  wrote:
> > >
> > > On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu  wrote:
> > > >
> > > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > And split it after reload.
> > > > > > > >
> > > > > > > > > You will need an ix86_binary_operator_ok insn constraint here,
> > > > > > > > > with a corresponding expander using
> > > > > > > > > ix86_fixup_binary_operands_no_copy to prepare the insn operands.
> > > > > > > > Split it into a define_expand that uses just register_operand,
> > > > > > > > and allow memory/immediate operands in the define_insn, assuming
> > > > > > > > combine/forwprop will do the optimization.
> > > > > > >
> > > > > > > But you will *ease* the job of the above passes if you use
> > > > > > > ix86_fixup_binary_operands_no_copy in the expander.
> > > > > > For -m32, it hits an ICE in
> > > > > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR,
> > > > > > mode=E_V4QImode, operands=0x7fffa970) a
> > > > > > /gcc/config/i386/i386-expand.cc:1184
> > > > > > 1184  rtx dst = ix86_fixup_binary_operands (code, mode, operands);
> > > > > > (gdb) n
> > > > > > 1185  gcc_assert (dst == operands[0]); -- here
> > > > > > (gdb)
> > > > > >
> > > > > > the original operands[0], operands[1], operands[2] are below
> > > > > > (gdb) p debug_rtx (operands[0])
> > > > > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars)
> > > > > > (const_int -8220 [0xdfe4])) [0 MEM  > > > > > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32])
> > > > > > $1 = void
> > > > > > (gdb) p debug_rtx (operands[1])
> > > > > > (subreg:V4QI (reg:SI 129) 0)
> > > > > > $2 = void
> > > > > > (gdb) p debug_rtx (operands[2])
> > > > > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0)
> > > > > > $3 = void
> > > > > > (gdb)
> > > > > >
> > > > > > Since operands[0] is a mem and not equal to operands[1],
> > > > > > ix86_fixup_binary_operands creates a pseudo register for dst,
> > > > > > and then the gcc_assert (dst == operands[0]) fails with an ICE.
> > > > > > Is this a bug or expected?
> > > > >
> > > > > You will need ix86_expand_binary_operator here.
> > > > It swaps the memory operand from op1 to op2 and then hits an ICE
> > > > for an unrecognized insn.
> > > >
> > > > What about this?
> > >
> > > Still no good... You are using commutative operands, so the predicate
> > > of operand 2 should also allow memory. So, the predicate should be
> > > nonimmediate_or_x86_64_const_vector_operand. The intermediate insn
> > > pattern should look something like *_1, but with
> > > added XMM and MMX reg alternatives instead of mask regs.
> >
> > Alternatively, you can use UNKNOWN operator to prevent
> > canonicalization, but then you should not use commutative constraint
> > in the intermediate insn. I think this is the best solution.
> Like this?

Please check the attached (lightly tested) patch that keeps
commutative operands.

Uros.
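For readers following along, a minimal sketch of the kind of source the (m, 0, i) alternative is about: a 32-bit (V4QImode) vector in memory combined with a constant and written back to the same location. It assumes the GCC/Clang `vector_size` extension, and the function name is illustrative only, not taken from the patch. With such a pattern available, GCC may be able to emit a single read-modify-write logic instruction with an immediate operand instead of a load/op/store sequence; the exact output depends on target and flags and is not claimed here.

```c
#include <assert.h>
#include <stdint.h>

/* Four unsigned bytes in a 32-bit vector: V4QImode in GCC terms.  */
typedef uint8_t v4qi __attribute__ ((vector_size (4)));

/* Hypothetical example: operand 0 is memory, operand 1 is the same
   location (tied to operand 0) and operand 2 is a constant vector,
   i.e. the (m, 0, i) shape from the subject line.  */
void
xor_const_in_place (v4qi *p)
{
  *p ^= (v4qi) { 0x01, 0x02, 0x03, 0x04 };
}
```

One way to see what the compiler does with it is to build with `-O2 -S` for x86-64 and inspect the assembly for `xor_const_in_place`.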
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3294c1e6274..0a39d396430 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -86,6 +86,14 @@
   [(V8QI "b") (V4QI "b") (V2QI "b")
(V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")])
 
+;; Mapping to same size integral mode.
+(define_mode_attr mmxinsnmode
+  [(V8QI "DI") (V4QI "SI") (V2QI "HI")
+   (V4HI "DI") (V2HI "SI")
+   (V2SI "DI")
+   (V4HF "DI") (V2HF "SI")
+   (V2SF "DI")])
+
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
 
@@ -350,22 +358,7 @@
   HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1],
mode);
   operands[1] = GEN_INT (val);
-  machine_mode mode;
-  switch (GET_MODE_SIZE (mode))
-{
-case 2:
-  mode = HImode;
-  break;
-case 4:
-  mode = SImode;
-  break;
-case 8:
-  mode = DImode;
-  break;
-default:
-  gcc_unreachable ();
-}
-  operands[0] = lowpart_subreg (mode, operands[0], mode);
+  operands[0] = lowpart_subreg (mode, operands[0], mode);
 })
 
 ;; For TARGET_64BIT we always round up to 8 bytes.
@@ -2974,33 +2967,50 @@
(set_attr "type" "mmxadd,sselog,sselog,sselog")
(set_attr "mode" "DI,TI,TI,TI")])
 
-(define_insn "3"
-  [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v")
+(define_expand "3"
+  [(parallel
+[(set (match_operand:VI_16_32 0 "nonimmediate_operand")
 (any_logic:VI_16_32
- (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v")
- (match_operand:VI_16_32 2 "register_operand" "r,x,x,v")))
-   (clobber (reg:CC FLAGS_REG))]
+ (m

Re: [PATCH] doc: Clarify FENV_ACCESS pragma semantics WRT `-ftrapping-math'

2022-07-20 Thread Richard Biener via Gcc-patches
On Tue, Jul 19, 2022 at 2:24 PM Maciej W. Rozycki  wrote:
>
> Our documentation indicates that it is the `-frounding-math' invocation
> option that controls whether we respect what the FENV_ACCESS pragma
> would imply, should we implement it, regarding the floating point
> environment.  It is only a part of the picture however, because the
> `-ftrapping-math' invocation option also affects how we handle said
> environment.  Clarify that in the description of both options then, as
> well as the FENV_ACCESS pragma itself.

I'll let Joseph comment on the patch since he wrote multiple times how
things should behave (I think it's OK).

We might want to split -ftrapping-math into -fieee-exceptions (whether
we care about the IEEE exception state) and -ftrapping-math (we
assume the user might have enabled the CPU to trap on IEEE
exceptions).  It might also be worth documenting that any
inspection/modification of the FP state during asynchronous events
is not supported.

It was already mentioned that our default of -ftrapping-math -fno-rounding-math
doesn't make much sense: we don't really support inspecting the IEEE
exception state, while -ftrapping-math only makes us careful about CPU
traps caused by IEEE exceptions, which are not enabled by default anywhere
(and should be discouraged in general).  Together with better
documentation I'd support trying to change this default for GCC 13.

> gcc/
> * doc/implement-c.texi (Floating point implementation): Mention
> `-fno-trapping-math' in the context of FENV_ACCESS pragma.
> * doc/invoke.texi (Optimize Options): Clarify FENV_ACCESS pragma
> implication in the descriptions of `-fno-trapping-math' and
> `-frounding-math'.
> ---
> Hi,
>
>  Discovered in the course of investigating RISC-V unordered comparisons,
> c.f. .
>
>   Maciej
> ---
>  gcc/doc/implement-c.texi |3 ++-
>  gcc/doc/invoke.texi  |8 +++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> gcc-doc-fenv-access-trapping.diff
> Index: gcc/gcc/doc/implement-c.texi
> ===
> --- gcc.orig/gcc/doc/implement-c.texi
> +++ gcc/gcc/doc/implement-c.texi
> @@ -339,7 +339,8 @@ This is subject to change.
>  7.6.1).}
>
>  This pragma is not implemented, but the default is to ``off'' unless
> -@option{-frounding-math} is used in which case it is ``on''.
> +@option{-frounding-math} is used and @option{-fno-trapping-math} is not
> +in which case it is ``on''.
>
>  @item
>  @cite{Additional floating-point exceptions, rounding modes, environments,
> Index: gcc/gcc/doc/invoke.texi
> ===
> --- gcc.orig/gcc/doc/invoke.texi
> +++ gcc/gcc/doc/invoke.texi
> @@ -13513,6 +13513,11 @@ math functions.
>
>  The default is @option{-ftrapping-math}.
>
> +Future versions of GCC may provide finer control of this setting
> +using C99's @code{FENV_ACCESS} pragma.  This command-line option
> +will be used along with @option{-frounding-math} to specify the
> +default state for @code{FENV_ACCESS}.
> +
>  @item -frounding-math
>  @opindex frounding-math
>  Disable transformations and optimizations that assume default floating-point
> @@ -13531,7 +13536,8 @@ This option is experimental and does not
>  disable all GCC optimizations that are affected by rounding mode.
>  Future versions of GCC may provide finer control of this setting
>  using C99's @code{FENV_ACCESS} pragma.  This command-line option
> -will be used to specify the default state for @code{FENV_ACCESS}.
> +will be used along with @option{-ftrapping-math} to specify the
> +default state for @code{FENV_ACCESS}.
>
>  @item -fsignaling-nans
>  @opindex fsignaling-nans


Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-20 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 20, 2022 at 3:18 PM Uros Bizjak  wrote:
>
> Please check the attached (lightly tested) patch that keeps
> commutative operands.
Yes, it looks best; I'll fully test the patch.
>
> Uros.



-- 
BR,
Hongtao
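Background on the ICE discussed in this thread, as a rough sketch: x86 integer logic instructions are two-address (`dst = dst OP src`), so a memory destination is only directly usable when it is also the first source. The C below is a hypothetical, heavily simplified model of just that destination check; it is neither GCC's rtx representation nor the full logic of ix86_fixup_binary_operands, which handles more cases. When the check fails, a fixup has to compute into a fresh pseudo register, which is consistent with the `_no_copy` variant's `gcc_assert (dst == operands[0])` firing for a memory operands[0] that differed from operands[1].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified operand model; not GCC's rtx.  */
enum opnd_kind { OPND_REG, OPND_MEM, OPND_IMM };

struct opnd
{
  enum opnd_kind kind;
  int id; /* register number, memory slot or immediate value */
};

static bool
same_opnd (struct opnd a, struct opnd b)
{
  return a.kind == b.kind && a.id == b.id;
}

/* In this model a register destination can always be tied to the first
   source by the register allocator, but a memory destination is only
   encodable when it is also the first source.  Returns false when a
   fixup would have to compute into a fresh register and store the
   result afterwards.  */
bool
dst_usable_in_place (struct opnd dst, struct opnd src1)
{
  if (dst.kind != OPND_MEM)
    return true;
  return same_opnd (dst, src1);
}
```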


Re: [committed] .gitignore: do not ignore config.h

2022-07-20 Thread Richard Biener via Gcc-patches
On Tue, Jul 19, 2022 at 4:09 PM Alexander Monakov via Gcc-patches
 wrote:
>
> GCC does not support in-tree builds at the moment, so a .gitignore that
> conceals the artifacts of an accidental in-tree ./configure run may cause
> confusion. Un-ignore config.h, whose stray presence is known to break the build.

OK

> ChangeLog:
>
> * .gitignore: Do not ignore config.h.
> ---
>  .gitignore | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/.gitignore b/.gitignore
> index 021a8c741..5cc4a0fdf 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -23,7 +23,8 @@
>
>  autom4te.cache
>  config.cache
> -config.h
> +# GCC does not support in-tree builds, do not conceal a stray config.h:
> +# config.h
>  config.intl
>  config.log
>  config.status
> --
> 2.35.1


Re: [PATCH 1/2] Remove unused remove_node_from_expr_list

2022-07-20 Thread Richard Biener via Gcc-patches
On Tue, Jul 19, 2022 at 5:16 PM Alexander Monakov via Gcc-patches
 wrote:
>
> This function remains unused since remove_node_from_insn_list was cloned
> from it.

OK.

> gcc/ChangeLog:
>
> * rtl.h (remove_node_from_expr_list): Remove declaration.
> * rtlanal.cc (remove_node_from_expr_list): Remove (no uses).
> ---
>  gcc/rtl.h  |  1 -
>  gcc/rtlanal.cc | 29 -
>  2 files changed, 30 deletions(-)
>
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index 488016bb4..645c009a3 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3712,7 +3712,6 @@ extern unsigned hash_rtx_cb (const_rtx, machine_mode, int *, int *,
>  extern rtx regno_use_in (unsigned int, rtx);
>  extern int auto_inc_p (const_rtx);
>  extern bool in_insn_list_p (const rtx_insn_list *, const rtx_insn *);
> -extern void remove_node_from_expr_list (const_rtx, rtx_expr_list **);
>  extern void remove_node_from_insn_list (const rtx_insn *, rtx_insn_list **);
>  extern int loc_mentioned_in_p (rtx *, const_rtx);
>  extern rtx_insn *find_first_parameter_load (rtx_insn *, rtx_insn *);
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index d78cc6024..ec95ecd6c 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -2878,35 +2878,6 @@ in_insn_list_p (const rtx_insn_list *listp, const rtx_insn *node)
>return false;
>  }
>
> -/* Search LISTP (an EXPR_LIST) for an entry whose first operand is NODE and
> -   remove that entry from the list if it is found.
> -
> -   A simple equality test is used to determine if NODE matches.  */
> -
> -void
> -remove_node_from_expr_list (const_rtx node, rtx_expr_list **listp)
> -{
> -  rtx_expr_list *temp = *listp;
> -  rtx_expr_list *prev = NULL;
> -
> -  while (temp)
> -{
> -  if (node == temp->element ())
> -   {
> - /* Splice the node out of the list.  */
> - if (prev)
> -   XEXP (prev, 1) = temp->next ();
> - else
> -   *listp = temp->next ();
> -
> - return;
> -   }
> -
> -  prev = temp;
> -  temp = temp->next ();
> -}
> -}
> -
>  /* Search LISTP (an INSN_LIST) for an entry whose first operand is NODE and
> remove that entry from the list if it is found.
>
> --
> 2.35.1
>


Re: [PATCH 2/2] Avoid registering __builtin_setjmp_receiver label twice [PR101347]

2022-07-20 Thread Richard Biener via Gcc-patches
On Tue, Jul 19, 2022 at 5:17 PM Alexander Monakov via Gcc-patches
 wrote:
>
> The testcase in the PR demonstrates how it is possible for one
> __builtin_setjmp_receiver label to appear in
> nonlocal_goto_handler_labels list twice (after the block with
> __builtin_setjmp_setup referring to it was duplicated).
>
> remove_node_from_insn_list did not account for this possibility and
> removed only the first copy from the list. Add an assert verifying that
> duplicates are not present.
>
> To avoid adding a label to the list twice, move registration of the
> label from __builtin_setjmp_setup handling to __builtin_setjmp_receiver.

Eric is probably most familiar with this, but can you make sure to bootstrap
and test this on an SJLJ EH target?  I'm not sure --enable-sjlj-exceptions is
well tested anywhere but on targets not supporting DWARF EH, and the
configury is a bit odd, suggesting the option is mostly ignored ...

Richard.
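A minimal sketch of the failure mode, using a hypothetical stand-in list type rather than GCC's rtx_insn_list: the removal routine unlinks only the first matching entry, so a label registered twice leaves a stale copy behind. The assertion mirrors the `gcc_checking_assert (!in_insn_list_p (temp->next (), node))` that the patch adds to remove_node_from_insn_list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an INSN_LIST node; not GCC's type.  */
struct node
{
  int label;
  struct node *next;
};

static bool
in_list_p (const struct node *list, int label)
{
  for (; list; list = list->next)
    if (list->label == label)
      return true;
  return false;
}

/* Unlink the first entry whose label equals LABEL.  After splicing,
   check that no duplicate remains in the rest of the list.  */
void
remove_first (struct node **listp, int label)
{
  struct node *prev = NULL;

  for (struct node *temp = *listp; temp; prev = temp, temp = temp->next)
    if (temp->label == label)
      {
        if (prev)
          prev->next = temp->next;
        else
          *listp = temp->next;
        assert (!in_list_p (temp->next, label));
        return;
      }
}
```

Calling `remove_first` on a list that holds the same label twice would abort on the assertion, which corresponds to the duplicated __builtin_setjmp_receiver label described above.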

> gcc/ChangeLog:
>
> PR rtl-optimization/101347
> * builtins.cc (expand_builtin) [BUILT_IN_SETJMP_SETUP]: Move
> population of nonlocal_goto_handler_labels from here ...
> (expand_builtin) [BUILT_IN_SETJMP_RECEIVER]: ... to here.
> * rtlanal.cc (remove_node_from_insn_list): Verify that a
> duplicate is not present in the remainder of the list.
> ---
>  gcc/builtins.cc | 15 +++
>  gcc/rtlanal.cc  |  1 +
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index e6816d5c8..12a688dd8 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -7467,15 +7467,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
>   tree label = TREE_OPERAND (CALL_EXPR_ARG (exp, 1), 0);
>   rtx_insn *label_r = label_rtx (label);
>
> - /* This is copied from the handling of non-local gotos.  */
>   expand_builtin_setjmp_setup (buf_addr, label_r);
> - nonlocal_goto_handler_labels
> -   = gen_rtx_INSN_LIST (VOIDmode, label_r,
> -nonlocal_goto_handler_labels);
> - /* ??? Do not let expand_label treat us as such since we would
> -not want to be both on the list of non-local labels and on
> -the list of forced labels.  */
> - FORCED_LABEL (label) = 0;
>   return const0_rtx;
> }
>break;
> @@ -7488,6 +7480,13 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
>   rtx_insn *label_r = label_rtx (label);
>
>   expand_builtin_setjmp_receiver (label_r);
> + nonlocal_goto_handler_labels
> +   = gen_rtx_INSN_LIST (VOIDmode, label_r,
> +nonlocal_goto_handler_labels);
> + /* ??? Do not let expand_label treat us as such since we would
> +not want to be both on the list of non-local labels and on
> +the list of forced labels.  */
> + FORCED_LABEL (label) = 0;
>   return const0_rtx;
> }
>break;
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index ec95ecd6c..56da7435a 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -2899,6 +2899,7 @@ remove_node_from_insn_list (const rtx_insn *node, rtx_insn_list **listp)
>   else
> *listp = temp->next ();
>
> + gcc_checking_assert (!in_insn_list_p (temp->next (), node));
>   return;
> }
>
> --
> 2.35.1
>


Re: [PATCH] Move pass_cse_sincos after vectorizer.

2022-07-20 Thread Richard Biener via Gcc-patches
On Wed, Jul 20, 2022 at 4:20 AM liuhongt  wrote:
>
> __builtin_cexpi can't be vectorized since there's a gap between it and
> the vectorized sincos version (in libmvec, it takes a double and two
> double pointers and returns nothing), so we lose some vectorization
> opportunities if sin & cos are optimized to cexpi before the
> vectorizer.
>
> I tried to add a vect_recog_cexpi_pattern to split cexpi into sin and
> cos, but it failed in vectorizable_simd_clone_call since
> cgraph_node::get (fndecl) returns NULL.  So instead, this patch moves
> pass_cse_sincos after the vectorizer, just before pass_cse_reciprocals.
>
> The original pass_cse_sincos also expands pow and cabs; this patch
> splits that part into a separate pass, pass_expand_powcabs, which
> stays at the old pass position.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Observed more libmvec sin/cos vectorization in specfp, but no big
> performance change.
>
> Ok for trunk?

OK.

I wonder if we can merge the workers of the three passes we have into
a single function, handing it an argument saying what to handle, to be a
bit more flexible in the future.  That would also avoid doing

> +  NEXT_PASS (pass_cse_sincos);
>NEXT_PASS (pass_cse_reciprocals);

i.e. two function walks right after each other.  But I guess that can be
done as a follow-up (or not, if we decide so).

Thanks,
Richard.

>
> gcc/ChangeLog:
>
> * passes.def: (Split pass_cse_sincos to pass_expand_powcabs
> and pass_cse_sincos, and move pass_cse_sincos after vectorizer).
> * timevar.def (TV_TREE_POWCABS): New timevar.
> * tree-pass.h (make_pass_expand_powcabs): Split from pass_cse_sincos.
> * tree-ssa-math-opts.cc (gimple_expand_builtin_cabs): Ditto.
> (class pass_expand_powcabs): Ditto.
> (pass_expand_powcabs::execute): Ditto.
> (make_pass_expand_powcabs): Ditto.
> (pass_cse_sincos::execute): Remove pow/cabs expand part.
> (make_pass_cse_sincos): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pow-sqrt-synth-1.c: Adjust testcase.
> ---
>  gcc/passes.def  |   3 +-
>  gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c |   4 +-
>  gcc/timevar.def |   1 +
>  gcc/tree-pass.h |   1 +
>  gcc/tree-ssa-math-opts.cc   | 112 +++-
>  5 files changed, 97 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 375d3d62d51..6bb92efacd4 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -253,7 +253,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_ccp, true /* nonzero_p */);
>/* After CCP we rewrite no longer addressed locals into SSA
>  form if possible.  */
> -  NEXT_PASS (pass_cse_sincos);
> +  NEXT_PASS (pass_expand_powcabs);
>NEXT_PASS (pass_optimize_bswap);
>NEXT_PASS (pass_laddress);
>NEXT_PASS (pass_lim);
> @@ -328,6 +328,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_simduid_cleanup);
>NEXT_PASS (pass_lower_vector_ssa);
>NEXT_PASS (pass_lower_switch);
> +  NEXT_PASS (pass_cse_sincos);
>NEXT_PASS (pass_cse_reciprocals);
>NEXT_PASS (pass_reassoc, false /* early_p */);
>NEXT_PASS (pass_strength_reduction);
> diff --git a/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c b/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
> index 4a94325cdb3..484b29a8fc8 100644
> --- a/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
> +++ b/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target sqrt_insn } } */
> -/* { dg-options "-fdump-tree-sincos -Ofast --param max-pow-sqrt-depth=8" } */
> +/* { dg-options "-fdump-tree-powcabs -Ofast --param max-pow-sqrt-depth=8" } */
>  /* { dg-additional-options "-mfloat-abi=softfp -mfpu=neon-vfpv4" { target arm*-*-* } } */
>
>  double
> @@ -34,4 +34,4 @@ vecfoo (double *a)
>  a[i] = __builtin_pow (a[i], 1.25);
>  }
>
> -/* { dg-final { scan-tree-dump-times "synthesizing" 7 "sincos" } } */
> +/* { dg-final { scan-tree-dump-times "synthesizing" 7 "powcabs" } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 2dae5e1c760..651af19876f 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -220,6 +220,7 @@ DEFTIMEVAR (TV_TREE_SWITCH_CONVERSION, "tree switch conversion")
>  DEFTIMEVAR (TV_TREE_SWITCH_LOWERING,   "tree switch lowering")
>  DEFTIMEVAR (TV_TREE_RECIP, "gimple CSE reciprocals")
>  DEFTIMEVAR (TV_TREE_SINCOS   , "gimple CSE sin/cos")
> +DEFTIMEVAR (TV_TREE_POWCABS   , "gimple expand pow/cabs")
>  DEFTIMEVAR (TV_TREE_WIDEN_MUL, "gimple widening/fma detection")
>  DEFTIMEVAR (TV_TRANS_MEM , "transactional memory")
>  DEFTIMEVAR (TV_TREE_STRLEN   , "tree strlen optimization")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 606d1d60b85..4dfe05ed8e0 100644
> --- a/gcc/tree-pass.h
> +++ b/gc

gcc-patches@gcc.gnu.org

2022-07-20 Thread Richard Biener via Gcc-patches
On Wed, Jul 20, 2022 at 4:46 AM liuhongt  wrote:
>
> > My original comments still stand (it feels like this should be more
> > generic).  Can we go the way of lowering complex loads/stores first?
> > A large part of the testcases added by the patch should pass after that.
>
> This is the patch as suggested; one additional change is handling
> COMPLEX_CST for the rhs, which enables vectorization for pr106010-8a.c.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

OK.

Are there cases left that your vectorizer patch handles but this one does not?

Thanks,
Richard.
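A sketch of loop shapes this plausibly touches, as an assumption on our part (the pr106010 testcases themselves are not reproduced here): a store of a complex constant (a COMPLEX_CST on the rhs, which expand_complex_move now also handles) and a complex-typed copy, written reversed so it is less likely to be turned into a memcpy call. Once such moves are lowered into separate real and imaginary part moves, the vectorizer sees ordinary double loads and stores. Function names are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Store a complex constant into every element.  */
void
fill_complex (_Complex double *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = 1.0 + 2.0 * I;
}

/* Complex-typed element moves, in reverse order.  */
void
reverse_copy_complex (_Complex double *restrict dst,
                      const _Complex double *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[n - 1 - i];
}
```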

> 2022-07-20  Richard Biener  
> Hongtao Liu  
>
> gcc/ChangeLog:
>
> PR tree-optimization/106010
> * tree-complex.cc (init_dont_simulate_again): Lower complex
> type move.
> (expand_complex_move): Also expand COMPLEX_CST for rhs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr106010-1a.c: New test.
> * gcc.target/i386/pr106010-1b.c: New test.
> * gcc.target/i386/pr106010-1c.c: New test.
> * gcc.target/i386/pr106010-2a.c: New test.
> * gcc.target/i386/pr106010-2b.c: New test.
> * gcc.target/i386/pr106010-2c.c: New test.
> * gcc.target/i386/pr106010-3a.c: New test.
> * gcc.target/i386/pr106010-3b.c: New test.
> * gcc.target/i386/pr106010-3c.c: New test.
> * gcc.target/i386/pr106010-4a.c: New test.
> * gcc.target/i386/pr106010-4b.c: New test.
> * gcc.target/i386/pr106010-4c.c: New test.
> * gcc.target/i386/pr106010-5a.c: New test.
> * gcc.target/i386/pr106010-5b.c: New test.
> * gcc.target/i386/pr106010-5c.c: New test.
> * gcc.target/i386/pr106010-6a.c: New test.
> * gcc.target/i386/pr106010-6b.c: New test.
> * gcc.target/i386/pr106010-6c.c: New test.
> * gcc.target/i386/pr106010-7a.c: New test.
> * gcc.target/i386/pr106010-7b.c: New test.
> * gcc.target/i386/pr106010-7c.c: New test.
> * gcc.target/i386/pr106010-8a.c: New test.
> * gcc.target/i386/pr106010-8b.c: New test.
> * gcc.target/i386/pr106010-8c.c: New test.
> * gcc.target/i386/pr106010-9a.c: New test.
> * gcc.target/i386/pr106010-9b.c: New test.
> * gcc.target/i386/pr106010-9c.c: New test.
> * gcc.target/i386/pr106010-9d.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr106010-1a.c |  58 
>  gcc/testsuite/gcc.target/i386/pr106010-1b.c |  63 
>  gcc/testsuite/gcc.target/i386/pr106010-1c.c |  41 +
>  gcc/testsuite/gcc.target/i386/pr106010-2a.c |  82 ++
>  gcc/testsuite/gcc.target/i386/pr106010-2b.c |  62 
>  gcc/testsuite/gcc.target/i386/pr106010-2c.c |  47 ++
>  gcc/testsuite/gcc.target/i386/pr106010-3a.c |  80 ++
>  gcc/testsuite/gcc.target/i386/pr106010-3b.c | 126 
>  gcc/testsuite/gcc.target/i386/pr106010-3c.c |  69 +
>  gcc/testsuite/gcc.target/i386/pr106010-4a.c | 101 +
>  gcc/testsuite/gcc.target/i386/pr106010-4b.c |  67 +
>  gcc/testsuite/gcc.target/i386/pr106010-4c.c |  54 +++
>  gcc/testsuite/gcc.target/i386/pr106010-5a.c | 117 +++
>  gcc/testsuite/gcc.target/i386/pr106010-5b.c |  80 ++
>  gcc/testsuite/gcc.target/i386/pr106010-5c.c |  62 
>  gcc/testsuite/gcc.target/i386/pr106010-6a.c | 115 ++
>  gcc/testsuite/gcc.target/i386/pr106010-6b.c | 157 
>  gcc/testsuite/gcc.target/i386/pr106010-6c.c |  80 ++
>  gcc/testsuite/gcc.target/i386/pr106010-7a.c |  58 
>  gcc/testsuite/gcc.target/i386/pr106010-7b.c |  63 
>  gcc/testsuite/gcc.target/i386/pr106010-7c.c |  41 +
>  gcc/testsuite/gcc.target/i386/pr106010-8a.c |  58 
>  gcc/testsuite/gcc.target/i386/pr106010-8b.c |  53 +++
>  gcc/testsuite/gcc.target/i386/pr106010-8c.c |  38 +
>  gcc/testsuite/gcc.target/i386/pr106010-9a.c |  89 +++
>  gcc/testsuite/gcc.target/i386/pr106010-9b.c |  90 +++
>  gcc/testsuite/gcc.target/i386/pr106010-9c.c |  90 +++
>  gcc/testsuite/gcc.target/i386/pr106010-9d.c |  92 
>  gcc/tree-complex.cc |   9 +-
>  29 files changed, 2141 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-1a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-1b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-1c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-2a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-2b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-2c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-3c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr1

gcc-patches@gcc.gnu.org

2022-07-20 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 20, 2022 at 4:00 PM Richard Biener via Gcc-patches
 wrote:
>
> On Wed, Jul 20, 2022 at 4:46 AM liuhongt  wrote:
> >
> > > My original comments still stand (it feels like this should be more
> > > generic).  Can we go the way of lowering complex loads/stores first?
> > > A large part of the testcases added by the patch should pass after that.
> >
> > This is the patch as suggested; one additional change is handling
> > COMPLEX_CST for the rhs, which enables vectorization for pr106010-8a.c.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> OK.
>
> Are there cases left that your vectorizer patch handles but this one does not?
No.
>
> Thanks,
> Richard.

Re: [PATCH] Move pass_cse_sincos after vectorizer.

2022-07-20 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 20, 2022 at 3:59 PM Richard Biener via Gcc-patches
 wrote:
>
> On Wed, Jul 20, 2022 at 4:20 AM liuhongt  wrote:
> >
> > __builtin_cexpi can't be vectorized since there's a gap between it and
> > the vectorized sincos version (in libmvec, it takes a double and two
> > double pointers and returns nothing).  So some vectorization
> > opportunities are lost if sin & cos are optimized to cexpi before the
> > vectorizer.
> >
> > I tried to add vect_recog_cexpi_pattern to split cexpi into sin and
> > cos, but it failed in vectorizable_simd_clone_call since
> > cgraph_node::get (fndecl) returns NULL.  So instead, the patch moves
> > pass_cse_sincos after the vectorizer, just before pass_cse_reciprocals.
> >
> > The original pass_cse_sincos also expands pow & cabs; this patch
> > splits that part into a separate pass named pass_expand_powcabs, which
> > stays at the old pass position.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Observed more libmvec sin/cos vectorization in specfp, but no big
> > performance change.
> >
> > Ok for trunk?
>
> OK.
>
> I wonder if we can merge the workers of the three passes we have into
> a single function, passing it an argument saying what to handle, to be a
> bit more flexible in the future.  That would also avoid doing
>
> > +  NEXT_PASS (pass_cse_sincos);
> >NEXT_PASS (pass_cse_reciprocals);
>
> thus two function walks after each other.  But I guess that can be done
> as followup (or not if we decide so).
Let me try this as followup.
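A rough sketch of the suggested merge: one walker that takes a flag mask saying which transformations to perform. Everything in it is hypothetical: the `stmt_kind` enum, the `MATH_OPTS_*` flags and `math_opts_walk` are not GCC's GIMPLE API. The sketch only shows how a single statement walk can dispatch on enabled work instead of running two back-to-back walks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds standing in for the calls the passes
   look at; these are not GCC's GIMPLE codes.  */
enum stmt_kind
{
  STMT_POW, STMT_CABS, STMT_SIN, STMT_COS, STMT_RDIV, STMT_OTHER
};

/* Hypothetical flags selecting which transformations a walk performs.  */
enum math_opts_flags
{
  MATH_OPTS_POWCABS = 1 << 0, /* work of pass_expand_powcabs */
  MATH_OPTS_SINCOS = 1 << 1,  /* work of pass_cse_sincos */
  MATH_OPTS_RECIP = 1 << 2    /* work of pass_cse_reciprocals */
};

/* Walk the statements once and count those that some enabled
   transformation would handle.  */
static size_t
math_opts_walk (const enum stmt_kind *stmts, size_t n, unsigned flags)
{
  assert (n == 0 || stmts != NULL);
  size_t handled = 0;
  for (size_t i = 0; i < n; i++)
    switch (stmts[i])
      {
      case STMT_POW:
      case STMT_CABS:
        if (flags & MATH_OPTS_POWCABS)
          handled++;
        break;
      case STMT_SIN:
      case STMT_COS:
        if (flags & MATH_OPTS_SINCOS)
          handled++;
        break;
      case STMT_RDIV:
        if (flags & MATH_OPTS_RECIP)
          handled++;
        break;
      default:
        break;
      }
  return handled;
}
```

Under this model, the early pass instance would call the walker with MATH_OPTS_POWCABS. The late instance would call it with MATH_OPTS_SINCOS | MATH_OPTS_RECIP, replacing two consecutive function walks with one.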
>
> Thanks,
> Richard.
>
> >
> > gcc/ChangeLog:
> >
> > * passes.def: (Split pass_cse_sincos to pass_expand_powcabs
> > and pass_cse_sincos, and move pass_cse_sincos after vectorizer).
> > * timevar.def (TV_TREE_POWCABS): New timevar.
> > * tree-pass.h (make_pass_expand_powcabs): Split from 
> > pass_cse_sincos.
> > * tree-ssa-math-opts.cc (gimple_expand_builtin_cabs): Ditto.
> > (class pass_expand_powcabs): Ditto.
> > (pass_expand_powcabs::execute): Ditto.
> > (make_pass_expand_powcabs): Ditto.
> > (pass_cse_sincos::execute): Remove pow/cabs expand part.
> > (make_pass_cse_sincos): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/pow-sqrt-synth-1.c: Adjust testcase.
> > ---
> >  gcc/passes.def  |   3 +-
> >  gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c |   4 +-
> >  gcc/timevar.def |   1 +
> >  gcc/tree-pass.h |   1 +
> >  gcc/tree-ssa-math-opts.cc   | 112 +++-
> >  5 files changed, 97 insertions(+), 24 deletions(-)
> >
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 375d3d62d51..6bb92efacd4 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -253,7 +253,7 @@ along with GCC; see the file COPYING3.  If not see
> >NEXT_PASS (pass_ccp, true /* nonzero_p */);
> >/* After CCP we rewrite no longer addressed locals into SSA
> >  form if possible.  */
> > -  NEXT_PASS (pass_cse_sincos);
> > +  NEXT_PASS (pass_expand_powcabs);
> >NEXT_PASS (pass_optimize_bswap);
> >NEXT_PASS (pass_laddress);
> >NEXT_PASS (pass_lim);
> > @@ -328,6 +328,7 @@ along with GCC; see the file COPYING3.  If not see
> >NEXT_PASS (pass_simduid_cleanup);
> >NEXT_PASS (pass_lower_vector_ssa);
> >NEXT_PASS (pass_lower_switch);
> > +  NEXT_PASS (pass_cse_sincos);
> >NEXT_PASS (pass_cse_reciprocals);
> >NEXT_PASS (pass_reassoc, false /* early_p */);
> >NEXT_PASS (pass_strength_reduction);
> > diff --git a/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c b/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
> > index 4a94325cdb3..484b29a8fc8 100644
> > --- a/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
> > +++ b/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target sqrt_insn } } */
> > -/* { dg-options "-fdump-tree-sincos -Ofast --param max-pow-sqrt-depth=8" } */
> > +/* { dg-options "-fdump-tree-powcabs -Ofast --param max-pow-sqrt-depth=8" } */
> >  /* { dg-additional-options "-mfloat-abi=softfp -mfpu=neon-vfpv4" { target arm*-*-* } } */
> >
> >  double
> > @@ -34,4 +34,4 @@ vecfoo (double *a)
> >  a[i] = __builtin_pow (a[i], 1.25);
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "synthesizing" 7 "sincos" } } */
> > +/* { dg-final { scan-tree-dump-times "synthesizing" 7 "powcabs" } } */
> > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > index 2dae5e1c760..651af19876f 100644
> > --- a/gcc/timevar.def
> > +++ b/gcc/timevar.def
> > @@ -220,6 +220,7 @@ DEFTIMEVAR (TV_TREE_SWITCH_CONVERSION, "tree switch conversion")
> >  DEFTIMEVAR (TV_TREE_SWITCH_LOWERING,   "tree switch lowering")
> >  DEFTIMEVAR (TV_TREE_RECIP, "gimple CSE reciprocals")
> >  DEFTIMEVAR (TV_TREE_SINCOS   , "gimple CSE sin/cos")
> > +DEFTIMEVAR (TV_TREE_POWCABS   , "gimple expand pow/cabs

Re: [PATCH v2.1 3/4] aarch64: Consolidate simd type lookup functions

2022-07-20 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti  writes:
> On Wed, Jul 13, 2022 at 05:36:04PM +0100, Richard Sandiford wrote:
>> I like the part about getting rid of:
>> 
>> static tree
>> aarch64_simd_builtin_type (machine_mode mode,
>> bool unsigned_p, bool poly_p)
>> 
>> and the flow of the new function.  However, I think it's still
>> slightly more readable if we keep the switch and lookup routines
>> separate, partly to keep down the size of the main routine and
>> partly to avoid the goto.
>
> I agree.
>
>> So how about:
>> 
>> - aarch64_simd_builtin_std_type becomes aarch64_int_or_fp_element_type
>>   but otherwise stays as-is
>>
>> ...
>
> I've called it aarch64_int_or_fp_type, because it's sometimes used for
> an operand that doesn't represent an element of a vector.
>
>
> Updated patch below.
>
> ---
>
> There were several similarly-named functions, which each built or looked up an
> operand type using a different subset of valid modes or qualifiers.
>
> This change provides a single function to return operand types, which can
> additionally handle const and pointer qualifiers. For clarity, the existing
> functionality is kept in separate helper functions.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc
>   (aarch64_simd_builtin_std_type): Rename to...
>   (aarch64_int_or_fp_type): ...this, and allow irrelevant qualifiers.
>   (aarch64_lookup_simd_builtin_type): Rename to...
>   (aarch64_simd_builtin_type): ...this. Add const/pointer
>   support, and extract table lookup to...
>   (aarch64_lookup_simd_type_in_table): ...this function.
>   (aarch64_init_crc32_builtins): Update to use aarch64_simd_builtin_type.
>   (aarch64_init_fcmla_laneq_builtins): Ditto.
>   (aarch64_init_simd_builtin_functions): Ditto.

LGTM, thanks.  OK for trunk with a couple of minor formatting fixes:

> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 55ad2e8b6831d6cc2b039270c8656d429347092d..cd7c2a79d9b4d67adf1d9de1f9b56eb3a0d1ee2b 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -788,12 +788,13 @@ aarch64_general_mangle_builtin_type (const_tree type)
>return NULL;
>  }
>  
> +/* Helper function for aarch64_simd_builtin_type. */
>  static tree
> -aarch64_simd_builtin_std_type (machine_mode mode,
> -enum aarch64_type_qualifiers q)
> +aarch64_int_or_fp_type (machine_mode mode,
> +enum aarch64_type_qualifiers qualifiers)

This line should be reindented so that the arguments continue to line up.

>  {
> -#define QUAL_TYPE(M)  \
> -  ((q == qualifier_none) ? int##M##_type_node : unsigned_int##M##_type_node);
> +#define QUAL_TYPE(M) ((qualifiers & qualifier_unsigned) \
> +? unsigned_int##M##_type_node : int##M##_type_node);
>switch (mode)
>  {
>  case E_QImode:
> [...]
> @@ -1383,13 +1381,13 @@ aarch64_init_simd_builtins (void)
>  static void
>  aarch64_init_crc32_builtins ()
>  {
> -  tree usi_type = aarch64_simd_builtin_std_type (SImode, qualifier_unsigned);
> +  tree usi_type = aarch64_simd_builtin_type (SImode, qualifier_unsigned);
>unsigned int i = 0;
>  
>for (i = 0; i < ARRAY_SIZE (aarch64_crc_builtin_data); ++i)
>  {
>aarch64_crc_builtin_datum* d = &aarch64_crc_builtin_data[i];
> -  tree argtype = aarch64_simd_builtin_std_type (d->mode,
> +  tree argtype = aarch64_simd_builtin_type (d->mode,
>   qualifier_unsigned);

Same here.

Richard

>tree ftype = build_function_type_list (usi_type, usi_type, argtype, 
> NULL_TREE);
>tree attrs = aarch64_get_attributes (FLAG_NONE, d->mode);
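The QUAL_TYPE hunk above replaces an equality test (`q == qualifier_none`) with a bit test (`qualifiers & qualifier_unsigned`). The difference matters once const or pointer bits can be set alongside the signedness bit. Below is a minimal illustration, assuming a flag-style qualifier enum. The numeric values and the `selects_unsigned_*` helpers are hypothetical and not copied from aarch64-builtins.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag-style qualifiers; the values are assumptions for
   this sketch only.  */
enum type_qualifiers
{
  qualifier_none = 0x0,
  qualifier_unsigned = 0x1,
  qualifier_const = 0x2,
  qualifier_pointer = 0x4
};

/* The bit test below only works if "unsigned" has its own bit.  */
static_assert ((qualifier_unsigned & (qualifier_const | qualifier_pointer)) == 0,
               "qualifier_unsigned must be a separate bit");

/* Old test, as in the removed QUAL_TYPE: anything other than exactly
   qualifier_none selects the unsigned type.  */
static bool
selects_unsigned_old (unsigned q)
{
  return q != qualifier_none;
}

/* New test, as in the added QUAL_TYPE: only the unsigned bit matters,
   so const/pointer bits no longer change the signedness.  */
static bool
selects_unsigned_new (unsigned q)
{
  return (q & qualifier_unsigned) != 0;
}
```

With these values, a plain `qualifier_const` operand would have been given an unsigned type by the old test, while the new test keeps it signed.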


Re: [PATCH 2/2] Avoid registering __builtin_setjmp_receiver label twice [PR101347]

2022-07-20 Thread Eric Botcazou via Gcc-patches
> Eric is probably most familiar with this, but can you make sure to bootstrap
> and test this on a SJLJ EH target?  I'm not sure --enable-sjlj-exceptions
> is well tested anywhere but on targets not supporting DWARF EH and the
> configury is a bit odd suggesting the option is mostly ignored ...

This is specific circuitry for __builtin_setjmp, so it is *not* exercised by 
the SJLJ exception scheme.  It used to be exercised by the GNAT bootstrap, but 
that's no longer the case either.

I think that the fix is sensible, assuming that it passes the C testsuite.

-- 
Eric Botcazou




Re: [COMMITTED] [PATCH 1/2] Remove recursion from range_from_dom.

2022-07-20 Thread Mikael Morin

Hello,

I spotted a few nits.  See below.

On 20/07/2022 at 00:10, Andrew MacLeod via Gcc-patches wrote:

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index da7b8055d42..20dd5ead3bc 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1312,6 +1312,38 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 fprintf (dump_file, "  Propagation update done.\n");
 }
 
+// Resolve the range of BB if the dominators range is R by calculating incoming
+// edges to this block.  All lead back to the dominator so should be cheap.
+// The range for BB is set and returned in R.
+
+void
+ranger_cache::resolve_dom (vrange &r, tree name, basic_block bb)
+{
+  basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
+  basic_block dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+  // if it doesn't already have a value, store the incoming range.
+  if (!m_on_entry.bb_range_p (name, dom_bb) && def_bb != dom_bb)
+{
+  // If the range can't be store, don't try to accumulate
+  // the range in PREV_BB due to excessive recalculations.


As a consequence of the refactoring, PREV_BB doesn’t exist anymore.  It 
should be BB, I think.



+  if (!m_on_entry.set_bb_range (name, dom_bb, r))
+   return;
+}
+  // With the dominator set, we should be able to cheaply query
+  // each incoming edge now and accumulate the results.
+  r.set_undefined ();
+  edge e;
+  edge_iterator ei;
+  Value_Range er (TREE_TYPE (name));
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  edge_range (er, e, name, RFD_READ_ONLY);
+  r.union_ (er);
+}
+  // Set the cache in PREV_BB so it is not calculated again.


Same here.


+  m_on_entry.set_bb_range (name, bb, r);
+}
 
 // Get the range of NAME from dominators of BB and return it in R.  Search the
 // dominator tree based on MODE.


(...)


@@ -1403,14 +1402,25 @@ ranger_cache::range_from_dom (vrange &r, tree name, basic_block start_bb,
fprintf (dump_file, " at function top\n");
 }
 
-  // Now process any outgoing edges that we seen along the way.
+  // Now process any blocks wit incoming edges that nay have adjustemnts.
   while (m_workback.length () > start_limit)
 {
   int_range_max er;
   prev_bb = m_workback.pop ();
+  if (!single_pred_p (prev_bb))
+   {
+ // Non single pred means we need to cache a vsalue in the dominator


... cache a *value* in ...


+ // so we can cheaply calculate incoming edges to this block, and
+ // then store the resulting value.  If processing mode is not
+ // RFD_FILL, then the cache cant be stored to, so don't try.
+ // Otherwise this becomes a quadratic timed calculation.
+ if (mode == RFD_FILL)
+   resolve_dom (r, name, prev_bb);
+ continue;
+   }
+
   edge e = single_pred_edge (prev_bb);
   bb = e->src;
-
   if (m_gori.outgoing_edge_range_p (er, e, name, *this))
{
  r.intersect (er);
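The accumulation step in resolve_dom can be modelled with plain integer intervals: start from an undefined range, then union in the range NAME has on each incoming edge. This is only a sketch. `struct interval`, `interval_union` and `resolve_from_preds` are hypothetical, and a single convex interval is much coarser than ranger's multi-subrange vrange.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical closed integer interval; UNDEFINED marks the empty range.  */
struct interval
{
  bool undefined;
  long lo, hi;
};

/* Union as a convex hull: the smallest interval covering both inputs.  */
static struct interval
interval_union (struct interval a, struct interval b)
{
  if (a.undefined)
    return b;
  if (b.undefined)
    return a;
  struct interval r = { false, a.lo < b.lo ? a.lo : b.lo,
                        a.hi > b.hi ? a.hi : b.hi };
  return r;
}

/* Mirror of the FOR_EACH_EDGE loop: start undefined, then union the
   range seen on each incoming edge of the block.  */
static struct interval
resolve_from_preds (const struct interval *edge_ranges, size_t npreds)
{
  assert (npreds == 0 || edge_ranges != NULL);
  struct interval r = { true, 0, 0 };
  for (size_t i = 0; i < npreds; i++)
    r = interval_union (r, edge_ranges[i]);
  return r;
}
```

With predecessor ranges [0, 10] and [20, 30], the sketch yields [0, 30]. A real irange would keep the two subranges apart and exclude 11..19.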


[PATCH] rs6000: Suggest unroll factor for loop vectorization

2022-07-20 Thread Kewen.Lin via Gcc-patches
Hi,

Commit r12-6679-g7ca1582ca60dc8 made the vectorizer accept an
unroll factor, suggested by the target during costing, to be
applied to the vectorization factor when vectorizing the main
loop.

This patch introduces the function determine_suggested_unroll_factor
for the rs6000 port, so that it can suggest an unroll factor
for a given loop being vectorized.  Following the aarch64 port,
and based on analysis of SPEC2017 performance evaluation
results, it mainly considers these aspects:
  1) the unroll option and pragma, which can disable unrolling
     for the given loop;
  2) a simple hardware resource model based on the number of
     non-memory-access vector insns issued per cycle;
  3) aggressive heuristics when the iteration count is unknown:
     - reduction case, to break the cross-iteration dependency;
     - emulated gather load;
  4) the estimated iteration count when the iteration count is
     unknown.

With this patch, SPEC2017 performance evaluation results on
Power8/9/10 are listed below (speedup pct.):

  * Power10
- O2: all are neutral (excluding some noise);
- Ofast: 510.parest_r +6.67%, the others are neutral
 (abbreviated as "..." below);
- Ofast + unroll: 510.parest_r +5.91%, ...
- Ofast + LTO + PGO: 510.parest_r +3.00%, ...
- Ofast + cheap vect cost: 510.parest_r +6.23%, ...
- Ofast + very-cheap vect cost: all are neutral;

  * Power9
- Ofast: 510.parest_r +8.73%, 538.imagick_r +11.18%
 (likely noise), 500.perlbench_r +1.84%, ...

  * Power8
- Ofast: 510.parest_r +5.43%, ...;

This patch also introduces a documented parameter,
rs6000-vect-unroll-limit=, similar to what aarch64 proposes.
Evaluation on P8/P9/P10 showed that the default value 4 is
slightly better than other choices such as 2 and 8.

It also parameterizes two other values as undocumented
parameters for future tweaking.  The first is
rs6000-vect-unroll-issue, a simple model of the hardware
resources for non-memory-access vector instructions, used to
avoid excessive unrolling.  I initially tried the value from
the hook rs6000_issue_rate, but the evaluation showed it
performed badly, so I evaluated the values 2/4/6/8 on
P8/P9/P10 at Ofast; the results showed the default value 4 is
good enough across these architectures.  For the record,
choice 8 could shrink or eliminate 510.parest_r's gain on
P8/P9/P10; choice 6 could degrade 503.bwaves_r by more than 1%
on P8/P10; and choice 2 could degrade 538.imagick_r by 3.8%.
The second is rs6000-vect-unroll-reduc-threshold.  It is mainly
inspired by 510.parest_r and was tuned for it: evaluating the
threshold values 0/1/2/3 showed that 1 is the best choice.  For
the record, choice 0 could degrade 525.x264_r by 2% and
527.cam4_r by 2.95% on P10, and 548.exchange2_r by 1.41% and
527.cam4_r by 2.54% on P8; choice 2 and bigger values could
shrink 510.parest_r's gain.
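The toy model below shows how an issue-width parameter and an unroll cap could interact. It is *not* the patch's determine_suggested_unroll_factor, whose body is not quoted in this message. The formula, the power-of-two rounding and all names (`toy_unroll_factor`, `UNROLL_LIMIT`, `UNROLL_ISSUE`) are assumptions. The only inputs taken from the text are the default values 4 and 4.

```c
#include <assert.h>

/* Hypothetical constants using the defaults quoted above.  */
#define UNROLL_LIMIT 4 /* stands in for rs6000-vect-unroll-limit */
#define UNROLL_ISSUE 4 /* stands in for rs6000-vect-unroll-issue */

/* Toy model: unroll while the non-memory vector insns per unrolled
   iteration still fit the modelled issue width, cap the result at
   the limit, and round it down to a power of two.  */
static unsigned
toy_unroll_factor (unsigned nstmts, unsigned nloads, unsigned nstores)
{
  assert (nloads + nstores <= nstmts);
  unsigned non_mem = nstmts - nloads - nstores;
  unsigned uf = non_mem ? UNROLL_ISSUE / non_mem : UNROLL_LIMIT;
  if (uf == 0)
    uf = 1;
  if (uf > UNROLL_LIMIT)
    uf = UNROLL_LIMIT;
  /* Round down to a power of two.  */
  unsigned p = 1;
  while (p * 2 <= uf)
    p *= 2;
  return p;
}
```

For example, a body with one non-memory vector insn gets the full factor 4. A body with six such insns is not unrolled at all.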

Bootstrapped and regtested on powerpc64-linux-gnu P7 and P8,
and powerpc64le-linux-gnu P9.  Bootstrapped on
powerpc64le-linux-gnu P10, but regression testing there exposed
one failure; it was identified as a missed optimization that
can be reproduced without this support, and PR106365 was opened
to track it.

Is it OK for trunk?

BR,
Kewen
--
gcc/ChangeLog:

* config/rs6000/rs6000.cc (class rs6000_cost_data): Add new members
m_nstores, m_reduc_factor, m_gather_load and member function
determine_suggested_unroll_factor.
(rs6000_cost_data::update_target_cost_per_stmt): Update for m_nstores,
m_reduc_factor and m_gather_load.
(rs6000_cost_data::determine_suggested_unroll_factor): New function.
(rs6000_cost_data::finish_cost): Use determine_suggested_unroll_factor.
* config/rs6000/rs6000.opt (rs6000-vect-unroll-limit): New parameter.
(rs6000-vect-unroll-issue): Likewise.
(rs6000-vect-unroll-reduc-threshold): Likewise.
* doc/invoke.texi (rs6000-vect-unroll-limit): Document new parameter.

---
 gcc/config/rs6000/rs6000.cc  | 125 ++-
 gcc/config/rs6000/rs6000.opt |  18 +
 gcc/doc/invoke.texi  |   7 ++
 3 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3ff16b8ae04..d0f107d70a8 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -5208,16 +5208,23 @@ protected:
vect_cost_model_location, unsigned int);
   void density_test (loop_vec_info);
   void adjust_vect_cost_per_loop (loop_vec_info);
+  unsigned int determine_suggested_unroll_factor (loop_vec_info);

   /* Total number of vectorized stmts (loop only).  */
   unsigned m_nstmts = 0;
   /* Total number of loads (loop only).  */
   unsigned m_nloads = 0;
+  /* Total number of stores (loop only).  */
+  unsigned m_nstores = 0;
+  /* Reduction factor for suggesting unroll factor (loop only).  */
+  unsigned m_redu

[PATCH] rs6000/test: Fix empty TU in some cases of effective targets

2022-07-20 Thread Kewen.Lin via Gcc-patches
Hi,

As the failure of test case gcc.target/powerpc/pr92398.p9-.c in
PR106345 shows, the test sources for some powerpc effective
targets wrongly use an empty translation unit.  The test sources
can be compiled with options like "-ansi -pedantic-errors", in
which case those effective target checks fail unexpectedly with
error messages like:

  error: ISO C forbids an empty translation unit [-Wpedantic]

This patch fixes the empty TUs by adding a dummy variable
definition to each of them.
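ISO C requires a translation unit to contain at least one external declaration. A probe consisting only of preprocessor checks therefore becomes empty whenever the check passes. Here is a minimal sketch of the fixed shape. It uses `__GNUC__` as a stand-in for the real feature macros (such as `__VSX__` or `__FLOAT128__`) so that it compiles on any GCC-compatible host.

```c
/* Probe body in the style of the target-supports.exp snippets.  If the
   feature macro is missing, the junk tokens below make compilation fail,
   which is how the effective-target check reports "unsupported".  */
#ifndef __GNUC__
  nope no gnuc
#endif

/* Without this definition the TU would be empty whenever the check
   passes, and -ansi -pedantic-errors turns that pedwarn into an error.  */
int dummy;
```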

Tested on powerpc64-linux-gnu P7 and P8 and
powerpc64le-linux-gnu P9 and P10.  Besides fixing the failure
of gcc.target/powerpc/pr92398.p9-.c, it also brings back some
test coverage, such as:

NA->PASS: gcc.target/powerpc/pr92398.p9+.c
NA->PASS: gcc.target/powerpc/pr93453-1.c

I'll push this soon if no objections.

BR,
Kewen
-

PR testsuite/106345

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_powerpc_sqrt): Add
a variable definition to avoid pedwarn about empty translation unit.
(check_effective_target_has_arch_pwr5): Likewise.
(check_effective_target_has_arch_pwr6): Likewise.
(check_effective_target_has_arch_pwr7): Likewise.
(check_effective_target_has_arch_pwr8): Likewise.
(check_effective_target_has_arch_pwr9): Likewise.
(check_effective_target_has_arch_pwr10): Likewise.
(check_effective_target_has_arch_ppc64): Likewise.
(check_effective_target_ppc_float128): Likewise.
(check_effective_target_ppc_float128_insns): Likewise.
(check_effective_target_powerpc_vsx): Likewise.
---
 gcc/testsuite/lib/target-supports.exp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4ed7b25b9a4..aac2a557f5d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6262,6 +6262,7 @@ proc check_effective_target_powerpc_sqrt { } {
#ifndef _ARCH_PPCSQ
#error _ARCH_PPCSQ is not defined
#endif
+   int dummy;
 } {}]
 }

@@ -6373,6 +6374,7 @@ proc check_effective_target_has_arch_pwr5 { } {
#error does not have power5 support.
#else
/* "has power5 support" */
+   int dummy;
#endif
} [current_compiler_flags]]
 }
@@ -6383,6 +6385,7 @@ proc check_effective_target_has_arch_pwr6 { } {
#error does not have power6 support.
#else
/* "has power6 support" */
+   int dummy;
#endif
} [current_compiler_flags]]
 }
@@ -6393,6 +6396,7 @@ proc check_effective_target_has_arch_pwr7 { } {
#error does not have power7 support.
#else
/* "has power7 support" */
+   int dummy;
#endif
} [current_compiler_flags]]
 }
@@ -6403,6 +6407,7 @@ proc check_effective_target_has_arch_pwr8 { } {
#error does not have power8 support.
#else
/* "has power8 support" */
+   int dummy;
#endif
} [current_compiler_flags]]
 }
@@ -6413,6 +6418,7 @@ proc check_effective_target_has_arch_pwr9 { } {
#error does not have power9 support.
#else
/* "has power9 support" */
+   int dummy;
#endif
} [current_compiler_flags]]
 }
@@ -6423,6 +6429,7 @@ proc check_effective_target_has_arch_pwr10 { } {
#error does not have power10 support.
#else
/* "has power10 support" */
+   int dummy;
#endif
} [current_compiler_flags]]
 }
@@ -6433,6 +6440,7 @@ proc check_effective_target_has_arch_ppc64 { } {
#error does not have ppc64 support.
#else
/* "has ppc64 support" */
+   int dummy;
#endif
} [current_compiler_flags]]
 }
@@ -6523,6 +6531,7 @@ proc check_effective_target_ppc_float128 { } {
#ifndef __FLOAT128__
  nope no good
#endif
+   int dummy;
 }]
 }

@@ -6533,6 +6542,7 @@ proc check_effective_target_ppc_float128_insns { } {
#ifndef __FLOAT128_HARDWARE__
  nope no good
#endif
+   int dummy;
 }]
 }

@@ -6543,6 +6553,7 @@ proc check_effective_target_powerpc_vsx { } {
#ifndef __VSX__
  nope no vsx
#endif
+   int dummy;
 }]
 }

--
2.27.0


Re: [PATCH] Fortran: error recovery on invalid array reference of non-array [PR103590]

2022-07-20 Thread Mikael Morin

On 19/07/2022 at 23:34, Harald Anlauf wrote:


Committed: r13-1757-gf838d15641d256e21ffc126c3277b290ed743928


Thanks.


[PATCH] rs6000/test: Update some cases with -mdejagnu-tune

2022-07-20 Thread Kewen.Lin via Gcc-patches
Hi,

As PR106345 shows, some test cases should be updated with
-mdejagnu-tune, since their test points are sensitive to
rs6000_tune, such as group_ending_nop, loop align (ic),
float conversion cost, etc.

This patch replaces -mdejagnu-cpu with -mdejagnu-tune, or
appends -mdejagnu-tune (keeping the original -mdejagnu-cpu
where it is required), as appropriate.

Tested on powerpc64-linux-gnu P7 and P8 and
powerpc64le-linux-gnu P9 and P10, and also with an explicit
P10 tune setting in the configuration.

I'll push this soon if no objections.

BR,
Kewen
-
PR testsuite/106345

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/lhs-1.c: Replace -mdejagnu-cpu with
-mdejagnu-tune.
* gcc.target/powerpc/loop_align.c: Likewise.
* gcc.target/powerpc/lhs-2.c: Append -mdejagnu-tune.
* gcc.target/powerpc/lhs-3.c: Likewise.
* gcc.target/powerpc/compress-float-ppc-pic.c: Likewise.
* gcc.target/powerpc/compress-float-ppc.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/compress-float-ppc-pic.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/lhs-1.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/lhs-2.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/lhs-3.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/loop_align.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/compress-float-ppc-pic.c b/gcc/testsuite/gcc.target/powerpc/compress-float-ppc-pic.c
index 8961be51d2f..d14ccb433b9 100644
--- a/gcc/testsuite/gcc.target/powerpc/compress-float-ppc-pic.c
+++ b/gcc/testsuite/gcc.target/powerpc/compress-float-ppc-pic.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target powerpc_fprs } } */
-/* { dg-options "-O2 -fpic -mdejagnu-cpu=power5" } */
+/* { dg-options "-O2 -fpic -mdejagnu-cpu=power5 -mdejagnu-tune=power5" } */
 /* { dg-require-effective-target fpic } */

 double foo (double x) {
diff --git a/gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c b/gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c
index 650f559f347..d6f84e57ab9 100644
--- a/gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c
+++ b/gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target powerpc_fprs } } */
-/* { dg-options "-O2 -mdejagnu-cpu=power5" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power5 -mdejagnu-tune=power5" } */

 double foo (double x) {
   return x + 1.75;
diff --git a/gcc/testsuite/gcc.target/powerpc/lhs-1.c b/gcc/testsuite/gcc.target/powerpc/lhs-1.c
index 4e13fd2fb70..bcb41abbe91 100644
--- a/gcc/testsuite/gcc.target/powerpc/lhs-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/lhs-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-options "-O2 -mdejagnu-cpu=power5" } */
+/* { dg-options "-O2 -mdejagnu-tune=power5" } */
 /* { dg-final { scan-assembler-times "nop" 3 } } */

 /* Test generation of nops in load hit store situation.  Make sure enough nop
diff --git a/gcc/testsuite/gcc.target/powerpc/lhs-2.c b/gcc/testsuite/gcc.target/powerpc/lhs-2.c
index d1b18b1591d..22aa0d8712f 100644
--- a/gcc/testsuite/gcc.target/powerpc/lhs-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/lhs-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-options "-O2 -mdejagnu-cpu=power6 -msched-groups" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power6 -mdejagnu-tune=power6 -msched-groups" } */
 /* { dg-final { scan-assembler "ori 1,1,0" } } */

 /* Test generation of group ending nop in load hit store situation.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/lhs-3.c b/gcc/testsuite/gcc.target/powerpc/lhs-3.c
index 9d6bbcf69f7..d7b500092bb 100644
--- a/gcc/testsuite/gcc.target/powerpc/lhs-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/lhs-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7 -mdejagnu-tune=power7" } */
 /* { dg-final { scan-assembler "ori 2,2,0" } } */

 /* Test generation of group ending nop in load hit store situation.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/loop_align.c b/gcc/testsuite/gcc.target/powerpc/loop_align.c
index ef67f77efed..36e3b4c98c3 100644
--- a/gcc/testsuite/gcc.target/powerpc/loop_align.c
+++ b/gcc/testsuite/gcc.target/powerpc/loop_align.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
-/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16 -fno-unroll-loops" } */
+/* { dg-options "-O2 -mdejagnu-tune=power7 -falign-functions=16 -fno-unroll-loops" } */
 /* { dg-final { scan-assembler ".p2align 5" } } */

 void f(double *a, double *b, double *c, unsigned long n) {
--
2.27.0


[PATCH] Improve SLP codegen, avoiding unnecessary TREE_ADDRESSABLE

2022-07-20 Thread Richard Biener via Gcc-patches
The following adjusts vectorizer code generation to avoid splitting
out address increments for invariant addresses, which cause objects
to be marked TREE_ADDRESSABLE unnecessarily.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-data-refs.cc (bump_vector_ptr): Return an
invariant updated address when the input was invariant.
---
 gcc/tree-vect-data-refs.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 609cacc4971..b279a82551e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5195,6 +5195,14 @@ bump_vector_ptr (vec_info *vinfo,
 
   if (TREE_CODE (dataref_ptr) == SSA_NAME)
 new_dataref_ptr = copy_ssa_name (dataref_ptr);
+  else if (is_gimple_min_invariant (dataref_ptr))
+/* When possible avoid emitting a separate increment stmt that will
+   force the addressed object addressable.  */
+return build1 (ADDR_EXPR, TREE_TYPE (dataref_ptr),
+  fold_build2 (MEM_REF,
+   TREE_TYPE (TREE_TYPE (dataref_ptr)),
+   dataref_ptr,
+   fold_convert (ptr_type_node, update)));
   else
 new_dataref_ptr = make_ssa_name (TREE_TYPE (dataref_ptr));
   incr_stmt = gimple_build_assign (new_dataref_ptr, POINTER_PLUS_EXPR,
-- 
2.35.3


[PING^4] nvptx: forward '-v' command-line option to assembler, linker

2022-07-20 Thread Thomas Schwinge
Hi Tom!

Ping.


Regards
 Thomas


On 2022-07-13T10:41:23+0200, I wrote:
> Hi Tom!
>
> Ping.
>
>
> Regards
>  Thomas
>
>
> On 2022-07-05T16:58:54+0200, I wrote:
>> Hi Tom!
>>
>> Ping.
>>
>>
>> Regards
>>  Thomas
>>
>>
>> On 2022-06-07T17:41:16+0200, I wrote:
>>> Hi!
>>>
>>> On 2022-05-30T09:06:21+0200, Tobias Burnus  wrote:
 On 29.05.22 22:49, Thomas Schwinge wrote:
> Not sure if that's what you had in mind, but what do you think about the
> attached "nvptx: forward '-v' command-line option to assembler, linker"?
> OK to push to GCC master branch (after merging
> 
> "Put '-v' verbose output onto stderr instead of stdout")?

 I was mainly thinking of some way to have it available — which
 '-foffload-options=-Wa,-v' already permits on the GCC side. (Once the
 nvptx-tools patch actually makes use of the '-v'.)
>>>
>>> (Merged a week ago.)
>>>
 If I understand your patch correctly, this patch now causes 'gcc -v' to
 imply 'gcc -v -Wa,-v'. I think that's okay, since 'gcc -v' already
 outputs a lot of lines and those lines can be helpful to understand what
 happens and what not.
>>>
>>> ACK.
>>>
 Tom, your thoughts on this?
>>>
>>> Ping.
>>>
>>>
>>> Regards
>>>  Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Registration court:
München, HRB 106955
From 17c35607d4927299b0c4bd19dd6fd205c85c4a4b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 29 May 2022 22:31:43 +0200
Subject: [PATCH] nvptx: forward '-v' command-line option to assembler, linker

For example, for an offloading compilation with '-save-temps -v', the
before-vs.-after word-diff looks like:

[...]
 [...]/build-gcc-offload-nvptx-none/gcc/as {+-v -v+} -o ./a.xnvptx-none.mkoffload.o ./a.xnvptx-none.mkoffload.s
{+Verifying sm_30 code with sm_35 code generation.+}
{+ ptxas -c -o /dev/null ./a.xnvptx-none.mkoffload.o --gpu-name sm_35 -O0+}
[...]
 [...]/build-gcc-offload-nvptx-none/gcc/collect2 {+-v -v+} -o ./a.xnvptx-none.mkoffload [...] @./a.xnvptx-none.mkoffload.args.1 -lgomp -lgcc -lc -lgcc
{+collect2 version 12.0.1 20220428 (experimental)+}
{+[...]/build-gcc-offload-nvptx-none/gcc/collect-ld -v -v -o ./a.xnvptx-none.mkoffload [...] ./a.xnvptx-none.mkoffload.o -lgomp -lgcc -lc -lgcc+}
{+Linking ./a.xnvptx-none.mkoffload.o as 0+}
{+trying lib libc.a+}
{+trying lib libgcc.a+}
{+trying lib libgomp.a+}
{+Resolving abort+}
{+Resolving acc_on_device+}
{+Linking libgomp.a::oacc-init.o/ as 1+}
{+Linking libc.a::lib_a-abort.o/   as 2+}
[...]

(This depends on 
"Put '-v' verbose output onto stderr instead of stdout".)

	gcc/
	* config/nvptx/nvptx.h (ASM_SPEC, LINK_SPEC): Define.
---
 gcc/config/nvptx/nvptx.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index ed72c253191..b184f1d0150 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -27,6 +27,13 @@
 
 /* Run-time Target.  */
 
+/* Assembler supports '-v' option; handle similar to
+   '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
+#define ASM_SPEC "%{v}"
+
+/* Linker supports '-v' option.  */
+#define LINK_SPEC "%{v}"
+
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
-- 
2.25.1



[PING^3] nvptx: Allow '--with-arch' to override the default '-misa' (was: nvptx multilib setup)

2022-07-20 Thread Thomas Schwinge
Hi Tom!

Ping.


Regards
 Thomas


On 2022-07-13T10:42:44+0200, I wrote:
> Hi Tom!
>
> Ping.
>
>
> Regards
>  Thomas
>
>
> On 2022-07-05T16:59:23+0200, I wrote:
>> Hi Tom!
>>
>> Ping.
>>
>>
>> Regards
>>  Thomas
>>
>>
>> On 2022-06-15T23:18:10+0200, I wrote:
>>> Hi Tom!
>>>
>>> On 2022-05-13T16:20:14+0200, I wrote:
 On 2022-02-04T13:09:29+0100, Tom de Vries via Gcc  wrote:
> On 2/4/22 08:21, Thomas Schwinge wrote:
>> On 2022-02-03T13:35:55+, "vries at gcc dot gnu.org via Gcc-bugs" 
>>  wrote:
>>> I've tested this using (recommended) driver 470.94 on boards:

>>> while iterating over dimensions { -mptx=3.1 , -mptx=6.3 } x { 
>>> GOMP_NVPTX_JIT=-O0,  }.
>>
>> Do you use separate (nvptx-none offload target only?) builds for
>> different '-mptx' variants (likewise: '-misa'), or have you hacked up the
>> multilib configuration?
>
> Neither, I'm using --target_board=unix/foffload= for that.

 ACK, I see.  So these flags then only affect GCC/nvptx code generation
 for the actual user code (here: GCC libgomp test cases), but for the
 GCC/nvptx target libraries (such as: libc, libm, libgfortran, libgomp --
 the latter especially relevant for OpenMP), it uses PTX code from one of
 the two "pre-compiled" GCC/nvptx multilibs: default or '-mptx=3.1'.

 Meaning, one can't just use such a flag for "completely building code"
 for a specific configuration.  Random example,
 '-foffload-options=nvptx-none=-march=sm_75': as GCC/nvptx target
 libraries aren't being built for '-march=sm_75' multilib,
 '-foffload-options=nvptx-none=-march=sm_75' uses the default multilib,
 which isn't '-march=sm_75'.


>   ('gcc/config/nvptx/t-nvptx:MULTILIB_OPTIONS'
>> etc., I suppose?)  Should we add a few representative configurations to
>> be built by default?  And/or, should we have a way to 'configure' per
>> user needs (I suppose: '--with-multilib-list=[...]', as supported for a
>> few other targets?)?  (I see there's also a new
>> '--with-multilib-generator=[...]', haven't looked in detail.)  No matter
>> which way: again, combinatorial explosion is a problem, of course...
>
> As far as I know, the gcc build doesn't finish when switching default to
> higher than sm_35, so there's little point to go to a multilib setup at
> this point.  But once we fix that, we could reconsider, otherwise,
> things are likely to regress again.

 As far as I remember, several issues have been fixed.  Still waiting for
 Roger's "middle-end: Support ABIs that pass FP values as wider integers"
 or something similar, but that PR104489 issue is being worked around by
 "Limit HFmode support to mexperimental", if I got that right.

 Now I'm not suggesting we should now enable all or any random GCC/nvptx
 multilibs, to get all these variants of GCC/nvptx target libraries built;
 especially also given that GCC/nvptx code generation currently doesn't
 make too much use of the new capabilities.

 However, we do have a specific request that a customer would like to be
 able to change at GCC 'configure' time the GCC/nvptx default multilib
 (including that being used for building corresponding GCC/nvptx target
 libraries).

 Per 'gcc/doc/install.texi', I do see that some GCC targets allow for
 GCC 'configure'-time '--with-multilib-list=[...]', or
 '--with-multilib-generator=[...]', and I suppose we could be doing
 something similar?  But before starting implementing, I'd like your
 input, as you'll be the one to approve in the end.  And/or, maybe you've
 already made up your own ideas about that?
>>>
>>> So, instead of "random GCC/nvptx multilib configuration" (last
>>> paragraph), I've come up with a way to implement our customer's request
>>> (second last paragraph): 'configure' GCC/nvptx '--with-arch=sm_70'.
>>>
>>> I think I've implemented this in a way so that "random GCC/nvptx multilib
>>> configuration" may eventually be implemented on top of that.  For easy
>>> review/testing I've split my changes into three commits, see attached
>>> "nvptx: Make default '-misa=sm_30' explicit",
>>> "nvptx: Introduce dummy multilib option for default '-misa=sm_30'",
>>> "nvptx: Allow '--with-arch' to override the default '-misa'".
>>>
>>> To the best of my knowledge, the first two patches do not change any
>>> user-visible behavior (I generally 'diff'ed target libraries, and
>>> compared a good number of 'gcc -print-multi-directory [flags]'), and
>>> likewise with the third patch, given implicit (default) or explicit
>>> '--with-arch=sm_30', and that with '--with-arch=sm_70', for example, the
>>> '-misa=sm_70' multilib variants are used for implicit (default) or
>>> explicit '-misa=sm_70' or higher, and the '-misa=sm_30' multilib variants
>>> are used for explicit lower '-misa'.
>>>
>>> What do you think, OK to push to master branch?
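The multilib selection rule described above can be summarized by a small C model. It covers only the '-misa' dimension and reflects one reading of the description. It is not code from the patches. 'isa_multilib_for' is a hypothetical name, ISA levels are written as the NN of sm_NN, and 0 stands for "no explicit '-misa' given".

```c
/* Hypothetical model (not from the patches): which '-misa' multilib is used,
   given the configured '--with-arch=sm_WITH_ARCH' and an optional explicit
   '-misa=sm_EXPLICIT_ISA' (0 = not given).  Only the ISA dimension is modelled;
   '-mptx' variants are ignored.  */
static int isa_multilib_for (int with_arch, int explicit_isa)
{
  /* Without an explicit '-misa', the '--with-arch' value is the default.  */
  int isa = explicit_isa ? explicit_isa : with_arch;

  /* '-misa' at or above the configured default uses the default multilib;
     anything lower falls back to the 'sm_30' multilib.  */
  return isa >= with_arch ? with_arch : 30;
}
```

With '--with-arch=sm_30' (the implicit default), every request maps to the 'sm_30' multilib, which matches the claim that the patches do not change behavior in that configuration.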

Re: [PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-07-20 Thread Alexander Monakov via Gcc-patches


Ping.

On Thu, 7 Jul 2022, Alexander Monakov via Gcc-patches wrote:

> From: Artem Klimov 
> 
> Fix PR99619, which asks to optimize TLS model based on visibility.
> The fix is implemented as an IPA optimization: this allows to take
> optimized visibility status into account (as well as avoid modifying
> all language frontends).
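The sketch below shows the kind of declarations the new tests exercise. The exact contents and dg-directives of the gcc.dg/tls/vis-*.c tests are not shown in this message, and the variable and function names here are illustrative only. A hidden thread-local variable binds locally, so when compiling PIC code the compiler may use local-dynamic instead of global-dynamic access. A variable with an explicit 'tls_model' attribute is skipped by the new loop in function_and_variable_visibility.

```c
/* Illustrative only.  Build as part of a shared object (e.g. -O2 -fpic) to
   look at the chosen TLS access model; in a non-PIC executable the compiler
   may select an exec model regardless.  */

/* Hidden visibility: binds locally, so promotion from global-dynamic to
   local-dynamic is permitted.  */
__attribute__ ((visibility ("hidden"))) __thread int hidden_counter;

/* Explicit tls_model attribute: left alone by the new upgrade loop.  */
__attribute__ ((tls_model ("global-dynamic"))) __thread int pinned_counter;

int bump_hidden (void) { return ++hidden_counter; }
int bump_pinned (void) { return ++pinned_counter; }
```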
> 
> 2022-04-17  Artem Klimov  
> 
> gcc/ChangeLog:
> 
>   * ipa-visibility.cc (function_and_variable_visibility): Promote
>   TLS access model after visibility optimizations.
>   * varasm.cc (have_optimized_refs): New helper.
>   (optimize_dyn_tls_for_decl_p): New helper. Use it ...
>   (decl_default_tls_model): ... here in place of 'optimize' check.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tls/vis-attr-gd.c: New test.
>   * gcc.dg/tls/vis-attr-hidden-gd.c: New test.
>   * gcc.dg/tls/vis-attr-hidden.c: New test.
>   * gcc.dg/tls/vis-flag-hidden-gd.c: New test.
>   * gcc.dg/tls/vis-flag-hidden.c: New test.
>   * gcc.dg/tls/vis-pragma-hidden-gd.c: New test.
>   * gcc.dg/tls/vis-pragma-hidden.c: New test.
> 
> Co-Authored-By:  Alexander Monakov  
> Signed-off-by: Artem Klimov 
> ---
> 
> v2: run the new loop in ipa-visibility only in the whole-program IPA pass;
> in decl_default_tls_model, check if any referring function is optimized
> when 'optimize == 0' (when running in LTO mode)
> 
> 
> Note for reviewers: I noticed there's a place which tries to avoid TLS
> promotion, but the comment seems wrong and I could not find a testcase.
> I'd suggest we remove it. The compiler can only promote general-dynamic
> to local-dynamic and initial-exec to local-exec. The comment refers to
> promoting x-dynamic to y-exec, but that cannot happen AFAICT:
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=8e1ba78f1b8eedd6c65c6f0e6d6d09a801de5d3d
> 
> 
>  gcc/ipa-visibility.cc | 19 +++
>  gcc/testsuite/gcc.dg/tls/vis-attr-gd.c| 12 +++
>  gcc/testsuite/gcc.dg/tls/vis-attr-hidden-gd.c | 13 
>  gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c| 12 +++
>  gcc/testsuite/gcc.dg/tls/vis-flag-hidden-gd.c | 13 
>  gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c| 12 +++
>  .../gcc.dg/tls/vis-pragma-hidden-gd.c | 17 ++
>  gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c  | 16 ++
>  gcc/varasm.cc | 32 ++-
>  9 files changed, 145 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-gd.c
>  create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-hidden-gd.c
>  create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c
>  create mode 100644 gcc/testsuite/gcc.dg/tls/vis-flag-hidden-gd.c
>  create mode 100644 gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c
>  create mode 100644 gcc/testsuite/gcc.dg/tls/vis-pragma-hidden-gd.c
>  create mode 100644 gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c
> 
> diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
> index 8a27e7bcd..3ed2b7cf6 100644
> --- a/gcc/ipa-visibility.cc
> +++ b/gcc/ipa-visibility.cc
> @@ -873,6 +873,25 @@ function_and_variable_visibility (bool whole_program)
>   }
>  }
>  
> +  if (symtab->state >= IPA_SSA)
> +{
> +  FOR_EACH_VARIABLE (vnode)
> + {
> +   tree decl = vnode->decl;
> +
> +   /* Upgrade TLS access model based on optimized visibility status,
> +  unless it was specified explicitly or no references remain.  */
> +   if (DECL_THREAD_LOCAL_P (decl)
> +   && !lookup_attribute ("tls_model", DECL_ATTRIBUTES (decl))
> +   && vnode->ref_list.referring.length ())
> + {
> +   enum tls_model new_model = decl_default_tls_model (decl);
> +   gcc_checking_assert (new_model >= decl_tls_model (decl));
> +   set_decl_tls_model (decl, new_model);
> + }
> + }
> +}
> +
>if (dump_file)
>  {
>fprintf (dump_file, "\nMarking local functions:");
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 4db8506b1..de149e82c 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -6679,6 +6679,36 @@ init_varasm_once (void)
>  #endif
>  }
>  
> +/* Determine whether SYMBOL is used in any optimized function.  */
> +
> +static bool
> +have_optimized_refs (struct symtab_node *symbol)
> +{
> +  struct ipa_ref *ref;
> +
> +  for (int i = 0; symbol->iterate_referring (i, ref); i++)
> +{
> +  cgraph_node *cnode = dyn_cast <cgraph_node *> (ref->referring);
> +
> +  if (cnode && opt_for_fn (cnode->decl, optimize))
> + return true;
> +}
> +
> +  return false;
> +}
> +
> +/* Check if promoting general-dynamic TLS access model to local-dynamic is
> +   desirable for DECL.  */
> +
> +static bool
> +optimize_dyn_tls_for_decl_p (const_tree decl)
> +{
> +  if (optimize)
> +return true;
> +  return symtab->state >= IPA && have_optimized_refs (symtab_node::get (decl));
> +}
> +
> +
>  enum tls_model
>  decl_default_tls_model (const_tree decl)
>

Re: [PATCH 2/2] Avoid registering __builtin_setjmp_receiver label twice [PR101347]

2022-07-20 Thread Alexander Monakov via Gcc-patches


On Wed, 20 Jul 2022, Eric Botcazou wrote:

> > Eric is probably most familiar with this, but can you make sure to bootstrap
> > and test this on a SJLJ EH target?  I'm not sure --enable-sjlj-exceptions
> > is well tested anywhere but on targets not supporting DWARF EH and the
> > configury is a bit odd suggesting the option is mostly ignored ...
> 
> This is a specific circuitry for __builtin_setjmp so it is *not* exercised by
> the SJLJ exception scheme.  It used to be exercised by the GNAT bootstrap, but
> that's no longer the case either.
> 
> I think that the fix is sensible, assuming that it passes the C testsuite.

Yes, it passes the usual regtest.  Thanks, applying to trunk.

Alexander
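For reference, the circuitry discussed here is reached through GCC's __builtin_setjmp/__builtin_longjmp builtins, not the library setjmp/longjmp. The minimal sketch below is illustrative and not taken from PR101347 or its testcase. It shows the documented constraints: a five-word buffer, a second argument of 1 to __builtin_longjmp, and the longjmp issued from a different function than the setjmp.

```c
/* Buffer for __builtin_setjmp: five pointer-sized words.  */
static void *jmp_buffer[5];

/* __builtin_longjmp must not be called from the function that called
   __builtin_setjmp, so the jump is done here; noinline keeps it separate.  */
static __attribute__ ((noinline, noreturn)) void jump_back (void)
{
  __builtin_longjmp (jmp_buffer, 1);   /* the value must be 1 */
}

/* Returns 1 when control comes back through the setjmp receiver.  */
int setjmp_roundtrip (void)
{
  if (__builtin_setjmp (jmp_buffer))
    return 1;          /* reached after __builtin_longjmp */
  jump_back ();
  return 0;            /* not reached */
}
```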


[PATCH V2] arm: add -static-pie support

2022-07-20 Thread Lance Fredrickson via Gcc-patches
This patch adds -static-pie support for the arm architecture. aarch64 
had the appropriate code for handling -static-pie, so this just mirrors 
the code found there.  Tested with uclibc-ng and musl c-standard 
libraries to produce static-pie binaries.  Re-submitted with minor spell 
check fix.

From 4e122adfea2a6247f2da0c094f3203cbdf2a578c Mon Sep 17 00:00:00 2001
From: lancethepants 
Date: Tue, 19 Jul 2022 14:21:05 -0600
Subject: [PATCH V2] arm: add -static-pie support

The commit mirrors code from aarch64 to handle -static-pie.
Tested with uclibc-ng and musl c-standard libraries.

Signed-off-by: Lance Fredrickson 
---
 gcc/config/arm/linux-elf.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/linux-elf.h b/gcc/config/arm/linux-elf.h
index df3da67c4f0..70f71b051a3 100644
--- a/gcc/config/arm/linux-elf.h
+++ b/gcc/config/arm/linux-elf.h
@@ -66,9 +66,10 @@
%{static:-Bstatic} \
%{shared:-shared} \
%{symbolic:-Bsymbolic} \
-   %{!static: \
+   %{!static:%{!static-pie: \
  %{rdynamic:-export-dynamic} \
- %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}} \
+ %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}} \
+   %{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
-X \
%{mbig-endian:-EB} %{mlittle-endian:-EL}" \
SUBTARGET_EXTRA_LINK_SPEC
-- 
2.20.1
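To spell out what the nested negation in the revised LINK_SPEC does, here is a small C model of the relevant subset. 'link_args' and 'struct link_opts' are hypothetical names, the '%{symbolic:...}' and endianness parts are omitted, and '/lib/ld.so' in the usage stands in for GNU_USER_DYNAMIC_LINKER. Under '-static-pie' the dynamic-linker block is skipped, and '-Bstatic -pie --no-dynamic-linker -z text' is passed instead.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the driver options seen by the spec.  */
struct link_opts {
  bool is_static;      /* -static */
  bool is_static_pie;  /* -static-pie */
  bool is_shared;      /* -shared */
  bool rdynamic;       /* -rdynamic */
};

/* Append TOKEN to BUF (capacity CAP), separating tokens with one space.  */
static void append (char *buf, size_t cap, const char *token)
{
  size_t len = strlen (buf);
  snprintf (buf + len, cap - len, "%s%s", len ? " " : "", token);
}

/* Model of the patched spec fragment; DYNLINKER stands in for
   GNU_USER_DYNAMIC_LINKER.  */
static void link_args (const struct link_opts *o, const char *dynlinker,
                       char *buf, size_t cap)
{
  buf[0] = '\0';
  if (o->is_static)                         /* %{static:-Bstatic} */
    append (buf, cap, "-Bstatic");
  if (o->is_shared)                         /* %{shared:-shared} */
    append (buf, cap, "-shared");
  if (!o->is_static && !o->is_static_pie)   /* %{!static:%{!static-pie:...}} */
    {
      if (o->rdynamic)
        append (buf, cap, "-export-dynamic");
      if (!o->is_shared)
        {
          append (buf, cap, "-dynamic-linker");
          append (buf, cap, dynlinker);
        }
    }
  if (o->is_static_pie)                     /* the new %{static-pie:...} */
    append (buf, cap, "-Bstatic -pie --no-dynamic-linker -z text");
  append (buf, cap, "-X");
}
```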



Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-20 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 18 Jul 2022 at 11:57, Richard Biener  wrote:
>
> On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener  writes:
> > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > >  wrote:
> > > >>
> > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > >>  wrote:
> > > >> >
> > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > >> >  wrote:
> > > >> > >
> > > >> > > Hi Richard,
> > > >> > > For the following test:
> > > >> > >
> > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > >> > > {
> > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >> > > }
> > > >> > >
> > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > >> > > foo.c: In function ‘f2’:
> > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > >> > >   |   ^~
> > > >> > > svint32_t
> > > >> > > __Int32x4_t
> > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > >> > > during GIMPLE pass: forwprop
> > > >> > > dump file: foo.c.109t.forwprop2
> > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > >> > > 0xe9371f execute_function_todo
> > > >> > > ../../gcc/gcc/passes.cc:2091
> > > >> > > 0xe93ccb execute_todo
> > > >> > > ../../gcc/gcc/passes.cc:2145
> > > >> > >
> > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, 
> > > >> > > we have:
> > > >> > >   int32x4_t v;
> > > >> > >   __Int32x4_t _1;
> > > >> > >   svint32_t _9;
> > > >> > >   vector(4) int _11;
> > > >> > >
> > > >> > >:
> > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > >> > >   v_12 = _1;
> > > >> > >   _11 = v_12;
> > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > >> > >   return _9;
> > > >> > >
> > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > >> > > view_convert_expr,
> > > >> > > and the end result becomes:
> > > >> > >   svint32_t _7;
> > > >> > >   __Int32x4_t _8;
> > > >> > >
> > > >> > > ;;   basic block 2, loop depth 0
> > > >> > > ;;pred:   ENTRY
> > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > >> > >   return _7;
> > > >> > > ;;succ:   EXIT
> > > >> > >
> > > >> > > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > >> > >
> > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
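A scalar C model can show why the fold in simplify_permutation is only valid for fixed-length results. The model is illustrative: plain arrays stand in for vectors, and the function names are hypothetical. It uses the same selector {0, 1, 2, 3, ...} as the dump above. With a 4-lane result, the permutation is an identity and can be replaced by its input. With a longer (VLA) result, it replicates the 128-bit quadword, so it is not an identity and cannot become a plain VIEW_CONVERT_EXPR or VECTOR_CST of the 4-lane type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of VEC_PERM_EXPR <a, b, sel>: element i of the result is element
   sel[i] of the concatenation of A and B (N lanes each).  */
static void vec_perm_model (const int *a, const int *b, size_t n,
                            const size_t *sel, int *out, size_t out_len)
{
  for (size_t i = 0; i < out_len; i++)
    out[i] = sel[i] < n ? a[sel[i]] : b[sel[i] - n];
}

/* The repeating svld1rq-style selector { 0, 1, 2, 3, 0, 1, 2, 3, ... }.  */
static void quad_dup_selector (size_t *sel, size_t out_len)
{
  for (size_t i = 0; i < out_len; i++)
    sel[i] = i % 4;
}

/* True only when the result has N lanes and SEL is { 0, 1, ..., N-1 }.  */
static bool is_identity_perm (const size_t *sel, size_t n, size_t out_len)
{
  if (out_len != n)
    return false;
  for (size_t i = 0; i < out_len; i++)
    if (sel[i] != i)
      return false;
  return true;
}
```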
> > > >> >
> > > >> > It for sure papers over the issue.  I think the error happens 
> > > >> > earlier,
> > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > >> > which is the type of the LHS.  But then you probably run into the
> > > >> > different sizes ICE (VLA vs constant size).  I think for this case 
> > > >> > you
> > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > >> > selecting the "low" part of the VLA vector.
> > > >> Hi Richard,
> > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > >> represent dup operation
> > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > >> BIT_FIELD_REF will work.
> > > >> Could you please elaborate ?
> > > >>
> > > >> Also, the issue doesn't seem restricted to this case.
> > > >> The following test case also ICE's during forwprop:
> > > >> svint32_t foo()
> > > >> {
> > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >>   return v2;
> > > >> }
> > > >>
> > > >> foo2.c: In function ‘foo’:
> > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > >> 9 | }
> > > >>   | ^
> > > >> svint32_t
> > > >> int32x4_t
> > > >> v2_4 = { 1, 2, 3, 4 };
> > > >>
> > > >> because simplify_permutation folds
> > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > >> into:
> > > >> vector_cst {1, 2, 3, 4}
> > > >>
> > > >> and it complains during verify_gimple_assign_single because we don't
> > > >> support assignment of vector_cst to VLA vector.
> > > >>
> > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > >> supports lhs and rhs
> > > >> to have vector types with differing lengths, and simplifying it to
> > > >> other tree codes, like above,
> > > >> will result in type errors ?
> > > >
> > > > That might be the case - Richard should know.
> > >
> > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > or 

[r13-1762 Regression] FAIL: gcc.dg/pr56837.c scan-tree-dump-times optimized "memset ..c, 68, 16384.; " 1 on Linux/x86_64

2022-07-20 Thread skpandey--- via Gcc-patches
On Linux/x86_64,

f9d4c3b45c5ed5f45c8089c990dbd4e181929c3d is the first bad commit
commit f9d4c3b45c5ed5f45c8089c990dbd4e181929c3d
Author: liuhongt 
Date:   Tue Jul 19 17:24:52 2022 +0800

Lower complex type move to enable vectorization for complex type load&store.

caused

FAIL: gcc.dg/pr23911.c scan-tree-dump-times dce3 "__complex__ \\(1.0e\\+0, 0.0\\)" 2
FAIL: gcc.dg/pr56837.c scan-tree-dump-times optimized "memset ..c, 68, 16384.;" 1

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r13-1762/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr23911.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr23911.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr23911.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr23911.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr56837.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr56837.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr56837.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr56837.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] Adding three new function attributes for static analysis of file descriptors

2022-07-20 Thread Immad Mir via Gcc-patches
This patch adds three new function attributes to GCC that
are used for static analysis of usage of file descriptors:

1) __attribute__ ((fd_arg(N))): The attribute may be applied to a function that
takes an open file descriptor at referenced argument N.

It indicates that the passed file descriptor must not have been closed.
Therefore, when the analyzer is enabled with -fanalyzer, the
analyzer may emit a -Wanalyzer-fd-use-after-close diagnostic
if it detects a code path in which a function with this attribute is
called with a closed file descriptor.

The attribute also indicates that the file descriptor must have been checked for
validity before usage. Therefore, the analyzer may emit a
-Wanalyzer-fd-use-without-check diagnostic if it detects a code path in
which a function with this attribute is called with a file descriptor that has
not been checked for validity.

2) __attribute__((fd_arg_read(N))): The attribute is identical to
fd_arg, but with the additional requirement that the function might read from
the file descriptor, and thus the file descriptor must not have been opened
as write-only.

The analyzer may emit a -Wanalyzer-fd-access-mode-mismatch
diagnostic if it detects a code path in which a function with this
attribute is called on a file descriptor opened with O_WRONLY.

3) __attribute__((fd_arg_write(N))): The attribute is identical to fd_arg_read
except that the analyzer may emit a -Wanalyzer-fd-access-mode-mismatch diagnostic
if it detects a code path in which a function with this attribute is called on a
file descriptor opened with O_RDONLY.
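A minimal sketch of how fd_arg_write might be applied, assuming a GCC recent enough to know the attribute. The __has_attribute guard lets other compilers ignore it. 'put_msg' and 'write_to_path' are hypothetical helpers, not part of the patch. The descriptor is checked for validity before it reaches the annotated function, which is the usage pattern the analyzer expects.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#ifdef __has_attribute
# if __has_attribute (fd_arg_write)
#  define FD_ARG_WRITE(N) __attribute__ ((fd_arg_write (N)))
# endif
#endif
#ifndef FD_ARG_WRITE
# define FD_ARG_WRITE(N)   /* attribute unavailable: expands to nothing */
#endif

/* Writes MSG to the descriptor passed as argument 1.  */
static ssize_t put_msg (int fd, const char *msg) FD_ARG_WRITE (1);

static ssize_t put_msg (int fd, const char *msg)
{
  return write (fd, msg, strlen (msg));
}

/* Open PATH write-only, check the descriptor, write MSG, then close.
   Returns 0 on success, -1 on failure.  */
static int write_to_path (const char *path, const char *msg)
{
  int fd = open (path, O_WRONLY);
  if (fd < 0)                      /* validity check before use */
    return -1;
  ssize_t n = put_msg (fd, msg);
  close (fd);
  return n == (ssize_t) strlen (msg) ? 0 : -1;
}

/* With -fanalyzer, code like the following would be expected to be flagged
   (not compiled here):
     int fd = open (path, O_RDONLY);
     if (fd >= 0) put_msg (fd, "x");    -> -Wanalyzer-fd-access-mode-mismatch
     int fd2 = open (path, O_WRONLY);
     put_msg (fd2, "x");                -> -Wanalyzer-fd-use-without-check  */
```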

gcc/analyzer/ChangeLog:
* sm-fd.cc (fd_param_diagnostic): New diagnostic class.
(fd_access_mode_mismatch): Change inheritance from fd_diagnostic
to fd_param_diagnostic. Add new overloaded constructor.
(fd_use_after_close): Likewise.
(unchecked_use_of_fd): Likewise and also change name to fd_use_without_check.
(double_close): Change name to fd_double_close.
(enum access_directions): New.
(fd_state_machine::on_stmt): Handle calls to function with the
new three function attributes.
(fd_state_machine::check_for_fd_attrs): New.
(fd_state_machine::on_open): Use the new overloaded constructors
of diagnostic classes.

gcc/c-family/ChangeLog:
* c-attribs.cc (c_common_attribute_table): Add three new attributes,
namely fd_arg, fd_arg_read and fd_arg_write.
(handle_fd_arg_attribute): New.

gcc/ChangeLog:
* doc/extend.texi: Add fd_arg, fd_arg_read and fd_arg_write under
"Common Function Attributes" section.
* doc/invoke.texi: Document for -Wanalyzer-fd-access-mode-mismatch,
-Wanalyzer-fd-use-after-close and -Wanalyzer-fd-use-without-check that these
warnings may also be emitted through usage of the three function attributes
used for static analysis of file descriptors, namely fd_arg, fd_arg_read and
fd_arg_write.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fd-5.c: New test.
* c-c++-common/attr-fd.c: New test.

Signed-off-by: Immad Mir 
---
 gcc/analyzer/sm-fd.cc| 335 +--
 gcc/c-family/c-attribs.cc|  32 +++
 gcc/doc/extend.texi  |  37 +++
 gcc/doc/invoke.texi  |  17 +-
 gcc/testsuite/c-c++-common/attr-fd.c |  18 ++
 gcc/testsuite/gcc.dg/analyzer/fd-5.c |  53 +
 6 files changed, 422 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-fd.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-5.c

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index 8e4300b06e2..bb89c471f7e 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -39,10 +39,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "analyzer/analyzer-selftests.h"
 #include "tristate.h"
 #include "selftest.h"
+#include "stringpool.h"
+#include "attribs.h"
 #include "analyzer/call-string.h"
 #include "analyzer/program-point.h"
 #include "analyzer/store.h"
 #include "analyzer/region-model.h"
+#include "bitmap.h"
 
 #if ENABLE_ANALYZER
 
@@ -59,6 +62,13 @@ enum access_mode
   WRITE_ONLY
 };
 
+enum access_directions
+{
+  DIRS_READ_WRITE,
+  DIRS_READ,
+  DIRS_WRITE
+};
+
 class fd_state_machine : public state_machine
 {
 public:
@@ -146,7 +156,7 @@ private:
   void check_for_open_fd (sm_context *sm_ctxt, const supernode *node,
   const gimple *stmt, const gcall *call,
   const tree callee_fndecl,
-  enum access_direction access_fn) const;
+  enum access_directions access_fn) const;
 
   void make_valid_transitions_on_condition (sm_context *sm_ctxt,
 const supernode *node,
@@ -156,6 +166,10 @@ private:
   const supernode *node,
   const gimple *stmt,
 

Re: [PATCH] Adding three new function attributes for static analysis of file descriptors

2022-07-20 Thread David Malcolm via Gcc-patches
On Wed, 2022-07-20 at 23:29 +0530, Immad Mir wrote:


> This patch adds three new function attributes to GCC that
> are used for static analysis of usage of file descriptors:

Thanks for the updated patch.

Some very minor spelling/grammar/whitespace nits...

> 
> 1) __attribute__ ((fd_arg(N))): The attributes may be applied to a function that
> takes on open file descriptor at refrenced argument N.

"on open" -> "an open"

"refrenced" -> "referenced"

[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d5ff1018372..9234275ca6d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9843,7 +9843,12 @@ This warning requires @option{-fanalyzer}, which enables it; use
>  to disable it.
>  
>  This diagnostic warns for paths through code in which a 
> -@code{read} on a write-only file descriptor is attempted, or vice versa
> +@code{read} on a write-only file descriptor is attempted, or vice versa.
> +
> +This diagnostic also warns for code paths in a which a function with attribute
> +@code{fd_arg_read (N)} is called with a file descriptor opened with @code{O_WRONLY} 
> +at refrenced argument @code{N} or a function with attribute @code{fd_arg_write (N)}

"refrenced" -> "referenced"

> +is called with a file descriptor opened with @code{O_RDONLY} at refrenced argument @var{N},

"refrenced" -> "referenced".

Please wrap these lines to avoid going over 80 columns.

>  
>  @item -Wno-analyzer-fd-double-close
>  @opindex Wanalyzer-fd-double-close
> @@ -9875,6 +9880,11 @@ to disable it.
>  This diagnostic warns for paths through code in which a 
>  read or write is called on a closed file descriptor.
>  
> +This diagnostic also warns for paths through code in which
> +a function with attribute @code{fd_arg (N)} or @code{fd_arg_read (N)}
> +or @code{fd_arg_write (N)} is called with a closed file descriptor at
> +refrenced argument @code{N}.

"refrenced" -> "referenced".


> +
>  @item -Wno-analyzer-fd-use-without-check
>  @opindex Wanalyzer-fd-use-without-check
>  @opindex Wno-analyzer-fd-use-without-check
> @@ -9885,6 +9895,11 @@ to disable it.
>  This diagnostic warns for paths through code in which a 
>  file descriptor is used without being checked for validity.
>  
> +This diagnostic also warns for paths through code in which
> +a function with attribute @code{fd_arg (N)} or @code{fd_arg_read (N)}
> +or @code{fd_arg_write (N)} is called with a file descriptor, at refrenced
> +argument @code{N}, without being checked for validity.

"refrenced" -> "referenced".

> +
>  @item -Wno-analyzer-file-leak
>  @opindex Wanalyzer-file-leak
>  @opindex Wno-analyzer-file-leak

[...snip...]

...but with those fixed, this patch is OK for trunk, assuming it
bootstraps and passes regression tests.

Thanks!
Dave



Re: [PATCH] Adding three new function attributes for static analysis of file descriptors

2022-07-20 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 20 Jul 2022 at 23:31, Immad Mir via Gcc-patches
 wrote:
>
> This patch adds three new function attributes to GCC that
> are used for static analysis of usage of file descriptors:
>
> 1) __attribute__ ((fd_arg(N))): The attributes may be applied to a function that
> takes on open file descriptor at refrenced argument N.
>
> It indicates that the passed filedescriptor must not have been closed.
> Therefore, when the analyzer is enabled with -fanalyzer, the
> analyzer may emit a -Wanalyzer-fd-use-after-close diagnostic
> if it detects a code path in which a function with this attribute is
> called with a closed file descriptor.
>
> The attribute also indicates that the file descriptor must have been checked for
> validity before usage. Therefore, analyzer may emit
> -Wanalyzer-fd-use-without-check diagnostic if it detects a code path in
> which a function with this attribute is called with a file descriptor that has
> not been checked for validity.
>
> 2) __attribute__((fd_arg_read(N))): The attribute is identical to
> fd_arg, but with the additional requirement that it might read from
> the file descriptor, and thus, the file descriptor must not have been opened
> as write-only.
>
> The analyzer may emit a -Wanalyzer-access-mode-mismatch
> diagnostic if it detects a code path in which a function with this
> attribute is called on a file descriptor opened with O_WRONLY.
>
> 3) __attribute__((fd_arg_write(N))): The attribute is identical to fd_arg_read
> except that the analyzer may emit a -Wanalyzer-access-mode-mismatch diagnostic if
> it detects a code path in which a function with this attribute is called on a
> file descriptor opened with O_RDONLY.
>
> gcc/analyzer/ChangeLog:
> * sm-fd.cc (fd_param_diagnostic): New diagnostic class.
> (fd_access_mode_mismatch): Change inheritance from fd_diagnostic
> to fd_param_diagnostic. Add new overloaded constructor.
> (fd_use_after_close): Likewise.
> (unchecked_use_of_fd): Likewise and also change name to 
> fd_use_without_check.
> (double_close): Change name to fd_double_close.
> (enum access_directions): New.
> (fd_state_machine::on_stmt): Handle calls to function with the
> new three function attributes.
> (fd_state_machine::check_for_fd_attrs): New.
> (fd_state_machine::on_open): Use the new overloaded constructors
> of diagnostic classes.
>
> gcc/c-family/ChangeLog:
> * c-attribs.cc: (c_common_attribute_table): add three new attributes
> namely: fd_arg, fd_arg_read and fd_arg_write.
> (handle_fd_arg_attribute): New.
>
> gcc/ChangeLog:
> * doc/extend.texi: Add fd_arg, fd_arg_read and fd_arg_write under
> "Common Function Attributes" section.
> * doc/invoke.texi: Add docs to -Wanalyzer-fd-access-mode-mismatch,
> -Wanalyzer-use-after-close, -Wanalyzer-fd-use-without-check that these
> warnings may be emitted through usage of three function attributes used
> for static analysis of file descriptors namely fd_arg, fd_arg_read and
> fd_arg_write.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/analyzer/fd-5.c: New test.
> * c-c++-common/attr-fd.c: New test.
>
> Signed-off-by: Immad Mir 
> ---
>  gcc/analyzer/sm-fd.cc| 335 +--
>  gcc/c-family/c-attribs.cc|  32 +++
>  gcc/doc/extend.texi  |  37 +++
>  gcc/doc/invoke.texi  |  17 +-
>  gcc/testsuite/c-c++-common/attr-fd.c |  18 ++
>  gcc/testsuite/gcc.dg/analyzer/fd-5.c |  53 +
>  6 files changed, 422 insertions(+), 70 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/attr-fd.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-5.c
>
> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> index 8e4300b06e2..bb89c471f7e 100644
> --- a/gcc/analyzer/sm-fd.cc
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -39,10 +39,13 @@ along with GCC; see the file COPYING3.  If not see
>  #include "analyzer/analyzer-selftests.h"
>  #include "tristate.h"
>  #include "selftest.h"
> +#include "stringpool.h"
> +#include "attribs.h"
>  #include "analyzer/call-string.h"
>  #include "analyzer/program-point.h"
>  #include "analyzer/store.h"
>  #include "analyzer/region-model.h"
> +#include "bitmap.h"
>
>  #if ENABLE_ANALYZER
>
> @@ -59,6 +62,13 @@ enum access_mode
>WRITE_ONLY
>  };
>
> +enum access_directions
> +{
> +  DIRS_READ_WRITE,
> +  DIRS_READ,
> +  DIRS_WRITE
> +};
> +
>  class fd_state_machine : public state_machine
>  {
>  public:
> @@ -146,7 +156,7 @@ private:
>void check_for_open_fd (sm_context *sm_ctxt, const supernode *node,
>const gimple *stmt, const gcall *call,
>const tree callee_fndecl,
> -  enum access_direction access_fn) const;
> +  enum access_directions access_fn) const;
>
>void make_valid_transitions_on_condition

Re: [PATCH] Adding three new function attributes for static analysis of file descriptors

2022-07-20 Thread Mir Immad via Gcc-patches
> Sorry to nitpick -- I assume stmt here refers to a call stmt ?
> In that case, I suppose it'd be better to use const gcall *stmt ?

Thanks for the catch, Prathamesh.

Immad.


On Wed, Jul 20, 2022 at 11:59 PM Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:

> On Wed, 20 Jul 2022 at 23:31, Immad Mir via Gcc-patches
>  wrote:
> >
> > This patch adds three new function attributes to GCC that
> > are used for static analysis of usage of file descriptors:
> >
> > 1) __attribute__ ((fd_arg(N))): The attributes may be applied to a function that
> > takes on open file descriptor at refrenced argument N.
> >
> > It indicates that the passed filedescriptor must not have been closed.
> > Therefore, when the analyzer is enabled with -fanalyzer, the
> > analyzer may emit a -Wanalyzer-fd-use-after-close diagnostic
> > if it detects a code path in which a function with this attribute is
> > called with a closed file descriptor.
> >
> > The attribute also indicates that the file descriptor must have been checked for
> > validity before usage. Therefore, analyzer may emit
> > -Wanalyzer-fd-use-without-check diagnostic if it detects a code path in
> > which a function with this attribute is called with a file descriptor that has
> > not been checked for validity.
> >
> > 2) __attribute__((fd_arg_read(N))): The attribute is identical to
> > fd_arg, but with the additional requirement that it might read from
> > the file descriptor, and thus, the file descriptor must not have been opened
> > as write-only.
> >
> > The analyzer may emit a -Wanalyzer-access-mode-mismatch
> > diagnostic if it detects a code path in which a function with this
> > attribute is called on a file descriptor opened with O_WRONLY.
> >
> > 3) __attribute__((fd_arg_write(N))): The attribute is identical to
> > fd_arg_read, except that the analyzer may emit a
> > -Wanalyzer-fd-access-mode-mismatch diagnostic if it detects a code path
> > in which a function with this attribute is called on a file descriptor
> > opened with O_RDONLY.
> >
> > gcc/analyzer/ChangeLog:
> > * sm-fd.cc (fd_param_diagnostic): New diagnostic class.
> > (fd_access_mode_mismatch): Change inheritance from fd_diagnostic
> > to fd_param_diagnostic. Add new overloaded constructor.
> > (fd_use_after_close): Likewise.
> > (unchecked_use_of_fd): Likewise and also change name to
> > fd_use_without_check.
> > (double_close): Change name to fd_double_close.
> > (enum access_directions): New.
> > (fd_state_machine::on_stmt): Handle calls to function with the
> > new three function attributes.
> > (fd_state_machine::check_for_fd_attrs): New.
> > (fd_state_machine::on_open): Use the new overloaded constructors
> > of diagnostic classes.
> >
> > gcc/c-family/ChangeLog:
> > * c-attribs.cc (c_common_attribute_table): Add three new attributes,
> > namely fd_arg, fd_arg_read and fd_arg_write.
> > (handle_fd_arg_attribute): New.
> >
> > gcc/ChangeLog:
> > * doc/extend.texi: Add fd_arg, fd_arg_read and fd_arg_write under
> > "Common Function Attributes" section.
> > * doc/invoke.texi: Document that -Wanalyzer-fd-access-mode-mismatch,
> > -Wanalyzer-fd-use-after-close and -Wanalyzer-fd-use-without-check
> > may be emitted through the use of the three function attributes for
> > static analysis of file descriptors, namely fd_arg, fd_arg_read and
> > fd_arg_write.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.dg/analyzer/fd-5.c: New test.
> > * c-c++-common/attr-fd.c: New test.
> >
> > Signed-off-by: Immad Mir 
> > ---
> >  gcc/analyzer/sm-fd.cc| 335 +--
> >  gcc/c-family/c-attribs.cc|  32 +++
> >  gcc/doc/extend.texi  |  37 +++
> >  gcc/doc/invoke.texi  |  17 +-
> >  gcc/testsuite/c-c++-common/attr-fd.c |  18 ++
> >  gcc/testsuite/gcc.dg/analyzer/fd-5.c |  53 +
> >  6 files changed, 422 insertions(+), 70 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/attr-fd.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-5.c
> >
> > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > index 8e4300b06e2..bb89c471f7e 100644
> > --- a/gcc/analyzer/sm-fd.cc
> > +++ b/gcc/analyzer/sm-fd.cc
> > @@ -39,10 +39,13 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "analyzer/analyzer-selftests.h"
> >  #include "tristate.h"
> >  #include "selftest.h"
> > +#include "stringpool.h"
> > +#include "attribs.h"
> >  #include "analyzer/call-string.h"
> >  #include "analyzer/program-point.h"
> >  #include "analyzer/store.h"
> >  #include "analyzer/region-model.h"
> > +#include "bitmap.h"
> >
> >  #if ENABLE_ANALYZER
> >
> > @@ -59,6 +62,13 @@ enum access_mode
> >WRITE_ONLY
> >  };
> >
> > +enum access_directions
> > +{
> > +  DIRS_READ_WRITE,
> > +  DIRS_READ,
> > +  DI
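
As an illustration (not part of the patch), the three attributes described above might be applied to file-descriptor wrappers as in the sketch below. The wrapper names `write_bytes`, `read_bytes`, `write_to_null` and `read_from_null` are hypothetical. The argument index is assumed to be 1-based, as with other positional GCC attributes such as `nonnull`. Compilers that do not recognise the attributes (GCC before 13, Clang) should only warn and ignore them.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical wrappers; argument 1 is the file descriptor (assumed 1-based).
// With -fanalyzer, passing a read-only fd would be an access mode mismatch.
__attribute__((fd_arg_write(1)))
bool write_bytes(int fd, const void *buf, size_t len)
{
  return ::write(fd, buf, len) == static_cast<ssize_t>(len);
}

// Passing a write-only fd here would be an access mode mismatch.
__attribute__((fd_arg_read(1)))
ssize_t read_bytes(int fd, void *buf, size_t len)
{
  return ::read(fd, buf, len);
}

// Open /dev/null write-only, check the descriptor, then write through the
// annotated wrapper.  Calling write_bytes() after close() would be a use
// after close; omitting the "fd < 0" check would be a use without check.
bool write_to_null(const char *msg, size_t len)
{
  int fd = ::open("/dev/null", O_WRONLY);
  if (fd < 0)   // validity check expected by fd_arg and its variants
    return false;
  bool ok = write_bytes(fd, msg, len);
  ::close(fd);
  return ok;
}

// Reading from /dev/null opened read-only returns 0 (end of file).
ssize_t read_from_null()
{
  char buf[4];
  int fd = ::open("/dev/null", O_RDONLY);
  if (fd < 0)
    return -1;
  ssize_t n = read_bytes(fd, buf, sizeof buf);
  ::close(fd);
  return n;
}
```

Only the correct usages are exercised here. The misuses described in the comments are the code paths the analyzer is meant to diagnose.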

[PATCH, committed] Fortran: fix parsing of omp task affinity iterator clause [PR101330]

2022-07-20 Thread Harald Anlauf via Gcc-patches
Dear all,

there was some left-over code - likely from development - that could
lead to a compiler segfault when given invalid input.  Steve found
the offending line.  Removing it solves the issue.

The fix was acknowledged by Tobias in the PR.

Regtested on x86_64-pc-linux-gnu.

Pushed as: r13-1767-g26bbe78f77f73bb66af1ac13d0deec888a3c6510

Will backport to 12-branch, as the offending code was introduced there.

Thanks,
Harald

From 26bbe78f77f73bb66af1ac13d0deec888a3c6510 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 20 Jul 2022 20:40:23 +0200
Subject: [PATCH] Fortran: fix parsing of omp task affinity iterator clause
 [PR101330]

gcc/fortran/ChangeLog:

	PR fortran/101330
	* openmp.cc (gfc_match_iterator): Remove left-over code from
	development that could lead to a crash on invalid input.

gcc/testsuite/ChangeLog:

	PR fortran/101330
	* gfortran.dg/gomp/affinity-clause-7.f90: New test.
---
 gcc/fortran/openmp.cc |  1 -
 .../gfortran.dg/gomp/affinity-clause-7.f90| 19 +++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/affinity-clause-7.f90

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index bd4ff259fe0..df9cdf43eb7 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1181,7 +1181,6 @@ gfc_match_iterator (gfc_namespace **ns, bool permit_var)
 	}
   if (':' == gfc_peek_ascii_char ())
 	{
-	  step = gfc_get_expr ();
 	  if (gfc_match (": %e ", &step) != MATCH_YES)
 	{
 	  gfc_free_expr (begin);
diff --git a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-7.f90 b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-7.f90
new file mode 100644
index 000..5b1ca85aba3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-7.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! PR fortran/101330 - ICE in free_expr0(): Bad expr type
+! Contributed by G.Steinmetz
+
+  implicit none
+  integer :: j, b(10)
+!$omp task affinity (iterator(j=1:2:1) : b(j))
+!$omp end task
+!$omp task affinity (iterator(j=1:2:) : b(j)) ! { dg-error "Invalid character" }
+!!$omp end task
+!$omp task affinity (iterator(j=1:2:  ! { dg-error "Invalid character" }
+!!$omp end task
+!$omp task affinity (iterator(j=1:2:) ! { dg-error "Invalid character" }
+!!$omp end task
+!$omp task affinity (iterator(j=1:2::)! { dg-error "Invalid character" }
+!!$omp end task
+!$omp task affinity (iterator(j=1:2:))! { dg-error "Invalid character" }
+!!$omp end task
+end
--
2.35.3



Re: Supporting RISC-V Vendor Extensions in the GNU Toolchain

2022-07-20 Thread Palmer Dabbelt

On Tue, 10 May 2022 17:01:26 PDT (-0700), Palmer Dabbelt wrote:

[Sorry for cross-posting to a bunch of lists, I figured it'd be best to
have all the discussions in one thread.]

We currently only support what is defined by official RISC-V
specifications in the various GNU toolchain projects.  There's certainly
some grey areas there, but in general that means not taking code that
relies on drafts or vendor defined extensions, even if that would result
in higher performance or more featured systems for users.

The original goal of these policies was to steer RISC-V implementers
towards a common set of specifications, but over the last year or so
it's become abundantly clear that this is causing more harm than good.
All extant RISC-V systems rely on behaviors defined outside the official
specifications, and while that's technically always been the case we've
gotten to the point where trying to ignore that fact is impacting real
users on real systems.  There's been consistent feedback from users that
we're not meeting their needs, which can clearly be seen in the many out
of tree patch sets in common use.

There's been a handful of discussions about this, but we've yet to have
a proper discussion on the mailing lists.  From the various discussions
I've had it seems that folks are broadly in favor of supporting vendor
extensions, but the devil's always in the details with this sort of
thing so I thought it'd be best to write something up so we can have a
concrete discussion.

The idea is to start taking code that depends on vendor-defined behavior
into the core GNU toolchain ports, as long as it meets the following
criteria:

* An ISA manual is available that can be redistributed/archived, defines
  the behaviors in question as one or more vendor-specific extensions,
  and is clearly versioned.  The RISC-V foundation is setting various
  guidelines around how vendor-defined extensions and instructions
  should be named, we strongly suggest that vendors follow those
  conventions whenever possible (this is all new, though, so exactly
  what's necessary from vendor specifications will likely evolve as we
  learn).
* There is a substantial user base that depends on the behavior in
  question, which probably means there is hardware in the wild that
  implements the extensions and users that require those extensions in
  order for that hardware to be useful for common applications.  This is
  always going to be a grey area, but it's essentially the same spot
  everyone else is in.
* There is a mechanism for testing the code in question without direct
  access to hardware, which in practice means a QEMU port (or whatever
  simulator is relevant in the space and that folks use for testing) or
  some community commitment to long-term availability of the hardware
  for testing (something like the GCC compile farm, for example).
* It is possible to produce binaries that are compatible with all
  upstream vendors' implementations.  That means we'll need mechanisms
  to allow extensions from multiple vendors to be linked together and
  then probed at runtime.  That's not to say that all binaries will be
  compatible, as users are always free to skip the compatibility code
  and there will be conflicting definitions of instruction encodings,
  but we can at least provide users with the option of compatibility.

These are pretty loosely written on purpose, both because this is all
new and because each project has its own set of contribution
requirements so it's going to be all but impossible to have a single
concrete set of rules that applies everywhere -- that's nothing specific
to the vendor extensions (or even RISC-V), it's just life.  Specifically
a major goal here is to balance the needs of users, both in the short
term (i.e., getting new hardware to work) and the long term (i.e., the long
term stability of their software).  We're not talking about taking code
that can't be tested, hasn't been reviewed, isn't going to be supported
long-term, or doesn't have a stable ABI; just dropping the specific
requirement that a specification must be furnished by the RISC-V
foundation in order to accept code.

Nothing is decided yet, so happy to hear any thought folks have.  This
is certainly a very different development methodology than what we've
done in the past and isn't something that should be entered into
lightly, so any comments are welcome.


I'm going back to the start of the thread as this led to some heated 
discussion, both here and in private.  Clearly there's lots of opinions 
here and everyone wants something different, but the nature of 
compromise is that nobody gets exactly what they want and it looks like 
this is as good as we're going to get any time soon.  So I'm going to 
propose that we go with this.


This was all purposefully a bit vague so we'll have to go sort out exactly 
how to move forward as patches go by.  Hopefully we'll be able to have 
more constructive discussions on the specific patch sets, as

[committed] analyzer: update "tainted" state of RHS in comparisons [PR106373]

2022-07-20 Thread David Malcolm via Gcc-patches
Updating the "tainted" state of the RHS as well as the LHS in comparisons
fixes various false positives from -Wanalyzer-tainted-array-index at -O1
and above (e.g. as seen on the Linux kernel).
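
The transitions added to on_condition can be modelled in a few lines. The following is a hypothetical, simplified sketch: the `Taint`, `Cmp`, `State`, `transition` and `on_condition` names are illustrative and are not the analyzer's API. It shows how updating both operands lets a value that only ever appears on the right-hand side of a comparison reach the fully bounded "stop" state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified model of the taint states: a value starts "tainted", gains a
// lower or an upper bound, and reaches "stop" (no longer tracked) once it
// has both.
enum class Taint { Tainted, HasLB, HasUB, Stop };
enum class Cmp { LT, LE, GT, GE };

using State = std::map<std::string, Taint>;

// Move VAR from FROM to TO, if VAR is tracked and currently in FROM.
static void transition(State &s, const std::string &var, Taint from, Taint to)
{
  auto it = s.find(var);
  if (it != s.end() && it->second == from)
    it->second = to;
}

// Model of on_condition for the edge on which "LHS op RHS" holds; as in the
// patch, both operands are updated, not just the LHS.
void on_condition(State &s, const std::string &lhs, Cmp op,
                  const std::string &rhs)
{
  switch (op)
    {
    case Cmp::GE:
    case Cmp::GT:
      // LHS gains a lower bound; RHS gains an upper bound.
      transition(s, lhs, Taint::Tainted, Taint::HasLB);
      transition(s, lhs, Taint::HasUB, Taint::Stop);
      transition(s, rhs, Taint::Tainted, Taint::HasUB);
      transition(s, rhs, Taint::HasLB, Taint::Stop);
      break;
    case Cmp::LE:
    case Cmp::LT:
      // LHS gains an upper bound; RHS gains a lower bound.
      transition(s, lhs, Taint::Tainted, Taint::HasUB);
      transition(s, lhs, Taint::HasLB, Taint::Stop);
      transition(s, rhs, Taint::Tainted, Taint::HasLB);
      transition(s, rhs, Taint::HasUB, Taint::Stop);
      break;
    }
}
```

Start with `i` and `n` both tainted. `on_condition(s, "zero", Cmp::LE, "i")` (that is, `0 <= i`) gives `i` a lower bound, and `on_condition(s, "n", Cmp::GT, "i")` (that is, `n > i`) then moves it to `Stop`. With LHS-only updates, `i` would stay `Tainted`, because it never appears on the left.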

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-1768-g5e830693dd3356.

gcc/analyzer/ChangeLog:
PR analyzer/106373
* sm-taint.cc (taint_state_machine::on_condition): Potentially
update the state of the RHS as well as the LHS.

gcc/testsuite/ChangeLog:
PR analyzer/106373
* gcc.dg/analyzer/torture/taint-read-index-3.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-taint.cc  | 18 +--
 .../analyzer/torture/taint-read-index-3.c | 52 +++
 2 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index 9cb78886c9f..0486c01aaca 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -830,13 +830,11 @@ taint_state_machine::on_condition (sm_context *sm_ctxt,
   const gimple *stmt,
   const svalue *lhs,
   enum tree_code op,
-  const svalue *rhs ATTRIBUTE_UNUSED) const
+  const svalue *rhs) const
 {
   if (stmt == NULL)
 return;
 
-  // TODO: this doesn't use the RHS; should we make it symmetric?
-
   // TODO
   switch (op)
 {
@@ -845,10 +843,17 @@ taint_state_machine::on_condition (sm_context *sm_ctxt,
 case GE_EXPR:
 case GT_EXPR:
   {
+   /* (LHS >= RHS) or (LHS > RHS)
+  LHS gains a lower bound
+  RHS gains an upper bound.  */
sm_ctxt->on_transition (node, stmt, lhs, m_tainted,
m_has_lb);
sm_ctxt->on_transition (node, stmt, lhs, m_has_ub,
m_stop);
+   sm_ctxt->on_transition (node, stmt, rhs, m_tainted,
+   m_has_ub);
+   sm_ctxt->on_transition (node, stmt, rhs, m_has_lb,
+   m_stop);
   }
   break;
 case LE_EXPR:
@@ -896,10 +901,17 @@ taint_state_machine::on_condition (sm_context *sm_ctxt,
  }
  }
 
+   /* (LHS <= RHS) or (LHS < RHS)
+  LHS gains an upper bound
+  RHS gains a lower bound.  */
sm_ctxt->on_transition (node, stmt, lhs, m_tainted,
m_has_ub);
sm_ctxt->on_transition (node, stmt, lhs, m_has_lb,
m_stop);
+   sm_ctxt->on_transition (node, stmt, rhs, m_tainted,
+   m_has_lb);
+   sm_ctxt->on_transition (node, stmt, rhs, m_has_ub,
+   m_stop);
   }
   break;
 default:
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c b/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c
new file mode 100644
index 000..8eb6061a08b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c
@@ -0,0 +1,52 @@
+// TODO: remove need for the taint option:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } */
+
+struct raw_ep {
+  /* ...snip... */
+  int state;
+  /* ...snip... */
+};
+
+struct raw_dev {
+  /* ...snip... */
+  struct raw_ep eps[30];
+  int eps_num;
+  /* ...snip... */
+};
+
+int   __attribute__((tainted_args))
+simplified_raw_ioctl_ep_disable(struct raw_dev *dev, unsigned long value)
+{
+  int ret = 0, i = value;
+
+  if (i < 0 || i >= dev->eps_num) {
+ret = -16;
+goto out_unlock;
+  }
+  if (dev->eps[i].state == 0) { /* { dg-bogus "attacker-controlled" } */
+ret = -22;
+goto out_unlock;
+  }
+
+out_unlock:
+  return ret;
+}
+
+int   __attribute__((tainted_args))
+test_2(struct raw_dev *dev, int i)
+{
+  int ret = 0;
+
+  if (i < 0 || i >= dev->eps_num) {
+ret = -16;
+goto out_unlock;
+  }
+  if (dev->eps[i].state == 0) { /* { dg-bogus "attacker-controlled" } */
+ret = -22;
+goto out_unlock;
+  }
+
+out_unlock:
+  return ret;
+}
-- 
2.26.3



Re: [PATCH, rs6000, v2] Cleanup some vstrir define_expand naming inconsistencies

2022-07-20 Thread Segher Boessenkool
On Tue, Jul 19, 2022 at 03:14:52PM -0500, will schmidt wrote:
>   This cleans up some of the naming around the vstrir and vstril
> instruction definitions, with some cosmetic changes for consistency.

Okay for trunk.  Thanks!


Segher


[committed 1/3] libstdc++: Fix minor bugs in std::common_iterator

2022-07-20 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

The noexcept-specifier for some std::common_iterator constructors was
incorrectly using an rvalue as the first argument of
std::is_nothrow_assignable_v. This gave the wrong answer for some types,
e.g. std::common_iterator, because an rvalue of scalar type
cannot be assigned to.
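
The underlying trait behaviour can be checked directly. This sketch uses the `::value` forms, which are equivalent to the `_v` variable templates used in the patch. `nothrow_assignable_to_lvalue` is a hypothetical helper, and `int*` stands in for a scalar iterator type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// is_assignable<T, U> asks whether std::declval<T>() = std::declval<U>() is
// well-formed.  For T = int*, declval yields an rvalue, and built-in
// assignment needs a modifiable lvalue, so the answer is false.
static_assert(!std::is_assignable<int*, int*>::value, "rvalue pointer");
static_assert(!std::is_nothrow_assignable<int*, int*>::value, "rvalue pointer");

// Naming the destination as an lvalue reference asks the intended question.
static_assert(std::is_assignable<int*&, int*>::value, "lvalue pointer");
static_assert(std::is_nothrow_assignable<int*&, int*>::value, "lvalue pointer");

// Class types with an unqualified operator= (e.g. std::string) accept
// assignment to an rvalue, so the mistake only shows up for some types.
static_assert(std::is_assignable<std::string, const char*>::value, "class type");

// Hypothetical helper mirroring the corrected form used in the patch.
template<typename T, typename U>
constexpr bool nothrow_assignable_to_lvalue()
{
  return std::is_nothrow_assignable<T&, U>::value;
}
```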

Also fix the friend declaration to use the same constraints as on the
definition of the class template. G++ fails to diagnose this error, due
to PR c++/96830.

Finally, the copy constructor was using std::move on members of its
(const) argument in some cases; those calls are removed.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (common_iterator): Fix incorrect
uses of is_nothrow_assignable_v. Fix inconsistent constraints on
friend declaration. Do not move argument in copy constructor.
* testsuite/24_iterators/common_iterator/1.cc: Check for
noexcept constructible/assignable.
---
 libstdc++-v3/include/bits/stl_iterator.h  | 11 +
 .../24_iterators/common_iterator/1.cc | 23 ++-
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 049cb02a4c4..fd0ae3aa771 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1838,7 +1838,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_noexcept1()
   {
if constexpr (is_trivially_default_constructible_v<_Tp>)
- return is_nothrow_assignable_v<_Tp, _Up>;
+ return is_nothrow_assignable_v<_Tp&, _Up>;
else
  return is_nothrow_constructible_v<_Tp, _Up>;
   }
@@ -1932,14 +1932,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (_M_index == 0)
{
  if constexpr (is_trivially_default_constructible_v<_It>)
-   _M_it = std::move(__x._M_it);
+   _M_it = __x._M_it;
  else
std::construct_at(std::__addressof(_M_it), __x._M_it);
}
   else if (_M_index == 1)
{
  if constexpr (is_trivially_default_constructible_v<_Sent>)
-   _M_sent = std::move(__x._M_sent);
+   _M_sent = __x._M_sent;
  else
std::construct_at(std::__addressof(_M_sent), __x._M_sent);
}
@@ -1964,8 +1964,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator=(const common_iterator<_It2, _Sent2>& __x)
   noexcept(is_nothrow_constructible_v<_It, const _It2&>
   && is_nothrow_constructible_v<_Sent, const _Sent2&>
-  && is_nothrow_assignable_v<_It, const _It2&>
-  && is_nothrow_assignable_v<_Sent, const _Sent2&>)
+  && is_nothrow_assignable_v<_It&, const _It2&>
+  && is_nothrow_assignable_v<_Sent&, const _Sent2&>)
   {
switch(_M_index << 2 | __x._M_index)
  {
@@ -2164,6 +2164,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   private:
 template _Sent2>
+  requires (!same_as<_It2, _Sent2>) && copyable<_It2>
   friend class common_iterator;
 
 constexpr bool _M_has_value() const noexcept { return _M_index < 2; }
diff --git a/libstdc++-v3/testsuite/24_iterators/common_iterator/1.cc b/libstdc++-v3/testsuite/24_iterators/common_iterator/1.cc
index 365ee89c02e..ec4a86c862a 100644
--- a/libstdc++-v3/testsuite/24_iterators/common_iterator/1.cc
+++ b/libstdc++-v3/testsuite/24_iterators/common_iterator/1.cc
@@ -27,15 +27,30 @@ test01()
   using I = std::common_iterator;
   static_assert( std::is_default_constructible_v );
   static_assert( std::is_copy_constructible_v );
+  static_assert( std::is_move_constructible_v );
   static_assert( std::is_copy_assignable_v );
+  static_assert( std::is_move_assignable_v );
   static_assert( std::is_constructible_v );
   static_assert( std::is_constructible_v );
 
-  struct sentinel { operator int*() const { return nullptr; } };
+  static_assert( std::is_nothrow_copy_constructible_v ); // GCC extension
+  static_assert( std::is_nothrow_move_constructible_v ); // GCC extension
+  static_assert( std::is_nothrow_copy_assignable_v ); // GCC extension
+  static_assert( std::is_nothrow_move_assignable_v ); // GCC extension
+
+  struct sentinel { operator int*() const noexcept { return nullptr; } };
   using K = std::common_iterator;
   static_assert( std::is_constructible_v );
   static_assert( std::is_assignable_v );
 
+  static_assert( std::is_nothrow_assignable_v ); // GCC extension
+
+  struct sentinel_throwing { operator int*() const { return nullptr; } };
+  using K_throwing = std::common_iterator;
+  // Conversion is noexcept(false)
+  static_assert( ! std::is_nothrow_assignable_v );
+
+
   struct sentinel2
   {
 const int* p;
@@ -46,6 +61,12 @@ test01()
   using J = std::common_iterator;
   static_assert( std::is_constructible_v );
   static_assert( std::is_convertible_v );
+
+  static_assert( std::is_constructible_v );
+  static_assert( std::is_convertible_v );
+
+  // Constructor is noexcept(false

[committed 3/3] libstdc++: Fix std::common_iterator triviality [PR100823]

2022-07-20 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This fixes the remaining problem reported in the PR, that the special
members should be trivial.  This can be done by constraining the
non-trivial versions and adding defaulted overloads that will be used
when the union members are trivial.

Making these members trivial alters the argument passing ABI and so
isn't suitable for backporting to release branches.

libstdc++-v3/ChangeLog:

PR libstdc++/100823
* include/bits/stl_iterator.h (common_iterator): Define
destructor, copy constructor and move constructor as trivial
when the underlying types allow.
* testsuite/24_iterators/common_iterator/100823.cc: Check
triviality of special members.
---
 libstdc++-v3/include/bits/stl_iterator.h  | 15 +++
 .../24_iterators/common_iterator/100823.cc| 15 +++
 2 files changed, 30 insertions(+)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index a913c04deaa..9cd262cd1d9 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1925,9 +1925,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }
   }
 
+common_iterator(const common_iterator&) = default;
+
 constexpr
 common_iterator(const common_iterator& __x)
 noexcept(_S_noexcept())
+requires (!is_trivially_copyable_v<_It> || !is_trivially_copyable_v<_Sent>)
 : _M_valueless(), _M_index(__x._M_index)
 {
   if (_M_index == 0)
@@ -1946,9 +1949,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 }
 
+common_iterator(common_iterator&&) = default;
+
 constexpr
 common_iterator(common_iterator&& __x)
 noexcept(_S_noexcept<_It, _Sent>())
+requires (!is_trivially_copyable_v<_It> || !is_trivially_copyable_v<_Sent>)
 : _M_valueless(), _M_index(__x._M_index)
 {
   if (_M_index == 0)
@@ -2017,8 +2023,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *this;
   }
 
+#if __cpp_concepts >= 202002L // Constrained special member functions
+~common_iterator() = default;
+
 constexpr
 ~common_iterator()
+  requires (!is_trivially_destructible_v<_It>
+ || !is_trivially_destructible_v<_Sent>)
+#else
+constexpr
+~common_iterator()
+#endif
 {
   if (_M_index == 0)
_M_it.~_It();
diff --git a/libstdc++-v3/testsuite/24_iterators/common_iterator/100823.cc b/libstdc++-v3/testsuite/24_iterators/common_iterator/100823.cc
index 4f2b23de8cc..b42dd087ab2 100644
--- a/libstdc++-v3/testsuite/24_iterators/common_iterator/100823.cc
+++ b/libstdc++-v3/testsuite/24_iterators/common_iterator/100823.cc
@@ -4,6 +4,21 @@
 #include 
 #include 
 
+void
+test_triviality()
+{
+  using I = std::common_iterator;
+
+  // Cannot be trivial, because it has to initialize members.
+  static_assert( ! std::is_trivially_default_constructible_v );
+
+  static_assert( std::is_trivially_destructible_v );
+  static_assert( std::is_trivially_copy_constructible_v );
+  static_assert( std::is_trivially_copy_assignable_v );
+  static_assert( std::is_trivially_move_constructible_v );
+  static_assert( std::is_trivially_move_assignable_v );
+}
+
 void
 test_valueless_assignment()
 {
-- 
2.34.3



[committed 2/3] libstdc++: Fix std::common_iterator assignment [PR100823]

2022-07-20 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This fixes the following conformance problems reported in the PR:

- Move constructor and move assignment should be defined.
- Copy assignment from a valueless object should be allowed.

Assignment is completely rewritten by this patch, as the previous
version had a number of problems. The converting assignment failed to
handle the case of assigning a new value to a valueless object, which
should work. It only accepted lvalue arguments, so wasn't usable to
implement the move assignment operator. Finally, it enforced the
precondition that the argument is not valueless, which is correct for
the converting assignment but not for the copy assignment.

A new _M_assign member is added to handle all cases of assignment
(copying from an lvalue, moving from an rvalue, and converting from a
different type). The precondition that the argument is not valueless is
checked in the converting assignment before calling _M_assign, so it is
not enforced for copy and move assignment. The new function does not use
a switch, so it handles valueless objects as either the LHS or the RHS of
the assignment.
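
The unified assignment strategy can be sketched with a much smaller class. `either` below is a hypothetical, simplified model and not the libstdc++ code. Like common_iterator, it stores an iterator, a sentinel, or nothing. A single `assign` member handles both copy and move assignment by comparing indices rather than switching on a combined index, so a valueless object works on either side.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Simplified model: index 0 holds an It, index 1 a Sent, index 2 is
// valueless.  It and Sent must be distinct types.
template<typename It, typename Sent>
class either
{
public:
  explicit either(It v) : m_index(0) { ::new (&m_it) It(std::move(v)); }
  explicit either(Sent v) : m_index(1) { ::new (&m_sent) Sent(std::move(v)); }
  either(const either& x) : m_index(2) { assign(x); }
  ~either() { reset(); }

  // Copy and move assignment share one implementation.
  either& operator=(const either& x) { assign(x); return *this; }
  either& operator=(either&& x) { assign(std::move(x)); return *this; }

  int index() const { return m_index; }
  const It& it() const { return m_it; }
  const Sent& sent() const { return m_sent; }
  void make_valueless() { reset(); }

private:
  // E is "const either&" (copy) or "either" (move); std::forward makes the
  // source member an lvalue or an xvalue accordingly.
  template<typename E>
  void assign(E&& x)
  {
    if (x.m_index == 0)
      {
        if (m_index == 0)
          m_it = std::forward<E>(x).m_it;   // same alternative: assign
        else
          {
            reset();                        // destroy any other alternative
            ::new (&m_it) It(std::forward<E>(x).m_it);
            m_index = 0;
          }
      }
    else if (x.m_index == 1)
      {
        if (m_index == 1)
          m_sent = std::forward<E>(x).m_sent;
        else
          {
            reset();
            ::new (&m_sent) Sent(std::forward<E>(x).m_sent);
            m_index = 1;
          }
      }
    else
      reset();                              // source valueless: so is *this
  }

  void reset() noexcept
  {
    if (m_index == 0)
      m_it.~It();
    else if (m_index == 1)
      m_sent.~Sent();
    m_index = 2;
  }

  union { It m_it; Sent m_sent; };
  unsigned char m_index;
};
```

If constructing the new alternative throws, the object is left valueless rather than holding a half-built value. A converting assignment that requires a non-valueless argument would check that precondition before calling `assign`, as the patch does.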

libstdc++-v3/ChangeLog:

PR libstdc++/100823
* include/bits/stl_iterator.h (common_iterator): Define move
constructor and move assignment operator.
(common_iterator::_M_assign): New function implementing
assignment.
(common_iterator::operator=): Use _M_assign.
(common_iterator::_S_valueless): New constant.
* testsuite/24_iterators/common_iterator/100823.cc: New test.
---
 libstdc++-v3/include/bits/stl_iterator.h  | 126 --
 .../24_iterators/common_iterator/100823.cc|  43 ++
 2 files changed, 129 insertions(+), 40 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/24_iterators/common_iterator/100823.cc

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index fd0ae3aa771..a913c04deaa 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1908,6 +1908,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   noexcept(_S_noexcept())
   : _M_valueless(), _M_index(__x._M_index)
   {
+   __glibcxx_assert(__x._M_has_value());
if (_M_index == 0)
  {
if constexpr (is_trivially_default_constructible_v<_It>)
@@ -1945,14 +1946,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 }
 
+constexpr
+common_iterator(common_iterator&& __x)
+noexcept(_S_noexcept<_It, _Sent>())
+: _M_valueless(), _M_index(__x._M_index)
+{
+  if (_M_index == 0)
+   {
+ if constexpr (is_trivially_default_constructible_v<_It>)
+   _M_it = std::move(__x._M_it);
+ else
+   std::construct_at(std::__addressof(_M_it), std::move(__x._M_it));
+   }
+  else if (_M_index == 1)
+   {
+ if constexpr (is_trivially_default_constructible_v<_Sent>)
+   _M_sent = std::move(__x._M_sent);
+ else
+   std::construct_at(std::__addressof(_M_sent),
+ std::move(__x._M_sent));
+   }
+}
+
+constexpr common_iterator&
+operator=(const common_iterator&) = default;
+
 constexpr common_iterator&
 operator=(const common_iterator& __x)
 noexcept(is_nothrow_copy_assignable_v<_It>
 && is_nothrow_copy_assignable_v<_Sent>
 && is_nothrow_copy_constructible_v<_It>
 && is_nothrow_copy_constructible_v<_Sent>)
+requires (!is_trivially_copy_assignable_v<_It>
+   || !is_trivially_copy_assignable_v<_Sent>)
 {
-  return this->operator=<_It, _Sent>(__x);
+  _M_assign(__x);
+  return *this;
+}
+
+constexpr common_iterator&
+operator=(common_iterator&&) = default;
+
+constexpr common_iterator&
+operator=(common_iterator&& __x)
+noexcept(is_nothrow_move_assignable_v<_It>
+&& is_nothrow_move_assignable_v<_Sent>
+&& is_nothrow_move_constructible_v<_It>
+&& is_nothrow_move_constructible_v<_Sent>)
+requires (!is_trivially_move_assignable_v<_It>
+   || !is_trivially_move_assignable_v<_Sent>)
+{
+  _M_assign(std::move(__x));
+  return *this;
 }
 
 template
@@ -1967,49 +2012,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   && is_nothrow_assignable_v<_It&, const _It2&>
   && is_nothrow_assignable_v<_Sent&, const _Sent2&>)
   {
-   switch(_M_index << 2 | __x._M_index)
- {
- case 0b:
-   _M_it = __x._M_it;
-   break;
- case 0b0101:
-   _M_sent = __x._M_sent;
-   break;
- case 0b0001:
-   _M_it.~_It();
-   _M_index = -1;
-   [[fallthrough]];
- case 0b1001:
-   std::construct_at(std::__addressof(_M_sent), _Sent(__x._M_sent));
-   _M_index = 1;
-   break;
- case 0b0100:
-   _M_sent.~_Sent();
-   _M_index = -

Ping: 2 libcpp patches

2022-07-20 Thread Lewis Hyatt via Gcc-patches
Hello-

May I please ping these two preprocessor patches?

For PR103902:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html

For PR55971:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html

Thanks!

-Lewis


Mail delivery failed: returning message to sender

2022-07-20 Thread Mail Delivery System via Gcc-patches
This message was created automatically by mail delivery software.

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:

  gcc-patches@gcc.gnu.org
host gcc.gnu.org [8.43.85.97]
SMTP error from remote mail server after end of data:
550 5.7.1 Blocked by SpamAssassin
Reporting-MTA: dns; casnvesweb.ga

Action: failed
Final-Recipient: rfc822;gcc-patches@gcc.gnu.org
Status: 5.0.0
Remote-MTA: dns; gcc.gnu.org
Diagnostic-Code: smtp; 550 5.7.1 Blocked by SpamAssassin


[committed] analyzer: fix ICE on untracked decl_regions [PR106374]

2022-07-20 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-1773-ga6c192e80a87ef.

gcc/analyzer/ChangeLog:
PR analyzer/106374
* region.cc (decl_region::get_svalue_for_initializer): Bail out on
untracked regions.

gcc/testsuite/ChangeLog:
PR analyzer/106374
* gcc.dg/analyzer/untracked-2.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region.cc  | 5 +
 gcc/testsuite/gcc.dg/analyzer/untracked-2.c | 7 +++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/untracked-2.c

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index a8d1ae92deb..b78bf4ec1b7 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -1152,6 +1152,11 @@ decl_region::get_svalue_for_initializer (region_model_manager *mgr) const
   if (binding->symbolic_p ())
return NULL;
 
+  /* If we don't care about tracking the content of this region, then
+it's unused, and the value doesn't matter.  */
+  if (!tracked_p ())
+   return NULL;
+
   binding_cluster c (this);
   c.zero_fill_region (mgr->get_store_manager (), this);
   return mgr->get_or_create_compound_svalue (TREE_TYPE (m_decl),
diff --git a/gcc/testsuite/gcc.dg/analyzer/untracked-2.c b/gcc/testsuite/gcc.dg/analyzer/untracked-2.c
new file mode 100644
index 000..565a9ccd58e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/untracked-2.c
@@ -0,0 +1,7 @@
+typedef unsigned char u8;
+extern int foo(const u8 *key, unsigned int keylen);
+int test (void)
+{
+  static const u8 default_salt[64];
+  return foo(default_salt, 64);
+}
-- 
2.26.3



[committed] analyzer: bulletproof taint warnings against NULL m_arg

2022-07-20 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-1774-g742377ed0f0931.

gcc/analyzer/ChangeLog:
* sm-taint.cc (tainted_array_index::emit): Bulletproof against
NULL m_arg.
(tainted_array_index::describe_final_event): Likewise.
(tainted_size::emit): Likewise.
(tainted_size::describe_final_event): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-taint.cc | 247 ++-
 1 file changed, 164 insertions(+), 83 deletions(-)

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index 0486c01aaca..51bfe06835d 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -212,53 +212,96 @@ public:
 diagnostic_metadata m;
 /* CWE-129: "Improper Validation of Array Index".  */
 m.add_cwe (129);
-switch (m_has_bounds)
-  {
-  default:
-   gcc_unreachable ();
-  case BOUNDS_NONE:
-   return warning_meta (rich_loc, m, get_controlling_option (),
-"use of attacker-controlled value %qE"
-" in array lookup without bounds checking",
-m_arg);
-   break;
-  case BOUNDS_UPPER:
-   return warning_meta (rich_loc, m, get_controlling_option (),
-"use of attacker-controlled value %qE"
-" in array lookup without checking for negative",
-m_arg);
-   break;
-  case BOUNDS_LOWER:
-   return warning_meta (rich_loc, m, get_controlling_option (),
-"use of attacker-controlled value %qE"
-" in array lookup without upper-bounds checking",
-m_arg);
-   break;
-  }
+if (m_arg)
+  switch (m_has_bounds)
+   {
+   default:
+ gcc_unreachable ();
+   case BOUNDS_NONE:
+ return warning_meta (rich_loc, m, get_controlling_option (),
+  "use of attacker-controlled value %qE"
+  " in array lookup without bounds checking",
+  m_arg);
+ break;
+   case BOUNDS_UPPER:
+ return warning_meta (rich_loc, m, get_controlling_option (),
+  "use of attacker-controlled value %qE"
+  " in array lookup without checking for negative",
+  m_arg);
+ break;
+   case BOUNDS_LOWER:
+ return warning_meta (rich_loc, m, get_controlling_option (),
+  "use of attacker-controlled value %qE"
+  " in array lookup without upper-bounds checking",
+  m_arg);
+ break;
+   }
+else
+  switch (m_has_bounds)
+   {
+   default:
+ gcc_unreachable ();
+   case BOUNDS_NONE:
+ return warning_meta (rich_loc, m, get_controlling_option (),
+  "use of attacker-controlled value"
+  " in array lookup without bounds checking");
+ break;
+   case BOUNDS_UPPER:
+ return warning_meta (rich_loc, m, get_controlling_option (),
+  "use of attacker-controlled value"
+  " in array lookup without checking for"
+  " negative");
+ break;
+   case BOUNDS_LOWER:
+ return warning_meta (rich_loc, m, get_controlling_option (),
+  "use of attacker-controlled value"
+  " in array lookup without upper-bounds"
+  " checking");
+ break;
+   }
   }
 
  label_text describe_final_event (const evdesc::final_event &ev) final override
   {
-switch (m_has_bounds)
-  {
-  default:
-   gcc_unreachable ();
-  case BOUNDS_NONE:
-   return ev.formatted_print
- ("use of attacker-controlled value %qE in array lookup"
-  " without bounds checking",
-  m_arg);
-  case BOUNDS_UPPER:
-   return ev.formatted_print
- ("use of attacker-controlled value %qE"
-  " in array lookup without checking for negative",
-  m_arg);
-  case BOUNDS_LOWER:
-   return ev.formatted_print
- ("use of attacker-controlled value %qE"
-  " in array lookup without upper-bounds checking",
-  m_arg);
-  }
+if (m_arg)
+  switch (m_has_bounds)
+   {
+   default:
+ gcc_unreachable ();
+   case BOUNDS_NONE:
+ return ev.formatted_print
+   ("use of attacker-controlled value %qE in array lookup"
+" without bounds checking",
+m_arg);
+   case BOUNDS_UPPER:
+ return ev.formatted_print
+   ("use of attacker-controlled value %qE"
+" in array lookup without checking for nega

Re: [RFC] RISC-V: Add support for RV64E/lp64e

2022-07-20 Thread jiawei

> gcc/ChangeLog
> 
> * config.gcc (riscv): Accept rv64e and lp64e.
> * config/riscv/arch-canonicalize: Likewise.
> * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Likewise.
> * config/riscv/riscv-opts.h (riscv_abi_type): Likewise.
> * config/riscv/riscv.cc (riscv_option_override): Likewise
> * config/riscv/riscv.h (UNITS_PER_FP_ARG): Likewise.
> (STACK_BOUNDARY): Likewise.
> (ABI_STACK_BOUNDARY): Likewise.
> (MAX_ARGS_IN_REGISTERS): Likewise.
> (ABI_SPEC): Likewise.
> * config/riscv/riscv.opt (abi_type): Likewise.
> * doc/invoke.texi (RISC-V) <-mabi>: Likewise.
> ---
> This is all still in flight, but evidently RV64E exists.  I haven't
> tested this at all, but given that we don't even have the ABI docs lined
> up yet it's likely a bit away from being mergeable.
> ---
>  gcc/config.gcc |  8 +---
>  gcc/config/riscv/arch-canonicalize |  2 +-
>  gcc/config/riscv/riscv-c.cc|  1 +
>  gcc/config/riscv/riscv-opts.h  |  1 +
>  gcc/config/riscv/riscv.cc  |  6 --
>  gcc/config/riscv/riscv.h   | 11 +++
>  gcc/config/riscv/riscv.opt |  3 +++
>  gcc/doc/invoke.texi|  5 +++--
>  8 files changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 4e3b15bb5e9..4617ecb8d9b 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -4637,7 +4637,7 @@ case "${target}" in
>  
>  # Infer arch from --with-arch, --target, and --with-abi.
>  case "${with_arch}" in
> -rv32e* | rv32i* | rv32g* | rv64i* | rv64g*)
> +rv32e* | rv32i* | rv32g* | rv64e* | rv64i* | rv64g*)
>  # OK.
>  ;;
>  "")
> @@ -4645,12 +4645,13 @@ case "${target}" in
>  case "${with_abi}" in
>  ilp32e) with_arch="rv32e" ;;
>  ilp32 | ilp32f | ilp32d) with_arch="rv32gc" ;;
> +lp64e) with_arch="rv64e" ;;
>  lp64 | lp64f | lp64d) with_arch="rv64gc" ;;
>  *) with_arch="rv${xlen}gc" ;;
>  esac
>  ;;
>  *)
> -echo "--with-arch=${with_arch} is not supported.  The argument must begin with rv32e, rv32i, rv32g, rv64i, or rv64g." 1>&2
> +echo "--with-arch=${with_arch} is not supported.  The argument must begin with rv32e, rv32i, rv32g, rv64e, rv64i, or rv64g." 1>&2
>  exit 1
>  ;;
>  esac
> @@ -4672,6 +4673,7 @@ case "${target}" in
>  rv32e*) with_abi=ilp32e ;;
>  rv32*) with_abi=ilp32 ;;
>  rv64*d* | rv64g*) with_abi=lp64d ;;
> +rv64e*) with_abi=lp64e ;;
>  rv64*) with_abi=lp64 ;;
>  esac
>  ;;
> @@ -4687,7 +4689,7 @@ case "${target}" in
>  ilp32,rv32* | ilp32e,rv32e* \
>  | ilp32f,rv32*f* | ilp32f,rv32g* \
>  | ilp32d,rv32*d* | ilp32d,rv32g* \
> -| lp64,rv64* \
> +| lp64,rv64* | lp64e,rv64e* \
>  | lp64f,rv64*f* | lp64f,rv64g* \
>  | lp64d,rv64*d* | lp64d,rv64g*)
>  ;;
> diff --git a/gcc/config/riscv/arch-canonicalize b/gcc/config/riscv/arch-canonicalize
> index fd7651ac491..8db3e88ddd7 100755
> --- a/gcc/config/riscv/arch-canonicalize
> +++ b/gcc/config/riscv/arch-canonicalize
> @@ -71,7 +71,7 @@ def arch_canonicalize(arch, isa_spec):
>new_arch = ""
>extra_long_ext = []
>std_exts = []
> -  if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g']:
> +  if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64e', 'rv64i', 'rv64g']:
>  new_arch = arch[:5].replace("g", "i")
>  if arch[:5] in ['rv32g', 'rv64g']:
>std_exts = ['m', 'a', 'f', 'd']
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index eb7ef09297e..4614dc6b6d9 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -67,6 +67,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
>switch (riscv_abi)
>  {
>  case ABI_ILP32E:
> +case ABI_LP64E:
>builtin_define ("__riscv_abi_rve");
>gcc_fallthrough ();
>  
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index 1e153b3a6e7..70fe708cbae 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -27,6 +27,7 @@ enum riscv_abi_type {
>ABI_ILP32F,
>ABI_ILP32D,
>ABI_LP64,
> +  ABI_LP64E,
>ABI_LP64F,
>ABI_LP64D
>  };
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 2e83ca07394..51b7195c17b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5047,8 +5047,10 @@ riscv_option_override (void)
>  error ("requested ABI requires %<-march%> to subsume the %qc extension",
> UNITS_PER_FP_ARG > 8 ? 'Q' : (UNITS_PER_FP_ARG > 4 ? 'D' : 'F'));
>  
> -  if (TARGET_RVE && riscv_abi != ABI_ILP32E)
> +  if (riscv_xlen == 32 && TARGET_RVE && riscv_abi != ABI_ILP32E)
>  error ("rv32e requires ilp32e ABI");
> +  if (riscv_xlen == 64 && TARGET_RVE && riscv_abi != ABI_LP64E)
> +error ("rv64e requires lp64e ABI");
>  


Hi Palmer, I just ran this patch and it reports that the symbol "riscv_xlen" cannot be resolved here.

Maybe we can use "!TARGET_64BIT" and "TARGET_64BIT" instead, thanks.
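Here is a standalone model of the suggested check, as a sketch only. The booleans `is_64bit` and `rve` stand in for `TARGET_64BIT` and `TARGET_RVE`. The enum is a hypothetical subset of `riscv_abi_type`. The function returns the message instead of calling `error ()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of riscv_abi_type.  */
enum abi { ABI_ILP32, ABI_ILP32E, ABI_LP64, ABI_LP64E, ABI_LP64D };

/* Return the error riscv_option_override would report, or NULL.
   is_64bit models TARGET_64BIT (replacing riscv_xlen == 64);
   rve models TARGET_RVE.  */
const char *
check_rve_abi (bool is_64bit, bool rve, enum abi abi)
{
  if (!is_64bit && rve && abi != ABI_ILP32E)
    return "rv32e requires ilp32e ABI";
  if (is_64bit && rve && abi != ABI_LP64E)
    return "rv64e requires lp64e ABI";
  return NULL;
}
```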

>/* We do not yet support ILP32 on RV64.  */
>if (BITS_PER_WORD != POINTER_SIZE)
> @@ -5140,7 +5142,7 @@ riscv_conditio

[PATCH V3] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-20 Thread liuhongt via Gcc-patches
And split it after reload.

gcc/ChangeLog:

PR target/106038
* config/i386/mmx.md (3): New define_expand, it's
original "3".
(*3): New define_insn, it's original
"3" be extended to handle memory and immediate
operand with ix86_binary_operator_ok. Also adjust define_split
after it.
(mmxinsnmode): New mode attribute.
(*mov_imm): Refactor with mmxinsnmode.
* config/i386/predicates.md
(register_or_x86_64_const_vector_operand): New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr106038-1.c: New test.
---
 gcc/config/i386/mmx.md | 70 --
 gcc/config/i386/predicates.md  |  4 ++
 gcc/testsuite/gcc.target/i386/pr106038-1.c | 27 +
 3 files changed, 70 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c
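The split can replace a constant vector operand with a scalar immediate. This works because a same-size integer operation on a general register or on memory produces the same bytes. The hypothetical C sketch below shows the two pieces involved. The first is the size-to-mode mapping that `mmxinsnmode` encodes. The second packs lane constants into one integer, with lane 0 in the least significant bits to match x86 little-endian memory order. It is an illustration only, not GCC's `ix86_convert_const_vector_to_integer`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same-size integer mode for a 2-, 4- or 8-byte vector mode,
   mirroring what the removed switch computed.  */
const char *
same_size_int_mode (size_t bytes)
{
  switch (bytes)
    {
    case 2: return "HI";
    case 4: return "SI";
    case 8: return "DI";
    default: return NULL;  /* gcc_unreachable () in the removed code.  */
    }
}

/* Pack NELTS lanes of ELT_BITS bits each into one integer, lane 0 in
   the least significant bits.  Assumes NELTS * ELT_BITS <= 64.  */
uint64_t
pack_const_vector (const uint64_t *lanes, size_t nelts, unsigned elt_bits)
{
  uint64_t mask = elt_bits >= 64 ? ~UINT64_C (0)
                                 : (UINT64_C (1) << elt_bits) - 1;
  uint64_t val = 0;
  for (size_t i = 0; i < nelts; i++)
    val |= (lanes[i] & mask) << (i * elt_bits);
  return val;
}
```

For example, a V4QI constant {1, 2, 3, 4} packs to 0x04030201. An SImode `xor` with that immediate on a 4-byte memory slot touches each byte exactly as the vector `xor` would.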

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3294c1e6274..dda4b43f5c1 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -86,6 +86,14 @@ (define_mode_attr mmxvecsize
   [(V8QI "b") (V4QI "b") (V2QI "b")
(V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")])
 
+;; Mapping to same size integral mode.
+(define_mode_attr mmxinsnmode
+  [(V8QI "DI") (V4QI "SI") (V2QI "HI")
+   (V4HI "DI") (V2HI "SI")
+   (V2SI "DI")
+   (V4HF "DI") (V2HF "SI")
+   (V2SF "DI")])
+
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
 
@@ -350,22 +358,7 @@ (define_insn_and_split "*mov_imm"
   HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1],
mode);
   operands[1] = GEN_INT (val);
-  machine_mode mode;
-  switch (GET_MODE_SIZE (mode))
-{
-case 2:
-  mode = HImode;
-  break;
-case 4:
-  mode = SImode;
-  break;
-case 8:
-  mode = DImode;
-  break;
-default:
-  gcc_unreachable ();
-}
-  operands[0] = lowpart_subreg (mode, operands[0], mode);
+  operands[0] = lowpart_subreg (mode, operands[0], mode);
 })
 
 ;; For TARGET_64BIT we always round up to 8 bytes.
@@ -2974,33 +2967,48 @@ (define_insn "*mmx_3"
(set_attr "type" "mmxadd,sselog,sselog,sselog")
(set_attr "mode" "DI,TI,TI,TI")])
 
-(define_insn "3"
-  [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v")
+(define_expand "3"
+  [(set (match_operand:VI_16_32 0 "nonimmediate_operand")
 (any_logic:VI_16_32
- (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v")
- (match_operand:VI_16_32 2 "register_operand" "r,x,x,v")))
-   (clobber (reg:CC FLAGS_REG))]
+ (match_operand:VI_16_32 1 "nonimmediate_operand")
+ (match_operand:VI_16_32 2 "nonimmediate_or_x86_64_const_vector_operand")))]
   ""
+  "ix86_expand_binary_operator (, mode, operands); DONE;")
+
+(define_insn "*3"
+  [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v")
+(any_logic:VI_16_32
+ (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v")
+ (match_operand:VI_16_32 2 "nonimmediate_or_x86_64_const_vector_operand" "r,i,x,x,v")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (, mode, operands)"
   "#"
-  [(set_attr "isa" "*,sse2_noavx,avx,avx512vl")
-   (set_attr "type" "alu,sselog,sselog,sselog")
-   (set_attr "mode" "SI,TI,TI,TI")])
+  [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl")
+   (set_attr "type" "alu,alu,sselog,sselog,sselog")
+   (set_attr "mode" "SI,SI,TI,TI,TI")])
 
 (define_split
-  [(set (match_operand:VI_16_32 0 "general_reg_operand")
+  [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand")
 (any_logic:VI_16_32
- (match_operand:VI_16_32 1 "general_reg_operand")
- (match_operand:VI_16_32 2 "general_reg_operand")))
+ (match_operand:VI_16_32 1 "nonimmediate_gr_operand")
+ (match_operand:VI_16_32 2 "reg_or_const_vector_operand")))
(clobber (reg:CC FLAGS_REG))]
   "reload_completed"
   [(parallel
  [(set (match_dup 0)
-  (any_logic:SI (match_dup 1) (match_dup 2)))
+  (any_logic: (match_dup 1) (match_dup 2)))
   (clobber (reg:CC FLAGS_REG))])]
 {
-  operands[2] = lowpart_subreg (SImode, operands[2], mode);
-  operands[1] = lowpart_subreg (SImode, operands[1], mode);
-  operands[0] = lowpart_subreg (SImode, operands[0], mode);
+  if (!register_operand (operands[2], mode))
+{
+  HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2],
+   mode);
+  operands[2] = GEN_INT (val);
+}
+  else
+operands[2] = lowpart_subreg (mode, operands[2], mode);
+  operands[1] = lowpart_subreg (mode, operands[1], mode);
+  operands[0] = lowpart_subreg (mode, operands[0], mode);
 })
 
 (define_split
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index c71c453cceb..73dfd46bf90 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md

[PATCH] RTEMS: Add -ftls-model=local-exec to multilibs

2022-07-20 Thread Sebastian Huber
Use the local-exec TLS model for all multilibs of all RTEMS targets with proper
TLS support.

gcc/ChangeLog:

* config/arm/t-rtems (MULTILIB_EXTRA_OPTS): Define to use
-ftls-model=local-exec.
* config/i386/t-rtems (MULTILIB_EXTRA_OPTS): Likewise.
* config/m68k/t-rtems (MULTILIB_EXTRA_OPTS): Likewise.
* config/microblaze/t-rtems (MULTILIB_EXTRA_OPTS): Likewise.
* config/nios2/t-rtems (MULTILIB_EXTRA_OPTS): Likewise.
* config/riscv/t-rtems (MULTILIB_EXTRA_OPTS): Likewise.
* config/rs6000/t-rtems (MULTILIB_EXTRA_OPTS): Likewise.
* config/sparc/t-rtems (MULTILIB_EXTRA_OPTS): Likewise.
---
 gcc/config/arm/t-rtems| 1 +
 gcc/config/i386/t-rtems   | 1 +
 gcc/config/m68k/t-rtems   | 1 +
 gcc/config/microblaze/t-rtems | 1 +
 gcc/config/nios2/t-rtems  | 1 +
 gcc/config/riscv/t-rtems  | 2 ++
 gcc/config/rs6000/t-rtems | 1 +
 gcc/config/sparc/t-rtems  | 2 ++
 8 files changed, 10 insertions(+)
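GCC's `tls_model` attribute requests the same model for a single variable. Putting `-ftls-model=local-exec` in `MULTILIB_EXTRA_OPTS` applies it to every TLS variable compiled into the multilibs. A minimal sketch follows; the variable and function names are hypothetical. Local-exec is only valid for variables defined in the executable being linked. That presumably holds for RTEMS, since applications are linked into a single static image.

```c
/* A thread-local counter forced to the local-exec TLS model; this is
   the per-variable counterpart of -ftls-model=local-exec.  */
static __thread int counter __attribute__ ((tls_model ("local-exec")));

/* Each thread sees its own copy of COUNTER.  */
int
bump_counter (void)
{
  return ++counter;
}
```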

diff --git a/gcc/config/arm/t-rtems b/gcc/config/arm/t-rtems
index b2fcf572bca..aaf11355b11 100644
--- a/gcc/config/arm/t-rtems
+++ b/gcc/config/arm/t-rtems
@@ -8,6 +8,7 @@ MULTILIB_EXCEPTIONS =
 MULTILIB_REUSE =
 MULTILIB_MATCHES   =
 MULTILIB_REQUIRED  =
+MULTILIB_EXTRA_OPTS= ftls-model=local-exec
 
 # Enumeration of multilibs
 
diff --git a/gcc/config/i386/t-rtems b/gcc/config/i386/t-rtems
index 692c99484b3..83b95a6e53d 100644
--- a/gcc/config/i386/t-rtems
+++ b/gcc/config/i386/t-rtems
@@ -24,3 +24,4 @@ MULTILIB_MATCHES += march?pentium=march?k6 march?pentiumpro=march?athlon
 MULTILIB_EXCEPTIONS = \
 march=pentium/*msoft-float* \
 march=pentiumpro/*msoft-float*
+MULTILIB_EXTRA_OPTS = ftls-model=local-exec
diff --git a/gcc/config/m68k/t-rtems b/gcc/config/m68k/t-rtems
index 0997afebc94..53a585e3018 100644
--- a/gcc/config/m68k/t-rtems
+++ b/gcc/config/m68k/t-rtems
@@ -7,3 +7,4 @@ M68K_MLIB_CPU += && (match(MLIB, "^68") \
 || MLIB == "5329" \
 || MLIB == "5407" \
 || MLIB == "5475")
+MULTILIB_EXTRA_OPTS = ftls-model=local-exec
diff --git a/gcc/config/microblaze/t-rtems b/gcc/config/microblaze/t-rtems
index d0c38261aaa..c9c9716ab62 100644
--- a/gcc/config/microblaze/t-rtems
+++ b/gcc/config/microblaze/t-rtems
@@ -1 +1,2 @@
 # Custom multilibs for RTEMS
+MULTILIB_EXTRA_OPTS = ftls-model=local-exec
diff --git a/gcc/config/nios2/t-rtems b/gcc/config/nios2/t-rtems
index beda8328bd2..3c9fbc69c83 100644
--- a/gcc/config/nios2/t-rtems
+++ b/gcc/config/nios2/t-rtems
@@ -8,6 +8,7 @@ MULTILIB_EXCEPTIONS =
 MULTILIB_REUSE =
 MULTILIB_MATCHES   =
 MULTILIB_REQUIRED  =
+MULTILIB_EXTRA_OPTS= ftls-model=local-exec
 
 # Enumeration of multilibs
 
diff --git a/gcc/config/riscv/t-rtems b/gcc/config/riscv/t-rtems
index 41f5927fc87..bb49e559ec5 100644
--- a/gcc/config/riscv/t-rtems
+++ b/gcc/config/riscv/t-rtems
@@ -1,3 +1,5 @@
+MULTILIB_EXTRA_OPTS= ftls-model=local-exec
+
 MULTILIB_OPTIONS   =
 MULTILIB_DIRNAMES  =
 
diff --git a/gcc/config/rs6000/t-rtems b/gcc/config/rs6000/t-rtems
index 4f8c147be3e..ba7177bf0f5 100644
--- a/gcc/config/rs6000/t-rtems
+++ b/gcc/config/rs6000/t-rtems
@@ -23,6 +23,7 @@ MULTILIB_DIRNAMES =
 MULTILIB_MATCHES =
 MULTILIB_EXCEPTIONS =
 MULTILIB_REQUIRED =
+MULTILIB_EXTRA_OPTS = ftls-model=local-exec
 
 MULTILIB_OPTIONS += mcpu=403/mcpu=505/mcpu=603e/mcpu=604/mcpu=860/mcpu=7400/mcpu=8540/mcpu=e6500
 MULTILIB_DIRNAMES += m403 m505 m603e m604 m860 m7400 m8540 me6500
diff --git a/gcc/config/sparc/t-rtems b/gcc/config/sparc/t-rtems
index c58836c1e96..1917eda322e 100644
--- a/gcc/config/sparc/t-rtems
+++ b/gcc/config/sparc/t-rtems
@@ -17,6 +17,8 @@
 # .
 #
 
+MULTILIB_EXTRA_OPTS = ftls-model=local-exec
+
 MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7/mcpu=leon \
   mfix-ut699/mfix-at697f/mfix-gr712rc
 MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 leon ut699 at697f gr712rc
-- 
2.35.3



Re: [PATCH V3] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-20 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 21, 2022 at 7:19 AM liuhongt  wrote:
>
> And split it after reload.
>
> gcc/ChangeLog:
>
> PR target/106038
> * config/i386/mmx.md (3): New define_expand, it's
> original "3".
> (*3): New define_insn, it's original
> "3" be extended to handle memory and immediate
> operand with ix86_binary_operator_ok. Also adjust define_split
> after it.
> (mmxinsnmode): New mode attribute.
> (*mov_imm): Refactor with mmxinsnmode.
> * config/i386/predicates.md
> (register_or_x86_64_const_vector_operand): New predicate.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr106038-1.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/mmx.md | 70 --
>  gcc/config/i386/predicates.md  |  4 ++
>  gcc/testsuite/gcc.target/i386/pr106038-1.c | 27 +
>  3 files changed, 70 insertions(+), 31 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 3294c1e6274..dda4b43f5c1 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -86,6 +86,14 @@ (define_mode_attr mmxvecsize
>[(V8QI "b") (V4QI "b") (V2QI "b")
> (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")])
>
> +;; Mapping to same size integral mode.
> +(define_mode_attr mmxinsnmode
> +  [(V8QI "DI") (V4QI "SI") (V2QI "HI")
> +   (V4HI "DI") (V2HI "SI")
> +   (V2SI "DI")
> +   (V4HF "DI") (V2HF "SI")
> +   (V2SF "DI")])
> +
>  (define_mode_attr mmxdoublemode
>[(V8QI "V8HI") (V4HI "V4SI")])
>
> @@ -350,22 +358,7 @@ (define_insn_and_split "*mov_imm"
>HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1],
> mode);
>operands[1] = GEN_INT (val);
> -  machine_mode mode;
> -  switch (GET_MODE_SIZE (mode))
> -{
> -case 2:
> -  mode = HImode;
> -  break;
> -case 4:
> -  mode = SImode;
> -  break;
> -case 8:
> -  mode = DImode;
> -  break;
> -default:
> -  gcc_unreachable ();
> -}
> -  operands[0] = lowpart_subreg (mode, operands[0], mode);
> +  operands[0] = lowpart_subreg (mode, operands[0], mode);
>  })
>
>  ;; For TARGET_64BIT we always round up to 8 bytes.
> @@ -2974,33 +2967,48 @@ (define_insn "*mmx_3"
> (set_attr "type" "mmxadd,sselog,sselog,sselog")
> (set_attr "mode" "DI,TI,TI,TI")])
>
> -(define_insn "3"
> -  [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v")
> +(define_expand "3"
> +  [(set (match_operand:VI_16_32 0 "nonimmediate_operand")
>  (any_logic:VI_16_32
> - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v")
> - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v")))
> -   (clobber (reg:CC FLAGS_REG))]
> + (match_operand:VI_16_32 1 "nonimmediate_operand")
> + (match_operand:VI_16_32 2 "nonimmediate_or_x86_64_const_vector_operand")))]
>""
> +  "ix86_expand_binary_operator (, mode, operands); DONE;")
> +
> +(define_insn "*3"
> +  [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v")
> +(any_logic:VI_16_32
> + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v")
> + (match_operand:VI_16_32 2 "nonimmediate_or_x86_64_const_vector_operand" "r,i,x,x,v")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (, mode, operands)"
>"#"
> -  [(set_attr "isa" "*,sse2_noavx,avx,avx512vl")
> -   (set_attr "type" "alu,sselog,sselog,sselog")
> -   (set_attr "mode" "SI,TI,TI,TI")])
> +  [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl")
> +   (set_attr "type" "alu,alu,sselog,sselog,sselog")
> +   (set_attr "mode" "SI,SI,TI,TI,TI")])
>
>  (define_split
> -  [(set (match_operand:VI_16_32 0 "general_reg_operand")
> +  [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand")
>  (any_logic:VI_16_32
> - (match_operand:VI_16_32 1 "general_reg_operand")
> - (match_operand:VI_16_32 2 "general_reg_operand")))
> + (match_operand:VI_16_32 1 "nonimmediate_gr_operand")
> + (match_operand:VI_16_32 2 "reg_or_const_vector_operand")))
> (clobber (reg:CC FLAGS_REG))]
>"reload_completed"
>[(parallel
>   [(set (match_dup 0)
> -  (any_logic:SI (match_dup 1) (match_dup 2)))
> +  (any_logic: (match_dup 1) (match_dup 2)))
>(clobber (reg:CC FLAGS_REG))])]
>  {
> -  operands[2] = lowpart_subreg (SImode, operands[2], mode);
> -  operands[1] = lowpart_subreg (SImode, operands[1], mode);
> -  operands[0] = lowpart_subreg (SImode, operands[0], mode);
> +  if (!register_operand (operands[2], mode))
> +{
> +  HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2],
> +   mode);
> +  operands[2] = GEN_INT (val);
> +}
> +  else
> +operands[2] = lowpart_subreg (mode, operands[2], mode);
> + 

[PATCH] Remove setting -mblock-ops-vector-pair on power10.

2022-07-20 Thread Michael Meissner via Gcc-patches
Remove setting -mblock-ops-vector-pair on power10.

Testing has shown that using the load vector pair and store vector pair
instructions for block moves has some performance issues on power10.  This
patch no longer sets this option by default.  If it proves to be a win on
other machines in the future, this flag can be set in the ISA options.

This patch modifies a previous patch that attempted to do the same thing,
but the previous patch was flawed: it would generate the load vector pair
and store vector pair instructions when we are tuning for a different
machine.

I have tested this patch on power10 systems (with long double set to IEEE
128-bit and also with long double set to IBM 128-bit).  Can I check this patch
into the trunk and backport it to older GCC releases?

2022-07-21   Michael Meissner  

gcc/

* config/rs6000/rs6000.cc (rs6000_option_override_internal):
Do not enable -mblock-ops-vector-pair by default on power10.
---
 gcc/config/rs6000/rs6000.cc | 11 ---
 1 file changed, 11 deletions(-)
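Below is a hypothetical C model of the logic this patch deletes. The mask and enum names are made up and stand in for the rs6000 option machinery. The model shows the flaw described above: the default was derived from the tuning target. So when power10 ISA features such as MMA were enabled, tuning for any machine other than power10 turned the option on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the rs6000 option mask and tuning enum.  */
#define MASK_BLOCK_OPS_VECTOR_PAIR 0x1u
enum tune { TUNE_POWER9, TUNE_POWER10 };

/* Removed logic: unless the user set the flag explicitly, default it
   from the tuning target.  */
unsigned
old_default (unsigned flags, unsigned explicit_flags, bool mma,
             bool efficient_unaligned_vsx, enum tune tune)
{
  if (!(explicit_flags & MASK_BLOCK_OPS_VECTOR_PAIR))
    {
      if (mma && efficient_unaligned_vsx && tune != TUNE_POWER10)
        flags |= MASK_BLOCK_OPS_VECTOR_PAIR;
      else
        flags &= ~MASK_BLOCK_OPS_VECTOR_PAIR;
    }
  return flags;
}
```

With the block removed, this code no longer touches the option. It therefore stays off unless the user passes `-mblock-ops-vector-pair` explicitly, or a future ISA default sets it.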

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3ff16b8ae04..667f83b1dfd 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -4139,17 +4139,6 @@ rs6000_option_override_internal (bool global_init_p)
rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_UNALIGNED_VSX;
 }
 
-  if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
-{
-  /* Do not generate lxvp and stxvp on power10 since there are some
-performance issues.  */
-  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
- && rs6000_tune != PROCESSOR_POWER10)
-   rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
-  else
-   rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
-}
-
   /* Use long double size to select the appropriate long double.  We use
  TYPE_PRECISION to differentiate the 3 different long double types.  We map
  128 into the precision used for TFmode.  */
-- 
2.35.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH v2 0/3] LoongArch: Modify the method of obtaining symbolic addresses.

2022-07-20 Thread Lulu Cheng
1. When '-mexplicit-relocs' is enabled, the original 'la' macro instruction
   used to obtain the address of a symbol is split into two instructions.
2. Currently, '-mcmodel=' only supports 'normal' mode, because the behavior
   of the other modes is not yet determined.  Support will be improved in
   subsequent patches.
3. When building with '-fplt', global functions are now called with 'bl'
   instead of 'la.global + jirl'.
4. Some R_LARCH_64 relocations in section .eh_frame generate R_LARCH_NONE.
   We change ASM_PREFERRED_EH_DATA_FORMAT from 'DW_EH_PE_absptr' to
   'DW_EH_PE_pcrel | DW_EH_PE_sdata4', so that the relocations in section
   .eh_frame become R_LARCH_32_PCREL instead of R_LARCH_64, and no dynamic
   relocation is generated for R_LARCH_32_PCREL.

This new symbol loading method is not yet supported by upstream binutils;
building this GCC port requires the following binutils patches to be applied.

  [1]https://sourceware.org/pipermail/binutils/2022-July/121933.html
  [2]https://sourceware.org/pipermail/binutils/2022-July/121937.html
  [3]https://sourceware.org/pipermail/binutils/2022-July/121934.html
  [4]https://sourceware.org/pipermail/binutils/2022-July/121935.html
  [5]https://sourceware.org/pipermail/binutils/2022-July/121936.html
  [6]https://sourceware.org/pipermail/binutils/2022-July/121938.html
  [7]https://sourceware.org/pipermail/binutils/2022-July/121939.html


Lulu Cheng (3):
  LoongArch: Subdivision symbol type, add SYMBOL_PCREL support.
  LoongArch: Support split symbol.
  LoongArch: Modify the definition of the ASM_PREFERRED_EH_DATA_FORMAT
macro.

 .../config/loongarch/loongarch-common.cc  |   1 +
 gcc/config/loongarch/constraints.md   |  24 +-
 gcc/config/loongarch/genopts/loongarch.opt.in |   4 +
 gcc/config/loongarch/loongarch-protos.h   |  10 +-
 gcc/config/loongarch/loongarch.cc | 652 +-
 gcc/config/loongarch/loongarch.h  |   4 +-
 gcc/config/loongarch/loongarch.md | 401 +++
 gcc/config/loongarch/loongarch.opt|   4 +
 gcc/config/loongarch/predicates.md|  56 +-
 .../gcc.target/loongarch/func-call-1.c|  32 +
 .../gcc.target/loongarch/func-call-2.c|  32 +
 .../gcc.target/loongarch/func-call-3.c|  32 +
 .../gcc.target/loongarch/func-call-4.c|  32 +
 .../gcc.target/loongarch/func-call-5.c|  33 +
 .../gcc.target/loongarch/func-call-6.c|  33 +
 .../gcc.target/loongarch/func-call-7.c|  34 +
 .../gcc.target/loongarch/func-call-8.c|  33 +
 .../loongarch/relocs-symbol-noaddend.c|  23 +
 18 files changed, 905 insertions(+), 535 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-6.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-7.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-8.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/relocs-symbol-noaddend.c

-- 
2.31.1



[PATCH v2 1/3] LoongArch: Subdivision symbol type, add SYMBOL_PCREL support.

2022-07-20 Thread Lulu Cheng
1. Remove support for cmodels other than normal.
2. When building with '-fplt', call global functions with 'bl' instead of
   'la.global + jirl'.

gcc/ChangeLog:

* config/loongarch/constraints.md (a): Delete the constraint.
(b): A constant call not local address.
(h): Delete the constraint.
(t): Delete the constraint.
* config/loongarch/loongarch-protos.h (enum loongarch_symbol_type):
Add new symbol type 'SYMBOL_PCREL', 'SYMBOL_TLS_IE' and 'SYMBOL_TLS_LE'.
(loongarch_split_symbol): Delete useless function declarations.
(loongarch_split_symbol_type): Delete useless function declarations.
* config/loongarch/loongarch.cc (enum loongarch_address_type):
Delete unnecessary comment information.
(loongarch_symbol_binds_local_p): Change the order in which labels and
symbols are checked.
(loongarch_classify_symbol): Return the symbol type.  If the symbol is a
label or a local symbol, return SYMBOL_PCREL.  If it is a TLS symbol,
return SYMBOL_TLS.  If it is not a local symbol, return SYMBOL_GOT_DISP.
(loongarch_symbolic_constant_p): Add handling of 'SYMBOL_TLS_IE'
'SYMBOL_TLS_LE' and 'SYMBOL_PCREL'.
(loongarch_symbol_insns): Add handling of 'SYMBOL_TLS_IE',
'SYMBOL_TLS_LE' and 'SYMBOL_PCREL'.
(loongarch_address_insns): Sort code.
(loongarch_12bit_offset_address_p): Sort code.
(loongarch_14bit_shifted_offset_address_p): Sort code.
(loongarch_call_tls_get_addr): Sort code.
(loongarch_legitimize_tls_address): Sort code.
(loongarch_output_move): Remove support for cmodels other than normal.
(loongarch_memmodel_needs_release_fence): Sort code.
(loongarch_print_operand): Sort code.
* config/loongarch/loongarch.h (LARCH_U12BIT_OFFSET_P):
Rename to LARCH_12BIT_OFFSET_P.
(LARCH_12BIT_OFFSET_P): New macro.
* config/loongarch/loongarch.md: Reimplement function calls.  Remove
support for cmodels other than normal.
* config/loongarch/predicates.md (is_const_call_weak_symbol): Delete
this predicate.
(is_const_call_plt_symbol): Delete this predicate.
(is_const_call_global_noplt_symbol): Delete this predicate.
(is_const_call_no_local_symbol): New predicate; determines whether it is
a local symbol or label.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-1.c: New test.
* gcc.target/loongarch/func-call-2.c: New test.
* gcc.target/loongarch/func-call-3.c: New test.
* gcc.target/loongarch/func-call-4.c: New test.
---
 gcc/config/loongarch/constraints.md   |  24 +-
 gcc/config/loongarch/loongarch-protos.h   |   9 +-
 gcc/config/loongarch/loongarch.cc | 256 +++-
 gcc/config/loongarch/loongarch.h  |   2 +-
 gcc/config/loongarch/loongarch.md | 279 +++---
 gcc/config/loongarch/predicates.md|  40 ++-
 .../gcc.target/loongarch/func-call-1.c|  32 ++
 .../gcc.target/loongarch/func-call-2.c|  32 ++
 .../gcc.target/loongarch/func-call-3.c|  32 ++
 .../gcc.target/loongarch/func-call-4.c|  32 ++
 10 files changed, 305 insertions(+), 433 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-4.c
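The classification described for `loongarch_classify_symbol` can be sketched as a decision function. This is a simplified hypothetical model, not GCC's rtx-based code. In particular, it assumes the TLS test comes before the local-binding test, so a local TLS symbol is classified as SYMBOL_TLS rather than SYMBOL_PCREL.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the properties being inspected.  */
struct sym_info
{
  bool is_label;
  bool is_tls;
  bool binds_local;
};

enum symbol_type { SYMBOL_PCREL, SYMBOL_GOT_DISP, SYMBOL_TLS };

enum symbol_type
classify_symbol (const struct sym_info *s)
{
  if (s->is_label)
    return SYMBOL_PCREL;     /* Labels are always pc-relative.  */
  if (s->is_tls)
    return SYMBOL_TLS;
  if (!s->binds_local)
    return SYMBOL_GOT_DISP;  /* Preemptible: load the address from the GOT.  */
  return SYMBOL_PCREL;       /* Local symbol: pc-relative address.  */
}
```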

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index d0bfddbd5a9..43cb7b5f0f5 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -20,14 +20,14 @@
 
 ;; Register constraints
 
-;; "a" "A constant call global and noplt address."
-;; "b" <-unused
+;; "a" <-unused
+;; "b" "A constant call not local address."
 ;; "c" "A constant call local address."
 ;; "d" <-unused
 ;; "e" JIRL_REGS
 ;; "f" FP_REGS
 ;; "g" <-unused
-;; "h" "A constant call plt address."
+;; "h" <-unused
 ;; "i" "Matches a general integer constant." (Global non-architectural)
 ;; "j" SIBCALL_REGS
 ;; "k" "A memory operand whose address is formed by a base register and
@@ -42,7 +42,7 @@
 ;; "q" CSR_REGS
 ;; "r" GENERAL_REGS (Global non-architectural)
 ;; "s" "Matches a symbolic integer constant." (Global non-architectural)
-;; "t" "A constant call weak address"
+;; "t" <-unused
 ;; "u" "A signed 52bit constant and low 32-bit is zero (for logic instructions)"
 ;; "v" "A signed 64-bit constant and low 44-bit is zero (for logic instructions)."
 ;; "w" "Matches any valid memory."
@@ -89,10 +89,10 @@
 ;; "<" "Matches a pre-dec or post-dec operand." (Global non-architectural)
 ;; ">" "Matches a pre-inc or post-inc operand." (Global non-architectural)
 
-(define_constraint "a"
+(define_constraint "b"
   "@internal
- 

[PATCH v2 3/3] LoongArch: Modify the definition of the ASM_PREFERRED_EH_DATA_FORMAT macro.

2022-07-20 Thread Lulu Cheng
Some R_LARCH_64 relocations in section .eh_frame generate R_LARCH_NONE.
Change the relocation in section .eh_frame from R_LARCH_64 to
R_LARCH_32_PCREL; no dynamic relocation is generated for R_LARCH_32_PCREL.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ASM_PREFERRED_EH_DATA_FORMAT):
Modify the definition of the ASM_PREFERRED_EH_DATA_FORMAT macro.
---
 gcc/config/loongarch/loongarch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
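The C sketch below shows the encoding byte the new macro produces and what a pc-relative signed 4-byte field holds. The `DW_EH_PE_*` values are the standard DWARF exception-handling pointer-encoding constants. The helper names are hypothetical, and the sketch assumes the target lies within ±2 GiB of the field.

```c
#include <stdint.h>

/* Standard DWARF EH pointer-encoding constants used in .eh_frame.  */
enum
{
  DW_EH_PE_absptr = 0x00,
  DW_EH_PE_sdata4 = 0x0b,
  DW_EH_PE_pcrel = 0x10,
  DW_EH_PE_indirect = 0x80
};

/* Mirror of the new ASM_PREFERRED_EH_DATA_FORMAT (CODE, GLOBAL).  */
unsigned
preferred_eh_format (int global)
{
  return (global ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4;
}

/* A pcrel|sdata4 field stores the signed 32-bit distance from the
   field's own address to the target.  The value does not depend on
   the load address, so no dynamic relocation is needed.  */
int32_t
encode_pcrel_sdata4 (uint64_t field_addr, uint64_t target)
{
  return (int32_t) (int64_t) (target - field_addr);
}

uint64_t
decode_pcrel_sdata4 (uint64_t field_addr, int32_t stored)
{
  return field_addr + (uint64_t) (int64_t) stored;
}
```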

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 89a5bd728fe..222b58b838d 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1128,7 +1128,7 @@ struct GTY (()) machine_function
 #endif
 
 #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
-  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_absptr)
+  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4)
 
 /* Do emit .note.GNU-stack by default.  */
 #ifndef NEED_INDICATE_EXEC_STACK
-- 
2.31.1



[PATCH v2 2/3] LoongArch: Support split symbol.

2022-07-20 Thread Lulu Cheng
Add the compilation option '-mexplicit-relocs'.  When '-mexplicit-relocs'
is enabled, the symbolic address load instruction 'la.*' is split into two
instructions.  This option is enabled by default.
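Here is a rough model of what the split pair computes. It assumes the `pcalau12i` + `addi.d` sequence with `%pc_hi20`/`%pc_lo12` operands, as described in the LoongArch ELF psABI and provided by the binutils patches listed in the cover letter. `addi.d` sign-extends its 12-bit immediate, so the high part adds 0x800 to the symbol before taking the page delta. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 12-bit field, as addi.d does with its immediate.  */
int64_t
sext12 (uint64_t lo12)
{
  return lo12 >= 0x800 ? (int64_t) lo12 - 0x1000 : (int64_t) lo12;
}

/* High part: page delta between the (rounded) symbol and the PC.  */
int64_t
pc_hi20 (uint64_t sym, uint64_t pc)
{
  int64_t sym_page = (int64_t) ((sym + 0x800) & ~UINT64_C (0xfff));
  int64_t pc_page = (int64_t) (pc & ~UINT64_C (0xfff));
  return (sym_page - pc_page) / 4096;  /* Exact: both are page-aligned.  */
}

/* Low part: the symbol's offset within its 4 KiB page.  */
uint64_t
pc_lo12 (uint64_t sym)
{
  return sym & 0xfff;
}

/* What "pcalau12i rd, hi20; addi.d rd, rd, lo12" leaves in rd.  */
uint64_t
materialize (uint64_t pc, int64_t hi20, uint64_t lo12)
{
  uint64_t rd = (pc & ~UINT64_C (0xfff)) + (uint64_t) (hi20 * 4096);
  return rd + (uint64_t) sext12 (lo12);
}
```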

gcc/ChangeLog:

* common/config/loongarch/loongarch-common.cc:
Enable '-fsection-anchors' at -O1 and above.
* config/loongarch/genopts/loongarch.opt.in: Add new option
'-mexplicit-relocs', and enable it by default.
* config/loongarch/loongarch-protos.h (loongarch_split_move_insn_p):
Delete function declaration.
(loongarch_split_move_insn): Delete function declaration.
(loongarch_split_symbol_type): Add function declaration.
* config/loongarch/loongarch.cc (enum loongarch_address_type):
Add new address type 'ADDRESS_LO_SUM'.
(loongarch_classify_symbolic_expression): New function.
Classify the base of symbolic expression X, given that X appears in
context CONTEXT.
(loongarch_symbol_insns): Add a check for TARGET_EXPLICIT_RELOCS.
(loongarch_split_symbol_type): New function.
Determine whether the symbol load should be split into two instructions.
(loongarch_valid_lo_sum_p): New function.
Return true if a LO_SUM can address a value of mode MODE when the LO_SUM
symbol has type SYMBOL_TYPE.
(loongarch_classify_address): Add handling of 'LO_SUM'.
(loongarch_address_insns): Add handling of 'ADDRESS_LO_SUM'.
(loongarch_signed_immediate_p): Sort code.
(loongarch_12bit_offset_address_p): Return true if address type is
ADDRESS_LO_SUM.
(loongarch_const_insns): Add handling of 'HIGH'.
(loongarch_split_move_insn_p): Make the function static.
(loongarch_emit_set): New function.
(loongarch_call_tls_get_addr): Add symbol handling for
TARGET_EXPLICIT_RELOCS.
(loongarch_legitimize_tls_address): Add symbol handling for
TARGET_EXPLICIT_RELOCS.
(loongarch_split_symbol): New function.  Split symbol.
(loongarch_legitimize_address): Add code to check whether the address can
be split into a high part and a LO_SUM.
(loongarch_legitimize_const_move): Add code to split moves of symbolic
constants into high and low parts.
(loongarch_split_move_insn): Delete function definition.
(loongarch_output_move): Add support for HIGH and LO_SUM.
(loongarch_print_operand_reloc): New function.
Print symbolic operand OP, which is part of a HIGH or LO_SUM, in context
CONTEXT.
(loongarch_memmodel_needs_release_fence): Sort code.
(loongarch_print_operand): Rearrange in alphabetical order and add 'H' and
'L' to support HIGH and LOW output.
(loongarch_print_operand_address): Add handling of 'ADDRESS_LO_SUM'.
(TARGET_MIN_ANCHOR_OFFSET): Define macro to -IMM_REACH/2.
(TARGET_MAX_ANCHOR_OFFSET): Define macro to IMM_REACH/2-1.
* config/loongarch/loongarch.md (movti): Delete the template.
(*movti): Delete the template.
(movtf): Delete the template.
(*movtf): Delete the template.
(*low): New template of normal symbol low address.
(@tls_low): New template of tls symbol low address.
(@ld_from_got): New template to load an address from the GOT.
(@ori_l_lo12): New template.
* config/loongarch/loongarch.opt: Update from loongarch.opt.in.
* config/loongarch/predicates.md: Add support for symbol_type HIGH.
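
The anchor range in the TARGET_MIN_ANCHOR_OFFSET and TARGET_MAX_ANCHOR_OFFSET entries can be checked with a short C sketch. The sketch assumes that IMM_BITS is 12 in loongarch.h, so IMM_REACH is 4096. Under that assumption, the anchor offsets span the range of a signed 12-bit load/store offset. The function name anchor_offset_ok is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror loongarch.h: a 12-bit signed immediate field.  */
#define IMM_BITS 12
#define IMM_REACH ((int64_t) 1 << IMM_BITS)

/* Anchor offset limits as described in the ChangeLog entries.  */
#define MIN_ANCHOR_OFFSET (-IMM_REACH / 2)     /* -2048 */
#define MAX_ANCHOR_OFFSET (IMM_REACH / 2 - 1)  /*  2047 */

/* True if OFFSET from a section anchor fits a signed 12-bit immediate,
   so "anchor register + OFFSET" can be used directly as an address.  */
static bool anchor_offset_ok (int64_t offset)
{
  return offset >= MIN_ANCHOR_OFFSET && offset <= MAX_ANCHOR_OFFSET;
}
```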

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-1.c: Add build option '-mno-explicit-relocs'.
* gcc.target/loongarch/func-call-2.c: Add build option '-mno-explicit-relocs'.
* gcc.target/loongarch/func-call-3.c: Add build option '-mno-explicit-relocs'.
* gcc.target/loongarch/func-call-4.c: Add build option '-mno-explicit-relocs'.
* gcc.target/loongarch/func-call-5.c: New test.
* gcc.target/loongarch/func-call-6.c: New test.
* gcc.target/loongarch/func-call-7.c: New test.
* gcc.target/loongarch/func-call-8.c: New test.
* gcc.target/loongarch/relocs-symbol-noaddend.c: New test.
---
 .../config/loongarch/loongarch-common.cc  |   1 +
 gcc/config/loongarch/genopts/loongarch.opt.in |   4 +
 gcc/config/loongarch/loongarch-protos.h   |   3 +-
 gcc/config/loongarch/loongarch.cc | 412 --
 gcc/config/loongarch/loongarch.md | 122 +++---
 gcc/config/loongarch/loongarch.opt|   4 +
 gcc/config/loongarch/predicates.md|  20 +-
 .../gcc.target/loongarch/func-call-1.c|   2 +-
 .../gcc.target/loongarch/func-call-2.c|   2 +-
 .../gcc.target/loongarch/func-call-3.c|   2 +-
 .../gcc.target/loongarch/func-call-4.c|   2 +-
 .../gcc.target/loongarch/func-ca

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-20 Thread Richard Biener via Gcc-patches
On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
 wrote:
>
> On Mon, 18 Jul 2022 at 11:57, Richard Biener  
> wrote:
> >
> > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > >  wrote:
> > > >
> > > > Richard Biener  writes:
> > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > >>
> > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > >>  wrote:
> > > > >> >
> > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > >> >  wrote:
> > > > >> > >
> > > > >> > > Hi Richard,
> > > > >> > > For the following test:
> > > > >> > >
> > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > >> > > {
> > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >> > > }
> > > > >> > >
> > > > >> > > The compiler emits the following ICE with -O3 -mcpu=generic+sve:
> > > > >> > > foo.c: In function ‘f2’:
> > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > >> > >   |   ^~
> > > > >> > > svint32_t
> > > > >> > > __Int32x4_t
> > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > >> > > during GIMPLE pass: forwprop
> > > > >> > > dump file: foo.c.109t.forwprop2
> > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > >> > > 0xe9371f execute_function_todo
> > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > >> > > 0xe93ccb execute_todo
> > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > >> > >
> > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > > >> > >   int32x4_t v;
> > > > >> > >   __Int32x4_t _1;
> > > > >> > >   svint32_t _9;
> > > > >> > >   vector(4) int _11;
> > > > >> > >
> > > > >> > >:
> > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > >> > >   v_12 = _1;
> > > > >> > >   _11 = v_12;
> > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > >> > >   return _9;
> > > > >> > >
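The mask { 0, 1, 2, 3, ... } in the dump above has variable length. The following minimal C sketch models the intended dup semantics. It assumes that the mask repeats a four-element pattern, so that result lane i reads input lane i % 4. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Input lane selected for result lane I, assuming the mask
   { 0, 1, 2, 3, ... } repeats a 4-element pattern.  */
static size_t ld1rq_mask (size_t i)
{
  return i % 4;
}

/* Scalar model of the dup: fill VL lanes of DST (VL a multiple of 4,
   not known at compile time for SVE) from the 128-bit block SRC.  */
static void dup_quadword (const int32_t src[4], int32_t *dst, size_t vl)
{
  assert (vl % 4 == 0);
  for (size_t i = 0; i < vl; i++)
    dst[i] = src[ld1rq_mask (i)];
}
```

Only when vl == 4 is the loop a plain copy of the input. For an SVE vector of unknown length, the result has more lanes than the fixed-width input. Replacing the permutation with the input itself, or with a VIEW_CONVERT_EXPR of it, therefore cannot produce a value of the result's type.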
> > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > >> > > view_convert_expr, and the end result becomes:
> > > > >> > >   svint32_t _7;
> > > > >> > >   __Int32x4_t _8;
> > > > >> > >
> > > > >> > > ;;   basic block 2, loop depth 0
> > > > >> > > ;;pred:   ENTRY
> > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > >> > >   return _7;
> > > > >> > > ;;succ:   EXIT
> > > > >> > >
> > > > >> > > which causes the error during verify_gimple, since VIEW_CONVERT_EXPR
> > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > >> > >
> > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > >> > > in simplify_permutation if lhs and rhs have incompatible types,
> > > > >> > > which resolves the ICE, but I am not sure whether it's the correct
> > > > >> > > approach.
> > > > >> >
> > > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > >> > selecting the "low" part of the VLA vector.
> > > > >> Hi Richard,
> > > > >> Sorry, I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > >> represent the dup operation from a fixed-width vector to a VLA vector.
> > > > >> I am not sure how folding it to BIT_FIELD_REF would work.
> > > > >> Could you please elaborate?
> > > > >>
> > > > >> Also, the issue doesn't seem restricted to this case.
> > > > >> The following test case also ICEs during forwprop:
> > > > >> svint32_t foo()
> > > > >> {
> > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >>   return v2;
> > > > >> }
> > > > >>
> > > > >> foo2.c: In function ‘foo’:
> > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > >> 9 | }
> > > > >>   | ^
> > > > >> svint32_t
> > > > >> int32x4_t
> > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > >>
> > > > >> because simplify_permutation folds
> > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > >> into:
> > > > >> vector_cst {1, 2, 3, 4}
> > > > >>
> > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > >> support assigning a fixed-length vector_cst to a VLA vector.
> > > > >>
> > > > >> I guess the issue really is that currently,