Re: [PATCH 02/11] AArch64: Add test cases for SVE types in OpenMP shared clause.

2024-05-31 Thread Tejas Belagod

On 5/30/24 6:08 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

This patch tests various shared clauses with SVE types.  It also adds a test
scaffold to run OpenMP tests under the gcc.target testsuite.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp: New scaffold.


Hopefully Jakub can comment on whether we should test this in the
GCC testsuite or libgomp testsuite.

On the test:


[...]
+int
+main ()
+{
+  svint32_t x = svindex_s32 (0, 1);
+  svint32_t y = svindex_s32 (8, 1);
+  svint32_t a, b;
+  svbool_t p;
+
+  /* Implicit shared.  */
+  a = foo (x, y, p);
+  b = implicit_shared_default (x, y, p);


It looks like p is used uninitialised here.  Can you check locally
that using svptrue_b8 () (or whatever) as an initialiser allows the
test to pass while svpfalse_b () causes it to fail?
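As an aside, a minimal standalone sketch of that suggestion (my own example,
not part of the patch; it only shows the initialiser change, and it needs an
SVE-enabled compiler, e.g. -march=armv8.2-a+sve):

  #include <arm_sve.h>

  int
  main ()
  {
    /* Initialise the predicate explicitly so its value is well-defined:
       svptrue_b8 () gives an all-true predicate (test expected to pass);
       substituting svpfalse_b () gives an all-false one (expected to fail).  */
    svbool_t p = svptrue_b8 ();
    return svptest_any (svptrue_b8 (), p) ? 0 : 1;
  }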



Oops, thanks for spotting that. Now verified - will wait for Jakub's 
comment on tests' home before I respin.


Thanks,
Tejas.


Thanks,
Richard


+  compare_vec (a, b);
+
+  /* Explicit shared.  */
+  a = foo (x, y, p);
+  b = explicit_shared (x, y, p);
+  compare_vec (a, b);
+
+  /* Implicit shared with no default clause.  */
+  a = foo (x, y, p);
+  b = implicit_shared_no_default (x, y, p);
+  compare_vec (a, b);
+
+  /* Mix shared.  */
+  a = foo (x, y, p);
+  b = mix_shared (y, p);
+  compare_vec (a, b);
+
+  /* Predetermined shared.  */
+  predetermined_shared_static (true);
+  predetermined_shared_static (false);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "value-expr: \*.omp_data_i->a" 10 
"ompexp" } } */




Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-31 Thread Kewen.Lin
on 2024/5/29 14:52, Alexandre Oliva wrote:
> On May 27, 2024, "Kewen.Lin"  wrote:
> 
>> I wonder if it's possible to have a test case for this?
> 
> gcc.dg/guality/pr54519-[34].c at -O[1g] are fixed by this patch on

Nice!

> ppc64le-linux-gnu.  Are these the sort of test case you're interested

Yes, I was curious whether we can have some testing coverage on this.  As
Segher pointed out, it would be good to have this information in the commit
log.

BR,
Kewen

> in, or are you looking for something that tests the offsets in debug
> info, rather than the end-to-end debugging feature?
> 





Re: [PATCH v3 4/6] btf: add -fprune-btf option

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 11:34 PM David Faust  wrote:
>
> This patch adds a new option, -fprune-btf, to control BTF debug info
> generation.

Can you name it -gprune-btf instead?

> As the name implies, this option enables a kind of "pruning" of the BTF
> information before it is emitted.  When enabled, rather than emitting
> all type information translated from DWARF, only information for types
> directly used in the source program is emitted.
>
> The primary purpose of this pruning is to reduce the amount of
> unnecessary BTF information emitted, especially for BPF programs.  It is
> very common for BPF programs to include Linux kernel internal headers in
> order to have access to kernel data structures.  However, doing so often
> has the side effect of also adding type definitions for a large number
> of types which are not actually used by nor relevant to the program.
> In these cases, -fprune-btf commonly reduces the size of the resulting
> BTF information by 10x or more, as seen on average when compiling Linux
> kernel BPF selftests.  This both slims down the size of the resulting
> object and reduces the time required by the BPF loader to verify the
> program and its BTF information.
>
> Note that the pruning implemented in this patch follows the same rules
> as the BTF pruning performed unconditionally by LLVM's BPF backend when
> generating BTF.  In particular, the main sources of pruning are:
>
>   1) Only generate BTF for types used by variables and functions at
>  the file scope.  Note that with or without pruning, BTF_KIND_VAR
>  entries are only generated for variables present in the final
>  object - unused static variables or variables completely optimized
>  away must not have VAR entries in BTF.
>
>   2) Avoid emitting full BTF for struct and union types which are only
>  pointed-to by members of other struct/union types.  In these cases,
>  the full BTF_KIND_STRUCT or BTF_KIND_UNION which would normally
>  be emitted is replaced with a BTF_KIND_FWD, as though the
>  underlying type was a forward-declared struct or union type.
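As a hypothetical illustration of these two rules (my own example, not taken
from the patch or its testsuite):

  /* 'struct inner' is only reachable through a pointer member, so pruned
     BTF would replace its full BTF_KIND_STRUCT with a BTF_KIND_FWD (rule 2);
     'struct outer' is used by a file-scope variable, so its full record is
     kept (rule 1).  */
  struct inner { int a; int b; };
  struct outer { struct inner *ptr; };
  struct outer global_user;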
>
> gcc/
> * btfout.cc (btf_used_types): New hash set.
> (struct btf_fixup): New.
> (fixups, forwards): New vecs.
> (btf_output): Calculate num_types depending on flag_prune_btf.
> (btf_early_finish): New initialization for flag_prune_btf.
> (btf_add_used_type): New function.
> (btf_used_type_list_cb): Likewise.
> (btf_late_collect_pruned_types): Likewise.
> (btf_late_add_vars): Handle special case for variables in ".maps"
> section when generating BTF for BPF CO-RE target.
> (btf_late_finish): Use btf_late_collect_pruned_types when
> flag_prune_btf is in effect.  Move some initialization to 
> btf_early_finish.
> (btf_finalize): Additional deallocation for flag_prune_btf.
> * common.opt (fprune-btf): New flag.
> * ctfc.cc (init_ctf_strtable): Make non-static.
> * ctfc.h (struct ctf_dtdef): Add visited_children_p boolean flag.
> (init_ctf_strtable, ctfc_delete_strtab): Make extern.
> * doc/invoke.texi (Debugging Options): Document -fprune-btf.
>
> gcc/testsuite/
> * gcc.dg/debug/btf/btf-prune-1.c: New test.
> * gcc.dg/debug/btf/btf-prune-2.c: Likewise.
> * gcc.dg/debug/btf/btf-prune-3.c: Likewise.
> * gcc.dg/debug/btf/btf-prune-maps.c: Likewise.
> ---
>  gcc/btfout.cc | 359 +-
>  gcc/common.opt|   4 +
>  gcc/ctfc.cc   |   2 +-
>  gcc/ctfc.h|   3 +
>  gcc/doc/invoke.texi   |  20 +
>  gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c  |  25 ++
>  gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c  |  33 ++
>  gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c  |  35 ++
>  .../gcc.dg/debug/btf/btf-prune-maps.c |  20 +
>  9 files changed, 494 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-maps.c
>
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index 32fda14f704b..a7da164f6b31 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -828,7 +828,10 @@ output_btf_types (ctf_container_ref ctfc)
>  {
>size_t i;
>size_t num_types;
> -  num_types = ctfc->ctfc_types->elements ();
> +  if (flag_prune_btf)
> +num_types = max_translated_id;
> +  else
> +num_types = ctfc->ctfc_types->elements ();
>
>if (num_types)
>  {
> @@ -957,6 +960,212 @@ btf_early_add_func_records (ctf_container_ref ctfc)
>  }
>  }
>
> +/* The set of types used directly in the source program, and any types 
> manually
> +   marked as used.  This is the set of types which will be em

Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 3:28 PM Feng Xue OS  wrote:
>
> >> Hi,
> >>
> >> The patch was updated with the newest trunk, and also contained some minor 
> >> changes.
> >>
> >> I am working on another new feature which is meant to support pattern
> >> recognition of lane-reducing operations in an affine closure originating
> >> from the loop reduction variable,
> >> like:
> >>
> >>   sum += cst1 * dot_prod_1 + cst2 * sad_2 + ... + cstN * lane_reducing_op_N
> >>
> >> The WIP feature depends on this patch. It has been quite a long time
> >> since its post; would you please take some time to review it? Thanks.
>
> > This seems to do multiple things so I wonder if you can split up the
> > patch a bit?
>
> OK. Will send out split patches in new mails.

Thanks.

> > For example adding lane_reducing_op_p can be split out, it also seems like
> > the vect_transform_reduction change to better distribute work can be done
> > separately?  Likewise refactoring like splitting out
> > vect_reduction_use_partial_vector.
> >
> > When we have
> >
> >sum += d0[i] * d1[i];  // dot-prod 
> >sum += w[i];   // widen-sum 
> >sum += abs(s0[i] - s1[i]); // sad 
> >sum += n[i];   // normal 
> >
> > the vector DOT_PROD and friend ops can end up mixing different lanes
> > since it is not specified which lanes are reduced into which output lane.
> > So, DOT_PROD might combine 0-3, 4-7, ... but SAD might combine
> > 0,4,8,12; 1,5,9,13; ... I think this isn't worse than what one op itself
> > is doing, but it's worth pointing out (it's probably unlikely a target
> > mixes different reduction strategies anyway).
>
> Yes. But even if, on a peculiar target, DOT_PROD and SAD have different
> reduction strategies, it does not impact result correctness, at least for
> integer operations.
> Is there anything special that we need to consider?

I couldn't think of any case, it's just these operations are only useful for
reductions.

> >
> > Can you make sure to add at least one SLP reduction example to show
> > this works for SLP as well?
> OK. The patches contain the cases for an SLP reduction chain. Will add one for
> SLP reduction; this should be a negative case.

Yes.

> Thanks,
> Feng


Re: [PATCH 1/6] vect: Add a function to check lane-reducing code [PR114440]

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 4:45 PM Feng Xue OS  wrote:
>
> This is a patch that is split out from 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html.
>
> Checking whether an operation is lane-reducing requires comparing its code
> against three kinds (DOT_PROD_EXPR/WIDEN_SUM_EXPR/SAD_EXPR).  Add a utility
> function to make the source code for the check handy and concise.
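For context, a sketch of what such a utility could look like (the actual
definition lives in the patch's tree-vectorizer.h hunk, which is not quoted
below, so the exact name placement and signature are assumptions; it relies on
GCC's internal code_helper type being in scope):

  /* Sketch only: return true if CODE is one of the lane-reducing
     operations handled specially by the vectorizer.  */
  inline bool
  lane_reducing_op_p (code_helper code)
  {
    return code == DOT_PROD_EXPR
           || code == WIDEN_SUM_EXPR
           || code == SAD_EXPR;
  }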

OK.

Thanks,
Richard.

> Feng
> --
> gcc/
> * tree-vectorizer.h (lane_reducing_op_p): New function.
> * tree-vect-slp.cc (vect_analyze_slp): Use new function
> lane_reducing_op_p to check statement code.
> * tree-vect-loop.cc (vect_transform_reduction): Likewise.
> (vectorizable_reduction): Likewise, and change name of a local
> variable that holds the result flag.
> ---
>  gcc/tree-vect-loop.cc | 29 -
>  gcc/tree-vect-slp.cc  |  4 +---
>  gcc/tree-vectorizer.h |  6 ++
>  3 files changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 04a9ac64df7..a42d79c7cbf 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7650,9 +7650,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>gimple_match_op op;
>if (!gimple_extract_op (stmt_info->stmt, &op))
>  gcc_unreachable ();
> -  bool lane_reduc_code_p = (op.code == DOT_PROD_EXPR
> -   || op.code == WIDEN_SUM_EXPR
> -   || op.code == SAD_EXPR);
> +  bool lane_reducing = lane_reducing_op_p (op.code);
>
>if (!POINTER_TYPE_P (op.type) && !INTEGRAL_TYPE_P (op.type)
>&& !SCALAR_FLOAT_TYPE_P (op.type))
> @@ -7664,7 +7662,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>
>/* For lane-reducing ops we're reducing the number of reduction PHIs
>   which means the only use of that may be in the lane-reducing operation. 
>  */
> -  if (lane_reduc_code_p
> +  if (lane_reducing
>&& reduc_chain_length != 1
>&& !only_slp_reduc_chain)
>  {
> @@ -7678,7 +7676,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>   since we'll mix lanes belonging to different reductions.  But it's
>   OK to use them in a reduction chain or when the reduction group
>   has just one element.  */
> -  if (lane_reduc_code_p
> +  if (lane_reducing
>&& slp_node
>&& !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
>&& SLP_TREE_LANES (slp_node) > 1)
> @@ -7738,7 +7736,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>/* To properly compute ncopies we are interested in the widest
>  non-reduction input type in case we're looking at a widening
>  accumulation that we later handle in vect_transform_reduction.  */
> -  if (lane_reduc_code_p
> +  if (lane_reducing
>   && vectype_op[i]
>   && (!vectype_in
>   || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> @@ -8211,7 +8209,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>&& loop_vinfo->suggested_unroll_factor == 1)
>  single_defuse_cycle = true;
>
> -  if (single_defuse_cycle || lane_reduc_code_p)
> +  if (single_defuse_cycle || lane_reducing)
>  {
>gcc_assert (op.code != COND_EXPR);
>
> @@ -8227,7 +8225,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>  mixed-sign dot-products can be implemented using signed
>  dot-products.  */
>machine_mode vec_mode = TYPE_MODE (vectype_in);
> -  if (!lane_reduc_code_p
> +  if (!lane_reducing
>   && !directly_supported_p (op.code, vectype_in, optab_vector))
>  {
>if (dump_enabled_p ())
> @@ -8252,7 +8250,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>   For the other cases try without the single cycle optimization.  */
>if (!ok)
> {
> - if (lane_reduc_code_p)
> + if (lane_reducing)
> return false;
>   else
> single_defuse_cycle = false;
> @@ -8263,7 +8261,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>/* If the reduction stmt is one of the patterns that have lane
>   reduction embedded we cannot handle the case of ! single_defuse_cycle.  
> */
>if ((ncopies > 1 && ! single_defuse_cycle)
> -  && lane_reduc_code_p)
> +  && lane_reducing)
>  {
>if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -8274,7 +8272,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>
>if (slp_node
>&& !(!single_defuse_cycle
> -  && !lane_reduc_code_p
> +  && !lane_reducing
>&& reduction_type != FOLD_LEFT_REDUCTION))
>  for (i = 0; i < (int) op.num_ops; i++)
>if (!vect_maybe_update_slp_op_vectype (slp_op[i], vectype_op[i]))
> @@ -8295,7 +8293,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>/* Cost the reduction op inside the loop if transformed via
>   vect_transform_reduction.  Other

Re: [PATCH 2/6] vect: Split out partial vect checking for reduction into a function

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 4:48 PM Feng Xue OS  wrote:
>
> This is a patch that is split out from 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html.
>
> Partial vectorization checking for vectorizable_reduction is a piece of
> relatively isolated code, which may be reused in other places. Move the
> code into a new function for sharing.
>
> Thanks,
> Feng
> ---
> gcc/
> * tree-vect-loop.cc (vect_reduction_use_partial_vector): New function.

Can you rename the function to vect_reduction_update_partial_vector_usage
please?  And keep ...

> (vectorizable_reduction): Move partial vectorization checking code to
> vect_reduction_use_partial_vector.
> ---
>  gcc/tree-vect-loop.cc | 138 --
>  1 file changed, 78 insertions(+), 60 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index a42d79c7cbf..aa5f21ccd1a 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7391,6 +7391,81 @@ build_vect_cond_expr (code_helper code, tree vop[3], 
> tree mask,
>  }
>  }
>
> +/* Given an operation with CODE in loop reduction path whose reduction PHI is
> +   specified by REDUC_INFO, the operation has TYPE of scalar result, and its
> +   input vectype is represented by VECTYPE_IN. The vectype of vectorized 
> result
> +   may be different from VECTYPE_IN, either in base type or vectype lanes,
> +   lane-reducing operation is the case.  This function check if it is 
> possible,
> +   and how to perform partial vectorization on the operation in the context
> +   of LOOP_VINFO.  */
> +
> +static void
> +vect_reduction_use_partial_vector (loop_vec_info loop_vinfo,
> +  stmt_vec_info reduc_info,
> +  slp_tree slp_node, code_helper code,
> +  tree type, tree vectype_in)
> +{
> +  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +return;
> +
> +  enum vect_reduction_type reduc_type = STMT_VINFO_REDUC_TYPE (reduc_info);
> +  internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> +  internal_fn cond_fn = get_conditional_internal_fn (code, type);
> +
> +  if (reduc_type != FOLD_LEFT_REDUCTION
> +  && !use_mask_by_cond_expr_p (code, cond_fn, vectype_in)
> +  && (cond_fn == IFN_LAST
> + || !direct_internal_fn_supported_p (cond_fn, vectype_in,
> + OPTIMIZE_FOR_SPEED)))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"can't operate on partial vectors because"
> +" no conditional operation is available.\n");
> +  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +}
> +  else if (reduc_type == FOLD_LEFT_REDUCTION
> +  && reduc_fn == IFN_LAST
> +  && !expand_vec_cond_expr_p (vectype_in, truth_type_for 
> (vectype_in),
> +  SSA_NAME))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +   "can't operate on partial vectors because"
> +   " no conditional operation is available.\n");
> +  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +}
> +  else if (reduc_type == FOLD_LEFT_REDUCTION
> +  && internal_fn_mask_index (reduc_fn) == -1
> +  && FLOAT_TYPE_P (vectype_in)
> +  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"can't operate on partial vectors because"
> +" signed zeros cannot be preserved.\n");
> +  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +}
> +  else
> +{
> +  internal_fn mask_reduc_fn
> +   = get_masked_reduction_fn (reduc_fn, vectype_in);
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> +  unsigned nvectors;
> +
> +  if (slp_node)
> +   nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> +  else
> +   nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
> +
> +  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
> +   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_in, 1);
> +  else
> +   vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_in, NULL);
> +}
> +}
> +
>  /* Function vectorizable_reduction.
>
> Check if STMT_INFO performs a reduction operation that can be vectorized.
> @@ -7456,7 +7531,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>bool single_defuse_cycle = false;
>bool nested_cycle = false;
>bool double_reduc = false;
> -  int vec_num;
>tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
>tree cond_reduc_v

Re: [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-31 Thread Kewen.Lin
Hi Haochen,

on 2024/5/30 11:14, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> 在 2024/5/29 13:26, Kewen.Lin 写道:
>> I can understand re-using "unordered" and "eq" will save some efforts than
>> doing with unspecs, but they are actually RTL codes instead of bits on the
>> specific hardware CR, a downside is that people who isn't aware of this
>> design point can have some misunderstanding when reading/checking the code
>> or dumping, from this perspective unspecs (with reasonable name) can be
>> more meaningful.  Normally adopting RTL code is better since they have the
>> chance to be considered (optimized) in generic pass/code, but it isn't the
>> case here as we just use the code itself but not be with the same semantic
>> (meaning).  Looking forward to others' opinions on this, if we want to adopt
>> "unordered" and "eq" like what this patch does, I think we should at least
>> emphasize such points in rs6000-modes.def.
> 
> Thanks so much for your comments. IMHO, the core question is whether we can
> re-define "unordered" or "eq" for a certain CC mode on a specific target. If we
> can't, or it's unsafe, we have to use the unspecs. In this case, I just want to
> define the code "unordered" on CCBCD as testing whether bit 3 is set on this CR
> field.

But my point is that "unordered" has its own semantics; it looks a bit tricky to
adopt it for the result of a BCD comparison, which only has "invalid" and
"overflow" in addition to the normal ones, though I can understand that this
patch wants to use it to test bit 3, since for float comparisons bit 3 is for
"unordered".  However, IMHO it would be clearer to have one unspec to test
bit 3 when bit 3 doesn't actually mean an unordered result.

> Actually rs6000 already uses the "lt" code to test whether bit 0 is set for
> vector compare instructions. The following expand is an example.

Yeah, but it doesn't mean it's the most sensible way to do this; IMHO it
suffers from a similar issue and can be improved as well.

> 
> (define_expand "vector_ae_<mode>_p"
>   [(parallel
> [(set (reg:CC CR6_REGNO)
>   (unspec:CC [(ne:CC (match_operand:VI 1 "vlogical_operand")
>  (match_operand:VI 2 "vlogical_operand"))]
>UNSPEC_PREDICATE))
>  (set (match_dup 3)
>   (ne:VI (match_dup 1)
>  (match_dup 2)))])
>(set (match_operand:SI 0 "register_operand" "=r")
> (lt:SI (reg:CC CR6_REGNO)
>(const_int 0)))
>(set (match_dup 0)
> (xor:SI (match_dup 0)
> (const_int 1)))]
> 
> I think the "lt" on CC just doesn't mean it compares whether the CC value is
> less than an integer. It just tests whether the "lt" bit (bit 0) is set on
> this CC.

But bit 0 doesn't stand for an "lt" comparison result in this context any more.

BR,
Kewen

> 
>   Looking forward to your and Segher's further invaluable comments.
> 
> Thanks
> Gui Haochen



[COMMITTED] testsuite: Adjust several dg-additional-files-options calls [PR115294]

2024-05-31 Thread Rainer Orth
A recent patch

commit bdc264a16e327c63d133131a695a202fbbc0a6a0
Author: Alexandre Oliva 
Date:   Thu May 30 02:06:48 2024 -0300

[testsuite] conditionalize dg-additional-sources on target and type

added two additional args to dg-additional-files-options.
Unfortunately, this completely broke several testsuites like

ERROR: tcl error sourcing 
/vol/gcc/src/hg/master/local/libatomic/testsuite/../../gcc/testsuite/lib/gcc-dg.exp.
wrong # args: should be "dg-additional-files-options options source dest type"

since the patch forgot to adjust some of the callers.

This patch fixes that.

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

Committed to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-05-31  Rainer Orth  

libatomic:
PR testsuite/115294
* testsuite/lib/libatomic.exp (libatomic_target_compile): Pass new
dg-additional-files-options args.

libgomp:
PR testsuite/115294
* testsuite/lib/libgomp.exp (libgomp_target_compile): Pass new
dg-additional-files-options args.

libitm:
PR testsuite/115294
* testsuite/lib/libitm.exp (libitm_target_compile): Pass new
dg-additional-files-options args.

libphobos:
PR testsuite/115294
* testsuite/lib/libphobos.exp (libphobos_target_compile): Pass new
dg-additional-files-options args.

libvtv:
PR testsuite/115294
* testsuite/lib/libvtv.exp (libvtv_target_compile): Pass new
dg-additional-files-options args.

# HG changeset patch
# Parent  83e1dad81f5eb46a0216b67de025afa9396cbbe3
testsuite: Adjust several dg-additional-files-options calls [PR115294]

diff --git a/libatomic/testsuite/lib/libatomic.exp b/libatomic/testsuite/lib/libatomic.exp
--- a/libatomic/testsuite/lib/libatomic.exp
+++ b/libatomic/testsuite/lib/libatomic.exp
@@ -214,7 +214,7 @@ proc libatomic_target_compile { source d
 	set options [concat "$ALWAYS_CFLAGS" $options]
 }
 
-set options [dg-additional-files-options $options $source]
+set options [dg-additional-files-options $options $source $dest $type]
 
 set result [target_compile $source $dest $type $options]
 
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -296,7 +296,7 @@ proc libgomp_target_compile { source des
 	set options [concat "$ALWAYS_CFLAGS" $options]
 }
 
-set options [dg-additional-files-options $options $source]
+set options [dg-additional-files-options $options $source $dest $type]
 
 set result [target_compile $source $dest $type $options]
 
diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
--- a/libitm/testsuite/lib/libitm.exp
+++ b/libitm/testsuite/lib/libitm.exp
@@ -217,7 +217,7 @@ proc libitm_target_compile { source dest
 	set options [concat "$ALWAYS_CFLAGS" $options]
 }
 
-set options [dg-additional-files-options $options $source]
+set options [dg-additional-files-options $options $source $dest $type]
 
 set result [target_compile $source $dest $type $options]
 
diff --git a/libphobos/testsuite/lib/libphobos.exp b/libphobos/testsuite/lib/libphobos.exp
--- a/libphobos/testsuite/lib/libphobos.exp
+++ b/libphobos/testsuite/lib/libphobos.exp
@@ -281,7 +281,7 @@ proc libphobos_target_compile { source d
 lappend options "compiler=$gdc_final"
 lappend options "timeout=[timeout_value]"
 
-set options [dg-additional-files-options $options $source]
+set options [dg-additional-files-options $options $source $dest $type]
 set comp_output [target_compile $source $dest $type $options]
 
 return $comp_output
diff --git a/libvtv/testsuite/lib/libvtv.exp b/libvtv/testsuite/lib/libvtv.exp
--- a/libvtv/testsuite/lib/libvtv.exp
+++ b/libvtv/testsuite/lib/libvtv.exp
@@ -212,7 +212,7 @@ proc libvtv_target_compile { source dest
 	set options [concat "$ALWAYS_CFLAGS" $options]
 }
 
-set options [dg-additional-files-options $options $source]
+set options [dg-additional-files-options $options $source $dest $type]
 
 set result [target_compile $source $dest $type $options]
 


Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-31 Thread Richard Sandiford
Tejas Belagod  writes:
> On 5/30/24 6:28 PM, Richard Sandiford wrote:
>> Tejas Belagod  writes:
>>> Currently poly-int type structures are passed by value to OpenMP runtime
>>> functions for shared clauses etc.  This patch improves on this by passing
>>> around poly-int structures by address to avoid copy-overhead.
>>>
>>> gcc/ChangeLog
>>> * omp-low.c (use_pointer_for_field): Use pointer if the OMP data
>>> structure's field type is a poly-int.
>>> ---
>>>   gcc/omp-low.cc | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
>>> index 1a65229cc37..b15607f4ef5 100644
>>> --- a/gcc/omp-low.cc
>>> +++ b/gcc/omp-low.cc
>>> @@ -466,7 +466,8 @@ static bool
>>>   use_pointer_for_field (tree decl, omp_context *shared_ctx)
>>>   {
>>> if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
>>> -  || TYPE_ATOMIC (TREE_TYPE (decl)))
>>> +  || TYPE_ATOMIC (TREE_TYPE (decl))
>>> +  || POLY_INT_CST_P (DECL_SIZE (decl)))
>>>   return true;
>>>   
>>> /* We can only use copy-in/copy-out semantics for shared variables
>> 
>
> Thanks for the reviews.
>
>> Realise this is also true of my original patch, but:
>> 
>> I suppose a question here is whether this function is only ever used for
>> local interfaces between code generated by the same source code function,
>> or whether it's ABI in a more general sense.  
>
> I'm not 100% sure, but AFAICS, 'use_pointer_for_field' seems to be
> used only for the local interface between source and generated functions. I
> don't see any backend hooks into this or backends hooking into this
> function for general ABI. Of course, I'm not an expert on OMP lowering,
> so it would be great to get an expert opinion on this.
>
>> If the latter, I suppose
>> we should make sure to handle ACLE types the same way regardless of
>> whether the SVE vector size is known.
>> 
>
> When you say same way, do you mean the way SVE ABI defines the rules for 
> SVE types?

No, sorry, I meant that if the choice isn't purely local to a source
code function, the condition should be something like sizeless_type_p
(suitably abstracted) rather than POLY_INT_CST_P.  That way, the "ABI"
stays the same regardless of -msve-vector-bits.

Thanks,
Richard


Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill  wrote:
>
> From: Greg McGary 

Still a NACK.  If remain ends up zero then

/* Try to use a single smaller load when we are about
   to load excess elements compared to the unrolled
   scalar loop.  */
if (known_gt ((vec_num * j + i + 1) * nunits,
              (group_size * vf - gap)))
  {
    poly_uint64 remain = ((group_size * vf - gap)
                          - (vec_num * j + i) * nunits);
    if (known_ge ((vec_num * j + i + 1) * nunits
                  - (group_size * vf - gap), nunits))
      /* DR will be unused.  */
      ltype = NULL_TREE;

needs to be re-formulated so that the combined conditions make sure
this doesn't happen.  The outer known_gt should already ensure that
remain > 0.  For correctness that should possibly be maybe_gt though.

> gcc/ChangeLog:
> * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent 
> divide-by-zero.
> * testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove dg-ice.
> ---
> No changes in v3. Depends on the risc-v backend option added in patch 1 to
> trigger the ICE.
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
>  gcc/tree-vect-stmts.cc  | 3 ++-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> index dfbe09f01a1..79d03612a22 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> @@ -1,6 +1,5 @@
>  /* { dg-do compile } */
>  /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable -O3 
> -mno-autovec-segment" } */
> -/* { dg-ice "Floating point exception" } */
>
>  enum e { c, d };
>  enum g { f };
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 4219ad832db..34f5736ba00 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
>  - (vec_num * j + i) * nunits);
> /* remain should now be > 0 and < nunits.  */
> unsigned num;
> -   if (constant_multiple_p (nunits, remain, &num))
> +   if (known_gt (remain, 0)
> +   && constant_multiple_p (nunits, remain, &num))
>   {
> tree ptype;
> new_vtype
> --
> 2.43.2
>


Re: [PATCH] Fix some opindex for some options [PR115022]

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 5:48 AM Andrew Pinski  wrote:
>
> While looking at the index I noticed that some options had
> `-` at the front in the index, which is wrong. And then
> I noticed that for some targets there was no index entry for `mcmodel=`, or
> `-mcmodel` had been used incorrectly.
>
> This fixes both of those and regenerates the urls files so that the
> `-mcmodel=` option now has a URL associated with it.
>
> OK?

OK

> gcc/ChangeLog:
>
> PR target/115022
> * doc/invoke.texi (fstrub=disable): Fix opindex.
> (minline-memops-threshold): Fix opindex.
> (mcmodel=): Add opindex and fix them.
> * common.opt.urls: Regenerate.
> * config/aarch64/aarch64.opt.urls: Regenerate.
> * config/bpf/bpf.opt.urls: Regenerate.
> * config/i386/i386.opt.urls: Regenerate.
> * config/loongarch/loongarch.opt.urls: Regenerate.
> * config/nds32/nds32-elf.opt.urls: Regenerate.
> * config/nds32/nds32-linux.opt.urls: Regenerate.
> * config/or1k/or1k.opt.urls: Regenerate.
> * config/riscv/riscv.opt.urls: Regenerate.
> * config/rs6000/aix64.opt.urls: Regenerate.
> * config/rs6000/linux64.opt.urls: Regenerate.
> * config/sparc/sparc.opt.urls: Regenerate.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/common.opt.urls |  3 +++
>  gcc/config/aarch64/aarch64.opt.urls |  3 ++-
>  gcc/config/bpf/bpf.opt.urls |  3 +++
>  gcc/config/i386/i386.opt.urls   |  3 ++-
>  gcc/config/loongarch/loongarch.opt.urls |  2 +-
>  gcc/config/nds32/nds32-elf.opt.urls |  2 +-
>  gcc/config/nds32/nds32-linux.opt.urls   |  2 +-
>  gcc/config/or1k/or1k.opt.urls   |  3 ++-
>  gcc/config/riscv/riscv.opt.urls |  3 ++-
>  gcc/config/rs6000/aix64.opt.urls|  3 ++-
>  gcc/config/rs6000/linux64.opt.urls  |  3 ++-
>  gcc/config/sparc/sparc.opt.urls |  2 +-
>  gcc/doc/invoke.texi | 17 +++--
>  13 files changed, 33 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
> index 10462e40874..1f2eb67c8e0 100644
> --- a/gcc/common.opt.urls
> +++ b/gcc/common.opt.urls
> @@ -1339,6 +1339,9 @@ 
> UrlSuffix(gcc/Optimize-Options.html#index-fstrict-aliasing)
>  fstrict-overflow
>  UrlSuffix(gcc/Code-Gen-Options.html#index-fstrict-overflow)
>
> +fstrub=disable
> +UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003ddisable)
> +
>  fstrub=strict
>  UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003dstrict)
>
> diff --git a/gcc/config/aarch64/aarch64.opt.urls 
> b/gcc/config/aarch64/aarch64.opt.urls
> index 993634c52f8..4fa90384378 100644
> --- a/gcc/config/aarch64/aarch64.opt.urls
> +++ b/gcc/config/aarch64/aarch64.opt.urls
> @@ -18,7 +18,8 @@ 
> UrlSuffix(gcc/AArch64-Options.html#index-mfix-cortex-a53-843419)
>  mlittle-endian
>  UrlSuffix(gcc/AArch64-Options.html#index-mlittle-endian)
>
> -; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
> +mcmodel=
> +UrlSuffix(gcc/AArch64-Options.html#index-mcmodel_003d)
>
>  mtp=
>  UrlSuffix(gcc/AArch64-Options.html#index-mtp)
> diff --git a/gcc/config/bpf/bpf.opt.urls b/gcc/config/bpf/bpf.opt.urls
> index 8c1e5f86d5c..1e8873a899f 100644
> --- a/gcc/config/bpf/bpf.opt.urls
> +++ b/gcc/config/bpf/bpf.opt.urls
> @@ -33,3 +33,6 @@ UrlSuffix(gcc/eBPF-Options.html#index-msmov)
>  mcpu=
>  UrlSuffix(gcc/eBPF-Options.html#index-mcpu-5)
>
> +minline-memops-threshold=
> +UrlSuffix(gcc/eBPF-Options.html#index-minline-memops-threshold)
> +
> diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
> index 40e8a844936..9384b0b3187 100644
> --- a/gcc/config/i386/i386.opt.urls
> +++ b/gcc/config/i386/i386.opt.urls
> @@ -40,7 +40,8 @@ UrlSuffix(gcc/x86-Options.html#index-march-16)
>  mlarge-data-threshold=
>  UrlSuffix(gcc/x86-Options.html#index-mlarge-data-threshold)
>
> -; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
> +mcmodel=
> +UrlSuffix(gcc/x86-Options.html#index-mcmodel_003d-7)
>
>  mcpu=
>  UrlSuffix(gcc/x86-Options.html#index-mcpu-14)
> diff --git a/gcc/config/loongarch/loongarch.opt.urls 
> b/gcc/config/loongarch/loongarch.opt.urls
> index 9ed5d7b5596..f7545f65103 100644
> --- a/gcc/config/loongarch/loongarch.opt.urls
> +++ b/gcc/config/loongarch/loongarch.opt.urls
> @@ -58,7 +58,7 @@ mrecip
>  UrlSuffix(gcc/LoongArch-Options.html#index-mrecip)
>
>  mcmodel=
> -UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel)
> +UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel_003d-1)
>
>  mdirect-extern-access
>  UrlSuffix(gcc/LoongArch-Options.html#index-mdirect-extern-access)
> diff --git a/gcc/config/nds32/nds32-elf.opt.urls 
> b/gcc/config/nds32/nds32-elf.opt.urls
> index 3ae1efe7312..e5432b62863 100644
> --- a/gcc/config/nds32/nds32-elf.opt.urls
> +++ b/gcc/config/nds32/nds32-elf.opt.urls
> @@ -1,5 +1,5 @@
>  ; Autogenerated by regenerate-opt-urls.py from 
> gcc/config/nds32/nds32-elf.opt and generated HTML
>
>  mcmodel=
> -UrlSuffix(gcc/NDS32-O

Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-31 Thread Jakub Jelinek
On Fri, May 31, 2024 at 08:45:54AM +0100, Richard Sandiford wrote:
> > When you say same way, do you mean the way SVE ABI defines the rules for 
> > SVE types?
> 
> No, sorry, I meant that if the choice isn't purely local to a source
> code function, the condition should be something like sizeless_type_p
> (suitably abstracted) rather than POLY_INT_CST_P.  That way, the "ABI"
> stays the same regardless of -msve-vector-bits.

There is no ABI, it is how the caller and indirect callee communicate,
but both parts are compiled with the same compiler, so it can choose
differently based on different compiler version etc.
It is effectively simplified:
struct whatever { ... };
void callee (void *x) { struct whatever *w = *x; use *w; }
void caller (void) { struct whatever w; fill in w; ABI_call (callee, &w); }
(plus in some cases the callee can also update values and propagate that
back to caller).
In any case, it is a similar "ABI" to e.g. tree-nested.cc communication
between caller and nested callee, how exactly are the variables laid out
in a struct depends on compiler version and whatever it decides, same
compiler then emits both sides.

Jakub



Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi!
>
> On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote:
>> Code is implemented with pure virtual functions to interface with target
>> code.
>
> It's not a pure function.  A pure function -- by definition -- has no
> side effects.  These things have side effects.
>
> What you mean is this is *an implementation* for C++ functions without
> a generic implementation.  An obfuscation some people (like me) would
> say.  But please call things what they are!  So not "pure function".
> That has a meaning, and this isn't it.

"pure virtual function" is an established term.  The "pure" modifies
"virtual", not "function".

The description is correct because the patch adds pure virtual functions
to the base class and expects the derived class to override and implement
them.
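As a generic C++ illustration of that idiom (made-up names, not the real
pair_fusion interface):

  struct pair_fusion_base
  {
    virtual ~pair_fusion_base () = default;
    /* Pure virtual hook: no generic implementation, so every
       target-specific derived class must override it.  */
    virtual bool fuseable_load_p (int insn_uid) = 0;
  };

  struct example_target_fusion : pair_fusion_base
  {
    bool fuseable_load_p (int insn_uid) override
    {
      return insn_uid != 0;  /* Placeholder target policy.  */
    }
  };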

>>  * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>>  implementation of additional virtual functions added in pair_fusion
>>  struct.
>
> This does not belong in this patch.  Do not send "rs6000" patches that
> touch anything outside of config/rs6000/ and similar, certainly not in
> config/something-else/!
>
> This would be WAY easier to review (read: AT ALL POSSIBLE) if you
> included some detailed rationale and design document.

Please don't shout.

I don't think this kind of aggressive review is helpful to the project.

Richard


Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-31 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Fri, May 31, 2024 at 08:45:54AM +0100, Richard Sandiford wrote:
>> > When you say same way, do you mean the way SVE ABI defines the rules for 
>> > SVE types?
>> 
>> No, sorry, I meant that if the choice isn't purely local to a source
>> code function, the condition should be something like sizeless_type_p
>> (suitably abstracted) rather than POLY_INT_CST_P.  That way, the "ABI"
>> stays the same regardless of -msve-vector-bits.
>
> There is no ABI, it is how the caller and indirect callee communicate,
> but both parts are compiled with the same compiler, so it can choose
> differently based on different compiler version etc.
> It is effectively simplified:
> struct whatever { ... };
> void callee (void *x) { struct whatever *w = *x; use *w; }
> void caller (void) { struct whatever w; fill in w; ABI_call (callee, &w); }
> (plus in some cases the callee can also update values and propagate that
> back to caller).
> In any case, it is a similar "ABI" to e.g. tree-nested.cc communication
> between caller and nested callee, how exactly are the variables laid out
> in a struct depends on compiler version and whatever it decides, same
> compiler then emits both sides.

Ah, ok, thanks.  In that case I guess POLY_INT_CST_P should be
safe/correct after all.

Richard


RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-05-31 Thread Hu, Lin1
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, May 29, 2024 5:41 PM
> To: Hu, Lin1 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH 1/3] vect: generate suitable convert insn for int -> int, 
> float
> -> float and int <-> float.
> 
> On Thu, 23 May 2024, Hu, Lin1 wrote:
> 
> > gcc/ChangeLog:
> >
> > PR target/107432
> > * tree-vect-generic.cc
> > (supportable_indirect_narrowing_operation): New function for
> > support indirect narrowing convert.
> > (supportable_indirect_widening_operation): New function for
> > support indirect widening convert.
> > (expand_vector_conversion): Support convert for int -> int,
> > float -> float and int <-> float.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/107432
> > * gcc.target/i386/pr107432-1.c: New test.
> > * gcc.target/i386/pr107432-2.c: Ditto.
> > * gcc.target/i386/pr107432-3.c: Ditto.
> > * gcc.target/i386/pr107432-4.c: Ditto.
> > * gcc.target/i386/pr107432-5.c: Ditto.
> > * gcc.target/i386/pr107432-6.c: Ditto.
> > * gcc.target/i386/pr107432-7.c: Ditto.
> > ---
> > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index
> > ab640096ca2..0bedb53d9f9 100644
> > --- a/gcc/tree-vect-generic.cc
> > +++ b/gcc/tree-vect-generic.cc
> > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> > #include "gimple-match.h"
> >  #include "recog.h" /* FIXME: for insn_data */
> >  #include "optabs-libfuncs.h"
> > +#include "cfgloop.h"
> > +#include "tree-vectorizer.h"
> >
> >
> >  /* Build a ternary operation and gimplify it.  Emit code before GSI.
> > @@ -1834,6 +1836,142 @@ do_vec_narrow_conversion
> (gimple_stmt_iterator *gsi, tree inner_type, tree a,
> >return gimplify_build2 (gsi, code, outer_type, b, c);  }
> >
> > +/* A subroutine of expand_vector_conversion, support indirect conversion
> for
> > +   float <-> int, like double -> char.  */ bool
> > +supportable_indirect_narrowing_operation (gimple_stmt_iterator *gsi,
> > +enum tree_code code,
> > +tree lhs,
> > +tree arg)
> > +{
> > +  gimple *g;
> > +  tree ret_type = TREE_TYPE (lhs);
> > +  tree arg_type = TREE_TYPE (arg);
> > +  tree new_rhs;
> > +
> > +  unsigned int ret_elt_bits = vector_element_bits (ret_type);
> > + unsigned int arg_elt_bits = vector_element_bits (arg_type);  if
> > + (code != FIX_TRUNC_EXPR || flag_trapping_math || ret_elt_bits >=
> arg_elt_bits)
> > +return false;
> > +
> > +  unsigned short target_size;
> > +  scalar_mode tmp_cvt_mode;
> > +  scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (ret_type));
> > + scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (arg_type));  tree
> > + cvt_type = NULL_TREE;  tmp_cvt_mode = lhs_mode;  target_size =
> > + GET_MODE_SIZE (rhs_mode);
> > +
> > +  opt_scalar_mode mode_iter;
> > +  enum tree_code tc1, tc2;
> > +  unsigned HOST_WIDE_INT nelts
> > += constant_lower_bound (TYPE_VECTOR_SUBPARTS (arg_type));
> > +
> > +  FOR_EACH_2XWIDER_MODE (mode_iter, tmp_cvt_mode)
> > +{
> > +  tmp_cvt_mode = mode_iter.require ();
> > +
> > +  if (GET_MODE_SIZE (tmp_cvt_mode) > target_size)
> > +   break;
> > +
> > +  scalar_mode cvt_mode;
> > +  int tmp_cvt_size = GET_MODE_BITSIZE (tmp_cvt_mode);
> > +  if (!int_mode_for_size (tmp_cvt_size, 0).exists (&cvt_mode))
> > +   break;
> > +
> > +  int cvt_size = GET_MODE_BITSIZE (cvt_mode);
> > +  bool isUnsigned = TYPE_UNSIGNED (ret_type) || TYPE_UNSIGNED
> (arg_type);
> > +  cvt_type = build_nonstandard_integer_type (cvt_size,
> > + isUnsigned);
> > +
> > +  cvt_type = build_vector_type (cvt_type, nelts);
> > +  if (cvt_type == NULL_TREE
> > + || !supportable_convert_operation ((tree_code) NOP_EXPR,
> > +ret_type,
> > +cvt_type, &tc1)
> > + || !supportable_convert_operation ((tree_code) code,
> > +cvt_type,
> > +arg_type, &tc2))
> > +   continue;
> > +
> > +  new_rhs = make_ssa_name (cvt_type);
> > +  g = vect_gimple_build (new_rhs, tc2, arg);
> > +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > +  g = gimple_build_assign (lhs, tc1, new_rhs);
> > +  gsi_replace (gsi, g, false);
> > +  return true;
> > +}
> > +  return false;
> > +}
> > +
> > +/* A subroutine of expand_vector_conversion, support indirect conversion
> for
> > +   float <-> int, like char -> double.  */ bool
> > +supportable_indirect_widening_operation (gimple_stmt_iterator *gsi,
> > +enum tree_code code,
> > +tree lhs,
> > +tree arg)
> > +{
> > +  gimple *g;
> > +  tree ret_type = TREE_TYPE (lhs);
> > +  tree arg_type = TREE_TYPE (arg

Re: [COMMITTED] ggc: Reduce GGC_QUIRE_SIZE on Solaris/SPARC [PR115031]

2024-05-31 Thread Rainer Orth
Hi Eric,

>> It turns out that this exhaustion of the 32-bit address space happens
>> due to a combination of three issues:
>> 
>> * the SPARC pagesize of 8 kB,
>> 
>> * ggc-page.cc's chunk size of 512 * pagesize, i.e. 4 MB, and
>> 
>> * mmap adding two 8 kB unmapped red-zone pages to each mapping
>> 
>> which result in the 4 MB mappings to actually consume 4.5 MB of address
>> space.
>> 
>> To avoid this, this patch reduces the chunk size so it remains at 4 MB
>> even when combined with the red-zone pages, as recommended by mmap(2).
>
> Nice investigation!  This size is a host parameter rather than a target one 
> though, so config/sparc/sol2.h is probably not the most appropriate place to 
> override it, but I personally do not mind.

ah, I tend to forget, not having built a cross compiler in ages.  I'll
leave it as is for the moment, though, since there's no Solaris host
header ATM.

Thanks for the hint.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 2/4] resource.cc: Replace calls to find_basic_block with cfgrtl BLOCK_FOR_INSN

2024-05-31 Thread Richard Sandiford
Hans-Peter Nilsson  writes:
> [...]
> (Not-so-)fun fact: add_insn_after takes a bb parameter which
> reorg.cc always passes as NULL.  But - the argument is
> *always ignored* and the bb in the "after" insn is used.
> I traced that ignored parameter as far as
> r0-81421-g6fb5fa3cbc0d78 "Merge dataflow branch into
> mainline" when is was added.  I *guess* it's an artifact
> left over from some idea explored on that branch.  Ripe for
> obvious cleanup by removal everywhere.

Heh.  I wondered whether there'd be some direct callers of
add_insn_after_nobb that relied on the block *not* being updated
for some reason, but thankfully not.  The only two callers seem
to be add_insn_after and emit_note_after.  But then emit_note_after
handles notes slightly differently from add_insn_after, even though
logically, emitting an existing note should work in the same way
as emitting a new note.

So yeah, like you say, ripe for cleanup :)

Richard


[COMMITTED] build: Include minor version in config.gcc unsupported message

2024-05-31 Thread Rainer Orth
It has been pointed out to me that when moving Solaris 11.3 from
config.gcc's obsolete to unsupported list, I'd forgotten to also move
the minor version info, leading to confusing

*** Configuration i386-pc-solaris2.11 not supported

instead of the correct

*** Configuration i386-pc-solaris2.11.3 not supported

This patch fixes this oversight.

Tested on i386-pc-solaris2.11 (11.3 and 11.4).  Committed to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-05-30  Rainer Orth  

gcc:
* config.gcc: Move ${target_min} from obsolete to unsupported
message.

# HG changeset patch
# Parent  a457e38f3743f8fe1336d94b6d1a5f336057b128
build: Include minor version in config.gcc unsupported message

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -276,7 +276,7 @@ case ${target} in
| nios2*-*-*\
  )
 if test "x$enable_obsolete" != xyes; then
-  echo "*** Configuration ${target}${target_min} is obsolete." >&2
+  echo "*** Configuration ${target} is obsolete." >&2
   echo "*** Specify --enable-obsolete to build it anyway." >&2
   echo "*** Support will be REMOVED in the next major release of GCC," >&2
   echo "*** unless a maintainer comes forward." >&2
@@ -328,7 +328,7 @@ case ${target}${target_min} in
  | *-*-sysv*\
  | vax-*-vms*\
  )
-	echo "*** Configuration ${target} not supported" 1>&2
+	echo "*** Configuration ${target}${target_min} not supported" 1>&2
 	exit 1
 	;;
 esac


Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello All:
>
> Common infrastructure using generic code for pair mem fusion of different
> targets.
>
> rs6000 target specific specific code implements virtual functions defined
> by generic code.
>
> Code is implemented with pure virtual functions to interface with target
> code.
>
> Target-specific code is added in rs6000-mem-fusion.cc, and additional virtual
> function implementations required for rs6000 are added in
> aarch64-ldp-fusion.cc.
>
> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
>
> aarch64, rs6000, middle-end: Add implementation for different targets for 
> pair mem fusion
>
> Common infrastructure using generic code for pair mem fusion of different
> targets.
>
> rs6000 target-specific code implements virtual functions defined
> by generic code.
>
> Code is implemented with pure virtual functions to interface with target
> code.
>
> Target-specific code is added in rs6000-mem-fusion.cc, and additional virtual
> function implementations required for rs6000 are added in
> aarch64-ldp-fusion.cc.
>
> 2024-05-31  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>   implementation of additional virtual functions added in pair_fusion
>   struct.
>   * config/rs6000/rs6000-passes.def: New mem fusion pass
>   before pass_early_remat.
>   * config/rs6000/rs6000-mem-fusion.cc: Add new pass.
>   Add target specific implementation using pure virtual
>   functions.
>   * config.gcc: Add new object file.
>   * config/rs6000/rs6000-protos.h: Add new prototype for mem
>   fusion pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * rtl-ssa/accesses.h: Moved set_is_live_out_use as public
>   from private.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/powerpc/me-fusion.C: New test.
>   * g++.target/powerpc/mem-fusion-1.C: New test.
>   * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---

This isn't a complete review, just some initial questions & comments
about selected parts.

> [...]
> +/* Check whether load can be fusable or not.
> +   Return true if dependent use is UNSPEC otherwise false.  */
> +bool
> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
> +{
> +  rtx_insn *insn = info->rtl ();
> +
> +  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
> +if (REG_NOTE_KIND (note) == REG_EQUAL
> + || REG_NOTE_KIND (note) == REG_EQUIV)
> +  return false;

It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
note.  What's the reason for doing this?  Are you trying to avoid
fusing pairs before reload that are equivalent to a MEM (i.e. have
a natural spill slot)?  I think Alex hit a similar situation.

> +
> +  for (auto def : info->defs ())
> +{
> +  auto set = dyn_cast<set_info *> (def);
> +  if (set && set->has_any_uses ())
> + {
> +   for (auto use : set->all_uses())

Nit: has_any_uses isn't necessary: the inner loop will simply do nothing
in that case.  Also, we can/should restrict the scan to non-debug uses.

This can then be:

  for (auto def : info->defs ())
    if (auto set = dyn_cast<set_info *> (def))
  for (auto use : set->nondebug_insn_uses())

> + {
> +   if (use->insn ()->is_artificial ())
> + return false;
> +
> +insn_info *info = use->insn ();
> +
> +if (info
> +&& info->rtl ()

This test shouldn't be necessary.

> +&& info->is_real ())
> +   {
> + rtx_insn *rtl_insn = info->rtl ();
> + rtx set = single_set (rtl_insn);
> +
> + if (set == NULL_RTX)
> +   return false;
> +
> + rtx op0 = SET_SRC (set);
> + if (GET_CODE (op0) != UNSPEC)
> +   return false;

What's the motivation for rejecting unspecs?  It's unsual to treat
all unspecs as a distinct group.

Also, using single_set means that the function still lets through
parallels of two sets in which the sources are unspecs.  Is that
intentional?

The reasons behind things like the REG_EQUAL/EQUIV and UNSPEC decisions
need to be described in comments, so that other people coming to this
code later can understand the motivation.  The same thing applies to
other decisions in the patch.

> +   }
> +   }
> +   }
> +}
> +  return true;
> +}
> [...]
> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> index 9f897ac04e2..2dbe9f854ef 100644
> --- a/gcc/pair-fusion.cc
> +++ b/gcc/pair-fusion.cc
> @@ -312,7 +312,7 @@ static int
>  encode_lfs (lfs_fields fields)
>  {
>int size_log2 = exact_log2 (fields.size);
> -  gcc_checking_assert (size_log2 >= 2 && size_log2 <= 4);
> +  gcc_checking_assert (size_log2 >= 2 && size_log2 <= 6);
>return ((int)fields.load_p << 3)
>  | ((int)fields.fpsimd_p << 2)
>  | (size_log2 - 2);

The point of the assert 

Re: [PATCH 6/7] OpenMP: Fortran front-end support for dispatch + adjust_args

2024-05-31 Thread Paul-Antoine Arras

Hi Tobias,

Thanks for your comments. Here is an updated patch.

On 28/05/2024 09:14, Tobias Burnus wrote:

Paul-Antoine Arras:

+  if (n->sym->ts.type != BT_DERIVED
+  || !n->sym->ts.u.derived->ts.is_iso_c)
+    {
+  gfc_error ("argument list item %qs in "
+ "% at %L must be of "
+ "TYPE(C_PTR)",
+ n->sym->name, &n->where);


I think you need to rule out 'c_funptr' as well, e.g. via:

     || (n->sym->ts.u.derived->intmod_sym_id
     != ISOCBINDING_PTR)))

I do note that in openmp.cc, we have one check which checks explicitly 
for c_ptr and one existing one which only checks for (c_ptr or 
c_funptr); can you fix that one as well?


This is now handled in the new patch.

But I mainly miss an update to 'module.cc' for the 'declare variant' 
change; the 'adjust_args' (for 'need_device_ptr' only) list items have
to be saved in the .mod file - otherwise the following will not work:

-aux.f90
! { dg-do compile { target skip-all-targets } }
module my_mod
   ...
   !$omp declare variant ... adjust_args(need_device_ptr: ...)
   ...
end module

.f90
{ dg-do ...
! { dg-additional-sources -aux.f90 }
   ...
   call 
   ...
   !$omp dispatch
    call 
end


I added a new testcase along those lines. However, I had to xfail it due 
to completely missing support for declare variant (even without 
adjust_args) in module.cc. For reference, Tobias created this PR: 
https://gcc.gnu.org/PR115271.

--
PA
commit ab1b93e3e6e7cb9b5a7419b7106ea0324699
Author: Paul-Antoine Arras 
Date:   Fri May 24 19:13:50 2024 +0200

OpenMP: Fortran front-end support for dispatch + adjust_args

This patch adds support for the `dispatch` construct and the `adjust_args`
clause to the Fortran front-end.

Handling of `adjust_args` across translation units is missing due to PR115271.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_clauses): Handle novariants and nocontext
clauses.
(show_omp_node): Handle EXEC_OMP_DISPATCH.
(show_code_node): Likewise.
* frontend-passes.cc (gfc_code_walker): Handle novariants and nocontext.
* gfortran.h (enum gfc_statement): Add ST_OMP_DISPATCH.
(symbol_attribute): Add omp_declare_variant_need_device_ptr.
(gfc_omp_clauses): Add novariants and nocontext.
(gfc_omp_declare_variant): Add need_device_ptr_arg_list.
(enum gfc_exec_op): Add EXEC_OMP_DISPATCH.
* match.h (gfc_match_omp_dispatch): Declare.
* openmp.cc (gfc_free_omp_clauses): Free novariants and nocontext
clauses.
(gfc_free_omp_declare_variant_list): Free need_device_ptr_arg_list
namelist.
(enum omp_mask2): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(OMP_DISPATCH_CLAUSES): Define.
(gfc_match_omp_dispatch): New function.
(gfc_match_omp_declare_variant): Parse adjust_args.
(resolve_omp_clauses): Handle adjust_args, novariants and nocontext.
Adjust handling of OMP_LIST_IS_DEVICE_PTR.
(icode_code_error_callback): Handle EXEC_OMP_DISPATCH.
(omp_code_to_statement): Likewise.
(resolve_omp_dispatch): New function.
(gfc_resolve_omp_directive): Handle EXEC_OMP_DISPATCH.
* parse.cc (decode_omp_directive): Match dispatch.
(next_statement): Handle ST_OMP_DISPATCH.
(gfc_ascii_statement): Likewise.
(parse_omp_dispatch): New function.
(parse_executable): Handle ST_OMP_DISPATCH.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_DISPATCH.
* st.cc (gfc_free_statement): Likewise.
* trans-decl.cc (create_function_arglist): Declare.
(gfc_get_extern_function_decl): Call it.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle novariants and
nocontext.
(gfc_trans_omp_dispatch): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_DISPATCH.
(gfc_trans_omp_declare_variant): Handle adjust_args.
* trans.cc (trans_code): Handle EXEC_OMP_DISPATCH:.
* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/declare-variant-2.f90: Update dg-error.
* gfortran.dg/gomp/declare-variant-21.f90: New test (xfail).
* gfortran.dg/gomp/declare-variant-21-aux.f90: New test.
* gfortran.dg/gomp/adjust-args-1.f90: New test.
* gfortran.dg/gomp/adjust-args-2.f90: New test.
* gfortran.dg/gomp/adjust-args-3.f90: New test.
* gfortran.dg/gomp/adjust-args-4.f90: New test.
* gfortran.dg/gomp/adjust-

[COMMITTED] fix: valid compiler optimization may fail the test

2024-05-31 Thread Marc Poulhiès
cxa4001 may fail with "Exception not raised" when the compiler omits the
calls to To_Mapping, in accordance with 10.2.1(18/3):

  "If a library unit is declared pure, then the implementation is
  permitted to omit a call on a library-level subprogram of the library
  unit if the results are not needed after the call"

Using the result of both To_Mapping calls prevents the compiler from
omitting them.

"The corrected test will be available on the ACAA web site
(http://www.ada-auth.org/), and will be issued with the Modified Tests List
version 2.6K, 3.1DD, and 4.1GG."

gcc/testsuite/ChangeLog:

* ada/acats/tests/cxa/cxa4001.a: Use function result.
---
Tested on x86_64-linux-gnu, committed on master.

 gcc/testsuite/ada/acats/tests/cxa/cxa4001.a | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/testsuite/ada/acats/tests/cxa/cxa4001.a 
b/gcc/testsuite/ada/acats/tests/cxa/cxa4001.a
index d850acd4a72..52fabc3d514 100644
--- a/gcc/testsuite/ada/acats/tests/cxa/cxa4001.a
+++ b/gcc/testsuite/ada/acats/tests/cxa/cxa4001.a
@@ -185,6 +185,12 @@ begin
   begin
  Bad_Map := Maps.To_Mapping(From => "aa", To => "yz");
  Report.Failed("Exception not raised with repeated character");
+
+ if Report.Equal (Character'Pos('y'),
+  Character'Pos(Maps.Value(Bad_Map, 'a'))) then
+-- Use the map to avoid optimization.
+Report.Comment ("Shouldn't get here.");
+ end if;
   exception
  when Translation_Error => null;  -- OK
  when others=> 
@@ -200,6 +206,12 @@ begin
   begin
  Bad_Map := Maps.To_Mapping("abc", "yz");
  Report.Failed("Exception not raised with unequal parameter lengths");
+
+ if Report.Equal (Character'Pos('y'),
+  Character'Pos(Maps.Value(Bad_Map, 'a'))) then
+-- Use the map to avoid optimization.
+Report.Comment ("Shouldn't get here.");
+ end if;
   exception
  when Translation_Error => null;  -- OK
  when others=> 
-- 
2.45.1



Re: [PATCH] rust: Do not link with libdl and libpthread unconditionally

2024-05-31 Thread Arthur Cohen

Hi Richard,

On 4/30/24 09:55, Richard Biener wrote:

On Fri, Apr 19, 2024 at 11:49 AM Arthur Cohen  wrote:


Hi everyone,

This patch checks for the presence of dlopen and pthread_create in libc. If 
that is not the
case, we check for the existence of -ldl and -lpthread, as these libraries are 
required to
link the Rust runtime to our Rust frontend.

If these libs are not present on the system, then we disable the Rust frontend.

This was tested on x86_64, in an environment with a recent GLIBC and in a 
container with GLIBC
2.27.

Apologies for sending it in so late.


For example GCC_ENABLE_PLUGINS simply does

  # Check -ldl
  saved_LIBS="$LIBS"
  AC_SEARCH_LIBS([dlopen], [dl])
  if test x"$ac_cv_search_dlopen" = x"-ldl"; then
pluginlibs="$pluginlibs -ldl"
  fi
  LIBS="$saved_LIBS"

which I guess would also work for pthread_create?  This would simplify
the code a bit.


Thanks a lot for the review. I've updated the patch's content in 
configure.ac per your suggestion. Tested similarly on x86_64 and in a 
container with libc 2.27.


From 00669b600a75743523c358ee41ab999b6e9fa0f6 Mon Sep 17 00:00:00 2001
From: Arthur Cohen 
Date: Fri, 12 Apr 2024 13:52:18 +0200
Subject: [PATCH] rust: Do not link with libdl and libpthread unconditionally

ChangeLog:

* Makefile.tpl: Add CRAB1_LIBS variable.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check if -ldl and -lpthread are needed, and if so, add
them to CRAB1_LIBS.

gcc/rust/ChangeLog:

* Make-lang.in: Remove overazealous LIBS = -ldl -lpthread line, link
crab1 against CRAB1_LIBS.
---
 Makefile.in   |   3 +
 Makefile.tpl  |   3 +
 configure | 154 ++
 configure.ac  |  41 +++
 gcc/rust/Make-lang.in |   6 +-
 5 files changed, 203 insertions(+), 4 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index edb0c8a9a42..1753fb6b862 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -197,6 +197,7 @@ HOST_EXPORTS = \
$(BASE_EXPORTS) \
CC="$(CC)"; export CC; \
ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
+   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
CFLAGS="$(CFLAGS)"; export CFLAGS; \
CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
CXX="$(CXX)"; export CXX; \
@@ -450,6 +451,8 @@ GOCFLAGS = $(CFLAGS)
 GDCFLAGS = @GDCFLAGS@
 GM2FLAGS = $(CFLAGS)

+CRAB1_LIBS = @CRAB1_LIBS@
+
 PKG_CONFIG_PATH = @PKG_CONFIG_PATH@

 GUILE = guile
diff --git a/Makefile.tpl b/Makefile.tpl
index adbcbdd1d57..4aeaad3c1a5 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -200,6 +200,7 @@ HOST_EXPORTS = \
$(BASE_EXPORTS) \
CC="$(CC)"; export CC; \
ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
+   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
CFLAGS="$(CFLAGS)"; export CFLAGS; \
CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
CXX="$(CXX)"; export CXX; \
@@ -453,6 +454,8 @@ GOCFLAGS = $(CFLAGS)
 GDCFLAGS = @GDCFLAGS@
 GM2FLAGS = $(CFLAGS)

+CRAB1_LIBS = @CRAB1_LIBS@
+
 PKG_CONFIG_PATH = @PKG_CONFIG_PATH@

 GUILE = guile
diff --git a/configure b/configure
index 02b435c1163..a9ea5258f0f 100755
--- a/configure
+++ b/configure
@@ -690,6 +690,7 @@ extra_host_zlib_configure_flags
 extra_host_libiberty_configure_flags
 stage1_languages
 host_libs_picflag
+CRAB1_LIBS
 PICFLAG
 host_shared
 gcc_host_pie
@@ -8826,6 +8827,139 @@ fi



+# Rust requires -ldl and -lpthread if you are using an old glibc that 
does not include them by

+# default, so we check for them here
+
+missing_rust_dynlibs=none
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library 
containing dlopen" >&5

+$as_echo_n "checking for library containing dlopen... " >&6; }
+if ${ac_cv_search_dlopen+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_func_search_save_LIBS=$LIBS
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char dlopen ();
+int
+main ()
+{
+return dlopen ();
+  ;
+  return 0;
+}
+_ACEOF
+for ac_lib in '' dl; do
+  if test -z "$ac_lib"; then
+ac_res="none required"
+  else
+ac_res=-l$ac_lib
+LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
+  fi
+  if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_search_dlopen=$ac_res
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext
+  if ${ac_cv_search_dlopen+:} false; then :
+  break
+fi
+done
+if ${ac_cv_search_dlopen+:} false; then :
+
+else
+  ac_cv_search_dlopen=no
+fi
+rm conftest.$ac_ext
+LIBS=$ac_func_search_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_dlopen" >&5
+$as_echo "$ac_cv_search_dlopen" >&6; }
+ac_res=$ac_cv_search_dlopen
+if test "$ac_res" != no; the

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Reviewing my review :)

Richard Sandiford  writes:
>> +
>> +  for (auto def : info->defs ())
>> +{
>> +  auto set = dyn_cast (def);
>> +  if (set && set->has_any_uses ())
>> +{
>> +  for (auto use : set->all_uses())
>
> Nit: has_any_uses isn't necessary: the inner loop will simply do nothing
> in that case.  Also, we can/should restrict the scan to non-debug uses.
>
> This can then be:
>
>   for (auto def : info->defs ())
> if (auto set = dyn_cast (def))
>   for (auto use : set->nondebug_insn_uses())

I forgot the space before "()" in the line above.

>
>> +{
>> +  if (use->insn ()->is_artificial ())
>> +return false;
>> +
>> +   insn_info *info = use->insn ();
>> +
>> +   if (info
>> +   && info->rtl ()
>
> This test shouldn't be necessary.
>
>> +   && info->is_real ())
>> +  {
>> +rtx_insn *rtl_insn = info->rtl ();
>> +rtx set = single_set (rtl_insn);
>> +
>> +if (set == NULL_RTX)
>> +  return false;
>> +
>> +rtx op0 = SET_SRC (set);
>> +if (GET_CODE (op0) != UNSPEC)
>> +  return false;
> [...]
> Also, using single_set means that the function still lets through
> parallels of two sets in which the sources are unspecs.  Is that
> intentional?

I got this wrong, sorry.  You return false for non-single_set,
so that particular problem doesn't arise.  But why do we want to
reject uses of registers that are set by parallel sets?

Thanks,
Richard


[PATCH] aarch64: Add missing ACLE macro for NEON-SVE Bridge

2024-05-31 Thread Richard Ball
__ARM_NEON_SVE_BRIDGE was missed in the original patch and is
added by this patch.
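
For context, user code typically keys inclusion of the bridge intrinsics
on this macro.  A minimal sketch (not from the patch; header and
intrinsic names are taken from the ACLE and assumed to be available):

  #if defined (__ARM_NEON_SVE_BRIDGE)
  #include <arm_neon.h>
  #include <arm_sve.h>
  #include <arm_neon_sve_bridge.h>

  svint32_t
  neon_to_sve (int32x4_t v)
  {
    /* svset_neonq_s32 is one of the ACLE NEON-SVE bridge intrinsics.  */
    return svset_neonq_s32 (svundef_s32 (), v);
  }
  #endif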

Ok for trunk and a backport into gcc-14?

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Add missing __ARM_NEON_SVE_BRIDGE.diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 
fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..1121be118cf8d05e3736ad4ee75568ff7cb92bfd
 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -260,6 +260,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
   aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
   aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
+  aarch64_def_or_undef (TARGET_SVE, "__ARM_NEON_SVE_BRIDGE", pfile);
 
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A


[PATCH v4 0/5] libgomp: OpenMP pinned memory for omp_alloc

2024-05-31 Thread Andrew Stubbs
This patch series is a rebase and partial rework of the v3 series I
posted in December:

https://patchwork.sourceware.org/project/gcc/list/?series=28237&state=%2A&archive=both

The first patch from that series was already approved and committed, but
the rest of the patch series remains to-do.

Besides rebase and retest, I've addressed the review comments regarding
the enum assignments.

OK for mainline?

Andrew

Andrew Stubbs (4):
  libgomp, openmp: Add ompx_pinned_mem_alloc
  openmp: Add -foffload-memory
  openmp: -foffload-memory=pinned
  libgomp: fine-grained pinned memory allocator

Thomas Schwinge (1):
  libgomp, nvptx: Cuda pinned memory

 gcc/common.opt|  16 +
 gcc/coretypes.h   |   7 +
 gcc/doc/invoke.texi   |  16 +-
 gcc/omp-builtins.def  |   3 +
 gcc/omp-low.cc|  66 
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   7 +-
 libgomp/allocator.c   | 115 +--
 libgomp/config/linux/allocator.c  | 206 +--
 libgomp/libgomp-plugin.h  |   2 +
 libgomp/libgomp.h |  14 +
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp.texi  |  18 +-
 libgomp/libgomp_g.h   |   1 +
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/plugin/plugin-nvptx.c |  42 +++
 libgomp/target.c  | 136 
 .../libgomp.c-c++-common/alloc-pinned-1.c |  28 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c  |  26 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c  |  26 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c  |  45 ++-
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c  |  44 ++-
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 129 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 128 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  63 
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c  | 127 +++
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +
 libgomp/usmpin-allocator.c| 319 ++
 29 files changed, 1520 insertions(+), 86 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
 create mode 100644 libgomp/usmpin-allocator.c

-- 
2.41.0



[PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-05-31 Thread Andrew Stubbs
Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators has been moved to start at "100".

-

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.
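
As a concrete illustration of that equivalence, a small sketch using the
standard allocator-traits API (only ompx_pinned_mem_alloc itself is new
in this patch; everything else below is the existing OpenMP 5.x API):

  #include <stddef.h>
  #include <omp.h>

  void
  pinned_alloc_demo (size_t size)
  {
    omp_alloctrait_t traits[] = {
      { omp_atk_pinned,   omp_atv_true },
      { omp_atk_fallback, omp_atv_null_fb }
    };
    omp_allocator_handle_t a
      = omp_init_allocator (omp_default_mem_space, 2, traits);

    /* Behaves like omp_alloc (size, ompx_pinned_mem_alloc).  */
    void *p = omp_alloc (size, a);
    if (p)
      omp_free (p, a);
    omp_destroy_allocator (a);
  }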

libgomp/ChangeLog:

* allocator.c (ompx_min_predefined_alloc): New.
(ompx_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_alloc_mapping): New.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New (as a function).
(omp_aligned_alloc): Support ompx_pinned_mem_alloc. Use
predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* libgomp.texi: Document ompx_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c   | 115 +-
 libgomp/libgomp.texi  |   7 +-
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 103 
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 +++
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
 7 files changed, 312 insertions(+), 33 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)
 
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
+#define ompx_min_predefined_alloc ompx_pinned_mem_alloc
+#define ompx_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.
The defaults (no override) are to return NULL for pinned memory requests
@@ -131,7 +133,7 @@ GOMP_is_alloc (void *ptr)
The index to this table is the omp_allocator_handle_t enum value.
When the user calls omp_alloc with a predefined allocator this
table determines what memory they get.  */
-static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+static const omp_memspace_handle_t predefined_omp_alloc_mapping[] = {
   omp_default_mem_space,   /* omp_null_allocator doesn't actually use this. */
   omp_default_mem_space,   /* omp_default_mem_alloc. */
   omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
@@ -142,11 +144,41 @@ static const omp_memspace_handle_t 
predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc (implementation defined). */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc (implementation defined). */
 };
+static const omp_memspace_handle_t predefined_ompx_alloc_mapping[] = {
+  omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
+};
 
 #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
== omp_max_predefined_alloc + 1,
-   "predefined_alloc_mapping must match omp_memspace_handle_t");
+   "predefined_omp_alloc_mapping must match 
omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
+_Static_assert (ARRAY_SIZE (predefined_ompx_alloc_mapping)
+   == ompx_max_predefined_alloc - ompx_min_predefined_alloc + 1,
+   "predefined_ompx_alloc_mapping must match"
+   " omp_memspace_handle_t");
+
+static inline bool
+predefined_allocator_p (omp_allocator_handle_t allocator)
+{
+  return allocator <= ompx_max_predefined_alloc;
+}
+
+static inline omp_memspace_handle_t
+predefined_alloc_mapping (omp_allocator_handle_t allocator)
+{
+  if (allocator <= omp_max_predefined_alloc)
+return predefined_omp_alloc_mapping[allocator];
+  else if (allocator >= ompx_min_predefined_alloc
+  && allocator <= ompx_max_predefined_alloc)
+{
+  int index = allocator - ompx_min_predefined_alloc;
+  return predefined_ompx_alloc_mapping[index];
+}
+  else
+/* This should never happen.  */
+return omp_default_mem_space;
+}
 
 enum gomp_numa_memkind_kind
 {
@@ -556,7 +588,7 @@ retry:
   

[PATCH v4 4/5] libgomp, nvptx: Cuda pinned memory

2024-05-31 Thread Andrew Stubbs
From: Thomas Schwinge 

This patch was already approved, by Tobias Burnus (with one caveat about
initialization location), but wasn't committed at that time as I didn't
want to disentangle it from the textual dependencies on the other
patches in the series.



Use Cuda to pin memory, instead of Linux mlock, when available.

There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.

The design adds a device independent plugin API for allocating pinned memory,
and then implements it for NVPTX.  At present, the other supported devices do
not have equivalent capabilities (or requirements).
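
For illustration only, the sort of thing the NVPTX side can map the new
hooks onto (the real prototypes are the ones added to libgomp-plugin.h;
the signatures below are an assumption, not the patch's API):

  #include <stdbool.h>
  #include <stddef.h>
  #include <cuda.h>

  /* Assumed shape of the new plugin hooks (sketch only).  */
  bool
  sketch_page_locked_host_alloc (void **ptr, size_t size)
  {
    return cuMemHostAlloc (ptr, size, 0) == CUDA_SUCCESS;
  }

  bool
  sketch_page_locked_host_free (void *ptr)
  {
    return cuMemFreeHost (ptr) == CUDA_SUCCESS;
  }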

libgomp/ChangeLog:

* config/linux/allocator.c: Include assert.h.
(using_device_for_page_locked): New variable.
(linux_memspace_alloc): Add init0 parameter. Support device pinning.
(linux_memspace_calloc): Set init0 to true.
(linux_memspace_free): Support device pinning.
(linux_memspace_realloc): Support device pinning.
(MEMSPACE_ALLOC): Set init0 to false.
* libgomp-plugin.h
(GOMP_OFFLOAD_page_locked_host_alloc): New prototype.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* libgomp.h (gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(struct gomp_device_descr): Add page_locked_host_alloc_func and
page_locked_host_free_func.
* libgomp.texi: Adjust the docs for the pinned trait.
* libgomp_g.h (GOMP_enable_pinned_mode): New prototype.
* plugin/plugin-nvptx.c
(GOMP_OFFLOAD_page_locked_host_alloc): New function.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* target.c (device_for_page_locked): New variable.
(get_device_for_page_locked): New function.
(gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(gomp_load_plugin_for_device): Add page_locked_host_alloc and
page_locked_host_free.
* testsuite/libgomp.c/alloc-pinned-1.c: Change expectations for NVPTX
devices.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/config/linux/allocator.c | 141 ++-
 libgomp/libgomp-plugin.h |   2 +
 libgomp/libgomp.h|   4 +
 libgomp/libgomp.texi |  11 +-
 libgomp/libgomp_g.h  |   1 +
 libgomp/plugin/plugin-nvptx.c|  42 ++
 libgomp/target.c | 136 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c |  45 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c |  44 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c |  35 -
 13 files changed, 489 insertions(+), 50 deletions(-)

diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 7e09ba44b2f..063c46f972c 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -36,6 +36,11 @@
 
 /* Implement malloc routines that can handle pinned memory on Linux.

+   Given that pinned memory is typically used to help host <-> device memory
+   transfers, we attempt to allocate such memory using a device (really:
+   libgomp plugin), but fall back to mmap plus mlock if no suitable device is
+   available.
+
It's possible to use mlock on any heap memory, but using munlock is
problematic if there are multiple pinned allocations on the same page.
Tracking all that manually would be possible, but adds overhead. This may
@@ -49,6 +54,7 @@
 #define _GNU_SOURCE
 #include 
 #include 
+#include 
 #include "libgomp.h"
 #ifdef HAVE_INTTYPES_H
 # include   /* For PRIu64.  */
@@ -68,50 +74,92 @@ GOMP_enable_pinned_mode ()
 always_pinned_mode = true;
 }
 
+static int using_device_for_page_locked
+  = /* uninitialized */ -1;
+
 static void *
-linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
+linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin,
+ bool init0)
 {
-  (void)memspace;
+  gomp_debug (0, "%s: memspace=%llu, size=%llu, pin=%d, init0=%d\n",
+ __FUNCTION__, (unsigned long long) memspace,
+ (unsigned long long) size, pin, init0);
+
+  void *addr;
 
   /* Explicit pinning may not be required.  */
   pin = pin && !always_pinned_mode;
 
   if (pin)
 {
-  /* Note that mmap always returns zeroed memory and is therefore als

[PATCH v4 3/5] openmp: -foffload-memory=pinned

2024-05-31 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning.  It requires that the ulimit feature is
set high enough to accommodate all the program's memory usage.

In this mode the ompx_pinned_memory_alloc feature is disabled as it is not
needed and may conflict.
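
Roughly, the runtime side amounts to the following (sketch only; the
real code is the GOMP_enable_pinned_mode handling in
libgomp/config/linux/allocator.c):

  #include <sys/mman.h>

  static void
  enable_pinned_mode_sketch (void)
  {
    /* Lock all current and future pages; needs RLIMIT_MEMLOCK to be
       large enough for the whole program.  */
    if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
      ; /* Error handling elided in this sketch.  */
  }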

gcc/ChangeLog:

* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

* config/linux/allocator.c (always_pinned_mode): New variable.
(GOMP_enable_pinned_mode): New function.
(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.map: Add GOMP_enable_pinned_mode.
* testsuite/libgomp.c/alloc-pinned-7.c: New test.
* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: New test.
---
 gcc/omp-builtins.def  |  3 +
 gcc/omp-low.cc| 66 +++
 libgomp/config/linux/allocator.c  | 26 
 libgomp/libgomp.map   |  1 +
 .../libgomp.c-c++-common/alloc-pinned-1.c | 28 
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 63 ++
 6 files changed, 187 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 044d5d087b6..aefc52e5f9f 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -476,3 +476,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning",
  BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
  BT_FN_VOID_CONST_PTR_SIZE, 
ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+ "GOMP_enable_pinned_mode",
+ BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 4d003f42098..cf3f57748d8 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14596,6 +14596,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+return;
+  visited = true;
+
+  /* Create a new function like this:
+ 
+   static void __attribute__((constructor))
+   __set_pinned_mode ()
+   {
+ GOMP_enable_pinned_mode ();
+   }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+ NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+  void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14652,6 +14714,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
 finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index de

[PATCH v4 2/5] openmp: Add -foffload-memory

2024-05-31 Thread Andrew Stubbs
Add a new option.  It's inactive until I add some follow-up patches.

gcc/ChangeLog:

* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt  | 16 
 gcc/coretypes.h |  7 +++
 gcc/doc/invoke.texi | 16 +++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 2c078fdd1f8..e874e88d3e1 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2349,6 +2349,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) 
Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned] Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload 
memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 1ac6f0abea3..938cfa93753 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -218,6 +218,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 45115b5fbed..eb0f8b4a58d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted
 -flax-vector-conversions  -fms-extensions
--foffload=@var{arg}  -foffload-options=@var{arg}
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} 
 -fopenacc  -fopenacc-dim=@var{geom}
 -fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone@r{[}=@var{device-type}@r{]}
 -fpermitted-flt-eval-methods=@var{standard}
@@ -2786,6 +2786,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906
 @end smallexample
 
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @opindex fopenacc
 @cindex OpenACC accelerator programming
 @item -fopenacc
-- 
2.41.0



[PATCH v4 5/5] libgomp: fine-grained pinned memory allocator

2024-05-31 Thread Andrew Stubbs
This patch was already approved, by Tobias Burnus, in the v3 posting,
but I've not yet committed it because there are some textual dependencies
on the yet-to-be-approved patches.

-

This patch introduces a new custom memory allocator for use with pinned
memory (in the case where the Cuda allocator isn't available).  In future,
this allocator will also be used for Unified Shared Memory.  Both memories
are incompatible with the system malloc because allocated memory cannot
share a page with memory allocated for other purposes.

This means that small allocations will no longer consume an entire page of
pinned memory.  Unfortunately, it also means that pinned memory pages will
never be unmapped (although they may be reused).

The implementation is not perfect; there are various corner cases (especially
related to extending onto new pages) where allocations and reallocations may
be sub-optimal, but it should still be a step forward in support for small
allocations.
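
To make the idea concrete, a deliberately tiny first-fit sketch (no
locking, no chunk splitting; names and layout are illustrative only, the
real implementation is usmpin-allocator.c):

  #define _GNU_SOURCE
  #include <stddef.h>
  #include <sys/mman.h>

  struct chunk { size_t size; struct chunk *next; };  /* free-list node */
  static struct chunk *free_list;

  void *
  usmpin_alloc_sketch (size_t size)
  {
    /* Room for the header, rounded for alignment.  */
    size = (size + sizeof (struct chunk) + 15) & ~(size_t) 15;
    for (struct chunk **cp = &free_list; *cp; cp = &(*cp)->next)
      if ((*cp)->size >= size)
        {
          struct chunk *c = *cp;
          *cp = c->next;              /* first fit: reuse a pinned chunk */
          return c + 1;
        }
    /* No fit: pin a fresh region; it is never unmapped, only recycled.  */
    size_t bytes = size > 4096 ? size : 4096;
    struct chunk *c = mmap (NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
    if (c == MAP_FAILED)
      return NULL;
    c->size = bytes;
    return c + 1;
  }

  void
  usmpin_free_sketch (void *p)
  {
    struct chunk *c = (struct chunk *) p - 1;
    c->next = free_list;              /* stays pinned, available for reuse */
    free_list = c;
  }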

I have considered using libmemkind's "fixed" memory but rejected it for three
reasons: 1) libmemkind may not always be present at runtime, 2) there's no
currently documented means to extend a "fixed" kind one page at a time
(although the code appears to have an undocumented function that may do the
job, and/or extending libmemkind to support the MAP_LOCKED mmap flag with its
regular kinds would be straight-forward), 3) USM benefits from having the
metadata located in different memory and using an external implementation makes
it hard to guarantee this.

libgomp/ChangeLog:

* Makefile.am (libgomp_la_SOURCES): Add usmpin-allocator.c.
* Makefile.in: Regenerate.
* config/linux/allocator.c: Include unistd.h.
(pin_ctx): New variable.
(ctxlock): New variable.
(linux_init_pin_ctx): New function.
(linux_memspace_alloc): Use usmpin-allocator for pinned memory.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.h (usmpin_init_context): New prototype.
(usmpin_register_memory): New prototype.
(usmpin_alloc): New prototype.
(usmpin_free): New prototype.
(usmpin_realloc): New prototype.
* testsuite/libgomp.c/alloc-pinned-1.c: Adjust for new behaviour.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-8.c: New test.
* usmpin-allocator.c: New file.
---
 libgomp/Makefile.am  |   2 +-
 libgomp/Makefile.in  |   7 +-
 libgomp/config/linux/allocator.c |  97 --
 libgomp/libgomp.h|  10 +
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c | 127 
 libgomp/usmpin-allocator.c   | 319 +++
 6 files changed, 527 insertions(+), 35 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/usmpin-allocator.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 855f0affddf..73c21699332 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -70,7 +70,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-   oacc-target.c target-indirect.c
+   oacc-target.c target-indirect.c usmpin-allocator.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index da902f3daca..b74e39a1c2a 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
-   oacc-target.lo target-indirect.lo $(am__objects_1)
+   oacc-target.lo target-indirect.lo usmpin-allocator.lo \
+   $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -552,7 +553,8 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-   oacc-target.c target-indirect.c $(am__append_3)
+   oacc-target.c target-indirect.c usmpin-allocator.c \
+   $(am__append_3)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -786,6 +788,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
 @A

[PATCH] tree-optimization/115278 - fix DSE in if-conversion wrt volatiles

2024-05-31 Thread Richard Biener
The following adds the missing guard for volatile stores to the
embedded DSE in the loop if-conversion pass.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115278
* tree-if-conv.cc (ifcvt_local_dce): Do not DSE volatile stores.

* g++.dg/vect/pr115278.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr115278.cc | 38 +++
 gcc/tree-if-conv.cc   |  4 ++-
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr115278.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr115278.cc 
b/gcc/testsuite/g++.dg/vect/pr115278.cc
new file mode 100644
index 000..331075fb278
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr115278.cc
@@ -0,0 +1,38 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-additional-options "-fdump-tree-optimized" }
+
+#include 
+
+const int runs = 92;
+
+union BitfieldStructUnion {
+struct {
+uint64_t a : 17;
+uint64_t padding: 39;
+uint64_t b : 8;
+} __attribute__((packed));
+
+struct {
+uint32_t value_low;
+uint32_t value_high;
+} __attribute__((packed));
+
+BitfieldStructUnion(uint32_t value_low, uint32_t value_high) : 
value_low(value_low), value_high(value_high) {}
+};
+
+volatile uint32_t *WRITE = (volatile unsigned*)0x42;
+
+void buggy() {
+for (int i = 0; i < runs; i++) {
+BitfieldStructUnion rt{*WRITE, *WRITE};
+
+rt.a = 99;
+rt.b = 1;
+
+*WRITE = rt.value_low;
+*WRITE = rt.value_high;
+}
+}
+
+// { dg-final { scan-tree-dump-times "\\\*WRITE\[^\r\n\]* ={v} " 2 "optimized" 
} }
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 09d99fb9dda..c4c3ed41a44 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3381,7 +3381,9 @@ ifcvt_local_dce (class loop *loop)
   gimple_stmt_iterator gsiprev = gsi;
   gsi_prev (&gsiprev);
   stmt = gsi_stmt (gsi);
-  if (gimple_store_p (stmt) && gimple_vdef (stmt))
+  if (!gimple_has_volatile_ops (stmt)
+ && gimple_store_p (stmt)
+ && gimple_vdef (stmt))
{
  tree lhs = gimple_get_lhs (stmt);
  ao_ref write;
-- 
2.35.3


Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-05-31 Thread Tobias Burnus

Hi Sandra,

some observations/comments, but in general it looks good.

Sandra Loosemore wrote:

This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.

This patch also adds compile-time support for dynamic context
selectors (the target_device selector set and the condition selector
of the user selector set) for metadirectives only.  The "declare
variant" directive still supports only static selectors.

...

  /* Return 1 if context selector matches the current OpenMP context, 0
 if it does not and -1 if it is unknown and need to be determined later.
 Some properties can be checked right away during parsing (this routine),
 others need to wait until the whole TU is parsed, others need to wait until
-   IPA, others until vectorization.  */
+   IPA, others until vectorization.
+
+   METADIRECTIVE_P is true if this is a metadirective context, and DELAY_P
+   is true if it's too early in compilation to determine whether some
+   properties match.
+
+   Dynamic properties (which are evaluated at run-time) should always
+   return 1.  */

I have to admit that I don't really see the use of metadirective_p as …

  int
-omp_context_selector_matches (tree ctx)
+omp_context_selector_matches (tree ctx, bool metadirective_p, bool delay_p)

...

+   if (metadirective_p && delay_p)
+ return -1;


I do see why the resolution of KIND/ARCH/ISA should be delayed – for 
both variant/metadirective as long as the code is run by the host and 
the device. Except that we could exclude, e.g., 'kind(FPGA)' early on as 
we don't support it at all.


But once the device code is split off, I don't see why we can't expand 
the DEVICE clause right away for both variant and metadirective – while 
for 'target_device', we cannot do much until runtime – except for 
excluding things like 'kind(fpga)', or excluding all 'arch' known not 
to be supported either by the host or by any enabled offload devices.


Thus, I see why there is a 'delay_p', but not why there is a 
'metadirective_p'.


But I might have missed something important ...


 case OMP_TRAIT_USER_CONDITION:
   if (set == OMP_TRAIT_SET_USER)
 for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
   if (OMP_TP_NAME (p) == NULL_TREE)
 {
+ /* OpenMP 5.1 allows non-constant conditions for
+metadirectives.  */
+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
   if (integer_zerop (OMP_TP_VALUE (p)))
 return 0;
   if (integer_nonzerop (OMP_TP_VALUE (p)))
 break;
   ret = -1;
 }


(BTW: I am happy to be enlightened as I have likely missed some fine print.)

Regarding the comment: True, but shouldn't this be handled before by 
issuing an error when such a clause is used in 'declare variant', i.e. 
only occur when metadirective_p is/can be true?


Besides, I have to admit that I do not understand the new code. The 
current code has: constant zero → whole selector known to be false 
("return 0"); nonzero constant → keep current state, i.e. either 'true' 
(1) or don't known ('-1') and continue; otherwise (not const) → set to 
"don't know" (-1) and continue with the next item.


That also seems to make sense for metadirectives. But your patch changes 
this to keep current state if a variable. In that case, '1' is used if 
this is the only item or the previous condition is true. Or "-1" when 
the previous item is "don't know" (-1). - I think that doesn't make 
sense and it should always return -1 for a run time value.
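
For reference, this is about conditions that only become known at run
time, e.g. (hedged example, not taken from the patch):

  extern void do_item (int i);

  void
  work (int n)
  {
    /* "n > 1024" is not an INTEGER_CST, so the selector cannot be resolved
       statically; matching has to be deferred and decided at run time.  */
  #pragma omp metadirective \
        when (user = {condition (n > 1024)} : parallel for) \
        default (simd)
    for (int i = 0; i < n; i++)
      do_item (i);
  }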


Additionally, I wonder why you use tree_fits_shwi_p instead of a simple 
'TREE_CODE (OMP_TP_VALUE (p)) != INTEGER_CST'. It does not seem to 
matter here, but '(uint128_t)-1' looks like a valid condition and a valid 
constant, which integer_nonzerop should handle, but if the HWI is 128 bits 
wide, it won't fit into a signed variable.


(As integer_nonzerop and the current code both do "break;" it won't 
change the result of the current code.)


* * *

+static tree
+omp_dynamic_cond (tree ctx)
+{

...

+  /* The user condition is not dynamic if it is constant.  */
+  if (!tree_fits_shwi_p (TREE_VALUE (expr_list)))


Any reason for using tree_fits_shwi_p instead of INTEGER_CST? Here, 
(uint128_t)-1 could make a difference …



+   /* omp_initial_device is -1, omp_invalid_device is -4; choose
+  a value that isn't otherwise defined to indicate the default
+  device.  */
+   device_num = build_int_cst (integer_type_node, -2);


Don't do this - we do it differently fo

RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-05-31 Thread Richard Biener
On Fri, 31 May 2024, Hu, Lin1 wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, May 29, 2024 5:41 PM
> > To: Hu, Lin1 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > ubiz...@gmail.com
> > Subject: Re: [PATCH 1/3] vect: generate suitable convert insn for int -> 
> > int, float
> > -> float and int <-> float.
> > 
> > On Thu, 23 May 2024, Hu, Lin1 wrote:
> > 
> > > gcc/ChangeLog:
> > >
> > >   PR target/107432
> > >   * tree-vect-generic.cc
> > >   (supportable_indirect_narrowing_operation): New function for
> > >   support indirect narrowing convert.
> > >   (supportable_indirect_widening_operation): New function for
> > >   support indirect widening convert.
> > >   (expand_vector_conversion): Support convert for int -> int,
> > >   float -> float and int <-> float.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR target/107432
> > >   * gcc.target/i386/pr107432-1.c: New test.
> > >   * gcc.target/i386/pr107432-2.c: Ditto.
> > >   * gcc.target/i386/pr107432-3.c: Ditto.
> > >   * gcc.target/i386/pr107432-4.c: Ditto.
> > >   * gcc.target/i386/pr107432-5.c: Ditto.
> > >   * gcc.target/i386/pr107432-6.c: Ditto.
> > >   * gcc.target/i386/pr107432-7.c: Ditto.
> > > ---
> > > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index
> > > ab640096ca2..0bedb53d9f9 100644
> > > --- a/gcc/tree-vect-generic.cc
> > > +++ b/gcc/tree-vect-generic.cc
> > > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> > > #include "gimple-match.h"
> > >  #include "recog.h"   /* FIXME: for insn_data */
> > >  #include "optabs-libfuncs.h"
> > > +#include "cfgloop.h"
> > > +#include "tree-vectorizer.h"
> > >
> > >
> > >  /* Build a ternary operation and gimplify it.  Emit code before GSI.
> > > @@ -1834,6 +1836,142 @@ do_vec_narrow_conversion
> > (gimple_stmt_iterator *gsi, tree inner_type, tree a,
> > >return gimplify_build2 (gsi, code, outer_type, b, c);  }
> > >
> > > +/* A subroutine of expand_vector_conversion, support indirect conversion
> > for
> > > +   float <-> int, like double -> char.  */ bool
> > > +supportable_indirect_narrowing_operation (gimple_stmt_iterator *gsi,
> > > +  enum tree_code code,
> > > +  tree lhs,
> > > +  tree arg)
> > > +{
> > > +  gimple *g;
> > > +  tree ret_type = TREE_TYPE (lhs);
> > > +  tree arg_type = TREE_TYPE (arg);
> > > +  tree new_rhs;
> > > +
> > > +  unsigned int ret_elt_bits = vector_element_bits (ret_type);
> > > + unsigned int arg_elt_bits = vector_element_bits (arg_type);  if
> > > + (code != FIX_TRUNC_EXPR || flag_trapping_math || ret_elt_bits >=
> > arg_elt_bits)
> > > +return false;
> > > +
> > > +  unsigned short target_size;
> > > +  scalar_mode tmp_cvt_mode;
> > > +  scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (ret_type));
> > > + scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (arg_type));  tree
> > > + cvt_type = NULL_TREE;  tmp_cvt_mode = lhs_mode;  target_size =
> > > + GET_MODE_SIZE (rhs_mode);
> > > +
> > > +  opt_scalar_mode mode_iter;
> > > +  enum tree_code tc1, tc2;
> > > +  unsigned HOST_WIDE_INT nelts
> > > += constant_lower_bound (TYPE_VECTOR_SUBPARTS (arg_type));
> > > +
> > > +  FOR_EACH_2XWIDER_MODE (mode_iter, tmp_cvt_mode)
> > > +{
> > > +  tmp_cvt_mode = mode_iter.require ();
> > > +
> > > +  if (GET_MODE_SIZE (tmp_cvt_mode) > target_size)
> > > + break;
> > > +
> > > +  scalar_mode cvt_mode;
> > > +  int tmp_cvt_size = GET_MODE_BITSIZE (tmp_cvt_mode);
> > > +  if (!int_mode_for_size (tmp_cvt_size, 0).exists (&cvt_mode))
> > > + break;
> > > +
> > > +  int cvt_size = GET_MODE_BITSIZE (cvt_mode);
> > > +  bool isUnsigned = TYPE_UNSIGNED (ret_type) || TYPE_UNSIGNED
> > (arg_type);
> > > +  cvt_type = build_nonstandard_integer_type (cvt_size,
> > > + isUnsigned);
> > > +
> > > +  cvt_type = build_vector_type (cvt_type, nelts);
> > > +  if (cvt_type == NULL_TREE
> > > +   || !supportable_convert_operation ((tree_code) NOP_EXPR,
> > > +  ret_type,
> > > +  cvt_type, &tc1)
> > > +   || !supportable_convert_operation ((tree_code) code,
> > > +  cvt_type,
> > > +  arg_type, &tc2))
> > > + continue;
> > > +
> > > +  new_rhs = make_ssa_name (cvt_type);
> > > +  g = vect_gimple_build (new_rhs, tc2, arg);
> > > +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > > +  g = gimple_build_assign (lhs, tc1, new_rhs);
> > > +  gsi_replace (gsi, g, false);
> > > +  return true;
> > > +}
> > > +  return false;
> > > +}
> > > +
> > > +/* A subroutine of expand_vector_conversion, support indirect conversion
> > for
> > > +   float <-> int, like char -> double.  */ bool
> > > +supportable_indirect_widening_operation (gimple_stmt_iterator *gsi,
> > > +   

Re: [PATCH v10 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-31 Thread Richard Biener
On Thu, 30 May 2024, Qing Zhao wrote:

> Including the following changes:
> * The definition of the new internal function .ACCESS_WITH_SIZE
>   in internal-fn.def.
> * C FE converts every reference to a FAM with a "counted_by" attribute
>   to a call to the internal function .ACCESS_WITH_SIZE.
>   (build_component_ref in c_typeck.cc)
> 
>   This includes the case when the object is statically allocated and
>   initialized.
>   In order to make this work, the routine digest_init in c-typeck.cc
>   is updated to fold calls to .ACCESS_WITH_SIZE to its first argument
>   when require_constant is TRUE.
> 
>   However, for the reference inside "offsetof", the "counted_by" attribute is
>   ignored since it's not useful at all.
>   (c_parser_postfix_expression in c/c-parser.cc)
> 
>   In addition to "offsetof", for the reference inside operator "typeof" and
>   "alignof", we ignore counted_by attribute too.
> 
>   When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
>   replace the call with its first argument.
> 
> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>   (expand_ACCESS_WITH_SIZE in internal-fn.cc)
> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
>   get the reference from the call to .ACCESS_WITH_SIZE.
>   (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)

The middle-end parts of this revised patch are OK.

Thanks,
Richard.
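
A minimal sketch of the kind of declaration and access this rewrite
applies to (field and function names are made up):

  struct pkt
  {
    unsigned int count;
    char data[] __attribute__ ((counted_by (count)));  /* FAM + counted_by */
  };

  char
  get_elem (struct pkt *p, unsigned int i)
  {
    /* The C FE turns the p->data reference into a call to
       .ACCESS_WITH_SIZE that also carries &p->count, so later passes can
       see the object's size.  */
    return p->data[i];
  }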

> gcc/c/ChangeLog:
> 
>   * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
>   attribute when build_component_ref inside offsetof operator.
>   * c-tree.h (build_component_ref): Add one more parameter.
>   * c-typeck.cc (build_counted_by_ref): New function.
>   (build_access_with_size_for_counted_by): New function.
>   (build_component_ref): Check the counted-by attribute and build
>   call to .ACCESS_WITH_SIZE.
>   (build_unary_op): When building ADDR_EXPR for
> .ACCESS_WITH_SIZE, use its first argument.
> (lvalue_p): Accept call to .ACCESS_WITH_SIZE.
>   (digest_init): Fold call to .ACCESS_WITH_SIZE to its first
>   argument when require_constant is TRUE.
> 
> gcc/ChangeLog:
> 
>   * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
>   * internal-fn.def (ACCESS_WITH_SIZE): New internal function.
>   * tree.cc (is_access_with_size_p): New function.
>   (get_ref_from_access_with_size): New function.
>   * tree.h (is_access_with_size_p): New prototype.
>   (get_ref_from_access_with_size): New prototype.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/flex-array-counted-by-2.c: New test.
> ---
>  gcc/c/c-parser.cc |  10 +-
>  gcc/c/c-tree.h|   2 +-
>  gcc/c/c-typeck.cc | 142 +-
>  gcc/internal-fn.cc|  34 +
>  gcc/internal-fn.def   |   5 +
>  .../gcc.dg/flex-array-counted-by-2.c  | 112 ++
>  gcc/tree.cc   |  22 +++
>  gcc/tree.h|   8 +
>  8 files changed, 328 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
> 
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 00f8bf4376e5..2d9e9c0969f0 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -10848,9 +10848,12 @@ c_parser_postfix_expression (c_parser *parser)
>   if (c_parser_next_token_is (parser, CPP_NAME))
> {
>   c_token *comp_tok = c_parser_peek_token (parser);
> + /* Ignore the counted_by attribute for reference inside
> +offsetof since the information is not useful at all.  */
>   offsetof_ref
> = build_component_ref (loc, offsetof_ref, comp_tok->value,
> -  comp_tok->location, UNKNOWN_LOCATION);
> +  comp_tok->location, UNKNOWN_LOCATION,
> +  false);
>   c_parser_consume_token (parser);
>   while (c_parser_next_token_is (parser, CPP_DOT)
>  || c_parser_next_token_is (parser,
> @@ -10877,11 +10880,14 @@ c_parser_postfix_expression (c_parser *parser)
>   break;
> }
>   c_token *comp_tok = c_parser_peek_token (parser);
> + /* Ignore the counted_by attribute for reference inside
> +offsetof since the information is not useful.  */
>   offsetof_ref
> = build_component_ref (loc, offsetof_ref,
>comp_tok->value,
>comp_tok->location,
> -  UNKNOWN_LOCATION);
> +  UNKNOWN_LOCATION,
>

Re: [PATCH] rust: Do not link with libdl and libpthread unconditionally

2024-05-31 Thread Richard Biener
On Fri, May 31, 2024 at 12:24 PM Arthur Cohen  wrote:
>
> Hi Richard,
>
> On 4/30/24 09:55, Richard Biener wrote:
> > On Fri, Apr 19, 2024 at 11:49 AM Arthur Cohen  
> > wrote:
> >>
> >> Hi everyone,
> >>
> >> This patch checks for the presence of dlopen and pthread_create in libc. 
> >> If that is not the
> >> case, we check for the existence of -ldl and -lpthread, as these libraries 
> >> are required to
> >> link the Rust runtime to our Rust frontend.
> >>
> >> If these libs are not present on the system, then we disable the Rust 
> >> frontend.
> >>
> >> This was tested on x86_64, in an environment with a recent GLIBC and in a 
> >> container with GLIBC
> >> 2.27.
> >>
> >> Apologies for sending it in so late.
> >
> > For example GCC_ENABLE_PLUGINS simply does
> >
> >   # Check -ldl
> >   saved_LIBS="$LIBS"
> >   AC_SEARCH_LIBS([dlopen], [dl])
> >   if test x"$ac_cv_search_dlopen" = x"-ldl"; then
> > pluginlibs="$pluginlibs -ldl"
> >   fi
> >   LIBS="$saved_LIBS"
> >
> > which I guess would also work for pthread_create?  This would simplify
> > the code a bit.
>
> Thanks a lot for the review. I've updated the patch's content in
> configure.ac per your suggestion. Tested similarly on x86_64 and in a
> container with libc 2.27

LGTM.

Thanks,
Richard.

>  From 00669b600a75743523c358ee41ab999b6e9fa0f6 Mon Sep 17 00:00:00 2001
> From: Arthur Cohen 
> Date: Fri, 12 Apr 2024 13:52:18 +0200
> Subject: [PATCH] rust: Do not link with libdl and libpthread unconditionally
>
> ChangeLog:
>
> * Makefile.tpl: Add CRAB1_LIBS variable.
> * Makefile.in: Regenerate.
> * configure: Regenerate.
> * configure.ac: Check if -ldl and -lpthread are needed, and if so, add
> them to CRAB1_LIBS.
>
> gcc/rust/ChangeLog:
>
> * Make-lang.in: Remove overazealous LIBS = -ldl -lpthread line, link
> crab1 against CRAB1_LIBS.
> ---
>   Makefile.in   |   3 +
>   Makefile.tpl  |   3 +
>   configure | 154 ++
>   configure.ac  |  41 +++
>   gcc/rust/Make-lang.in |   6 +-
>   5 files changed, 203 insertions(+), 4 deletions(-)
>
> diff --git a/Makefile.in b/Makefile.in
> index edb0c8a9a42..1753fb6b862 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -197,6 +197,7 @@ HOST_EXPORTS = \
> $(BASE_EXPORTS) \
> CC="$(CC)"; export CC; \
> ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> CFLAGS="$(CFLAGS)"; export CFLAGS; \
> CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> CXX="$(CXX)"; export CXX; \
> @@ -450,6 +451,8 @@ GOCFLAGS = $(CFLAGS)
>   GDCFLAGS = @GDCFLAGS@
>   GM2FLAGS = $(CFLAGS)
>
> +CRAB1_LIBS = @CRAB1_LIBS@
> +
>   PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
>
>   GUILE = guile
> diff --git a/Makefile.tpl b/Makefile.tpl
> index adbcbdd1d57..4aeaad3c1a5 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -200,6 +200,7 @@ HOST_EXPORTS = \
> $(BASE_EXPORTS) \
> CC="$(CC)"; export CC; \
> ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> CFLAGS="$(CFLAGS)"; export CFLAGS; \
> CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> CXX="$(CXX)"; export CXX; \
> @@ -453,6 +454,8 @@ GOCFLAGS = $(CFLAGS)
>   GDCFLAGS = @GDCFLAGS@
>   GM2FLAGS = $(CFLAGS)
>
> +CRAB1_LIBS = @CRAB1_LIBS@
> +
>   PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
>
>   GUILE = guile
> diff --git a/configure b/configure
> index 02b435c1163..a9ea5258f0f 100755
> --- a/configure
> +++ b/configure
> @@ -690,6 +690,7 @@ extra_host_zlib_configure_flags
>   extra_host_libiberty_configure_flags
>   stage1_languages
>   host_libs_picflag
> +CRAB1_LIBS
>   PICFLAG
>   host_shared
>   gcc_host_pie
> @@ -8826,6 +8827,139 @@ fi
>
>
>
> +# Rust requires -ldl and -lpthread if you are using an old glibc that
> does not include them by
> +# default, so we check for them here
> +
> +missing_rust_dynlibs=none
> +
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library
> containing dlopen" >&5
> +$as_echo_n "checking for library containing dlopen... " >&6; }
> +if ${ac_cv_search_dlopen+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  ac_func_search_save_LIBS=$LIBS
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +/* Override any GCC internal prototype to avoid an error.
> +   Use char because int might match the return type of a GCC
> +   builtin and then its argument prototype would still apply.  */
> +#ifdef __cplusplus
> +extern "C"
> +#endif
> +char dlopen ();
> +int
> +main ()
> +{
> +return dlopen ();
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +for ac_lib in '' dl; do
> +  if test -z "$ac_lib"; then
> +ac_res="none required"
> +  else
> +ac_res=-l$ac_lib
> +LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
> +  fi
> +  if ac_fn_c_try_link "$LINENO"; then :
> +  ac_c

Re: [patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-31 Thread Andrew Stubbs

On 29/05/2024 13:15, Tobias Burnus wrote:

This patch depends (on the libgomp/target.c parts) of the patch
"[patch] libgomp: Enable USM for some nvptx devices",
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html

AMD GPUs that are either APU devices or MI200 [or MI300X]
(with HSA_XNACK=1 set) can access host memory; the run-time library
returns in that case HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = true.
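
In plugin terms this is a simple query (sketch; the real check lives in
the patch's libgomp changes and goes through the dlopened HSA runtime):

  #include <stdbool.h>
  #include "hsa.h"

  static bool
  svm_accessible_by_default_sketch (void)
  {
    bool svm_default = false;
    if (hsa_system_get_info (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT,
                             &svm_default) != HSA_STATUS_SUCCESS)
      return false;
    return svm_default;
  }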

Thus, it makes sense to enable USM support for those devices, which
this patch does. — A simple test with all unified_shared_memory tests
shipping with sollve_vv now works:*

   Test passed on the device.

as tested on an MI200 series device. In line with (some) other compilers,
it requires that HSA_XNACK=1 is set, otherwise the code will be executed
on the host.

(* Well, for C++, -O2 -fno-exceptions was used, but still only 5 test cases 
PASS, 1 has a delete[] etc. link error, 1 ICE (segfault during 
IPA pass: cp in gcn gcc), and 1 runtime fail for 
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp [**], but 
all 15 Fortran and 16 C tests PASS.)


Comments, remarks, suggestions?
Any reason not to commit it to mainline?




index f9b5d9daf85..3c7be95d7fd 100644
--- a/include/hsa.h
+++ b/include/hsa.h
@@ -466,7 +466,9 @@ typedef enum {
   /**
   * String containing the ROCr build identifier.
   */
-  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200
+  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
+
+  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
 } hsa_system_info_t;
 
 /**


Please don't edit imported files.

Andrew


Re: [PATCH v10 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-31 Thread Qing Zhao



> On May 31, 2024, at 08:58, Richard Biener  wrote:
> 
> On Thu, 30 May 2024, Qing Zhao wrote:
> 
>> Including the following changes:
>> * The definition of the new internal function .ACCESS_WITH_SIZE
>>  in internal-fn.def.
>> * C FE converts every reference to a FAM with a "counted_by" attribute
>>  to a call to the internal function .ACCESS_WITH_SIZE.
>>  (build_component_ref in c_typeck.cc)
>> 
>>  This includes the case when the object is statically allocated and
>>  initialized.
>>  In order to make this working, the routine digest_init in c-typeck.cc
>>  is updated to fold calls to .ACCESS_WITH_SIZE to its first argument
>>  when require_constant is TRUE.
>> 
>>  However, for the reference inside "offsetof", the "counted_by" attribute is
>>  ignored since it's not useful at all.
>>  (c_parser_postfix_expression in c/c-parser.cc)
>> 
>>  In addtion to "offsetof", for the reference inside operator "typeof" and
>>  "alignof", we ignore counted_by attribute too.
>> 
>>  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
>>  replace the call with its first argument.
>> 
>> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>>  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
>> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
>>  get the reference from the call to .ACCESS_WITH_SIZE.
>>  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
> 
> The middle-end parts of this revised patch are OK.

Thanks a lot for the review.
Will commit the patch set soon.

Qing
> 
> Thanks,
> Richard.
> 
>> gcc/c/ChangeLog:
>> 
>> * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
>> attribute when build_component_ref inside offsetof operator.
>> * c-tree.h (build_component_ref): Add one more parameter.
>> * c-typeck.cc (build_counted_by_ref): New function.
>> (build_access_with_size_for_counted_by): New function.
>> (build_component_ref): Check the counted-by attribute and build
>> call to .ACCESS_WITH_SIZE.
>> (build_unary_op): When building ADDR_EXPR for
>>.ACCESS_WITH_SIZE, use its first argument.
>>(lvalue_p): Accept call to .ACCESS_WITH_SIZE.
>> (digest_init): Fold call to .ACCESS_WITH_SIZE to its first
>> argument when require_constant is TRUE.
>> 
>> gcc/ChangeLog:
>> 
>> * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
>> * internal-fn.def (ACCESS_WITH_SIZE): New internal function.
>> * tree.cc (is_access_with_size_p): New function.
>> (get_ref_from_access_with_size): New function.
>> * tree.h (is_access_with_size_p): New prototype.
>> (get_ref_from_access_with_size): New prototype.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/flex-array-counted-by-2.c: New test.
>> ---
>> gcc/c/c-parser.cc |  10 +-
>> gcc/c/c-tree.h|   2 +-
>> gcc/c/c-typeck.cc | 142 +-
>> gcc/internal-fn.cc|  34 +
>> gcc/internal-fn.def   |   5 +
>> .../gcc.dg/flex-array-counted-by-2.c  | 112 ++
>> gcc/tree.cc   |  22 +++
>> gcc/tree.h|   8 +
>> 8 files changed, 328 insertions(+), 7 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>> 
>> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
>> index 00f8bf4376e5..2d9e9c0969f0 100644
>> --- a/gcc/c/c-parser.cc
>> +++ b/gcc/c/c-parser.cc
>> @@ -10848,9 +10848,12 @@ c_parser_postfix_expression (c_parser *parser)
>> if (c_parser_next_token_is (parser, CPP_NAME))
>>   {
>> c_token *comp_tok = c_parser_peek_token (parser);
>> + /* Ignore the counted_by attribute for reference inside
>> +offsetof since the information is not useful at all.  */
>> offsetof_ref
>>   = build_component_ref (loc, offsetof_ref, comp_tok->value,
>> -  comp_tok->location, UNKNOWN_LOCATION);
>> +  comp_tok->location, UNKNOWN_LOCATION,
>> +  false);
>> c_parser_consume_token (parser);
>> while (c_parser_next_token_is (parser, CPP_DOT)
>>|| c_parser_next_token_is (parser,
>> @@ -10877,11 +10880,14 @@ c_parser_postfix_expression (c_parser *parser)
>> break;
>>   }
>> c_token *comp_tok = c_parser_peek_token (parser);
>> + /* Ignore the counted_by attribute for reference inside
>> +offsetof since the information is not useful.  */
>> offsetof_ref
>>   = build_component_ref (loc, offsetof_ref,
>>  comp_tok->value,
>>  comp_tok->location,
>> -  UNKNOWN_LOCATION);
>> +  UNKNOWN_LOCATION,
>> +  false);
>> c_parser_consume_token (parser);
>>   }
>> else
>> diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
>> index 531a7e8742e3..56a33b8156c6 100644
>> --- a/gcc/c/c-tree.h
>> +++ b/gcc/c/c-tree.h
>> @@ -779,7 +779,7 @@ extern void mark_exp_read (tree);
>> extern tree composite_type (tree, tree);
>> extern tree lookup_field (const_tree, tree);
>> extern tree build_component_ref (location_t, tree, tree, location_t,
>> -  location_t);

nvptx target: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2')

2024-05-31 Thread Thomas Schwinge
Hi!

On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> attached; [...]
>
> Per my quick scanning of 'gcc/config.gcc' history, for more than two
> decades, there was a clear trend to remove 'use_collect2=yes'
> configurations; now finally a new one is being added -- making sure we're
> not slowly dispensing with the need for the early 1990s piece of work
> that 'gcc/collect2*' is...  ;'-P

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is

"ld: Global constructor/destructor support".

Thus, this:

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2783,6 +2783,7 @@ nvptx-*)
>   tm_file="${tm_file} newlib-stdint.h"
>   use_gcc_stdint=wrap
>   tmake_file="nvptx/t-nvptx"
> + use_collect2=yes
>   if test x$enable_as_accelerator = xyes; then
>   extra_programs="${extra_programs} mkoffload\$(exeext)"
>   tm_file="${tm_file} nvptx/offload.h"

... now is gone again.  ;'-)

Pushed to trunk branch commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'",
see attached.

(Support for nvptx offloading, enablement of full libgfortran for nvptx,
and corresponding documentation updates, etc. are to follow as separate
commits.)
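
For illustration, the kind of code this enables on nvptx (a generic sketch,
not one of the testsuite cases):

  // Generic sketch (not a testcase from the tree): constructs that used to
  // hit "sorry, unimplemented: global constructors not supported on this
  // target" and now work.
  struct counter
  {
    counter () { ++count; }    // runs before main
    ~counter () { --count; }   // runs after main returns
    static int count;
  };
  int counter::count;

  counter early __attribute__((init_priority (101)));
  counter late;                // constructed after 'early'

  __attribute__((constructor))
  static void plain_ctor () { /* also runs before main */ }

  int main () { return counter::count == 2 ? 0 : 1; }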


Compared to the 2022 'collect2' version, this 'ld' version also does
happen to avoid one class of FAILs:

[-FAIL:-]{+PASS:+} gfortran.dg/implicit_class_1.f90   -O0  (test for excess 
errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/implicit_class_1.f90   -O0  
[-compilation failed to produce executable-]{+execution test+}
[...]

That was due to:

Executing on host: [gfortran] [...] [...]/gfortran.dg/implicit_class_1.f90 
[...] -fdump-fortran-original [...]
[...]
cc1: error: unrecognized command-line option '-fdump-fortran-original'; did 
you mean '-fdump-tree-original'?
collect2: fatal error: gcc returned 1 exit status
compilation terminated.
compiler exited with status 1
FAIL: gfortran.dg/implicit_class_1.f90   -O0  (test for excess errors)

That is, the 'gcc' invocation by 'collect2' is passed
'-fdump-fortran-original', but doesn't know what to do with that.  (Maybe
using '-Wno-complain-wrong-lang' in 'collect2' would help?)  (I'm not
going to look into that any further.)


Grüße
 Thomas


>From d9c90c82d900fdae95df4499bf5f0a4ecb903b53 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 28 May 2024 23:20:29 +0200
Subject: [PATCH] nvptx target: Global constructor, destructor support, via
 nvptx-tools 'ld'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more
"sorry, unimplemented: global constructors not supported on this target".

For proper execution test results, this depends on

"ld: Global constructor/destructor support".

	gcc/
	* config/nvptx/nvptx.h: Configure global constructor, destructor
	support.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may appear
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config/nvptx/crt0.c (__gbl_ctors): New weak function.
	(__main): Invoke it.
	* config/nvptx/gbl-ctors.c: New.
	* config/nvptx/t-nvptx: Configure global constructor, destructor
	support.
---
 gcc/config/nvptx/nvptx.h  | 14 +++-
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp |  3 +-
 libgcc/config/nvptx/crt0.c| 12 +++
 libgcc/config/nvptx/gbl-ctors.c   | 74 +++
 libgcc/config/nvptx/t-nvptx   |  9 ++-
 6 files changed, 109 insertions(+), 5 deletions(-)
 create mode 100644 libgcc/config/nvptx/gbl-ctors.c

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index e282aad1b73..74f4a68924c 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -356,7 +356,19 @@ struct GTY(()) machine_function
 #define MOVE_MAX 8
 #define MOVE_RATIO(SPEED) 4
 #define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
+
+/* Implement global constructor, destructor support in a conceptually simpler
+   way than using 'collect2' (the program): 

[PATCH 1/5][v3] Avoid ICE with pointer reduction

2024-05-31 Thread Richard Biener
There's another case where we can refer to neutral_op before
eventually converting it from pointer to integer, so simply
do that conversion unconditionally.
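
For context, a sketch of the sort of loop involved (a hypothetical example,
not the actual reproducer): the reduction result is a pointer, so neutral_op
has to be converted to the integer vector element type before the initial
vector can be built.

  // Hypothetical example, not the actual reproducer: a conditional
  // "last matching element" reduction whose result is a pointer.
  int *last_match (int *const *p, const int *flags, int n)
  {
    int *res = 0;
    for (int i = 0; i < n; i++)
      if (flags[i])
        res = p[i];
    return res;
  }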

* tree-vect-loop.cc (get_initial_defs_for_reduction):
Always convert neutral_op.
---
 gcc/tree-vect-loop.cc | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 04a9ac64df7..fc690336b38 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5608,6 +5608,12 @@ get_initial_defs_for_reduction (loop_vec_info loop_vinfo,
   tree_vector_builder elts (vector_type, nunits, 1);
   elts.quick_grow (nunits);
   gimple_seq ctor_seq = NULL;
+  if (neutral_op
+  && !useless_type_conversion_p (TREE_TYPE (vector_type),
+TREE_TYPE (neutral_op)))
+neutral_op = gimple_convert (&ctor_seq,
+TREE_TYPE (vector_type),
+neutral_op);
   for (j = 0; j < nunits * number_of_vectors; ++j)
 {
   tree op;
@@ -5616,14 +5622,7 @@ get_initial_defs_for_reduction (loop_vec_info loop_vinfo,
   /* Get the def before the loop.  In reduction chain we have only
 one initial value.  Else we have as many as PHIs in the group.  */
   if (i >= initial_values.length () || (j > i && neutral_op))
-   {
- if (!useless_type_conversion_p (TREE_TYPE (vector_type),
- TREE_TYPE (neutral_op)))
-   neutral_op = gimple_convert (&ctor_seq,
-TREE_TYPE (vector_type),
-neutral_op);
- op = neutral_op;
-   }
+   op = neutral_op;
   else
{
  if (!useless_type_conversion_p (TREE_TYPE (vector_type),
-- 
2.35.3



[PATCH 2/5][v3] Adjust vector dump scans

2024-05-31 Thread Richard Biener
The following adjusts dump scanning that looks for a pattern followed by
successful vector analysis to specifically look for
'Analysis succeeded' and not 'Analysis failed', because the
previous check for just 'succeeded' or 'failed' is easily confused
by SLP discovery dumping those words.

* tree-vect-loop.cc (vect_analyze_loop_1): Avoid extra space
before 'failed'.

* gcc.dg/vect/no-scevccp-outer-7.c: Adjust scanning for
succeeded analysis not interrupted by failure.
* gcc.dg/vect/no-scevccp-vect-iv-3.c: Likewise.
* gcc.dg/vect/vect-cond-reduc-4.c: Likewise.
* gcc.dg/vect/vect-live-2.c: Likewise.
* gcc.dg/vect/vect-outer-4c-big-array.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-s16a.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-s8a.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-u16a.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-u16b.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-u8a.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-u8b.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-1a.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-2a.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c| 2 +-
 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-live-2.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-4c-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c  | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1a.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1b-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1c-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2a.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2b-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c| 4 ++--
 gcc/tree-vect-loop.cc   | 2 +-
 19 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
index 87048422013..e796e6ba216 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
@@ -77,4 +77,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
target vect_widen_mult_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { target 
vect_widen_mult_hi_to_si } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
detected(?:(?!Analysis failed).)*Analysis succeeded" 1 "vect" { target 
vect_widen_mult_hi_to_si } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
index 6f2b2210b11..f268d4a5131 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
@@ -30,4 +30,4 @@ unsigned int main1 ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
vect_widen_sum_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: 
detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { target 
vect_widen_sum_hi_to_si } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: 
detected(?:(?!Analysis failed).)*Analysis succeeded" 1 "vect" { target 
vect_widen_sum_hi_to_si } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
index 27f18dc5bda..e9d414287e8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
@@ -42,6 +42,6 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
FOLD_EXTRACT_LAST(?:(?!failed)(?!Re-trying).)*succeeded" 2 "vect" { target { 
vect_fold_extract_last && vect_pack_trunc } } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
FOLD_EXTRACT_L

[PATCH 3/5][v3] Do single-lane SLP discovery for reductions

2024-05-31 Thread Richard Biener
The following performs single-lane SLP discovery for reductions.
It requires a fixup for outer loop vectorization where a check
for multiple types needs adjustments as otherwise bogus pointer
IV increments happen when there are multiple copies of vector stmts
in the inner loop.

For the reduction epilog handling this extends the optimized path
to cover the trivial single-lane SLP reduction case.

The fix for PR65518 implemented in vect_grouped_load_supported for
non-SLP needs a SLP counterpart that I put in get_group_load_store_type.
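
As a trivial illustration (a generic example, not from the testsuite), even
a plain reduction like the following is now discovered as a single-lane SLP
instance rather than only through the non-SLP reduction path:

  int sum (const int *a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      s += a[i];   // single-lane SLP reduction after this change
    return s;
  }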

* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
discoveries are reduction chains and need special backedge
treatment.
(vect_analyze_slp): Fall back to single-lane SLP discovery
for reductions.  Make sure to try single-lane SLP reduction
for all reductions as fallback.
(vectorizable_load): Avoid outer loop SLP vectorization with
multi-copy vector stmts in the inner loop.
(vectorizable_store): Likewise.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
direct opcode and shift reduction also for SLP reductions
with a single lane.
* tree-vect-stmts.cc (get_group_load_store_type): For SLP also
check for the PR65518 single-element interleaving case as done in
vect_grouped_load_supported.
---
 gcc/tree-vect-loop.cc  |  4 +--
 gcc/tree-vect-slp.cc   | 71 --
 gcc/tree-vect-stmts.cc | 24 --
 3 files changed, 78 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index e8109f9ac3c..8b25f8f4d51 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6506,7 +6506,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   /* 2.3 Create the reduction code, using one of the three schemes described
  above. In SLP we simply need to extract all the elements from the 
  vector (without reducing them), so we use scalar shifts.  */
-  else if (reduc_fn != IFN_LAST && !slp_reduc)
+  else if (reduc_fn != IFN_LAST && (!slp_reduc || group_size == 1))
 {
   tree tmp;
   tree vec_elem_type;
@@ -6676,7 +6676,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
   reduc_inputs[0] = new_temp;
 
-  if (reduce_with_shift && !slp_reduc)
+  if (reduce_with_shift && (!slp_reduc || group_size == 1))
{
  int element_bitsize = tree_to_uhwi (bitsize);
  /* Enforced by vectorizable_reduction, which disallows SLP reductions
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index bc7a85d6bfc..c1d9dfe042e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1911,7 +1911,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
/* Reduction chain backedge defs are filled manually.
   ???  Need a better way to identify a SLP reduction chain PHI.
   Or a better overall way to SLP match those.  */
-   if (all_same && def_type == vect_reduction_def)
+   if (stmts.length () > 1
+   && all_same && def_type == vect_reduction_def)
  skip_args[loop_latch_edge (loop)->dest_idx] = true;
  }
else if (def_type != vect_internal_def)
@@ -3909,9 +3910,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
  }
 
   /* Find SLP sequences starting from groups of reductions.  */
-  if (loop_vinfo->reductions.length () > 1)
+  if (loop_vinfo->reductions.length () > 0)
{
- /* Collect reduction statements.  */
+ /* Collect reduction statements we can combine into
+a SLP reduction.  */
  vec scalar_stmts;
  scalar_stmts.create (loop_vinfo->reductions.length ());
  for (auto next_info : loop_vinfo->reductions)
@@ -3924,25 +3926,60 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
 reduction path.  In that case we'd have to reverse
 engineer that conversion stmt following the chain using
 reduc_idx and from the PHI using reduc_def.  */
- && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
- /* Do not discover SLP reductions for lane-reducing ops, that
-will fail later.  */
- && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
+ && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
+   {
+ /* Do not discover SLP reductions combining lane-reducing
+ops, that will fail later.  */
+ if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
  || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
  && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
- && gimple_assign_rhs_code (g) != SA

[PATCH 5/5][v3] RISC-V: Avoid inserting after a GIMPLE_COND with SLP and early break

2024-05-31 Thread Richard Biener
When vectorizing an early break loop with LENs (do we miss some
check here to disallow this?) we can end up deciding to insert
stmts after a GIMPLE_COND when doing SLP scheduling and trying
to be conservative with placing of stmts only dependent on
the implicit loop mask/len.  The following avoids this, I guess
it's not perfect but it does the job fixing some observed
RISC-V regression.
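
For reference, the shape of loop involved is an early-break search like the
following (a generic sketch, not the regressing testcase):

  // Generic sketch: with length-controlled (LEN) vectorization the
  // GIMPLE_COND for the early break can end up as the only header stmt,
  // and nothing must be inserted after it.
  int find_first (const int *a, int n, int key)
  {
    for (int i = 0; i < n; i++)
      if (a[i] == key)
        return i;
    return -1;
  }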

* tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
loops make sure to not advance the insertion iterator
beyond a GIMPLE_COND.
---
 gcc/tree-vect-slp.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b469977aab2..dd7703a3cc0 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9689,7 +9689,12 @@ vect_schedule_slp_node (vec_info *vinfo,
   else
{
  si = gsi_for_stmt (last_stmt);
- gsi_next (&si);
+ /* When we're getting gsi_after_labels from the starting
+condition of a fully masked/len loop avoid insertion
+after a GIMPLE_COND that can appear as the only header
+stmt with early break vectorization.  */
+ if (gimple_code (last_stmt) != GIMPLE_COND)
+   gsi_next (&si);
}
 }
 
-- 
2.35.3


[PATCH 4/5][v3] Reduce single-lane SLP testresult noise

2024-05-31 Thread Richard Biener
The following avoids dumping 'vectorizing stmts using SLP' for
single-lane instances since that causes extra testsuite fallout.

* tree-vect-slp.cc (vect_schedule_slp): Gate dumping
'vectorizing stmts using SLP' on > 1 lanes.
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index c1d9dfe042e..b469977aab2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10108,7 +10108,8 @@ vect_schedule_slp (vec_info *vinfo, const 
vec &slp_instances)
   if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ())
vectorize_slp_instance_root_stmt (node, instance);
 
-  if (dump_enabled_p ())
+  /* ???  Reduce some testsuite noise because of "more SLP".  */
+  if (SLP_TREE_LANES (node) > 1 && dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
  "vectorizing stmts using SLP.\n");
 }
-- 
2.35.3



Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Ajit Agarwal
Hello Richard:

On 31/05/24 3:23 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> Hello All:
>>
>> Common infrastructure using generic code for pair mem fusion of different
>> targets.
>>
>> rs6000 target specific code implements virtual functions defined
>> by generic code.
>>
>> Code is implemented with pure virtual functions to interface with target
>> code.
>>
>> Target specific code are added in rs6000-mem-fusion.cc and additional virtual
>> function implementation required for rs6000 are added in 
>> aarch64-ldp-fusion.cc.
>>
>> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64, rs6000, middle-end: Add implementation for different targets for 
>> pair mem fusion
>>
>> Common infrastructure using generic code for pair mem fusion of different
>> targets.
>>
>> rs6000 target specific code implements virtual functions defined
>> by generic code.
>>
>> Code is implemented with pure virtual functions to interface with target
>> code.
>>
>> Target specific code are added in rs6000-mem-fusion.cc and additional virtual
>> function implementation required for rs6000 are added in 
>> aarch64-ldp-fusion.cc.
>>
>> 2024-05-31  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>>  implementation of additional virtual functions added in pair_fusion
>>  struct.
>>  * config/rs6000/rs6000-passes.def: New mem fusion pass
>>  before pass_early_remat.
>>  * config/rs6000/rs6000-mem-fusion.cc: Add new pass.
>>  Add target specific implementation using pure virtual
>>  functions.
>>  * config.gcc: Add new object file.
>>  * config/rs6000/rs6000-protos.h: Add new prototype for mem
>>  fusion pass.
>>  * config/rs6000/t-rs6000: Add new rule.
>>  * rtl-ssa/accesses.h: Moved set_is_live_out_use as public
>>  from private.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.target/powerpc/me-fusion.C: New test.
>>  * g++.target/powerpc/mem-fusion-1.C: New test.
>>  * gcc.target/powerpc/mma-builtin-1.c: Modify test.
>> ---
> 
> This isn't a complete review, just some initial questions & comments
> about selected parts.
> 
>> [...]
>> +/* Check whether load can be fusable or not.
>> +   Return true if dependent use is UNSPEC otherwise false.  */
>> +bool
>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>> +{
>> +  rtx_insn *insn = info->rtl ();
>> +
>> +  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
>> +if (REG_NOTE_KIND (note) == REG_EQUAL
>> +|| REG_NOTE_KIND (note) == REG_EQUIV)
>> +  return false;
> 
> It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
> note.  What's the reason for doing this?  Are you trying to avoid
> fusing pairs before reload that are equivalent to a MEM (i.e. have
> a natural spill slot)?  I think Alex hit a similar situation.
> 

We have used the above check because some SPEC benchmarks were failing
with MEM pairs having REG_EQUAL/EQUIV notes.

Adding the check made those benchmarks pass and also improved
performance.

These checks were added during the initial implementation of the pair
fusion pass.

I will investigate further whether this check is still required.

Sorry for the inconvenience caused.

>> +
>> +  for (auto def : info->defs ())
>> +{
>> +  auto set = dyn_cast (def);
>> +  if (set && set->has_any_uses ())
>> +{
>> +  for (auto use : set->all_uses())
> 
> Nit: has_any_uses isn't necessary: the inner loop will simply do nothing
> in that case.  Also, we can/should restrict the scan to non-debug uses.
> 
> This can then be:
> 
>   for (auto def : info->defs ())
>   if (auto set = dyn_cast<set_info *> (def))
>   for (auto use : set->nondebug_insn_uses())
> 

Sure. I will change as above.

>> +{
>> +  if (use->insn ()->is_artificial ())
>> +return false;
>> +
>> +   insn_info *info = use->insn ();
>> +
>> +   if (info
>> +   && info->rtl ()
> 
> This test shouldn't be necessary.
> 

Sure I will remove this check.

>> +   && info->is_real ())
>> +  {
>> +rtx_insn *rtl_insn = info->rtl ();
>> +rtx set = single_set (rtl_insn);
>> +
>> +if (set == NULL_RTX)
>> +  return false;
>> +
>> +rtx op0 = SET_SRC (set);
>> +if (GET_CODE (op0) != UNSPEC)
>> +  return false;
> 
> What's the motivation for rejecting unspecs?  It's unusual to treat
> all unspecs as a distinct group.
> 
> Also, using single_set means that the function still lets through
> parallels of two sets in which the sources are unspecs.  Is that
> intentional?
> 
> The reasons behind things like the REG_EQUAL/EQUIV and UNSPEC decisions
> need to be described in comments, so that other people coming to this
> code later can understand the motivation.  The same thing a

[committed] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-05-31 Thread Uros Bizjak
any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.
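
For reference, plain 32-bit division and modulo are what expand through
these patterns (a generic sketch; the reduced case from the PR is the new
test below):

  /* Generic sketch: SImode division/modulo on Alpha, with the inputs
     living in 64-bit registers, hence the sign_extend/truncate wrapping
     fixed here.  */
  int quot (int a, int b) { return a / b; }
  int rem  (int a, int b) { return a % b; }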

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (<code>si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

Tested by building an alpha-linux-gnu crosscompiler.

Uros.
diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 79f12c53c16..1e2de5a4d15 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -725,7 +725,8 @@ (define_expand "<code>si3"
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -751,9 +752,10 @@ (define_expand "<code>di3"
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -795,8 +797,8 @@ (define_insn_and_split "*divmodsi_internal_er"
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -808,9 +810,10 @@ (define_insn "*divmodsi_internal_er_1"
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index 0d001ba26f1..4383f1fa895 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@ (define_register_constraint "a" "R24_REG"
  "General register 24, input to division routine")
 
 (define_register_constraint "b" "R25_REG"
- "General register 24, input to division routine")
+ "General register 25, input to division routine")
 
 (define_register_constraint "c" "R27_REG"
  "General register 27, function call address")
diff --git a/gcc/testsuite/gcc.target/alpha/pr115297.c 
b/gcc/testsuite/gcc.target/alpha/pr115297.c
new file mode 100644
index 000..4d5890ec8d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115297.c
@@ -0,0 +1,13 @@
+/* PR target/115297 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+enum { BPF_F_USER_BUILD_ID } __bpf_get_stack_size;
+long __bpf_get_stack_flags, bpf_get_stack___trans_tmp_2;
+
+void bpf_get_stack() {
+  unsigned elem_size;
+  int err = elem_size = __bpf_get_stack_flags ?: sizeof(long);
+  if (__builtin_expect(__bpf_get_stack_size % elem_size, 0))
+bpf_get_stack___trans_tmp_2 = err;
+}


[PATCH] Fix PR c++/109958: ICE taking the address of bound static member function brought into derived class by using-declaration

2024-05-31 Thread Simon Martin
From: Simon Martin 

We currently ICE upon the following because we don't properly handle the
overload created for B::f through the using-declaration.

=== cut here ===
struct B { static int f(); };
struct D : B { using B::f; };
void f(D d) { &d.f; }
=== cut here ===

This patch makes build_class_member_access_expr and cp_build_addr_expr_1 handle
such overloads, and fixes the PR.
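
For completeness, a slightly extended variant (hypothetical, beyond the new
test below) that also stores and calls the taken address:

  // Hypothetical extension of the reproducer: the address taken through
  // the using-declaration is an ordinary function pointer to B2::f.
  struct B2 { static int f () { return 42; } };
  struct D2 : B2 { using B2::f; };

  int call_via_pointer (D2 d)
  {
    int (*fp) () = &d.f;   // previously ICEd
    return fp ();
  }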

Successfully tested on x86_64-pc-linux-gnu.

PR c++/109958

gcc/cp/ChangeLog:

* typeck.cc (build_class_member_access_expr): Handle single OVERLOADs.
(cp_build_addr_expr_1): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/overload/using6.C: New test.

---
 gcc/cp/typeck.cc   | 5 +
 gcc/testsuite/g++.dg/overload/using6.C | 5 +
 2 files changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/overload/using6.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 1b7a31d32f3..5970ac3d398 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -3025,6 +3025,8 @@ build_class_member_access_expr (cp_expr object, tree 
member,
 know the type of the expression.  Otherwise, we must wait
 until overload resolution has been performed.  */
   functions = BASELINK_FUNCTIONS (member);
+  if (TREE_CODE (functions) == OVERLOAD && OVL_SINGLE_P (functions))
+   functions = OVL_FIRST (functions);
   if (TREE_CODE (functions) == FUNCTION_DECL
  && DECL_STATIC_FUNCTION_P (functions))
type = TREE_TYPE (functions);
@@ -7333,6 +7335,9 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue, 
tsubst_flags_t complain)
 {
   tree fn = BASELINK_FUNCTIONS (TREE_OPERAND (arg, 1));
 
+  if (TREE_CODE (fn) == OVERLOAD && OVL_SINGLE_P (fn))
+   fn = OVL_FIRST (fn);
+
   /* We can only get here with a single static member
 function.  */
   gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
diff --git a/gcc/testsuite/g++.dg/overload/using6.C 
b/gcc/testsuite/g++.dg/overload/using6.C
new file mode 100644
index 000..4f89f68a30f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/using6.C
@@ -0,0 +1,5 @@
+// PR c++/109958
+
+struct B { static int f(); };
+struct D : B { using B::f; };
+void f(D d) { &d.f; }
-- 
2.44.0




Re: [PATCH 4/6] vect: Bind input vectype to lane-reducing operation

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 4:53 PM Feng Xue OS  wrote:
>
> The input vectype is an attribute of lane-reducing operation, instead of
> reduction PHI that it is associated to, since there might be more than one
> lane-reducing operations with different type in a loop reduction chain. So
> bind each lane-reducing operation with its own input type.

OK.

Thanks,
Richard.

> Thanks,
> Feng
> ---
> gcc/
> * tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove 
> parameter
> loop_vinfo. Get input vectype from stmt_info instead of reduction PHI.
> (vect_model_reduction_cost): Remove loop_vinfo argument of call to
> vect_is_emulated_mixed_dot_prod.
> (vect_transform_reduction): Likewise.
> (vectorizable_reduction): Likewise, and bind input vectype to
> lane-reducing operation.
> ---
>  gcc/tree-vect-loop.cc | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 51627c27f8a..20c99f11e9a 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5270,8 +5270,7 @@ have_whole_vector_shift (machine_mode mode)
> See vect_emulate_mixed_dot_prod for the actual sequence used.  */
>
>  static bool
> -vect_is_emulated_mixed_dot_prod (loop_vec_info loop_vinfo,
> -stmt_vec_info stmt_info)
> +vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
>  {
>gassign *assign = dyn_cast (stmt_info->stmt);
>if (!assign || gimple_assign_rhs_code (assign) != DOT_PROD_EXPR)
> @@ -5282,10 +5281,9 @@ vect_is_emulated_mixed_dot_prod (loop_vec_info 
> loop_vinfo,
>if (TYPE_SIGN (TREE_TYPE (rhs1)) == TYPE_SIGN (TREE_TYPE (rhs2)))
>  return false;
>
> -  stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
> -  gcc_assert (reduc_info->is_reduc_info);
> +  gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
>return !directly_supported_p (DOT_PROD_EXPR,
> -   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info),
> +   STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
> optab_vector_mixed_sign);
>  }
>
> @@ -5324,8 +5322,8 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
>if (!gimple_extract_op (orig_stmt_info->stmt, &op))
>  gcc_unreachable ();
>
> -  bool emulated_mixed_dot_prod
> -= vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info);
> +  bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
> +
>if (reduction_type == EXTRACT_LAST_REDUCTION)
>  /* No extra instructions are needed in the prologue.  The loop body
> operations are costed in vectorizable_condition.  */
> @@ -7840,6 +7838,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>  vectype_in = STMT_VINFO_VECTYPE (phi_info);
>STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
>
> +  /* Each lane-reducing operation has its own input vectype, while reduction
> + PHI records the input vectype with least lanes.  */
> +  if (lane_reducing)
> +STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
> +
>enum vect_reduction_type v_reduc_type = STMT_VINFO_REDUC_TYPE (phi_info);
>STMT_VINFO_REDUC_TYPE (reduc_info) = v_reduc_type;
>/* If we have a condition reduction, see if we can simplify it further.  */
> @@ -8366,7 +8369,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>if (single_defuse_cycle || lane_reducing)
>  {
>int factor = 1;
> -  if (vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info))
> +  if (vect_is_emulated_mixed_dot_prod (stmt_info))
> /* Three dot-products and a subtraction.  */
> factor = 4;
>record_stmt_cost (cost_vec, ncopies * factor, vector_stmt,
> @@ -8617,8 +8620,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
> : &vec_oprnds2));
>  }
>
> -  bool emulated_mixed_dot_prod
> -= vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info);
> +  bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
> +
>FOR_EACH_VEC_ELT (vec_oprnds0, i, def0)
>  {
>gimple *new_stmt;
> --
> 2.17.1


[PATCH] [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

2024-05-31 Thread Alexandre Oliva


A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined.  Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.

So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.
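
For illustration, the intended convention inside the headers, and the
workaround on the command line, look like this (a generic sketch; the real
changes are in the hunks below):

  // Generic sketch: libstdc++ headers test _GLIBCXX_CLANG, not __clang__.
  #ifdef _GLIBCXX_CLANG
    // clang-only workaround
  #else
    // regular GCC path; also taken when building with
    //   g++ -D__clang__ -D_GLIBCXX_CLANG=0
    // for system headers that insist on __clang__ being defined.
  #endif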

I've left fast_float and ryu files alone, their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit.

Regstrapping on x86_64-linux-gnu.  Ok to install?

PS: I went for mechanical replacement s/__clang__/_GLIBCXX_CLANG/g which
made c++config slightly more complicated, but I'm open to making
_GLIBCXX_CLANG be defined unconditionally, using nonzero tests for it
elsewhere, if that's preferred.  I figured this would be easier to
validate: I only had to check that the modified headers used other
c++config-defined macros.


for  libstdc++-v3/ChangeLog

* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
* include/bits/locale_facets_nonio.tcc: Test for it.
* include/bits/stl_bvector.h: Likewise.
* include/c_compatibility/stdatomic.h: Likewise.
* include/experimental/bits/simd.h: Likewise.
* include/experimental/bits/simd_builtin.h: Likewise.
* include/experimental/bits/simd_detail.h: Likewise.
* include/experimental/bits/simd_x86.h: Likewise.
* include/experimental/simd: Likewise.
* include/std/complex: Likewise.
* include/std/ranges: Likewise.
* include/std/variant: Likewise.
* include/pstl/pstl_config.h: Likewise, when defined(__GLIBCXX__).
---
 libstdc++-v3/include/bits/c++config|   13 -
 libstdc++-v3/include/bits/locale_facets_nonio.tcc  |2 +-
 libstdc++-v3/include/bits/stl_bvector.h|2 +-
 libstdc++-v3/include/c_compatibility/stdatomic.h   |2 +-
 libstdc++-v3/include/experimental/bits/simd.h  |   14 +++---
 .../include/experimental/bits/simd_builtin.h   |4 ++--
 .../include/experimental/bits/simd_detail.h|8 
 libstdc++-v3/include/experimental/bits/simd_x86.h  |   12 ++--
 libstdc++-v3/include/experimental/simd |2 +-
 libstdc++-v3/include/pstl/pstl_config.h|4 ++--
 libstdc++-v3/include/std/complex   |4 ++--
 libstdc++-v3/include/std/ranges|8 
 libstdc++-v3/include/std/variant   |2 +-
 13 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index b57e3f338e92a..6dca2d9467aa5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -481,9 +481,20 @@ _GLIBCXX_END_NAMESPACE_VERSION
 // Define if compatibility should be provided for -mlong-double-64.
 #undef _GLIBCXX_LONG_DOUBLE_COMPAT
 
+// Use an alternate macro to test for clang, so as to provide an easy
+// workaround for systems (such as vxworks) whose headers require
+// __clang__ to be defined, even when compiling with GCC.
+#if !defined _GLIBCXX_CLANG && defined __clang__
+# define _GLIBCXX_CLANG __clang__
+// Turn -D_GLIBCXX_CLANG=0 into -U_GLIBCXX_CLANG, so that
+// _GLIBCXX_CLANG can be tested as defined, just like __clang__.
+#elif !_GLIBCXX_CLANG
+# undef _GLIBCXX_CLANG
+#endif
+
 // Define if compatibility should be provided for alternative 128-bit long
 // double formats. Not possible for Clang until __ibm128 is supported.
-#ifndef __clang__
+#ifndef _GLIBCXX_CLANG
 #undef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
 #endif
 
diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc 
b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 8f67be5a6147b..72136f42f0866 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -1465,7 +1465,7 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
   ctype<_CharT> const& __ctype = use_facet >(__loc);
   __err = ios_base::goodbit;
   bool __use_state = false;
-#if __GNUC__ >= 5 && !defined(__clang__)
+#if __GNUC__ >= 5 && !defined(_GLIBCXX_CLANG)
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wpmf-conversions"
   // Nasty hack.  The C++ standard mandates that get invokes the do_get
diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index d567e26f4e430..52153cadf8f70 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -185,7 +185,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 void
 _M_assume_normalized() const
 {
-#if __has_attribute(__assume__) && !defined(__clang__)
+#if __has_attribute

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Segher Boessenkool
On Fri, May 31, 2024 at 09:14:21AM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > Hi!
> >
> > On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote:
> >> Code is implemented with pure virtual functions to interface with target
> >> code.
> >
> > It's not a pure function.  A pure function -- by definition -- has no
> > side effects.  These things have side effects.
> >
> > What you mean is this is *an implementation* for C++ functions without
> > a generic implementation.  An obfuscation some people (like me) would
> > say.  But please call things what they are!  So not "pure function".
> > That has a meaning, and this isn't it.
> 
> "pure virtual function" is an established term.  The "pure" modifies
> "virtual", not "function".
> 
> The description is correct because the patch adds pure virtual functions
> to the base class and expects the derived class to override and implement
> them.

But this code -- the architecture implementation! -- certainly does
*not* add abstract functions: it should provide an implementation for
them, instead.  So the commit message is completely misleading :-(

And no, it is not an established term, not outside of the C++ world.
Which GCC agreed *not* to dive too much into.  You can only sanely write
most of compilers -- just like anything where algorithmics matter -- in
an imperative, procedural language.  Obfuscation (like action at a
distance like this, where the meaning of a program depends hugely on
knowledge of other parts of the program, far away!) is the devil.

Reading the program is at least 1000x more important than writing it.
Writing is easy.  Reading and understanding, not so much.

> >>* config/aarch64/aarch64-ldp-fusion.cc: Add target specific
> >>implementation of additional virtual functions added in pair_fusion
> >>struct.
> >
> > This does not belong in this patch.  Do not send "rs6000" patches that
> > touch anything outside of config/rs6000/ and similar, certainly not in
> > config/something-else/!
> >
> > This would be WAY easier to review (read: AT ALL POSSIBLE) if you
> > included some detailed rationale and design document.
> 
> Please don't shout.
> 
> I don't think this kind of aggressive review is helpful to the project.

And I don't think wasting people's time is helpful.

I don't shout.  If you read that as shouting, that's not my problem.

It is EMPHASIS (there are no small caps in email).  Do you prefer *fat
print* for that?  Or /slanted/?  Or **italics**?  _Underlined_ perhaps?


Segher


Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Ajit Agarwal  writes:
> On 31/05/24 3:23 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> Hello All:
>>>
>>> Common infrastructure using generic code for pair mem fusion of different
>>> targets.
>>>
>>> rs6000 target specific code implements virtual functions defined
>>> by generic code.
>>>
>>> Code is implemented with pure virtual functions to interface with target
>>> code.
>>>
>>> Target specific code are added in rs6000-mem-fusion.cc and additional 
>>> virtual
>>> function implementation required for rs6000 are added in 
>>> aarch64-ldp-fusion.cc.
>>>
>>> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>>
>>> aarch64, rs6000, middle-end: Add implementation for different targets for 
>>> pair mem fusion
>>>
>>> Common infrastructure using generic code for pair mem fusion of different
>>> targets.
>>>
>>> rs6000 target specific code implements virtual functions defined
>>> by generic code.
>>>
>>> Code is implemented with pure virtual functions to interface with target
>>> code.
>>>
>>> Target specific code are added in rs6000-mem-fusion.cc and additional 
>>> virtual
>>> function implementation required for rs6000 are added in 
>>> aarch64-ldp-fusion.cc.
>>>
>>> 2024-05-31  Ajit Kumar Agarwal  
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>>> implementation of additional virtual functions added in pair_fusion
>>> struct.
>>> * config/rs6000/rs6000-passes.def: New mem fusion pass
>>> before pass_early_remat.
>>> * config/rs6000/rs6000-mem-fusion.cc: Add new pass.
>>> Add target specific implementation using pure virtual
>>> functions.
>>> * config.gcc: Add new object file.
>>> * config/rs6000/rs6000-protos.h: Add new prototype for mem
>>> fusion pass.
>>> * config/rs6000/t-rs6000: Add new rule.
>>> * rtl-ssa/accesses.h: Moved set_is_live_out_use as public
>>> from private.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.target/powerpc/me-fusion.C: New test.
>>> * g++.target/powerpc/mem-fusion-1.C: New test.
>>> * gcc.target/powerpc/mma-builtin-1.c: Modify test.
>>> ---
>> 
>> This isn't a complete review, just some initial questions & comments
>> about selected parts.
>> 
>>> [...]
>>> +/* Check whether load can be fusable or not.
>>> +   Return true if dependent use is UNSPEC otherwise false.  */
>>> +bool
>>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>>> +{
>>> +  rtx_insn *insn = info->rtl ();
>>> +
>>> +  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
>>> +if (REG_NOTE_KIND (note) == REG_EQUAL
>>> +   || REG_NOTE_KIND (note) == REG_EQUIV)
>>> +  return false;
>> 
>> It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
>> note.  What's the reason for doing this?  Are you trying to avoid
>> fusing pairs before reload that are equivalent to a MEM (i.e. have
>> a natural spill slot)?  I think Alex hit a similar situation.
>> 
>
> We have used the above check because of some SPEC benchmarks failing with
> with MEM pairs having REG_EQUAL/EQUIV notes.
>
> By adding the checks the benchmarks passes and also it improves the
> performance.
>
> This checks were added during initial implementation of pair fusion
> pass.
>
> I will investigate further if this check is still required or not.

Thanks.  If it does affect SPEC results, it would be good to look
at the underlying reason, as a justification for the check.

AIUI, the case Alex hit was due to the way that the RA recognises:

  (set (reg R) (mem address-of-a-stack-variable))
REG_EQUIV: (mem address-of-a-stack-variable)

where the REG_EQUIV is either explicit or detected by the RA.
If R needs to be spilled, it can then be spilled to its existing
location on the stack.  And if R needs to be spilled in the
instruction above (because of register pressure before the first
use of R), the RA is able to delete the instruction.

But if that is the reason, the condition should be restricted
to cases in which the note is a memory.
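Something along the lines of the following untested sketch, as a drop-in
for the loop quoted above:

  /* Untested sketch: only punt when the note value is a MEM, i.e. when the
     register has a known memory equivalent that the RA could reuse as a
     spill slot.  */
  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
    if ((REG_NOTE_KIND (note) == REG_EQUAL
         || REG_NOTE_KIND (note) == REG_EQUIV)
        && MEM_P (XEXP (note, 0)))
      return false;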

I think Alex had tried something similar and found that it wasn't
always effective.

> [...]
>>> +  && info->is_real ())
>>> + {
>>> +   rtx_insn *rtl_insn = info->rtl ();
>>> +   rtx set = single_set (rtl_insn);
>>> +
>>> +   if (set == NULL_RTX)
>>> + return false;
>>> +
>>> +   rtx op0 = SET_SRC (set);
>>> +   if (GET_CODE (op0) != UNSPEC)
>>> + return false;
>> 
>> What's the motivation for rejecting unspecs?  It's unusual to treat
>> all unspecs as a distinct group.
>> 
>> Also, using single_set means that the function still lets through
>> parallels of two sets in which the sources are unspecs.  Is that
>> intentional?
>> 
>> The reasons behind things like the REG_EQUAL/EQUIV and UNSPEC decisions
>> need to be described in comments, so that other people coming to t

Re: [PATCH] [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

2024-05-31 Thread Jonathan Wakely

On 31/05/24 11:07 -0300, Alexandre Oliva wrote:


A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined.  Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.

So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.

I've left fast_float and ryu files alone, their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit.

Regstrapping on x86_64-linux-gnu.  Ok to install?

PS: I went for mechanical replacement s/__clang__/_GLIBCXX_CLANG/g which
made c++config slightly more complicated, but I'm open to making
_GLIBCXX_CLANG be defined unconditionally, using nonzero tests for it
elsewhere, if that's preferred.  I figured this would be easier to
validate: I only had to check that the modified headers used other
c++config-defined macros.


I think I prefer your approach to defining it unconditionally to zero
or one.

Comments inline below ...


for  libstdc++-v3/ChangeLog

* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
* include/bits/locale_facets_nonio.tcc: Test for it.
* include/bits/stl_bvector.h: Likewise.
* include/c_compatibility/stdatomic.h: Likewise.
* include/experimental/bits/simd.h: Likewise.
* include/experimental/bits/simd_builtin.h: Likewise.
* include/experimental/bits/simd_detail.h: Likewise.
* include/experimental/bits/simd_x86.h: Likewise.
* include/experimental/simd: Likewise.
* include/std/complex: Likewise.
* include/std/ranges: Likewise.
* include/std/variant: Likewise.
* include/pstl/pstl_config.h: Likewise, when defined(__GLIBCXX__).
---
libstdc++-v3/include/bits/c++config|   13 -
libstdc++-v3/include/bits/locale_facets_nonio.tcc  |2 +-
libstdc++-v3/include/bits/stl_bvector.h|2 +-
libstdc++-v3/include/c_compatibility/stdatomic.h   |2 +-
libstdc++-v3/include/experimental/bits/simd.h  |   14 +++---
.../include/experimental/bits/simd_builtin.h   |4 ++--
.../include/experimental/bits/simd_detail.h|8 
libstdc++-v3/include/experimental/bits/simd_x86.h  |   12 ++--
libstdc++-v3/include/experimental/simd |2 +-
libstdc++-v3/include/pstl/pstl_config.h|4 ++--
libstdc++-v3/include/std/complex   |4 ++--
libstdc++-v3/include/std/ranges|8 
libstdc++-v3/include/std/variant   |2 +-
13 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index b57e3f338e92a..6dca2d9467aa5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -481,9 +481,20 @@ _GLIBCXX_END_NAMESPACE_VERSION
// Define if compatibility should be provided for -mlong-double-64.
#undef _GLIBCXX_LONG_DOUBLE_COMPAT

+// Use an alternate macro to test for clang, so as to provide an easy
+// workaround for systems (such as vxworks) whose headers require
+// __clang__ to be defined, even when compiling with GCC.
+#if !defined _GLIBCXX_CLANG && defined __clang__
+# define _GLIBCXX_CLANG __clang__
+// Turn -D_GLIBCXX_CLANG=0 into -U_GLIBCXX_CLANG, so that
+// _GLIBCXX_CLANG can be tested as defined, just like __clang__.
+#elif !_GLIBCXX_CLANG
+# undef _GLIBCXX_CLANG
+#endif
+
// Define if compatibility should be provided for alternative 128-bit long
// double formats. Not possible for Clang until __ibm128 is supported.
-#ifndef __clang__
+#ifndef _GLIBCXX_CLANG
#undef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
#endif

diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc 
b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 8f67be5a6147b..72136f42f0866 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -1465,7 +1465,7 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
  ctype<_CharT> const& __ctype = use_facet >(__loc);
  __err = ios_base::goodbit;
  bool __use_state = false;
-#if __GNUC__ >= 5 && !defined(__clang__)
+#if __GNUC__ >= 5 && !defined(_GLIBCXX_CLANG)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpmf-conversions"
  // Nasty hack.  The C++ standard mandates that get invokes the do_get
diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index d567e26f4e430..52153cadf8f70 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -185,7 +185,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER

Re: [PATCH 5/6] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-31 Thread Richard Biener
On Thu, May 30, 2024 at 4:55 PM Feng Xue OS  wrote:
>
> For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current
> vectorizer could only handle the pattern if the reduction chain does not
> contain other operation, no matter the other is normal or lane-reducing.
>
> Actually, to allow multiple arbitrary lane-reducing operations, we need to
> support vectorization of loop reduction chain with mixed input vectypes. Since
> lanes of vectype may vary with operation, the effective ncopies of vectorized
> statements for operation also may not be same to each other, this causes
> mismatch on vectorized def-use cycles. A simple way is to align all operations
> with the one that has the most ncopies, the gap could be complemented by
> generating extra trival pass-through copies. For example:
>
>int sum = 0;
>for (i)
>  {
>sum += d0[i] * d1[i];  // dot-prod 
>sum += w[i];   // widen-sum 
>sum += abs(s0[i] - s1[i]); // sad 
>sum += n[i];   // normal 
>  }
>
> The vector size is 128-bit,vectorization factor is 16. Reduction statements
> would be transformed as:
>
>vector<4> int sum_v0 = { 0, 0, 0, 0 };
>vector<4> int sum_v1 = { 0, 0, 0, 0 };
>vector<4> int sum_v2 = { 0, 0, 0, 0 };
>vector<4> int sum_v3 = { 0, 0, 0, 0 };
>
>for (i / 16)
>  {
>sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
>sum_v1 = sum_v1;  // copy
>sum_v2 = sum_v2;  // copy
>sum_v3 = sum_v3;  // copy
>
>sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
>sum_v1 = sum_v1;  // copy
>sum_v2 = sum_v2;  // copy
>sum_v3 = sum_v3;  // copy
>
>sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
>sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
>sum_v2 = sum_v2;  // copy
>sum_v3 = sum_v3;  // copy
>
>sum_v0 += n_v0[i: 0  ~ 3 ];
>sum_v1 += n_v1[i: 4  ~ 7 ];
>sum_v2 += n_v2[i: 8  ~ 11];
>sum_v3 += n_v3[i: 12 ~ 15];
>  }
>
> Thanks,
> Feng
> ---
> gcc/
> PR tree-optimization/114440
> * tree-vectorizer.h (vectorizable_lane_reducing): New function
> declaration.
> * tree-vect-stmts.cc (vect_analyze_stmt): Call new function
> vectorizable_lane_reducing to analyze lane-reducing operation.
> * tree-vect-loop.cc (vect_model_reduction_cost): Remove cost 
> computation
> code related to emulated_mixed_dot_prod.
> (vectorizable_lane_reducing): New function.
> (vectorizable_reduction): Allow multiple lane-reducing operations in
> loop reduction. Move some original lane-reducing related code to
> vectorizable_lane_reducing.
> (vect_transform_reduction): Extend transformation to support reduction
> statements with mixed input vectypes.
>
> gcc/testsuite/
> PR tree-optimization/114440
> * gcc.dg/vect/vect-reduc-chain-1.c
> * gcc.dg/vect/vect-reduc-chain-2.c
> * gcc.dg/vect/vect-reduc-chain-3.c
> * gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
> * gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
> * gcc.dg/vect/vect-reduc-dot-slp-1.c
> ---
>  .../gcc.dg/vect/vect-reduc-chain-1.c  |  62 +++
>  .../gcc.dg/vect/vect-reduc-chain-2.c  |  77 +++
>  .../gcc.dg/vect/vect-reduc-chain-3.c  |  66 +++
>  .../gcc.dg/vect/vect-reduc-chain-dot-slp-1.c  |  97 
>  .../gcc.dg/vect/vect-reduc-chain-dot-slp-2.c  |  81 +++
>  .../gcc.dg/vect/vect-reduc-dot-slp-1.c|  35 ++
>  gcc/tree-vect-loop.cc | 478 --
>  gcc/tree-vect-stmts.cc|   2 +
>  gcc/tree-vectorizer.h |   2 +
>  9 files changed, 755 insertions(+), 145 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
> new file mode 100644
> index 000..04bfc419dbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
> @@ -0,0 +1,62 @@
> +/* Disabling epilogues until we find a better way to deal with scans.  */
> +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { 
> aarch64*-*-* || arm*-*-* } } } */
> +/* { dg-add-options arm_v8_2a_dotprod_neon }  */
> +
> +#include "tree-vect.h"
> +
> +#define N 50
> +
> +#ifndef SIGNEDNESS_1
> +#define SIGNEDNESS_

Re: [PATCH] fix PowerPC < 7 w/ Altivec not to default to power7

2024-05-31 Thread Rene Rebe
Hi Kewen,

thank you for your reply.

> on 2024/3/8 19:33, Rene Rebe wrote:
> > This might not be the best timing -short before a major release-,
> > however, Sam just commented on the bug I filled years ago [1], so here
> > we go:
> > 
> > Glibc uses .machine to determine assembler optimizations to use.
> > However, since reworking the rs6000 .machine output selection in
> > commit e154242724b084380e3221df7c08fcdbd8460674 22 May 2019, G5 as
> > well as Cell, and even power4 w/ -maltivec currently resulted in
> > power7. Mask _ALTIVEC away as the .machine selection already did for
> > GFX and GPOPT.
> 
> Thanks for fixing, this fix looks reasonable to me, OPTION_MASK_ALTIVEC
> is a part of POWERPC_7400_MASK so any specified cpu type which has this
> POWERPC_7400_MASK by default and isn't handled early in function
> rs6000_machine_from_flags can suffer from this issue.
> 
> > 
> > powerpc64-t2-linux-gnu-gcc  test.c -S -o - -mcpu=G5
> > .file   "test.c"
> > .machine power7
> > .abiversion 2
> > .section".text"
> > .ident  "GCC: (GNU) 10.2.0"
> > .section.note.GNU-stack,"",@progbits
> > 
> 
> Nit: Could you also add one test case for this?
> 
> btw, -mdejagnu-cpu=G5 can force the cpu type in dg-options.

It took me a while to allocate enough time to study dejagnu and write
a suitable test; I hope this suits your needs:

--- ./gcc/testsuite/gcc.target/powerpc/pr97367.c.vanilla2024-05-30 
18:26:29.839784279 +0200
+++ ./gccc/testsuite/gcc.target/powerpc/pr97367.c   2024-05-30 
18:20:34.873818482 +0200
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-S -mcpu=G5" } */
+
+/* { dg-final { scan-assembler "power4" } } */

I double checked it works and fails as expected.

> > We ship this in T2/Linux [2] since 2020 and it is tested on G5, Cell
> > and Power8.
> > 
> > Signed-of-by: René Rebe 
> > 
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97367
> > [2] https://t2sde.org
> > 
> > --- gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc.vanilla  
> > 2021-04-25 22:57:16.964223106 +0200
> > +++ gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc  2021-04-25 
> > 22:57:27.193223841 +0200
> > @@ -5765,7 +5765,7 @@
> >HOST_WIDE_INT flags = rs6000_isa_flags;
> >  
> >/* Disable the flags that should never influence the .machine selection. 
> >  */
> > -  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
> > OPTION_MASK_ISEL);
> > +  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
> > OPTION_MASK_ALTIVEC | OPTION_MASK_ISEL);
> 
> Nit: This line is too long and needs re-format.

While I don't really find ~100 chars too long for modern standards,
I'm happy to line break that for you once the above test is approved.

Thank you so much,

  René

-- 
  René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
  https://exactcode.com | https://t2sde.org | https://rene.rebe.de


[PATCH] ifcvt: Clarify if_info.original_cost.

2024-05-31 Thread Robin Dapp
Hi,

before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COST_N_INSNS (2) - a compare and a jump.
This patch computes the jump costs via
  insn_cost (if_info.jump, ...)
which is supposed to incorporate the branch costs and, in case of a CC
comparison,
  pattern_cost (if_info.cond, ...)
which is supposed to account for the CC creation.

For compare_and_jump patterns insn_cost should have already computed
the right cost.

Does this "split" make sense, generally?

Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
on riscv.

Regards
 Robin

gcc/ChangeLog:

* ifcvt.cc (noce_process_if_block): Subtract condition pattern
cost if applicable.
(noce_find_if_block): Use insn_cost and pattern_cost for
original cost.
---
 gcc/ifcvt.cc | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..305b9faed38 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3940,7 +3940,9 @@ noce_process_if_block (struct noce_if_info *if_info)
  ??? Actually, instead of the branch instruction costs we might want
  to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
 
-  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned potential_cost = if_info->original_cost;
+  if (cc_in_cond (if_info->cond))
+potential_cost -= pattern_cost (if_info->cond, if_info->speed_p);
   unsigned old_cost = if_info->original_cost;
   if (!else_bb
   && HAVE_conditional_move
@@ -4703,11 +4705,13 @@ noce_find_if_block (basic_block test_bb, edge 
then_edge, edge else_edge,
 = targetm.max_noce_ifcvt_seq_cost (then_edge);
   /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
  that they are valid to transform.  We can't easily get back to the insn
- for COND (and it may not exist if we had to canonicalize to get COND),
- and jump_insns are always given a cost of 1 by seq_cost, so treat
- both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+ for COND (and it may not exist if we had to canonicalize to get COND).
+ Here we assume one CC compare insn (if the target uses CC) and one
+ jump insn that is costed via insn_cost.  It is assumed that the
+ costs of a jump insn are dependent on the branch costs.  */
+  if (cc_in_cond (if_info.cond))
+if_info.original_cost = pattern_cost (if_info.cond, if_info.speed_p);
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
 
   /* Do the real work.  */
 
-- 
2.45.1


[PATCH] RISC-V: Add min/max patterns for ifcvt.

2024-05-31 Thread Robin Dapp
Hi,

ifcvt likes to emit

(set (reg 0)
  (if_then_else (ge (reg 1) (reg 2))
    (reg 1)
    (reg 2)))

which can be recognized as min/max patterns in the backend.
This patch adds such patterns and the respective iterators as well as a
test.
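
For reference (a sketch, not taken from the patch), the C-level equivalent of
that if_then_else is just a conditional select of the larger value:

  /* Illustration only: the semantics of (if_then_else (ge a b) a b).  */
  int
  max_like (int a, int b)
  {
    return a >= b ? a : b;
  }

(A simple ternary like this is usually folded to MAX_EXPR much earlier; the
new test uses loops so that, AIUI, the selection is left for RTL ifcvt.)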

This depends on the generic ifcvt change.
Regtested on rv64gcv_zvfh_zicond_zbb_zvbb. 

Regards
 Robin

gcc/ChangeLog:

* config/riscv/bitmanip.md (*_cmp_3):
New min/max ifcvt pattern.
* config/riscv/iterators.md (minu): New iterator.
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Remove riscv-specific adjustment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-04.c: New test.
---
 gcc/config/riscv/bitmanip.md  | 13 +
 gcc/config/riscv/iterators.md |  8 
 gcc/config/riscv/riscv.cc |  3 --
 .../gcc.target/riscv/zbb-min-max-04.c | 47 +++
 4 files changed, 68 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..11102985796 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -547,6 +547,19 @@ (define_insn "*3"
   "\t%0,%1,%z2"
   [(set_attr "type" "")])
 
+;; Provide a minmax pattern for ifcvt to match.
+(define_insn "*_cmp_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (if_then_else:X
+   (bitmanip_minmax_cmp_op
+   (match_operand:X 1 "register_operand" "r")
+   (match_operand:X 2 "register_operand" "r"))
+   (match_dup 1)
+   (match_dup 2)))]
+  "TARGET_ZBB"
+  "\t%0,%1,%z2"
+  [(set_attr "type" "")])
+
 ;; Optimize the common case of a SImode min/max against a constant
 ;; that is safe both for sign- and zero-extension.
 (define_insn_and_split "*minmax"
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 8a9d1986b4a..2f7be6e83c1 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -202,6 +202,14 @@ (define_code_iterator bitmanip_bitwise [and ior])
 
 (define_code_iterator bitmanip_minmax [smin umin smax umax])
 
+(define_code_iterator bitmanip_minmax_cmp_op [lt ltu le leu ge geu gt gtu])
+
+; Map a comparison operator to a min or max.
+(define_code_attr bitmanip_minmax_cmp_insn [(lt "min") (ltu "minu")
+   (le "min") (leu "minu")
+   (ge "max") (geu "maxu")
+   (gt "max") (gtu "maxu")])
+
 (define_code_iterator clz_ctz_pcnt [clz ctz popcount])
 
 (define_code_iterator bitmanip_rotate [rotate rotatert])
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 13cd61a4a22..d17c0a260a2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4009,9 +4009,6 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq,
 {
   struct noce_if_info riscv_if_info = *if_info;
 
-  riscv_if_info.original_cost -= COSTS_N_INSNS (2);
-  riscv_if_info.original_cost += insn_cost (if_info->jump, if_info->speed_p);
-
   /* Hack alert!  When `noce_try_store_flag_mask' uses `cstore4'
  to emit a conditional set operation on DImode output it comes up
  with a sequence such as:
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
new file mode 100644
index 000..ebf1889075d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zicond_zbb -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-finline-functions" "-funroll-loops" 
"-ftracer" } } */
+
+typedef int move_s;
+
+int
+remove_one_fast (int *move_ordering, const int num_moves, int mark)
+{
+  int i, best = -100;
+  int tmp = 0;
+
+  for (i = mark; i < num_moves; i++)
+{
+  if (move_ordering[i] > best)
+{
+  best = move_ordering[i];
+  tmp = i;
+}
+}
+
+  return tmp;
+}
+
+/* { dg-final { scan-assembler-times "max\t" 1 } }  */
+/* { dg-final { scan-assembler-times "czero.nez" 2 } }  */
+/* { dg-final { scan-assembler-times "czero.eqz" 2 } }  */
+
+int
+remove_one_fast2 (int *move_ordering, const int num_moves, int mark)
+{
+  int i, best = -100;
+  int tmp = 0;
+
+  for (i = mark; i < num_moves; i++)
+{
+  if (move_ordering[i] < best)
+{
+  best = move_ordering[i];
+  tmp = i;
+}
+}
+
+  return tmp;
+}
+
+/* { dg-final { scan-assembler-times "min\t" 1 } }  */
-- 
2.45.1


Re: [Patch, aarch64, middle-end\ v4: Move pair_fusion pass from aarch64 to middle-end

2024-05-31 Thread Marc Poulhiès
Hello,

I can't bootstrap using gcc 5.5 since this change. It fails with:

.../gcc/pair-fusion.cc: In member function ‘bool 
pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, 
rtl_ssa::in
sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’:
.../gcc/pair-fusion.cc:1790:40: error: ‘writeback’ is not a class, namespace, 
or enumeration
   if (m_pass->should_handle_writeback (writeback::ALL)
^
Is it possible that C++11 enum classes are not correctly supported in
older GCC?

Thanks,
Marc


[PATCH] testsuite: Improve check-function-bodies

2024-05-31 Thread Wilco Dijkstra
Improve check-function-bodies by allowing single-character function names.
Also skip '#' comments which may be emitted from inline assembler.

Passes regress, OK for commit?

gcc/testsuite:
* lib/scanasm.exp (configure_check-function-bodies): Allow single-char
function names.  Skip '#' comments.

---

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 
6cf9997240deec274a191103d21690d80e34ba95..0e461ef260b7a6fee5a9c60d0571e46468f752c0
 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -869,15 +869,15 @@ proc configure_check-function-bodies { config } {
 # Regexp for the start of a function definition (name in \1).
 if { [istarget nvptx*-*-*] } {
set up_config(start) {
-   {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
+   {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S*)$}
}
 } elseif { [istarget *-*-darwin*] } {
set up_config(start) {
-   {^_([a-zA-Z_]\S+):$}
+   {^_([a-zA-Z_]\S*):$}
{^LFB[0-9]+:}
}
 } else {
-   set up_config(start) {{^([a-zA-Z_]\S+):$}}
+   set up_config(start) {{^([a-zA-Z_]\S*):$}}
 }
 
 # Regexp for the end of a function definition.
@@ -899,9 +899,9 @@ proc configure_check-function-bodies { config } {
 } else {
# Skip lines beginning with labels ('.L[...]:') or other directives
# ('.align', '.cfi_startproc', '.quad [...]', '.text', etc.), '//' or
-   # '@' comments ('-fverbose-asm' or ARM-style, for example), or empty
-   # lines.
-   set up_config(fluff) {^\s*(?:\.|//|@|$)}
+   # '@' or '#' comments ('-fverbose-asm' or ARM-style, for example), or
+   # empty lines.
+   set up_config(fluff) {^\s*(?:\.|//|@|#|$)}
 }
 
 # Regexp for expected output lines prefix.



[PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Wilco Dijkstra

Add __ARM_FEATURE_MOPS predefine.  Add support for ACLE __arm_mops_memset_tag.

Passes regress, OK for commit?

gcc:
* config/aaarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Add __ARM_FEATURE_MOPS predefine.
* config/aarch64/arm_acle.h: Add __arm_mops_memset_tag().

gcc/testsuite:
* gcc.target/aarch64/acle/memtag_5.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 
fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..884a7ba5d10b58fbe182a765041cf80bdaec9615
 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -260,6 +260,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
   aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
   aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
+  aarch64_def_or_undef (TARGET_MOPS, "__ARM_FEATURE_MOPS", pfile);
 
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 
2aa681090fa205449cf1ac63151565f960716189..22ee4b211a55ca6537a1d9e3bf4dad09585071fb
 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -344,6 +344,21 @@ __rndrrs (uint64_t *__res)
 
 #pragma GCC pop_options
 
+#if defined (__ARM_FEATURE_MOPS) && defined (__ARM_FEATURE_MEMORY_TAGGING)
+__extension__ extern __inline void *
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_mops_memset_tag (void *__ptr, int __val, size_t __size)
+{
+  void *__ptr2 = __ptr;
+  __asm volatile ("setgp\t[%0]!, %1!, %x2\n\t"
+ "setgm\t[%0]!, %1!, %x2\n\t"
+ "setge\t[%0]!, %1!, %x2"
+ : "+r" (__ptr2), "+r" (__size)
+ : "rZ" (__val) : "cc", "memory");
+  return __ptr;
+}
+#endif
+
 #define __arm_rsr(__regname) \
   __builtin_aarch64_rsr (__regname)
 
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_5.c 
b/gcc/testsuite/gcc.target/aarch64/acle/memtag_5.c
new file mode 100644
index 
..79ba1eb39d7c6d577fbe98a3285f8cc618428823
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.8-a+memtag -O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include "arm_acle.h"
+
+#ifndef __ARM_FEATURE_MOPS
+# error __ARM_FEATURE_MOPS not defined!
+#endif
+
+/*
+** set_tag:
+** mov (x[0-9]+), x0
+** setgp   \[\1\]\!, x1\!, xzr
+** setgm   \[\1\]\!, x1\!, xzr
+** setge   \[\1\]\!, x1\!, xzr
+** ret
+*/
+void *set_tag (void *p, size_t size)
+{
+  return __arm_mops_memset_tag (p, 0, size);
+}




Re: [PATCH] [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

2024-05-31 Thread Alexandre Oliva
On May 31, 2024, Jonathan Wakely  wrote:

> On 31/05/24 11:07 -0300, Alexandre Oliva wrote:
>> --- a/libstdc++-v3/include/pstl/pstl_config.h
[...]
>> -#if defined(__clang__)
>> +#if defined(__GLIBCXX__) ? defined(_GLIBCXX_CLANG) : defined(__clang__)

> This file is also imported from upstream, like Ryu and fast_float.

Oh, yeah, I should have mentioned this one in the proposed commit
message.

The problem here was that it wasn't clear c++config would always be
included, so I figured I'd better be conservative.

> I don't think having a "spurious" definition of _PSTL_CLANG_VERSION
> here actually matters.

Yeah, no, it's the other macros guarded by __clang__ that I'm concerned
about, and since the version macro could replace it, I went for it.

> So either don't change this line at all, or just do a simple
> s/__clang__/_GLIBCXX_CLANG/

If c++config can be counted on, I'd be happy to do that, but I couldn't
tell that it could.

> Does the vxworks toolchain need to support the PSTL headers?

Maybe we can do without them.  I really don't know.  Olivier?

> If not, we could just ignore this file, so the local changes don't
> need to be re-applied when we import a new version of the header from
> upstream.

That sounds desirable indeed.  This change is supposed to spare us
(AdaCore) from precisely this sort of trouble, so it wouldn't be fair if
it made this very kind of trouble for our upstream.

>> --- a/libstdc++-v3/include/std/ranges

>> -#ifdef __clang__ // LLVM-61763 workaround
>> +#ifdef _GLIBCXX_CLANG // LLVM-61763 workaround

> This one doesn't matter, since making these members public for a "fake
> clang" doesn't really hurt anything. For consistency maybe it makes
> sense to use _GLIBCXX_CLANG anyway.

Yeah, uniformity would be good to simplify checking for no new
appearances of __clang__, and to set the example to avoid accidental
additions thereof.

>> --- a/libstdc++-v3/include/std/variant

>> -#if defined(__clang__) && __clang_major__ <= 7
>> +#if defined(_GLIBCXX_CLANG) && __clang_major__ <= 7

> I think we could drop this kluge entirely, clang 7 is old now, we
> generally only support the most recent 3 or 4 clang versions.

Fine with me, but I'd do that in a separate later patch, so that this
goes in, and if it gets backported, it will cover this change, rather
than miss it.  Though, as you say, it doesn't matter much either way.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

2024-05-31 Thread Matthias Kretz
So __clang__ is turning into the next __GNUC__ ;)

In essence, inside libstdc++ code, __clang__ is going to mean a Clang-compatible
compiler and _GLIBCXX_CLANG then means it's the actual Clang compiler (or a
vendor variant like Apple's). Would it make sense to adjust the Apple clang
major version then so that it matches the LLVM major versions?
(e.g. https://stackoverflow.com/a/78014367)

More inline.

On Friday, 31 May 2024 16:50:26 GMT+2 Jonathan Wakely wrote:
> >diff --git a/libstdc++-v3/include/bits/c++config
> >b/libstdc++-v3/include/bits/c++config index b57e3f338e92a..6dca2d9467aa5
> >100644
> >--- a/libstdc++-v3/include/bits/c++config
> >+++ b/libstdc++-v3/include/bits/c++config
> >@@ -481,9 +481,20 @@ _GLIBCXX_END_NAMESPACE_VERSION
> >
> > // Define if compatibility should be provided for -mlong-double-64.
> > #undef _GLIBCXX_LONG_DOUBLE_COMPAT
> >
> >+// Use an alternate macro to test for clang, so as to provide an easy
> >+// workaround for systems (such as vxworks) whose headers require
> >+// __clang__ to be defined, even when compiling with GCC.
> >+#if !defined _GLIBCXX_CLANG && defined __clang__
> >+# define _GLIBCXX_CLANG __clang__
> >+// Turn -D_GLIBCXX_CLANG=0 into -U_GLIBCXX_CLANG, so that
> >+// _GLIBCXX_CLANG can be tested as defined, just like __clang__.
> >+#elif !_GLIBCXX_CLANG
> >+# undef _GLIBCXX_CLANG
> >+#endif
> >+

So passing -D__clang__=17 -D_GLIBCXX_CLANG=1 is possible? Should it be?

> >diff --git a/libstdc++-v3/include/experimental/bits/simd.h
> >b/libstdc++-v3/include/experimental/bits/simd.h index
> >7c52462571902..ea034138fd720 100644
> >--- a/libstdc++-v3/include/experimental/bits/simd.h
> >+++ b/libstdc++-v3/include/experimental/bits/simd.h
> >@@ -606,7 +606,7 @@ template 
> >
> > static_assert(_Bytes > 0);
> > if constexpr (_Bytes == sizeof(int))
> > 
> >   return int();
> >
> >-  #ifdef __clang__
> >+  #ifdef _GLIBCXX_CLANG
> 
> I'd like to hear from Matthias to know he's OK with the simd changes.

Yes, LGTM. There's no maintained upstream copy of simd anymore. And all 
occurrences in the simd code are asking for the actual Clang compiler.

-Matthias


-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──




[PATCH v2] c++/modules: Fix revealing with using-decls [PR114867]

2024-05-31 Thread Nathaniel Shead
On Tue, May 28, 2024 at 02:57:09PM -0400, Jason Merrill wrote:
> On 5/26/24 09:01, Nathaniel Shead wrote:
> > Is this approach OK?  Alternatively I suppose we could do a deep copy of
> > the overload list when this occurs to ensure we don't affect existing
> > referents, would that be preferable?
> 
> This strategy makes sense, but I have other concerns:
> 
> > Bootstrapped and regtested (so far just modules.exp) on
> > x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?
> > 
> > -- >8 --
> > 
> > Doing 'remove_node' here is not safe, because it not only mutates the
> > OVERLOAD we're walking over but potentially any other references to this
> > OVERLOAD that are cached from phase-1 template lookup.  This causes the
> > attached testcase to fail because the overload set in X::test no longer
> > contains the 'ns::foo' template once instantiated at the end of the
> 
> It looks like ns::foo has been renamed to just f in the testcase.
> 

Whoops, fixed.

> > file.
> > 
> > This patch works around this by simply not removing the old declaration.
> > This does make the overload list potentially longer than it otherwise
> > would have been, but only when re-exporting the same set of functions in
> > a using-decl.  Additionally, because 'ovl_insert' always prepends these
> > newly inserted overloads, repeated exported using-decls won't continue
> > to add declarations, as the first exported using-decl will be found
> > before the original (unexported) declaration.
> > 
> > PR c++/114867
> > 
> > gcc/cp/ChangeLog:
> > 
> > * name-lookup.cc (do_nonmember_using_decl): Don't remove the
> > existing overload.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/using-17_a.C: New test.
> > * g++.dg/modules/using-17_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/name-lookup.cc | 24 +++---
> >   gcc/testsuite/g++.dg/modules/using-17_a.C | 31 +++
> >   gcc/testsuite/g++.dg/modules/using-17_b.C | 13 ++
> >   3 files changed, 53 insertions(+), 15 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/using-17_a.C
> >   create mode 100644 gcc/testsuite/g++.dg/modules/using-17_b.C
> > 
> > diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> > index f1f8c19feb1..130a0e6b5db 100644
> > --- a/gcc/cp/name-lookup.cc
> > +++ b/gcc/cp/name-lookup.cc
> > @@ -5231,25 +5231,19 @@ do_nonmember_using_decl (name_lookup &lookup, bool 
> > fn_scope_p,
> >   if (new_fn == old_fn)
> > {
> > - /* The function already exists in the current
> > -namespace.  We will still want to insert it if
> > -it is revealing a not-revealed thing.  */
> > + /* The function already exists in the current namespace.  */
> >   found = true;
> > - if (!revealing_p)
> > -   ;
> > - else if (old.using_p ())
> > + if (exporting)
> > {
> > - if (exporting)
> > + if (old.using_p ())
> > /* Update in place.  'tis ok.  */
> > OVL_EXPORT_P (old.get_using ()) = true;
> > - ;
> > -   }
> > - else if (DECL_MODULE_EXPORT_P (new_fn))
> > -   ;
> > - else
> > -   {
> > - value = old.remove_node (value);
> > - found = false;
> > + else if (!DECL_MODULE_EXPORT_P (new_fn))
> > +   /* We need to re-insert this function as an exported
> > +  declaration.  We can't remove the existing decl
> > +  because that will change any overloads cached in
> > +  template functions.  */
> > +   found = false;
> 
> What if we're revealing without exporting?  That is, a using-declaration in
> module purview that isn't exported?  Such a declaration should still prevent
> discarding, which is my understanding of the use of "revealing" here.
> 
> It seems like the current code already gets that wrong for e.g.
> 
> M_1.C:
> module;
>  struct A {};
>  inline int f() { return 42; }
> export module M;
>  using ::A;
>  using ::f;
> 
> M_2.C:
> import M;
>  inline int f();
>  struct A a; // { dg-bogus "incomplete" }
> int main() {
>   return f(); // { dg-bogus "undefined" }
> }
> 
> It looks like part of the problem is that add_binding_entity is only
> interested in exported usings, but I think it should also handle revealing
> ones.
> 
> Jason
> 

Right; I hadn't thought about that.  The cleanest way to solve this I
think is to add a new flag to OVERLOAD to indicate their purviewness,
which we can then use in 'add_binding_entity' instead of the current
reliance on exported usings; this is what I've done in the below patch.
(There aren't any more TREE_LANG_FLAG_?s left so I just picked another
unused flag lying around; alternatively

Re: [PATCH v3 4/6] btf: add -fprune-btf option

2024-05-31 Thread David Faust



On 5/31/24 00:07, Richard Biener wrote:
> On Thu, May 30, 2024 at 11:34 PM David Faust  wrote:
>>
>> This patch adds a new option, -fprune-btf, to control BTF debug info
>> generation.
> 
> Can you name it -gprune-btf instead?

Yes, sure.

I think I followed -feliminate-unused-debug-types, but I see the large
majority of options controlling e.g. DWARF use -g instead.

Happy to change it to -gprune-btf.

> 
>> As the name implies, this option enables a kind of "pruning" of the BTF
>> information before it is emitted.  When enabled, rather than emitting
>> all type information translated from DWARF, only information for types
>> directly used in the source program is emitted.
>>
>> The primary purpose of this pruning is to reduce the amount of
>> unnecessary BTF information emitted, especially for BPF programs.  It is
>> very common for BPF programs to include Linux kernel internal headers in
>> order to have access to kernel data structures.  However, doing so often
>> has the side effect of also adding type definitions for a large number
>> of types which are not actually used by nor relevant to the program.
>> In these cases, -fprune-btf commonly reduces the size of the resulting
>> BTF information by 10x or more, as seen on average when compiling Linux
>> kernel BPF selftests.  This both slims down the size of the resulting
>> object and reduces the time required by the BPF loader to verify the
>> program and its BTF information.
>>
>> Note that the pruning implemented in this patch follows the same rules
>> as the BTF pruning performed unconditionally by LLVM's BPF backend when
>> generating BTF.  In particular, the main sources of pruning are:
>>
>>   1) Only generate BTF for types used by variables and functions at
>>  the file scope.  Note that with or without pruning, BTF_KIND_VAR
>>  entries are only generated for variables present in the final
>>  object - unused static variables or variables completely optimized
>>  away must not have VAR entries in BTF.
>>
>>   2) Avoid emitting full BTF for struct and union types which are only
>>  pointed-to by members of other struct/union types.  In these cases,
>>  the full BTF_KIND_STRUCT or BTF_KIND_UNION which would normally
>>  be emitted is replaced with a BTF_KIND_FWD, as though the
>>  underlying type was a forward-declared struct or union type.
>>
>> gcc/
>> * btfout.cc (btf_used_types): New hash set.
>> (struct btf_fixup): New.
>> (fixups, forwards): New vecs.
>> (btf_output): Calculate num_types depending on flag_prune_btf.
>> (btf_early_finsih): New initialization for flag_prune_btf.
>> (btf_add_used_type): New function.
>> (btf_used_type_list_cb): Likewise.
>> (btf_late_collect_pruned_types): Likewise.
>> (btf_late_add_vars): Handle special case for variables in ".maps"
>> section when generating BTF for BPF CO-RE target.
>> (btf_late_finish): Use btf_late_collect_pruned_types when
>> flag_prune_btf in effect.  Move some initialization to 
>> btf_early_finish.
>> (btf_finalize): Additional deallocation for flag_prune_btf.
>> * common.opt (fprune-btf): New flag.
>> * ctfc.cc (init_ctf_strtable): Make non-static.
>> * ctfc.h (struct ctf_dtdef): Add visited_children_p boolean flag.
>> (init_ctf_strtable, ctfc_delete_strtab): Make extern.
>> * doc/invoke.texi (Debugging Options): Document -fprune-btf.
>>
>> gcc/testsuite/
>> * gcc.dg/debug/btf/btf-prune-1.c: New test.
>> * gcc.dg/debug/btf/btf-prune-2.c: Likewise.
>> * gcc.dg/debug/btf/btf-prune-3.c: Likewise.
>> * gcc.dg/debug/btf/btf-prune-maps.c: Likewise.
>> ---
>>  gcc/btfout.cc | 359 +-
>>  gcc/common.opt|   4 +
>>  gcc/ctfc.cc   |   2 +-
>>  gcc/ctfc.h|   3 +
>>  gcc/doc/invoke.texi   |  20 +
>>  gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c  |  25 ++
>>  gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c  |  33 ++
>>  gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c  |  35 ++
>>  .../gcc.dg/debug/btf/btf-prune-maps.c |  20 +
>>  9 files changed, 494 insertions(+), 7 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-maps.c
>>
>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>> index 32fda14f704b..a7da164f6b31 100644
>> --- a/gcc/btfout.cc
>> +++ b/gcc/btfout.cc
>> @@ -828,7 +828,10 @@ output_btf_types (ctf_container_ref ctfc)
>>  {
>>size_t i;
>>size_t num_types;
>> -  num_types = ctfc->ctfc_types->elements ();
>> +  if (flag_prune_btf)
>> +num_types = max_translated_id;
>> +  else
>>

Re: [PATCH] testsuite: Improve check-function-bodies

2024-05-31 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Improve check-function-bodies by allowing single-character function names.
> Also skip '#' comments which may be emitted from inline assembler.
>
> Passes regress, OK for commit?
>
> gcc/testsuite:
> * lib/scanasm.exp (configure_check-function-bodies): Allow single-char
> function names.  Skip '#' comments.
>
> ---
>
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index 
> 6cf9997240deec274a191103d21690d80e34ba95..0e461ef260b7a6fee5a9c60d0571e46468f752c0
>  100644
> --- a/gcc/testsuite/lib/scanasm.exp
> +++ b/gcc/testsuite/lib/scanasm.exp
> @@ -869,15 +869,15 @@ proc configure_check-function-bodies { config } {
>  # Regexp for the start of a function definition (name in \1).
>  if { [istarget nvptx*-*-*] } {
>   set up_config(start) {
> - {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
> + {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S*)$}
>   }
>  } elseif { [istarget *-*-darwin*] } {
>   set up_config(start) {
> - {^_([a-zA-Z_]\S+):$}
> + {^_([a-zA-Z_]\S*):$}
>   {^LFB[0-9]+:}
>   }
>  } else {
> - set up_config(start) {{^([a-zA-Z_]\S+):$}}
> + set up_config(start) {{^([a-zA-Z_]\S*):$}}
>  }
>  
>  # Regexp for the end of a function definition.

This part is ok, thanks.

> @@ -899,9 +899,9 @@ proc configure_check-function-bodies { config } {
>  } else {
>   # Skip lines beginning with labels ('.L[...]:') or other directives
>   # ('.align', '.cfi_startproc', '.quad [...]', '.text', etc.), '//' or
> - # '@' comments ('-fverbose-asm' or ARM-style, for example), or empty
> - # lines.
> - set up_config(fluff) {^\s*(?:\.|//|@|$)}
> + # '@' or '#' comments ('-fverbose-asm' or ARM-style, for example), or
> + # empty lines.
> + set up_config(fluff) {^\s*(?:\.|//|@|#|$)}
>  }
>  
>  # Regexp for expected output lines prefix.

I think this should be done separately.  It looks like at least
gcc.target/riscv/target-attr-06.c relies on the current behaviour.

Richard


Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Add __ARM_FEATURE_MOPS predefine.  Add support for ACLE __arm_mops_memset_tag.
>
> Passes regress, OK for commit?
>
> gcc:
> * config/aaarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
> Add __ARM_FEATURE_MOPS predefine.
> * config/aarch64/arm_acle.h: Add __arm_mops_memset_tag().
>
> gcc/testsuite:
> * gcc.target/aarch64/acle/memtag_5.c: Add new test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index 
> fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..884a7ba5d10b58fbe182a765041cf80bdaec9615
>  100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -260,6 +260,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", 
> pfile);
>aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", 
> pfile);
>aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
> +  aarch64_def_or_undef (TARGET_MOPS, "__ARM_FEATURE_MOPS", pfile);
>  
>/* Not for ACLE, but required to keep "float.h" correct if we switch
>   target between implementations that do or do not support ARMv8.2-A
> diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> index 
> 2aa681090fa205449cf1ac63151565f960716189..22ee4b211a55ca6537a1d9e3bf4dad09585071fb
>  100644
> --- a/gcc/config/aarch64/arm_acle.h
> +++ b/gcc/config/aarch64/arm_acle.h
> @@ -344,6 +344,21 @@ __rndrrs (uint64_t *__res)
>  
>  #pragma GCC pop_options
>  
> +#if defined (__ARM_FEATURE_MOPS) && defined (__ARM_FEATURE_MEMORY_TAGGING)
> +__extension__ extern __inline void *
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_mops_memset_tag (void *__ptr, int __val, size_t __size)
> +{
> +  void *__ptr2 = __ptr;
> +  __asm volatile ("setgp\t[%0]!, %1!, %x2\n\t"
> +   "setgm\t[%0]!, %1!, %x2\n\t"
> +   "setge\t[%0]!, %1!, %x2"
> +   : "+r" (__ptr2), "+r" (__size)
> +   : "rZ" (__val) : "cc", "memory");
> +  return __ptr;
> +}
> +#endif
> +

I think this should be in a push_options/pop_options block, as for other
intrinsics that require certain features.

What was the reason for using an inline asm rather than a builtin?
Feels a bit old school. :)  Using a builtin should mean that the
RTL optimisers see the extent of the write.

Thanks,
Richard

>  #define __arm_rsr(__regname) \
>__builtin_aarch64_rsr (__regname)
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_5.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/memtag_5.c
> new file mode 100644
> index 
> ..79ba1eb39d7c6d577fbe98a3285f8cc618428823
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_5.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=armv8.8-a+memtag -O2" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include "arm_acle.h"
> +
> +#ifndef __ARM_FEATURE_MOPS
> +# error __ARM_FEATURE_MOPS not defined!
> +#endif
> +
> +/*
> +** set_tag:
> +**   mov (x[0-9]+), x0
> +**   setgp   \[\1\]\!, x1\!, xzr
> +**   setgm   \[\1\]\!, x1\!, xzr
> +**   setge   \[\1\]\!, x1\!, xzr
> +**   ret
> +*/
> +void *set_tag (void *p, size_t size)
> +{
> +  return __arm_mops_memset_tag (p, 0, size);
> +}


Re: [PATCH v3 4/6] btf: add -fprune-btf option

2024-05-31 Thread Richard Biener



> Am 31.05.2024 um 17:58 schrieb David Faust :
> 
> 
> 
>> On 5/31/24 00:07, Richard Biener wrote:
>>> On Thu, May 30, 2024 at 11:34 PM David Faust  wrote:
>>> 
>>> This patch adds a new option, -fprune-btf, to control BTF debug info
>>> generation.
>> 
>> Can you name it -gprune-btf instead?
> 
> Yes, sure.
> 
> I think I followed -feliminate-unused-debug-types,

Unfortunately GCC is full of historical mistakes…

> but I see the large
> majority of options controlling e.g. DWARF use -g instead.
> 
> Happy to change it to -gprune-btf.

Please.

Richard 

>> 
>>> As the name implies, this option enables a kind of "pruning" of the BTF
>>> information before it is emitted.  When enabled, rather than emitting
>>> all type information translated from DWARF, only information for types
>>> directly used in the source program is emitted.
>>> 
>>> The primary purpose of this pruning is to reduce the amount of
>>> unnecessary BTF information emitted, especially for BPF programs.  It is
>>> very common for BPF programs to include Linux kernel internal headers in
>>> order to have access to kernel data structures.  However, doing so often
>>> has the side effect of also adding type definitions for a large number
>>> of types which are not actually used by nor relevant to the program.
>>> In these cases, -fprune-btf commonly reduces the size of the resulting
>>> BTF information by 10x or more, as seen on average when compiling Linux
>>> kernel BPF selftests.  This both slims down the size of the resulting
>>> object and reduces the time required by the BPF loader to verify the
>>> program and its BTF information.
>>> 
>>> Note that the pruning implemented in this patch follows the same rules
>>> as the BTF pruning performed unconditionally by LLVM's BPF backend when
>>> generating BTF.  In particular, the main sources of pruning are:
>>> 
>>>  1) Only generate BTF for types used by variables and functions at
>>> the file scope.  Note that with or without pruning, BTF_KIND_VAR
>>> entries are only generated for variables present in the final
>>> object - unused static variables or variables completely optimized
>>> away must not have VAR entries in BTF.
>>> 
>>>  2) Avoid emitting full BTF for struct and union types which are only
>>> pointed-to by members of other struct/union types.  In these cases,
>>> the full BTF_KIND_STRUCT or BTF_KIND_UNION which would normally
>>> be emitted is replaced with a BTF_KIND_FWD, as though the
>>> underlying type was a forward-declared struct or union type.
>>> 
>>> gcc/
>>>* btfout.cc (btf_used_types): New hash set.
>>>(struct btf_fixup): New.
>>>(fixups, forwards): New vecs.
>>>(btf_output): Calculate num_types depending on flag_prune_btf.
>>>(btf_early_finsih): New initialization for flag_prune_btf.
>>>(btf_add_used_type): New function.
>>>(btf_used_type_list_cb): Likewise.
>>>(btf_late_collect_pruned_types): Likewise.
>>>(btf_late_add_vars): Handle special case for variables in ".maps"
>>>section when generating BTF for BPF CO-RE target.
>>>(btf_late_finish): Use btf_late_collect_pruned_types when
>>>flag_prune_btf in effect.  Move some initialization to 
>>> btf_early_finish.
>>>(btf_finalize): Additional deallocation for flag_prune_btf.
>>>* common.opt (fprune-btf): New flag.
>>>* ctfc.cc (init_ctf_strtable): Make non-static.
>>>* ctfc.h (struct ctf_dtdef): Add visited_children_p boolean flag.
>>>(init_ctf_strtable, ctfc_delete_strtab): Make extern.
>>>* doc/invoke.texi (Debugging Options): Document -fprune-btf.
>>> 
>>> gcc/testsuite/
>>>* gcc.dg/debug/btf/btf-prune-1.c: New test.
>>>* gcc.dg/debug/btf/btf-prune-2.c: Likewise.
>>>* gcc.dg/debug/btf/btf-prune-3.c: Likewise.
>>>* gcc.dg/debug/btf/btf-prune-maps.c: Likewise.
>>> ---
>>> gcc/btfout.cc | 359 +-
>>> gcc/common.opt|   4 +
>>> gcc/ctfc.cc   |   2 +-
>>> gcc/ctfc.h|   3 +
>>> gcc/doc/invoke.texi   |  20 +
>>> gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c  |  25 ++
>>> gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c  |  33 ++
>>> gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c  |  35 ++
>>> .../gcc.dg/debug/btf/btf-prune-maps.c |  20 +
>>> 9 files changed, 494 insertions(+), 7 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c
>>> create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c
>>> create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c
>>> create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-maps.c
>>> 
>>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>>> index 32fda14f704b..a7da164f6b31 100644
>>> --- a/gcc/btfout.cc
>>> +++ b/gcc/btfout.cc
>>> @@ -828,7 +828,10 @@ output_btf_types (

Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Wilco Dijkstra
Hi Richard,

> I think this should be in a push_options/pop_options block, as for other
> intrinsics that require certain features.

But then the intrinsic would always be defined, which is contrary to what the
ACLE spec demands - it would not give a compilation error at the callsite
but give assembler errors (potentially in different functions after inlining).

> What was the reason for using an inline asm rather than a builtin?
> Feels a bit old school. :)  Using a builtin should mean that the
> RTL optimisers see the extent of the write.

Given this intrinsic will be used very rarely, if ever, it does not make sense
to provide anything more than the basic functionality.

Cheers,
Wilco

Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Kyrill Tkachov
Hi Wilco,

On Fri, May 31, 2024 at 6:38 PM Wilco Dijkstra 
wrote:

> Hi Richard,
>
> > I think this should be in a push_options/pop_options block, as for other
> > intrinsics that require certain features.
>
> But then the intrinsic would always be defined, which is contrary to what
> the
> ACLE spec demands - it would not give a compilation error at the callsite
> but give assembler errors (potentially in different functions after
> inlining).
>
> What was the reason for using an inline asm rather than a builtin?
> > Feels a bit old school. :)  Using a builtin should mean that the
> > RTL optimisers see the extent of the write.
>
> Given this intrinsic will be used very rarely, if ever, it does not make
> sense
> to provide anything more than the basic functionality.
>

I agree that it's unlikely to get much use.
IMO we should be moving the arm_acle.h header to be implemented in
the #pragma GCC aarch64 "arm_acle.h" at the top as much as possible.
So I'd expect handle_arm_acle_h to be extended to inject these definitions
when appropriate and during expansion it'd just generate the RTL pattern
for it, which needn't be exposed as an implementation-defined builtin.
Thanks,
Kyrill

Cheers,
> Wilco


Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>
>> I think this should be in a push_options/pop_options block, as for other
>> intrinsics that require certain features.
>
> But then the intrinsic would always be defined, which is contrary to what the
> ACLE spec demands - it would not give a compilation error at the callsite
> but give assembler errors (potentially in different functions after inlining).

Inlining will fail with an error if the callsite doesn't have the right
features.  E.g.: https://godbolt.org/z/7zz59PhTE

The error message isn't great, but it is at least an error. :)
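
A rough reconstruction of that kind of example (flags illustrative, not
copied from the godbolt link):

  __attribute__ ((always_inline, target ("arch=armv8.8-a+memtag")))
  static inline int
  need_feature (int x)
  {
    return x + 1;
  }

  int
  plain_caller (int x)
  {
    /* Without the required features at the call site, GCC reports
       something like "inlining failed in call to always_inline
       'need_feature': target specific option mismatch".  */
    return need_feature (x);
  }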

>> What was the reason for using an inline asm rather than a builtin?
>> Feels a bit old school. :)  Using a builtin should mean that the
>> RTL optimisers see the extent of the write.
>
> Given this intrinsic will be used very rarely, if ever, it does not make sense
> to provide anything more than the basic functionality.

But a lot of effort went into making the old inline asm ACLE
implementations use builtins instead.  It even seems to have been
a complete transition.  (Although we still have:

/* Start of temporary inline asm implementations.  */
...
/* End of temporary inline asm.  */

heh.)

So this feels like a regression in terms of implementation methodology.

I won't object if another maintainer approves the function in this form,
but I'd only be comfortable approving a builtin version.

Thanks,
Richard


Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Ajit Agarwal
Hello Richard:

On 31/05/24 8:08 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> On 31/05/24 3:23 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
 Hello All:

 Common infrastructure using generic code for pair mem fusion of different
 targets.

 rs6000 target specific code implements virtual functions defined
 by generic code.

 Code is implemented with pure virtual functions to interface with target
 code.

 Target specific code are added in rs6000-mem-fusion.cc and additional 
 virtual
 function implementation required for rs6000 are added in 
 aarch64-ldp-fusion.cc.

 Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.

 Thanks & Regards
 Ajit


 aarch64, rs6000, middle-end: Add implementation for different targets for 
 pair mem fusion

 Common infrastructure using generic code for pair mem fusion of different
 targets.

 rs6000 target specific code implements virtual functions defined
 by generic code.

 Code is implemented with pure virtual functions to interface with target
 code.

 Target specific code are added in rs6000-mem-fusion.cc and additional 
 virtual
 function implementation required for rs6000 are added in 
 aarch64-ldp-fusion.cc.

 2024-05-31  Ajit Kumar Agarwal  

 gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Add target specific
implementation of additional virtual functions added in pair_fusion
struct.
* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation using pure virtual
functions.
* config.gcc: Add new object file.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/accesses.h: Moved set_is_live_out_use as public
from private.

 gcc/testsuite/ChangeLog:

* g++.target/powerpc/me-fusion.C: New test.
* g++.target/powerpc/mem-fusion-1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
 ---
>>>
>>> This isn't a complete review, just some initial questions & comments
>>> about selected parts.
>>>
 [...]
 +/* Check whether load can be fusable or not.
 +   Return true if dependent use is UNSPEC otherwise false.  */
 +bool
 +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
 +{
 +  rtx_insn *insn = info->rtl ();
 +
 +  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
 +if (REG_NOTE_KIND (note) == REG_EQUAL
 +  || REG_NOTE_KIND (note) == REG_EQUIV)
 +  return false;
>>>
>>> It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
>>> note.  What's the reason for doing this?  Are you trying to avoid
>>> fusing pairs before reload that are equivalent to a MEM (i.e. have
>>> a natural spill slot)?  I think Alex hit a similar situation.
>>>
>>
>> We have used the above check because of some SPEC benchmarks failing
>> with MEM pairs having REG_EQUAL/EQUIV notes.
>>
>> By adding the checks the benchmarks passes and also it improves the
>> performance.
>>
>> This checks were added during initial implementation of pair fusion
>> pass.
>>
>> I will investigate further if this check is still required or not.
> 
> Thanks.  If it does affect SPEC results, it would be good to look
> at the underlying reason, as a justification for the check.
> 
> AIUI, the case Alex hit was due to the way that the RA recognises:
> 
>   (set (reg R) (mem address-of-a-stack-variable))
> REG_EQUIV: (mem address-of-a-stack-variable)
> 
> where the REG_EQUIV is either explicit or detected by the RA.
> If R needs to be spilled, it can then be spilled to its existing
> location on the stack.  And if R needs to be spilled in the
> instruction above (because of register pressure before the first
> use of R), the RA is able to delete the instruction.
> 
> But if that is the reason, the condition should be restricted
> to cases in which the note is a memory.
> 
> I think Alex had tried something similar and found that it wasn't
> always effective.
> 

Sure I will check.
>> [...]
 + && info->is_real ())
 +{
 +  rtx_insn *rtl_insn = info->rtl ();
 +  rtx set = single_set (rtl_insn);
 +
 +  if (set == NULL_RTX)
 +return false;
 +
 +  rtx op0 = SET_SRC (set);
 +  if (GET_CODE (op0) != UNSPEC)
 +return false;
>>>
>>> What's the motivation for rejecting unspecs?  It's unsual to treat
>>> all unspecs as a distinct group.
>>>
>>> Also, using single_set means that the function still lets through
>>> parallels of two se

Re: [Patch, aarch64, middle-end\ v4: Move pair_fusion pass from aarch64 to middle-end

2024-05-31 Thread Richard Sandiford
Marc Poulhiès  writes:
> Hello,
>
> I can't bootstrap using gcc 5.5 since this change. It fails with:
>
> .../gcc/pair-fusion.cc: In member function ‘bool 
> pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, 
> rtl_ssa::in
> sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’:
> .../gcc/pair-fusion.cc:1790:40: error: ‘writeback’ is not a class, namespace, 
> or enumeration
>if (m_pass->should_handle_writeback (writeback::ALL)
> ^
> Is it possible that C++11 enum classes are not correctly supported in
> older GCC?

Looks to be due to an overloading of "writeback", which is also a local
variable in that function.

One fix would be to rename the type to "writeback_type".
FWIW, the "enum"s in "enum writeback" can also be removed,
so it'd be s/enum writeback/writeback_type/.
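
For reference, a hand-reduced sketch of the clash (not the actual
pair-fusion.cc code):

  enum class writeback { NONE, EXISTING, ALL };

  static bool
  should_handle (writeback w) { return w == writeback::ALL; }

  bool
  f (unsigned int writeback)   // local/parameter name shadows the enum type
  {
    return writeback != 0
           && should_handle (writeback::ALL);  // GCC 5.x: "'writeback' is not
                                               // a class, namespace, or
                                               // enumeration"
  }

Newer GCCs resolve the name before "::" to the type and accept this, which
would explain why only the GCC 5.5 bootstrap trips over it.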

Richard


Re: [PATCH] [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

2024-05-31 Thread Alexandre Oliva
On May 31, 2024, Matthias Kretz  wrote:

> So __clang__ is turning into the next __GNUC__ ;)

Yeah :-(

> So passing -D__clang__=17 -D_GLIBCXX_CLANG=1 is possible? Should it be?

I thought setting them independently could give us some testing
flexibility, though it's not for the faint of heart ;-)

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH v2] [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

2024-05-31 Thread Alexandre Oliva
On May 31, 2024, Alexandre Oliva  wrote:

>> So either don't change this line at all, or just do a simple
>> s/__clang__/_GLIBCXX_CLANG/

> If c++config can be counted on, I'd be happy to do that, but I couldn't
> tell that it could.

Here's what I've retested on x86_64-linux-gnu and, slightly adjusted for
gcc-13, on arm-vx7r2.  Ok to install?


[libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined.  Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.

So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also pass -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.
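
As a usage sketch (illustrative only, not taken from the patch), libstdc++
code that has pulled in bits/c++config then writes:

  #ifdef _GLIBCXX_CLANG
    // clang-only workaround
  #else
    // GCC path, taken even when the toolchain predefines __clang__
    // and passes -D_GLIBCXX_CLANG=0
  #endif

instead of testing __clang__ directly.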

I've left fast_float and ryu files alone: their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit.  pstl_config.h, though also
imported, required adjustment.


for  libstdc++-v3/ChangeLog

* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
* include/bits/locale_facets_nonio.tcc: Test for it.
* include/bits/stl_bvector.h: Likewise.
* include/c_compatibility/stdatomic.h: Likewise.
* include/experimental/bits/simd.h: Likewise.
* include/experimental/bits/simd_builtin.h: Likewise.
* include/experimental/bits/simd_detail.h: Likewise.
* include/experimental/bits/simd_x86.h: Likewise.
* include/experimental/simd: Likewise.
* include/std/complex: Likewise.
* include/std/ranges: Likewise.
* include/std/variant: Likewise.
* include/pstl/pstl_config.h: Likewise.
---
 libstdc++-v3/include/bits/c++config|   13 -
 libstdc++-v3/include/bits/locale_facets_nonio.tcc  |2 +-
 libstdc++-v3/include/bits/stl_bvector.h|2 +-
 libstdc++-v3/include/c_compatibility/stdatomic.h   |2 +-
 libstdc++-v3/include/experimental/bits/simd.h  |   14 +++---
 .../include/experimental/bits/simd_builtin.h   |4 ++--
 .../include/experimental/bits/simd_detail.h|8 
 libstdc++-v3/include/experimental/bits/simd_x86.h  |   12 ++--
 libstdc++-v3/include/experimental/simd |2 +-
 libstdc++-v3/include/pstl/pstl_config.h|4 ++--
 libstdc++-v3/include/std/complex   |4 ++--
 libstdc++-v3/include/std/ranges|8 
 libstdc++-v3/include/std/variant   |2 +-
 13 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index b57e3f338e92a..6dca2d9467aa5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -481,9 +481,20 @@ _GLIBCXX_END_NAMESPACE_VERSION
 // Define if compatibility should be provided for -mlong-double-64.
 #undef _GLIBCXX_LONG_DOUBLE_COMPAT
 
+// Use an alternate macro to test for clang, so as to provide an easy
+// workaround for systems (such as vxworks) whose headers require
+// __clang__ to be defined, even when compiling with GCC.
+#if !defined _GLIBCXX_CLANG && defined __clang__
+# define _GLIBCXX_CLANG __clang__
+// Turn -D_GLIBCXX_CLANG=0 into -U_GLIBCXX_CLANG, so that
+// _GLIBCXX_CLANG can be tested as defined, just like __clang__.
+#elif !_GLIBCXX_CLANG
+# undef _GLIBCXX_CLANG
+#endif
+
 // Define if compatibility should be provided for alternative 128-bit long
 // double formats. Not possible for Clang until __ibm128 is supported.
-#ifndef __clang__
+#ifndef _GLIBCXX_CLANG
 #undef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
 #endif
 
diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc 
b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 8f67be5a6147b..72136f42f0866 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -1465,7 +1465,7 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
   ctype<_CharT> const& __ctype = use_facet >(__loc);
   __err = ios_base::goodbit;
   bool __use_state = false;
-#if __GNUC__ >= 5 && !defined(__clang__)
+#if __GNUC__ >= 5 && !defined(_GLIBCXX_CLANG)
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wpmf-conversions"
   // Nasty hack.  The C++ standard mandates that get invokes the do_get
diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index d567e26f4e430..52153cadf8f70 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -185,7 +185,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 void
 _M_assume_normalized() const
 {
-#if __has_attribute(__assume__) && !defined(__clan

Re: [PATCH v6 1/8] Improve must tail in RTL backend

2024-05-31 Thread Andi Kleen
> I think the ultimate knowledge if a call can or cannot be implemented as 
> tail-call lies within calls.cc/expand_call: It is inherently 
> target and ABI specific how arguments and returns are layed out, how the 
> stack frame is generated, if arguments are or aren't removed by callers 
> or callees and so on; all of that being knowledge that tree-tailcall 
> doesn't have and doesn't want to have.  As such tree-tailcall should 
> not be regarded as ultimate truth, and failures of tree-tailcall to 
> recognize something as tail-callable shouldn't matter.

It's not the ultimate truth, but some of the checks it does are not
duplicated at expand time nor in the backend.  So it's a necessary
precondition with the current code base.

Yes, maybe the checks could all be moved, but that's a much larger
project.

-Andi


MAINTAINERS: Add myself to Write After Approval and DCO

2024-05-31 Thread Pengxuan Zheng
ChangeLog:

* MAINTAINERS: Add myself to Write After Approval and DCO.

Signed-off-by: Pengxuan Zheng 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e2870eef2ef..6444e6ea2f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -743,6 +743,7 @@ Dennis Zhang

 Yufeng Zhang   
 Qing Zhao  
 Shujing Zhao   
+Pengxuan Zheng 
 Jon Ziegler
 Roman Zippel   
 Josef Zlomek   
@@ -789,3 +790,4 @@ Martin Uecker   

 Jonathan Wakely
 Alexander Westbrooks   
 Chung-Ju Wu
+Pengxuan Zheng 
-- 
2.17.1



RE: [PATCH] aarch64: testsuite: Explicitly add -mlittle-endian to vget_low_2.c

2024-05-31 Thread Pengxuan Zheng (QUIC)
> > Pengxuan Zheng  writes:
> > > vget_low_2.c is a test case for little-endian, but we missed the
> > > -mlittle-endian flag in r15-697-ga2e4fe5a53cf75.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian.
> >
> > Ok, thanks.
> >
> > If you'd like write access, please follow the instructions on
> > https://gcc.gnu.org/gitwrite.html (I'll sponsor).
> >
> > Richard
> 
> Thanks a lot, Richard! I really appreciate it!
> 
> I have submitted a request for write access naming you as sponsor.
> 
> Thanks,
> Pengxuan

Thanks, Richard! I've been granted write access now and committed
the patch as r15-950-g7fb62627cfb3e0.
> >
> > > Signed-off-by: Pengxuan Zheng 
> > > ---
> > >  gcc/testsuite/gcc.target/aarch64/vget_low_2.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > > b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > > index 44414e1c043..93e9e664ee9 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-O3 -fdump-tree-optimized" } */
> > > +/* { dg-options "-O3 -fdump-tree-optimized -mlittle-endian" } */
> > >
> > >  #include 


[PATCH] check_GNU_style: Use raw strings.

2024-05-31 Thread Robin Dapp
Hi,

this silences some warnings when using check_GNU_style.

I didn't expect this to have any bootstrap or regtest impact
but I still ran it on x86 - no change.

Regards
 Robin

contrib/ChangeLog:

* check_GNU_style_lib.py: Use raw strings for regexps.
---
 contrib/check_GNU_style_lib.py | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/contrib/check_GNU_style_lib.py b/contrib/check_GNU_style_lib.py
index f1a120fa8d3..6dbe4b53559 100755
--- a/contrib/check_GNU_style_lib.py
+++ b/contrib/check_GNU_style_lib.py
@@ -103,7 +103,7 @@ class SpacesCheck:
 
 class SpacesAndTabsMixedCheck:
 def __init__(self):
-self.re = re.compile('\ \t')
+self.re = re.compile(r'\ \t')
 
 def check(self, filename, lineno, line):
 stripped = line.lstrip()
@@ -115,7 +115,7 @@ class SpacesAndTabsMixedCheck:
 
 class TrailingWhitespaceCheck:
 def __init__(self):
-self.re = re.compile('(\s+)$')
+self.re = re.compile(r'(\s+)$')
 
 def check(self, filename, lineno, line):
 assert(len(line) == 0 or line[-1] != '\n')
@@ -128,7 +128,7 @@ class TrailingWhitespaceCheck:
 
 class SentenceSeparatorCheck:
 def __init__(self):
-self.re = re.compile('\w\.(\s|\s{3,})\w')
+self.re = re.compile(r'\w\.(\s|\s{3,})\w')
 
 def check(self, filename, lineno, line):
 m = self.re.search(line)
@@ -140,7 +140,7 @@ class SentenceSeparatorCheck:
 
 class SentenceEndOfCommentCheck:
 def __init__(self):
-self.re = re.compile('\w\.(\s{0,1}|\s{3,})\*/')
+self.re = re.compile(r'\w\.(\s{0,1}|\s{3,})\*/')
 
 def check(self, filename, lineno, line):
 m = self.re.search(line)
@@ -152,7 +152,7 @@ class SentenceEndOfCommentCheck:
 
 class SentenceDotEndCheck:
 def __init__(self):
-self.re = re.compile('\w(\s*\*/)')
+self.re = re.compile(r'\w(\s*\*/)')
 
 def check(self, filename, lineno, line):
 m = self.re.search(line)
@@ -164,7 +164,7 @@ class SentenceDotEndCheck:
 class FunctionParenthesisCheck:
 # TODO: filter out GTY stuff
 def __init__(self):
-self.re = re.compile('\w(\s{2,})?(\()')
+self.re = re.compile(r'\w(\s{2,})?(\()')
 
 def check(self, filename, lineno, line):
 if '#define' in line:
@@ -179,7 +179,7 @@ class FunctionParenthesisCheck:
 
 class SquareBracketCheck:
 def __init__(self):
-self.re = re.compile('\w\s+(\[)')
+self.re = re.compile(r'\w\s+(\[)')
 
 def check(self, filename, lineno, line):
 if filename.endswith('.md'):
@@ -194,7 +194,7 @@ class SquareBracketCheck:
 
 class ClosingParenthesisCheck:
 def __init__(self):
-self.re = re.compile('\S\s+(\))')
+self.re = re.compile(r'\S\s+(\))')
 
 def check(self, filename, lineno, line):
 m = self.re.search(line)
@@ -208,7 +208,7 @@ class BracesOnSeparateLineCheck:
 # This will give false positives for C99 compound literals.
 
 def __init__(self):
-self.re = re.compile('(\)|else)\s*({)')
+self.re = re.compile(r'(\)|else)\s*({)')
 
 def check(self, filename, lineno, line):
 m = self.re.search(line)
@@ -219,7 +219,7 @@ class BracesOnSeparateLineCheck:
 
 class TrailinigOperatorCheck:
 def __init__(self):
-regex = '^\s.*(([^a-zA-Z_]\*)|([-%<=&|^?])|([^*]/)|([^:][+]))$'
+regex = r'^\s.*(([^a-zA-Z_]\*)|([-%<=&|^?])|([^*]/)|([^:][+]))$'
 self.re = re.compile(regex)
 
 def check(self, filename, lineno, line):
-- 
2.45.1


Re: [PATCH v2] c++/modules: Fix revealing with using-decls [PR114867]

2024-05-31 Thread Jason Merrill

On 5/31/24 11:57, Nathaniel Shead wrote:

On Tue, May 28, 2024 at 02:57:09PM -0400, Jason Merrill wrote:

What if we're revealing without exporting?  That is, a using-declaration in
module purview that isn't exported?  Such a declaration should still prevent
discarding, which is my understanding of the use of "revealing" here.

It seems like the current code already gets that wrong for e.g.

M_1.C:
module;
  struct A {};
  inline int f() { return 42; }
export module M;
  using ::A;
  using ::f;

M_2.C:
import M;
  inline int f();
  struct A a; // { dg-bogus "incomplete" }
int main() {
   return f(); // { dg-bogus "undefined" }
}

It looks like part of the problem is that add_binding_entity is only
interested in exported usings, but I think it should also handle revealing
ones.


Right; I hadn't thought about that.  The cleanest way to solve this I
think is to add a new flag to OVERLOAD to indicate their purviewness,
which we can then use in 'add_binding_entity' instead of the current
reliance on exported usings; this is what I've done in the below patch.
(There aren't any more TREE_LANG_FLAG_?s left so I just picked another
unused flag lying around; alternatively I could create _7, there do
seem to be spare bits in tree_base.)

Another option would be to do like what I've done in my workaround for
non-functions and just mark the original decl as being in PURVIEW_P; I'm
not a huge fan of this though.


Agreed.


+  /* FIXME: Handle exporting declarations from a different scope
+without also marking those declarations as exported.
+This will require not just binding directly to the underlying
+value; see c++/114863 and c++/114865.  We allow this for purview
+declarations for now as this doesn't (currently) cause ICEs
+later down the line, but should also be revisited then.  */


I'm not sure what "then" you're referring to?

OK either clarifying, or striking "also" and "then".

Jason



Re: [PATCH] Fix PR c++/109958: ICE taking the address of bound static member function brought into derived class by using-declaration

2024-05-31 Thread Jason Merrill

On 5/31/24 09:58, Simon Martin wrote:

From: Simon Martin 

We currently ICE upon the following because we don't properly handle the
overload created for B::f through the using statement.

=== cut here ===
struct B { static int f(); };
struct D : B { using B::f; };
void f(D d) { &d.f; }
=== cut here ===

This patch makes build_class_member_access_expr and cp_build_addr_expr_1 handle
such overloads, and fixes the PR.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/109958

gcc/cp/ChangeLog:

* typeck.cc (build_class_member_access_expr): Handle single OVERLOADs.
(cp_build_addr_expr_1): Likewise.


OK.


gcc/testsuite/ChangeLog:

* g++.dg/overload/using6.C: New test.

---
  gcc/cp/typeck.cc   | 5 +
  gcc/testsuite/g++.dg/overload/using6.C | 5 +
  2 files changed, 10 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/overload/using6.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 1b7a31d32f3..5970ac3d398 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -3025,6 +3025,8 @@ build_class_member_access_expr (cp_expr object, tree 
member,
 know the type of the expression.  Otherwise, we must wait
 until overload resolution has been performed.  */
functions = BASELINK_FUNCTIONS (member);
+  if (TREE_CODE (functions) == OVERLOAD && OVL_SINGLE_P (functions))
+   functions = OVL_FIRST (functions);
if (TREE_CODE (functions) == FUNCTION_DECL
  && DECL_STATIC_FUNCTION_P (functions))
type = TREE_TYPE (functions);
@@ -7333,6 +7335,9 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue, 
tsubst_flags_t complain)
  {
tree fn = BASELINK_FUNCTIONS (TREE_OPERAND (arg, 1));
  
+  if (TREE_CODE (fn) == OVERLOAD && OVL_SINGLE_P (fn))

+   fn = OVL_FIRST (fn);
+
/* We can only get here with a single static member
 function.  */
gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
diff --git a/gcc/testsuite/g++.dg/overload/using6.C 
b/gcc/testsuite/g++.dg/overload/using6.C
new file mode 100644
index 000..4f89f68a30f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/using6.C
@@ -0,0 +1,5 @@
+// PR c++/109958
+
+struct B { static int f(); };
+struct D : B { using B::f; };
+void f(D d) { &d.f; }




Re: [PATCH] Fix PR c++/111106: missing ; causes internal compiler error

2024-05-31 Thread Jason Merrill

On 5/30/24 07:31, Simon Martin wrote:

We currently fail upon the following because an assert in dependent_type_p
fails for f's parameter

=== cut here ===
consteval int id (int i) { return i; }
constexpr int
f (auto i) requires requires { id (i) } { return i; }
void g () { f (42); }
=== cut here ===

This patch fixes this by handling synthesized parameters for abbreviated
function templates in that assert.


I don't see why implicit template parameters should be handled 
differently from explicit ones here.


This seems more like an error-recovery issue, and I'd be open to adding 
|| seen_error() to that assert like in various others.
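
(For reference, that suggestion would make the assert read roughly as
follows -- just restating the idea, untested:)

  gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM
	      || is_auto (type)
	      || seen_error ());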



Successfully tested on x86_64-pc-linux-gnu.

	PR c++/111106

gcc/cp/ChangeLog:

* pt.cc (dependent_type_p): Relax assert to handle synthesized template
parameters when !processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval37.C: New test.

---
  gcc/cp/pt.cc |  6 +-
  gcc/testsuite/g++.dg/cpp2a/consteval37.C | 19 +++
  2 files changed, 24 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval37.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index dfce1b3c359..a50d5cfd5a2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -28019,7 +28019,11 @@ dependent_type_p (tree type)
/* If we are not processing a template, then nobody should be
 providing us with a dependent type.  */
gcc_assert (type);
-  gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type));
+  gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type)
+ || (/* Synthesized template parameter */
+ DECL_TEMPLATE_PARM_P (TEMPLATE_TYPE_DECL (type)) &&
+ (DECL_IMPLICIT_TEMPLATE_PARM_P
+  (TEMPLATE_TYPE_DECL (type);
return false;
  }
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval37.C b/gcc/testsuite/g++.dg/cpp2a/consteval37.C

new file mode 100644
index 000..ea2641fc204
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval37.C
@@ -0,0 +1,19 @@
+// PR c++/111106
+// { dg-do compile { target c++20 } }
+
+consteval int id (int i) { return i; }
+
+constexpr int f (auto i) // { dg-line line_1 }
+  requires requires { id (i) } // { dg-error "expected|invalid use" }
+{
+  return i;
+}
+
+void g () {
+  f (42); // { dg-error "parameter 1" }
+}
+
+// { dg-error "constraints on a non-templated" {} { target *-*-* } line_1 }
+// { dg-error "has incomplete type" {} { target *-*-* } line_1 }
+// { dg-error "invalid type for" {} { target *-*-* } line_1 }
+// { dg-note "declared here" {} { target *-*-* } line_1 }


These errors are wrong, so should not be tested for;  only the syntax 
error about the missing semicolon should have a dg-error.  You can use 
dg-excess-errors to cover the rest.


Jason



Re: [PATCH] c-family: Introduce the -Winvalid-noreturn flag from clang with extra tuneability

2024-05-31 Thread Jason Merrill

On 5/29/24 09:58, Julian Waters wrote:

Currently, gcc warns about noreturn marked functions that return both 
explicitly and implicitly, with no way to turn this warning off. clang does 
have an option for these classes of warnings, -Winvalid-noreturn. However, we 
can do better. Instead of just having one option that switches the warnings for 
both on and off, we can define an extra layer of granularity and have 
separate options for implicit returns and explicit returns, as in 
-Winvalid-noreturn=explicit and -Winvalid-noreturn=implicit. This patch adds both 
to gcc, for compatibility with clang.
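
(Editorial illustration, not taken from the patch: a minimal sketch of the
two cases the split distinguishes; the function names are made up.)

  __attribute__ ((noreturn)) void f (void)
  {
    return;		/* explicit return: -Winvalid-noreturn=explicit  */
  }

  __attribute__ ((noreturn)) void g (int x)
  {
    if (x)
      __builtin_abort ();
  }			/* may fall off the end: -Winvalid-noreturn=implicit  */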


Thanks!


Do note that I am relatively new to gcc's codebase, and as such couldn't figure 
out how to cleanly define a general -Winvalid-noreturn warning that switches both 
on and off, for better compatibility with clang. If someone should point out 
how to do so, I'll happily rewrite my patch.


See -fstrong-eval-order for an example of an option that can be used 
with or without =arg.



I also do not have write access to gcc, and will need help pushing this patch 
once the green light is given


Good to know, I can take care of that.


best regards,
Julian

gcc/c-family/ChangeLog:

* c.opt: Introduce -Winvalid-noreturn=explicit and 
-Winvalid-noreturn=implicit

gcc/ChangeLog:

* tree-cfg.cc (pass_warn_function_return::execute): Use it

gcc/c/ChangeLog:

* c-typeck.cc (c_finish_return): Use it
* gimple-parser.cc (c_finish_gimple_return): Use it

gcc/config/mingw/ChangeLog:

* mingw32.h (EXTRA_OS_CPP_BUILTINS): Fix semicolons

gcc/cp/ChangeLog:

* coroutines.cc (finish_co_return_stmt): Use it
* typeck.cc (check_return_expr): Use it

gcc/doc/ChangeLog:

* invoke.texi: Document new options

 From 4daf884f8bbc1e318ba93121a6fdf4139da80b64 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Wed, 29 May 2024 21:32:08 +0800
Subject: [PATCH] Introduce the -Winvalid-noreturn flag from clang with extra
  tuneability


The rationale and ChangeLog entries should be part of the commit message 
(and so the git format-patch output).




Signed-off-by: TheShermanTanker 


A DCO sign-off can't use a pseudonym, sorry; please either sign off 
using your real name or file a copyright assignment for the pseudonym 
with the FSF.


See https://gcc.gnu.org/contribute.html#legal for more detail.


---
  gcc/c-family/c.opt |  8 
  gcc/c/c-typeck.cc  |  2 +-
  gcc/c/gimple-parser.cc |  2 +-
  gcc/config/mingw/mingw32.h |  6 +++---
  gcc/cp/coroutines.cc   |  2 +-
  gcc/cp/typeck.cc   |  2 +-
  gcc/doc/invoke.texi| 13 +
  gcc/tree-cfg.cc|  2 +-
  8 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..32a2859fdcc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -886,6 +886,14 @@ Winvalid-constexpr
  C++ ObjC++ Var(warn_invalid_constexpr) Init(-1) Warning
  Warn when a function never produces a constant expression.
  
+Winvalid-noreturn=explicit

+C ObjC C++ ObjC++ Warning
+Warn when a function marked noreturn returns explicitly.
+
+Winvalid-noreturn=implicit
+C ObjC C++ ObjC++ Warning
+Warn when a function marked noreturn returns implicitly.
+
  Winvalid-offsetof
  C++ ObjC++ Var(warn_invalid_offsetof) Init(1) Warning
  Warn about invalid uses of the \"offsetof\" macro.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index ad4c7add562..1941fbc44cb 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -11468,7 +11468,7 @@ c_finish_return (location_t loc, tree retval, tree 
origtype)
location_t xloc = expansion_point_location_if_in_system_header (loc);
  
if (TREE_THIS_VOLATILE (current_function_decl))

-warning_at (xloc, 0,
+warning_at (xloc, OPT_Winvalid_noreturn_explicit,
"function declared % has a % statement");
  
if (retval)

diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index d156d83cd37..1acaf75f844 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -2593,7 +2593,7 @@ c_finish_gimple_return (location_t loc, tree retval)
location_t xloc = expansion_point_location_if_in_system_header (loc);
  
if (TREE_THIS_VOLATILE (current_function_decl))

-warning_at (xloc, 0,
+warning_at (xloc, OPT_Winvalid_noreturn_explicit,
"function declared % has a % statement");
  
if (! retval)

diff --git a/gcc/config/mingw/mingw32.h b/gcc/config/mingw/mingw32.h
index fa6e307476c..a69926133b1 100644
--- a/gcc/config/mingw/mingw32.h
+++ b/gcc/config/mingw/mingw32.h
@@ -35,9 +35,9 @@ along with GCC; see the file COPYING3.  If not see
 | MASK_MS_BITFIELD_LAYOUT)
  
  #ifdef TARGET_USING_MCFGTHREAD

-#define DEFINE_THREAD_MODEL  builtin_define ("__USING_MCFGTHREAD__");
+#define DEFINE_THREAD_MODEL  builtin_define ("__USING_MCFGTHREAD__")
  #elif defined(TARGET_USE_PTHREAD_BY_DEFAULT)
-#define DEFINE_THREAD_MODEL  builtin_define (

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-31 Thread Qing Zhao


> On May 23, 2024, at 07:46, Richard Biener  wrote:
> 
> On Wed, May 22, 2024 at 8:53 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On May 22, 2024, at 03:38, Richard Biener  
>>> wrote:
>>> 
>>> On Tue, May 21, 2024 at 11:36 PM David Malcolm  wrote:
 
 On Tue, 2024-05-21 at 15:13 +, Qing Zhao wrote:
> Thanks for the comments and suggestions.
> 
>> On May 15, 2024, at 10:00, David Malcolm 
>> wrote:
>> 
>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
>>> On Mon, 13 May 2024, Qing Zhao wrote:
>>> 
 -Warray-bounds is an important option to enable linux kernel to
 keep
 the array out-of-bound errors out of the source tree.
 
 However, due to the false positive warnings reported in
 PR109071
 (-Warray-bounds false positive warnings due to code duplication
 from
 jump threading), -Warray-bounds=1 cannot be added on by
 default.
 
 Although it's impossible to eliminate all the false positive
 warnings
 from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
 documentation says "always out of bounds"), we should minimize
 the
 false positive warnings in -Warray-bounds=1.
 
 The root reason for the false positive warnings reported in
 PR109071 is:
 
 When the thread jump optimization tries to reduce the # of
 branches
 inside the routine, sometimes it needs to duplicate the code
 and
 split into two conditional paths.  For example:
 
 The original code:
 
 void sparx5_set (int * ptr, struct nums * sg, int index)
 {
 if (index >= 4)
   warn ();
 *ptr = 0;
 *val = sg->vals[index];
 if (index >= 4)
   warn ();
 *ptr = *val;
 
 return;
 }
 
 With the thread jump, the above becomes:
 
 void sparx5_set (int * ptr, struct nums * sg, int index)
 {
 if (index >= 4)
   {
 warn ();
 *ptr = 0; // Code duplications since "warn" does
 return;
 *val = sg->vals[index];   // same this line.
   // In this path, since it's
 under
 the condition
   // "index >= 4", the compiler
 knows
 the value
   // of "index" is larger then 4,
 therefore the
   // out-of-bound warning.
 warn ();
   }
 else
   {
 *ptr = 0;
 *val = sg->vals[index];
   }
 *ptr = *val;
 return;
 }
 
 We can see, after the thread jump optimization, the # of
 branches
 inside
 the routine "sparx5_set" is reduced from 2 to 1, however,  due
 to
 the
 code duplication (which is needed for the correctness of the
 code),
 we
 got a false positive out-of-bound warning.
 
 In order to eliminate such false positive out-of-bound warning,
 
 A. Add one more flag for GIMPLE: is_splitted.
 B. During the thread jump optimization, when the basic blocks
 are
  duplicated, mark all the STMTs inside the original and
 duplicated
  basic blocks as "is_splitted";
 C. Inside the array bound checker, add the following new
 heuristic:
 
 If
  1. the stmt is duplicated and splitted into two conditional
 paths;
 +  2. the warning level < 2;
 +  3. the current block is not dominating the exit block
 Then not report the warning.
 
 The false positive warnings are moved from -Warray-bounds=1 to
 -Warray-bounds=2 now.
 
 Bootstrapped and regression tested on both x86 and aarch64.
 adjusted
 -Warray-bounds-61.c due to the false positive warnings.
 
 Let me know if you have any comments and suggestions.
>>> 
>>> At the last Cauldron I talked with David Malcolm about these kind
>>> of
>>> issues and thought of instead of suppressing diagnostics to
>>> record
>>> how a block was duplicated.  For jump threading my idea was to
>>> record
>>> the condition that was proved true when entering the path and do
>>> this
>>> by recording the corresponding locations
> 
> Is only recording the location for the TRUE path  enough?
> We might need to record the corresponding locations for both TRUE and
> FALSE paths since the VRP might be more accurate on both paths.
> Is only recording the location enough?
> Do we need to record the pointer to the original condition stmt?
 
 Just to be clear

Re: [PATCH 1/2] xtensa: Simplify several MD templates

2024-05-31 Thread Max Filippov
On Fri, May 31, 2024 at 07:23:13PM +0900, Takayuki 'January June' Suwa wrote:
> No functional changes.
> 
> gcc/ChangeLog:
> 
>   * config/xtensa/predicates.md
>   (subreg_HQI_lowpart_operator, xtensa_sminmax_operator):
>   New operator predicates.
>   * config/xtensa/xtensa-protos.h (xtensa_match_CLAMPS_imms_p):
>   Remove.
>   * config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p): Ditto.
>   * config/xtensa/xtensa.md
>   (*addsubx, *extzvsi-1bit_ashlsi3, *extzvsi-1bit_addsubx):
>   Revise the output statements by conditional ternary operator rather
>   than switch-case clause in order to avoid using gcc_unreachable().
>   (xtensa_clamps): Reduce to a single pattern definition using the
>   predicate added above.
>   (Some split patterns to assist *masktrue_const_bitcmpl): Ditto.
> ---
>  gcc/config/xtensa/predicates.md   |  23 +++
>  gcc/config/xtensa/xtensa-protos.h |   1 -
>  gcc/config/xtensa/xtensa.cc   |  10 ---
>  gcc/config/xtensa/xtensa.md   | 109 ++
>  4 files changed, 43 insertions(+), 100 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

Suwa-san, something has changed in your mail setup apparently: every
patch context line now gets an extra space in the beginning. Could you
please fix that?

-- 
Thanks.
-- Max


Re: [PATCH 2/2] xtensa: Prepend "(use A0_REG)" to sibling call CALL_INSN_FUNCTION_USAGE instead of emitting it as insn at the end of epilogue

2024-05-31 Thread Max Filippov
On Fri, May 31, 2024 at 07:24:48PM +0900, Takayuki 'January June' Suwa wrote:
> No functional changes.
> 
> gcc/ChangeLog:
> 
>   * config/xtensa/xtensa-protos.h (xtensa_expand_call):
>   Add the third argument as boolean.
>   (xtensa_expand_epilogue): Remove the first argument.
>   * config/xtensa/xtensa.cc (xtensa_expand_call):
>   Add the third argument "sibcall_p", and modify in order to prepend
>   "(use A0_REG)" to CALL_INSN_FUNCTION_USAGE if the argument is true.
>   (xtensa_expand_epilogue): Remove the first argument "sibcall_p" and
>   its conditional clause.
>   * config/xtensa/xtensa.md (call, call_value, sibcall, sibcall_value):
>   Append a boolean value to the argument of xtensa_expand_call()
>   indicating whether it is sibling call or not.
>   (epilogue): Remove the boolean argument from xtensa_expand_epilogue(),
>   and then append emitting "(return)".
>   (sibcall_epilogue): Remove the boolean argument from
>   xtensa_expand_epilogue().
> ---
>  gcc/config/xtensa/xtensa-protos.h |  4 ++--
>  gcc/config/xtensa/xtensa.cc   | 16 ++--
>  gcc/config/xtensa/xtensa.md   | 13 +++--
>  3 files changed, 19 insertions(+), 14 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 2/6] vect: Split out partial vect checking for reduction into a function

2024-05-31 Thread Feng Xue OS
OK. Updated as per the comments.

Thanks,
Feng


From: Richard Biener 
Sent: Friday, May 31, 2024 3:29 PM
To: Feng Xue OS
Cc: Tamar Christina; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 2/6] vect: Split out partial vect checking for reduction 
into a function

On Thu, May 30, 2024 at 4:48 PM Feng Xue OS  wrote:
>
> This is a patch that is split out from 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html.
>
> Partial vectorization checking for vectorizable_reduction is a piece of
> relatively isolated code, which may be reused by other places. Move the
> code into a new function for sharing.
>
> Thanks,
> Feng
> ---
> gcc/
> * tree-vect-loop.cc (vect_reduction_use_partial_vector): New function.

Can you rename the function to vect_reduction_update_partial_vector_usage
please?  And keep ...

> (vectorizable_reduction): Move partial vectorization checking code to
> vect_reduction_use_partial_vector.
> ---
>  gcc/tree-vect-loop.cc | 138 --
>  1 file changed, 78 insertions(+), 60 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index a42d79c7cbf..aa5f21ccd1a 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7391,6 +7391,81 @@ build_vect_cond_expr (code_helper code, tree vop[3], 
> tree mask,
>  }
>  }
>
> +/* Given an operation with CODE in loop reduction path whose reduction PHI is
> +   specified by REDUC_INFO, the operation has TYPE of scalar result, and its
> +   input vectype is represented by VECTYPE_IN. The vectype of vectorized 
> result
> +   may be different from VECTYPE_IN, either in base type or vectype lanes,
> +   lane-reducing operation is the case.  This function check if it is 
> possible,
> +   and how to perform partial vectorization on the operation in the context
> +   of LOOP_VINFO.  */
> +
> +static void
> +vect_reduction_use_partial_vector (loop_vec_info loop_vinfo,
> +  stmt_vec_info reduc_info,
> +  slp_tree slp_node, code_helper code,
> +  tree type, tree vectype_in)
> +{
> +  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +return;
> +
> +  enum vect_reduction_type reduc_type = STMT_VINFO_REDUC_TYPE (reduc_info);
> +  internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> +  internal_fn cond_fn = get_conditional_internal_fn (code, type);
> +
> +  if (reduc_type != FOLD_LEFT_REDUCTION
> +  && !use_mask_by_cond_expr_p (code, cond_fn, vectype_in)
> +  && (cond_fn == IFN_LAST
> + || !direct_internal_fn_supported_p (cond_fn, vectype_in,
> + OPTIMIZE_FOR_SPEED)))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"can't operate on partial vectors because"
> +" no conditional operation is available.\n");
> +  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +}
> +  else if (reduc_type == FOLD_LEFT_REDUCTION
> +  && reduc_fn == IFN_LAST
> +  && !expand_vec_cond_expr_p (vectype_in, truth_type_for 
> (vectype_in),
> +  SSA_NAME))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +   "can't operate on partial vectors because"
> +   " no conditional operation is available.\n");
> +  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +}
> +  else if (reduc_type == FOLD_LEFT_REDUCTION
> +  && internal_fn_mask_index (reduc_fn) == -1
> +  && FLOAT_TYPE_P (vectype_in)
> +  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"can't operate on partial vectors because"
> +" signed zeros cannot be preserved.\n");
> +  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +}
> +  else
> +{
> +  internal_fn mask_reduc_fn
> +   = get_masked_reduction_fn (reduc_fn, vectype_in);
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> +  unsigned nvectors;
> +
> +  if (slp_node)
> +   nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> +  else
> +   nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
> +
> +  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
> +   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_in, 1);
> +  else
> +   vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_in, NULL);
> +}
> +}
> +
>  /* Function vectorizable_reduction.
>
> Check if STMT_INFO performs a reduction operation that can be vectorize

Re: [PATCH] RISC-V: Add min/max patterns for ifcvt.

2024-05-31 Thread Jeff Law




On 5/31/24 9:07 AM, Robin Dapp wrote:

Hi,

ifcvt likes to emit

(set
   (if_then_else
     (ge (reg 1) (reg 2))
     (reg 1)
     (reg 2)))

which can be recognized as min/max patterns in the backend.
This patch adds such patterns and the respective iterators as well as a
test.
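
(As a concrete illustration -- a hypothetical source function, not
necessarily the new test -- something like

  int
  foo (int a, int b)
  {
    return a >= b ? a : b;
  }

is turned by if-conversion into the if_then_else form above, which the new
pattern can then match as an smax.)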

This depends on the generic ifcvt change.
Regtested on rv64gcv_zvfh_zicond_zbb_zvbb.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/bitmanip.md (*_cmp_3):
New min/max ifcvt pattern.
* config/riscv/iterators.md (minu): New iterator.
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Remove riscv-specific adjustment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-04.c: New test.

OK once prereqs are ACK'd.


Presumably this fixes that deepsjeng missed if-conversion we kicked 
around a couple months back?



jeff


Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-05-31 Thread Jeff Law




On 5/31/24 9:03 AM, Robin Dapp wrote:

Hi,

before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COST_N_INSNS (2) - a compare and a jump.
This patch computes the jump costs via
   insn_cost (if_info.jump, ...)
which is supposed to incorporate the branch costs and, in case of a CC
comparison,
   pattern_cost (if_info.cond, ...)
which is supposed to account for the CC creation.

For compare_and_jump patterns insn_cost should have already computed
the right cost.
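
(In code form the idea is roughly the following -- a sketch restating the
description, not the actual hunk; field and helper names are assumptions
based on ifcvt.cc:)

  /* The jump cost is supposed to incorporate the branch cost...  */
  if_info.original_cost = insn_cost (if_info.jump, if_info.speed_p);
  /* ...and a separate CC-setting comparison is accounted for via its
     pattern.  */
  if_info.original_cost += pattern_cost (if_info.cond, if_info.speed_p);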

Does this "split" make sense, generally?

Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
on riscv.

Regards
  Robin

gcc/ChangeLog:

* ifcvt.cc (noce_process_if_block): Subtract condition pattern
cost if applicable.
(noce_find_if_block): Use insn_cost and pattern_cost for
original cost.
OK.  Obviously we'll need to be on the lookout for regressions.  My bet 
is on s390 since you already tested the x86, aarch64 & p10 targets :-)



jeff



Re: [PING] [PATCH] RISC-V: Add Zfbfmin extension

2024-05-31 Thread Jeff Law




On 5/30/24 5:38 AM, Xiao Zeng wrote:

1 In the previous patch, the libcall for BF16 was implemented:


2 Riscv provides Zfbfmin extension, which completes the "Scalar BF16 Converts":


3 Implemented replacing libcall with Zfbfmin extension instruction.

4 Reused previous testcases in:


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_output_move): Handle BFmode move
for zfbfmin.
* config/riscv/riscv.md (truncsfbf2): New pattern for BFmode.
	(trunchfbf2): Ditto.
	(truncdfbf2): Ditto.
	(trunctfbf2): Ditto.
	(extendbfsf2): Ditto.
	(*movhf_hardfloat): Add BFmode.
	(*mov_hardfloat): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin-bf16_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin-bf16_comparison.c: New test.
* gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c: New test.
---









+
+;; The conversion of HF/DF/TF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __trunc[h|d|t]fbf2.
+(define_expand "trunchfbf2"
+  [(set (match_operand:BF0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:HF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, HFmode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])

I would suggest using a mode iterator to avoid explicit pattern duplication.

Essentially a mode iterator allows you to specify that the pattern 
should be repeated over a series of modes.


It looks like you've defined a conversion from HF, DF, TF.  So you define 
an iterator that includes just those modes.  You would use the mode 
iterator rather than BF, DF or TF in your pattern.


That just fixes the mode in the pattern.  You also need to have the name 
automagically adjust as well.  Use <mode> in the name, so the name 
would be something like trunc<mode>bf2.


When you want to reference the mode in code you can do something like
E_<MODE>mode

And that will map down to HFmode, BFmode, TFmode appropriately.

I suspect you can do something similar for the extension patterns.

In fact, it looks like you did this for the movehardfloat pattern.

Jeff


Re: [PATCH 5/5][v3] RISC-V: Avoid inserting after a GIMPLE_COND with SLP and early break

2024-05-31 Thread Jeff Law




On 5/31/24 7:44 AM, Richard Biener wrote:

When vectorizing an early break loop with LENs (do we miss some
check here to disallow this?) we can end up deciding to insert
stmts after a GIMPLE_COND when doing SLP scheduling and trying
to be conservative with placing of stmts only dependent on
the implicit loop mask/len.  The following avoids this, I guess
it's not perfect but it does the job fixing some observed
RISC-V regression.

* tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
loops make sure to not advance the insertion iterator
beyond a GIMPLE_COND.
Note this patch may depend on others in the series.  I don't think the 
pre-commit CI tester is particularly good at handling that, particularly 
if the other patches in the series don't have the tagging for the 
pre-commit CI.


What most likely happened is this patch and only this patch was applied 
against the baseline for testing.


There are (manual) ways to get things re-tested.  I'm hoping Patrick and 
Edwin automate that procedure relatively soon.  Until that happens you 
have to email patchworks...@rivosinc.com with a URL for the patch in 
patchwork that you want retested.




Jeff



Re: [RFC/RFA] [PATCH 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs

2024-05-31 Thread Jeff Law




On 5/28/24 12:44 AM, Richard Biener wrote:

On Mon, May 27, 2024 at 5:16 PM Jeff Law  wrote:




On 5/27/24 12:38 AM, Richard Biener wrote:

On Fri, May 24, 2024 at 10:44 AM Mariam Arutunian
 wrote:


This patch introduces new built-in functions to GCC for computing bit-forward 
and bit-reversed CRCs.
These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by the 
presence of a CRC optab),
the builtins will utilize the expander to generate CRC code.
In the absence of hardware support, the builtins default to generating code for 
a table-based CRC calculation.


I wonder whether for embedded target use we should arrange for the
table-based CRC calculation to be out-of-line and implemented in a
way so uses across TUs can be merged?  I guess a generic
implementation inside libgcc is difficult?

I think the difficulty is the table is dependent upon the polynomial.
So we'd have to arrange to generate, then pass in the table.

In theory we could have the linker fold away duplicate tables as those
should be in read only sections without relocations to internal members.
   So much like we do for constant pool entries.  Though this hasn't been
tested.

The CRC implementation itself could be subject to ICF if it's off in its
own function.  If it's inlined (and that's a real possibility), then
there's little hope of ICF helping on the codesize.


I was wondering of doing some "standard" mangling in the implementation
namespace and using comdat groups for both code and data?
But I'm not sure how that really solves anything given the dependencies 
on the polynomial.  I.e., the contents of the table vary based on that 
polynomial, and the polynomial can (and will) differ across CRC 
implementations.
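
(For illustration only -- a generic hand-written sketch, not the code the
builtins emit -- of why the table contents are tied to the polynomial:)

  #include <stdint.h>

  /* Every entry of the 256-entry helper table for a bit-reversed
     (reflected) CRC depends on POLY, so tables built for different
     polynomials never coincide and cannot be merged.  */
  static void
  crc_build_table (uint32_t poly, uint32_t table[256])
  {
    for (uint32_t i = 0; i < 256; i++)
      {
	uint32_t crc = i;
	for (int bit = 0; bit < 8; bit++)
	  crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
	table[i] = crc;
      }
  }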








Or we could just not do any of this for -Os/-Oz if the target doesn't
have a carryless multiply or crc with the appropriate polynomial.  Given
the CRC table is probably larger than all the code in a bitwise
impementation, disabling for -Os/-Oz seems like a very reasonable choice.


I was mainly thinking about the case where the user uses the new builtins,
but yes, when optimizing for size we can disable the recognition of open-coded
variants.

Turns out Mariam's patch already disables this for -Os.  :-)

For someone directly using the builtin, they're going to have to pass 
the polynomial as a constant to the builtin, with the possible exception 
of when the target has a crc instruction where the polynomial is defined 
by the hardware.


Jeff


Re: RISC-V: Fix round_32.c test on RV32

2024-05-31 Thread Jeff Law




On 5/27/24 4:17 PM, Jivan Hakobyan wrote:

Ya, makes sense -- I guess the current values aren't that exciting for
execution, but we could just add some more interesting ones...


During the development of the patch, I had an issue with large
numbers (2e34, -2e34). They are used in the gfortran.fortran-torture/
execute/intrinsic_aint_anint.f90 test. Besides that, a benchmark
from SPEC 2017 also failed (I cannot remember which one). Now we
don't have an issue with them. Of course, I can add additional tests
with large numbers, but that would double-check what the Fortran
test already covers.
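
(Purely as an illustration of such checks -- a hypothetical runtime test,
not the existing round_32.c:)

  #include <math.h>
  extern void abort (void);

  int
  main (void)
  {
    if (round (2.5) != 3.0 || round (-2.5) != -3.0)
      abort ();
    /* 2e34 is far beyond 2^53, so it has no fractional part in double
       precision and must come back unchanged.  */
    if (round (2e34) != 2e34 || round (-2e34) != -2e34)
      abort ();
    return 0;
  }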

So I think the question is what do we want to do in the immediate term.

We can remove the test to get cleaner testresults on rv32.  I'm not a 
big fan of removing tests, but this test just doesn't make sense on rv32 
as-is.



We could leave things alone for now on the assumption the test will be 
rewritten to check for calls to the proper routines and possibly 
extended to include runtime verification.


I tend to lean towards the first.  That obviously wouldn't close the 
door on re-adding the test later with runtime verification and such.


Palmer, do you have a strong opinion either way?

jeff


Re: [PATCH 5/5][v3] RISC-V: Avoid inserting after a GIMPLE_COND with SLP and early break

2024-05-31 Thread Patrick O'Neill
On Fri, May 31, 2024 at 9:41 PM Jeff Law  wrote:
>
> On 5/31/24 7:44 AM, Richard Biener wrote:
> > When vectorizing an early break loop with LENs (do we miss some
> > check here to disallow this?) we can end up deciding to insert
> > stmts after a GIMPLE_COND when doing SLP scheduling and trying
> > to be conservative with placing of stmts only dependent on
> > the implicit loop mask/len.  The following avoids this, I guess
> > it's not perfect but it does the job fixing some observed
> > RISC-V regression.
> >
> >   * tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
> >   loops make sure to not advance the insertion iterator
> >   beyond a GIMPLE_COND.
> Note this patch may depend on others in the series.  I don't think the
> pre-commit CI tester is particularly good at handling that, particularly
> if the other patches in the series don't have the tagging for the
> pre-commit CI.
>
> What most likely happened is this patch and only this patch was applied
> against the baseline for testing.

I fixed that last week (5/16) so we shouldn't be seeing that issue anymore.
If you're still seeing it please let me know and I'd be interested to dig in.
From checking the patch_urls artifact it looks like all 5 patches were applied
to 46d931b. It's an old baseline so that might be the issue.

We've been having hard-to-diagnose network issues on some of the
newly-added CI boxes. Fingers crossed that's resolved now.
I'll rerun this patch tomorrow once the new baseline is generated.

> There are (manual) ways to get things re-tested.  I'm hoping Patrick and
> Edwin automate that procedure relatively soon.  Until that happens you
> have to email patchworks...@rivosinc.com with a URL for the patch in
> patchwork that you want retested.
Ditto - treat it as if it's automated. When I see it I'll reply with a
link to the rerun.
I'll start to look at wiring it up to an automation next week.

I also have the ability to manually trigger on patches not labelled 'RISC-V' so
feel free to ping for that as well.

Patrick

> Jeff
>