[PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-04-07 Thread Feng Xue OS
For a lane-reducing operation (dot-prod/widen-sum/sad) in a loop reduction, the
current vectorizer can only handle the pattern if the reduction chain contains
no other operation, whether that operation is normal or lane-reducing.

Actually, to allow multiple arbitrary lane-reducing operations, we need to
support vectorization of a loop reduction chain with mixed input vectypes.
Since the number of lanes of the vectype may vary with the operation, the
effective ncopies of the vectorized statements may also differ between
operations, which causes a mismatch among the vectorized def-use cycles. A
simple way to handle this is to align all operations with the one that has the
most ncopies, and to fill the gap by generating extra trivial pass-through
copies. For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod
       sum += w[i];               // widen-sum
       sum += abs(s0[i] - s1[i]); // sad
       sum += n[i];               // normal
     }

The vector size is 128 bits and the vectorization factor is 16. The reduction
statements would be transformed as:

   vector<4> int sum_v0 = { 0, 0, 0, 0 };
   vector<4> int sum_v1 = { 0, 0, 0, 0 };
   vector<4> int sum_v2 = { 0, 0, 0, 0 };
   vector<4> int sum_v3 = { 0, 0, 0, 0 };

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = sum_v1;  // copy
       sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
       sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);

       sum_v0 += n_v0[i: 0  ~ 3 ];
       sum_v1 += n_v1[i: 4  ~ 7 ];
       sum_v2 += n_v2[i: 8  ~ 11];
       sum_v3 += n_v3[i: 12 ~ 15];
     }
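As a sanity check on the transformation above, the following C sketch models the vectorized loop with plain arrays and compares it against the scalar loop. It is not vectorizer output. It assumes `d0`, `d1` and `w` hold `signed char`, `s0`/`s1` hold `short` and `n` holds `int`, and it models each lane-reducing operation by letting int lane `k` accumulate a fixed contiguous group of inputs (real targets may group lanes differently, which does not change an integer sum). All function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define VF 16      /* vectorization factor in the example */
#define NLANES 4   /* int lanes in a 128-bit vector */
#define NCYCLES 4  /* def-use cycles: VF / NLANES */

/* Scalar reference: the original loop body.  */
int scalar_sum (const signed char *d0, const signed char *d1,
                const signed char *w, const short *s0, const short *s1,
                const int *n, int len)
{
  int sum = 0;
  for (int i = 0; i < len; i++)
    {
      sum += d0[i] * d1[i];       /* dot-prod */
      sum += w[i];                /* widen-sum */
      sum += abs (s0[i] - s1[i]); /* sad */
      sum += n[i];                /* normal */
    }
  return sum;
}

/* Models of the lane-reducing operations: one vector's worth of input
   elements is folded into the NLANES int lanes of ACC.  */
static void dot_prod (int acc[NLANES], const signed char *a,
                      const signed char *b)
{
  for (int k = 0; k < 16; k++)
    acc[k / 4] += a[k] * b[k];
}

static void widen_sum (int acc[NLANES], const signed char *a)
{
  for (int k = 0; k < 16; k++)
    acc[k / 4] += a[k];
}

static void sad (int acc[NLANES], const short *a, const short *b)
{
  for (int k = 0; k < 8; k++)
    acc[k / 2] += abs (a[k] - b[k]);
}

/* Model of the vectorized loop with accumulators sum_v0..sum_v3.  A cycle
   that an operation does not touch is left unchanged, which is what the
   "copy" statements express.  */
int vector_sum (const signed char *d0, const signed char *d1,
                const signed char *w, const short *s0, const short *s1,
                const int *n, int len)
{
  assert (len % VF == 0);  /* no epilogue loop in this sketch */
  int sum_v[NCYCLES][NLANES] = { { 0 } };

  for (int i = 0; i < len; i += VF)
    {
      dot_prod (sum_v[0], d0 + i, d1 + i);     /* 16 bytes -> cycle 0 */
      widen_sum (sum_v[1], w + i);             /* 16 bytes -> cycle 1 */
      sad (sum_v[2], s0 + i, s1 + i);          /* shorts 0..7 -> cycle 2 */
      sad (sum_v[3], s0 + i + 8, s1 + i + 8);  /* shorts 8..15 -> cycle 3 */
      for (int c = 0; c < NCYCLES; c++)        /* normal op: every cycle */
        for (int k = 0; k < NLANES; k++)
          sum_v[c][k] += n[i + c * NLANES + k];
    }

  /* Reduction epilogue: fold all cycles and lanes into one scalar.  */
  int sum = 0;
  for (int c = 0; c < NCYCLES; c++)
    for (int k = 0; k < NLANES; k++)
      sum += sum_v[c][k];
  return sum;
}
```

Since no intermediate sum overflows for small inputs, the pass-through copies only leave an accumulator untouched and the final horizontal reduction recovers the scalar result.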

Moreover, for higher instruction-level parallelism in the final vectorized
loop, the effective vectorized lane-reducing statements are distributed evenly
among all def-use cycles. In the above example, DOT_PROD, WIDEN_SUM and the
SADs are generated into different cycles.
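The bookkeeping behind this distribution can be sketched as follows. The sketch assumes that each operation needs `VF / input_lanes` effective statements, that the reduction has `VF / result_lanes` def-use cycles, and that a rotating start position decides which cycles receive the effective statements while all other cycles get pass-through copies. It is loosely inspired by the new `reduc_result_pos` field, but the names and logic here are hypothetical, not the patch's code.

```c
#include <assert.h>

#define VECTOR_BITS 128
#define RESULT_BITS 32  /* int accumulator */

/* Number of lanes of an element type in one vector.  */
static int lanes (int elem_bits) { return VECTOR_BITS / elem_bits; }

/* Assign the effective statements of one operation whose input elements
   have INPUT_BITS to def-use cycles, starting at *POS and wrapping around.
   CYCLE_OF[c] is set to 1 for cycles that get a real statement and 0 for
   pass-through copies.  Returns the number of effective statements.  */
static int assign_cycles (int vf, int input_bits, int *pos, int cycle_of[])
{
  int ncopies = vf / lanes (RESULT_BITS);   /* def-use cycles */
  int effective = vf / lanes (input_bits);  /* real statements needed */
  assert (effective <= ncopies);

  for (int c = 0; c < ncopies; c++)
    cycle_of[c] = 0;
  for (int j = 0; j < effective; j++)
    cycle_of[(*pos + j) % ncopies] = 1;
  *pos = (*pos + effective) % ncopies;      /* next op starts after this one */
  return effective;
}
```

With VF = 16, successive calls for byte, byte and short inputs place DOT_PROD in cycle 0, WIDEN_SUM in cycle 1 and the two SADs in cycles 2 and 3, matching the example; an int (normal) operation fills all four cycles.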

Bootstrapped/regtested on x86_64-linux and aarch64-linux.

Feng
---
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
(vectorizable_lane_reducing): New function declaration.
* tree-vect-stmts.cc (vectorizable_condition): Treat the condition
statement that is pointed to by the stmt_vec_info of the reduction PHI as
the real "for_reduction" statement.
(vect_analyze_stmt): Call new function vectorizable_lane_reducing
to analyze lane-reducing operation.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove parameter
loop_vinfo. Get input vectype from stmt_info instead of reduction PHI.
(vect_model_reduction_cost): Remove cost computation code related to
emulated_mixed_dot_prod.
(vect_reduction_use_partial_vector): New function.
(vectorizable_lane_reducing): New function.
(vectorizable_reduction): Allow multiple lane-reducing operations in
loop reduction. Move some original lane-reducing related code to
vectorizable_lane_reducing, and move partial vectorization checking
code to vect_reduction_use_partial_vector.
(vect_transform_reduction): Extend transformation to support reduction
statements with mixed input vectypes.

gcc/testsuite/
PR tree-optimization/114440
* gcc.dg/vect/vect-reduc-chain-1.c
* gcc.dg/vect/vect-reduc-chain-2.c
* gcc.dg/vect/vect-reduc-chain-3.c
* gcc.dg/vect/vect-reduc-dot-slp-1.c
* gcc.dg/vect/vect-reduc-dot-slp-2.c
---
 .../gcc.dg/vect/vect-reduc-chain-1.c  |  62 ++
 .../gcc.dg/vect/vect-reduc-chain-2.c  |  77 ++
 .../gcc.dg/vect/vect-reduc-chain-3.c  |  66 ++
 .../gcc.dg/vect/vect-reduc-dot-slp-1.c|  97 +++
 .../gcc.dg/vect/vect-reduc-dot-slp-2.c|  81 +++
 gcc/tree-vect-loop.cc | 668 --
 gcc/tree-vect-stmts.cc|  13 +-
 gcc/tree-vectorizer.h |   8 +
 8 files changed, 863 insertions(+), 209 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
new file mode 100644
index 000..04bfc419dbd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
@@ -0,0 +1,62 @@
+/* Disabling epilogues until we find a

[PATCH v2] Internal-fn: Introduce new internal function SAT_ADD

2024-04-07 Thread pan2 . li
From: Pan Li 

Update in v2:
* Fix one failure for x86 bootstrap.

Original log:

This patch adds the middle-end representation for saturating add,
i.e. the result of the addition is set to the maximum value on
overflow. It matches a pattern similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
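A portable C sketch of the uint8_t case may make the formula concrete: the wrapped sum is smaller than `x` exactly when the addition overflowed, and negating that 0/1 flag yields an all-zeros or all-ones mask. `sat_add_u8` and `sat_add_u8_ref` are illustrative names; the reference version clamps through a wider type so the two can be compared exhaustively.

```c
#include <stdint.h>

/* Branchless form of the pattern: (x + y) | -(overflow).  */
static uint8_t sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);              /* wraps modulo 256 */
  uint8_t overflow = (uint8_t) (sum < x);       /* 1 iff the add wrapped */
  return (uint8_t) (sum | (uint8_t) -overflow); /* mask is 0x00 or 0xff */
}

/* Reference: add in a wider type and clamp to UINT8_MAX.  */
static uint8_t sat_add_u8_ref (uint8_t x, uint8_t y)
{
  unsigned wide = (unsigned) x + y;
  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}
```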

The patch also implements SAT_ADD in the riscv backend as a
sample, for both scalar and vector.  Given the example below:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;succ:   EXIT

}
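The "before" GIMPLE computes the same value as the following C sketch, which uses GCC's `__builtin_add_overflow` to obtain the wrapped sum and the overflow flag (the REALPART_EXPR and IMAGPART_EXPR of `.ADD_OVERFLOW`) and turns the flag into an all-ones mask. It only illustrates the equivalence and is not the code GCC emits; `sat_add_u64_overflow` is an illustrative name.

```c
#include <stdint.h>

/* Same computation as the .ADD_OVERFLOW sequence: wrapped sum plus an
   overflow flag that is turned into an all-ones mask.  */
static uint64_t sat_add_u64_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  int ovf = __builtin_add_overflow (x, y, &sum); /* 1 if x + y wrapped */
  return sum | -(uint64_t) ovf;
}

/* The source pattern from sat_add_u64 above.  */
static uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}
```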

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;succ:   EXIT
}

For vectorization, we leverage the existing vect pattern recognition
to find a pattern similar to the scalar one and let the vectorizer
handle the rest via the standard name usadd3 in vector mode.
The riscv vector backend has the "Vector Single-Width Saturating
Add and Subtract" instructions, which can be used when expanding
usadd3 in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv   v0,v0,v1
  vmerge.vim  v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vsaddu.vv   v1,v1,v2
  vse64.v v1,0(a0)
  ...

To limit the patch size for review, only the unsigned version of
usadd3 is involved here. The signed version will be covered
in follow-up patch(es).

The following test suites passed for this patch:
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* config/riscv/autovec.md (usadd3): New pattern expand
for unsigned SAT_ADD vector.
* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
decl to expand usadd3 pattern.
(expand_vec_usadd): Ditto but for vector.
* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
emit the vsadd insn.
(expand_vec_usadd): New func impl to expand usadd3 for
vector.
* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
to expand usadd3 for scalar.
* config/riscv/riscv.md (usadd3): New pattern expand
for unsigned SAT_ADD scalar.
* config/riscv/vector.md: Allow VLS mode for vsaddu.
* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD matc

Re: [PATCH 0/2] Condition coverage fixes

2024-04-07 Thread Jørgen Kvalsvik

On 07/04/2024 08:26, Richard Biener wrote:




On 06.04.2024 at 22:41, Jørgen Kvalsvik wrote:

On 06/04/2024 13:15, Jørgen Kvalsvik wrote:

On 06/04/2024 07:50, Richard Biener wrote:



On 05.04.2024 at 21:59, Jørgen Kvalsvik wrote:

Hi,

I propose these fixes for the current issues with the condition
coverage.

Rainer, I propose to simply delete the test with __sigsetjmp. I don't
think it actually detects anything reasonable any more, I kept it around
to prevent a regression. Since then I have built a lot of programs (with
optimization enabled) and not really seen this problem.

H.J., the problem you found with -O2 was really a problem of
tree-inlining, which was actually caught earlier by Jan [1]. It probably
warrants some more testing, but I could reproduce by tuning your test
case to use always_inline and not -O2 and trigger the error.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648785.html


Ok

Thanks, committed.
I am wondering if the fn->cond_uids access (in tree-profile.cc) should always
be guarded. Right now there is the assumption that if condition coverage is
requested, the map will exist and be populated, but as this shows, there may be
other circumstances where this is not true.
Or perhaps there should be a gcc_assert to (reliably) detect cases where the
map is not constructed properly?
Thanks,
Jørgen


I gave this some more thought, and realised I was too eager to fix the segfault.
While trunk no longer crashes (at least on my x86_64 linux), the fix itself is bad.
It copies the gcond -> uid mappings into the caller, but the stmts are
deep-copied into the caller, so none of the copied gconds will ever be found
when we look up the condition_uids in tree-profile.cc.
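The failure mode can be illustrated with a toy pointer-keyed map. All names are hypothetical; this is not GCC's `hash_map` or `copy_bb`. Copying the entries verbatim keeps the original statement pointers as keys, so lookups with the deep-copied statements miss, whereas re-keying each entry on the copied statement makes them hit.

```c
#include <assert.h>
#include <stddef.h>

struct stmt { int code; };          /* stand-in for a gcond */

#define MAP_MAX 8
struct uid_map                      /* tiny stand-in for a stmt -> uid map */
{
  const struct stmt *key[MAP_MAX];
  unsigned val[MAP_MAX];
  size_t len;
};

static void map_put (struct uid_map *m, const struct stmt *k, unsigned v)
{
  assert (m->len < MAP_MAX);
  m->key[m->len] = k;
  m->val[m->len++] = v;
}

static const unsigned *map_get (const struct uid_map *m, const struct stmt *k)
{
  for (size_t i = 0; i < m->len; i++)
    if (m->key[i] == k)
      return &m->val[i];
  return NULL;
}

/* Deep-copy N statements from SRC to DST and copy their uids from
   CALLEE_MAP into CALLER_MAP.  If REKEY is zero, the old (callee) keys
   are stored unchanged; otherwise the copied statements become the keys.  */
static void copy_stmts (const struct stmt *src, struct stmt *dst, size_t n,
                        const struct uid_map *callee_map,
                        struct uid_map *caller_map, int rekey)
{
  for (size_t i = 0; i < n; i++)
    {
      dst[i] = src[i];              /* new object at a new address */
      const unsigned *v = map_get (callee_map, &src[i]);
      if (v)
        map_put (caller_map, rekey ? &dst[i] : &src[i], *v);
    }
}
```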

I did a very quick prototype to confirm. By applying this patch:

@@ -2049,6 +2049,9 @@ copy_bb (copy_body_data *id, basic_block bb,

   copy_gsi = gsi_start_bb (copy_basic_block);

+  if (!cfun->cond_uids && id->src_cfun->cond_uids)
+ cfun->cond_uids = new hash_map  ();
+
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple_seq stmts;
@@ -2076,6 +2079,12 @@ copy_bb (copy_body_data *id, basic_block bb,
  if (gimple_nop_p (stmt))
  continue;

+ if (id->src_cfun->cond_uids && is_a  (orig_stmt))
+   {
+ unsigned *v = id->src_cfun->cond_uids->get (as_a  (orig_stmt));
+ if (v) cfun->cond_uids->put (as_a  (stmt), *v);
+   }
+


and this test program:

#include <stdio.h>

__attribute__((always_inline))
inline int
inlinefn (int a)
{
  if (a > 5)
    {
      printf ("a > 5\n");
      return a;
    }
  else
    printf ("a < 5, was %d\n", a);
  return a * a - 2;
}

int
mcdc027e (int a, int b)
{
  int y = inlinefn (a);
  return y + b;
}


gcov reports:

2:   18:mcdc027e (int a, int b)
condition outcomes covered 1/2
condition  0 not covered (true)
-:   19:{
2:   20:int y = inlinefn (a);
2:   21:return y + b;
-:   22:}

but without the patch, gcov prints nothing.

I am not sure if this approach is even ideal. Probably the most problematic part
is the source line mapping, which is all messed up. I checked with gcov
--branch-probabilities and it too reports the callee at the top of the caller.

If you think it is a good strategy I can clean up the prototype and submit a 
patch. I suppose the function _totals_ should be accurate, even if the source 
mapping is a bit surprising.

What do you think? I am open to other strategies, too


I think the most important bit is that the segfault is gone.  The interaction
of coverage with inlining, or with other optimizations when coverage is
combined with optimization, should be documented better.

Does condition coverage apply ontop of regular coverage counting or is it an 
either/or?


On top, it is perfectly reasonable (and desirable) to measure 
statement/line coverage in addition to condition coverage. That being 
said, if you achieve MC/DC you also achieve branch coverage, but gcc 
-fprofile-arcs + --branch-counts/--branch-probabilities measure more 
than just taken/not taken, so -fcondition-coverage does not completely 
replace it. You might also not care about MC/DC, only branch coverage.
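To make the distinction concrete, here is a small sketch for the decision `a && b`, using unique-cause MC/DC (a condition is covered when two test vectors differ only in that condition and produce different outcomes). The helper names are hypothetical and this is not how gcov computes it. The vectors {(1,1), (0,1)} take both branches, yet never show `b` independently affecting the outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tv { bool a, b; };           /* one test vector */

static bool decision (struct tv t) { return t.a && t.b; }

/* True if both outcomes of the decision occur (branch coverage).  */
static bool branch_covered (const struct tv *t, size_t n)
{
  bool seen_true = false, seen_false = false;
  for (size_t i = 0; i < n; i++)
    {
      if (decision (t[i]))
        seen_true = true;
      else
        seen_false = true;
    }
  return seen_true && seen_false;
}

/* Unique-cause MC/DC for condition COND (0 = a, 1 = b): some pair of
   vectors differs only in COND and yields different outcomes.  */
static bool condition_covered (const struct tv *t, size_t n, int cond)
{
  assert (cond == 0 || cond == 1);
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      {
        bool same_other = cond == 0 ? t[i].b == t[j].b : t[i].a == t[j].a;
        bool diff_this = cond == 0 ? t[i].a != t[j].a : t[i].b != t[j].b;
        if (same_other && diff_this && decision (t[i]) != decision (t[j]))
          return true;
      }
  return false;
}
```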


Personally, I have come around to this strategy being alright. It can (and
maybe should) be documented that inlined functions will be anchored to
the top of the calling function; the summaries will still be useful.
A future project could be to improve the source mapping through inlining
as well. This is OK in practice because code under test tends not to be
inlined much.


Thanks,
Jørgen



Thanks,
Richard


Thanks,
Jørgen



Thanks,
Richard



Thanks,
Jørgen

Jørgen Kvalsvik (2):
   Remove unecessary and broken MC/DC compile test
   Copy condition->expr map when inlining [PR114599]

gcc/testsuite/gcc.misc-tests/gcov-19.c   | 11 -
gcc/testsuite/gcc.misc-tests/gcov-pr114599.c | 25 
gcc/tree-inlin

[PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Yang Yujie
This patch fixes the back-end context switching in cases where functions
should be built with their own target contexts instead of the
global one, such as LTO linking and functions with target attributes (TBD).
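A minimal sketch of the idea, with hypothetical types standing in for GCC's target option nodes and target globals (none of these names are the patch's): the back end remembers the last function it switched to and, when a function with a different target context comes up, restores that context instead of keeping the global one. It also mirrors the builtin change described in the ChangeLog: builtins are registered up front, and availability is checked against the current context at expansion time, failing gracefully instead of asserting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function target context (e.g. whether LSX is on).  */
struct target_ctx { bool lsx; };
struct function_decl { const struct target_ctx *target; /* NULL: default */ };

static const struct target_ctx default_ctx = { false };
static const struct function_decl *previous_fndecl;
static struct target_ctx current_ctx;   /* the "target globals" */

static void set_current_function (const struct function_decl *fn)
{
  if (fn == previous_fndecl)
    return;                              /* context already in place */
  const struct target_ctx *want
    = (fn && fn->target) ? fn->target : &default_ctx;
  current_ctx = *want;                   /* switch to this function's context */
  previous_fndecl = fn;
}

/* Availability is checked against the current context, not at startup.  */
static bool lsx_builtin_available (void) { return current_ctx.lsx; }

/* Returns false (where a compiler would report an error) instead of
   asserting when the builtin is not enabled for the current function.  */
static bool expand_lsx_builtin (void)
{
  if (!lsx_builtin_available ())
    return false;
  /* ... the actual expansion would happen here ... */
  return true;
}
```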

PR target/113233

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_reg_init):
Reinitialize the loongarch_regno_mode_ok cache.
(loongarch_option_override): Same.
(loongarch_save_restore_target_globals): Restore target globals.
(loongarch_set_current_function): Restore the target contexts
for functions.
(TARGET_SET_CURRENT_FUNCTION): Define.
* config/loongarch/loongarch.h (SWITCHABLE_TARGET): Enable
switchable target context.
* config/loongarch/loongarch-builtins.cc (loongarch_init_builtins):
Initialize all builtin functions at startup.
(loongarch_expand_builtin): Turn assertion of builtin availability
into a test.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Define condition loongarch_sx_as.
* gcc.dg/lto/pr113233_0.c: New test.
---
 gcc/config/loongarch/loongarch-builtins.cc | 25 +++---
 gcc/config/loongarch/loongarch.cc  | 91 --
 gcc/config/loongarch/loongarch.h   |  2 +
 gcc/testsuite/gcc.dg/lto/pr113233_0.c  | 14 
 gcc/testsuite/lib/target-supports.exp  | 12 +++
 5 files changed, 127 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113233_0.c

diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index efe7e5e5ebc..fbe46833c9b 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -2512,14 +2512,11 @@ loongarch_init_builtins (void)
   for (i = 0; i < ARRAY_SIZE (loongarch_builtins); i++)
 {
   d = &loongarch_builtins[i];
-  if (d->avail ())
-   {
- type = loongarch_build_function_type (d->function_type);
- loongarch_builtin_decls[i]
-   = add_builtin_function (d->name, type, i, BUILT_IN_MD, NULL,
-   NULL);
- loongarch_get_builtin_decl_index[d->icode] = i;
-   }
+  type = loongarch_build_function_type (d->function_type);
+  loongarch_builtin_decls[i]
+   = add_builtin_function (d->name, type, i, BUILT_IN_MD, NULL,
+ NULL);
+  loongarch_get_builtin_decl_index[d->icode] = i;
 }
 }
 
@@ -3105,15 +3102,21 @@ loongarch_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
  int ignore ATTRIBUTE_UNUSED)
 {
   tree fndecl;
-  unsigned int fcode, avail;
+  unsigned int fcode;
   const struct loongarch_builtin_description *d;
 
   fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   fcode = DECL_MD_FUNCTION_CODE (fndecl);
   gcc_assert (fcode < ARRAY_SIZE (loongarch_builtins));
   d = &loongarch_builtins[fcode];
-  avail = d->avail ();
-  gcc_assert (avail != 0);
+
+  if (!d->avail ())
+{
+  error_at (EXPR_LOCATION (exp),
+   "built-in function %qD is not enabled", fndecl);
+  return target;
+}
+
   switch (d->builtin_type)
 {
 case LARCH_BUILTIN_DIRECT:
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index c90b701a533..6b92e7034c5 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7570,15 +7570,19 @@ loongarch_global_init (void)
loongarch_dwarf_regno[i] = INVALID_REGNUM;
 }
 
+  /* Function to allocate machine-dependent function status.  */
+  init_machine_status = &loongarch_init_machine_status;
+};
+
+static void
+loongarch_reg_init (void)
+{
   /* Set up loongarch_hard_regno_mode_ok.  */
   for (int mode = 0; mode < MAX_MACHINE_MODE; mode++)
 for (int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
   loongarch_hard_regno_mode_ok_p[mode][regno]
= loongarch_hard_regno_mode_ok_uncached (regno, (machine_mode) mode);
-
-  /* Function to allocate machine-dependent function status.  */
-  init_machine_status = &loongarch_init_machine_status;
-};
+}
 
 static void
 loongarch_option_override_internal (struct loongarch_target *target,
@@ -7605,20 +7609,92 @@ loongarch_option_override_internal (struct loongarch_target *target,
 
   /* Override some options according to the resolved target.  */
   loongarch_target_option_override (target, opts, opts_set);
+
+  target_option_default_node = target_option_current_node
+= build_target_option_node (opts, opts_set);
+
+  loongarch_reg_init ();
+}
+
+/* Remember the last target of loongarch_set_current_function.  */
+
+static GTY(()) tree loongarch_previous_fndecl;
+
+/* Restore or save the TREE_TARGET_GLOBALS from or to new_tree.
+   Used by loongarch_set_current_function to
+   make sure optab availability predicates are recomputed when necessary.  */
+
+static void
+loongarch_save_restore_target_globals (tree new_tree)
+{
+  if (TREE_TARGE

Re: Combine patch ping

2024-04-07 Thread Richard Biener



> On 01.04.2024 at 21:28, Uros Bizjak wrote:
> 
> Hello!
> 
> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html
> PR112560 P1 patch.

Ok.

Thanks,
Richard 

> Thanks,
> Uros.


Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> This patch fixes the back-end context switching in cases where functions
> should be built with their own target contexts instead of the
> global one, such as LTO linking and functions with target attributes (TBD).
> 
>   PR target/113233

Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
save/restore"?  Should I reopen it?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Yang Yujie
On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote:
> On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> > This patch fixes the back-end context switching in cases where functions
> > should be built with their own target contexts instead of the
> > global one, such as LTO linking and functions with target attributes (TBD).
> > 
> > PR target/113233
> 
> Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
> save/restore"?  Should I reopen it?
> 
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University

Yes, the issue was not fixed with that patch. This one should do.



Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao
On Sun, 2024-04-07 at 16:23 +0800, Yang Yujie wrote:
> On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote:
> > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> > > This patch fixes the back-end context switching in cases where functions
> > > should be built with their own target contexts instead of the
> > > global one, such as LTO linking and functions with target attributes 
> > > (TBD).
> > > 
> > >   PR target/113233
> > 
> > Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
> > save/restore"?  Should I reopen it?
> > 
> > -- 
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
> 
> Yes, the issue was not fixed with that patch. This one should do.

So I have reopened the PR.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> The patch has been approved by Honza in Bugzilla. (I hope.  He did write
> it looked reasonable.)  Together with the patch for PR 113907, it has
> passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
> x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux.  It also
> passed normal bootstrap on aarch64-linux, but there many testcases failed
> because the compiler timed out.  The machine is old and slow and might
> have been oversubscribed so my plan is to try again on gcc185 from
> cfarm.  If that goes well, I intend to commit the patch and then start
> working on backports.

I've tried these two patches out on my own 24-core AArch64 machine. 
Bootstrapped (but no LTO or PGO) and regtested fine.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> +/* Given two types in an assignment, return true either if any one cannot be
> +   totally scalarized or if they have padding (i.e. not copied bits)  */
> +
> +bool
> +sra_total_scalarization_would_copy_same_data_p (tree t1, tree t2)
> +{
> +  sra_padding_collecting p1;
> +  if (!check_ts_and_push_padding_to_vec (t1, &p1))
> +    return true;
> +
> +  sra_padding_collecting p2;
> +  if (!check_ts_and_push_padding_to_vec (t2, &p2))
> +    return true;
> +
> +  unsigned l = p1.m_padding.length ();
> +  if (l != p2.m_padding.length ())
> +    return false;
> +  for (unsigned i = 0; i < l; i++)
> +    if (p1.m_padding[i].first != p2.m_padding[i].first
> + || p1.m_padding[i].second != p2.m_padding[i].second)
> +  return false;
> +
> +  return true;
> +}
> +

Better remove this trailing empty line from tree-sra.cc.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
>   * config/loongarch/loongarch-builtins.cc
> (loongarch_init_builtins):
>     Initialize all builtin functions at startup.

git gcc-verify complains that tab should be used instead of space for
this line.

>   (loongarch_expand_builtin): Turn assertion of builtin
> availability
>     into a test.

and this line.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2024-04-07 Thread Peter Bergner
I'm picking up Will's patches for this bug.  As an FYI, this is the bug where
_ARCH_PWR8 is conditional on TARGET_DIRECT_MOVE, which can be disabled with
-mno-vsx, which is bad.

I already posted the cleanup patch that the updated patch for this bug will rely
on, which removes OPTION_MASK_DIRECT_MOVE because it is fully redundant with
OPTION_MASK_P8_VECTOR.  I've also incorporated some of Ke Wen's review comments
on Will's original patch.  I have a couple of comments on your review though...


On 10/17/22 1:08 PM, Segher Boessenkool wrote:
> On Mon, Sep 19, 2022 at 11:13:20AM -0500, will schmidt wrote:
>> @@ -24046,10 +24045,11 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>>    { "block-ops-vector-pair", OPTION_MASK_BLOCK_OPS_VECTOR_PAIR,
>>                                                         false, true  },
>>    { "cmpb",                  OPTION_MASK_CMPB,         false, true  },
>>    { "crypto",                OPTION_MASK_CRYPTO,       false, true  },
>>    { "direct-move",           OPTION_MASK_DIRECT_MOVE,  false, true  },
>> +  { "power8",                OPTION_MASK_POWER8,       false, true  },
> 
> Why would we want a #pragma power8 ?

Agreed, we don't want that.  We have target attribute cpu=power8 for that.



>> +mpower8
>> +Target Mask(POWER8) Var(rs6000_isa_flags)
>> +Use instructions added in ISA 2.07 (power8).
> 
> There should not be such an option.  It is set by -mcpu=power8 and
> later, but can never be enabled or disabled directly by the user.

So we need an OPTION_MASK_POWER8 to be created for use in rs6000_isa_flags, but
the only way I see to do that is to create an option in rs6000.opt.
Is there another way that I missed?  Otherwise, I was thinking of creating a
dummy option that is WarnRemoved from the start, like so:

+;; This option exists only for its MASK.  It is not intended for users.
+mpower8
+Target Mask(POWER8) Var(rs6000_isa_flags) WarnRemoved
+

Is there a better way?  The problem is that P8 added lots of new instructions,
but they were basically all vector and HTM instructions.  No general GPR or FPR
instructions (i.e., what we'd think of as base architecture) were added, so
there's no other OPTION_MASK_*/TARGET_* we can use as a P8 base architecture
test.

I'll note I tried just a bare "Target Mask(POWER8) Var(rs6000_isa_flags)" with
no option name mentioned at all, but that didn't work, as no OPTION_MASK_POWER8
was created.

Peter




Re: [PATCH 2/9] wwwdocs: gcc-14: add URLs to some options

2024-04-07 Thread Hans-Peter Nilsson
On Thu, 4 Apr 2024, David Malcolm wrote:

> Signed-off-by: David Malcolm 
> ---
>  htdocs/gcc-14/changes.html | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 5cc729c5..397458d5 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -149,26 +149,33 @@ a work-in-progress.
>  to enable additional hardening.
>
>
> -New option -fhardened, an umbrella option that enables a set
> -of hardening flags.  The options it enables can be displayed using the
> +New option
> + href="https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fhardened";>-fhardened,

Shouldn't those URLs better point to a specific version, lest 
they might break with any newer release?

The question is "a bit" rhetorical, since there appears to be 
nothing at onlinedocs/gcc-14.0.0/ (and "nearby numbers").

Still, maybe there ought to be a copy of onlinedocs/gcc/ that is 
frozen at time of release.

brgds, H-P


Re:[pushed] [PATCH v1] LoongArch: Set default alignment for functions jumps and loops [PR112919].

2024-04-07 Thread Lulu Cheng



On 2024/4/6 at 5:53 PM, Xi Ruoyao wrote:

On Tue, 2024-04-02 at 15:03 +0800, Lulu Cheng wrote:

+/* Alignment for functions loops and jumps for best performance.  For new
+   uarchs the value should be measured via benchmarking.  See the documentation
+   for -falign-functions -falign-loops and -falign-jumps in invoke.texi for the

^ ^

Better have two commas here.

Otherwise it should be OK.


+   format.  */

I modified the comment accordingly and pushed it as r14-9824.



[PATCH] aarch64: Fix vld1/st1_x4 intrinsic test

2024-04-07 Thread Swinney, Jonathan
The test for this intrinsic was failing silently, so it did not catch
the bug reported in PR 114521. This patch modifies the test to
report the result.
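The underlying preprocessor issue is that `##` is not an operator inside a string literal, so the old message printed the macro text literally. The fix uses the `#` stringizing operator together with adjacent-literal concatenation, and records failures so that `main` can return nonzero. A standalone sketch (macro names are illustrative, not from the test):

```c
#include <stdbool.h>
#include <stdio.h>

/* Old form: ## inside a string literal is just two '#' characters.  */
#define OLD_NAME(SUFFIX) "test_vld1##SUFFIX##_x4"
/* New form: #SUFFIX stringizes the argument; adjacent literals join.  */
#define NEW_NAME(SUFFIX) "test_vld1" #SUFFIX "_x4"

/* Record a failure instead of only printing it.  */
#define CHECK(COND, SUFFIX, FAILED)                           \
  do                                                          \
    {                                                         \
      if (!(COND))                                            \
        {                                                     \
          fprintf (stderr, NEW_NAME (SUFFIX) " failed\n");    \
          (FAILED) = true;                                    \
        }                                                     \
    }                                                         \
  while (0)
```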

Bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114521

Signed-off-by: Jonathan Swinney 
---
 .../gcc.target/aarch64/advsimd-intrinsics/vld1x4.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
index 89b289bb21d..17db262a31a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
@@ -3,6 +3,7 @@
 /* { dg-skip-if "unimplemented" { arm*-*-* } } */
 /* { dg-options "-O3" } */
 
+#include 
 #include 
 #include "arm-neon-ref.h"
 
@@ -71,13 +72,16 @@ VARIANT (float64, 2, q_f64)
 VARIANTS (TESTMETH)
 
 #define CHECKS(BASE, ELTS, SUFFIX) \
-  if (test_vld1##SUFFIX##_x4 () != 0)  \
-fprintf (stderr, "test_vld1##SUFFIX##_x4");
+  if (test_vld1##SUFFIX##_x4 () != 0) {\
+fprintf (stderr, "test_vld1" #SUFFIX "_x4 failed\n"); \
+failed = true; \
+  }
 
 int
 main (int argc, char **argv)
 {
+  bool failed = false;
   VARIANTS (CHECKS)
 
-  return 0;
+  return (failed) ? 1 : 0;
 }
-- 
2.40.1



Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Yang Yujie
On Sun, Apr 07, 2024 at 08:56:53PM +0800, Xi Ruoyao wrote:
> On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> > * config/loongarch/loongarch-builtins.cc
> > (loongarch_init_builtins):
> >     Initialize all builtin functions at startup.
> 
> git gcc-verify complains that tab should be used instead of space for
> this line.
> 
> > (loongarch_expand_builtin): Turn assertion of builtin
> > availability
> >     into a test.
> 
> and this line.
> 
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University

Thanks! I will fix it soon.



Re: [PATCH 0/2] Condition coverage fixes

2024-04-07 Thread Sam James
Jørgen Kvalsvik  writes:

> Hi,
>
> I propose these fixes for the current issues with the condition
> coverage.
>
> Rainer, I propose to simply delete the test with __sigsetjmp. I don't
> think it actually detects anything reasonable any more, I kept it around
> to prevent a regression. Since then I have built a lot of programs (with
> optimization enabled) and not really seen this problem.
>
> H.J., the problem you found with -O2 was really a problem of
> tree-inlining, which was actually caught earlier by Jan [1]. It probably
> warrants some more testing, but I could reproduce by tuning your test
> case to use always_inline and not -O2 and trigger the error.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648785.html

I couldn't find your BZ account, but FWIW:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114627.

Thanks.

>
> Thanks,
> Jørgen
>
> Jørgen Kvalsvik (2):
>   Remove unecessary and broken MC/DC compile test
>   Copy condition->expr map when inlining [PR114599]
>
>  gcc/testsuite/gcc.misc-tests/gcov-19.c   | 11 -
>  gcc/testsuite/gcc.misc-tests/gcov-pr114599.c | 25 
>  gcc/tree-inline.cc   | 20 +++-
>  3 files changed, 44 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr114599.c