[pushed] LRA: Update insn sp offset if its input reload changes SP

2023-05-30 Thread Vladimir Makarov via Gcc-patches
The following patch fixes an LRA bug triggered by switching H8300 target 
from reload to LRA.  The description of the problem is in the commit 
message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit 30038a207c10a2783fa2695b62c7c8458ef05e73
Author: Vladimir N. Makarov 
Date:   Tue May 30 15:54:28 2023 -0400

LRA: Update insn sp offset if its input reload changes SP

The patch fixes a bug when there is an input reload changing SP.  The bug was
triggered by switching the H8300 target to LRA.  The insn in question is

(insn 21 20 22 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
(reg/f:SI 31)) "j.c":10:3 19 {*movsi}
 (expr_list:REG_DEAD (reg/f:SI 31)
(expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
(nil

The memory address is reloaded but the SP offset for the original insn was 
not updated.
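In effect, the fix makes setup_sp_offset track the running SP offset through the emitted reload insns instead of stamping one value on all of them.  A toy model of that bookkeeping (hypothetical, not GCC code; insns are reduced to the SP delta they cause, and sp_offset[i] mirrors lra_insn_recog_data->sp_offset, the offset before insn i):

```c
#include <assert.h>

/* Toy model (not GCC code) of the patched setup_sp_offset: record the
   offset before each insn, then advance it by the insn's SP delta, the
   way lra_update_sp_offset does by rescanning the pattern.  Returns the
   offset after the last insn, which the caller compares against the
   original insn's recorded sp_offset.  */
long
setup_sp_offset_model (const long *sp_delta, long *sp_offset, int n,
                       long offset)
{
  for (int i = 0; i < n; i++)
    {
      sp_offset[i] = offset;   /* offset before insn i */
      offset += sp_delta[i];   /* models lra_update_sp_offset */
    }
  return offset;
}
```

With one reload insn that pushes 4 bytes (delta -4) followed by one that leaves SP alone, the second insn's recorded offset is -4, so the original insn behind it must be reprocessed with the changed offset, which is exactly what the patch does in lra_process_new_insns.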

gcc/ChangeLog:

* lra-int.h (lra_update_sp_offset): Add the prototype.
* lra.cc (setup_sp_offset): Change the return type.  Use
lra_update_sp_offset.
* lra-eliminations.cc (lra_update_sp_offset): New function.
(lra_process_new_insns): Push the current insn to reprocess if the
input reload changes sp offset.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 4220639..68225339cb6 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1308,6 +1308,16 @@ init_elimination (void)
   setup_elimination_map ();
 }
 
+/* Update and return stack pointer OFFSET after processing X.  */
+poly_int64
+lra_update_sp_offset (rtx x, poly_int64 offset)
+{
+  curr_sp_change = offset;
+  mark_not_eliminable (x, VOIDmode);
+  return curr_sp_change;
+}
+
+
 /* Eliminate hard reg given by its location LOC.  */
 void
 lra_eliminate_reg_if_possible (rtx *loc)
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index a400a0f85e2..4dbe6672f3a 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode,
 extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64);
 extern void lra_eliminate (bool, bool);
 
+extern poly_int64 lra_update_sp_offset (rtx, poly_int64);
 extern void lra_eliminate_reg_if_possible (rtx *);
 
 
diff --git a/gcc/lra.cc b/gcc/lra.cc
index eb3ee1f8b63..c8b3f139acd 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -1838,10 +1838,10 @@ push_insns (rtx_insn *from, rtx_insn *to)
   lra_push_insn (insn);
 }
 
-/* Set up sp offset for insn in range [FROM, LAST].  The offset is
+/* Set up and return sp offset for insns in range [FROM, LAST].  The offset is
taken from the next BB insn after LAST or zero if there is no such
insn.  */
-static void
+static poly_int64
 setup_sp_offset (rtx_insn *from, rtx_insn *last)
 {
   rtx_insn *before = next_nonnote_nondebug_insn_bb (last);
@@ -1849,7 +1849,11 @@ setup_sp_offset (rtx_insn *from, rtx_insn *last)
   ? 0 : lra_get_insn_recog_data (before)->sp_offset);
 
  for (rtx_insn *insn = from; insn != NEXT_INSN (last); insn = NEXT_INSN (insn))
-lra_get_insn_recog_data (insn)->sp_offset = offset;
+{
+  lra_get_insn_recog_data (insn)->sp_offset = offset;
+  offset = lra_update_sp_offset (PATTERN (insn), offset);
+}
+  return offset;
 }
 
 /* Emit insns BEFORE before INSN and insns AFTER after INSN.  Put the
@@ -1875,8 +1879,25 @@ lra_process_new_insns (rtx_insn *insn, rtx_insn *before, rtx_insn *after,
   if (cfun->can_throw_non_call_exceptions)
copy_reg_eh_region_note_forward (insn, before, NULL);
   emit_insn_before (before, insn);
+  poly_int64 old_sp_offset = lra_get_insn_recog_data (insn)->sp_offset;
+  poly_int64 new_sp_offset = setup_sp_offset (before, PREV_INSN (insn));
+  if (maybe_ne (old_sp_offset, new_sp_offset))
+   {
+ if (lra_dump_file != NULL)
+   {
+ fprintf (lra_dump_file, "Changing sp offset from ");
+ print_dec (old_sp_offset, lra_dump_file);
+ fprintf (lra_dump_file, " to ");
+ print_dec (new_sp_offset, lra_dump_file);
+ fprintf (lra_dump_file, " for insn");
+ dump_rtl_slim (lra_dump_file, insn, NULL, -1, 0);
+   }
+ lra_get_insn_recog_data (insn)->sp_offset = new_sp_offset;
+ eliminate_regs_in_insn (insn, false, false,
+ old_sp_offset - new_sp_offset);
+ lra_push_insn (insn);
+   }
   push_insns (PREV_INSN (insn), PREV_INSN (before));
-  setup_sp_offset (before, PREV_INSN (insn));
 }
   if (after != NULL_RTX)
 {


[pushed] [PR109541] RA: Constrain class of pic offset table pseudo to general regs

2023-06-07 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109541

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.




Re: [pushed] [PR109541] RA: Constrain class of pic offset table pseudo to general regs

2023-06-07 Thread Vladimir Makarov via Gcc-patches


On 6/7/23 12:20, Jeff Law wrote:



On 6/7/23 09:35, Vladimir Makarov via Gcc-patches wrote:

The following patch fixes



-ENOPATCH


Sorry, here is the patch.

commit 08ca31fb27841cb7f3bff7086be6f139136be1a7
Author: Vladimir N. Makarov 
Date:   Wed Jun 7 09:51:54 2023 -0400

RA: Constrain class of pic offset table pseudo to general regs

On some targets an integer pseudo can be assigned to an FP reg.  For
the pic offset table pseudo it means we will reload the pseudo in this
case and, as a consequence, memory containing the pseudo might be
recognized as a wrong one.  The patch fixes this problem.

PR target/109541

gcc/ChangeLog:

* ira-costs.cc (find_costs_and_classes): Constrain classes of pic
  offset table pseudo to a general reg subset.

gcc/testsuite/ChangeLog:

* gcc.target/sparc/pr109541.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index ae8304ff938..d9e700e8947 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -2016,6 +2016,16 @@ find_costs_and_classes (FILE *dump_file)
 	  ira_assert (regno_aclass[i] != NO_REGS
 			  && ira_reg_allocno_class_p[regno_aclass[i]]);
 	}
+	  if (pic_offset_table_rtx != NULL
+	  && i == (int) REGNO (pic_offset_table_rtx))
+	{
+	  /* For some targets, integer pseudos can be assigned to fp
+		 regs.  As we don't want reload pic offset table pseudo, we
+		 should avoid using non-integer regs.  */
+	  regno_aclass[i]
+		= ira_reg_class_intersect[regno_aclass[i]][GENERAL_REGS];
+	  alt_class = ira_reg_class_intersect[alt_class][GENERAL_REGS];
+	}
 	  if ((new_class
 	   = (reg_class) (targetm.ira_change_pseudo_allocno_class
 			  (i, regno_aclass[i], best))) != regno_aclass[i])
diff --git a/gcc/testsuite/gcc.target/sparc/pr109541.c b/gcc/testsuite/gcc.target/sparc/pr109541.c
new file mode 100644
index 000..1360f101930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/pr109541.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -mcpu=niagara4 -fpic -w" } */
+
+int rhash_sha512_process_block_A, rhash_sha512_process_block_i,
+rhash_sha512_process_block_block, rhash_sha512_process_block_W_0;
+
+unsigned rhash_sha512_process_block_W_2;
+
+void rhash_sha512_process_block (void)
+{
+  unsigned C, E, F, G, H, W_0, W_4, W_9, W_5, W_3, T1;
+
+  for (; rhash_sha512_process_block_i; rhash_sha512_process_block_i += 6) {
+T1 = F + (rhash_sha512_process_block_W_2 += 6);
+rhash_sha512_process_block_A += H & G + (W_5 += rhash_sha512_process_block_W_0);
+H = C & T1 & E ^ F + (W_9 += rhash_sha512_process_block_W_0);
+G = T1 ^ 6 + (W_0 += rhash_sha512_process_block_block);
+F = (unsigned) &G;
+T1 = (unsigned) (&T1 + (W_3 += rhash_sha512_process_block_block > 9 > W_4));
+C = (unsigned) (T1 + &E);
+W_4 += W_5 += rhash_sha512_process_block_W_0;
+  }
+}


Re: [pushed][LRA][PR110372]: Refine reload pseudo class

2023-07-12 Thread Vladimir Makarov via Gcc-patches



On 7/12/23 06:07, Richard Sandiford wrote:

Vladimir Makarov via Gcc-patches  writes:

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 73fbef29912..2f95121df06 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1443,10 +1443,11 @@ assign_by_spills (void)
 pass.  Indicate that it is no longer spilled.  */
  bitmap_clear_bit (&all_spilled_pseudos, regno);
  assign_hard_regno (hard_regno, regno);
- if (! reload_p)
-   /* As non-reload pseudo assignment is changed we
-  should reconsider insns referring for the
-  pseudo.  */
+ if (! reload_p || regno_allocno_class_array[regno] == ALL_REGS)

Is this test meaningful on all targets?  We have some for which
GENERAL_REGS == ALL_REGS (e.g. nios2 and nvptx), so ALL_REGS can
be a valid allocation class.


Richard, thank you for the question.

As I remember nvptx does not use IRA/LRA.

I don't think it is a problem.  For targets with GENERAL_REGS == 
ALL_REGS, it only results in one more insn processing on the next 
constraint sub-pass.


I could implement a more accurate solution, but it would need introducing 
new data (flags) for pseudos, which I'd like to avoid.




Re: [pushed][LRA][PR110372]: Refine reload pseudo class

2023-07-12 Thread Vladimir Makarov via Gcc-patches



On 7/12/23 12:22, Richard Sandiford wrote:

Vladimir Makarov  writes:

On 7/12/23 06:07, Richard Sandiford wrote:

Vladimir Makarov via Gcc-patches  writes:

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 73fbef29912..2f95121df06 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1443,10 +1443,11 @@ assign_by_spills (void)
 pass.  Indicate that it is no longer spilled.  */
  bitmap_clear_bit (&all_spilled_pseudos, regno);
  assign_hard_regno (hard_regno, regno);
- if (! reload_p)
-   /* As non-reload pseudo assignment is changed we
-  should reconsider insns referring for the
-  pseudo.  */
+ if (! reload_p || regno_allocno_class_array[regno] == ALL_REGS)

Is this test meaningful on all targets?  We have some for which
GENERAL_REGS == ALL_REGS (e.g. nios2 and nvptx), so ALL_REGS can
be a valid allocation class.


Richard, thank you for the question.

As I remember nvptx does not use IRA/LRA.

I don't think it is a problem.  For targets with GENERAL_REGS ==
ALL_REGS, it only results in one more insn processing on the next
constraint sub-pass.

Ah, ok, thanks.  If there's no risk of cycling then I agree it
doesn't matter.
No.  There is no additional risk of cycling, as insn processing only 
starts after assigning a hard reg to the reload pseudo, and it can happen 
only once for the reload pseudo before the spilling sub-pass.




Re: [IRA] Skip empty register classes in setup_reg_class_relations

2023-07-13 Thread Vladimir Makarov via Gcc-patches



On 7/12/23 07:05, senthilkumar.selva...@microchip.com wrote:

Hi,

   I've been spending some (spare) time trying to get LRA working
   for the avr target.


Thank you for addressing this problem.

The code you are changing is very sensitive and was a source of multiple PRs 
in the past.  But I find the change you propose logical and I think it 
will not create problems.  Still, please be alert and revert the patch if 
people report problems with this change.



  After making a couple of changes to get
   libgcc going, I'm now hitting an assert at
   lra-constraints.cc:4423 for a subarch (avrtiny) that has a
   couple of regclasses with no available registers.

   The assert fires because in_class_p (correctly) returns
   false for get_reg_class (regno) = ALL_REGS, and new_class =
   NO_LD_REGS. For avrtiny, NO_LD_REGS is an empty regset, and
   therefore hard_reg_set_subset_p (NO_LD_REGS, lra_no_alloc_regs)
   is always true, making in_class_p return false.

   in_class_p picks NO_LD_REGS as new_class because common_class =
   ira_reg_class_subset[ALL_REGS][NO_REGS] evaluates as
   NO_LD_REGS. This appears wrong to me - it should be NO_REGS
   instead (lra-constraints.cc:4421 checks for NO_REGS).

   ira.cc:setup_reg_class_relations sets up
   ira_reg_class_subset (among other things), and the problem
   appears to be a missing continue statement if
   reg_class_contents[cl3] (in the innermost loop) is empty.

   In this case, for cl1 = ALL_REGS and cl2 = NO_REGS, cl3 =
   NO_LD_REGS, temp_hard_regset and temp_set2 are both empty, and
   hard_reg_set_subset_p (temp_hard_regset, temp_set2) is always true, so
   ira_reg_class_subset[ALL_REGS][NO_REGS] ends up being set to
   cl3 = NO_LD_REGS.  Adding a continue if hard_reg_set_empty_p (temp_hard_regset)
   fixes the problem for me.

   Does the below patch look ok? Bootstrapping and regression
   testing passed on x86_64.

OK.
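The vacuous-subset failure described above can be sketched with a toy model (hypothetical, not GCC code; register classes reduced to bitmasks, and pick_subset_class standing in for the innermost loop of setup_reg_class_relations):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (not GCC code): a register class is a bitmask of hard regs.  */
typedef uint32_t regset;

static bool
subset_p (regset a, regset b)
{
  /* An empty set is vacuously a subset of everything.  */
  return (a & ~b) == 0;
}

/* Mimics the ira_reg_class_subset selection: pick the largest class cl3
   whose regs are contained in both cl1 and cl2.  Without skip_empty, an
   empty class like avrtiny's NO_LD_REGS "wins" vacuously; returns -1
   when nothing qualifies (i.e. the result stays NO_REGS).  */
int
pick_subset_class (const regset *classes, int n, regset cl1, regset cl2,
                   bool skip_empty)
{
  int best = -1;
  for (int cl3 = 0; cl3 < n; cl3++)
    {
      if (skip_empty && classes[cl3] == 0)
        continue;  /* the proposed fix: skip classes with no regs */
      if (subset_p (classes[cl3], cl1) && subset_p (classes[cl3], cl2)
          && (best < 0 || subset_p (classes[best], classes[cl3])))
        best = cl3;
    }
  return best;
}
```

With classes {NO_REGS = 0, NO_LD_REGS = 0, ALL_REGS = 0xff}, cl1 = ALL_REGS and cl2 = NO_REGS, the unfixed loop returns NO_LD_REGS (index 1) exactly as in the bug report, while the fixed one leaves the entry at NO_REGS.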



[pushed][RA][PR109520]: Catch error when there are no enough registers for asm insn

2023-07-13 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109520

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit b175b4887f928118af997f6d4d75097a64dcec5d
Author: Vladimir N. Makarov 
Date:   Thu Jul 13 10:42:17 2023 -0400

[RA][PR109520]: Catch error when there are no enough registers for asm insn

Unlike other insns, an asm insn can have so many operands that their
constraints cannot be satisfied.  It results in LRA cycling for such a
test case.  The following patch catches this situation and reports the
problem.
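For illustration, here is a hypothetical sketch (not the PR109520 testcase) of the shape of asm that stresses register allocation: every operand demands a register, so on a target with few enough GPRs the constraints become unsatisfiable and LRA now reports an error instead of cycling.  On a register-rich host this compiles and runs normally:

```c
#include <assert.h>

/* Hypothetical example: eight "+r" operands each require a hard register.
   The empty template leaves the values untouched, so the function simply
   returns their sum; pushing the operand count past the target's GPR
   count is the situation the patch diagnoses.  */
int
f (void)
{
  int r0 = 1, r1 = 2, r2 = 3, r3 = 4, r4 = 5, r5 = 6, r6 = 7, r7 = 8;
  __asm__ ("" : "+r" (r0), "+r" (r1), "+r" (r2), "+r" (r3),
                "+r" (r4), "+r" (r5), "+r" (r6), "+r" (r7));
  return r0 + r1 + r2 + r3 + r4 + r5 + r6 + r7;
}
```

The asm_reloads_num counter added below caps how many times reloads may be generated for one asm insn before lra_asm_insn_error is called.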

PR middle-end/109520

gcc/ChangeLog:

* lra-int.h (lra_insn_recog_data): Add member asm_reloads_num.
(lra_asm_insn_error): New prototype.
* lra.cc: Include rtl_error.h.
(lra_set_insn_recog_data): Initialize asm_reloads_num.
(lra_asm_insn_error): New func whose code is taken from ...
* lra-assigns.cc (lra_split_hard_reg_for): ... here.  Use lra_asm_insn_error.
* lra-constraints.cc (curr_insn_transform): Check reloads number for asm.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr109520.c: New test.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 2f95121df06..3555926af66 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1851,20 +1851,8 @@ lra_split_hard_reg_for (void)
   insn = lra_insn_recog_data[u]->insn;
   if (asm_noperands (PATTERN (insn)) >= 0)
 	{
-	  lra_asm_error_p = asm_p = true;
-	  error_for_asm (insn,
-			 "%<asm%> operand has impossible constraints");
-	  /* Avoid further trouble with this insn.  */
-	  if (JUMP_P (insn))
-	{
-	  ira_nullify_asm_goto (insn);
-	  lra_update_insn_regno_info (insn);
-	}
-	  else
-	{
-	  PATTERN (insn) = gen_rtx_USE (VOIDmode, const0_rtx);
-	  lra_set_insn_deleted (insn);
-	}
+	  asm_p = true;
+	  lra_asm_insn_error (insn);
 	}
   else if (!asm_p)
 	{
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 9bfc88149ff..0c6912d6e7d 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4813,6 +4813,10 @@ curr_insn_transform (bool check_only_p)
   lra_update_operator_dups (curr_id);
   /* Something changes -- process the insn.	 */
   lra_update_insn_regno_info (curr_insn);
+  if (asm_noperands (PATTERN (curr_insn)) >= 0
+	  && ++curr_id->asm_reloads_num >= FIRST_PSEUDO_REGISTER)
+	/* Most probably there are no enough registers to satisfy asm insn: */
+	lra_asm_insn_error (curr_insn);
 }
   lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
   return change_p;
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 4dbe6672f3a..a32359e5772 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -209,6 +209,9 @@ public:
  debug insn.  LRA_NON_CLOBBERED_ALT means ignoring any earlier
  clobbers for the insn.  */
   int used_insn_alternative;
+  /* Defined for asm insn and it is how many times we already generated reloads
+ for the asm insn.  */
+  int asm_reloads_num;
   /* SP offset before the insn relative to one at the func start.  */
   poly_int64 sp_offset;
   /* The insn itself.  */
@@ -307,6 +310,7 @@ extern void lra_delete_dead_insn (rtx_insn *);
 extern void lra_emit_add (rtx, rtx, rtx);
 extern void lra_emit_move (rtx, rtx);
 extern void lra_update_dups (lra_insn_recog_data_t, signed char *);
+extern void lra_asm_insn_error (rtx_insn *insn);
 
 extern void lra_process_new_insns (rtx_insn *, rtx_insn *, rtx_insn *,
    const char *);
diff --git a/gcc/lra.cc b/gcc/lra.cc
index c8b3f139acd..563aff10b96 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -106,6 +106,7 @@ along with GCC; see the file COPYING3.	If not see
 #include "backend.h"
 #include "target.h"
 #include "rtl.h"
+#include "rtl-error.h"
 #include "tree.h"
 #include "predict.h"
 #include "df.h"
@@ -536,6 +537,27 @@ lra_update_dups (lra_insn_recog_data_t id, signed char *nops)
 	*id->dup_loc[i] = *id->operand_loc[nop];
 }
 
+/* Report asm insn error and modify the asm insn.  */
+void
+lra_asm_insn_error (rtx_insn *insn)
+{
+  lra_asm_error_p = true;
+  error_for_asm (insn,
+		 "%<asm%> operand has impossible constraints"
+		 " or there are not enough registers");
+  /* Avoid further trouble with this insn.  */
+  if (JUMP_P (insn))
+{
+  ira_nullify_asm_goto (insn);
+  lra_update_insn_regno_info (insn);
+}
+  else
+{
+  PATTERN (insn) = gen_rtx_USE (VOIDmode, const0_rtx);
+  lra_set_insn_deleted (insn);
+}
+}
+
 
 
 /* This page contains code dealing with info about registers in the
@@ -973,6 +995,7 @@ lra_set_insn_recog_data (rtx_insn *insn)
   lra_insn_recog_data[uid] = data;
   data->insn = insn;
   data->used_insn_alternative = LRA_UNKNOWN_ALT;
+  data->asm_reloads_num = 0;
   data->icode = icode;
   data->regs = NULL;
   if (DEBUG_INSN_P (insn))
diff --git a/gcc/testsuite/gcc.target/i386/pr109520.c b/gcc/testsuite/gcc.target

[pushed][LRA]: Check and update frame to stack pointer elimination after stack slot allocation

2023-07-19 Thread Vladimir Makarov via Gcc-patches

The following patch is necessary for porting avr to LRA.

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


There is still an avr porting problem with reloading of a subreg of the 
frame pointer.  I'll address it later this week.


commit 2971ff7b1d564ac04b537d907c70e6093af70832
Author: Vladimir N. Makarov 
Date:   Wed Jul 19 09:35:37 2023 -0400

[LRA]: Check and update frame to stack pointer elimination after stack slot allocation

Avr is an interesting target which does not use the stack pointer to
address stack slots.  The elimination of the frame pointer to the stack
pointer is impossible if there are stack slots.  While LRA works, stack
slots can be allocated and used, and the elimination cannot be done
anymore.  The situation can be complicated even more if some pseudos
were allocated to the frame pointer.

gcc/ChangeLog:

* lra-int.h (lra_update_fp2sp_elimination): New prototype.
(lra_asm_insn_error): New prototype.
* lra-spills.cc (remove_pseudos): Add check for pseudo slot memory
existence.
(lra_spill): Call lra_update_fp2sp_elimination.
* lra-eliminations.cc: Remove trailing spaces.
(elimination_fp2sp_occured_p): New static flag.
(lra_eliminate_regs_1): Set the flag up.
(update_reg_eliminate): Modify the assert for stack to frame
pointer elimination.
(lra_update_fp2sp_elimination): New function.
(lra_eliminate): Clear flag elimination_fp2sp_occured_p.

gcc/testsuite/ChangeLog:

* gcc.target/avr/lra-elim.c: New test.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 68225339cb6..cf0aa94b69a 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -286,7 +286,7 @@ move_plus_up (rtx x)
 {
   rtx subreg_reg;
   machine_mode x_mode, subreg_reg_mode;
-  
+
   if (GET_CODE (x) != SUBREG || !subreg_lowpart_p (x))
 return x;
   subreg_reg = SUBREG_REG (x);
@@ -309,6 +309,9 @@ move_plus_up (rtx x)
   return x;
 }
 
+/* Flag that we already did frame pointer to stack pointer elimination.  */
+static bool elimination_fp2sp_occured_p = false;
+
 /* Scan X and replace any eliminable registers (such as fp) with a
replacement (such as sp) if SUBST_P, plus an offset.  The offset is
a change in the offset between the eliminable register and its
@@ -366,6 +369,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
@@ -396,9 +402,12 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
  poly_int64 offset, curr_offset;
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (! update_p && ! full_p)
return gen_rtx_PLUS (Pmode, to, XEXP (x, 1));
- 
+
  if (maybe_ne (update_sp_offset, 0))
offset = ep->to_rtx == stack_pointer_rtx ? update_sp_offset : 0;
  else
@@ -456,6 +465,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
@@ -500,7 +512,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
 case LE:  case LT:   case LEU:case LTU:
   {
rtx new0 = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode,
-subst_p, update_p, 
+subst_p, update_p,
 update_sp_offset, full_p);
rtx new1 = XEXP (x, 1)
   ? lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode,
@@ -749,7 +761,7 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
  && poly_int_rtx_p (XEXP (XEXP (x, 1), 1), &offset
{
  poly_int64 size = GET_MODE_SIZE (mem_mode);
- 
+
 #ifdef PUSH_ROUNDING
  /* If more bytes than MEM_MODE are pushed, account for
 them.  */
@@ -822,7 +834,7 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
{
  /* See if this is setting the replacement hard register for
 an elimination.
-
+
 If DEST is the hard frame pointer, we do nothing because
 we as

[pushed][LRA]: Exclude reloading of frame pointer in subreg for some cases

2023-07-20 Thread Vladimir Makarov via Gcc-patches
The following patch improves code for avr LRA port.  More explanation 
for the patch can be found in the commit message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.
commit 4b8878fbf7b74ea5c3405c9f558df0517036f131
Author: Vladimir N. Makarov 
Date:   Thu Jul 20 14:34:26 2023 -0400

[LRA]: Exclude reloading of frame pointer in subreg for some cases

LRA for the avr port reloads the frame pointer in a subreg although we can
just simplify the subreg.  It results in generation of badly performing
code.  The following patch fixes this.

gcc/ChangeLog:

* lra-constraints.cc (simplify_operand_subreg): Check frame pointer
simplification.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 76a155e99c2..f3784cf5a5b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1797,6 +1797,16 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
   alter_subreg (curr_id->operand_loc[nop], false);
   return true;
 }
+  auto fp_subreg_can_be_simplified_after_reload_p = [] (machine_mode innermode,
+   poly_uint64 offset,
+   machine_mode mode) {
+reload_completed = 1;
+bool res = simplify_subreg_regno (FRAME_POINTER_REGNUM,
+ innermode,
+ offset, mode) >= 0;
+reload_completed = 0;
+return res;
+  };
   /* Force a reload of the SUBREG_REG if this is a constant or PLUS or
  if there may be a problem accessing OPERAND in the outer
  mode.  */
@@ -1809,6 +1819,12 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
   >= hard_regno_nregs (hard_regno, mode))
&& simplify_subreg_regno (hard_regno, innermode,
 SUBREG_BYTE (operand), mode) < 0
+   /* Exclude reloading of frame pointer in subreg if frame pointer can not
+ be simplified here only because the reload is not finished yet.  */
+   && (hard_regno != FRAME_POINTER_REGNUM
+  || !fp_subreg_can_be_simplified_after_reload_p (innermode,
+  SUBREG_BYTE (operand),
+  mode))
/* Don't reload subreg for matching reload.  It is actually
  valid subreg in LRA.  */
&& ! LRA_SUBREG_P (operand))


Re: [pushed][LRA]: Check and update frame to stack pointer elimination after stack slot allocation

2023-07-21 Thread Vladimir Makarov via Gcc-patches



On 7/20/23 16:45, Rainer Orth wrote:

Hi Vladimir,


The following patch is necessary for porting avr to LRA.

The patch was successfully bootstrapped and tested on x86-64, aarch64, and
ppc64le.

There is still an avr porting problem with reloading of a subreg of the
frame pointer.  I'll address it later this week.

this patch most likely broke sparc-sun-solaris2.11 bootstrap:

/var/gcc/regression/master/11.4-gcc/build/./gcc/xgcc 
-B/var/gcc/regression/master/11.4-gcc/build/./gcc/ 
-B/vol/gcc/sparc-sun-solaris2.11/bin/ -B/vol/gcc/sparc-sun-solaris2.11/lib/ 
-isystem /vol/gcc/sparc-sun-solaris2.11/include -isystem 
/vol/gcc/sparc-sun-solaris2.11/sys-include   -fchecking=1 -c -g -O2   -W -Wall 
-gnatpg -nostdinc   g-alleve.adb -o g-alleve.o
+===GNAT BUG DETECTED==+
| 14.0.0 20230720 (experimental) [master 
506f068e7d01ad2fb107185b8fb204a0ec23785c] (sparc-sun-solaris2.11) GCC error:|
| in update_reg_eliminate, at lra-eliminations.cc:1179 |
| Error detected around g-alleve.adb:4132:8

This is in stage 3.  I haven't investigated further yet.


Thank you for reporting this.  I'll try to fix it this week.  I have a 
patch, but unfortunately bootstrap is too slow.  If the patch does not 
work, I'll revert the original patch.





[pushed][LRA]: Fix sparc bootstrap after recent patch for fp elimination for avr LRA port

2023-07-21 Thread Vladimir Makarov via Gcc-patches
The following patch fixes sparc solaris bootstrap.  The explanation of 
the patch is in the commit message.


The patch was successfully bootstrapped on x86-64, aarch64, and sparc64 
solaris.


commit d17be8f7f36abe257a7d026dad61e5f8d14bdafc
Author: Vladimir N. Makarov 
Date:   Fri Jul 21 20:28:50 2023 -0400

[LRA]: Fix sparc bootstrap after recent patch for fp elimination for avr LRA port

The recent patch for fp elimination for the avr LRA port modified an assert
which can be wrong for targets using a hard frame pointer different from
the frame pointer.  Also, for such ports, spilling pseudos assigned to the
frame pointer was wrong in the new code, although this code is currently
not used for any target using LRA except for avr.  The given patch fixes
these issues.

gcc/ChangeLog:

* lra-eliminations.cc (update_reg_eliminate): Fix the assert.
(lra_update_fp2sp_elimination): Use HARD_FRAME_POINTER_REGNUM
instead of FRAME_POINTER_REGNUM to spill pseudos.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index cf0aa94b69a..1f4e3fec9e0 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1179,8 +1179,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
  gcc_assert (ep->to_rtx != stack_pointer_rtx
  || (ep->from == FRAME_POINTER_REGNUM
  && !elimination_fp2sp_occured_p)
- || (ep->from != FRAME_POINTER_REGNUM
- && ep->from < FIRST_PSEUDO_REGISTER
+ || (ep->from < FIRST_PSEUDO_REGISTER
  && fixed_regs [ep->from]));
 
  /* Mark that is not eliminable anymore.  */
@@ -1398,7 +1397,7 @@ lra_update_fp2sp_elimination (void)
 " Frame pointer can not be eliminated anymore\n");
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
-  add_to_hard_reg_set (&set, Pmode, FRAME_POINTER_REGNUM);
+  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   spill_pseudos (set);
  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)


Re: [PATCH] rtl-optimization/110587 - remove quadratic regno_in_use_p

2023-08-01 Thread Vladimir Makarov via Gcc-patches



On 7/25/23 09:40, Richard Biener wrote:

The following removes the code checking whether a noop copy
is between something involved in the return sequence composed
of a SET and USE.  Instead of checking for this special-case
the following makes us only ever remove noop copies between
pseudos - which is the case that is necessary for IRA/LRA
interfacing to function according to the comment.  That makes
looking for the return reg special case unnecessary, reducing
the compile-time in LRA non-specific to zero for the testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu with
all languages and {,-m32}.

OK?


Richard, sorry for the delay with the answer.  I was on vacation.

There is a lot of history of changes to this code.  I believe your change 
is right.  I don't think that RTL will ever contain a noop return move 
insn involving the return hard register, especially after hard reg 
propagation was removed a couple of years ago; at least IRA/LRA do not 
generate such insns during their work.


So the patch is OK for me.  I especially like that a big part of the code 
is removed.  No code, no problem (including performance ones).  Thank you 
for the patch.



PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Remove.
(regno_in_use_p): Likewise.
(lra_final_code_change): Do not remove noop moves
between hard registers.
---
  gcc/lra-spills.cc | 69 +--
  1 file changed, 1 insertion(+), 68 deletions(-)

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e8cd9..fe58f162d05 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,72 +705,6 @@ alter_subregs (rtx *loc, bool final_p)
return res;
  }




Re: [PING][PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-08-02 Thread Vladimir Makarov via Gcc-patches



On 8/1/23 01:20, Surya Kumari Jangala wrote:

Ping

Sorry for the delay with the answer.  I was on vacation.

On 21/07/23 3:43 pm, Surya Kumari Jangala via Gcc-patches wrote:

The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.

If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.

But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.

The patch is OK for me.  Thank you for finding and fixing this issue.

2023-07-21  Surya Kumari Jangala  

gcc/
PR rtl-optimization/PR110254
* ira-color.cc (improve_allocation): Update array


I guess you missed the next line in the changelog.  I suspect it should 
be "Update array allocated_hard_reg_p."


Please, fix it before committing the patch.


---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 1fb2958bddd..5807d6d26f6 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3340,6 +3340,10 @@ improve_allocation (void)
}
/* Assign the best chosen hard register to A.  */
ALLOCNO_HARD_REGNO (a) = best;
+
+  for (j = nregs - 1; j >= 0; j--)
+   allocated_hardreg_p[best + j] = true;
+
if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
 best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));




[pushed][LRA] Check input insn pattern hard regs against early clobber hard regs for live info

2023-08-04 Thread Vladimir Makarov via Gcc-patches
The following patch fixes a problem found by the LRA port for the avr 
target.  The problem description is in the commit message.


The patch was successfully bootstrapped and tested on x86-64 and aarch64.
commit abf953042ace471720c1dc284b5f38e546fc0595
Author: Vladimir N. Makarov 
Date:   Fri Aug 4 08:04:44 2023 -0400

LRA: Check input insn pattern hard regs against early clobber hard regs for live info

For the test case LRA generates wrong code for AVR cpymem_qi insn:

(insn 16 15 17 3 (parallel [
(set (mem:BLK (reg:HI 26 r26) [0  A8])
(mem:BLK (reg:HI 30 r30) [0  A8]))
(unspec [
(const_int 0 [0])
] UNSPEC_CPYMEM)
(use (reg:QI 52))
(clobber (reg:HI 26 r26))
(clobber (reg:HI 30 r30))
(clobber (reg:QI 0 r0))
(clobber (reg:QI 52))
]) "t.c":16:22 132 {cpymem_qi}

The insn gets the same value in r26 and r30.  The culprit is clobbering
r30 and using r30 as an input.  For such a situation LRA wrongly assumes
that r30 does not live before the insn.  The patch fixes it.
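The fixed check can be sketched with a toy model (hypothetical, not GCC code; an insn's register references reduced to a list of (regno, type) nodes, with curr_static_id->hard_regs standing behind refs):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not GCC code) of the fixed test in process_bb_lives.  */
enum op_type { OP_IN, OP_OUT };

struct reg_ref
{
  int regno;
  enum op_type type;
  struct reg_ref *next;
};

/* Return 1 if REGNO may be marked dead before the insn, i.e. the insn
   pattern has no non-output (input) reference to it.  The bug was
   scanning the wrong reference list, missing hard reg inputs such as
   r30 in the cpymem_qi pattern above.  */
int
dead_before_insn_p (const struct reg_ref *refs, int regno)
{
  for (; refs != NULL; refs = refs->next)
    if (refs->type != OP_OUT && refs->regno == regno)
      return 0;
  return 1;
}
```

For the pattern above, r30 is both clobbered and used as an input, so it must stay live before the insn, while r26, which is only written and clobbered, may be marked dead.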

gcc/ChangeLog:

* lra-lives.cc (process_bb_lives): Check input insn pattern hard regs
against early clobber hard regs.

gcc/testsuite/ChangeLog:

* gcc.target/avr/lra-cpymem_qi.c: New.

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index f7a3ba8d76a..f60e564da82 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -989,7 +989,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	/* We can have early clobbered non-operand hard reg and
 	   the same hard reg as an insn input.  Don't make hard
 	   reg dead before the insns.  */
-	for (reg2 = curr_id->regs; reg2 != NULL; reg2 = reg2->next)
+	for (reg2 = curr_static_id->hard_regs; reg2 != NULL; reg2 = reg2->next)
 	  if (reg2->type != OP_OUT && reg2->regno == reg->regno)
 		break;
 	if (reg2 == NULL)
diff --git a/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c b/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c
new file mode 100644
index 000..fdffb445b45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mmcu=avr51 -Os" } */
+
+#include <stdbool.h>
+
+struct A
+{
+  unsigned int a;
+  unsigned char c1, c2;
+  bool b1 : 1;
+};
+
+void
+foo (const struct A *x, int y)
+{
+  int s = 0, i;
+  for (i = 0; i < y; ++i)
+{
+  const struct A a = x[i];
+  s += a.b1 ? 1 : 0;
+}
+  if (s != 0)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-assembler-not "movw\[^\n\r]*r26,r30" } } */


Re: [PATCH] rtl-optimization/110587 - speedup find_hard_regno_for_1

2023-08-08 Thread Vladimir Makarov via Gcc-patches



On 8/7/23 09:18, Richard Biener wrote:

On Wed, 2 Aug 2023, Richard Biener wrote:


On Mon, 31 Jul 2023, Jeff Law wrote:



On 7/31/23 04:54, Richard Biener via Gcc-patches wrote:

On Tue, 25 Jul 2023, Richard Biener wrote:


The following applies a micro-optimization to find_hard_regno_for_1,
re-ordering the check so we can easily jump-thread by using an else.
This reduces the time spent in this function by 15% for the testcase
in the PR.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK if that
passes?

Ping.


Thanks,
Richard.

  PR rtl-optimization/110587
  * lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
---
   gcc/lra-assigns.cc | 9 +
   1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index b8582dcafff..d2ebcfd5056 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -522,14 +522,15 @@ find_hard_regno_for_1 (int regno, int *cost, int try_only_hard_regno,
   r2 != NULL;
   r2 = r2->start_next)
{
- if (r2->regno >= lra_constraint_new_regno_start
+ if (live_pseudos_reg_renumber[r2->regno] < 0
+ && r2->regno >= lra_constraint_new_regno_start
   && lra_reg_info[r2->regno].preferred_hard_regno1 >= 0
- && live_pseudos_reg_renumber[r2->regno] < 0
   && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
 sparseset_set_bit (conflict_reload_and_inheritance_pseudos,
   r2->regno);
- if (live_pseudos_reg_renumber[r2->regno] >= 0
- && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
+ else if (live_pseudos_reg_renumber[r2->regno] >= 0
+  && rclass_intersect_p
+   [regno_allocno_class_array[r2->regno]])
 sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);

My biggest concern here would be r2->regno < 0  in the new code which could
cause an OOB array reference in the first condition of the test.

Isn't that the point of the original ordering?  Test that r2->regno is
reasonable before using it as an array index?

Note the original code is

   if (r2->regno >= lra_constraint_new_regno_start
...
  if (live_pseudos_reg_renumber[r2->regno] >= 0
...

so we are going to access live_pseudos_reg_renumber[r2->regno]
independent on the r2->regno >= lra_constraint_new_regno_start check,
so I don't think that's the point of the original ordering.  Note
I preserved the ordering with respect to other array accesses,
the speedup seen is because we now have the


if (live_pseudos_reg_renumber[r2->regno] < 0
...
else if (live_pseudos_reg_renumber[r2->regno] >= 0
 ...

structure directly exposed which helps the compiler.

I think the check on r2->regno is to decide whether to alter
conflict_reload_and_inheritance_pseudos or
live_range_hard_reg_pseudos (so it's also somewhat natural to check
that first).

So - OK?


Richard, sorry, I overlooked this thread.

Yes, it is OK to commit.  In general Jeff has a reasonable concern, but
in this case r2->regno is always >= 0, and I cannot imagine a reason we
would change the algorithm in the future in such a way that this no
longer holds.






[pushed][LRA]: Implement output stack pointer reloads

2023-08-11 Thread Vladimir Makarov via Gcc-patches
Sorry, I had some problems with email.  As a result there are duplicated
emails, and they were sent to g...@gcc.gnu.org instead of
gcc-patches@gcc.gnu.org



On 8/9/23 16:54, Vladimir Makarov wrote:




On 8/9/23 07:15, senthilkumar.selva...@microchip.com wrote:

Hi,

   After turning on FP -> SP elimination after Vlad fixed
   an elimination issue in 
https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832,

   I'm now running into reload failure if arithmetic is done on SP.

I think we can permit output reloads of the stack pointer.  The only
thing we need is to update the sp offset accurately for the original and
reload insns.  I'll try to make the patch this week.



The following patch fixes the problem.  The patch was successfully 
bootstrapped and tested on x86_64, aarch64, and ppc64le.


The test case is actually one from the GCC test suite.

commit c0121083d07ffd4a8424f4be50de769d9ad0386d
Author: Vladimir N. Makarov 
Date:   Fri Aug 11 07:57:37 2023 -0400

[LRA]: Implement output stack pointer reloads

LRA prohibited output stack pointer reloads, but this resulted in LRA
failure for the AVR target, which has no arithmetic insns working with
the stack pointer register.  The given patch implements output stack
pointer reloads.

gcc/ChangeLog:

* lra-constraints.cc (goal_alt_out_sp_reload_p): New flag.
(process_alt_operands): Set the flag.
(curr_insn_transform): Modify stack pointer offsets if output
stack pointer reload is generated.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 09ff6de1657..26239908747 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1466,6 +1466,8 @@ static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
 static bool goal_alt_swapped;
 /* The chosen insn alternative.	 */
 static int goal_alt_number;
+/* True if output reload of the stack pointer should be generated.  */
+static bool goal_alt_out_sp_reload_p;
 
 /* True if the corresponding operand is the result of an equivalence
substitution.  */
@@ -2128,6 +2130,9 @@ process_alt_operands (int only_alternative)
   int curr_alt_dont_inherit_ops_num;
   /* Numbers of operands whose reload pseudos should not be inherited.	*/
   int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  /* True if output stack pointer reload should be generated for the current
+ alternative.  */
+  bool curr_alt_out_sp_reload_p;
   rtx op;
   /* The register when the operand is a subreg of register, otherwise the
  operand itself.  */
@@ -2211,7 +2216,8 @@ process_alt_operands (int only_alternative)
 	}
   reject += static_reject;
   early_clobbered_regs_num = 0;
-
+  curr_alt_out_sp_reload_p = false;
+  
   for (nop = 0; nop < n_operands; nop++)
 	{
 	  const char *p;
@@ -2682,12 +2688,10 @@ process_alt_operands (int only_alternative)
 	  bool no_regs_p;
 
 	  reject += op_reject;
-	  /* Never do output reload of stack pointer.  It makes
-		 impossible to do elimination when SP is changed in
-		 RTL.  */
-	  if (op == stack_pointer_rtx && ! frame_pointer_needed
+	  /* Mark output reload of the stack pointer.  */
+	  if (op == stack_pointer_rtx
 		  && curr_static_id->operand[nop].type != OP_IN)
-		goto fail;
+		curr_alt_out_sp_reload_p = true;
 
 	  /* If this alternative asks for a specific reg class, see if there
 		 is at least one allocatable register in that class.  */
@@ -3317,6 +3321,7 @@ process_alt_operands (int only_alternative)
 	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
 	goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
 	  goal_alt_swapped = curr_swapped;
+	  goal_alt_out_sp_reload_p = curr_alt_out_sp_reload_p;
 	  best_overall = overall;
 	  best_losers = losers;
 	  best_reload_nregs = reload_nregs;
@@ -4836,6 +4841,27 @@ curr_insn_transform (bool check_only_p)
 	lra_asm_insn_error (curr_insn);
 }
   lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
+  if (goal_alt_out_sp_reload_p)
+{
+  /* We have an output stack pointer reload -- update sp offset: */
+  rtx set;
+  bool done_p = false;
+  poly_int64 sp_offset = curr_id->sp_offset;
+  for (rtx_insn *insn = after; insn != NULL_RTX; insn = NEXT_INSN (insn))
+	if ((set = single_set (insn)) != NULL_RTX
+	&& SET_DEST (set) == stack_pointer_rtx)
+	  {
+	lra_assert (!done_p);
+	curr_id->sp_offset = 0;
+	lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	id->sp_offset = sp_offset;
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file,
+		   "Moving sp offset from insn %u to %u\n",
+		   INSN_UID (curr_insn), INSN_UID (insn));
+	  }
+  lra_assert (!done_p);
+}
   return change_p;
 }
 


[pushed][LRA]: Fix asserts for output stack pointer reloads

2023-08-13 Thread Vladimir Makarov via Gcc-patches
The following patch fixes useless asserts in my latest patch 
implementing output stack pointer reloads.
commit 18b417fe1a46d37738243267c1f559cd0acc4886
Author: Vladimir N. Makarov 
Date:   Sun Aug 13 20:54:58 2023 -0400

[LRA]: Fix asserts for output stack pointer reloads

The patch implementing output stack pointer reloads contained superfluous
asserts.  The patch makes them useful.

gcc/ChangeLog:

* lra-constraints.cc (curr_insn_transform): Set done_p up and
check it on true after processing output stack pointer reload.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 26239908747..8d9443adeb6 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4852,6 +4852,7 @@ curr_insn_transform (bool check_only_p)
&& SET_DEST (set) == stack_pointer_rtx)
  {
lra_assert (!done_p);
+   done_p = true;
curr_id->sp_offset = 0;
lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
id->sp_offset = sp_offset;
@@ -4860,7 +4861,7 @@ curr_insn_transform (bool check_only_p)
   "Moving sp offset from insn %u to %u\n",
   INSN_UID (curr_insn), INSN_UID (insn));
  }
-  lra_assert (!done_p);
+  lra_assert (done_p);
 }
   return change_p;
 }


Re: [pushed][LRA]: Fix asserts for output stack pointer reloads

2023-08-14 Thread Vladimir Makarov via Gcc-patches



On 8/14/23 14:37, Prathamesh Kulkarni wrote:

On Mon, 14 Aug 2023 at 06:39, Vladimir Makarov via Gcc-patches
 wrote:

The following patch fixes useless asserts in my latest patch
implementing output stack pointer reloads.

Hi Vladimir,
It seems that this patch caused the following ICE on aarch64-linux-gnu
while building cp-demangle.c:
compile:  
/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/./gcc/xgcc
-B/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/./gcc/
-B/usr/local/aarch64-unknown-linux-gnu/bin/
-B/usr/local/aarch64-unknown-linux-gnu/lib/ -isystem
/usr/local/aarch64-unknown-linux-gnu/include -isystem
/usr/local/aarch64-unknown-linux-gnu/sys-include -DHAVE_CONFIG_H -I..
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/gcc/libstdc++-v3/../libiberty
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/gcc/libstdc++-v3/../include
-D_GLIBCXX_SHARED
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/aarch64-unknown-linux-gnu/libstdc++-v3/include/aarch64-unknown-linux-gnu
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/aarch64-unknown-linux-gnu/libstdc++-v3/include
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/gcc/libstdc++-v3/libsupc++
-g -O2 -DIN_GLIBCPP_V3 -Wno-error -c cp-demangle.c  -fPIC -DPIC -o
cp-demangle.o
during RTL pass: reload
cp-demangle.c: In function ‘d_demangle_callback.constprop’:
cp-demangle.c:6815:1: internal compiler error: in curr_insn_transform,
at lra-constraints.cc:4854
  6815 | }
   | ^
0xce6b37 curr_insn_transform
 ../../gcc/gcc/lra-constraints.cc:4854
0xce7887 lra_constraints(bool)
 ../../gcc/gcc/lra-constraints.cc:5478
0xccdfa7 lra(_IO_FILE*)
 ../../gcc/gcc/lra.cc:2419
0xc7e417 do_reload
 ../../gcc/gcc/ira.cc:5970
0xc7e417 execute
 ../../gcc/gcc/ira.cc:6156
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.


Sorry, I should have bootstrapped my patch on aarch64.

The asserts actually seem very useful, as they caught a bug in my
previous patch.


I'll push a patch fixing the problems after finishing bootstraps,
probably in a couple of hours.


Thank you





[pushed][LRA]: Process output stack pointer reloads before emitting reload insns

2023-08-14 Thread Vladimir Makarov via Gcc-patches

The patch fixes a failure of building the aarch64 port with my patch from yesterday.

The patch was successfully bootstrapped on x86-64 and aarch64.
commit c4760c0161f92b92361feba11836e3d066bb330c
Author: Vladimir N. Makarov 
Date:   Mon Aug 14 16:06:27 2023 -0400

[LRA]: Process output stack pointer reloads before emitting reload insns

The previous patch setting up asserts for processing stack pointer
reloads caught an error in the code moving the sp offset.  This
resulted in a failure of building the aarch64 port.  The code wrongly
processed insns beyond the output reloads of the current insn.  This
patch fixes it.

gcc/ChangeLog:

* lra-constraints.cc (curr_insn_transform): Process output stack
pointer reloads before emitting reload insns.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 8d9443adeb6..c718bedff32 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4840,7 +4840,6 @@ curr_insn_transform (bool check_only_p)
/* Most probably there are no enough registers to satisfy asm insn: */
lra_asm_insn_error (curr_insn);
 }
-  lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
   if (goal_alt_out_sp_reload_p)
 {
   /* We have an output stack pointer reload -- update sp offset: */
@@ -4863,6 +4862,7 @@ curr_insn_transform (bool check_only_p)
  }
   lra_assert (done_p);
 }
+  lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
   return change_p;
 }
 


[pushed][LRA]: Spill pseudos assigned to fp when fp->sp elimination became impossible

2023-08-16 Thread Vladimir Makarov via Gcc-patches
The attached patch fixes a recently found wrong insn removal in the LRA
port for AVR.


The patch was successfully tested and bootstrapped on x86-64 and aarch64.


commit 748a77558ff37761faa234e19327ad1decaace33
Author: Vladimir N. Makarov 
Date:   Wed Aug 16 09:13:54 2023 -0400

[LRA]: Spill pseudos assigned to fp when fp->sp elimination became 
impossible

Porting LRA to AVR revealed that creating a stack slot can make fp->sp
elimination impossible.  The previous patch undoes the fp assignment
after the stack slot creation but wrongly calculated live info after
this.  This resulted in wrong code generation by deleting some still
live insns.  This patch fixes the problem.

gcc/ChangeLog:

* lra-int.h (lra_update_fp2sp_elimination): Change the prototype.
* lra-eliminations.cc (spill_pseudos): Record spilled pseudos.
(lra_update_fp2sp_elimination): Ditto.
(update_reg_eliminate): Adjust spill_pseudos call.
* lra-spills.cc (lra_spill): Assign stack slots to pseudos spilled
in lra_update_fp2sp_elimination.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 1f4e3fec9e0..3c58d4a3815 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1086,18 +1086,18 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
   lra_update_insn_recog_data (insn);
 }
 
-/* Spill pseudos which are assigned to hard registers in SET.  Add
-   affected insns for processing in the subsequent constraint
-   pass.  */
-static void
-spill_pseudos (HARD_REG_SET set)
+/* Spill pseudos which are assigned to hard registers in SET, record them in
+   SPILLED_PSEUDOS unless it is null, and return the recorded pseudos number.
+   Add affected insns for processing in the subsequent constraint pass.  */
+static int
+spill_pseudos (HARD_REG_SET set, int *spilled_pseudos)
 {
-  int i;
+  int i, n;
   bitmap_head to_process;
   rtx_insn *insn;
 
   if (hard_reg_set_empty_p (set))
-return;
+return 0;
   if (lra_dump_file != NULL)
 {
   fprintf (lra_dump_file, "   Spilling non-eliminable hard regs:");
@@ -1107,6 +1107,7 @@ spill_pseudos (HARD_REG_SET set)
   fprintf (lra_dump_file, "\n");
 }
   bitmap_initialize (&to_process, ®_obstack);
+  n = 0;
   for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
 if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
&& overlaps_hard_reg_set_p (set,
@@ -1116,6 +1117,8 @@ spill_pseudos (HARD_REG_SET set)
  fprintf (lra_dump_file, "  Spilling r%d(%d)\n",
   i, reg_renumber[i]);
reg_renumber[i] = -1;
+   if (spilled_pseudos != NULL)
+ spilled_pseudos[n++] = i;
bitmap_ior_into (&to_process, &lra_reg_info[i].insn_bitmap);
   }
   lra_no_alloc_regs |= set;
@@ -1126,6 +1129,7 @@ spill_pseudos (HARD_REG_SET set)
lra_set_used_insn_alternative (insn, LRA_UNKNOWN_ALT);
   }
   bitmap_clear (&to_process);
+  return n;
 }
 
 /* Update all offsets and possibility for elimination on eliminable
@@ -1238,7 +1242,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
   }
   lra_no_alloc_regs |= temp_hard_reg_set;
   eliminable_regset &= ~temp_hard_reg_set;
-  spill_pseudos (temp_hard_reg_set);
+  spill_pseudos (temp_hard_reg_set, NULL);
   return result;
 }
 
@@ -1382,15 +1386,17 @@ process_insn_for_elimination (rtx_insn *insn, bool final_p, bool first_p)
 
 /* Update frame pointer to stack pointer elimination if we started with
permitted frame pointer elimination and now target reports that we can not
-   do this elimination anymore.  */
-void
-lra_update_fp2sp_elimination (void)
+   do this elimination anymore.  Record spilled pseudos in SPILLED_PSEUDOS
+   unless it is null, and return the recorded pseudos number.  */
+int
+lra_update_fp2sp_elimination (int *spilled_pseudos)
 {
+  int n;
   HARD_REG_SET set;
   class lra_elim_table *ep;
 
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
-return;
+return 0;
   gcc_assert (!elimination_fp2sp_occured_p);
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
@@ -1398,10 +1404,11 @@ lra_update_fp2sp_elimination (void)
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
-  spill_pseudos (set);
+  n = spill_pseudos (set, spilled_pseudos);
   for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
   setup_can_eliminate (ep, false);
+  return n;
 }
 
 /* Entry function to do final elimination if FINAL_P or to update
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 633d9af8058..d0752c2ae50 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -414,7 +414,7 @@ extern int lra_get_elimination_hard_regno (int);
 extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode,
 boo

[pushed][LRA]: When assigning stack slots to pseudos previously assigned to fp consider other spilled pseudos

2023-08-17 Thread Vladimir Makarov via Gcc-patches
The following patch fixes a problem with allocating the same stack slots
to conflicting pseudos.  The problem exists only for the AVR LRA port.


The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit c024867d1aa9d465e0236fc9d45d8e1d4bb6bd30
Author: Vladimir N. Makarov 
Date:   Thu Aug 17 11:57:45 2023 -0400

[LRA]: When assigning stack slots to pseudos previously assigned to fp 
consider other spilled pseudos

The previous LRA patch can assign a slot of conflicting pseudos to
pseudos spilled after prohibiting fp->sp elimination.  This patch
fixes this problem.

gcc/ChangeLog:

* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Moving
slots_num initialization from here ...
(lra_spill): ... to here before the 1st call of
assign_stack_slot_num_and_sort_pseudos.  Add the 2nd call after
fp->sp elimination.

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 7e1d35b5e4e..a663a1931e3 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -363,7 +363,6 @@ assign_stack_slot_num_and_sort_pseudos (int *pseudo_regnos, int n)
 {
   int i, j, regno;
 
-  slots_num = 0;
   /* Assign stack slot numbers to spilled pseudos, use smaller numbers
  for most frequently used pseudos. */
   for (i = 0; i < n; i++)
@@ -628,6 +627,7 @@ lra_spill (void)
   /* Sort regnos according their usage frequencies.  */
   qsort (pseudo_regnos, n, sizeof (int), regno_freq_compare);
   n = assign_spill_hard_regs (pseudo_regnos, n);
+  slots_num = 0;
   assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n);
   for (i = 0; i < n; i++)
 if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
@@ -635,6 +635,7 @@ lra_spill (void)
   if ((n2 = lra_update_fp2sp_elimination (pseudo_regnos)) > 0)
 {
   /* Assign stack slots to spilled pseudos assigned to fp.  */
+  assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n2);
   for (i = 0; i < n2; i++)
if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
  assign_mem_slot (pseudo_regnos[i]);


Re: [pushed][LRA]: Spill pseudos assigned to fp when fp->sp elimination became impossible

2023-08-17 Thread Vladimir Makarov via Gcc-patches



On 8/17/23 07:19, senthilkumar.selva...@microchip.com wrote:

On Wed, 2023-08-16 at 12:13 -0400, Vladimir Makarov wrote:


The attached patch fixes recently found wrong insn removal in LRA port
for AVR.

The patch was successfully tested and bootstrapped on x86-64 and aarch64.



Hi Vladimir,

   Thanks for working on this. After applying the patch, I'm seeing that the
   pseudo in the frame pointer that got spilled is taking up the same stack
   slot that was already assigned to a spilled pseudo, and that is causing
   execution failure (it is also causing a crash when building libgcc for avr)

...
   I tried a hacky workaround (see patch below) to create a new stack slot and
   assign the spilled pseudo to it, and that works.
   
   Not sure if that's the right way to do it though.


The general approach is right, but I've just committed a different
version of the patch.





[pushed] [RA] [PR110215] Ignore conflicts for some pseudos from insns throwing a final exception

2023-06-16 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

The patch was successfully tested and bootstrapped on x86-64, aarch64, 
and ppc64le.


It is difficult to make a stable test for the PR, so there is no test
in the patch.


commit 154c69039571c66b3a6d16ecfa9e6ff22942f59f
Author: Vladimir N. Makarov 
Date:   Fri Jun 16 11:12:32 2023 -0400

RA: Ignore conflicts for some pseudos from insns throwing a final exception

IRA adds conflicts to the pseudos from insns that can throw exceptions
internally even if the exception code is final for the function and
the pseudo value is not used in the exception code.  This results in
spilling a pseudo in a loop (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).

The following patch fixes the problem.

PR rtl-optimization/110215

gcc/ChangeLog:

* ira-lives.cc: Include except.h.
(process_bb_node_lives): Ignore conflicts from cleanup exceptions
when the pseudo does not live at the exception landing pad.

diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 6a3901ee234..bc8493856a4 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ira-int.h"
 #include "sparseset.h"
 #include "function-abi.h"
+#include "except.h"
 
 /* The code in this file is similar to one in global but the code
works on the allocno basis and creates live ranges instead of
@@ -1383,14 +1384,24 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 		  SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
 		  SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
 		}
-		  if (can_throw_internal (insn))
+		  eh_region r;
+		  eh_landing_pad lp;
+		  rtx_code_label *landing_label;
+		  basic_block landing_bb;
+		  if (can_throw_internal (insn)
+		  && (r = get_eh_region_from_rtx (insn)) != NULL
+		  && (lp = gen_eh_landing_pad (r)) != NULL
+		  && (landing_label = lp->landing_pad) != NULL
+		  && (landing_bb = BLOCK_FOR_INSN (landing_label)) != NULL
+		  && (r->type != ERT_CLEANUP
+			  || bitmap_bit_p (df_get_live_in (landing_bb),
+	   ALLOCNO_REGNO (a
 		{
-		  OBJECT_CONFLICT_HARD_REGS (obj)
-			|= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
-		  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
-			|= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+		  HARD_REG_SET new_conflict_regs
+			= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+		  OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+		  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
 		}
-
 		  if (sparseset_bit_p (allocnos_processed, num))
 		continue;
 		  sparseset_set_bit (allocnos_processed, num);


[pushed][LRA][PR110372]: Refine reload pseudo class

2023-07-07 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110372

The patch was successfully bootstrapped and tested on x86-64.
commit 1f7e5a7b91862b999aab88ee0319052aaf00f0f1
Author: Vladimir N. Makarov 
Date:   Fri Jul 7 09:53:38 2023 -0400

LRA: Refine reload pseudo class

For the given testcase a reload pseudo happened to occur only in reload
insns created on one constraint sub-pass.  Therefore its initial class
(ALL_REGS) was not refined and the reload insns were not processed on
the next constraint sub-passes.  This resulted in a wrong insn.

PR rtl-optimization/110372

gcc/ChangeLog:

* lra-assigns.cc (assign_by_spills): Add reload insns involving
reload pseudos with non-refined class to be processed on the next
sub-pass.
* lra-constraints.cc (enough_allocatable_hard_regs_p): New func.
(in_class_p): Use it.
(print_curr_insn_alt): New func.
(process_alt_operands): Use it.  Improve debug info.
(curr_insn_transform): Use print_curr_insn_alt.  Refine reload
pseudo class if it is not refined yet.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110372.c: New.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 73fbef29912..2f95121df06 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1443,10 +1443,11 @@ assign_by_spills (void)
 		 pass.  Indicate that it is no longer spilled.  */
 	  bitmap_clear_bit (&all_spilled_pseudos, regno);
 	  assign_hard_regno (hard_regno, regno);
-	  if (! reload_p)
-		/* As non-reload pseudo assignment is changed we
-		   should reconsider insns referring for the
-		   pseudo.  */
+	  if (! reload_p || regno_allocno_class_array[regno] == ALL_REGS)
+		/* As non-reload pseudo assignment is changed we should
+		   reconsider insns referring for the pseudo.  Do the same if a
+		   reload pseudo did not refine its class which can happens
+		   when the pseudo occurs only in reload insns.  */
 		bitmap_set_bit (&changed_pseudo_bitmap, regno);
 	}
 	}
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 4dc2d70c402..123ff662cbc 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -233,6 +233,34 @@ get_reg_class (int regno)
   return NO_REGS;
 }
 
+/* Return true if REG_CLASS has enough allocatable hard regs to keep value of
+   REG_MODE.  */
+static bool
+enough_allocatable_hard_regs_p (enum reg_class reg_class,
+enum machine_mode reg_mode)
+{
+  int i, j, hard_regno, class_size, nregs;
+  
+  if (hard_reg_set_subset_p (reg_class_contents[reg_class], lra_no_alloc_regs))
+return false;
+  class_size = ira_class_hard_regs_num[reg_class];
+  for (i = 0; i < class_size; i++)
+{
+  hard_regno = ira_class_hard_regs[reg_class][i];
+  nregs = hard_regno_nregs (hard_regno, reg_mode);
+  if (nregs == 1)
+	return true;
+  for (j = 0; j < nregs; j++)
+	if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j)
+	|| ! TEST_HARD_REG_BIT (reg_class_contents[reg_class],
+hard_regno + j))
+	  break;
+  if (j >= nregs)
+	return true;
+}
+  return false;
+}
+
 /* Return true if REG satisfies (or will satisfy) reg class constraint
CL.  Use elimination first if REG is a hard register.  If REG is a
reload pseudo created by this constraints pass, assume that it will
@@ -252,7 +280,6 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
   enum reg_class rclass, common_class;
   machine_mode reg_mode;
   rtx src;
-  int class_size, hard_regno, nregs, i, j;
   int regno = REGNO (reg);
 
   if (new_class != NULL)
@@ -291,26 +318,7 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
   common_class = ira_reg_class_subset[rclass][cl];
   if (new_class != NULL)
 	*new_class = common_class;
-  if (hard_reg_set_subset_p (reg_class_contents[common_class],
- lra_no_alloc_regs))
-	return false;
-  /* Check that there are enough allocatable regs.  */
-  class_size = ira_class_hard_regs_num[common_class];
-  for (i = 0; i < class_size; i++)
-	{
-	  hard_regno = ira_class_hard_regs[common_class][i];
-	  nregs = hard_regno_nregs (hard_regno, reg_mode);
-	  if (nregs == 1)
-	return true;
-	  for (j = 0; j < nregs; j++)
-	if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j)
-		|| ! TEST_HARD_REG_BIT (reg_class_contents[common_class],
-	hard_regno + j))
-	  break;
-	  if (j >= nregs)
-	return true;
-	}
-  return false;
+  return enough_allocatable_hard_regs_p (common_class, reg_mode);
 }
 }
 
@@ -2046,6 +2054,23 @@ update_and_check_small_class_inputs (int nop, int nalt,
   return false;
 }
 
+/* Print operand constraints for alternative ALT_NUMBER of the current
+   insn.  */
+static void
+print_curr_insn_alt (int alt_number)
+{
+  for (int i = 0; i < curr_static_id->n_operands; i++)
+{
+  const char *p 

[pushed] [PR108774] RA: Clear reg equiv caller_save_p flag when clearing defined_p flag

2023-02-13 Thread Vladimir Makarov via Gcc-patches

The following patch solves

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108774

The patch was successfully bootstrapped and tested on i686, x86_64, and 
aarch64.
commit a33e3dcbd15e73603796e30b5eeec11a0c8bacec
Author: Vladimir N. Makarov 
Date:   Mon Feb 13 16:05:04 2023 -0500

RA: Clear reg equiv caller_save_p flag when clearing defined_p flag

IRA can invalidate an initially set up equivalence in setup_reg_equiv.
The caller_save_p flag was not cleared during invalidation although
init_insns were cleared.  This resulted in a segmentation fault in
get_equiv.  Clearing the flag solves the problem.  As an extra
precaution I added clearing of the flag in other places too, although
it might not be necessary.

PR rtl-optimization/108774

gcc/ChangeLog:

* ira.cc (ira_update_equiv_info_by_shuffle_insn): Clear equiv
caller_save_p flag when clearing defined_p flag.
(setup_reg_equiv): Ditto.
* lra-constraints.cc (lra_constraints): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108774.c: New.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 9f9af808f63..6c7f4901e4c 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -2725,6 +2725,7 @@ ira_update_equiv_info_by_shuffle_insn (int to_regno, int from_regno, rtx_insn *i
 	  return;
 	}
   ira_reg_equiv[to_regno].defined_p = false;
+  ira_reg_equiv[to_regno].caller_save_p = false;
   ira_reg_equiv[to_regno].memory
 	= ira_reg_equiv[to_regno].constant
 	= ira_reg_equiv[to_regno].invariant
@@ -4193,6 +4194,7 @@ setup_reg_equiv (void)
 			if (ira_reg_equiv[i].memory == NULL_RTX)
 			  {
 			ira_reg_equiv[i].defined_p = false;
+			ira_reg_equiv[i].caller_save_p = false;
 			ira_reg_equiv[i].init_insns = NULL;
 			break;
 			  }
@@ -4203,6 +4205,7 @@ setup_reg_equiv (void)
 	  }
 	  }
 	ira_reg_equiv[i].defined_p = false;
+	ira_reg_equiv[i].caller_save_p = false;
 	ira_reg_equiv[i].init_insns = NULL;
 	break;
   }
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index dd4f68bbfc0..dbfaf0485a5 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5100,7 +5100,8 @@ lra_constraints (bool first_p)
 			 && (targetm.preferred_reload_class
 			 (x, lra_get_allocno_class (i)) == NO_REGS))
 			|| contains_symbol_ref_p (x
-	  ira_reg_equiv[i].defined_p = false;
+	  ira_reg_equiv[i].defined_p
+		= ira_reg_equiv[i].caller_save_p = false;
 	if (contains_reg_p (x, false, true))
 	  ira_reg_equiv[i].profitable_p = false;
 	if (get_equiv (reg) != reg)
diff --git a/gcc/testsuite/gcc.target/i386/pr108774.c b/gcc/testsuite/gcc.target/i386/pr108774.c
new file mode 100644
index 000..482bc490cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108774.c
@@ -0,0 +1,11 @@
+/* PR target/108774 */
+/* { dg-do compile  { target x86_64-*-* } } */
+/* { dg-options "-Os -ftrapv -mcmodel=large" } */
+
+int i, j;
+
+void
+foo (void)
+{
+  i = ((1 << j) - 1) >> j;
+}


[pushed][PR90706] IRA: Use minimal cost for hard register movement

2023-03-02 Thread Vladimir Makarov via Gcc-patches

The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

The patch was successfully bootstrapped and tested on i686, x86-64, 
aarch64, ppc64le.


commit 23661e39df76e07fb4ce1ea015379c7601d947ef
Author: Vladimir N. Makarov 
Date:   Thu Mar 2 16:29:05 2023 -0500

IRA: Use minimal cost for hard register movement

This is the 2nd attempt to fix PR90706.  IRA calculates wrong AVR
costs for moving general hard regs of SFmode.  This was the reason for
spilling a pseudo in the PR.  In this patch we use the smaller of the
move costs of a hard reg in its natural and operand modes.

PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
(record_operand_costs): Find and use smaller cost for hard reg
move.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 4c28171f27d..c0fdef807dd 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ira-int.h"
 #include "addresses.h"
 #include "reload.h"
+#include "print-rtl.h"
 
 /* The flags is set up every time when we calculate pseudo register
classes through function ira_set_pseudo_classes.  */
@@ -503,6 +504,18 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   int insn_allows_mem[MAX_RECOG_OPERANDS];
   move_table *move_in_cost, *move_out_cost;
   short (*mem_cost)[2];
+  const char *p;
+
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+{
+  fprintf (ira_dump_file, "Processing insn %u", INSN_UID (insn));
+  if (INSN_CODE (insn) >= 0
+	  && (p = get_insn_name (INSN_CODE (insn))) != NULL)
+	fprintf (ira_dump_file, " {%s}", p);
+  fprintf (ira_dump_file, " (freq=%d)\n",
+	   REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn)));
+  dump_insn_slim (ira_dump_file, insn);
+  }
 
   for (i = 0; i < n_ops; i++)
 insn_allows_mem[i] = 0;
@@ -526,6 +539,21 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 	  continue;
 	}
 
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+	{
+	  fprintf (ira_dump_file, "  Alt %d:", alt);
+	  for (i = 0; i < n_ops; i++)
+	{
+	  p = constraints[i];
+	  if (*p == '\0')
+		continue;
+	  fprintf (ira_dump_file, "  (%d) ", i);
+	  for (; *p != '\0' && *p != ',' && *p != '#'; p++)
+		fputc (*p, ira_dump_file);
+	}
+	  fprintf (ira_dump_file, "\n");
+	}
+
   for (i = 0; i < n_ops; i++)
 	{
 	  unsigned char c;
@@ -593,12 +621,16 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 		 register, this alternative can't be used.  */
 
 		  if (classes[j] == NO_REGS)
-		alt_fail = 1;
-		  /* Otherwise, add to the cost of this alternative
-		 the cost to copy the other operand to the hard
-		 register used for this operand.  */
+		{
+		  alt_fail = 1;
+		}
 		  else
-		alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		/* Otherwise, add to the cost of this alternative the cost
+		   to copy the other operand to the hard register used for
+		   this operand.  */
+		{
+		  alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		}
 		}
 	  else
 		{
@@ -1021,18 +1053,45 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   for (i = 0; i < n_ops; i++)
 	if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
 	  {
+	int old_cost;
+	bool cost_change_p = false;
 	struct costs *pp = op_costs[i], *qq = this_op_costs[i];
 	int *pp_costs = pp->cost, *qq_costs = qq->cost;
 	int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
 	cost_classes_t cost_classes_ptr
 	  = regno_cost_classes[REGNO (ops[i])];
 
-	pp->mem_cost = MIN (pp->mem_cost,
+	old_cost = pp->mem_cost;
+	pp->mem_cost = MIN (old_cost,
 (qq->mem_cost + op_cost_add) * scale);
 
+	if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp->mem_cost < old_cost)
+	  {
+		cost_change_p = true;
+		fprintf (ira_dump_file, "op %d(r=%u) new costs MEM:%d",
+			 i, REGNO(ops[i]), pp->mem_cost);
+	  }
 	for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-	  pp_costs[k]
-		= MIN (pp_costs[k], (qq_costs[k] + op_cost_add) * scale);
+	  {
+		old_cost = pp_costs[k];
+		pp_costs[k]
+		  = MIN (old_cost, (qq_costs[k] + op_cost_add) * scale);
+		if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp_costs[k] < old_cost)
+		  {
+		if (!cost_change_p)
+		  fprintf (ira_dump_file, "op %d(r=%u) new costs",
+			   i, REGNO(ops[i]));
+		cost_change_p = true;
+		fprintf (ira_dump_file, " %s:%d",
+			 reg_class_names[cost_classes_ptr->classes[k]],
+			 pp_costs[k]);
+		  }
+	  }
+	if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+
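The cost selection described in the commit message reduces to taking a minimum over the two mode-specific estimates.  A sketch with made-up cost values (GCC's actual tables are per-class, per-mode move_table entries):

```c
/* Pick the cheaper of two register-move cost estimates.  IRA now takes
   the analogous minimum over the cost computed in the hard reg's
   natural mode and in the operand's mode, so one pessimistic mode
   cannot by itself make the register look spill-worthy.  */
static int hard_reg_move_cost (int natural_mode_cost, int operand_mode_cost)
{
  return natural_mode_cost < operand_mode_cost
         ? natural_mode_cost : operand_mode_cost;
}
```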

[pushed] [PR108999] LRA: For clobbered regs use operand mode instead of the biggest mode

2023-03-09 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

The patch was successfully bootstrapped and tested on i686, x86-64, 
aarch64, and ppc64 be/le.
commit 3c75631fc09a22f2513fab80ef502c2a8b0f9121
Author: Vladimir N. Makarov 
Date:   Thu Mar 9 08:41:09 2023 -0500

LRA: For clobbered regs use operand mode instead of the biggest mode

LRA is too conservative in calculating conflicts with clobbered regs,
as it uses the biggest access mode.  This prevents otherwise possible reg
coalescing and results in worse code.  This patch solves the problem.

PR rtl-optimization/108999

gcc/ChangeLog:

* lra-constraints.cc (process_alt_operands): Use operand modes for
clobbered regs instead of the biggest access mode.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr108999.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index dbfaf0485a5..c38566a7451 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3108,7 +3108,8 @@ process_alt_operands (int only_alternative)
 	  lra_assert (operand_reg[i] != NULL_RTX);
 	  clobbered_hard_regno = hard_regno[i];
 	  CLEAR_HARD_REG_SET (temp_set);
-	  add_to_hard_reg_set (&temp_set, biggest_mode[i], clobbered_hard_regno);
+	  add_to_hard_reg_set (&temp_set, GET_MODE (*curr_id->operand_loc[i]),
+			   clobbered_hard_regno);
 	  first_conflict_j = last_conflict_j = -1;
 	  for (j = 0; j < n_operands; j++)
 	if (j == i
diff --git a/gcc/testsuite/gcc.target/aarch64/pr108999.c b/gcc/testsuite/gcc.target/aarch64/pr108999.c
new file mode 100644
index 000..a34db85be83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr108999.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+sve" } */
+#include <arm_sve.h>
+
+void subreg_coalesce5 (
+svbool_t pg, int64_t* base, int n,
+int64_t *in1, int64_t *in2, int64_t*out
+)
+{
+svint64x2_t result = svld2_s64 (pg, base);
+
+for (int i = 0; i < n; i += 1) {
+svint64_t v18 = svld1_s64(pg, in1 + i);
+svint64_t v19 = svld1_s64(pg, in2 + i);
+result.__val[0] = svmad_s64_z(pg, v18, v19, result.__val[0]);
+result.__val[1] = svmad_s64_z(pg, v18, v19, result.__val[1]);
+}
+svst2_s64(pg, out, result);
+}
+
+/* { dg-final { scan-assembler-not {[ \t]*mov[ \t]*z[0-9]+\.d} } } */
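The effect of the one-line change above is that the clobber conflict set is built from the operand's own mode rather than the biggest mode ever used to access the register, so fewer hard regs are marked as conflicting.  A simplified model of the register-count arithmetic (word sizes are illustrative, not any target's hard_regno_nregs):

```c
/* Number of consecutive hard regs a value of mode_size bytes occupies,
   given word_size-byte registers (simplified hard_regno_nregs).  */
static int nregs_for_mode (int mode_size, int word_size)
{
  return (mode_size + word_size - 1) / word_size;
}
```

With 8-byte registers, a 16-byte "biggest mode" marks two hard regs as clobber conflicts where an 8-byte operand mode marks only one — and that freed register is what lets coalescing succeed.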


Re: [PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-25 Thread Vladimir Makarov via Gcc-patches



On 5/17/23 02:57, liuhongt wrote:

r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
calculation when the preferred register class are not known yet.
It regressed powerpc PR109610 and PR109858, it looks too aggressive to use
NO_REGS when mode can be allocated with GENERAL_REGS.
The patch takes a step back, still use GENERAL_REGS when
hard_regno_mode_ok for mode and GENERAL_REGS, otherwise uses NO_REGS.
Kewen confirmed the patch fixed PR109858; I verified it also fixed PR109610.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big performance impact for SPEC2017 on icelake server.
Ok for trunk?

gcc/ChangeLog:

* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
calculation when !hard_regno_mode_ok for GENERAL_REGS and
mode, otherwise still use GENERAL_REGS.


Thank you for the patch.  It looks good to me.  It is OK to commit it
into the trunk.





[committed] LRA: patch fixing PR98777

2021-01-21 Thread Vladimir Makarov via Gcc-patches

The following patch fixes the recently reported

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98777

The patch was successfully bootstrapped on x86-64.


[PR98777] LRA: Use preliminary created pseudo for in LRA elimination subpass

LRA did not extend ira_reg_equiv after generating a pseudo in
eliminate_regs_in_insn, which might result in an LRA crash.  It is better
not to extend ira_reg_equiv but to use a preliminarily generated pseudo.
The patch implements this.

gcc/ChangeLog:

	PR rtl-optimization/98777
	* lra-int.h (lra_pmode_pseudo): New extern.
	* lra.c (lra_pmode_pseudo): New global.
	(lra): Set it up.
	* lra-eliminations.c (eliminate_regs_in_insn): Use it.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/98777
	* gcc.target/riscv/pr98777.c: New.

diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index 5b9717574ed..c97f9ca4c68 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -1059,7 +1059,7 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
 	  && REGNO (reg1) < FIRST_PSEUDO_REGISTER
 	  && REGNO (reg2) >= FIRST_PSEUDO_REGISTER
 	  && GET_MODE (reg1) == Pmode
-	  && !have_addptr3_insn (gen_reg_rtx (Pmode), reg1,
+	  && !have_addptr3_insn (lra_pmode_pseudo, reg1,
  XEXP (XEXP (SET_SRC (set), 0), 1)))
 	{
 	  XEXP (XEXP (SET_SRC (set), 0), 0) = op2;
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 1b8f7b6ae61..4dadccc79f4 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -324,6 +324,7 @@ extern lra_copy_t lra_get_copy (int);
 extern int lra_new_regno_start;
 extern int lra_constraint_new_regno_start;
 extern int lra_bad_spill_regno_start;
+extern rtx lra_pmode_pseudo;
 extern bitmap_head lra_inheritance_pseudos;
 extern bitmap_head lra_split_regs;
 extern bitmap_head lra_subreg_reload_pseudos;
diff --git a/gcc/lra.c b/gcc/lra.c
index aa49de6f154..5a4b6638913 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -2192,6 +2192,9 @@ int lra_constraint_new_regno_start;
it is possible.  */
 int lra_bad_spill_regno_start;
 
+/* A pseudo of Pmode.  */
+rtx lra_pmode_pseudo;
+
 /* Inheritance pseudo regnos before the new spill pass.	 */
 bitmap_head lra_inheritance_pseudos;
 
@@ -2255,6 +2258,7 @@ lra (FILE *f)
 
   lra_dump_file = f;
   lra_asm_error_p = false;
+  lra_pmode_pseudo = gen_reg_rtx (Pmode);
   
   timevar_push (TV_LRA);
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr98777.c b/gcc/testsuite/gcc.target/riscv/pr98777.c
new file mode 100644
index 000..ea2c2f9ca64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr98777.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrict-aliasing -O" } */
+
+typedef struct {
+  _Complex e;
+  _Complex f;
+  _Complex g;
+  _Complex h;
+  _Complex i;
+  _Complex j;
+  _Complex k;
+  _Complex l;
+  _Complex m;
+  _Complex n;
+  _Complex o;
+  _Complex p;
+} Scl16;
+
+Scl16 g1sScl16, g2sScl16, g3sScl16, g4sScl16, g5sScl16, g6sScl16, g7sScl16,
+g8sScl16, g9sScl16, g10sScl16, g11sScl16, g12sScl16, g13sScl16, g14sScl16,
+g15sScl16, g16sScl16;
+
+void testvaScl16();
+
+void
+testitScl16() {
+  testvaScl16(g10sScl16, g11sScl16, g12sScl16, g13sScl16, g14sScl16, g1sScl16,
+  g2sScl16, g3sScl16, g4sScl16, g5sScl16, g6sScl16, g7sScl16,
+  g8sScl16, g9sScl16, g10sScl16, g11sScl16, g12sScl16, g13sScl16,
+  g14sScl16, g15sScl16, g16sScl16);
+}


[committed] [PR97684] IRA: Recalculate pseudo classes if we added new pseudos since last calculation before updating equiv regs

2021-01-27 Thread Vladimir Makarov via Gcc-patches

The patch solves the following problem:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97684

The patch was successfully bootstrapped and tested on x86-64.

commit 238ea13cca75ad499f227b60a95c40174c6caf78
Author: Vladimir N. Makarov 
Date:   Wed Jan 27 14:53:28 2021 -0500

[PR97684] IRA: Recalculate pseudo classes if we added new pseudos since last calculation before updating equiv regs

update_equiv_regs can use reg classes of pseudos and they are set up in
register pressure sensitive scheduling and loop invariant motion and in
live range shrinking.  This info can become obsolete if we add new pseudos
since the last set up.  Recalculate it again if the new pseudos were
added.

gcc/ChangeLog:

PR rtl-optimization/97684
* ira.c (ira): Call ira_set_pseudo_classes before
update_equiv_regs when it is necessary.

gcc/testsuite/ChangeLog:

PR rtl-optimization/97684
* gcc.target/i386/pr97684.c: New.

diff --git a/gcc/ira.c b/gcc/ira.c
index f0bdbc8cf56..c32ecf814fd 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5566,6 +5566,15 @@ ira (FILE *f)
   if (warn_clobbered)
 generate_setjmp_warnings ();
 
+  /* update_equiv_regs can use reg classes of pseudos and they are set up in
+ register pressure sensitive scheduling and loop invariant motion and in
+ live range shrinking.  This info can become obsolete if we add new pseudos
+ since the last set up.  Recalculate it again if the new pseudos were
+ added.  */
+  if (resize_reg_info () && (flag_sched_pressure || flag_live_range_shrinkage
+			 || flag_ira_loop_pressure))
+ira_set_pseudo_classes (true, ira_dump_file);
+
   init_alias_analysis ();
   loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
   reg_equiv = XCNEWVEC (struct equivalence, max_reg_num ());
@@ -5610,9 +5619,6 @@ ira (FILE *f)
   regstat_recompute_for_max_regno ();
 }
 
-  if (resize_reg_info () && flag_ira_loop_pressure)
-ira_set_pseudo_classes (true, ira_dump_file);
-
   setup_reg_equiv ();
   grow_reg_equivs ();
   setup_reg_equiv_init ();
diff --git a/gcc/testsuite/gcc.target/i386/pr97684.c b/gcc/testsuite/gcc.target/i386/pr97684.c
new file mode 100644
index 000..983bf535ad8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97684.c
@@ -0,0 +1,24 @@
+/* PR rtl-optimization/97684 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -flive-range-shrinkage -fschedule-insns -fselective-scheduling -funroll-all-loops -fno-web" } */
+
+void
+c5 (double);
+
+void
+g4 (int *n4)
+{
+  double lp = 0.0;
+  int fn;
+
+  for (fn = 0; fn < 18; ++fn)
+{
+  int as;
+
+  as = __builtin_abs (n4[fn]);
+  if (as > lp)
+lp = as;
+}
+
+  c5 (lp);
+}
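The guard this patch moves before update_equiv_regs is a classic stale-cache check: recompute derived info whenever the underlying population has grown since it was computed.  A generic sketch (names are illustrative, not GCC's resize_reg_info/ira_set_pseudo_classes):

```c
#include <stdbool.h>

static int cached_max;      /* population size when info was computed */
static int recompute_count; /* how many recomputations happened */

/* Grow-aware refresh: recompute (and remember the new size) only when
   the population grew past what the cached info covers, analogous to
   resize_reg_info () returning true after new pseudos were created.  */
static bool maybe_recompute (int current_max)
{
  if (current_max <= cached_max)
    return false;
  cached_max = current_max;
  recompute_count++;
  return true;
}
```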


[committed] [PR97701] LRA: Don't narrow class only for REG or MEM.

2021-01-29 Thread Vladimir Makarov via Gcc-patches

The following patch solves

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97701

The patch was successfully bootstrapped and tested on x86-64, arm64, 
ppc64be.


This patch variant is only for trunk.  GCC-10 branch will have a bit 
different patch.



commit 449f17f23a7b8c4a340cc9342d68303ffa35cacc (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Fri Jan 29 11:51:44 2021 -0500

[PR97701] LRA: Don't narrow class only for REG or MEM.

Reload pseudos of ALL_REGS class did not narrow class from constraint
in insn (set (pseudo) (lo_sum ...)) because lo_sum is considered an
object (OBJECT_P) although the insn is not a classic move.  To permit
narrowing we are starting to use MEM_P and REG_P instead of OBJECT_P.

gcc/ChangeLog:

PR target/97701
* lra-constraints.c (in_class_p): Don't narrow class only for REG
or MEM.

gcc/testsuite/ChangeLog:

PR target/97701
* gcc.target/aarch64/pr97701.c: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index d716ee48e51..e739a466a0d 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -250,6 +250,7 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
 {
   enum reg_class rclass, common_class;
   machine_mode reg_mode;
+  rtx src;
   int class_size, hard_regno, nregs, i, j;
   int regno = REGNO (reg);
 
@@ -265,6 +266,7 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
 }
   reg_mode = GET_MODE (reg);
   rclass = get_reg_class (regno);
+  src = curr_insn_set != NULL ? SET_SRC (curr_insn_set) : NULL;
   if (regno < new_regno_start
   /* Do not allow the constraints for reload instructions to
 	 influence the classes of new pseudos.  These reloads are
@@ -273,12 +275,10 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
 	 where other reload pseudos are no longer allocatable.  */
   || (!allow_all_reload_class_changes_p
 	  && INSN_UID (curr_insn) >= new_insn_uid_start
-	  && curr_insn_set != NULL
-	  && ((OBJECT_P (SET_SRC (curr_insn_set))
-	   && ! CONSTANT_P (SET_SRC (curr_insn_set)))
-	  || (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
-		  && OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
-		  && ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
+	  && src != NULL
+	  && ((REG_P (src) || MEM_P (src))
+	  || (GET_CODE (src) == SUBREG
+		  && (REG_P (SUBREG_REG (src)) || MEM_P (SUBREG_REG (src)))
 /* When we don't know what class will be used finally for reload
pseudos, we use ALL_REGS.  */
 return ((regno >= new_regno_start && rclass == ALL_REGS)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr97701.c b/gcc/testsuite/gcc.target/aarch64/pr97701.c
new file mode 100644
index 000..ede3540c48d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr97701.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+extern char a[][12][18][17][17];
+extern short b[][12][18][17][17];
+extern int c[][2][8][7];
+short *d;
+void e(signed f, int g, char h, char i, char j) {
+  for (int k = 648; k; k += f)
+for (short l; l < j; l += 9)
+  for (long m = f + 6LL; m < (h ? h : i); m += 2)
+for (int n = 0; n < 16; n += 3LL) {
+  for (int o = g; o; o++)
+a[k][l][m][n][o] = b[k][l][m][n][o] = d[k] ? 2 : 0;
+  c[k][l][m][0] = 0;
+}
+}
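The crux of the fix is which predicate recognizes a "classic move" source.  OBJECT_P accepts rtx codes such as LO_SUM that are not really move sources, while the new test accepts only REG or MEM, permitting class narrowing for the lo_sum insn.  A toy model with codes reduced to an enum (not GCC's rtl.h machinery):

```c
#include <stdbool.h>

enum code { REG, MEM, LO_SUM, PLUS };

/* Old test: roughly OBJECT_P — LO_SUM counts as an "object", so an
   insn like (set (pseudo) (lo_sum ...)) blocked narrowing.  */
static bool object_p (enum code c)
{
  return c == REG || c == MEM || c == LO_SUM;
}

/* New test: only genuine move sources block class narrowing.  */
static bool reg_or_mem_p (enum code c)
{
  return c == REG || c == MEM;
}
```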


[committed] [PR97701] Modify test for trunk

2021-01-29 Thread Vladimir Makarov via Gcc-patches


commit 0202fa3d6359911a9e6d605d33d0ac669e21eaf3
Author: Vladimir N. Makarov 
Date:   Fri Jan 29 16:04:03 2021 -0500

[PR97701] Modify test for trunk

Original test was for gcc-10.  The modified one for trunk.

gcc/testsuite/ChangeLog:

PR target/97701
* gcc.target/aarch64/pr97701.c: Modify.

diff --git a/gcc/testsuite/gcc.target/aarch64/pr97701.c b/gcc/testsuite/gcc.target/aarch64/pr97701.c
index ede3540c48d..05a8137fcd4 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr97701.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr97701.c
@@ -11,7 +11,7 @@ void e(signed f, int g, char h, char i, char j) {
   for (long m = f + 6LL; m < (h ? h : i); m += 2)
 for (int n = 0; n < 16; n += 3LL) {
   for (int o = g; o; o++)
-a[k][l][m][n][o] = b[k][l][m][n][o] = d[k] ? 2 : 0;
+a[k][l][m][n][o] = b[k][l][m][n][o] = d[k] ? 2 : 1;
   c[k][l][m][0] = 0;
 }
 }


Re: [PATCH] PING lra: clear lra_insn_recog_data after simplifying a mem subreg

2021-02-02 Thread Vladimir Makarov via Gcc-patches



On 2021-01-28 5:40 a.m., Ilya Leoshkevich via Gcc-patches wrote:

Hello,

I would like to ping the following patch:

lra: clear lra_insn_recog_data after simplifying a mem subreg
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563428.html

Sorry, I missed your original email.  The patch is ok to submit into the 
trunk.


Thank you for addressing the issue.




[committed] [PR97701] LRA: Don't narrow class only for REG or MEM. A version modified for gcc 10 branch.

2021-02-04 Thread Vladimir Makarov via Gcc-patches
It seems that my recent patch for PR97701 on the trunk did not create 
new problems.  Therefore I am committing the following patch to the 
gcc-10 branch.



commit 4918937f4c76b05eaa331f8d6f2571e2fddcc22b (HEAD -> releases/gcc-10)
Author: Vladimir N. Makarov 
Date:   Thu Feb 4 15:57:55 2021 -0500

[PR97701] LRA: Don't narrow class only for REG or MEM.  A version modified for gcc-10.

This is a modified version of the patch committed to the trunk.  The
modification for gcc-10 includes lra-constraints.c code and the test.

gcc/ChangeLog:

PR target/97701
* lra-constraints.c (in_class_p): Don't narrow class only for REG
or MEM.

gcc/testsuite/ChangeLog:

PR target/97701
* gcc.target/aarch64/pr97701.c: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 088208b9c6e..bf04eb48ba6 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -244,6 +244,7 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class)
 {
   enum reg_class rclass, common_class;
   machine_mode reg_mode;
+  rtx src;
   int class_size, hard_regno, nregs, i, j;
   int regno = REGNO (reg);
 
@@ -259,6 +260,7 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class)
 }
   reg_mode = GET_MODE (reg);
   rclass = get_reg_class (regno);
+  src = curr_insn_set != NULL ? SET_SRC (curr_insn_set) : NULL;
   if (regno < new_regno_start
   /* Do not allow the constraints for reload instructions to
 	 influence the classes of new pseudos.  These reloads are
@@ -266,12 +268,10 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class)
 	 reload pseudos for one alternative may lead to situations
 	 where other reload pseudos are no longer allocatable.  */
   || (INSN_UID (curr_insn) >= new_insn_uid_start
-	  && curr_insn_set != NULL
-	  && ((OBJECT_P (SET_SRC (curr_insn_set))
-	   && ! CONSTANT_P (SET_SRC (curr_insn_set)))
-	  || (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
-		  && OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
-		  && ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
+	  && src != NULL
+	  && ((REG_P (src) || MEM_P (src))
+	  || (GET_CODE (src) == SUBREG
+		  && (REG_P (SUBREG_REG (src)) || MEM_P (SUBREG_REG (src)))
 /* When we don't know what class will be used finally for reload
pseudos, we use ALL_REGS.  */
 return ((regno >= new_regno_start && rclass == ALL_REGS)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr97701.c b/gcc/testsuite/gcc.target/aarch64/pr97701.c
new file mode 100644
index 000..ede3540c48d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr97701.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+extern char a[][12][18][17][17];
+extern short b[][12][18][17][17];
+extern int c[][2][8][7];
+short *d;
+void e(signed f, int g, char h, char i, char j) {
+  for (int k = 648; k; k += f)
+for (short l; l < j; l += 9)
+  for (long m = f + 6LL; m < (h ? h : i); m += 2)
+for (int n = 0; n < 16; n += 3LL) {
+  for (int o = g; o; o++)
+a[k][l][m][n][o] = b[k][l][m][n][o] = d[k] ? 2 : 0;
+  c[k][l][m][0] = 0;
+}
+}


[PATCH] PR98096: inline-asm: Take inout operands into account for access to labels by names.

2021-02-04 Thread Vladimir Makarov via Gcc-patches

The following patch solves

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096

The patch is for a new GCC extension -- asm goto with output reloads.

GCC splits inout operands (with constraint "+") into an output and a new 
matched input operand during gimplification.  Addressing input or output 
operands by name or number is not a problem, as the new input operands 
are added at the end of the existing input operands.


However it became a problem for labels in asm goto with output reloads.  
Addressing labels should take into account the added matched input 
operands.  The patch solves the problem.
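The numbering rule is pure arithmetic: a label's %l number is the count of output plus input operands, where every '+' (inout) operand contributes one extra matched input.  A sketch of that rule, checked against the worked example in the documentation hunk below (three inputs, one '+' output, one '=' output, first label is %l6):

```c
/* %l number of the zero-based i-th label of an asm goto: all output
   and input operands come first, and each '+' (inout) operand adds one
   extra matched input operand created at gimplification.  */
static int label_operand_number (int n_outputs, int n_inputs,
                                 int n_inout, int i)
{
  return n_outputs + n_inputs + n_inout + i;
}
```

Named references such as %l[carry] sidestep this counting entirely, which is why the documentation change recommends them.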


The patch was successfully bootstrapped and tested on x86-64.

Is it ok to commit into the trunk?



diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8daa1c67974..71b35252b84 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10656,16 +10656,23 @@ should use @samp{+} constraint modifier meaning that the operand is
 input and output one.  With this modifier you will have the correct
 values on all possible paths from the @code{asm goto}.
 
-To reference a label in the assembler template,
-prefix it with @samp{%l} (lowercase @samp{L}) followed 
-by its (zero-based) position in @var{GotoLabels} plus the number of input 
-operands.  For example, if the @code{asm} has three inputs and references two 
-labels, refer to the first label as @samp{%l3} and the second as @samp{%l4}).
-
-Alternately, you can reference labels using the actual C label name enclosed
-in brackets.  For example, to reference a label named @code{carry}, you can
-use @samp{%l[carry]}.  The label must still be listed in the @var{GotoLabels}
-section when using this approach.
+To reference a label in the assembler template, prefix it with
+@samp{%l} (lowercase @samp{L}) followed by its (zero-based) position
+in @var{GotoLabels} plus the number of input and output operands.
+Output operand with constraint modifier @samp{+} is counted as two
+operands because it is considered as one output and one input operand.
+For example, if the @code{asm} has three inputs, one output operand
+with constraint modifier @samp{+} and one output operand with
+constraint modifier @samp{=} and references two labels, refer to the
+first label as @samp{%l6} and the second as @samp{%l7}).
+
+Alternately, you can reference labels using the actual C label name
+enclosed in brackets.  For example, to reference a label named
+@code{carry}, you can use @samp{%l[carry]}.  The label must still be
+listed in the @var{GotoLabels} section when using this approach.  It
+is better to use the named references for labels as in this case you
+can avoid counting input and output operands and special treatment of
+output operands with constraint modifier @samp{+}.
 
 Here is an example of @code{asm goto} for i386:
 
diff --git a/gcc/stmt.c b/gcc/stmt.c
index bd836d8f65f..f52ffaf8e75 100644
--- a/gcc/stmt.c
+++ b/gcc/stmt.c
@@ -611,7 +611,7 @@ static char *
 resolve_operand_name_1 (char *p, tree outputs, tree inputs, tree labels)
 {
   char *q;
-  int op;
+  int op, op_inout;
   tree t;
 
   /* Collect the operand name.  */
@@ -624,11 +624,14 @@ resolve_operand_name_1 (char *p, tree outputs, tree inputs, tree labels)
   *q = '\0';
 
   /* Resolve the name to a number.  */
-  for (op = 0, t = outputs; t ; t = TREE_CHAIN (t), op++)
+  for (op_inout = op = 0, t = outputs; t ; t = TREE_CHAIN (t), op++)
 {
   tree name = TREE_PURPOSE (TREE_PURPOSE (t));
   if (name && strcmp (TREE_STRING_POINTER (name), p) == 0)
 	goto found;
+  tree constraint = TREE_VALUE (TREE_PURPOSE (t));
+  if (constraint && strchr (TREE_STRING_POINTER (constraint), '+') != NULL)
+op_inout++;
 }
   for (t = inputs; t ; t = TREE_CHAIN (t), op++)
 {
@@ -636,6 +639,7 @@ resolve_operand_name_1 (char *p, tree outputs, tree inputs, tree labels)
   if (name && strcmp (TREE_STRING_POINTER (name), p) == 0)
 	goto found;
 }
+  op += op_inout;
   for (t = labels; t ; t = TREE_CHAIN (t), op++)
 {
   tree name = TREE_PURPOSE (t);
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr98096.c b/gcc/testsuite/gcc.c-torture/compile/pr98096.c
new file mode 100644
index 000..95ad55c81aa
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr98096.c
@@ -0,0 +1,10 @@
+/* Test for correct naming of label operands in asm goto in case of presence of
+   input/output operands. */
+/* { dg-do compile } */
+int i, j;
+int f(void) {
+  asm goto ("# %0 %2" : "+r" (i) ::: jmp);
+  i += 2;
+  asm goto ("# %0 %1 %l[jmp]" : "+r" (i), "+r" (j) ::: jmp);
+ jmp: return i;
+}


[committed] [PR96264] LRA: Check output insn hard regs when updating available rematerialization insns

2021-02-18 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96264

The patch was successfully bootstrapped and tested on ppc64le.


commit e0d3041c9caece8b48be016fa515747eb2746d35
Author: Vladimir Makarov 
Date:   Thu Feb 18 22:40:54 2021 +

[PR96264] LRA: Check output insn hard regs when updating available rematerialization after the insn

An insn chosen for rematerialization can contain a clobbered hard
register.  We cannot move such an insn through another insn setting the
same hard register.  The patch adds such a check.

gcc/ChangeLog:

PR rtl-optimization/96264
* lra-remat.c (reg_overlap_for_remat_p): Check also output insn
	hard regs.

gcc/testsuite/ChangeLog:

PR rtl-optimization/96264
* gcc.target/powerpc/pr96264.c: New.

diff --git a/gcc/lra-remat.c b/gcc/lra-remat.c
index 8bd9ffa..d983731 100644
--- a/gcc/lra-remat.c
+++ b/gcc/lra-remat.c
@@ -651,7 +651,11 @@ calculate_local_reg_remat_bb_data (void)
 
 
 
-/* Return true if REG overlaps an input operand of INSN.  */
+/* Return true if REG overlaps an input operand or non-input hard register of
+   INSN.  Basically the function returns false if we can move rematerialization
+   candidate INSN through another insn with output REG or dead input REG (we
+   consider it to avoid extending reg live range) with possible output pseudo
+   renaming in INSN.  */
 static bool
 reg_overlap_for_remat_p (lra_insn_reg *reg, rtx_insn *insn)
 {
@@ -675,10 +679,11 @@ reg_overlap_for_remat_p (lra_insn_reg *reg, rtx_insn *insn)
 	 reg2 != NULL;
 	 reg2 = reg2->next)
   {
-	if (reg2->type != OP_IN)
-	  continue;
-	unsigned regno2 = reg2->regno;
 	int nregs2;
+	unsigned regno2 = reg2->regno;
+
+	if (reg2->type != OP_IN && regno2 >= FIRST_PSEUDO_REGISTER)
+	  continue;
 
 	if (regno2 >= FIRST_PSEUDO_REGISTER && reg_renumber[regno2] >= 0)
 	  regno2 = reg_renumber[regno2];
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96264.c b/gcc/testsuite/gcc.target/powerpc/pr96264.c
new file mode 100644
index 000..e89979b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96264.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-Os -fno-forward-propagate -fschedule-insns -fno-tree-ter -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+typedef unsigned char __attribute__ ((__vector_size__ (64))) v512u8;
+typedef unsigned short u16;
+typedef unsigned short __attribute__ ((__vector_size__ (64))) v512u16;
+typedef unsigned __int128 __attribute__ ((__vector_size__ (64))) v512u128;
+
+v512u16 d;
+v512u128 f;
+
+v512u8
+foo (u16 e)
+{
+  v512u128 g = f - -e;
+  d = (5 / (d + 1)) < e;
+  return (v512u8) g;
+}
+
+int
+main (void)
+{
+  v512u8 x = foo (2);
+  for (unsigned i = 0; i < sizeof (x); i++)
+if (x[i] != (i % 16 ? 0 : 2))
+  __builtin_abort ();
+}
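Why the new check matters: if the rematerialization candidate clobbers a hard register, sinking it past an insn whose result lives in that register destroys the value.  A tiny simulation with the register file reduced to one variable (purely illustrative):

```c
/* One simulated hard register.  */
static int r0;

/* The rematerialization candidate clobbers r0 as a side effect.  */
static void remat_candidate (void) { r0 = -1; }

/* Another insn whose result is produced into r0.  */
static void set_r0 (void) { r0 = 42; }
```

Executing the candidate after the setter (the illegal motion the patch now rejects) leaves r0 clobbered; the original order preserves the value.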


Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-02-19 Thread Vladimir Makarov via Gcc-patches



On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote:

Hi,

This patch makes sure that allocno copies are not created for 
unordered modes. The testcases in the PR highlighted a case where an 
allocno copy was being created for:

(insn 121 120 123 11 (parallel [
    (set (reg:VNx2QI 217)
    (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 
]) 0)))

    (clobber (scratch:VNx16BI))
    ]) 4750 {*vec_duplicatevnx2qi_reg}
 (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
    (nil)))

As the compiler detected that the vec_duplicate_reg pattern 
allowed the input and output operand to be of the same register class, 
it tried to create an allocno copy for these two operands, stripping 
subregs in the process. However, this meant that the copy was between 
VNx2QI and SI, which have unordered mode precisions.


So at compile time we do not know which of the two modes is smaller, 
which is something we need to know when updating allocno copy costs.


Regression tested on aarch64-linux-gnu.

Is this OK for trunk (and after a week backport to gcc-10) ?

OK.  Yes, it is wise to wait a bit and see how the patch behaves on the 
trunk before submitting it to the gcc-10 branch.  Sometimes such changes 
can have quite unexpected consequences.  But I guess not in this case.


Thank you for working on the issue.


gcc/ChangeLog:
2021-02-19  Andre Vieira  

    PR rtl-optimization/98791
    * ira-conflicts.c (process_regs_for_copy): Don't create 
allocno copies for unordered modes.


gcc/testsuite/ChangeLog:
2021-02-19  Andre Vieira  

    PR rtl-optimization/98791
    * gcc.target/aarch64/sve/pr98791.c: New test.





[committed] [PR99123] inline-asm: Don't use decompose_mem_address to find used hard regs

2021-02-24 Thread Vladimir Makarov via Gcc-patches

The following patch solves

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99123

The patch was successfully bootstrapped and tested on x86-64


commit b6680c2084521d2612c3a08aa01b274078c4f3e3
Author: Vladimir N. Makarov 
Date:   Wed Feb 24 13:54:10 2021 -0500

[PR99123] inline-asm: Don't use decompose_mem_address to find used hard regs

Inline asm in question has empty constraint which means anything
including memory with invalid address.  To check used hard regs we
used decompose_mem_address which assumes memory with valid address.
The patch implements the same semantics without assuming valid
addresses.

gcc/ChangeLog:

PR inline-asm/99123
* lra-constraints.c (uses_hard_regs_p): Don't use decompose_mem_address.

gcc/testsuite/ChangeLog:

PR inline-asm/99123
* gcc.target/i386/pr99123.c: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 6a5aa41ed55..51acf7f0701 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1904,16 +1904,6 @@ uses_hard_regs_p (rtx x, HARD_REG_SET set)
   return (x_hard_regno >= 0
 	  && overlaps_hard_reg_set_p (set, mode, x_hard_regno));
 }
-  if (MEM_P (x))
-{
-  struct address_info ad;
-
-  decompose_mem_address (&ad, x);
-  if (ad.base_term != NULL && uses_hard_regs_p (*ad.base_term, set))
-	return true;
-  if (ad.index_term != NULL && uses_hard_regs_p (*ad.index_term, set))
-	return true;
-}
   fmt = GET_RTX_FORMAT (code);
   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr99123.c b/gcc/testsuite/gcc.target/i386/pr99123.c
new file mode 100644
index 000..4f32547d5b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr99123.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+static inline void *
+baz (void *s, unsigned long c, unsigned int count)
+{
+  int d0, d1;
+  __asm__ __volatile__ (""
+: "=&c" (d0), "=&D" (d1)
+:"a" (c), "q" (count), "0" (count / 4), "" ((long) s)   /// "1"
+:"memory");
+  return s;
+}
+
+struct A
+{
+  unsigned long *a;
+};
+
+inline static void *
+bar (struct A *x, int y)
+{
+  char *ptr;
+
+  ptr = (void *) x->a[y >> 12];
+  ptr += y % (1UL << 12);
+  return (void *) ptr;
+}
+
+int
+foo (struct A *x, unsigned int *y, int z, int u)
+{
+  int a, b, c, d, e;
+
+  z += *y;
+  c = z + u;
+  a = (z >> 12) + 1;
+  do
+    {
+      b = (a << 12);
+      d = b - z;
+      e = c - z;
+      if (e < d)
+	d = e;
+      baz (bar (x, z), 0, d);
+      z = b;
+      a++;
+    }
+  while (z < c);
+  return 0;
+}


[committed] [PR99233] testsuite: Run test pr96264.c only for little endian target

2021-02-25 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99233


commit 557a0d3b1b389c46d5a8aa24e27abed4c401d17e
Author: Vladimir N. Makarov 
Date:   Thu Feb 25 11:20:32 2021 -0500

[PR99233] testsuite: Run test pr96264.c only for little endian

The test in question is assumed to work only for little endian target.

gcc/testsuite/ChangeLog:

PR testsuite/99233
* gcc.target/powerpc/pr96264.c: Run it only for powerpc64le.

diff --git a/gcc/testsuite/gcc.target/powerpc/pr96264.c b/gcc/testsuite/gcc.target/powerpc/pr96264.c
index e89979b8998..9f7d885daf2 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr96264.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr96264.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { powerpc64le-*-* } } } */
 /* { dg-options "-Os -fno-forward-propagate -fschedule-insns -fno-tree-ter -Wno-psabi" } */
 /* { dg-require-effective-target p8vector_hw } */
 


[committed] [PR99378] LRA: Skip decomposing address for asm insn operand with unknown constraint

2021-03-05 Thread Vladimir Makarov via Gcc-patches

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99378

  The patch was successfully bootstrapped and tested on x86-64.


commit e786c7547eda4edd90797f6cae0f5e6405d64773 (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Fri Mar 5 11:41:25 2021 -0500

[PR99378] LRA: Skip decomposing address for asm insn operand with unknown constraint.

  Function get_constraint_type returns CT__UNKNOWN for empty constraint
and CT_FIXED_FORM for "X".  So process_address_1 skipped
decompose_mem_address only for "X" constraint.  To do the same for empty
constraint, skip decompose_mem_address for CT__UNKNOWN.

gcc/ChangeLog:

PR target/99378
* lra-constraints.c (process_address_1): Skip decomposing address
for asm insn operand with unknown constraint.

gcc/testsuite/ChangeLog:

PR target/99378
* gcc.target/i386/pr99123-2.c: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 51acf7f0701..9253690561a 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -3450,8 +3450,9 @@ process_address_1 (int nop, bool check_only_p,
  i.e. bcst_mem_operand in i386 backend.  */
   else if (MEM_P (mem)
 	   && !(INSN_CODE (curr_insn) < 0
-		&& get_constraint_type (cn) == CT_FIXED_FORM
-	&& constraint_satisfied_p (op, cn)))
+		&& (cn == CONSTRAINT__UNKNOWN
+		|| (get_constraint_type (cn) == CT_FIXED_FORM
+			&& constraint_satisfied_p (op, cn)))))
 decompose_mem_address (&ad, mem);
   else if (GET_CODE (op) == SUBREG
 	   && MEM_P (SUBREG_REG (op)))
diff --git a/gcc/testsuite/gcc.target/i386/pr99123-2.c b/gcc/testsuite/gcc.target/i386/pr99123-2.c
new file mode 100644
index 000..def4eae3c9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr99123-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funroll-loops" } */
+
+static inline void *
+baz (void *s, unsigned long c, unsigned int count)
+{
+  int d0, d1;
+  __asm__ __volatile__ (""
+: "=&c" (d0), "=&D" (d1)
+:"a" (c), "q" (count), "0" (count / 4), "" ((long) s)   /// "1"
+:"memory");
+  return s;
+}
+
+struct A
+{
+  unsigned long *a;
+};
+
+inline static void *
+bar (struct A *x, int y)
+{
+  char *ptr;
+
+  ptr = (void *) x->a[y >> 12];
+  ptr += y % (1UL << 12);
+  return (void *) ptr;
+}
+
+int
+foo (struct A *x, unsigned int *y, int z, int u)
+{
+  int a, b, c, d, e;
+
+  z += *y;
+  c = z + u;
+  a = (z >> 12) + 1;
+  do
+    {
+      b = (a << 12);
+      d = b - z;
+      e = c - z;
+      if (e < d)
+	d = e;
+      baz (bar (x, z), 0, d);
+      z = b;
+      a++;
+    }
+  while (z < c);
+  return 0;
+}


[committed] [PR99422] LRA: Skip modifiers when processing memory address.

2021-03-08 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422

The patch was successfully bootstrapped and tested on ppc64le, x86-64 
and arm64.


commit 04b4828c6dd215385fde6964a5e13da8a01a78ba (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Mon Mar 8 09:24:57 2021 -0500

[PR99422] LRA: Skip modifiers when processing memory address.

  Function process_address_1 can wrongly look at constraint modifiers
instead of the 1st constraint itself.  The patch solves the problem.

gcc/ChangeLog:

PR target/99422
* lra-constraints.c (skip_contraint_modifiers): New function.
(process_address_1): Use it before lookup_constraint call.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 9253690561a..76e3ff7efe6 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -3392,6 +3392,21 @@ equiv_address_substitution (struct address_info *ad)
   return change_p;
 }
 
+/* Skip all modifiers and whitespaces in constraint STR and return the
+   result.  */
+static const char *
+skip_contraint_modifiers (const char *str)
+{
+  for (;;str++)
+switch (*str)
+  {
+  case '+' : case '&' : case '=': case '*': case ' ': case '\t':
+  case '$': case '^' : case '%': case '?': case '!':
+	break;
+  default: return str;
+  }
+}
+
 /* Major function to make reloads for an address in operand NOP or
check its correctness (If CHECK_ONLY_P is true). The supported
cases are:
@@ -3426,8 +3441,8 @@ process_address_1 (int nop, bool check_only_p,
   HOST_WIDE_INT scale;
   rtx op = *curr_id->operand_loc[nop];
   rtx mem = extract_mem_from_operand (op);
-  const char *constraint = curr_static_id->operand[nop].constraint;
-  enum constraint_num cn = lookup_constraint (constraint);
+  const char *constraint;
+  enum constraint_num cn;
   bool change_p = false;
 
   if (MEM_P (mem)
@@ -3435,6 +3450,9 @@ process_address_1 (int nop, bool check_only_p,
   && GET_CODE (XEXP (mem, 0)) == SCRATCH)
 return false;
 
+  constraint
+= skip_contraint_modifiers (curr_static_id->operand[nop].constraint);
+  cn = lookup_constraint (constraint);
   if (insn_extra_address_constraint (cn)
   /* When we find an asm operand with an address constraint that
 	 doesn't satisfy address_operand to begin with, we clear


Re: PING^3 [GCC 10] [PATCH] IRA: Don't make a global register eliminable

2020-09-29 Thread Vladimir Makarov via Gcc-patches



On 2020-09-29 8:38 a.m., H.J. Lu wrote:

On Fri, Sep 25, 2020 at 6:46 AM H.J. Lu  wrote:

OK for GCC 10 branch?

Thanks.

PING:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554268.html


PING.


PING.

Sorry, I thought Jeff Law already approved this.  In any case the patch 
is also ok for me for the trunk and gcc-10 branch.





[PUSHED] Patch to fix a LRA ICE [PR 97313]

2020-10-09 Thread Vladimir Makarov via Gcc-patches

The following patch has been committed into the main line.  The patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97313

The patch was successfully bootstrapped and tested on x86-64.


gcc/ChangeLog:

2020-10-09  Vladimir Makarov  

	PR rtl-optimization/97313
	* lra-constraints.c (match_reload): Don't keep strict_low_part in
	reloads for non-registers.

gcc/testsuite/ChangeLog:

2020-10-09  Vladimir Makarov  

	PR rtl-optimization/97313
	* gcc.target/i386/pr97313.c: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 301c912cb21..f761d7dfe3c 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1132,8 +1132,13 @@ match_reload (signed char out, signed char *ins, signed char *outs,
   narrow_reload_pseudo_class (out_rtx, goal_class);
   if (find_reg_note (curr_insn, REG_UNUSED, out_rtx) == NULL_RTX)
 {
+  reg = SUBREG_P (out_rtx) ? SUBREG_REG (out_rtx) : out_rtx;
   start_sequence ();
-  if (out >= 0 && curr_static_id->operand[out].strict_low)
+  /* If we had strict_low_part, use it also in reload to keep other
+	 parts unchanged but do it only for regs as strict_low_part
+	 has no sense for memory and probably there is no insn pattern
+	 to match the reload insn in memory case.  */
+  if (out >= 0 && curr_static_id->operand[out].strict_low && REG_P (reg))
 	out_rtx = gen_rtx_STRICT_LOW_PART (VOIDmode, out_rtx);
   lra_emit_move (out_rtx, copy_rtx (new_out_reg));
   emit_insn (*after);
diff --git a/gcc/testsuite/gcc.target/i386/pr97313.c b/gcc/testsuite/gcc.target/i386/pr97313.c
new file mode 100644
index 000..ef93cf1cca8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97313.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fPIE" } */
+
+typedef struct {
+  int unspecified : 1;
+  int secure : 1;
+} MemTxAttrs;
+
+enum { MSCAllowNonSecure } tz_msc_read_pdata;
+
+int tz_msc_read_s_0;
+int tz_msc_check();
+int address_space_ldl_le();
+
+void tz_msc_read(MemTxAttrs attrs) {
+  int as = tz_msc_read_s_0;
+  long long data;
+  switch (tz_msc_check()) {
+  case MSCAllowNonSecure:
+    attrs.secure = attrs.unspecified = 0;
+    data = address_space_ldl_le(as, attrs);
+  }
+  tz_msc_read_pdata = data;
+}


[committed] [PR104637] LRA: Split hard regs as many as possible on one subpass

2022-02-28 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64.


commit ec1b9ba2d7913fe5e9deacc8e55e7539262f5124
Author: Vladimir N. Makarov 
Date:   Mon Feb 28 16:43:50 2022 -0500

[PR104637] LRA: Split hard regs as many as possible on one subpass

The LRA hard reg split subpass is a small subpass used as the last
resort for LRA when it cannot assign a hard reg to a reload
pseudo in other ways (e.g. by spilling non-reload pseudos).  For
simplicity the subpass works on one split base (as each split
changes pseudo live range info).  In this case it results in
reaching the maximal possible number of subpasses.  The patch
implements as many non-overlapping hard reg splits as possible
on each subpass.

gcc/ChangeLog:

PR rtl-optimization/104637
* lra-assigns.cc (lra_split_hard_reg_for): Split hard regs as many
as possible on one subpass.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104637
* gcc.target/i386/pr104637.c: New.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index c1d40ea2a14..ab3a6e6e9cc 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1774,8 +1774,8 @@ lra_split_hard_reg_for (void)
  iterations.  Either it's an asm and something is wrong with the
  constraints, or we have run out of spill registers; error out in
  either case.  */
-  bool asm_p = false;
-  bitmap_head failed_reload_insns, failed_reload_pseudos;
+  bool asm_p = false, spill_p = false;
+  bitmap_head failed_reload_insns, failed_reload_pseudos, over_split_insns;
   
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
@@ -1786,6 +1786,7 @@ lra_split_hard_reg_for (void)
   bitmap_ior (&non_reload_pseudos, &lra_inheritance_pseudos, &lra_split_regs);
   bitmap_ior_into (&non_reload_pseudos, &lra_subreg_reload_pseudos);
   bitmap_ior_into (&non_reload_pseudos, &lra_optional_reload_pseudos);
+  bitmap_initialize (&over_split_insns, &reg_obstack);
   for (i = lra_constraint_new_regno_start; i < max_regno; i++)
 if (reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
 	&& (rclass = lra_get_allocno_class (i)) != NO_REGS
@@ -1793,14 +1794,41 @@ lra_split_hard_reg_for (void)
   {
 	if (! find_reload_regno_insns (i, first, last))
 	  continue;
-	if (BLOCK_FOR_INSN (first) == BLOCK_FOR_INSN (last)
-	&& spill_hard_reg_in_range (i, rclass, first, last))
+	if (BLOCK_FOR_INSN (first) == BLOCK_FOR_INSN (last))
 	  {
-	bitmap_clear (&failed_reload_pseudos);
-	return true;
+	/* Check that we are not trying to split over the same insn
+	   requiring reloads to avoid splitting the same hard reg twice or
+	   more.  If we need several hard regs splitting over the same insn
+	   it can be finished on the next iterations.
+
+	   The following loop iteration number is small as we split hard
+	   reg in a very small range.  */
+	for (insn = first;
+		 insn != NEXT_INSN (last);
+		 insn = NEXT_INSN (insn))
+	  if (bitmap_bit_p (&over_split_insns, INSN_UID (insn)))
+		break;
+	if (insn != NEXT_INSN (last)
+		|| !spill_hard_reg_in_range (i, rclass, first, last))
+	  {
+		bitmap_set_bit (&failed_reload_pseudos, i);
+	  }
+	else
+	  {
+		for (insn = first;
+		 insn != NEXT_INSN (last);
+		 insn = NEXT_INSN (insn))
+		  bitmap_set_bit (&over_split_insns, INSN_UID (insn));
+		spill_p = true;
+	  }
 	  }
-	bitmap_set_bit (&failed_reload_pseudos, i);
   }
+  bitmap_clear (&over_split_insns);
+  if (spill_p)
+{
+  bitmap_clear (&failed_reload_pseudos);
+  return true;
+}
   bitmap_clear (&non_reload_pseudos);
   bitmap_initialize (&failed_reload_insns, &reg_obstack);
   EXECUTE_IF_SET_IN_BITMAP (&failed_reload_pseudos, 0, u, bi)
diff --git a/gcc/testsuite/gcc.target/i386/pr104637.c b/gcc/testsuite/gcc.target/i386/pr104637.c
new file mode 100644
index 000..65e8635d55e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104637.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-forward-propagate -mavx" } */
+
+typedef short __attribute__((__vector_size__ (64))) U;
+typedef unsigned long long __attribute__((__vector_size__ (32))) V;
+typedef long double __attribute__((__vector_size__ (64))) F;
+
+int i;
+U u;
+F f;
+
+void
+foo (char a, char b, _Complex char c, V v)
+{
+  u = (U) { u[0] / 0, u[1] / 0, u[2] / 0, u[3] / 0, u[4] / 0, u[5] / 0, u[6] / 0, u[7] / 0,
+	u[8] / 0, u[0] / 0, u[9] / 0, u[10] / 0, u[11] / 0, u[12] / 0, u[13] / 0, u[14] / 0, u[15] / 0,
+	u[16] / 0, u[17] / 0, u[18] / 0, u[19] / 0, u[20] / 0, u[21] / 0, u[22] / 0, u[23] / 0,
+	u[24] / 0, u[25] / 0, u[26] / 0, u[27] / 0, u[28] / 0, u[29] / 0, u[30] / 0, u[31] / 0 };
+  c += i;
+  f = (F) { v[0], v[1], v[2], v[3] };
+  i = (char) (__imag__ c + i);
+}


Re: [PATCH] rtl-optimization/104686 - speedup IRA allocno conflict test

2022-03-02 Thread Vladimir Makarov via Gcc-patches



On 2022-03-02 03:58, Richard Biener wrote:

In this PR allocnos_conflict_p takes 90% of the compile-time via
the calls from update_conflict_hard_regno_costs.  This is due to
the high number of conflicts recorded in the dense bitvector
representation.  Fortunately we can take advantage of the bitvector
representation here and turn the O(n) conflict test into an O(1) one,
greatly speeding up the compile of the testcase from 39s to just 4s
(93% IRA time to 26% IRA time).

While for the testcase in question the first allocno is almost always
the nice one the patch tries a more systematic approach to finding
the allocno to iterate object conflicts over.  That does reduce
the actual number of compares for the testcase but it doesn't make
a measurable difference wall-clock wise.  That's not guaranteed
though I think so I've kept this systematic way of choosing the
cheapest allocno.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?


Yes.

Richard, thank you again for working on this issue.


2022-03-02  Richard Biener  

PR rtl-optimization/104686
* ira-color.cc (object_conflicts_with_allocno_p): New function
using a bitvector test instead of iterating when possible.
(allocnos_conflict_p): Choose the best allocno to iterate over
object conflicts.
(update_conflict_hard_regno_costs): Do allocnos_conflict_p test
last.




Re: [PR103302] skip multi-word pre-move clobber during lra

2022-03-02 Thread Vladimir Makarov via Gcc-patches



On 2022-03-02 07:25, Alexandre Oliva wrote:

Regstrapped on x86_64-linux-gnu, also tested on various riscv and arm
targets (with gcc-11).  Ok to install?


Yes.

Thank you for working on this, Alex.


for  gcc/ChangeLog
* lra-constraints.cc (undo_optional_reloads): Recognize and
drop insns of multi-word move sequences, tolerate removal
iteration on an already-removed clobber, and refuse to
substitute original pseudos into clobbers.


[committed] [PR103074] LRA: Check new conflicts when splitting hard reg live range

2022-03-10 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103074

The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit d8e5fff6b74b82c2ac3254be9a1f0fb6b30dbdbf
Author: Vladimir N. Makarov 
Date:   Thu Mar 10 16:16:00 2022 -0500

[PR103074] LRA: Check new conflicts when splitting hard reg live range.

Splitting hard register live range can create (artificial)
conflict of the hard register with another pseudo because of simplified
conflict calculation in LRA.  We should check such conflict on the next
assignment sub-pass and spill and reassign the pseudo if necessary.
The patch implements this.

gcc/ChangeLog:

PR target/103074
* lra-constraints.cc (split_reg): Set up
check_and_force_assignment_correctness_p when splitting hard
register live range.

gcc/testsuite/ChangeLog:

PR target/103074
* gcc.target/i386/pr103074.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 080b44ad87a..d92ab76908c 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5994,12 +5994,17 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
 			 before_p ? NULL : save,
 			 call_save_p
 			 ?  "Add save<-reg" : "Add split<-reg");
-  if (nregs > 1)
+  if (nregs > 1 || original_regno < FIRST_PSEUDO_REGISTER)
 /* If we are trying to split multi-register.  We should check
conflicts on the next assignment sub-pass.  IRA can allocate on
sub-register levels, LRA do this on pseudos level right now and
this discrepancy may create allocation conflicts after
-   splitting.  */
+   splitting.
+
+   If we are trying to split hard register we should also check conflicts
+   as such splitting can create artificial conflict of the hard register
+   with another pseudo because of simplified conflict calculation in
+   LRA.  */
 check_and_force_assignment_correctness_p = true;
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
diff --git a/gcc/testsuite/gcc.target/i386/pr103074.c b/gcc/testsuite/gcc.target/i386/pr103074.c
new file mode 100644
index 000..276ad82a1de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103074.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=bonnell -Os -fPIC -fschedule-insns -w" } */
+
+void
+serialize_collection (char *ptr, int a, int need_owner)
+{
+  if (need_owner)
+    __builtin_sprintf(ptr, "%d:%d", 0, a);
+  else
+    {
+      static char buff[32];
+
+      __builtin_sprintf(buff, "%d:%d", a >> 32, a);
+      __builtin_sprintf(ptr, "%d:%d:\"%s\"", 0, 0, buff);
+    }
+}


Re: [PATCH] lra: Fix up debug_p handling in lra_substitute_pseudo [PR104778]

2022-03-14 Thread Vladimir Makarov via Gcc-patches



On 2022-03-12 14:37, Jakub Jelinek wrote:

Hi!

The following testcase ICEs on powerpc-linux, because lra_substitute_pseudo
substitutes (const_int 1) into a subreg operand.  First a subreg of subreg
of a reg appears in a debug insn (which surely is invalid outside of
debug insns, but in debug insns we allow even what is normally invalid in
RTL like subregs which the target doesn't like, because either dwarf2out
is able to handle it, or we just throw away the location expression,
making some var <optimized away>).

lra_substitute_pseudo already has some code to deal with specifically
SUBREG of REG with the REG being substituted for VOIDmode constant,
but that doesn't cover this case, so the following patch extends
lra_substitute_pseudo for debug_p mode to treat stuff like e.g.
combiner's subst function to ensure we don't lose mode which is essential
for the IL.

Bootstrapped/regtested on {powerpc64{,le},x86_64,i686}-linux, ok for trunk?



Sure.  Thank you for working on this PR, Jakub.



2022-03-12  Jakub Jelinek  

PR debug/104778
* lra.cc (lra_substitute_pseudo): For debug_p mode, simplify
SUBREG, ZERO_EXTEND, SIGN_EXTEND, FLOAT or UNSIGNED_FLOAT if recursive
call simplified the first operand into VOIDmode constant.

* gcc.target/powerpc/pr104778.c: New test.





[committed] [PR104961] LRA: split hard reg for reload pseudo with clobber

2022-03-18 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104961

The patch was successfully bootstrapped and tested on x86-64.

commit 4e2291789a8b31c550271405782356e8aeddcee3
Author: Vladimir N. Makarov 
Date:   Fri Mar 18 14:23:40 2022 -0400

[PR104961] LRA: split hard reg for reload pseudo with clobber.

Splitting hard register live range did not work for subreg of a
multi-reg reload pseudo.  Reload insns for such pseudo contain clobber
of the pseudo and splitting did not take this into account.  The patch
fixes it.

gcc/ChangeLog:

PR rtl-optimization/104961
* lra-assigns.cc (find_reload_regno_insns): Process reload pseudo clobber.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104961
* gcc.target/i386/pr104961.c: New.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index ab3a6e6e9cc..af30a673142 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1706,7 +1706,8 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 {
   unsigned int uid;
   bitmap_iterator bi;
-  int n = 0;
+  int insns_num = 0;
+  bool clobber_p = false;
   rtx_insn *prev_insn, *next_insn;
   rtx_insn *start_insn = NULL, *first_insn = NULL, *second_insn = NULL;
   
@@ -1714,28 +1715,32 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 {
   if (start_insn == NULL)
 	start_insn = lra_insn_recog_data[uid]->insn;
-  n++;
+  if (GET_CODE (PATTERN (lra_insn_recog_data[uid]->insn)) == CLOBBER)
+	clobber_p = true;
+  else
+	insns_num++;
 }
-  /* For reload pseudo we should have at most 3 insns referring for
+  /* For reload pseudo we should have at most 3 insns besides clobber referring for
  it: input/output reload insns and the original insn.  */
-  if (n > 3)
+  if (insns_num > 3)
 return false;
-  if (n > 1)
+  if (clobber_p)
+insns_num++;
+  if (insns_num > 1)
 {
   for (prev_insn = PREV_INSN (start_insn),
 	 next_insn = NEXT_INSN (start_insn);
-	   n != 1 && (prev_insn != NULL || next_insn != NULL); )
+	   insns_num != 1 && (prev_insn != NULL || next_insn != NULL); )
 	{
-	  if (prev_insn != NULL && first_insn == NULL)
+	  if (prev_insn != NULL)
 	{
-	  if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
-  INSN_UID (prev_insn)))
-		prev_insn = PREV_INSN (prev_insn);
-	  else
+	  if (bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
+INSN_UID (prev_insn)))
 		{
 		  first_insn = prev_insn;
-		  n--;
+		  insns_num--;
 		}
+		prev_insn = PREV_INSN (prev_insn);
 	}
 	  if (next_insn != NULL && second_insn == NULL)
 	{
@@ -1745,11 +1750,11 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 	  else
 		{
 		  second_insn = next_insn;
-		  n--;
+		  insns_num--;
 		}
 	}
 	}
-  if (n > 1)
+  if (insns_num > 1)
 	return false;
 }
   start = first_insn != NULL ? first_insn : start_insn;
diff --git a/gcc/testsuite/gcc.target/i386/pr104961.c b/gcc/testsuite/gcc.target/i386/pr104961.c
new file mode 100644
index 000..11ea95afe44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104961.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-Og" } */
+
+__int128 i;
+
+void bar (int);
+
+void
+foo (int a, char b, _Complex unsigned char c)
+{
+  __int128 j = i * i;
+  c -= 1;
+  bar (j);
+  bar (__imag__ c);
+}


Re: [PATCH] rtl-optimization/105028 - fix compile-time hog in form_threads_from_copies

2022-03-23 Thread Vladimir Makarov via Gcc-patches



On 2022-03-23 07:49, Richard Biener wrote:

form_threads_from_copies processes a sorted array of copies, skipping
those with the same thread and conflicting threads and merging the
first non-conflicting ones.  After that it terminates the loop and
gathers the remaining elements of the array, skipping same thread
copies, re-starting the process.  For a large number of copies this
gathering of the rest takes considerable time and it also appears
pointless.  The following simply continues processing the array
which should be equivalent as far as I can see.


It looks to me that the resulting code is equivalent to the 
original one.


As I remember, originally it was a more sophisticated but even slower 
algorithm, taking into account that merging 2 threads could remove 
several copies (not just one) from the array and choosing the best copy 
from this point of view.  It was transformed into this ineffective 
leftover code.



This takes form_threads_from_copies off the profile radar from
previously taking ~50% of the compile-time.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK if testing succeeds?

Yes.  Thank you for working on this, Richard.



[committed] [PR104971] LRA: check live hard regs to remove a dead insn

2022-03-25 Thread Vladimir Makarov via Gcc-patches

The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

The PR was already fixed by Jakub but his patch did not fix a latent LRA 
bug mentioned in the PR comments.  The current patch fixes the latent bug.


The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit 33904327c92bd914d4e0e076be12dc0a6b453c2d
Author: Vladimir N. Makarov 
Date:   Fri Mar 25 12:22:08 2022 -0400

[PR104971] LRA: check live hard regs to remove a dead insn

LRA removes an insn modifying sp for the given PR test set.  We should also
have checked live hard regs to prevent this.  The patch fixes this.

gcc/ChangeLog:

PR middle-end/104971
* lra-lives.cc (process_bb_lives): Check hard_regs_live for hard
regs to clear remove_p flag.

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index 796f00629b4..a755464ee81 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -724,7 +724,10 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	  bool remove_p = true;
 
 	  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN && sparseset_bit_p (pseudos_live, reg->regno))
+	if (reg->type != OP_IN
+		&& (reg->regno < FIRST_PSEUDO_REGISTER
+		? TEST_HARD_REG_BIT (hard_regs_live, reg->regno)
+		: sparseset_bit_p (pseudos_live, reg->regno)))
 	  {
 		remove_p = false;
 		break;


[committed] [PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

2022-03-30 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

The patch was successfully bootstrapped and tested on x86-64.
commit 25de4889c16fec80172a5e2d1825f3ff505d0cc4
Author: Vladimir N. Makarov 
Date:   Wed Mar 30 13:03:44 2022 -0400

[PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

When trying to split hard reg live range to assign hard reg to a reload
pseudo, LRA searches for reload insns of the reload pseudo
assuming a specific order of the reload insns.  This order is violated if
the reload is involved in an inheritance transformation.  In such a case,
the loop used for reload insn searching can become infinite.  The patch
fixes this.

gcc/ChangeLog:

PR middle-end/105032
* lra-assigns.cc (find_reload_regno_insns): Modify loop condition.

gcc/testsuite/ChangeLog:

PR middle-end/105032
* gcc.target/i386/pr105032.c: New.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index af30a673142..486e94f2006 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1730,7 +1730,8 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish)
 {
   for (prev_insn = PREV_INSN (start_insn),
 	 next_insn = NEXT_INSN (start_insn);
-	   insns_num != 1 && (prev_insn != NULL || next_insn != NULL); )
+	   insns_num != 1 && (prev_insn != NULL
+			  || (next_insn != NULL && second_insn == NULL)); )
 	{
 	  if (prev_insn != NULL)
 	{
diff --git a/gcc/testsuite/gcc.target/i386/pr105032.c b/gcc/testsuite/gcc.target/i386/pr105032.c
new file mode 100644
index 000..57b21d3cd7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105032.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+/* { dg-additional-options "-m32" { target x86_64-*-* } } */
+
+typedef unsigned int size_t;	
+__extension__ typedef long int __off_t;
+typedef __off_t off_t;
+static void *__sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
+			off_t offset)
+{
+  offset >>= 12;
+  return (void *)({ long _ret;
+  register long _num asm("eax") = (192);
+  register long _arg1 asm("ebx") = (long)(addr);
+  register long _arg2 asm("ecx") = (long)(length);
+  register long _arg3 asm("edx") = (long)(prot);
+  register long _arg4 asm("esi") = (long)(flags);
+  register long _arg5 asm("edi") = (long)(fd);
+  long _arg6 = (long)(offset);
+  asm volatile ("pushl	%[_arg6]\n\t"
+		"pushl	%%ebp\n\t"
+		"movl	4(%%esp), %%ebp\n\t"
+		"int	$0x80\n\t"
+		"popl	%%ebp\n\t"
+		"addl	$4,%%esp\n\t"
+		: "=a"(_ret)
+		: "r"(_num), "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4),"r"(_arg5), [_arg6]"m"(_arg6)
+		: "memory", "cc" );
+  _ret; });
+}
+
+int main(void)
+{
+  __sys_mmap(((void *)0), 0x1000, 0x1 | 0x2, 0x20 | 0x02, -1, 0);
+  return 0;
+}


Re: [committed] [PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

2022-03-30 Thread Vladimir Makarov via Gcc-patches



On 2022-03-30 15:18, Uros Bizjak wrote:

On Wed, Mar 30, 2022 at 7:15 PM Vladimir Makarov via Gcc-patches
 wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

The patch was successfully bootstrapped and tested on x86-64.

diff --git a/gcc/testsuite/gcc.target/i386/pr105032.c b/gcc/testsuite/gcc.target/i386/pr105032.c
new file mode 100644
index 000..57b21d3cd7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105032.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+/* { dg-additional-options "-m32" { target x86_64-*-* } } */

Please don't use -m32 in options, but instead conditionally compile
the testcase with


Sorry if this is a stupid question, but I am interested in the reasons for 
this.  Is it just to save computer cycles?


I think the test is important, therefore I'd like to run it on 
x86-64 too, because people rarely test the i686 target.



/* { dg-do compile { target ia32 } } */




[pushed] [PR103676] LRA: Calculate and exclude some start hard registers for reload pseudos

2022-01-21 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103676

The patch was successfully bootstrapped and tested on x86_64, aarch64, 
and ppc64.
commit 85419ac59724b7ce710ebb4acf03dbd747edeea3
Author: Vladimir N. Makarov 
Date:   Fri Jan 21 13:34:32 2022 -0500

[PR103676] LRA: Calculate and exclude some start hard registers for reload pseudos

LRA and the old reload pass use only one register class for reload pseudos, even if
operand constraints contain more than one register class.  Consider the
constraint 'lh' for Thumb ARM, which means low and high Thumb registers.
A reload pseudo for such a constraint will have the general reg class (the union of
the low and high reg classes).  Assigning the last low register to the reload
pseudo is wrong if the pseudo is of DImode, as it requires two hard regs.
But it is considered OK if we use the general reg class.  The following patch
solves this problem for LRA.

gcc/ChangeLog:

PR target/103676
* ira.h (struct target_ira): Add member
x_ira_exclude_class_mode_regs.
(ira_exclude_class_mode_regs): New macro.
* lra.h (lra_create_new_reg): Add arg exclude_start_hard_regs and
move from here ...
* lra-int.h: ... to here.
(lra_create_new_reg_with_unique_value): Add arg
exclude_start_hard_regs.
(class lra_reg): Add member exclude_start_hard_regs.
* lra-assigns.cc (find_hard_regno_for_1): Setup
impossible_start_hard_regs from exclude_start_hard_regs.
* lra-constraints.cc (get_reload_reg): Add arg exclude_start_hard_regs
and pass it to lra_create_new_reg[_with_unique_value].
(match_reload): Ditto.
(check_and_process_move): Pass NULL
exclude_start_hard_regs to lra_create_new_reg_with_unique_value.
(goal_alt_exclude_start_hard_regs): New static variable.
(process_addr_reg, simplify_operand_subreg): Pass NULL
exclude_start_hard_regs to lra_create_new_reg_with_unique_value
and get_reload_reg.
(process_alt_operands): Setup goal_alt_exclude_start_hard_regs.
Use this_alternative_exclude_start_hard_regs additionally to find
winning operand alternative.
(base_to_reg, base_plus_disp_to_reg, index_part_to_reg): Pass NULL
exclude_start_hard_regs to lra_create_new_reg.
(process_address_1, emit_inc): Ditto.
(curr_insn_transform): Pass exclude_start_hard_regs value to
lra_create_new_reg, get_reload_reg, match_reload.
(inherit_reload_reg, split_reg): Pass NULL exclude_start_hard_regs
to lra_create_new_reg.
(process_invariant_for_inheritance): Ditto.
* lra-remat.cc (update_scratch_ops): Ditto.
* lra.cc (lra_create_new_reg_with_unique_value): Add arg
exclude_start_hard_regs.  Setup the corresponding member of
lra reg info.
(lra_create_new_reg): Add arg exclude_start_hard_regs and pass it
to lra_create_new_reg_with_unique_value.
(initialize_lra_reg_info_element): Initialize member
exclude_start_hard_regs.
(get_scratch_reg): Pass NULL to lra_create_new_reg.
* ira.cc (setup_prohibited_class_mode_regs): Rename to
setup_prohibited_and_exclude_class_mode_regs and calculate
ira_exclude_class_mode_regs.

gcc/testsuite/ChangeLog:

PR target/103676
* g++.target/arm/pr103676.C: New.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index f294f035d74..e3b3c549120 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -1465,10 +1465,11 @@ setup_reg_class_nregs (void)
 
 
 
-/* Set up IRA_PROHIBITED_CLASS_MODE_REGS and IRA_CLASS_SINGLETON.
-   This function is called once IRA_CLASS_HARD_REGS has been initialized.  */
+/* Set up IRA_PROHIBITED_CLASS_MODE_REGS, IRA_EXCLUDE_CLASS_MODE_REGS, and
+   IRA_CLASS_SINGLETON.  This function is called once IRA_CLASS_HARD_REGS has
+   been initialized.  */
 static void
-setup_prohibited_class_mode_regs (void)
+setup_prohibited_and_exclude_class_mode_regs (void)
 {
   int j, k, hard_regno, cl, last_hard_regno, count;
 
@@ -1480,6 +1481,7 @@ setup_prohibited_class_mode_regs (void)
 	  count = 0;
 	  last_hard_regno = -1;
 	  CLEAR_HARD_REG_SET (ira_prohibited_class_mode_regs[cl][j]);
+	  CLEAR_HARD_REG_SET (ira_exclude_class_mode_regs[cl][j]);
 	  for (k = ira_class_hard_regs_num[cl] - 1; k >= 0; k--)
 	{
 	  hard_regno = ira_class_hard_regs[cl][k];
@@ -1492,6 +1494,10 @@ setup_prohibited_class_mode_regs (void)
 		  last_hard_regno = hard_regno;
 		  count++;
 		}
+	  else
+		{
+		  SET_HARD_REG_BIT (ira_exclude_class_mode_regs[cl][j], hard_regno);
+		}
 	}
 	  ira_class_singleton[cl][j] = (count == 1 ? last_hard_regno : -1);
 	}
@@ -1707,7 +1713,7 @@ ira_init (void)
   setup_alloc

[committed] [PR104400] LRA: Modify exclude start hard register calculation for insn alternative

2022-02-11 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104400

The patch was successfully tested and bootstrapped on x86-64 and aarch64.
commit 274a4d29421e73c9b40c1641986c6ed904e20184
Author: Vladimir N. Makarov 
Date:   Fri Feb 11 09:52:14 2022 -0500

[PR104400] LRA: Modify exclude start hard register calculation for insn alternative

The v850 target has an interesting insn alternative constraint 'e!r', where 'e'
denotes even general regs and is a subset of 'r'.  We cannot just take the
union of the exclude start hard registers for 'e' and 'r'; we should use only the
exclude start hard registers of 'r'.  The following patch implements this.

gcc/ChangeLog:

PR rtl-optimization/104400
* lra-constraints.cc (process_alt_operands): Don't make union of
this_alternative_exclude_start_hard_regs when reg class in insn
alternative covers other reg classes in the same alternative.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104400
* gcc.target/v850/pr104400.c: New.
* gcc.target/v850/v850.exp: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 9cee17479ba..fdff9e0720a 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2498,9 +2498,15 @@ process_alt_operands (int only_alternative)
 		  if (mode == BLKmode)
 		break;
 		  this_alternative = reg_class_subunion[this_alternative][cl];
+		  if (hard_reg_set_subset_p (this_alternative_set,
+	 reg_class_contents[cl]))
+		this_alternative_exclude_start_hard_regs
+		  = ira_exclude_class_mode_regs[cl][mode];
+		  else if (!hard_reg_set_subset_p (reg_class_contents[cl],
+		   this_alternative_set))
+		this_alternative_exclude_start_hard_regs
+		  |= ira_exclude_class_mode_regs[cl][mode];
 		  this_alternative_set |= reg_class_contents[cl];
-		  this_alternative_exclude_start_hard_regs
-		|= ira_exclude_class_mode_regs[cl][mode];
 		  if (costly_p)
 		{
 		  this_costly_alternative
diff --git a/gcc/testsuite/gcc.target/v850/pr104400.c b/gcc/testsuite/gcc.target/v850/pr104400.c
new file mode 100644
index 000..5d78a77345c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/v850/pr104400.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mv850e3v5" } */
+
+double frob (double r)
+{
+r = -r;
+return r;
+}
diff --git a/gcc/testsuite/gcc.target/v850/v850.exp b/gcc/testsuite/gcc.target/v850/v850.exp
new file mode 100644
index 000..4e8c745a0b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/v850/v850.exp
@@ -0,0 +1,41 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an v850 target.
+if ![istarget v850*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
+	"" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish


Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Vladimir Makarov via Gcc-patches



On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:

Iain Sandoe via Gcc-patches  writes:

Two issues resulted in this PR, which manifests when we force a constant into
memory in LRA (in PIC code on Darwin).  The presence of such forced constants
is quite dependent on other RTL optimisations, and it is easy for the issue to
become latent for a specific case.

First, in the Darwin-specific rs6000 backend code, we were not being careful
enough in rejecting invalid symbolic addresses.  Specifically, when generating
PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.

Second, LRA was attempting to load a register using an invalid lo_sum address.

The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
Darwin-specific (although, of course, any observations are welcome).

Tested on several lo_sum targets and x86_64 all languages except as noted:
powerpc64-linux (m32/m64) -D
powerpc64le-linux  -D
powerpc64-aix -Ada -Go -D
aarch64-linux -Ada -D
x86_64-linux all langs -D
powerpc-darwin9 (master and 11.2) -D -Go.

pushed to master, thanks,
Iain

Signed-off-by: Iain Sandoe 
Co-authored-by: Vladimir Makarov 

PR target/104117

gcc/ChangeLog:

* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
emitting PIC code.
(legitimate_lo_sum_address_p): Likewise.
* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
load from an invalid lo_sum address.
---
  gcc/config/rs6000/rs6000.cc | 38 +++--
  gcc/lra-constraints.cc  | 17 ++---
  2 files changed, 38 insertions(+), 17 deletions(-)

[…]
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index fdff9e0720a..c700c3f4578 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
  if (!valid_address_p (op, &ad, cn))
{
- /* Try to put lo_sum into register.  */
- insn = emit_insn (gen_rtx_SET
-   (new_reg,
-gen_rtx_LO_SUM (Pmode, new_reg, 
addr)));
- code = recog_memoized (insn);
- if (code >= 0)
-   {
- *ad.inner = new_reg;
- if (!valid_address_p (op, &ad, cn))
-   {
- *ad.inner = addr;
- code = -1;
-   }
-   }
-
+ *ad.inner = addr; /* Punt.  */
+ code = -1;
}
}
  if (code < 0)

Could you go into more details about this?  Why is it OK to continue
to try:

   (lo_sum new_reg addr)

directly as an address (the context at the top of the hunk), but not try
moving the lo_sum into a register?  They should be semantically equivalent,
so it seems that if one is wrong, the other would be too.


Hi, Richard.  The LRA change is mine and I approved it for Iain's patch.

I think there is no need for this code and it is misleading.  If 
'mem[lo_sum]' does not work, I don't think that 'reg=lo_sum; mem[reg]' 
will help for any existing target.  As the machine-dependent code for any 
target most probably checks the address only in memory (for ppc64 Darwin it 
is exactly the case), it can wrongly accept an invalid address by reloading 
it into a reg and using it in memory.  So these are my arguments for 
removing this code from process_address_1.





Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Vladimir Makarov via Gcc-patches



On 2022-02-14 11:00, Richard Sandiford wrote:

Hi Vlad,

Vladimir Makarov via Gcc-patches  writes:


Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.

I think there is no need for this code and it is misleading.  If
'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
will help for any existing target.  As machine-dependent code for any
target most probably (for ppc64 darwin it is exactly the case) checks
address only in memory, it can wrongly accept wrong address by reloading
it into reg and use it in memory. So these are my arguments for the
remove this code from process_address_1.

I'm probably making too much of this, but:

I think the code is potentially useful in that existing targets do forbid
lo_sum addresses in certain contexts (due to limited offset range)
while still wanting lo_sum to be used to load the address.  If we
handle the high/lo_sum split in generic code then we have more chance
of being able to optimise things.  So it feels like this is setting an
unfortunate precedent.

I still don't understand what went wrong before though (the PR trail
was a bit too long to process :-)).  Is there a case where
(lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.

Sometimes it is hard to draw the line between an RA bug being a bug in the 
machine-dependent code or in RA itself.


For this case I would say it is a bug in both parts.

The lo_sum is generated by LRA, which does not know that it should be 
wrapped in an unspec for Darwin.  Generally speaking, we could avoid the 
change in LRA, but it would require non-trivial analysis in the machine-
dependent code to find the cases where 'reg=lo_sum; ... mem[reg]' is 
incorrect code for the Darwin (PIC) target (and maybe some other PIC 
targets too).  Therefore I believe the change in LRA is a good solution, 
even if it can potentially result in less optimized code for some cases.  
Taking your concern into account, we could probably improve the patch by 
introducing a hook (I never liked such solutions, as we already have too 
many hooks directing RA) or, better, make the LRA change apply only to 
PIC targets.  Something like this (it probably needs better recognition 
of PIC targets):


--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
 	  if (HAVE_lo_sum)
 	    {
 	      /* addr => lo_sum (new_base, addr), case (2) above.  */
 	      insn = emit_insn (gen_rtx_SET
 				(new_reg,
 				 gen_rtx_HIGH (Pmode, copy_rtx (addr))));
 	      code = recog_memoized (insn);
 	      if (code >= 0)
 		{
 		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
-		  if (!valid_address_p (op, &ad, cn))
+		  if (!valid_address_p (op, &ad, cn) && !flag_pic)
 		    {
 		      /* Try to put lo_sum into register.  */
 		      insn = emit_insn (gen_rtx_SET
 					(new_reg,
 					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
 		      code = recog_memoized (insn);
 		      if (code >= 0)
 			{
 			  *ad.inner = new_reg;
 			  if (!valid_address_p (op, &ad, cn))



[committed] [PR104447] LRA: Do not split non-alloc hard regs

2022-02-17 Thread Vladimir Makarov via Gcc-patches

The patch solves the following PR:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104447

The patch was successfully bootstrapped and tested on x86-64.

commit db69f666a728ce800a840115829f6b64bc3174d2
Author: Vladimir N. Makarov 
Date:   Thu Feb 17 11:31:50 2022 -0500

[PR104447] LRA: Do not split non-alloc hard regs.

LRA tried to split a non-allocated hard reg for reload pseudos again and
again until the number of assignment passes reached the limit.  The patch fixes
this.

gcc/ChangeLog:

PR rtl-optimization/104447
* lra-constraints.cc (spill_hard_reg_in_range): Initialize the ignore
hard reg set from lra_no_alloc_regs.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104447
* gcc.target/i386/pr104447.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c700c3f4578..b2c4590153c 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6008,7 +6008,7 @@ spill_hard_reg_in_range (int regno, enum reg_class rclass, rtx_insn *from, rtx_i
   HARD_REG_SET ignore;
   
   lra_assert (from != NULL && to != NULL);
-  CLEAR_HARD_REG_SET (ignore);
+  ignore = lra_no_alloc_regs;
   EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
 {
   lra_insn_recog_data_t id = lra_insn_recog_data[uid];
diff --git a/gcc/testsuite/gcc.target/i386/pr104447.c b/gcc/testsuite/gcc.target/i386/pr104447.c
new file mode 100644
index 000..bf11e8696e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104447.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -pg" } */
+
+int
+bar (int x)
+{
+  asm goto ("" : : "r" (x), "r" (x + 1), "r" (x + 2), "r" (x + 3), /* { dg-error "operand has impossible constraints" } */
+	"r" (x + 4), "r" (x + 5), "r" (x + 6), "r" (x + 7),
+	"r" (x + 8), "r" (x + 9), "r" (x + 10), "r" (x + 11),
+	"r" (x + 12), "r" (x + 13), "r" (x + 14), "r" (x + 15),
+	"r" (x + 16) : : lab);
+ lab:
+  return 0;
+}


Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-22 Thread Vladimir Makarov via Gcc-patches



On 2022-02-20 12:34, Iain Sandoe wrote:


^^^ this is mostly for my education - the stuff below is a potential solution 
to leaving lra-constraints unchanged and fixing the Darwin bug….

I'd be really glad if you do manage to fix this without changing LRA. 
Richard has a legitimate point that my proposed change in LRA, 
prohibiting `...; reg=lo_sum; ... mem[reg]`, might force LRA to generate 
less optimized code, or even make LRA generate unrecognized 
insns `reg = original addr` for some ports, requiring further fixes in the 
machine-dependent code of those ports.




Re: [PATCH] rtl-optimization/104686 - speed up conflict iteration

2022-02-25 Thread Vladimir Makarov via Gcc-patches



On 2022-02-25 09:14, Richard Biener wrote:

The following replaces

/* Skip bits that are zero.  */
for (; (word & 1) == 0; word >>= 1)
  bit_num++;

idioms in ira-int.h in an attempt to speed up update_conflict_hard_regno_costs,
which we are bottlenecked on in PR104686.  The trick is to use ctz_hwi here,
which should pay off even with dense bitmaps on architectures that
have HW support for it.

For the PR in question this speeds up compile-time from 31s to 24s for
me.

It is a really significant improvement.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Yes.  Thank you for working on this PR, Richard.

2022-02-25  Richard Biener  

PR rtl-optimization/104686
* ira-int.h (minmax_set_iter_cond): Use ctz_hwi to elide loop
skipping bits that are zero.
(ira_object_conflict_iter_cond): Likewise.




[PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437

The patch was successfully bootstrapped and tested on x86-64.  There is 
no test, as the bug occurs only on a GCC built with sanitizing while 
compiling an existing Go test.
commit c6cf5ac1522c54b2ced98fc687e973a9ff17ba1e
Author: Vladimir N. Makarov 
Date:   Thu Dec 2 08:29:45 2021 -0500

[PR103437] Process multiplication overflow in priority calculation for allocno assignments

We process overflows in cost calculations, but for huge functions the
priority calculation can overflow, as the priority can be bigger than the
cost used for it.  The patch fixes the problem.

gcc/ChangeLog:

PR rtl-optimization/103437
* ira-color.c (setup_allocno_priorities): Process multiplication
overflow.

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 3d01c60800c..1f80cbea0e2 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2796,7 +2796,7 @@ static int *allocno_priorities;
 static void
 setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
 {
-  int i, length, nrefs, priority, max_priority, mult;
+  int i, length, nrefs, priority, max_priority, mult, diff;
   ira_allocno_t a;
 
   max_priority = 0;
@@ -2807,11 +2807,14 @@ setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
   ira_assert (nrefs >= 0);
   mult = floor_log2 (ALLOCNO_NREFS (a)) + 1;
   ira_assert (mult >= 0);
-  allocno_priorities[ALLOCNO_NUM (a)]
-	= priority
-	= (mult
-	   * (ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a))
-	   * ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
+  mult *= ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
+  diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
+  /* Multiplication can overflow for very large functions.
+	 Check the overflow and constrain the result if necessary: */
+  if (__builtin_smul_overflow (mult, diff, &priority)
+	  || priority <= -INT_MAX)
+	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+  allocno_priorities[ALLOCNO_NUM (a)] = priority;
   if (priority < 0)
 	priority = -priority;
   if (max_priority < priority)


Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 09:00, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 08:53:31AM -0500, Vladimir Makarov via Gcc-patches 
wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437

The patch was successfully bootstrapped and tested on x86-64. There is no
test as the bug occurs on GCC built with sanitizing for an existing go test.

I'm afraid we can't use __builtin_smul_overflow, not all system compilers
will have that.
But, as it is done in int and we kind of rely on int being 32-bit on host
and rely on long long being 64-bit, I think you can do something like:
   long long priorityll = (long long) mult * diff;
   priority = priorityll;
   if (priorityll != priority
...


My first version of the patch was based on long long, but the standard does 
not guarantee that int is smaller than long long, although it 
is true for all targets supported by GCC.


Another solution would be switching to int32_t instead of int for 
costs, but that would require a lot of changes in the RA code.


I see your point about system compilers other than GCC and LLVM.  
I guess I could change it to

#if __GNUC__ >= 5

current code

#else

long long code

#endif


What do you think?




Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 09:29, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 09:23:20AM -0500, Vladimir Makarov wrote:

On 2021-12-02 09:00, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 08:53:31AM -0500, Vladimir Makarov via Gcc-patches 
wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437

The patch was successfully bootstrapped and tested on x86-64. There is no
test as the bug occurs on GCC built with sanitizing for an existing go test.

I'm afraid we can't use __builtin_smul_overflow, not all system compilers
will have that.
But, as it is done in int and we kind of rely on int being 32-bit on host
and rely on long long being 64-bit, I think you can do something like:
long long priorityll = (long long) mult * diff;
priority = priorityll;
if (priorityll != priority
...



My 1st version of the patch was based on long long but the standard does not
guarantee that int size is smaller than long long size.  Although it is true
for all targets supported by GCC.

Another solution would be to switching to int32_t instead of int for costs
but it will require a lot of changes in RA code.

I see your point for usage system compiler different from GCC and LLVM.  I
guess I could change it to

#if __GNUC__ >= 5

#ifdef __has_builtin
# if __has_builtin(__builtin_smul_overflow)
would be the best check.
And you can just gcc_assert (sizeof (long long) >= 2 * sizeof (int));
in the fallback code ;)


I used static_assert in my first patch version.  I think it is better than 
gcc_assert.


I'll commit the patch fix today.  Thank you for your feedback, Jakub.



Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches


On 2021-12-02 10:52, Christophe Lyon wrote:



On Thu, Dec 2, 2021 at 3:38 PM Vladimir Makarov via Gcc-patches 
 wrote:



On 2021-12-02 09:29, Jakub Jelinek wrote:
> On Thu, Dec 02, 2021 at 09:23:20AM -0500, Vladimir Makarov wrote:
>> On 2021-12-02 09:00, Jakub Jelinek wrote:
>>> On Thu, Dec 02, 2021 at 08:53:31AM -0500, Vladimir Makarov via Gcc-patches wrote:
>>>> The following patch fixes
>>>>
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437
>>>>
>>>> The patch was successfully bootstrapped and tested on x86-64. There is no
>>>> test as the bug occurs on GCC built with sanitizing for an existing go test.
>>> I'm afraid we can't use __builtin_smul_overflow, not all system compilers
>>> will have that.
>>> But, as it is done in int and we kind of rely on int being 32-bit on host
>>> and rely on long long being 64-bit, I think you can do something like:
>>>         long long priorityll = (long long) mult * diff;
>>>         priority = priorityll;
>>>         if (priorityll != priority
>>> ...
>>>
>> My 1st version of the patch was based on long long but the standard does not
>> guarantee that int size is smaller than long long size.  Although it is true
>> for all targets supported by GCC.
>>
>> Another solution would be to switching to int32_t instead of int for costs
>> but it will require a lot of changes in RA code.
>>
>> I see your point for usage system compiler different from GCC and LLVM.  I
>> guess I could change it to
>>
>> #if __GNUC__ >= 5
> #ifdef __has_builtin
> # if __has_builtin(__builtin_smul_overflow)
> would be the best check.
> And you can just gcc_assert (sizeof (long long) >= 2 * sizeof (int));
> in the fallback code ;)

I used static_assert in my 1st patch version.  I think it is better than
gcc_assert.

I'll commit the patch fix today.  Thank you for your feedback, Jakub.


Thanks, I confirm I am seeing build failures with gcc-4.8.5 ;-)

I've committed the following patch with the backup code.  Sorry for 
the inconvenience.


commit 0eb22e619c294efb0f007178a230cac413dccb87
Author: Vladimir N. Makarov 
Date:   Thu Dec 2 10:55:59 2021 -0500

[PR103437] Use long long multiplication as backup for overflow processing

__builtin_smul_overflow can be unavailable for some C++ compilers.
Add long long multiplication as backup for overflow processing.

gcc/ChangeLog:
PR rtl-optimization/103437
* ira-color.c (setup_allocno_priorities): Use long long
multiplication as backup for overflow processing.

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 1f80cbea0e2..3b19a58e1f0 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,6 +2797,7 @@ static void
 setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
 {
   int i, length, nrefs, priority, max_priority, mult, diff;
+  bool overflow_backup_p = true;
   ira_allocno_t a;
 
   max_priority = 0;
@@ -2811,9 +2812,25 @@ setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
   diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
   /* Multiplication can overflow for very large functions.
 	 Check the overflow and constrain the result if necessary: */
+#ifdef __has_builtin
+#if __has_builtin(__builtin_smul_overflow)
+  overflow_backup_p = false;
   if (__builtin_smul_overflow (mult, diff, &priority)
 	  || priority <= -INT_MAX)
 	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#endif
+#endif
+  if (overflow_backup_p)
+	{
+	  static_assert
+	(sizeof (long long) >= 2 * sizeof (int),
+	 "overflow code does not work for such int and long long sizes");
+	  long long priorityll = (long long) mult * diff;
+	  if (priorityll < -INT_MAX || priorityll > INT_MAX)
+	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+	  else
+	priority = priorityll;
+	}
   allocno_priorities[ALLOCNO_NUM (a)] = priority;
   if (priority < 0)
 	priority = -priority;


Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 11:13, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 11:03:46AM -0500, Vladimir Makarov wrote:

--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,6 +2797,7 @@ static void
  setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
  {
int i, length, nrefs, priority, max_priority, mult, diff;
+  bool overflow_backup_p = true;
ira_allocno_t a;
  
max_priority = 0;

@@ -2811,9 +2812,25 @@ setup_allocno_priorities (ira_allocno_t 
*consideration_allocnos, int n)
diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
/* Multiplication can overflow for very large functions.
 Check the overflow and constrain the result if necessary: */
+#ifdef __has_builtin
+#if __has_builtin(__builtin_smul_overflow)
+  overflow_backup_p = false;
if (__builtin_smul_overflow (mult, diff, &priority)
  || priority <= -INT_MAX)
priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#endif
+#endif
+  if (overflow_backup_p)
+   {
+ static_assert
+   (sizeof (long long) >= 2 * sizeof (int),
+"overflow code does not work for such int and long long sizes");
+ long long priorityll = (long long) mult * diff;
+ if (priorityll < -INT_MAX || priorityll > INT_MAX)
+   priority = diff >= 0 ? INT_MAX : -INT_MAX;
+ else
+   priority = priorityll;
+   }

So simple problem and so many details :)

This will require that long long is at least twice as large as int
everywhere, I thought you wanted to do that only when
__builtin_smul_overflow isn't available.


That is not critical, as GCC and probably all other C++ compilers support 
only targets where this assertion holds.  I guess it is better to find this 
problem earlier on targets (if any) where it is not true, *independently* 
of the compiler used.


So it is difficult for me to know what is better.  Probably for a 
GCC/Clang-oriented world your variant is better, as it permits 
compiling the code with GCC even on targets where the assertion is false.



That would be
#ifdef __has_builtin
#if __has_builtin(__builtin_smul_overflow)
#define HAS_SMUL_OVERFLOW
#endif
#endif
#ifdef HAS_SMUL_OVERFLOW
   if (__builtin_smul_overflow (mult, diff, &priority)
  || priority <= -INT_MAX)
priority = diff >= 0 ? INT_MAX : -INT_MAX;
#else
   static_assert (sizeof (long long) >= 2 * sizeof (int),
 "overflow code does not work for int wider"
 "than half of long long");
   long long priorityll = (long long) mult * diff;
   if (priorityll < -INT_MAX || priorityll > INT_MAX)
priority = diff >= 0 ? INT_MAX : -INT_MAX;
   else
priority = priorityll;
#endif
Why priority <= -INT_MAX in the first case though,
shouldn't that be < -INT_MAX ?


My thought was to avoid an 'always false' warning for targets with a binary 
representation other than two's complement.  As I remember, C++17 started to 
require two's complement integers.  If we require C++17 or 
later, then it is probably better to fix it.


In any case, I feel these details are not my area of expertise.  If you 
believe I should make these changes, please confirm and I'll definitely 
do them.  Thank you, Jakub.








Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 12:06, Vladimir Makarov wrote:


On 2021-12-02 11:13, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 11:03:46AM -0500, Vladimir Makarov wrote:

--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,6 +2797,7 @@ static void
 setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
 {
   int i, length, nrefs, priority, max_priority, mult, diff;
+  bool overflow_backup_p = true;
   ira_allocno_t a;
 
   max_priority = 0;
@@ -2811,9 +2812,25 @@ setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
       diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
       /* Multiplication can overflow for very large functions.
 	 Check the overflow and constrain the result if necessary: */
+#ifdef __has_builtin
+#if __has_builtin(__builtin_smul_overflow)
+      overflow_backup_p = false;
       if (__builtin_smul_overflow (mult, diff, &priority)
 	  || priority <= -INT_MAX)
 	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#endif
+#endif
+      if (overflow_backup_p)
+	{
+	  static_assert
+	    (sizeof (long long) >= 2 * sizeof (int),
+	     "overflow code does not work for such int and long long sizes");
+	  long long priorityll = (long long) mult * diff;
+	  if (priorityll < -INT_MAX || priorityll > INT_MAX)
+	    priority = diff >= 0 ? INT_MAX : -INT_MAX;
+	  else
+	    priority = priorityll;
+	}

So simple problem and so many details :)

This will require that long long is at least twice as large as int
everywhere, I thought you wanted to do that only when
__builtin_smul_overflow isn't available.


That is not critical, as GCC and probably all other C++ compilers 
support only targets where this assertion holds.  I guess it is better to 
find this problem earlier on targets (if any) where it is not true, 
*independently* of the compiler used.


So it is difficult for me to know what is better.  Probably for a 
GCC/Clang oriented world, your variant is better, as it permits compiling 
the code with GCC even on targets where the assertion is false.



After some more consideration, I think you are right and the backup 
code should be conditional.  Otherwise there is no point in using 
__builtin_smul_overflow at all.  I'll do the changes.





Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches


On 2021-12-02 12:21, Vladimir Makarov via Gcc-patches wrote:


On 2021-12-02 12:06, Vladimir Makarov wrote:



So simple problem and so many details :)

This will require that long long is at least twice as large as int
everywhere, I thought you wanted to do that only when
__builtin_smul_overflow isn't available.


That is not critical, as GCC and probably all other C++ compilers 
support only targets where this assertion holds.  I guess it is better to 
find this problem earlier on targets (if any) where it is not true, 
*independently* of the compiler used.


So it is difficult for me to know what is better.  Probably for a 
GCC/Clang oriented world, your variant is better, as it permits compiling 
the code with GCC even on targets where the assertion is false.



After some more consideration, I think you are right and the backup 
code should be conditional.  Otherwise there is no point in using 
__builtin_smul_overflow at all.  I'll do the changes.



Here is one more patch I've committed.  Jakub, thank you for the 
discussion and your patience.


commit a72b8f376a176c620f1c1c684f2eee2016e6b4c3
Author: Vladimir N. Makarov 
Date:   Thu Dec 2 12:31:28 2021 -0500

[PR103437] Make backup code for overflow conditional

Switch off the long long variant of the overflow code via the
preprocessor if the build compiler has __builtin_smul_overflow.

gcc/ChangeLog:
PR rtl-optimization/103437
* ira-color.c (setup_allocno_priorities): Switch off backup code
for overflow if compiler has __builtin_smul_overflow.  Use <
for comparison with -INT_MAX.

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 3b19a58e1f0..a1b02776e77 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,7 +2797,6 @@ static void
 setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
 {
   int i, length, nrefs, priority, max_priority, mult, diff;
-  bool overflow_backup_p = true;
   ira_allocno_t a;
 
   max_priority = 0;
@@ -2810,27 +2809,27 @@ setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
   ira_assert (mult >= 0);
   mult *= ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
   diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
-  /* Multiplication can overflow for very large functions.
-	 Check the overflow and constrain the result if necessary: */
 #ifdef __has_builtin
 #if __has_builtin(__builtin_smul_overflow)
-  overflow_backup_p = false;
+#define HAS_SMUL_OVERFLOW
+#endif
+#endif
+  /* Multiplication can overflow for very large functions.
+	 Check the overflow and constrain the result if necessary: */
+#ifdef HAS_SMUL_OVERFLOW
   if (__builtin_smul_overflow (mult, diff, &priority)
-	  || priority <= -INT_MAX)
+	  || priority < -INT_MAX)
 	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#else
+  static_assert
+	(sizeof (long long) >= 2 * sizeof (int),
+	 "overflow code does not work for such int and long long sizes");
+  long long priorityll = (long long) mult * diff;
+  if (priorityll < -INT_MAX || priorityll > INT_MAX)
+	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+  else
+	priority = priorityll;
 #endif
-#endif
-  if (overflow_backup_p)
-	{
-	  static_assert
-	(sizeof (long long) >= 2 * sizeof (int),
-	 "overflow code does not work for such int and long long sizes");
-	  long long priorityll = (long long) mult * diff;
-	  if (priorityll < -INT_MAX || priorityll > INT_MAX)
-	priority = diff >= 0 ? INT_MAX : -INT_MAX;
-	  else
-	priority = priorityll;
-	}
   allocno_priorities[ALLOCNO_NUM (a)] = priority;
   if (priority < 0)
 	priority = -priority;


Re: [PATCH] IRA: Make sure array is big enough

2022-10-26 Thread Vladimir Makarov via Gcc-patches



On 2022-10-25 06:01, Torbjörn SVENSSON wrote:

In commit 081c96621da, the call to resize_reg_info() was moved before
the call to remove_scratches(), and the latter can increase the number
of registers, which would cause an out-of-bounds access on the
reg_renumber global array.

Without this patch, the following testcase randomly fails with:
during RTL pass: ira
In file included from 
/src/gcc/testsuite/gcc.dg/compat//struct-by-value-5b_y.c:13:
/src/gcc/testsuite/gcc.dg/compat//struct-by-value-5b_y.c: In function 
'checkgSf13':
/src/gcc/testsuite/gcc.dg/compat//fp-struct-test-by-value-y.h:28:1: internal 
compiler error: Segmentation fault
/src/gcc/testsuite/gcc.dg/compat//struct-by-value-5b_y.c:22:1: note: in 
expansion of macro 'TEST'

gcc/ChangeLog:

* ira.c: Resize array after reg number increased.


The patch is OK to commit to the gcc-11 and gcc-12 branches and master.

Thank you for fixing this.
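The pattern behind the fix is generic: a table sized from max_reg_num() goes stale as soon as a later step (here remove_scratches()) creates new pseudos, so it must be resized before anyone indexes it with a new register number.  A standalone sketch with hypothetical names, not the actual GCC data structures:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-register info table, mirroring the shape of the bug:
   it is sized for the register count at allocation time, and any pass
   that creates new registers must resize it before indexing.  */
static int *reg_info;
static int reg_info_size;

static void resize_reg_info (int max_regno)
{
  if (max_regno <= reg_info_size)
    return;
  reg_info = realloc (reg_info, max_regno * sizeof *reg_info);
  /* Zero-initialize only the newly added slots.  */
  memset (reg_info + reg_info_size, 0,
	  (max_regno - reg_info_size) * sizeof *reg_info);
  reg_info_size = max_regno;
}
```

In the patch, the equivalent step is the resize_reg_info() call placed after remove_scratches() has had a chance to grow the register count.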


Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
  gcc/ira.cc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 42c9cead9f8..d28a67b2546 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5718,6 +5718,7 @@ ira (FILE *f)
  regstat_free_ri ();
  regstat_init_n_sets_and_refs ();
  regstat_compute_ri ();
+resize_reg_info ();
};
  
int max_regno_before_rm = max_reg_num ();




Re: [PATCH] [PR100106] Reject unaligned subregs when strict alignment is required

2022-05-06 Thread Vladimir Makarov via Gcc-patches



On 2022-05-05 02:52, Alexandre Oliva wrote:


Regstrapped on x86_64-linux-gnu and ppc64le-linux-gnu, also tested
targeting ppc- and ppc64-vx7r2.  Ok to install?

I am OK with the modified version of the patch.  It looks reasonable to 
me and I support committing it.


But I think I cannot formally approve the patch, as emit-rtl.cc is outside 
my jurisdiction and validate_subreg is used in many places besides the RA.


Sorry, Alex, some global reviewer should do this.


for  gcc/ChangeLog

PR target/100106
* emit-rtl.c (validate_subreg): Reject a SUBREG of a MEM that
requires stricter alignment than MEM's.

for  gcc/testsuite/ChangeLog

PR target/100106
* gcc.target/powerpc/pr100106-sa.c: New.
---
  gcc/emit-rtl.cc|3 +++
  gcc/testsuite/gcc.target/powerpc/pr100106-sa.c |4 
  2 files changed, 7 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100106-sa.c

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 1e02ae254d012..642e47eada0d7 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -982,6 +982,9 @@ validate_subreg (machine_mode omode, machine_mode imode,
  
return subreg_offset_representable_p (regno, imode, offset, omode);

  }
+  else if (reg && MEM_P (reg)
+  && STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
+return false;
  
/* The outer size must be ordered wrt the register size, otherwise

   we wouldn't know at compile time how many registers the outer
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c 
b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
new file mode 100644
index 0..6cc29595c8b25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ilp32 } } } */
+/* { dg-options "-mcpu=604 -O -mstrict-align" } */
+
+#include "../../gcc.c-torture/compile/pr100106.c"






Re: [patch] Fixing ppc64 test failure after patch dealing with scratches in IRA

2020-11-01 Thread Vladimir Makarov via Gcc-patches



On 2020-10-30 7:36 p.m., Segher Boessenkool wrote:

Thanks for the patch!  But it has a problem:


diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 67e4f2fd037..78de85ccbbb 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3717,7 +3717,7 @@
(vec_select:
 (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" ",v")
 (parallel [(match_operand:QI 2 "const_int_operand" "n,n")])))
-   (clobber (match_scratch: 3 "=,&r"))
+   (clobber (match_scratch: 3 "=*,&*r"))
 (clobber (match_scratch:SI 4 "=X,&r"))]
"VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
"#"

You add * to both alternatives here?  I would expect adding it to only
the second alternative, does it work better with both?

No, it works the same.  When both alternatives use the hint, the 
scratch pseudo gets the class ALL_REGS.  When only the 2nd uses the hint, 
the class is VSX_REGS.  As I understand it now, the preferable alternative 
is the 1st one (with  for the scratch).  In that case, using the 
hint only for the 2nd alternative makes more sense.

That also avoids a different problem: * won't work as expected.
'*' in IRA skips one constraint character, but  can be "wa", a
two-letter constraint (and we do have an "a" constraint as well,
something wholly different: "wa" means a VSX register, while "a" is an
indexed address).

 case '*':
   /* Ignore the next letter for this pass.  */
   c = *++p;
   break;


I see.  Thanks for pointing this out.  It is definitely better to use the 
hint only for the second alternative ("&*r") then.  Is this solution OK 
with you?





Re: [patch] Fixing ppc64 test failure after patch dealing with scratches in IRA

2020-11-02 Thread Vladimir Makarov via Gcc-patches



On 2020-11-02 6:43 a.m., Segher Boessenkool wrote:

Hi!

On Sun, Nov 01, 2020 at 06:32:02PM -0500, Vladimir Makarov wrote:

On 2020-10-30 7:36 p.m., Segher Boessenkool wrote:

Thanks for the patch!  But it has a problem:


diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 67e4f2fd037..78de85ccbbb 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3717,7 +3717,7 @@
(vec_select:
 (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" ",v")
 (parallel [(match_operand:QI 2 "const_int_operand" "n,n")])))
-   (clobber (match_scratch: 3 "=,&r"))
+   (clobber (match_scratch: 3 "=*,&*r"))
 (clobber (match_scratch:SI 4 "=X,&r"))]
"VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
"#"

You add * to both alternatives here?  I would expect adding it to only
the second alternative, does it work better with both?

No, it works the same.  When both alternatives use the hint, the
scratch pseudo gets the class ALL_REGS.  When only the 2nd uses the hint,
the class is VSX_REGS.  As I understand it now, the preferable alternative
is the 1st one (with  for the scratch).  In that case, using the
hint only for the 2nd alternative makes more sense.

That also avoids a different problem: * won't work as expected.
'*' in IRA skips one constraint character, but  can be "wa", a
two-letter constraint (and we do have an "a" constraint as well,
something wholly different: "wa" means a VSX register, while "a" is an
indexed address).

 case '*':
   /* Ignore the next letter for this pass.  */
   c = *++p;
   break;



I see.  Thanks for pointing this out.  It is definitely better to use the
hint only for the second alternative ("&*r") then.  Is this solution OK
with you?

Yes, certainly.  And thanks!


I've just committed the following patch

commit 1c689b827c6a0a5e164f22865696a94e6d7ec308 (HEAD -> master, origin/master, origin/HEAD)
Author: Vladimir N. Makarov 
Date:   Mon Nov 2 11:03:54 2020 -0500

    Add hint * to 2nd alternative of the 1st scratch in *vsx_extract__store_p9.

    gcc/ChangeLog:

    * config/rs6000/vsx.md (*vsx_extract__store_p9): Add hint *
    to 2nd alternative of the 1st scratch.

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 67e4f2fd037..947631d83ee 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3717,7 +3717,7 @@
    (vec_select:
 (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" ",v")
 (parallel [(match_operand:QI 2 "const_int_operand" "n,n")])))
-   (clobber (match_scratch: 3 "=,&r"))
+   (clobber (match_scratch: 3 "=,&*r"))
    (clobber (match_scratch:SI 4 "=X,&r"))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
   "#"



(Should "*" be changed so that it skips a whole constraint if it can,
instead of only a single char always?)


Yes, it should be changed.  I believe this is a leftover from the time 
when all constraints were just one character.  All hints (modifiers) 
should work at the constraint level, not the character level.
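The change being asked for can be sketched independently of GCC: after a '*' modifier, advance by the full length of the constraint that follows rather than by a single character.  The length lookup below is a stand-in for GCC's CONSTRAINT_LEN, with invented lengths ("wa" pretend-registered as a two-letter constraint):

```c
#include <assert.h>
#include <string.h>

/* Stand-in for CONSTRAINT_LEN: pretend constraints starting with 'w'
   are two letters long ("wa"), everything else one letter.  */
static int constraint_len (const char *p)
{
  return p[0] == 'w' ? 2 : 1;
}

/* Skip a '*' modifier together with the whole constraint it ignores,
   instead of skipping just one character as the old code did.  */
static const char *skip_ignored_constraint (const char *p)
{
  if (*p == '*')
    p += 1 + constraint_len (p + 1);
  return p;
}
```

With the old one-character behavior, "*wa" would leave the parser pointing into the middle of "wa"; with the sketch above it lands cleanly on the next alternative.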






Re: [PATCH][PR target/97540] Don't extract memory from operand for normal memory constraint.

2020-11-02 Thread Vladimir Makarov via Gcc-patches



On 2020-10-27 2:53 a.m., Hongtao Liu wrote:

Hi:
   For inline asm, there could be an operand like (not (mem:)); it's
not a valid operand for a normal memory constraint.
   Bootstrap is ok, regression test is ok for make check
RUNTESTFLAGS="--target_board='unix{-m32,}'"

gcc/ChangeLog
 PR target/97540
 * ira.c: (ira_setup_alts): Extract memory from operand only
 for special memory constraint.
 * recog.c (asm_operand_ok): Ditto.
 * lra-constraints.c (process_alt_operands): MEM_P is
 required for normal memory constraint.

gcc/testsuite/ChangeLog
 * gcc.target/i386/pr97540.c: New test.

I understand Richard's concerns; actually, those concerns were my 
motivation for constraining the possible cases for extract_mem_from_operand 
in the original patch introducing the function.


If Richard proposes a better solution, we will reconsider the current 
approach and revert the changes if necessary.


Meanwhile, I am approving this patch.  I hope it will not discourage 
Richard from finding a better solution.







Re: [PATCH][PR target/97532] Fix invalid address for special memory constraint.

2020-11-02 Thread Vladimir Makarov via Gcc-patches



On 2020-10-27 2:52 a.m., Hongtao Liu wrote:

Hi:
   Sorry for the incomplete testing of my last patch at
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555948.html.
   This patch should fix the invalid address introduced by the special 
memory constraint.
   Bootstrap is ok, regression test is ok for make check
RUNTESTFLAGS="--target_board='unix{-m32,}'"

gcc/ChangeLog
 PR target/97532
 * gcc/lra-constraints.c (valid_address_p): Handle operand of
 special memory constraint.
 (process_address_1): Ditto.


OK.  Please, see also comments in my recent email about PR97540.




Re: [committed] patch to deal with insn scratches in global RA

2020-11-02 Thread Vladimir Makarov via Gcc-patches



On 2020-11-02 3:12 p.m., Christophe Lyon wrote:


Hi,

This patch causes ICEs on arm (eg arm-none-linux-gnueabi)
 gcc.c-torture/compile/sync-3.c   -O1  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O2  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O3 -g  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -Os  (internal compiler error)

gcc.log says:
FAIL: gcc.c-torture/compile/sync-3.c   -O1  (internal compiler error)
PASS: gcc.c-torture/compile/sync-3.c   -O1   (test for warnings, line )
FAIL: gcc.c-torture/compile/sync-3.c   -O1  (test for excess errors)
Excess errors:
during RTL pass: ira
/gcc/testsuite/gcc.c-torture/compile/sync-3.c:85:1: internal compiler
error: Segmentation fault
0xcf8b1f crash_signal
 /gcc/toplev.c:330
0xaeb0a0 fix_reg_equiv_init
 /gcc/ira.c:2671
0xaf2113 find_moveable_pseudos
 /gcc/ira.c:4874
0xaf48e8 ira
 /gcc/ira.c:5533
0xaf48e8 execute
 /gcc/ira.c:5861



Thank you for sending this info.  I reproduced the crash with an 
x86-64-to-arm cross-compiler, although it is absent in a native arm 
environment.  I will have a fix tomorrow.




FAIL: gcc.c-torture/compile/sync-3.c   -O2  (internal compiler error)
PASS: gcc.c-torture/compile/sync-3.c   -O2   (test for warnings, line )
FAIL: gcc.c-torture/compile/sync-3.c   -O2  (test for excess errors)
Excess errors:
during RTL pass: ira
/gcc/testsuite/gcc.c-torture/compile/sync-3.c:85:1: internal compiler
error: Segmentation fault
0xcf8b1f crash_signal
 /gcc/toplev.c:330
0xaeb0a9 safe_as_a
 /gcc/is-a.h:210
0xaeb0a9 rtx_insn_list::next() const
 /gcc/rtl.h:1408
0xaeb0a9 fix_reg_equiv_init
 /gcc/ira.c:2683
0xaf2113 find_moveable_pseudos
 /gcc/ira.c:4874
0xaf48e8 ira
 /gcc/ira.c:5533
0xaf48e8 execute
 /gcc/ira.c:5861

Christophe





[committed] patch to fix arm sync-3.c failure after submitting patch to deal with scratches in IRA

2020-11-02 Thread Vladimir Makarov via Gcc-patches
After submitting the patch dealing with insn scratches in IRA, a report 
came that sync-3.c started to fail with an x86_64-aarch64 cross-compiler.  
The following patch fixes this problem.  The patch was successfully 
bootstrapped on x86-64.


commit 885cbb4a0a677299de34d9e413818df5bb8272b1
Author: Vladimir N. Makarov 
Date:   Mon Nov 2 16:52:17 2020 -0500

Expand reg_equiv when scratches are removed.

gcc/ChangeLog:

* ira.c (ira_remove_scratches): Rename to remove_scratches.  Make
it static and returning flag of any change.
(ira.c): Call ira_expand_reg_equiv in case of removing scratches.

diff --git a/gcc/ira.c b/gcc/ira.c
index 682d092c2f5..bc94e15a50e 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5214,7 +5214,8 @@ contains_X_constraint_p (const char *str)
   return false;
 }
   
-/* Change INSN's scratches into pseudos and save their location.  */
+/* Change INSN's scratches into pseudos and save their location.
+   Return true if we changed any scratch.  */
 bool
 ira_remove_insn_scratches (rtx_insn *insn, bool all_p, FILE *dump_file,
 			   rtx (*get_reg) (rtx original))
@@ -5245,17 +5246,19 @@ ira_remove_insn_scratches (rtx_insn *insn, bool all_p, FILE *dump_file,
 }
 
 /* Return new register of the same mode as ORIGINAL.  Used in
-   ira_remove_scratches.  */
+   remove_scratches.  */
 static rtx
 get_scratch_reg (rtx original)
 {
   return gen_reg_rtx (GET_MODE (original));
 }
 
-/* Change scratches into pseudos and save their location.  */
-void
-ira_remove_scratches (void)
+/* Change scratches into pseudos and save their location.  Return true
+   if we changed any scratch.  */
+static bool
+remove_scratches (void)
 {
+  bool change_p = false;
   basic_block bb;
   rtx_insn *insn;
 
@@ -5266,8 +5269,12 @@ ira_remove_scratches (void)
 FOR_BB_INSNS (bb, insn)
 if (INSN_P (insn)
 	&& ira_remove_insn_scratches (insn, false, ira_dump_file, get_scratch_reg))
-  /* Because we might use DF, we need to keep DF info up to date.  */
-  df_insn_rescan (insn);
+  {
+	/* Because we might use DF, we need to keep DF info up to date.  */
+	df_insn_rescan (insn);
+	change_p = true;
+  }
+  return change_p;
 }
 
 /* Changes pseudos created by function remove_scratches onto scratches.	 */
@@ -5514,8 +5521,8 @@ ira (FILE *f)
   end_alias_analysis ();
   free (reg_equiv);
 
-  if (ira_use_lra_p)
-ira_remove_scratches ();
+  if (ira_use_lra_p && remove_scratches ())
+ira_expand_reg_equiv ();
 
   if (resize_reg_info () && flag_ira_loop_pressure)
 ira_set_pseudo_classes (true, ira_dump_file);


Re: [committed] patch to deal with insn scratches in global RA

2020-11-02 Thread Vladimir Makarov via Gcc-patches



On 2020-11-02 4:30 p.m., Vladimir Makarov via Gcc-patches wrote:


On 2020-11-02 3:12 p.m., Christophe Lyon wrote:


Hi,

This patch causes ICEs on arm (eg arm-none-linux-gnueabi)
 gcc.c-torture/compile/sync-3.c   -O1  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O2  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -O3 -g  (internal compiler error)
 gcc.c-torture/compile/sync-3.c   -Os  (internal compiler error)

gcc.log says:
FAIL: gcc.c-torture/compile/sync-3.c   -O1  (internal compiler error)
PASS: gcc.c-torture/compile/sync-3.c   -O1   (test for warnings, line )
FAIL: gcc.c-torture/compile/sync-3.c   -O1  (test for excess errors)
Excess errors:
during RTL pass: ira
/gcc/testsuite/gcc.c-torture/compile/sync-3.c:85:1: internal compiler
error: Segmentation fault
0xcf8b1f crash_signal
 /gcc/toplev.c:330
0xaeb0a0 fix_reg_equiv_init
 /gcc/ira.c:2671
0xaf2113 find_moveable_pseudos
 /gcc/ira.c:4874
0xaf48e8 ira
 /gcc/ira.c:5533
0xaf48e8 execute
 /gcc/ira.c:5861



Thank you for sending this info.  I reproduced the crash with an 
x86-64-to-arm cross-compiler, although it is absent in a native arm 
environment.  I will have a fix tomorrow.




I've fixed it.




Re: [PATCH]ira: recompute regstat as max_regno changes [PR97705]

2020-11-06 Thread Vladimir Makarov via Gcc-patches



On 2020-11-06 1:15 a.m., Kewen.Lin wrote:

Hi,

As PR97705 shows, my commit r11-4637 caused a dump comparison
difference error in pass ira.  It exposed an issue with the newly
introduced function remove_scratches, which can increase the largest
pseudo reg number if it succeeds.  Later, some functions use
max_reg_num() to get the latest max_regno, and when iterating over the
numbers we can access data structures that were allocated for the
previous max_regno; out-of-bounds array accesses can occur, and the
failure can be random since the values beyond the array bounds could
be random.

This patch frees/reinits/recomputes the relevant data structures,
namely regstat_n_sets_and_refs and reg_info_p, to ensure we won't
access beyond any array bounds.

Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8.

Any thoughts?  Is it a reasonable fix?

Sure, Kewen.  It is a bit unexpected to see a lambda used for this, but I 
checked and found a couple of places in GCC where lambdas are already used.

The patch is ok.  Please, commit it to the mainline.

Thank you for the patch.



[PATCH] Implementation of asm goto outputs

2020-11-12 Thread Vladimir Makarov via Gcc-patches

  The following patch implements asm goto with outputs.  Kernel
developers have several times expressed a wish for this feature.  Asm
goto with outputs was implemented in LLVM recently.  This new feature
was presented at the 2020 Linux Plumbers conference
(https://linuxplumbersconf.org/event/7/contributions/801/attachments/659/1212/asm_goto_w__Outputs.pdf)
and 2020 LLVM conference
(https://www.youtube.com/watch?v=vcPD490s-hE).

  The patch permits outputs in asm gotos only when LRA is used.
It is problematic to implement this in the old reload pass.  To be
honest, it was hard to implement in LRA too, until global live info
update was added to LRA a few years ago.

  Unlike LLVM's asm goto output implementation, you can use
outputs on any path from the asm goto (not only on the fallthrough
path as in LLVM).

  The patch removes critical edges on which asm output reloads could
potentially occur (this means you can have several asm gotos using the
same labels and the same outputs).  It is done in IRA, as it is
difficult to create new BBs in LRA.  Most of the work (placement
of output reloads in the BB destinations of the asm goto basic block) is
done in LRA.  When that happens, LRA updates global live info to reflect
that the new pseudos live on the BB borders and the old ones do not live
there anymore.

  I also tried an approach of splitting the live ranges of pseudos
involved in asm goto outputs to guarantee they get hard registers in
IRA.  But this approach did not work, as it is difficult to keep this
assignment through all of LRA.  It would also probably result in worse
code, as move insn coalescing is not guaranteed.

  Asm goto with outputs will not work for targets which were not
converted to LRA (probably some outdated targets, as the old reload
pass is not supported anymore).  An error is generated when the
old reload pass meets an asm goto with an output.  A precaution is
taken not to crash the compiler after this error.

  The patch is pretty small, as all the necessary infrastructure was
already implemented, practically across the whole compiler pipeline.
It did not require adding new RTL insns, opposite to what Google
engineers did to LLVM MIR.

  The patch could be also useful for implementing jump insns with
output reloads in the future (e.g. branch and count insns).

  I think asm gotos with outputs should be considered an experimental
feature, as there is no real usage of this yet.  Early adoption of
this feature could help with debugging and hardening the
implementation.

  The patch was successfully bootstrapped and tested on x86-64, ppc64, 
and aarch64.


Are non-RA changes ok in the patch?

2020-11-12  Vladimir Makarov 

    * c/c-parser.c (c_parser_asm_statement): Parse outputs for asm
    goto too.
    * c/c-typeck.c (build_asm_expr): Remove an assert checking output
    absence for asm goto.
    * cfgexpand.c (expand_asm_stmt): Output asm goto with outputs too.
    Place insns after asm goto on edges.
    * cp/parser.c (cp_parser_asm_definition): Parse outputs for asm
    goto too.
    * doc/extend.texi: Reflect the changes in asm goto documentation.
    * gimple.c (gimple_build_asm_1): Remove an assert checking output
    absence for asm goto.
    * gimple.h (gimple_asm_label_op, gimple_asm_set_label_op): Take
    possible asm goto outputs into account.
    * ira.c (ira): Remove critical edges for potential asm goto output
    reloads.
    (ira_nullify_asm_goto): New function.
    * ira.h (ira_nullify_asm_goto): New prototype.
    * lra-assigns.c (lra_split_hard_reg_for): Use ira_nullify_asm_goto.
    Check that splitting is done inside a basic block.
    * lra-constraints.c (curr_insn_transform): Permit output reloads
    for any jump insn.
    * lra-spills.c (lra_final_code_change): Remove USEs added in
    ira for asm gotos.

    * lra.c (lra_process_new_insns): Place output reload insns after
    jumps in the beginning of destination BBs.
    * reload.c (find_reloads): Report error for asm gotos with
    outputs.  Modify them to keep CFG consistency to avoid crashes.
    * tree-into-ssa.c (rewrite_stmt): Don't put debug stmt after asm
    goto.


2020-11-12  Vladimir Makarov  

    * c-c++-common/asmgoto-2.c: Permit output in asm goto.
    * gcc.c-torture/compile/asmgoto-[2345].c: New tests.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index ecc3d2119fa..db719fad58c 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7144,10 +7144,7 @@ c_parser_asm_statement (c_parser *parser)
 	switch (section)
 	  {
 	  case 0:
-	/* For asm goto, we don't allow output operands, but reserve
-	   the slot for a future extension that does allow them.  */
-	if (!is_goto)
-	  outputs = c_parser_asm_operands (parser);
+	outputs = c_parser_asm_operands (parser);
 	break;
 	  case 1:
 	inputs = c_parser_asm_operands (parser);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 968403

Re: [PATCH] Implementation of asm goto outputs

2020-11-13 Thread Vladimir Makarov via Gcc-patches



On 2020-11-13 4:00 a.m., Richard Biener wrote:

On Thu, Nov 12, 2020 at 8:55 PM Vladimir Makarov via Gcc-patches
 wrote:

The following patch implements asm goto with outputs.  Kernel
developers have several times expressed a wish for this feature.  Asm
goto with outputs was implemented in LLVM recently.  This new feature
was presented at the 2020 Linux Plumbers conference
(https://linuxplumbersconf.org/event/7/contributions/801/attachments/659/1212/asm_goto_w__Outputs.pdf)
and 2020 LLVM conference
(https://www.youtube.com/watch?v=vcPD490s-hE).

The patch permits outputs in asm gotos only when LRA is used.
It is problematic to implement this in the old reload pass.  To be
honest, it was hard to implement in LRA too, until global live info
update was added to LRA a few years ago.

Unlike LLVM's asm goto output implementation, you can use
outputs on any path from the asm goto (not only on the fallthrough
path as in LLVM).

The patch removes critical edges on which asm output reloads could
potentially occur (this means you can have several asm gotos using the
same labels and the same outputs).  It is done in IRA, as it is
difficult to create new BBs in LRA.  Most of the work (placement
of output reloads in the BB destinations of the asm goto basic block) is
done in LRA.  When that happens, LRA updates global live info to reflect
that the new pseudos live on the BB borders and the old ones do not live
there anymore.

I also tried an approach of splitting the live ranges of pseudos
involved in asm goto outputs to guarantee they get hard registers in
IRA.  But this approach did not work, as it is difficult to keep this
assignment through all of LRA.  It would also probably result in worse
code, as move insn coalescing is not guaranteed.

Asm goto with outputs will not work for targets which were not
converted to LRA (probably some outdated targets, as the old reload
pass is not supported anymore).  An error is generated when the
old reload pass meets an asm goto with an output.  A precaution is
taken not to crash the compiler after this error.

The patch is pretty small, as all the necessary infrastructure was
already implemented, practically across the whole compiler pipeline.
It did not require adding new RTL insns, opposite to what Google
engineers did to LLVM MIR.

The patch could be also useful for implementing jump insns with
output reloads in the future (e.g. branch and count insns).

I think asm gotos with outputs should be considered an experimental
feature, as there is no real usage of this yet.  Early adoption of
this feature could help with debugging and hardening the
implementation.

The patch was successfully bootstrapped and tested on x86-64, ppc64,
and aarch64.

Are non-RA changes ok in the patch?

Minor nit for the RA parts:

+  if (i < recog_data.n_operands)
+   {
+ error_for_asm (insn,
+"old reload pass does not support asm goto "
+"with outputs in %");
+ ira_nullify_asm_goto (insn);

I'd say "the target does not support ...", the user shouldn't be concerned
about a thing called "reload".


Yes, that makes sense.  A regular user hardly knows our internal kitchen.

diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
index 1493b323956..9be8e295627 100644
--- a/gcc/tree-into-ssa.c
+++ b/gcc/tree-into-ssa.c
@@ -1412,6 +1412,11 @@ rewrite_stmt (gimple_stmt_iterator *si)
 SET_DEF (def_p, name);
 register_new_def (DEF_FROM_PTR (def_p), var);

+   /* Do not insert debug stmt after asm goto: */
+   if (gimple_code (stmt) == GIMPLE_ASM
+   && gimple_asm_nlabels (as_a <gasm *> (stmt)) > 0)
+ continue;
+

why?  Ah, the next line explains.  I guess it's better done as

/* Do not insert debug stmts if the stmt ends the BB.  */
if (stmt_ends_bb_p (stmt))
  continue;


Richard, thank you for your review.  I am not very familiar with the 
middle-end, so your comments are really useful.



I wonder why the code never ran into issues for calls that throw
internal ...



I have no idea.  But I really ran into this problem when I tested asm 
goto with outputs.



You have plenty compile testcases but not a single execute one.
So - does it actually work? ;)
Yes, it works.  Two tests actually produce output reloads, at least on 
x86-64.  As for execution tests, it is difficult for me to write 
something meaningful, especially with generated output reloads.  I'll try 
to add execution tests too.

Otherwise OK.



Richard, thank you again for your quick review. I'll update the patch 
according to your proposals, test it again and commit it.




2020-11-12  Vladimir Makarov 

  * c/c-parser.c (c_parser_asm_statement): Parse outputs for asm
  goto too.
  * c/c-typeck.c (build_asm_expr): Remove an assert checking output
  absence for asm goto.

I'm sure this will be rejected by the commit 

Re: [PATCH] Implementation of asm goto outputs

2020-11-13 Thread Vladimir Makarov via Gcc-patches



On 2020-11-13 10:51 a.m., Uros Bizjak wrote:

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
new file mode 100644
index 000..8685ca2a1cb
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
@@ -0,0 +1,14 @@
+/* Check that LRA really puts output reloads for p4 in two successor blocks */
+/* { dg-do compile { target x86_64-*-* } } */

Please use:

/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } */

to correctly select 64bit x86 targets.

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
new file mode 100644
index 000..57359192f62
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
@@ -0,0 +1,56 @@
+/* Test to generate output reload in asm goto on x86_64.  */
+/* { dg-do compile } */
+/* { dg-skip-if "no O0" { x86_64-*-* } { "-O0" } { "" } } */

Same here:

+/* { dg-skip-if "no O0" { { i?86-*-* x86_64-*-* } && { ! ia32 } } {
"-O0" } { "" } } */


OK. Thank you, Uros.  I've changed these tests.




[COMMITTED] Implementation of asm goto outputs

2020-11-13 Thread Vladimir Makarov via Gcc-patches
The original patch has been modified according to the reviewers' comments, 
and the following patch has been committed.



commit e3b3b59683c1e7d31a9d313dd97394abebf644be
Author: Vladimir N. Makarov 
Date:   Fri Nov 13 12:45:59 2020 -0500

[PATCH] Implementation of asm goto outputs

gcc/
* cfgexpand.c (expand_asm_stmt): Output asm goto with outputs too.
Place insns after asm goto on edges.
* doc/extend.texi: Reflect the changes in asm goto documentation.
* gimple.c (gimple_build_asm_1): Remove an assert checking output
absence for asm goto.
* gimple.h (gimple_asm_label_op, gimple_asm_set_label_op): Take
possible asm goto outputs into account.
* ira.c (ira): Remove critical edges for potential asm goto output
reloads.
(ira_nullify_asm_goto): New function.
* ira.h (ira_nullify_asm_goto): New prototype.
* lra-assigns.c (lra_split_hard_reg_for): Use ira_nullify_asm_goto.
Check that splitting is done inside a basic block.
* lra-constraints.c (curr_insn_transform): Permit output reloads
for any jump insn.
* lra-spills.c (lra_final_code_change): Remove USEs added in ira
for asm gotos.
* lra.c (lra_process_new_insns): Place output reload insns after
jumps in the beginning of destination BBs.
* reload.c (find_reloads): Report error for asm gotos with
outputs.  Modify them to keep CFG consistency to avoid crashes.
* tree-into-ssa.c (rewrite_stmt): Don't put debug stmt after asm
goto.

gcc/c/
* c-parser.c (c_parser_asm_statement): Parse outputs for asm
goto too.
* c-typeck.c (build_asm_expr): Remove an assert checking output
absence for asm goto.

gcc/cp
* parser.c (cp_parser_asm_definition): Parse outputs for asm
goto too.

gcc/testsuite/
* c-c++-common/asmgoto-2.c: Permit output in asm goto.
* gcc.c-torture/compile/asmgoto-2.c: New.
* gcc.c-torture/compile/asmgoto-3.c: New.
* gcc.c-torture/compile/asmgoto-4.c: New.
* gcc.c-torture/compile/asmgoto-5.c: New.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index f4c4cf7bf8f..7540a15d65d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7144,10 +7144,7 @@ c_parser_asm_statement (c_parser *parser)
 	switch (section)
 	  {
 	  case 0:
-	/* For asm goto, we don't allow output operands, but reserve
-	   the slot for a future extension that does allow them.  */
-	if (!is_goto)
-	  outputs = c_parser_asm_operands (parser);
+	outputs = c_parser_asm_operands (parser);
 	break;
 	  case 1:
 	inputs = c_parser_asm_operands (parser);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 26a5f7128d2..413109c916c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -10666,10 +10666,6 @@ build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
   TREE_VALUE (tail) = input;
 }
 
-  /* ASMs with labels cannot have outputs.  This should have been
- enforced by the parser.  */
-  gcc_assert (outputs == NULL || labels == NULL);
-
   args = build_stmt (loc, ASM_EXPR, string, outputs, inputs, clobbers, labels);
 
   /* asm statements without outputs, including simple ones, are treated
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1b7bdbc15be..1df6f4bc55a 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3371,20 +3371,21 @@ expand_asm_stmt (gasm *stmt)
 			   ARGVEC CONSTRAINTS OPNAMES))
  If there is more than one, put them inside a PARALLEL.  */
 
-  if (nlabels > 0 && nclobbers == 0)
-{
-  gcc_assert (noutputs == 0);
-  emit_jump_insn (body);
-}
-  else if (noutputs == 0 && nclobbers == 0)
+  if (noutputs == 0 && nclobbers == 0)
 {
   /* No output operands: put in a raw ASM_OPERANDS rtx.  */
-  emit_insn (body);
+  if (nlabels > 0)
+	emit_jump_insn (body);
+  else
+	emit_insn (body);
 }
   else if (noutputs == 1 && nclobbers == 0)
 {
   ASM_OPERANDS_OUTPUT_CONSTRAINT (body) = constraints[0];
-  emit_insn (gen_rtx_SET (output_rvec[0], body));
+  if (nlabels > 0)
+	emit_jump_insn (gen_rtx_SET (output_rvec[0], body));
+  else 
+	emit_insn (gen_rtx_SET (output_rvec[0], body));
 }
   else
 {
@@ -3461,7 +3462,27 @@ expand_asm_stmt (gasm *stmt)
   if (after_md_seq)
 emit_insn (after_md_seq);
   if (after_rtl_seq)
-emit_insn (after_rtl_seq);
+{
+  if (nlabels == 0)
+	emit_insn (after_rtl_seq);
+  else
+	{
+	  edge e;
+	  edge_iterator ei;
+	  
+	  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+	{
+	  start_sequence ();
+	  for (rtx_insn *curr = after_rtl_seq;
+		   curr != NULL_RTX;
+		   curr = NEXT_INSN (curr))
+		emit_insn (copy_insn (PATTERN (curr)));
+	  rtx_insn *cop

[Committed] patch fixing LRA crash on s390x

2020-11-15 Thread Vladimir Makarov via Gcc-patches
My last patch implementing output reloads in asm goto resulted in an LRA 
crash when compiling the kernel on s390x.  Jeff Law reported it recently.  The 
culprit was incorrect emission of reload insns in the last empty BB.  The 
emitted insns got a null BB, which is wrong.  Actually, in this case we do 
not need to emit such insns, as they will be removed as dead later.


The following patch fixes the problem.



Author: Vladimir N. Makarov 
Date:   Sun Nov 15 11:22:19 2020 -0500

Do not put reload insns in the last empty BB.

gcc/
* lra.c (lra_process_new_insns): Don't put reload insns in the
last empty BB.

diff --git a/gcc/lra.c b/gcc/lra.c
index 673554d0a4b..b318cfd7456 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1903,15 +1903,23 @@ lra_process_new_insns (rtx_insn *insn, rtx_insn 
*before, rtx_insn *after,
  {
/* We already made the edge no-critical in ira.c::ira */
lra_assert (!EDGE_CRITICAL_P (e));
-   rtx_insn *tmp = BB_HEAD (e->dest);
+   rtx_insn *curr, *tmp = BB_HEAD (e->dest);
if (LABEL_P (tmp))
  tmp = NEXT_INSN (tmp);
if (NOTE_INSN_BASIC_BLOCK_P (tmp))
  tmp = NEXT_INSN (tmp);
-   start_sequence ();
-   for (rtx_insn *curr = after;
-curr != NULL_RTX;
+   for (curr = tmp;
+curr != NULL
+  && (!INSN_P (curr) || BLOCK_FOR_INSN (curr) == e->dest);
 curr = NEXT_INSN (curr))
+ ;
+   /* Do not put reload insns if it is the last BB
+  without actual insns.  In this case the reload insns
+  can get null BB after emitting.  */
+   if (curr == NULL)
+ continue;
+   start_sequence ();
+   for (curr = after; curr != NULL_RTX; curr = NEXT_INSN (curr))
  emit_insn (copy_insn (PATTERN (curr)));
rtx_insn *copy = get_insns (), *last = get_last_insn ();
end_sequence ();


[pushed] IRA: Make profitability calculation of RA conflict presentations independent of host compiler type sizes [PR102147]

2021-09-24 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102147

The patch was successfully bootstrapped and tested on x86-64.


commit ec4c30b64942e615b4bb4b9761cd3b2635158608 (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Fri Sep 24 10:06:45 2021 -0400

    Make profitability calculation of RA conflict presentations independent of host compiler type sizes. [PR102147]


    gcc/ChangeLog:

    2021-09-24  Vladimir Makarov  

    PR rtl-optimization/102147
    * ira-build.c (ira_conflict_vector_profitable_p): Make
    profitability calculation independent of host compiler pointer
    and IRA_INT_BITS sizes.

diff --git a/gcc/ira-build.c b/gcc/ira-build.c
index 42120656366..2a30efc4f2f 100644
--- a/gcc/ira-build.c
+++ b/gcc/ira-build.c
@@ -629,7 +629,7 @@ ior_hard_reg_conflicts (ira_allocno_t a, const_hard_reg_set set)

 bool
 ira_conflict_vector_profitable_p (ira_object_t obj, int num)
 {
-  int nw;
+  int nbytes;
   int max = OBJECT_MAX (obj);
   int min = OBJECT_MIN (obj);

@@ -638,9 +638,14 @@ ira_conflict_vector_profitable_p (ira_object_t obj, int num)

    in allocation.  */
 return false;

-  nw = (max - min + IRA_INT_BITS) / IRA_INT_BITS;
-  return (2 * sizeof (ira_object_t) * (num + 1)
- < 3 * nw * sizeof (IRA_INT_TYPE));
+  nbytes = (max - min) / 8 + 1;
+  STATIC_ASSERT (sizeof (ira_object_t) <= 8);
+  /* Don't use sizeof (ira_object_t), use constant 8.  Size of ira_object_t (a
+     pointer) is different on 32-bit and 64-bit targets.  Usage sizeof
+     (ira_object_t) can result in different code generation by GCC built as 32-
+     and 64-bit program.  In any case the profitability is just an estimation
+     and border cases are rare.  */
+  return (2 * 8 /* sizeof (ira_object_t) */ * (num + 1) < 3 * nbytes);
 }

 /* Allocates and initialize the conflict vector of OBJ for NUM



Re: [backport gcc10, gcc9] Requet to backport PR97969

2021-05-31 Thread Vladimir Makarov via Gcc-patches



On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote:

Hi,
Just a follow up after GCC 11 release.

I've backported to gcc-10 branch (without any change to original patches)
PR97969 and following PR98722 & PR98777 patches.

Commits apply cleanly without changes.
Built and regression tested on:
* arm-none-eabi and
* aarch64-none-linux-gnu cross toolchains.

There were no issues and no regressions (all OK).

OK for backport to gcc-10 branch ?


Sorry for the delay with the answer due to my vacation.

As the patches did not introduce new PRs, I believe they are ok for gcc-10.

Thank you.



Kind regards,
Przemyslaw Wirkus

---
commits I've backported:

commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330
Author: Vladimir N. Makarov 
Date:   Tue Jan 12 11:26:15 2021 -0500

 [PR97969] LRA: Transform pattern `plus (plus (hard reg, const), pseudo)` 
after elimination

commit 4334b524274203125193a08a8485250c41c2daa9
Author: Vladimir N. Makarov 
Date:   Wed Jan 20 11:40:14 2021 -0500

 [PR98722] LRA: Check that target has no 3-op add insn to transform 2 plus 
expression.

commit 68ba1039c7daf0485b167fe199ed7e8031158091
Author: Vladimir N. Makarov 
Date:   Thu Jan 21 17:27:01 2021 -0500

 [PR98777] LRA: Use preliminary created pseudo for in LRA elimination 
subpass

$ ./contrib/git-backport.py cf2ac1c30af0fa783c8d72e527904dda5d8cc330
$ ./contrib/git-backport.py 4334b524274203125193a08a8485250c41c2daa9
$ ./contrib/git-backport.py 68ba1039c7daf0485b167fe199ed7e8031158091



Richard.




[committed] LRA: [PR102627] Use at least natural mode during splitting hard reg live range

2021-10-08 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102627

The patch was successfully bootstrapped and tested on x86-64.


commit fab2d977e69539aad9bef81caff17de48e53aedf (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Fri Oct 8 10:16:09 2021 -0400

[PR102627] Use at least natural mode during splitting hard reg live range

In the PR test case SImode was used to split the live range of cx on x86-64
because it was the biggest mode for this hard reg in the function.  But
all 64 bits of cx contain structure members.  To fix this problem, we always
need to use at least the natural mode of a hard reg in splitting.

gcc/ChangeLog:

PR rtl-optimization/102627
* lra-constraints.c (split_reg): Use at least natural mode of hard reg.

gcc/testsuite/ChangeLog:

PR rtl-optimization/102627
* gcc.target/i386/pr102627.c: New test.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 4d734548c38..8f75125fc2e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5799,11 +5799,12 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
 	 part of a multi-word register.  In that case, just use the reg_rtx
 	 mode.  Do the same also if the biggest mode was larger than a register
 	 or we can not compare the modes.  Otherwise, limit the size to that of
-	 the biggest access in the function.  */
+	 the biggest access in the function or to the natural mode at least.  */
   if (mode == VOIDmode
 	  || !ordered_p (GET_MODE_PRECISION (mode),
 			 GET_MODE_PRECISION (reg_rtx_mode))
-	  || paradoxical_subreg_p (mode, reg_rtx_mode))
+	  || paradoxical_subreg_p (mode, reg_rtx_mode)
+	  || maybe_gt (GET_MODE_PRECISION (reg_rtx_mode), GET_MODE_PRECISION (mode)))
 	{
 	  original_reg = regno_reg_rtx[hard_regno];
 	  mode = reg_rtx_mode;
diff --git a/gcc/testsuite/gcc.target/i386/pr102627.c b/gcc/testsuite/gcc.target/i386/pr102627.c
new file mode 100644
index 000..8ab9acaf002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102627.c
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/102627 */
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+int a, f, l, m, q, c, d, g;
+long b, e;
+struct g {
+  signed h;
+  signed i;
+  unsigned j;
+  unsigned k;
+};
+unsigned n;
+char o;
+int *p = &m;
+long r(int s) { return s && b ?: b; }
+long __attribute__((noipa)) v() {
+  l = 0 || r(n & o);
+  return q;
+}
+void w(int, unsigned, struct g x) {
+  c ?: a;
+  for (; d < 2; d++)
+*p = x.k;
+}
+struct g __attribute__((noipa)) y() {
+  struct g h = {3, 908, 1, 20};
+  for (; g; g++)
+;
+  return h;
+}
+int main() {
+  long t;
+  struct g u = y();
+  t = e << f;
+  w(0, t, u);
+  v(0, 4, 4, 4);
+  if (m != 20)
+__builtin_abort ();
+  return 0;
+}


[committed] [PR102842] LRA: Consider all outputs in generation of matching reloads

2021-10-26 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102842

As the patch touches sensitive LRA code, it was bootstrapped and 
tested on x86-64, aarch64, and ppc64.


I've committed the patch only to the master branch.  Later (after some 
observation), I'll commit it to the gcc-10 and gcc-11 branches.


commit 8c59f4118357789cfa8df2cf0d3ecb61be7e9041
Author: Vladimir N. Makarov 
Date:   Tue Oct 26 14:03:42 2021 -0400

[PR102842] Consider all outputs in generation of matching reloads

Without considering all output insn operands (not only those processed
before), in rare cases LRA can use the same hard register for
different outputs of the insn on different assignment subpasses.  The
patch fixes the problem.

gcc/ChangeLog:

PR rtl-optimization/102842
* lra-constraints.c (match_reload): Ignore out in checking values
of outs.
(curr_insn_transform): Collect outputs before doing reloads of operands.

gcc/testsuite/ChangeLog:

PR rtl-optimization/102842
* g++.target/arm/pr102842.C: New test.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 8f75125fc2e..0195b4fb9c3 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1102,7 +1102,7 @@ match_reload (signed char out, signed char *ins, signed char *outs,
 	  for (i = 0; outs[i] >= 0; i++)
 	{
 	  rtx other_out_rtx = *curr_id->operand_loc[outs[i]];
-	  if (REG_P (other_out_rtx)
+	  if (outs[i] != out && REG_P (other_out_rtx)
 		  && (regno_val_use_in (REGNO (in_rtx), other_out_rtx)
 		  != NULL_RTX))
 		{
@@ -4382,7 +4382,10 @@ curr_insn_transform (bool check_only_p)
   }
 
   n_outputs = 0;
-  outputs[0] = -1;
+  for (i = 0; i < n_operands; i++)
+if (curr_static_id->operand[i].type == OP_OUT)
+  outputs[n_outputs++] = i;
+  outputs[n_outputs] = -1;
   for (i = 0; i < n_operands; i++)
 {
   int regno;
@@ -4457,8 +4460,6 @@ curr_insn_transform (bool check_only_p)
 		 lra-lives.c.  */
 		  match_reload (i, goal_alt_matched[i], outputs, goal_alt[i], &before,
 &after, TRUE);
-		  outputs[n_outputs++] = i;
-		  outputs[n_outputs] = -1;
 		}
 	  continue;
 	}
@@ -4636,14 +4637,6 @@ curr_insn_transform (bool check_only_p)
 	   process_alt_operands decides that it is possible.  */
 	gcc_unreachable ();
 
-  /* Memorise processed outputs so that output remaining to be processed
-	 can avoid using the same register value (see match_reload).  */
-  if (curr_static_id->operand[i].type == OP_OUT)
-	{
-	  outputs[n_outputs++] = i;
-	  outputs[n_outputs] = -1;
-	}
-
   if (optional_p)
 	{
 	  rtx reg = op;
diff --git a/gcc/testsuite/g++.target/arm/pr102842.C b/gcc/testsuite/g++.target/arm/pr102842.C
new file mode 100644
index 000..a2bac66091a
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/pr102842.C
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/102842 */
+/* { dg-do compile } */
+/* { dg-options "-fPIC  -O2 -fno-omit-frame-pointer -mthumb -march=armv7-a+fp" } */
+
+struct Plane {
+  using T = float;
+  T *Row();
+};
+using ImageF = Plane;
+long long Mirror_x;
+struct EnsurePaddingInPlaceRowByRow {
+  void Process() {
+switch (strategy_) {
+case kSlow:
+  float *row = img_.Row();
+  long long xsize = x1_;
+  while (Mirror_x >= xsize)
+if (Mirror_x)
+  Mirror_x = 2 * xsize - 1;
+  *row = Mirror_x;
+}
+  }
+  ImageF img_;
+  unsigned x1_;
+  enum { kSlow } strategy_;
+};
+void FinalizeImageRect() {
+  EnsurePaddingInPlaceRowByRow ensure_padding;
+  ensure_padding.Process();
+}


Re: [PATCH v4] ira: Support more matching constraint forms with param [PR100328]

2021-07-05 Thread Vladimir Makarov via Gcc-patches



On 2021-07-01 10:11 p.m., Kewen.Lin wrote:

Hi Vladimir,

on 2021/6/30 11:24 PM, Vladimir Makarov wrote:


Many thanks for your review!  I've updated the patch according to your comments 
and also polished some comments and document words a bit.  Does it look better 
to you?

Sorry for the delay with the answer.  The patch is better for me now and 
can be committed into the trunk.


Thanks again for working on this performance issue.




[committed] [PR90706] IRA: Check that reg classes contain a hard reg of given mode in reg move cost calculation

2022-12-15 Thread Vladimir Makarov via Gcc-patches

The following patch solves a spill problem for

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

There are still redundant moves which should be removed to solve the PR. 
I'll continue my work on this in January.



commit 12abd5a7d13209f79664ea603b3f3517f71b8c4f
Author: Vladimir N. Makarov 
Date:   Thu Dec 15 14:11:05 2022 -0500

IRA: Check that reg classes contain a hard reg of given mode in reg move cost calculation

IRA calculates wrong AVR costs for moving general hard regs of SFmode.  When
calculating the costs, we did not exclude sub-classes which do not contain
hard regs of the given mode.  This was the reason for spilling a pseudo in the
PR.  The patch fixes this.

PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
* ira.cc (ira_init_register_move_cost): Check that at least one hard
reg of the mode is in the class contents to calculate the
register move costs.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 964c94a06ef..732a0edd4c1 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ira-int.h"
 #include "addresses.h"
 #include "reload.h"
+#include "print-rtl.h"
 
 /* The flags is set up every time when we calculate pseudo register
classes through function ira_set_pseudo_classes.  */
@@ -503,6 +504,18 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   int insn_allows_mem[MAX_RECOG_OPERANDS];
   move_table *move_in_cost, *move_out_cost;
   short (*mem_cost)[2];
+  const char *p;
+
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+{
+  fprintf (ira_dump_file, "Processing insn %u", INSN_UID (insn));
+  if (INSN_CODE (insn) >= 0
+	  && (p = get_insn_name (INSN_CODE (insn))) != NULL)
+	fprintf (ira_dump_file, " {%s}", p);
+  fprintf (ira_dump_file, " (freq=%d)\n",
+	   REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn)));
+  dump_insn_slim (ira_dump_file, insn);
+  }
 
   for (i = 0; i < n_ops; i++)
 insn_allows_mem[i] = 0;
@@ -526,6 +539,21 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 	  continue;
 	}
 
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+	{
+	  fprintf (ira_dump_file, "  Alt %d:", alt);
+	  for (i = 0; i < n_ops; i++)
+	{
+	  p = constraints[i];
+	  if (*p == '\0')
+		continue;
+	  fprintf (ira_dump_file, "  (%d) ", i);
+	  for (; *p != '\0' && *p != ',' && *p != '#'; p++)
+		fputc (*p, ira_dump_file);
+	}
+	  fprintf (ira_dump_file, "\n");
+	}
+  
   for (i = 0; i < n_ops; i++)
 	{
 	  unsigned char c;
@@ -593,12 +621,16 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 		 register, this alternative can't be used.  */
 
 		  if (classes[j] == NO_REGS)
-		alt_fail = 1;
-		  /* Otherwise, add to the cost of this alternative
-		 the cost to copy the other operand to the hard
-		 register used for this operand.  */
+		{
+		  alt_fail = 1;
+		}
 		  else
-		alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		/* Otherwise, add to the cost of this alternative the cost
+		   to copy the other operand to the hard register used for
+		   this operand.  */
+		{
+		  alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		}
 		}
 	  else
 		{
@@ -1021,18 +1053,45 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   for (i = 0; i < n_ops; i++)
 	if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
 	  {
+	int old_cost;
+	bool cost_change_p = false;
 	struct costs *pp = op_costs[i], *qq = this_op_costs[i];
 	int *pp_costs = pp->cost, *qq_costs = qq->cost;
 	int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
 	cost_classes_t cost_classes_ptr
 	  = regno_cost_classes[REGNO (ops[i])];
 
-	pp->mem_cost = MIN (pp->mem_cost,
+	old_cost = pp->mem_cost;
+	pp->mem_cost = MIN (old_cost,
 (qq->mem_cost + op_cost_add) * scale);
 
+	if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp->mem_cost < old_cost)
+	  {
+		cost_change_p = true;
+		fprintf (ira_dump_file, "op %d(r=%u) new costs MEM:%d",
+			 i, REGNO(ops[i]), pp->mem_cost);
+	  }
 	for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-	  pp_costs[k]
-		= MIN (pp_costs[k], (qq_costs[k] + op_cost_add) * scale);
+	  {
+		old_cost = pp_costs[k];
+		pp_costs[k]
+		  = MIN (old_cost, (qq_costs[k] + op_cost_add) * scale);
+		if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp_costs[k] < old_cost)
+		  {
+		if (!cost_change_p)
+		  fprintf (ira_dump_file, "op %d(r=%u) new costs",
+			   i, REGNO(ops[i]));
+		cost_change_p = true;
+		fprintf (ira_dum

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-06 Thread Vladimir Makarov via Gcc-patches



On 9/1/23 05:07, Hongyu Wang wrote:

Uros Bizjak via Gcc-patches wrote on Thursday, August 31, 2023 at 18:16:

On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang  wrote:

From: Kong Lingling 

Current reload infrastructure does not support selective base_reg_class
for backend insn. Add insn argument to base_reg_class for
lra/reload usage.

I don't think this is the correct approach. Ideally, a memory
constraint should somehow encode its BASE/INDEX register class.
Instead of passing "insn", simply a different constraint could be used
in the constraint string of the relevant insn.

We tried the constraint-only approach at the beginning, but then we found
the reload infrastructure
does not work like that.

The BASE/INDEX reg classes are determined before choosing alternatives, in
process_address under curr_insn_transform. Process_address creates the mem
operand according to the BASE/INDEX reg class. Then, the memory operand
constraint check will evaluate the mem op with targetm.legitimate_address_p.

If we want to make use of EGPR in base/index we need to either extend BASE/INDEX
reg class in the backend, or, for specific insns, add a target hook to
tell reload
that the extended reg class with EGPR can be used to construct memory operand.

CC'd Vladimir as git send-mail failed to add recipient.



I think the approach proposed by the Intel developers is better.  In some way
we already use such an approach when we pass the memory mode to get the base
reg class, although we could use different memory constraints for
different modes when the possible base reg differs for some memory
modes.

Using special memory constraints can probably be implemented too (I
understand the attractiveness of such an approach for readability of the
machine description).  But in my opinion it will require much bigger
work in IRA/LRA/reload.  It would also significantly slow down RA, as we
would need to process insn constraints for each memory operand in many places
(e.g. for the calculation of reg classes and costs in IRA).  Still, I think
there would be a few cases for this approach resulting in a bigger
probability of assigning a hard reg out of the specific base reg class, and
this would result in additional reloads.

So the approach proposed by Intel is ok for me.  Although if the x86
maintainers are strongly against this approach and the changes in x86
machine-dependent code, and the Intel developers implement Uros's approach,
I am ready to review it.  But still I prefer the current Intel developers'
approach for the reasons I mentioned above.



Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-07 Thread Vladimir Makarov via Gcc-patches



On 9/7/23 02:23, Uros Bizjak wrote:

On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov  wrote:


On 9/1/23 05:07, Hongyu Wang wrote:



I think the approach proposed by the Intel developers is better.  In some way
we already use such an approach when we pass the memory mode to get the base
reg class, although we could use different memory constraints for
different modes when the possible base reg differs for some memory
modes.

Using special memory constraints can probably be implemented too (I
understand the attractiveness of such an approach for readability of the
machine description).  But in my opinion it will require much bigger
work in IRA/LRA/reload.  It would also significantly slow down RA, as we
would need to process insn constraints for each memory operand in many places
(e.g. for the calculation of reg classes and costs in IRA).  Still, I think
there would be a few cases for this approach resulting in a bigger
probability of assigning a hard reg out of the specific base reg class, and
this would result in additional reloads.

So the approach proposed by Intel is ok for me.  Although if the x86
maintainers are strongly against this approach and the changes in x86
machine-dependent code, and the Intel developers implement Uros's approach,
I am ready to review it.  But still I prefer the current Intel developers'
approach for the reasons I mentioned above.

My above proposal is more or less a wish from a target maintainer PoV.
Ideally, we would have a bunch of different memory constraints, and a
target hook that returns corresponding BASE/INDEX reg classes.
However, I have no idea about the complexity of the implementation in
the infrastructure part of the compiler.

Basically, it needs introducing new hooks which return base and index 
classes for special memory constraints.  When we process memory in an 
insn (in a lot of places in IRA, LRA, and reload) we should consider all 
possible memory insn constraints, take the intersection of the base and index 
reg classes for the constraints, and use it instead of the default base 
and index reg classes.


The required functionality is absent in reload too.

I would say that it is a moderate-size project (1-2 months for me).  It 
still requires introducing new hooks, and I guess there are a few cases 
where we would still assign hard regs out of the desirable base class for 
address pseudos, and this would result in the generation of additional reload 
insns.  It also means many more additional changes in the RA source code and 
the x86 machine-dependent files.


Probably, with this approach there would also be edge cases when we need 
to solve new PRs because of LRA failures to generate the correct code, 
but I believe they can be solved.


Therefore I lean toward the current Intel approach, where to get the base reg 
class we pass the insn as a parameter in addition to the memory mode.





[pushed][PR111225][LRA]: Don't reuse chosen insn alternative with special memory constraint

2023-09-07 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit f7bca44d97ad01b39f9d6e7809df7bf517eeb2fb
Author: Vladimir N. Makarov 
Date:   Thu Sep 7 09:59:10 2023 -0400

[LRA]: Don't reuse chosen insn alternative with special memory constraint

To speed up GCC, LRA reuses the chosen alternative from the previous
constraint subpass.  A spilled pseudo is considered ok for any memory
constraint, although the stack slot assigned to the pseudo later might not
satisfy the chosen alternative's constraint.  As we don't consider all insn
alternatives on the subsequent LRA sub-passes, this might result in an LRA
failure to generate the correct insn.  This patch solves the problem.

gcc/ChangeLog:

PR target/111225
* lra-constraints.cc (goal_reuse_alt_p): New global flag.
(process_alt_operands): Set up the flag.  Clear flag for chosen
alternative with special memory constraints.
(curr_insn_transform): Set up used insn alternative depending on the flag.

gcc/testsuite/ChangeLog:

PR target/111225
* gcc.target/i386/pr111225.c: New test.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c718bedff32..3aaa4906999 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1462,6 +1462,9 @@ static int goal_alt_matches[MAX_RECOG_OPERANDS];
 static int goal_alt_dont_inherit_ops_num;
 /* Numbers of operands whose reload pseudos should not be inherited.  */
 static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+/* True if we should try only this alternative for the next constraint sub-pass
+   to speed up the sub-pass.  */
+static bool goal_reuse_alt_p;
 /* True if the insn commutative operands should be swapped.  */
 static bool goal_alt_swapped;
 /* The chosen insn alternative.	 */
@@ -2130,6 +2133,7 @@ process_alt_operands (int only_alternative)
   int curr_alt_dont_inherit_ops_num;
   /* Numbers of operands whose reload pseudos should not be inherited.	*/
   int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  bool curr_reuse_alt_p;
   /* True if output stack pointer reload should be generated for the current
  alternative.  */
   bool curr_alt_out_sp_reload_p;
@@ -2217,6 +2221,7 @@ process_alt_operands (int only_alternative)
   reject += static_reject;
   early_clobbered_regs_num = 0;
   curr_alt_out_sp_reload_p = false;
+  curr_reuse_alt_p = true;
   
   for (nop = 0; nop < n_operands; nop++)
 	{
@@ -2574,7 +2579,10 @@ process_alt_operands (int only_alternative)
 		  if (satisfies_memory_constraint_p (op, cn))
 			win = true;
 		  else if (spilled_pseudo_p (op))
-			win = true;
+			{
+			  curr_reuse_alt_p = false;
+			  win = true;
+			}
 		  break;
 		}
 		  break;
@@ -3318,6 +3326,7 @@ process_alt_operands (int only_alternative)
 	  goal_alt_offmemok[nop] = curr_alt_offmemok[nop];
 	}
 	  goal_alt_dont_inherit_ops_num = curr_alt_dont_inherit_ops_num;
+	  goal_reuse_alt_p = curr_reuse_alt_p;
 	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
 	goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
 	  goal_alt_swapped = curr_swapped;
@@ -4399,7 +4408,8 @@ curr_insn_transform (bool check_only_p)
 }
 
   lra_assert (goal_alt_number >= 0);
-  lra_set_used_insn_alternative (curr_insn, goal_alt_number);
+  lra_set_used_insn_alternative (curr_insn, goal_reuse_alt_p
+ ? goal_alt_number : LRA_UNKNOWN_ALT);
 
   if (lra_dump_file != NULL)
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr111225.c b/gcc/testsuite/gcc.target/i386/pr111225.c
new file mode 100644
index 000..5d92daf215b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111225.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fsanitize=thread -mforce-drap -mavx512cd" } */
+
+typedef long long __m256i __attribute__ ((__vector_size__ (32)));
+
+int foo (__m256i x, __m256i y)
+{
+  __m256i a = x & ~y;
+  return !__builtin_ia32_ptestz256 (a, a);
+}
+
+int bar (__m256i x, __m256i y)
+{
+  __m256i a = ~x & y;
+  return !__builtin_ia32_ptestz256 (a, a);
+}


Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-08 Thread Vladimir Makarov via Gcc-patches



On 8/31/23 04:20, Hongyu Wang wrote:

@@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression 
(@code{MEM} for the top level
  of an address, @code{ADDRESS} for something that occurs in an
  @code{address_operand}).  @var{index_code} is the code of the corresponding
  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
  @end defmac


I'd prefer a more general description of the 'insn' argument for the macros.  
Something like this:


@code{insn} can be used to define an insn-specific base register class.




Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-14 Thread Vladimir Makarov via Gcc-patches



On 9/10/23 00:49, Hongyu Wang wrote:

Vladimir Makarov via Gcc-patches wrote on Sat, Sep 9, 2023 at 01:04:


On 8/31/23 04:20, Hongyu Wang wrote:

@@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression 
(@code{MEM} for the top level
   of an address, @code{ADDRESS} for something that occurs in an
   @code{address_operand}).  @var{index_code} is the code of the corresponding
   index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} 
otherwise.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
   @end defmac

I'd prefer a more general description of the 'insn' argument for the macros.
Something like this:

@code{insn} can be used to define an insn-specific base register class.


Sure, will adjust in the V2 patch.
Also, currently we reuse the old macro MODE_CODE_BASE_REG_CLASS, do
you think we need a new macro like INSN_BASE_REG_CLASS as other
parameters are actually unused? Then we don't need to change other
targets like avr/gcn.

I thought about this too.  Adding new macros would definitely be worthwhile, 
especially when you are already adding INSN_INDEX_REG_CLASS.


The names INSN_BASE_REG_CLASS instead of MODE_CODE_BASE_REG_CLASS and 
REGNO_OK_FOR_INSN_BASE_P instead of REGNO_MODE_CODE_OK_FOR_BASE_P are ok 
for me too.
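As a hypothetical illustration of what such an INSN_BASE_REG_CLASS target macro could compute (the types, class names, and predicate below are stand-ins, not GCC's actual definitions), the insn-specific class is always a subset of the ordinary base register class:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for GCC's register classes and insn type.  */
enum reg_class { NO_REGS, LEGACY_GENERAL_REGS, GENERAL_REGS };
struct insn { bool egpr_capable_p; };	/* hypothetical predicate result */

/* Hypothetical INSN_BASE_REG_CLASS: insns whose encoding cannot address
   the extended GPRs get the legacy subset of the base register class;
   everything else (including a missing insn context) gets the full
   class.  */
static enum reg_class
insn_base_reg_class (const struct insn *insn)
{
  if (insn != NULL && !insn->egpr_capable_p)
    return LEGACY_GENERAL_REGS;
  return GENERAL_REGS;
}
```

The design point is that narrowing, never widening, keeps every address the RA builds valid for the insn that uses it.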


When you submit the v2 patch, I'll review the RA part as soon as 
possible (actually I have already looked at it) and most probably give my 
approval for the RA part, because I prefer your current approach for RA 
to introducing new memory constraints.




[pushed] [RA]: Improve cost calculation of pseudos with equivalences

2023-09-14 Thread Vladimir Makarov via Gcc-patches
I've committed the following patch.  The reason for this patch is 
explained in its commit message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit 3c834d85f2ec42c60995c2b678196a06cb744959
Author: Vladimir N. Makarov 
Date:   Thu Sep 14 10:26:48 2023 -0400

[RA]: Improve cost calculation of pseudos with equivalences

RISCV target developers reported that RA can spill a pseudo used in a
loop although there are enough registers to assign.  It happens when
the pseudo has an equivalence outside the loop and the equivalence is
not merged into the insns using the pseudo.  IRA sets the memory cost
to zero when the pseudo has an equivalence, which means that the
pseudo will probably be spilled.  This approach worked well for i686
(different approaches were benchmarked a long time ago on spec2k).
Still, common sense says that the code is wrong, and this was
confirmed by the RISCV developers.

I've tried the following patch on I7-9700k and it improved spec17 fp
by 1.5% (21.1 vs 20.8) although spec17 int is a bit worse by 0.45%
(8.54 vs 8.58).  The average generated code size is practically the
same (0.001% difference).

In the future we probably need to try a more sophisticated cost
calculation which takes into account that the equivalence cannot be
combined into the usage insns and the costs of the reloads caused by this.
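The effect of the change can be modeled numerically.  Under the old code, any positive equivalence saving forced the memory cost to zero, marking the pseudo as a likely spill; the patch instead subtracts the savings, so a pseudo that is expensive to keep in memory remains a register candidate.  A self-contained sketch, simplified from find_costs_and_classes (the old branch's adjustment of the per-class register costs is omitted):

```c
#include <assert.h>

/* Old behavior: an equivalence forced the memory cost toward zero,
   which usually meant the pseudo was spilled.  */
static int
old_mem_cost (int i_mem_cost, int equiv_savings)
{
  if (equiv_savings < 0)
    return -equiv_savings;
  else if (equiv_savings > 0)
    return 0;			/* pseudo will probably be spilled */
  return i_mem_cost;
}

/* New behavior: decrease the memory cost by the equivalence savings,
   keeping the rest of the cost information intact.  */
static int
new_mem_cost (int i_mem_cost, int equiv_savings)
{
  return i_mem_cost - equiv_savings;
}
```

With a memory cost of 2000 and savings of 100, the old code reports 0 (spill the pseudo) while the new code reports 1900, leaving the allocator free to keep the pseudo in a register inside the loop.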

gcc/ChangeLog:

* ira-costs.cc (find_costs_and_classes): Decrease memory cost
by equiv savings.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index d9e700e8947..8c93ace5094 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1947,15 +1947,8 @@ find_costs_and_classes (FILE *dump_file)
 	}
 	  if (i >= first_moveable_pseudo && i < last_moveable_pseudo)
 	i_mem_cost = 0;
-	  else if (equiv_savings < 0)
-	i_mem_cost = -equiv_savings;
-	  else if (equiv_savings > 0)
-	{
-	  i_mem_cost = 0;
-	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-		i_costs[k] += equiv_savings;
-	}
-
+	  else
+	i_mem_cost -= equiv_savings;
 	  best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
 	  best = ALL_REGS;
 	  alt_class = NO_REGS;


Re: [PATCH] ira: Consider save/restore costs of callee-save registers [PR110071]

2023-09-15 Thread Vladimir Makarov via Gcc-patches



On 9/14/23 06:45, Surya Kumari Jangala wrote:

ira: Consider save/restore costs of callee-save registers [PR110071]

In improve_allocation() routine, IRA checks for each allocno if spilling
any conflicting allocnos can improve the allocation of this allocno.
This routine computes the cost improvement for usage of each profitable
hard register for a given allocno. The existing code in
improve_allocation() does not consider the save/restore costs of callee
save registers while computing the cost improvement.

This can result in a callee save register being assigned to a pseudo
that is live in the entire function and across a call, overriding a
non-callee save register assigned to the pseudo by graph coloring. So
the entry basic block requires a prolog, thereby causing shrink wrap to
fail.


Yes, that can be a problem.  The general idea is ok for me, and common 
sense tells me that the performance should be better, but I would like to 
benchmark the patch on x86-64 spec2017 first.  Real applications have 
high register pressure and the results might not be what we expect.  So I'll 
do it, report the results, and give my approval if there is no big 
performance degradation.  I think the results will be ready on Monday.





Re: [PATCH] ira: Consider save/restore costs of callee-save registers [PR110071]

2023-09-18 Thread Vladimir Makarov via Gcc-patches



On 9/15/23 10:48, Vladimir Makarov wrote:


On 9/14/23 06:45, Surya Kumari Jangala wrote:

ira: Consider save/restore costs of callee-save registers [PR110071]

In improve_allocation() routine, IRA checks for each allocno if spilling
any conflicting allocnos can improve the allocation of this allocno.
This routine computes the cost improvement for usage of each profitable
hard register for a given allocno. The existing code in
improve_allocation() does not consider the save/restore costs of callee
save registers while computing the cost improvement.

This can result in a callee save register being assigned to a pseudo
that is live in the entire function and across a call, overriding a
non-callee save register assigned to the pseudo by graph coloring. So
the entry basic block requires a prolog, thereby causing shrink wrap to
fail.


Yes, that can be a problem.  The general idea is ok for me, and common 
sense tells me that the performance should be better, but I would like 
to benchmark the patch on x86-64 spec2017 first.  Real applications 
have high register pressure and the results might not be what we expect.  
So I'll do it, report the results, and give my approval if there is no 
big performance degradation.  I think the results will be ready on 
Monday.



I've benchmarked the patch on x86-64.  The specint2017 rate changed from 
8.54 to 8.51 and the specfp2017 rate changed from 21.1 to 21.2.  This is 
probably within the range of measurement error.


So the patch is ok for me to commit.  Thank you for working on the issue.




Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-25 Thread Vladimir Makarov via Gcc-patches



On 2020-08-25 6:18 a.m., Alex Coplan wrote:

The motivation here is to be able to remove several redundant patterns
in the AArch64 backend. See the previous thread [1] for context.

Testing:
  * Bootstrapped and regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu.
  * New unit test which checks that we're using the shift pattern (rather
than the problematic mult pattern) on AArch64.

OK for master?

Yes. Thank you for working on this issue and the patch.

Thanks,
Alex

[0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
[1] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552066.html

---

gcc/ChangeLog:

* lra-constraints.c (canonicalize_reload_addr): New.
(curr_insn_transform): Use canonicalize_reload_addr to ensure we
generate canonical RTL for an address reload.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mem-shift-canonical.c: New test.
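The canonicalization itself reduces to rewriting (mult X (const_int 2^N)) as (ashift X N) inside an address.  A self-contained model of the power-of-two test involved (the real canonicalize_reload_addr works on RTL and uses GCC's exact_log2; this standalone helper is only illustrative):

```c
#include <assert.h>

/* Return N such that mult == 1 << N, or -1 if mult is not a power of
   two.  Models the exact_log2 check used when deciding whether
   (mult X (const_int mult)) can be rewritten as the canonical
   (ashift X N) in a reloaded address.  */
static int
shift_amount_for_mult (unsigned long long mult)
{
  if (mult == 0 || (mult & (mult - 1)) != 0)
    return -1;			/* not a power of two: keep the mult */
  int n = 0;
  while ((mult >> n) != 1)
    n++;
  return n;
}
```

A scale of 4 becomes a shift by 2, matching the AArch64 shift patterns the canonical form maps onto; a non-power-of-two scale is left alone.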





Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-26 Thread Vladimir Makarov via Gcc-patches



On 2020-08-26 5:06 a.m., Richard Sandiford wrote:

Alex Coplan  writes:

Minor nit, should be formatted as:

static rtx
canonicalize_reload_addr (rtx addr)

Sorry for missing this.  Alex, it should be fixed anyway.


I don't think we should we restrict this to (plus (mult X Y) Z),
since addresses can be more complicated than that.  One way to
search for all MULTs is:

  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, x, NONCONST)
    {
      rtx x = *iter;
      if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
	...
    }

(Needs rtl-iter.h)


I agree it would be nice to process the general case.  Alex, you can do 
this as a separate patch if you want.


Richard, thank you for looking at this patch too.




Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-26 Thread Vladimir Makarov via Gcc-patches

On 2020-08-26 11:15 a.m., Alex Coplan wrote:

Thanks for the review, both.


Please find a reworked version of the patch attached incorporating
Richard's feedback.

Testing:
  * Bootstrap and regtest on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.

Is the updated patch OK for master?


Yes.  Thank you, Alex.



Re: [PATCH] lra: Avoid cycling on certain subreg reloads [PR96796]

2020-09-04 Thread Vladimir Makarov via Gcc-patches
Richard, thank you for working on this issue and for your usual detailed 
explanation of the problem.


On 2020-08-28 9:52 a.m., Richard Sandiford wrote:

...



The patch is quite aggressive in that it does this for all reload
pseudos in all reload instructions.  I wondered about reusing the
condition for a reload move in in_class_p:

   INSN_UID (curr_insn) >= new_insn_uid_start
   && curr_insn_set != NULL
   && ((OBJECT_P (SET_SRC (curr_insn_set))
&& ! CONSTANT_P (SET_SRC (curr_insn_set)))
   || (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
   && OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
   && ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set)))

but I can't really justify that on first principles.  I think we
should apply the rule consistently until we have a specific reason
for doing otherwise.
I cannot predict how the patch will behave, and I cannot say either 
that a less aggressive patch would be better.  So let us try.


OK for trunk?

Yes, thank you again.

   If so, I think we should leave it on trunk for a bit
before backporting.


...

It seems reasonable.


I'm off next week, so if the patch is OK, I'll hold off applying
until I get back.

Richard


gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.



