[PATCH v3 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-14 Thread dhruvc
From: Dhruv Chawla 

This patch modifies the shift expander to lower constant shifts
immediately, without wrapping them in an unspec. It also modifies the
ADR, SRA and ADDHNB patterns to match the lowered forms of the shifts,
as these instructions do not require a predicate register.
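
As a rough illustration in plain C (a toy model, not GCC internals): a predicated vector shift only computes active lanes, but when the governing predicate is all-true the result is identical to the unpredicated shift, so nothing is lost by dropping the predicate for constant shift amounts:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a predicated vector shift: only active lanes are
   computed; inactive lanes keep their old value.  */
static void shift_predicated (uint32_t *dst, const uint32_t *src,
                              const int *pred, int n, int amt)
{
  for (int i = 0; i < n; i++)
    if (pred[i])
      dst[i] = src[i] << amt;
}

/* The unpredicated form computes every lane.  With an all-true
   predicate the two agree, which is why the predicate register is
   redundant for these constant shifts.  */
static void shift_unpredicated (uint32_t *dst, const uint32_t *src,
                                int n, int amt)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] << amt;
}
```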

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-sve.md (@aarch64_adr_shift):
Match lowered form of ashift.
(*aarch64_adr_shift): Likewise.
(*aarch64_adr_shift_sxtw): Likewise.
(*aarch64_adr_shift_uxtw): Likewise.
(3): Avoid moving legal immediate shift
amounts into a new register.
(v3): Generate unpredicated shifts for constant
operands.
(*post_ra_v_ashl3): Rename to ...
(aarch64_vashl3_const): ... this and remove reload requirement.
(*post_ra_v_3): Rename to ...
(aarch64_v3_const): ... this and remove reload
requirement.
* gcc/config/aarch64/aarch64-sve2.md
(@aarch64_sve_add_): Match lowered form of
SHIFTRT.
(*aarch64_sve2_sra): Likewise.
(*bitmask_shift_plus): Match lowered form of lshiftrt.
---
 gcc/config/aarch64/aarch64-sve.md  | 90 +-
 gcc/config/aarch64/aarch64-sve2.md | 46 +--
 2 files changed, 53 insertions(+), 83 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bf7569f932b..cb88d6d95a6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4234,80 +4234,57 @@
 (define_expand "@aarch64_adr_shift"
   [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
(plus:SVE_FULL_SDI
- (unspec:SVE_FULL_SDI
-   [(match_dup 4)
-(ashift:SVE_FULL_SDI
-  (match_operand:SVE_FULL_SDI 2 "register_operand")
-  (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:SVE_FULL_SDI
+   (match_operand:SVE_FULL_SDI 2 "register_operand")
+   (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
  (match_operand:SVE_FULL_SDI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {
-operands[4] = CONSTM1_RTX (mode);
-  }
 )
 
-(define_insn_and_rewrite "*aarch64_adr_shift"
+(define_insn "*aarch64_adr_shift"
   [(set (match_operand:SVE_24I 0 "register_operand" "=w")
(plus:SVE_24I
- (unspec:SVE_24I
-   [(match_operand 4)
-(ashift:SVE_24I
-  (match_operand:SVE_24I 2 "register_operand" "w")
-  (match_operand:SVE_24I 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:SVE_24I
+   (match_operand:SVE_24I 2 "register_operand" "w")
+   (match_operand:SVE_24I 3 "const_1_to_3_operand"))
  (match_operand:SVE_24I 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0., [%1., %2., lsl %3]"
-  "&& !CONSTANT_P (operands[4])"
-  {
-operands[4] = CONSTM1_RTX (mode);
-  }
 )
 
 ;; Same, but with the index being sign-extended from the low 32 bits.
 (define_insn_and_rewrite "*aarch64_adr_shift_sxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
(plus:VNx2DI
- (unspec:VNx2DI
-   [(match_operand 4)
-(ashift:VNx2DI
-  (unspec:VNx2DI
-[(match_operand 5)
- (sign_extend:VNx2DI
-   (truncate:VNx2SI
- (match_operand:VNx2DI 2 "register_operand" "w")))]
-UNSPEC_PRED_X)
-  (match_operand:VNx2DI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:VNx2DI
+   (unspec:VNx2DI
+ [(match_operand 4)
+  (sign_extend:VNx2DI
+(truncate:VNx2SI
+  (match_operand:VNx2DI 2 "register_operand" "w")))]
+UNSPEC_PRED_X)
+   (match_operand:VNx2DI 3 "const_1_to_3_operand"))
  (match_operand:VNx2DI 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, sxtw %3]"
-  "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))"
+  "&& !CONSTANT_P (operands[4])"
   {
-operands[5] = operands[4] = CONSTM1_RTX (VNx2BImode);
+operands[4] = CONSTM1_RTX (VNx2BImode);
   }
 )
 
 ;; Same, but with the index being zero-extended from the low 32 bits.
-(define_insn_and_rewrite "*aarch64_adr_shift_uxtw"
+(define_insn "*aarch64_adr_shift_uxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
(plus:VNx2DI
- (unspec:VNx2DI
-   [(match_operand 5)
-(ashift:VNx2DI
-  (and:VNx2DI
-(match_operand:VNx2DI 2 "register_operand" "w")
-(match_operand:VNx2DI 4 "aarch64_sve_uxtw_immediate"))
-  (match_operand:VNx2DI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:VNx2DI
+   

[PATCH v3 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-14 Thread dhruvc
From: Dhruv Chawla 

This patch modifies the intrinsic expanders to expand svlsl and svlsr to
unpredicated forms when the predicate is a ptrue. It also folds the
following pattern:

  lsl , , 
  lsr , , 
  orr , , 

to:

  revb/h/w , 

when the shift amount is equal to half the bitwidth of the register.
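
The underlying identity can be checked in plain C for 16-bit elements (a standalone illustration, not part of the patch): rotating by half the element width swaps the element's two halves, which is the per-element effect of REVB on halfwords:

```c
#include <assert.h>
#include <stdint.h>

/* Rotating a 16-bit value by half its width (8 bits) swaps its two
   bytes -- the same per-element effect as the SVE REVB instruction on
   halfword elements, hence the lsl+lsr+orr -> revb fold.  */
static uint16_t rot_half (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}
```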

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 
Co-authored-by: Richard Sandiford 

gcc/ChangeLog:

* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if the
target already provided the result in the expected register.
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
Avoid forcing subregs into fresh registers unnecessarily.
* config/aarch64/aarch64-sve-builtins-base.cc
(svlsl_impl::expand): Define.
(svlsr_impl): New class.
(svlsr_impl::fold): Define.
(svlsr_impl::expand): Likewise.
* config/aarch64/aarch64-sve.md: Add define_split for rotate.
(*v_revvnx8hi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/shift_rev_1.c: New test.
* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 33 +++-
 gcc/config/aarch64/aarch64-sve.md | 55 
 gcc/config/aarch64/aarch64.cc | 10 ++-
 gcc/expmed.cc |  3 +-
 .../gcc.target/aarch64/sve/shift_rev_1.c  | 83 +++
 .../gcc.target/aarch64/sve/shift_rev_2.c  | 63 ++
 .../gcc.target/aarch64/sve/shift_rev_3.c  | 83 +++
 7 files changed, 326 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_3.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index b4396837c24..90dd5c97a10 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2086,6 +2086,37 @@ public:
   {
 return f.fold_const_binary (LSHIFT_EXPR);
   }
+
+  rtx expand (function_expander &e) const override
+  {
+tree pred = TREE_OPERAND (e.call_expr, 3);
+tree shift = TREE_OPERAND (e.call_expr, 5);
+if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+   && uniform_integer_cst_p (shift))
+  return e.use_unpred_insn (e.direct_optab_handler (ashl_optab));
+return rtx_code_function::expand (e);
+  }
+};
+
+class svlsr_impl : public rtx_code_function
+{
+public:
+  CONSTEXPR svlsr_impl () : rtx_code_function (LSHIFTRT, LSHIFTRT) {}
+
+  gimple *fold (gimple_folder &f) const override
+  {
+return f.fold_const_binary (RSHIFT_EXPR);
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+tree pred = TREE_OPERAND (e.call_expr, 3);
+tree shift = TREE_OPERAND (e.call_expr, 5);
+if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+   && uniform_integer_cst_p (shift))
+  return e.use_unpred_insn (e.direct_optab_handler (lshr_optab));
+return rtx_code_function::expand (e);
+  }
 };
 
 class svmad_impl : public function_base
@@ -3586,7 +3617,7 @@ FUNCTION (svldnt1, svldnt1_impl,)
 FUNCTION (svlen, svlen_impl,)
 FUNCTION (svlsl, svlsl_impl,)
 FUNCTION (svlsl_wide, shift_wide, (ASHIFT, UNSPEC_ASHIFT_WIDE))
-FUNCTION (svlsr, rtx_code_function, (LSHIFTRT, LSHIFTRT))
+FUNCTION (svlsr, svlsr_impl,)
 FUNCTION (svlsr_wide, shift_wide, (LSHIFTRT, UNSPEC_LSHIFTRT_WIDE))
 FUNCTION (svmad, svmad_impl,)
 FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX,
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index cb88d6d95a6..0156afc1e7d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3317,6 +3317,61 @@
 ;; - REVW
 ;; -------------------------------------------------------------------------
 
+(define_split
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand")
+   (rotate:SVE_FULL_HSDI
+ (match_operand:SVE_FULL_HSDI 1 "register_operand")
+ (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+   (ashift:SVE_FULL_HSDI (match_dup 1)
+ (match_dup 2)))
+   (set (match_dup 0)
+   (plus:SVE_FULL_HSDI
+ (lshiftrt:SVE_FULL_HSDI (match_dup 1)
+ (match_dup 4))
+ (match_dup 3)))]
+  {
+if (aarch64_emit_opt_vec_rotate (operands[0], operands[1], operands[2]))
+  DONE;
+
+if (!TARGET_SVE2)
+  FAIL;
+
+operands[3] = gen_reg_rtx (mode);
+HOST_WIDE_INT shift_amount =
+  INTVAL (unwrap_const_vec_duplicate (operands[2]));
+int bitwidth = GET_MODE_UNIT_BITSIZE (mode);
+

[PATCH v4 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-21 Thread dhruvc
From: Dhruv Chawla 

This patch modifies the shift expander to lower constant shifts
immediately, without wrapping them in an unspec. It also modifies the
ADR, SRA and ADDHNB patterns to match the lowered forms of the shifts,
as these instructions do not require a predicate register.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 
Co-authored-by: Richard Sandiford 

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-sve.md (@aarch64_adr_shift):
Match lowered form of ashift.
(*aarch64_adr_shift): Likewise.
(*aarch64_adr_shift_sxtw): Likewise.
(*aarch64_adr_shift_uxtw): Likewise.
(3): Check amount instead of operands[2] in
aarch64_sve_shift_operand.
(v3): Generate unpredicated shifts for constant
operands.
(@aarch64_pred_): Convert to a define_expand.
(*aarch64_pred_): Create define_insn_and_split pattern
from @aarch64_pred_.
(*post_ra_v_ashl3): Rename to ...
(aarch64_vashl3_const): ... this and remove reload requirement.
(*post_ra_v_3): Rename to ...
(aarch64_v3_const): ... this and remove reload
requirement.
* gcc/config/aarch64/aarch64-sve2.md
(@aarch64_sve_add_): Match lowered form of
SHIFTRT.
(*aarch64_sve2_sra): Likewise.
(*bitmask_shift_plus): Match lowered form of lshiftrt.
---
 gcc/config/aarch64/aarch64-sve.md  | 119 +++--
 gcc/config/aarch64/aarch64-sve2.md |  46 ---
 2 files changed, 75 insertions(+), 90 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bf7569f932b..e1ec778b10d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4234,80 +4234,57 @@
 (define_expand "@aarch64_adr_shift"
   [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
(plus:SVE_FULL_SDI
- (unspec:SVE_FULL_SDI
-   [(match_dup 4)
-(ashift:SVE_FULL_SDI
-  (match_operand:SVE_FULL_SDI 2 "register_operand")
-  (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:SVE_FULL_SDI
+   (match_operand:SVE_FULL_SDI 2 "register_operand")
+   (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
  (match_operand:SVE_FULL_SDI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {
-operands[4] = CONSTM1_RTX (mode);
-  }
 )
 
-(define_insn_and_rewrite "*aarch64_adr_shift"
+(define_insn "*aarch64_adr_shift"
   [(set (match_operand:SVE_24I 0 "register_operand" "=w")
(plus:SVE_24I
- (unspec:SVE_24I
-   [(match_operand 4)
-(ashift:SVE_24I
-  (match_operand:SVE_24I 2 "register_operand" "w")
-  (match_operand:SVE_24I 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:SVE_24I
+   (match_operand:SVE_24I 2 "register_operand" "w")
+   (match_operand:SVE_24I 3 "const_1_to_3_operand"))
  (match_operand:SVE_24I 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0., [%1., %2., lsl %3]"
-  "&& !CONSTANT_P (operands[4])"
-  {
-operands[4] = CONSTM1_RTX (mode);
-  }
 )
 
 ;; Same, but with the index being sign-extended from the low 32 bits.
 (define_insn_and_rewrite "*aarch64_adr_shift_sxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
(plus:VNx2DI
- (unspec:VNx2DI
-   [(match_operand 4)
-(ashift:VNx2DI
-  (unspec:VNx2DI
-[(match_operand 5)
- (sign_extend:VNx2DI
-   (truncate:VNx2SI
- (match_operand:VNx2DI 2 "register_operand" "w")))]
-UNSPEC_PRED_X)
-  (match_operand:VNx2DI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:VNx2DI
+   (unspec:VNx2DI
+ [(match_operand 4)
+  (sign_extend:VNx2DI
+(truncate:VNx2SI
+  (match_operand:VNx2DI 2 "register_operand" "w")))]
+UNSPEC_PRED_X)
+   (match_operand:VNx2DI 3 "const_1_to_3_operand"))
  (match_operand:VNx2DI 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, sxtw %3]"
-  "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))"
+  "&& !CONSTANT_P (operands[4])"
   {
-operands[5] = operands[4] = CONSTM1_RTX (VNx2BImode);
+operands[4] = CONSTM1_RTX (VNx2BImode);
   }
 )
 
 ;; Same, but with the index being zero-extended from the low 32 bits.
-(define_insn_and_rewrite "*aarch64_adr_shift_uxtw"
+(define_insn "*aarch64_adr_shift_uxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
(plus:VNx2DI
- (unspec:VNx2DI
-   [(match_operand 5)
-(ashift:VNx2DI
-  (and:VNx2DI
-(match_operand:VNx2DI 2 "register_operand" "w")
-

[PATCH v4 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-21 Thread dhruvc
From: Dhruv Chawla 

This patch folds the following pattern:

  lsl , , 
  lsr , , 
  orr , , 

to:

  revb/h/w , 

when the shift amount is equal to half the bitwidth of the register.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 
Co-authored-by: Richard Sandiford 

gcc/ChangeLog:

* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if the
target already provided the result in the expected register.
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
Avoid forcing subregs into fresh registers unnecessarily.
* config/aarch64/aarch64-sve.md: Add define_split for rotate.
(*v_revvnx8hi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/shift_rev_1.c: New test.
* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 55 
 gcc/config/aarch64/aarch64.cc | 10 ++-
 gcc/expmed.cc |  3 +-
 .../gcc.target/aarch64/sve/shift_rev_1.c  | 83 +++
 .../gcc.target/aarch64/sve/shift_rev_2.c  | 63 ++
 .../gcc.target/aarch64/sve/shift_rev_3.c  | 83 +++
 6 files changed, 294 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_3.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index e1ec778b10d..fa431c9c060 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3317,6 +3317,61 @@
 ;; - REVW
 ;; -------------------------------------------------------------------------
 
+(define_split
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand")
+   (rotate:SVE_FULL_HSDI
+ (match_operand:SVE_FULL_HSDI 1 "register_operand")
+ (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+   (ashift:SVE_FULL_HSDI (match_dup 1)
+ (match_dup 2)))
+   (set (match_dup 0)
+   (plus:SVE_FULL_HSDI
+ (lshiftrt:SVE_FULL_HSDI (match_dup 1)
+ (match_dup 4))
+ (match_dup 3)))]
+  {
+if (aarch64_emit_opt_vec_rotate (operands[0], operands[1], operands[2]))
+  DONE;
+
+if (!TARGET_SVE2)
+  FAIL;
+
+operands[3] = gen_reg_rtx (mode);
+HOST_WIDE_INT shift_amount =
+  INTVAL (unwrap_const_vec_duplicate (operands[2]));
+int bitwidth = GET_MODE_UNIT_BITSIZE (mode);
+operands[4] = aarch64_simd_gen_const_vector_dup (mode,
+bitwidth - shift_amount);
+  }
+)
+
+;; The RTL combiners are able to combine "ior (ashift, lshiftrt)" to a "bswap".
+;; Match that as well.
+(define_insn_and_split "*v_revvnx8hi"
+  [(parallel
+[(set (match_operand:VNx8HI 0 "register_operand")
+ (bswap:VNx8HI (match_operand 1 "register_operand")))
+ (clobber (match_scratch:VNx8BI 2))])]
+  "TARGET_SVE"
+  "#"
+  ""
+  [(set (match_dup 0)
+   (unspec:VNx8HI
+ [(match_dup 2)
+  (unspec:VNx8HI
+[(match_dup 1)]
+UNSPEC_REVB)]
+ UNSPEC_PRED_X))]
+  {
+if (!can_create_pseudo_p ())
+  operands[2] = CONSTM1_RTX (VNx8BImode);
+else
+  operands[2] = aarch64_ptrue_reg (VNx8BImode);
+  }
+)
+
 ;; Predicated integer unary operations.
 (define_insn "@aarch64_pred_"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1da615c8955..7cdd5fda903 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27067,11 +27067,17 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   d.op_mode = op_mode;
   d.op_vec_flags = aarch64_classify_vector_mode (d.op_mode);
   d.target = target;
-  d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
+  d.op0 = op0;
+  if (d.op0 && !register_operand (d.op0, op_mode))
+d.op0 = force_reg (op_mode, d.op0);
   if (op0 && d.one_vector_p)
 d.op1 = copy_rtx (d.op0);
   else
-d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;
+{
+  d.op1 = op1;
+  if (d.op1 && !register_operand (d.op1, op_mode))
+   d.op1 = force_reg (op_mode, d.op1);
+}
   d.testing_p = !target;
 
   if (!d.testing_p)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 72dbafe5d9f..deb4e48d14f 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -6324,7 +6324,8 @@ expand_rotate_as_vec_perm (machine_mode mode, rtx dst, rtx x, rtx amt)
 qimode, perm_dst);
   if (!res)
 return NULL_RTX;
-  emit_move_insn (dst, lowpart_subreg (mode, res, qimode));
+  if (!rtx_equal_p (res, perm_dst

[PATCH] [MAINTAINERS] Add myself to write after approval and DCO.

2025-05-22 Thread dhruvc
From: Dhruv Chawla 

Committed as 3213828f74f2f27a2dd91792cef27117ba1a522e.

ChangeLog:

* MAINTAINERS: Add myself to write after approval and DCO.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8993d176c22..f40d6350462 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -402,6 +402,7 @@ Stephane Carrez ciceron 

 Gabriel Charette gchare  
 Arnaud Charlet  charlet 
 Chandra Chavva  -   
+Dhruv Chawladhruvc  
 Dehao Chen  dehao   
 Fabien ChĂȘne fabien  
 Bin Cheng   amker   
@@ -932,6 +933,7 @@ information.
 
 
 Soumya AR   
+Dhruv Chawla
 Juergen Christ  
 Giuseppe D'Angelo   
 Robin Dapp  
-- 
2.39.3 (Apple Git-146)



[PATCH] widening_mul: Make better use of overflowing operations in codegen of min/max(a, add/sub(a, b))

2025-05-29 Thread dhruvc
From: Dhruv Chawla 

This patch folds the following patterns:
- max (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? sum : a
- max (a, sub (a, b)) -> [sum, ovf] = subo (a, b); !ovf ? a : sum
- min (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? a : sum
- min (a, sub (a, b)) -> [sum, ovf] = subo (a, b); !ovf ? sum : a

where ovf is the overflow flag, and addo and subo are overflowing addition
and subtraction, respectively. The folded patterns can normally be implemented
as an overflowing operation combined with a conditional move/select instruction.
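
A plain-C sketch of two of the folds, using the GCC/Clang overflow builtins (an illustration of the arithmetic only; the patch itself works on GIMPLE):

```c
#include <assert.h>
#include <limits.h>

/* max (a, a + b) on unsigned values: if the addition wraps, a + b < a
   and the maximum is a; otherwise the sum is >= a and wins.  */
static unsigned max_add_folded (unsigned a, unsigned b)
{
  unsigned sum;
  return __builtin_add_overflow (a, b, &sum) ? a : sum;
}

/* min (a, a - b): if the subtraction wraps, a - b > a and the minimum
   is a; otherwise the difference is <= a and wins.  */
static unsigned min_sub_folded (unsigned a, unsigned b)
{
  unsigned diff;
  return __builtin_sub_overflow (a, b, &diff) ? a : diff;
}
```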

Explanation for the conditions handled in arith_overflow_check_p:

Case 1/2: r = a + b; max/min (a, r) or max/min (r, a)
  lhs (r)
if crhs1 (a) and crhs2 (r)
  => lhs (r) == crhs2 (r) &&
 (rhs1 (a or b) == crhs1 (a) || rhs2 (a or b) == crhs1 (a))
if crhs1 (r) and crhs2 (a)
  => lhs (r) == crhs1 (r) &&
 (rhs1 (a or b) == crhs2 (a) || rhs2 (a or b) == crhs2 (a))

Both rhs1 and rhs2 are checked in (rhs == crhs) as addition is
commutative.

Case 3/4: r = a - b; max/min (a, r) or max/min (r, a)
  lhs (r)
if crhs1 (a) and crhs2 (r)
  => lhs (r) == crhs2 (r) && rhs1 (a) == crhs1 (a)
if crhs1 (r) and crhs2 (a)
  => lhs (r) == crhs1 (r) && rhs1 (a) == crhs2 (a)

Bootstrapped and regtested on aarch64-unknown-linux-gnu.

Signed-off-by: Dhruv Chawla 

gcc/ChangeLog:

PR middle-end/116815
* tree-ssa-math-opts.cc (arith_overflow_check_p): Match min/max
patterns.
(build_minmax_replacement_statements): New function.
(match_arith_overflow): Update to handle min/max patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116815-1.c: New test.
* gcc.dg/tree-ssa/pr116815-2.c: Likewise.
* gcc.dg/tree-ssa/pr116815-3.c: Likewise.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c |  42 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c |  93 +
 gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c |  43 ++
 gcc/tree-ssa-math-opts.cc  | 151 +++--
 4 files changed, 318 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
new file mode 100644
index 000..5d62843d63c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* PR middle-end/116815 */
+
+/* Single-use tests.  */
+
+static inline unsigned
+max (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned
+min (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2) \
+  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+OPERATION (max, add, 1, a, a + b)
+OPERATION (max, add, 2, a, b + a)
+OPERATION (max, add, 3, a + b, a)
+OPERATION (max, add, 4, b + a, a)
+
+OPERATION (min, add, 1, a, a + b)
+OPERATION (min, add, 2, a, b + a)
+OPERATION (min, add, 3, a + b, a)
+OPERATION (min, add, 4, b + a, a)
+
+OPERATION (max, sub, 1, a, a - b)
+OPERATION (max, sub, 2, a - b, a)
+
+OPERATION (min, sub, 1, a, a - b)
+OPERATION (min, sub, 2, a - b, a)
+
+/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
new file mode 100644
index 000..56e8038ef82
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
@@ -0,0 +1,93 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* PR middle-end/116815 */
+
+/* Negative tests.  */
+
+static inline int
+smax (int a, int b)
+{
+  return a > b ? a : b;
+}
+
+static inline int
+smin (int a, int b)
+{
+  return a < b ? a : b;
+}
+
+static inline unsigned
+umax (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned
+umin (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+#define ASSUME(cond) if (!(cond)) __builtin_unreachable ();
+
+/* This transformation does not trigger on signed types.  */
+
+int
+smax_add (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smax (a, a + b);
+}
+
+int
+smin_add (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smin (a, a + b);
+}
+
+int
+smax_sub (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smax (a, a - b);
+}
+
+int
+smin_sub (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smin (a, a - b);
+}
+
+/* Invalid patterns.  */
+
+/* This can potentially be matched, but the RHS gets factored to
+   (a + b) * b.  */
+unsigned
+umax_fa

[PATCH] [RFC][AutoFDO] Source filename tracking in GCOV

2025-06-16 Thread dhruvc
From: Dhruv Chawla 

This patch modifies the auto-profile pass to read file names from GCOV. A
function is only annotated with a set of profile counts if its file name
matches the file name that the function in the GCOV file was recorded
with. It also bumps the GCOV version to 3 as the file format has
changed.

gcc/ChangeLog:

* auto-profile.cc (AUTO_PROFILE_VERSION): Bump from 2 to 3.
(string_table::get_real_name): Define new member function.
(string_table::get_file_name): Likewise.
(string_table::get_file_name_idx): Likewise.
(string_table::real_names_): Define new class member.
(string_table::file_names_): Likewise.
(string_table::file_map_): Likewise.
(string_table::name_instance_map): Update to be a 2-dimensional
map from function name to file name to function_instance *.
(string_table::~string_table): Deallocate from real_names_ and
file_names_ as well.
(string_table::read): Read file name header from GCOV file.
(autofdo_source_profile::~autofdo_source_profile): Deallocate
from nested map.
(autofdo_source_profile::read): Use file name when storing entry
into map_.
(autofdo_source_profile::get_function_instance_by_decl): Use
DECL_SOURCE_FILE for function_instance lookup.
(autofdo_source_profile::get_function_instance_by_inline_stack):
Likewise.
---
 gcc/auto-profile.cc   | 101 ++
 gcc/testsuite/lib/profopt.exp |   2 +-
 2 files changed, 91 insertions(+), 12 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index e12b3048f20..de186598d43 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -99,7 +99,7 @@ along with GCC; see the file COPYING3.  If not see
 */
 
 #define DEFAULT_AUTO_PROFILE_FILE "fbdata.afdo"
-#define AUTO_PROFILE_VERSION 2
+#define AUTO_PROFILE_VERSION 3
 
 namespace autofdo
 {
@@ -182,13 +182,25 @@ public:
   /* For a given index, returns the string.  */
   const char *get_name (int index) const;
 
+  /* For a given index, get the name with its suffix intact (not stripped).  */
+  const char *get_real_name (int index) const;
+
+  /* For a given suffixed function name, get the source file name if known.  */
+  const char *get_file_name (const char *real_name) const;
+
+  /* Get the file name from the function name index.  */
+  const char *get_file_name_idx (int index) const;
+
   /* Read profile, return TRUE on success.  */
   bool read ();
 
 private:
   typedef std::map string_index_map;
   string_vector vector_;
+  string_vector real_names_;
+  string_vector file_names_;
   string_index_map map_;
+  string_index_map file_map_;
 };
 
 /* Profile of a function instance:
@@ -318,8 +330,10 @@ public:
 
 private:
   /* Map from function_instance name index (in string_table) to
- function_instance.  */
-  typedef std::map name_function_instance_map;
+ map from source file name to function_instance.  */
+  typedef std::map>
+name_function_instance_map;
 
   autofdo_source_profile () {}
 
@@ -457,7 +471,12 @@ has_indirect_call (basic_block bb)
 string_table::~string_table ()
 {
   for (unsigned i = 0; i < vector_.length (); i++)
-free (vector_[i]);
+{
+  free (vector_[i]);
+  free (real_names_[i]);
+}
+  for (unsigned i = 0; i < file_names_.length (); i++)
+free (file_names_[i]);
 }
 
 
@@ -506,6 +525,34 @@ string_table::get_name (int index) const
   return vector_[index];
 }
 
+/* For a given index, get the name with its suffix intact (not stripped).  */
+
+const char *
+string_table::get_real_name (int index) const
+{
+  gcc_assert (index > 0 && index < (int) real_names_.length ());
+  return real_names_[index];
+}
+
+/* For a given suffixed function name, get the source file name if known.  */
+
+const char *
+string_table::get_file_name (const char *real_name) const
+{
+  auto it = file_map_.find (real_name);
+  if (it != file_map_.end () && it->second < file_names_.length ())
+return file_names_[it->second];
+  else
+return "";
+}
+
+/* Get the file name from the function name index.  */
+const char *
+string_table::get_file_name_idx (int index) const
+{
+  return get_file_name (get_real_name (index));
+}
+
 /* Read the string table. Return TRUE if reading is successful.  */
 
 bool
@@ -515,12 +562,21 @@ string_table::read ()
 return false;
   /* Skip the length of the section.  */
   gcov_read_unsigned ();
+  /* Read in the file names.  */
+  unsigned file_num = gcov_read_unsigned ();
+  for (unsigned i = 0; i < file_num; i++)
+file_names_.safe_push (const_cast (gcov_read_string ()));
   /* Read in the file name table.  */
   unsigned string_num = gcov_read_unsigned ();
   for (unsigned i = 0; i < string_num; i++)
 {
-  vector_.safe_push (get_original_name (gcov_read_string ()));
+  char *string = const_cast (gcov_read_string ());
+  unsigned file_index = gcov_read_unsigned ();
+  vector_.safe_

[PATCH 0/1] [RFC][AutoFDO]: Source filename tracking in GCOV

2025-06-16 Thread dhruvc
From: Dhruv Chawla 

Introduction
------------

Per PR120229 (gcc.gnu.org/PR120229), the auto-profile pass cannot distinguish
profile information for `function_instance's with the same base name, when
suffixes are removed. To fix this, source file names should be tracked in the
GCOV file information to disambiguate functions. This issue occurs when
privatized clones are created for an LTO partition and there are static
functions with the same name in the same partition.

Proposed solution
-

1. In the string_table section of the GCOV file, each function name will have
   the source file-name that it came from written after it, in sequence. The
   current layout of the file is:

   GCOV_TAG_AFDO_FILE_NAMES
 ...

   With this change the layout becomes:

   GCOV_TAG_AFDO_FILE_NAMES
 ...
   ...

2. AUTO_PROFILE_VERSION will be increased from 2 to 3 as this is a breaking
   change to the GCOV file format used by AutoFDO.

A patch is attached with this RFC for a prototype implementation. There
is an open question here: What about backwards compatibility? Should a lack of
source file-name information be handled in the code (to keep supporting version
2)?
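
A minimal in-memory model of what the added file name buys (hypothetical C sketch, not the actual GCOV reader): keying lookups by (function name, file name) keeps same-named statics apart:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: with the version 3 format each function record
   also carries the source file it was recorded in, so two
   LTO-privatized statics whose names clash after suffix stripping
   stay distinct.  */
struct afdo_record
{
  const char *name;
  const char *file;              /* new in AUTO_PROFILE_VERSION 3 */
  unsigned long long total;
};

/* Linear lookup keyed by both name and file.  */
static const struct afdo_record *
lookup (const struct afdo_record *recs, int n,
        const char *name, const char *file)
{
  for (int i = 0; i < n; i++)
    if (!strcmp (recs[i].name, name) && !strcmp (recs[i].file, file))
      return &recs[i];
  return 0;
}
```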

Example
---

As an example, consider the following code:

=== test.c ===

#define TRIP 10

__attribute__((noinline, noipa)) static void effect_1() {}
__attribute__((noinline, noipa)) static void effect_2() {}
__attribute__((noinline, noipa)) static int foo() { return 5; }

// Prevent GCC from optimizing the loop
__attribute__((noinline, noipa)) int use(int x) { volatile int y = x; return x; }

extern void global();
int main() {
  // 1'000'000'000
  for (int i = 0; i < TRIP; i++) {
// Call only 50% of the time
if (use(i) < TRIP / 2) {
  global();
}

if (foo() < 5) {
  effect_1();
} else {
  effect_2();
}
  }
}

=== test-2.c ===

__attribute__((noinline, noipa)) static void do_nothing() {}
__attribute__((noinline, noipa)) static void effect_1() { do_nothing(); }
__attribute__((noinline, noipa)) static void effect_2() { do_nothing(); }

void global() { effect_1(); effect_2(); }

=== ===

There are four LTO privatized clones created here, two for effect_1() and
two for effect_2(). If effect_1.lto_priv.0 and effect_2.lto_priv.0 are created
for test.c, and effect_1.lto_priv.1 and effect_2.lto_priv.1 are created for
test-2.c, then:
- effect_1.lto_priv.0 is never executed
- effect_2.lto_priv.0 is executed 100% of the time
- effect_1.lto_priv.1 and effect_2.lto_priv.1 are executed 50% of the time

This is reflected in the gcov dump:

main total:3475985 head:0
  <...>
  11: 429139  effect_2.lto_priv.0:421383
  14: 0
  5: global total:407915
0: 204155  effect_1.lto_priv.1:203895
0.1: 203760  effect_2.lto_priv.1:203976
use total:436707 head:436706
  0: 436707
foo total:434247 head:434246
  0: 434247
effect_2.lto_priv.0 total:421383 head:421383
  0: 421383
do_nothing total:407756 head:407756
  0: 407756
effect_2.lto_priv.1 total:203976 head:203976
  0: 203976  do_nothing:204004
effect_1.lto_priv.1 total:203895 head:203895
  0: 203895  do_nothing:203752

Note that effect_1.lto_priv.0 does not show up at all.

When annotating the code, auto-profile is not able to distinguish between
the two effect_1 functions and ends up using the effect_1.lto_priv.1 profile
for both functions. It also merges the profiles for both effect_2 clones:

;; Function effect_1 (effect_1.lto_priv.0, funcdef_no=0, decl_uid=23321, cgraph_uid=1, symbol_order=0) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
   [count: 209702]:
  return;
}

;; Function effect_2 (effect_2.lto_priv.0, funcdef_no=1, decl_uid=23322, cgraph_uid=2, symbol_order=1) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  &lt;bb 2&gt; [count: 627698]:
  return;
}

;; Function effect_1 (effect_1.lto_priv.1, funcdef_no=4, decl_uid=23329, cgraph_uid=8, symbol_order=6) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  &lt;bb 2&gt; [count: 209702]:
  do_nothing (); [tail call]
  return;
}

;; Function effect_2 (effect_2.lto_priv.1, funcdef_no=5, decl_uid=23330, cgraph_uid=9, symbol_order=7) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  &lt;bb 2&gt; [count: 627698]:
  do_nothing (); [tail call]
  return;
}

effect_1.lto_priv.0 should actually have a 0 count, and the profiles for
effect_2.lto_priv.{0,1} should not be merged. After adding the file names to
the GCOV info, the dump looks like the following:

main:test.c total:3373660 head:0
  <...>
  11: 421399  effect_2.lto_priv.0:test.c:412102
  14: 0
  5: global:test-2.c total:403456
0: 201888  effect_1.lto_priv.1:test-2.c:201719
0.1: 201568  effect_2.lto_priv.1:test-2.c:201696
foo:test.c total:432888 head:432888
  0: 432888
use:test.c total:412260 head:412260
  0: 412260
effect_2.lto_priv.0:test.c total:412104 head:412102
  0: 412104
do_nothing:test-2.c total:403359 head:403359
  0: 403359
effect_1.lto_priv.1:test-2

[PATCH 1/1] [RFC][AutoFDO] Propagate information to outline copies if not inlined

2025-06-13 Thread dhruvc
From: Dhruv Chawla 

This patch modifies afdo_set_bb_count to propagate profile information
to outline copies of functions if they are not inlined. This information
gets lost otherwise.
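As a rough sketch of the idea (simplified stand-ins, not the actual auto-profile classes): when an inline instance from the profile has no outline counterpart in the map, it is added as a new entry; when one already exists, the counts are merged into it:

```python
class FunctionInstance:
    """Toy stand-in for auto-profile's function_instance."""
    def __init__(self, name, total_count=0):
        self.name = name
        self.total_count = total_count

    def merge(self, other):
        # Fold the inline instance's samples into the outline copy.
        self.total_count += other.total_count

def add_function_instance(outline_map, inst):
    """Return True if INST became a new outline entry, False if merged."""
    existing = outline_map.get(inst.name)
    if existing is not None:
        existing.merge(inst)
        return False
    outline_map[inst.name] = inst
    return True
```

With this shape, a callee that was inlined in the profiled binary but not re-inlined keeps its samples attached to the outline definition instead of being dropped.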

Signed-off-by: Dhruv Chawla 

gcc/ChangeLog:

* gcc/auto-profile.cc (count_info): Adjust comments.
(function_instance::in_afdo_source_profile): New function.
(function_instance::set_in_afdo_source_profile): Likewise.
(function_instance::get_callsites): Likewise.
(autofdo_source_profile::add_function_instance): Likewise.
(function_instance::in_afdo_source_profile_): New member.
(autofdo_source_profile::~autofdo_source_profile): Use
in_afdo_source_profile to prevent a double-free bug.
(afdo_set_bb_count): Propagate inline information to non-inlined
outline copy.
---
 gcc/auto-profile.cc | 72 +++--
 1 file changed, 63 insertions(+), 9 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index e12b3048f20..228b65601d1 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -138,7 +138,7 @@ typedef std::map&lt;unsigned, gcov_type&gt; icall_target_map;
to direct call.  */
 typedef std::set&lt;gimple *&gt; stmt_set;
 
-/* Represent count info of an inline stack.  */
+/* Represent count info of a source position.  */
 class count_info
 {
 public:
@@ -147,12 +147,6 @@ public:
 
   /* Map from indirect call target to its sample count.  */
   icall_target_map targets;
-
-  /* Whether this inline stack is already used in annotation.
-
- Each inline stack should only be used to annotate IR once.
- This will be enforced when instruction-level discriminator
- is supported.  */
 };
 
 /* operator< for "const char *".  */
@@ -230,6 +224,10 @@ public:
 return head_count_;
   }
 
+  bool in_afdo_source_profile () const { return in_afdo_source_profile_; }
+
+  void set_in_afdo_source_profile () { in_afdo_source_profile_ = true; }
+
   /* Traverse callsites of the current function_instance to find one at the
  location of LINENO and callee name represented in DECL.  */
   function_instance *get_function_instance_by_decl (unsigned lineno,
@@ -258,7 +256,8 @@ private:
   typedef std::map&lt;callsite, function_instance *&gt; callsite_map;
 
   function_instance (unsigned name, gcov_type head_count)
-  : name_ (name), total_count_ (0), head_count_ (head_count)
+: name_ (name), total_count_ (0), head_count_ (head_count),
+  in_afdo_source_profile_ (false)
   {
   }
 
@@ -279,6 +278,13 @@ private:
 
   /* Map from source location to count_info.  */
   position_count_map pos_counts;
+
+  /* If this is an inline instance tracked in afdo_source_profile.  */
+  bool in_afdo_source_profile_;
+
+public:
+  /* Get the callsite map for the function_instance.  */
+  const callsite_map &get_callsites () const { return callsites; }
 };
 
 /* Profile for all functions.  */
@@ -316,6 +322,10 @@ public:
  Return true if INFO is updated.  */
   bool update_inlined_ind_target (gcall *stmt, count_info *info);
 
+  /* Add a new function_instance entry if an inlined function is found in the
+ profile that doesn't have a corresponding entry in the map.  */
+  bool add_function_instance (function_instance *fun);
+
 private:
   /* Map from function_instance name index (in string_table) to
  function_instance.  */
@@ -700,7 +710,8 @@ autofdo_source_profile::~autofdo_source_profile ()
 {
   for (name_function_instance_map::const_iterator iter = map_.begin ();
iter != map_.end (); ++iter)
-delete iter->second;
+if (!iter->second->in_afdo_source_profile ())
+   delete iter->second;
 }
 
 /* For a given DECL, returns the top-level function_instance.  */
@@ -814,6 +825,24 @@ autofdo_source_profile::update_inlined_ind_target (gcall *stmt,
   return true;
 }
 
+/* Add a new function_instance entry if an inlined function is found in the
+   profile that doesn't have a corresponding entry in the map. Return false if
+   the function_instance already exists, true if it doesn't.  */
+
+bool
+autofdo_source_profile::add_function_instance (function_instance *fun)
+{
+  int name = fun->name ();
+  if (auto fun_it = map_.find (name); fun_it != map_.end ())
+{
+  fun_it->second->merge (fun);
+  return false;
+}
+
+  map_[name] = fun;
+  return true;
+}
+
 /* Find total count of the callee of EDGE.  */
 
 gcov_type
@@ -1144,6 +1173,31 @@ afdo_set_bb_count (basic_block bb, const stmt_set &promoted)
   gimple *stmt = gsi_stmt (gsi);
   if (gimple_clobber_p (stmt) || is_gimple_debug (stmt))
 continue;
+  if (gimple_code (stmt) == GIMPLE_CALL)
+   {
+ tree fn = gimple_call_fndecl (stmt);
+ function_instance *cur
+   = afdo_source_profile->get_function_instance_by_decl (
+ current_function_decl);
+ if (!cur || !fn)
+   continue;
+
+ int name = afdo_string_table->get_index_by_decl (fn);
+ unsigned current_offset = get_relative_location_for_stmt (stmt);
+ 

[PATCH 0/1] [RFC][AutoFDO] Propagate inline information to outline definitions if not inlined

2025-06-12 Thread dhruvc
From: Dhruv Chawla 

For the reasons explained in the patch, it prevents the loss of profile
information when a function was inlined in the profiled binary but the
auto-profile pass decides not to inline it again. As an example, for this code:

#define TRIP 10

#ifdef DO_NOINLINE
# define INLINE __attribute__((noinline))
#else
# define INLINE __attribute__((always_inline))
#endif

INLINE int baz(int x, int y, int z) {
if (x < TRIP / 4) {
return y + z * 8;
} else {
return y * z / 2;
}
}

__attribute__((noinline, noipa, optnone))
int passthrough(int x, int y, int z) {
return baz(x, y, z);
}

int main() {
for (int i = 0; i < TRIP; i++) {
passthrough(i, i + 1, i + 2);
}
}

This test case is first compiled without -DDO_NOINLINE, then the
resulting binary is profiled and the profile fed back while compiling
with -DDO_NOINLINE. This results in baz having an inline callsite in
passthrough in the GCOV but no inlining in the FDO binary.

Compiling this with and without the patch gives the following .afdo dumps:

- With the patch:

__attribute__((noinline))
int baz (int x, int y, int z)
{
  int _1;
  int _2;
  int _3;
  int _7;
  int _8;

  &lt;bb 2&gt; [count: 534583]:
  if (x_4(D) <= 24999)
    goto &lt;bb 3&gt;; [100.00%]
  else
    goto &lt;bb 4&gt;; [0.00%]

  &lt;bb 3&gt; [count: 534583]:
  _1 = z_6(D) * 8;
  _8 = _1 + y_5(D);
  goto &lt;bb 5&gt;; [100.00%]

  &lt;bb 4&gt; [count: 0]:
  _2 = y_5(D) * z_6(D);
  _7 = _2 / 2;

  &lt;bb 5&gt; [count: 534583]:
  # _3 = PHI <_8(3), _7(4)>
  return _3;

}

- Without the patch:

__attribute__((noinline))
int baz (int x, int y, int z)
{
  int _1;
  int _2;
  int _3;
  int _7;
  int _8;

  &lt;bb 2&gt; [local count: 1073741824]:
  if (x_4(D) <= 24999)
    goto &lt;bb 3&gt;; [50.00%]
  else
    goto &lt;bb 4&gt;; [50.00%]

  &lt;bb 3&gt; [local count: 536870912]:
  _1 = z_6(D) * 8;
  _8 = _1 + y_5(D);
  goto &lt;bb 5&gt;; [100.00%]

  &lt;bb 4&gt; [local count: 536870912]:
  _2 = y_5(D) * z_6(D);
  _7 = _2 / 2;

  &lt;bb 5&gt; [local count: 1073741824]:
  # _3 = PHI <_8(3), _7(4)>
  return _3;

}

Thus, without the patch, the profile counts are lost in this example.

While developing this patch, a few other points also came up:

- Annotation, merging and inlining form a messy set of dependencies in
  the auto-profile pass. The order that functions get annotated in
  affects the decisions that the inliner makes, but the order of
  visiting them is effectively random due to the use of
  FOR_EACH_FUNCTION.

- The main issue is that annotation is performed after inlining. This is
  meant to more accurately mirror the hot path in the profiled binary,
  however there is no guarantee of this because of the randomness in the
  order of visitation.

- Consider the following example:

  int foo () { <...> }
  int bar_1 () { <...> foo (); <...> }
  int bar_2 () { <...> foo (); <...> }
  int bar_3 () { <...> foo (); <...> }

  If foo was always inlined in all three bar_ functions, the profile
  information will contain inline callsites for all bar_ functions.
  There will be no separate profile information for foo in the GCOV file.
  If auto-profile visits them in the order bar_1 -> foo -> bar_2 ->
  bar_3, it is possible that inlining could fail in bar_1 because foo
  would not have any profile counts associated with it. If foo was
  visited first, then that decision could change. This non-determinism
  raises the question of splitting out:

  1. Merging inline callsites into outline copies
  2. Annotating functions
  3. Inlining callsites

  As separate phases in auto-profile, where each effectively executes as a
  sub-pass. As modification of the cgraph is only done in 3., the order of
  visiting functions, at least in 1. and 2., should not matter. Does this
  sound okay?
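One way to picture the proposed split (purely illustrative; the names are made up): run each phase over every function before starting the next, so that only the cgraph-mutating phase can depend on visitation order at all:

```python
trace = []

def merge_callsites(fn):   trace.append(("merge", fn))
def annotate(fn):          trace.append(("annotate", fn))
def inline_callsites(fn):  trace.append(("inline", fn))

def auto_profile(functions):
    # Phase 1: merge inline callsites into outline copies.
    for fn in functions:
        merge_callsites(fn)
    # Phase 2: annotate functions. Phases 1 and 2 do not modify the
    # cgraph, so their per-function visitation order cannot change
    # the result.
    for fn in functions:
        annotate(fn)
    # Phase 3: inline callsites -- the only phase mutating the cgraph.
    for fn in functions:
        inline_callsites(fn)

auto_profile(["bar_1", "foo", "bar_2", "bar_3"])
```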

Splitting out inlining as its own phase also means that it can
eventually be handed off to ipa-inline to handle, thus making
auto-profile independent of early inline. This will simplify the code a
fair bit. Is this a good direction to go in?

Bootstrapped and regtested on aarch64-linux-gnu.

Dhruv Chawla (1):
  [RFC][AutoFDO] Propagate information to outline copies if not inlined

 gcc/auto-profile.cc | 72 +++--
 1 file changed, 63 insertions(+), 9 deletions(-)

-- 
2.44.0



[PATCH] [contrib] Add process_make.py

2025-07-16 Thread dhruvc
From: Dhruv Chawla 

This is a script that makes it easier to visualize the output from make.
It filters out most of the output, leaving only (mostly) messages about
files being compiled, installed and linked. It is not 100% accurate in
the matching, but I feel it does a good enough job.

To use it, simply pipe make into it:

  make | contrib/process_make.py

It also automatically colorizes the output if stdout is a TTY.
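The TTY check follows the usual isatty idiom; a minimal sketch of the same behaviour (a hypothetical helper, not code taken from the script):

```python
import sys

def colorize(text, code, stream=sys.stdout):
    # Wrap TEXT in an ANSI escape only when STREAM is an interactive
    # terminal; piped or redirected output stays plain.
    if stream.isatty():
        return f"\033[{code}m{text}\033[0m"
    return text

print(colorize("CC gcc/toplev.o", 32))
```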

Signed-off-by: Dhruv Chawla 

contrib/ChangeLog:

* process_make.py: New script.
---
 contrib/process_make.py | 383 
 1 file changed, 383 insertions(+)
 create mode 100755 contrib/process_make.py

diff --git a/contrib/process_make.py b/contrib/process_make.py
new file mode 100755
index 000..bf5aea517a3
--- /dev/null
+++ b/contrib/process_make.py
@@ -0,0 +1,383 @@
+#!/usr/bin/env python3
+
+# Script to colorize the output of make, and simplify it to make reading easier.
+
+# Copyright The GNU Toolchain Authors.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# &lt;http://www.gnu.org/licenses/&gt;.
+
+import os
+import re
+import sys
+
+from os import path
+from sys import stdin
+
+
+# Console colors
+
+
+
+class Colors:
+    def __init__(self):
+        if sys.stdout.isatty():
+            self.BLACK = "\033[30m"
+            self.RED = "\033[31m"
+            self.GREEN = "\033[32m"
+            self.YELLOW = "\033[33m"
+            self.BLUE = "\033[34m"
+            self.MAGENTA = "\033[35m"
+            self.CYAN = "\033[36m"
+            self.WHITE = "\033[37m"
+            self.UNDERLINE = "\033[4m"
+            self.RESET = "\033[0m"
+        else:
+            self.BLACK = ""
+            self.RED = ""
+            self.GREEN = ""
+            self.YELLOW = ""
+            self.BLUE = ""
+            self.MAGENTA = ""
+            self.CYAN = ""
+            self.WHITE = ""
+            self.UNDERLINE = ""
+            self.RESET = ""
+
+
+colors = Colors()
+
+
+# Regular expressions to match output against
+
+
+# Messages printed by make
+MAKE_ENTERING_DIRECTORY = r"make\[\d+\]: Entering directory '(.*)'"
+MAKE_LEAVING_DIRECTORY = r"make\[\d+\]: Leaving directory '(.*)'"
+MAKE_NOTHING_TO_BE_DONE = r"make\[\d+\]: Nothing to be done for '(.*)'."
+
+# Messages printed by configure
+CONFIGURE_CHECKING_FOR = r"checking for (.*)\.\.\. (.*)"
+CONFIGURE_CHECKING_HOW = r"checking how (.*)\.\.\. (.*)"
+CONFIGURE_CHECKING_WHETHER = r"checking whether (.*)\.\.\. (.*)"
+CONFIGURE_OTHER_CHECKS = r"checking .*"
+
+# File patterns
+SOURCE_FILE_PATTERN = r"(.*\.c)|(.*\.cc)|(.*\.cxx)|(.*\.m)"
+SOURCE_FILE_PATTERN_FALLBACK = r"(.*\.o)|(.*\.lo)|(.*\.gch)"
+LINKED_FILE_PATTERN = r"(.*\.so)|(.*\.a)"
+LINKED_FILE_PATTERN_FALLBACK = r"(.*\.la)|(.*\.o)"
+ARCHIVE_FILE_PATTERN = r"(.*\.a)|(.*\.la)"
+
+# Libtool itself
+LIBTOOL_COMPILE = "libtool: compile: (.*)"
+LIBTOOL_LINK = "libtool: link: (.*)"
+LIBTOOL_INSTALL = "libtool: install: (.*)"
+
+# Bash invoking libtool
+BASH_LT_COMPILE = "(?:/bin/bash (?:.*)libtool(?:.*)--mode=compile(.*))"
+BASH_LT_LINK = "(?:/bin/bash (?:.*)libtool(?:.*)--mode=link(.*))"
+
+# Noisy information
+IF_PATTERN = "if (.*)"
+ELSE_PATTERN = "else (.*); fi(.*)"
+ECHO_PATTERN = "(echo .*)"
+MOVE_IF_CHANGE_PATTERN = "(/bin/bash.*move-if-change.*)"
+MOVE_PATTERN = "mv -f (.*)"
+REMOVE_PATTERN = "rm -r?f (.*)"
+APPLYING_PATTERN = "Applying(.*)"
+FIXING_PATTERN = "Fixing(.*)"
+FIXED_PATTERN = "Fixed:(.*)"
+COMMENT_PATTERN = "# (.*)"
+DEPBASE_PATTERN = "depbase(.*)"
+
+# Compiled compiler invocation
+XGCC_PATTERN = r"[A-Za-z0-9_\-\./]+/xgcc (.*)"
+XGPP_PATTERN = r"[A-Za-z0-9_\-\./]+/xg\+\+ (.*)"
+GFORTRAN_PATTERN = r"[A-Za-z0-9_\-\./]+/gfortran (.*)"
+
+# Awk
+AWK_PATTERN = "(?:awk|gawk) (.*)"
+
+# Archive creation
+ARCHIVE_PATTERN = "(?:ar|ranlib) (.*)"
+
+
+# Helper function (print usage)
+
+
+
+def PrintUsage():
+    print("Usage: process_make.py [ ]")
+    print("")
+    print("  Either no argum