[PATCH v3 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-14 dhruvc
From: Dhruv Chawla This patch modifies the shift expander to lower constant shifts immediately, without an unspec. It also modifies the ADR, SRA and ADDHNB patterns to match the lowered forms of the shifts, as the predicate register is not required for these instructions. Bootstrapped and regtested
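
The following is a scalar C sketch of what each of these SVE/SVE2 instructions computes for a single element. It is not code from the patch or from GCC, and the function names are hypothetical. It assumes unsigned elements (so USRA rather than SSRA), and for ADDHNB it shows only the narrowed value, not its placement in the destination lanes. In each case a constant shift is folded into an add and no governing predicate is involved. That is consistent with the patterns being able to match the unpredicated shift form.

```c
#include <stdint.h>

/* Hypothetical per-element models; names are illustrative only.  */

/* ADR (vector form): base + (offset << amount), with amount in 0..3.  */
static uint64_t adr_elem (uint64_t base, uint64_t offset, unsigned amount)
{
  return base + (offset << amount);
}

/* USRA: accumulate a logically right-shifted value, acc + (x >> amount).  */
static uint32_t usra_elem (uint32_t acc, uint32_t x, unsigned amount)
{
  return acc + (x >> amount);
}

/* ADDHNB with 16-bit inputs: keep the high half of the wrapping sum,
   i.e. (a + b) >> 8, narrowed to 8 bits.  The real instruction writes
   this value into the even-numbered (bottom) 8-bit lanes and zeroes
   the odd-numbered ones; that lane placement is not modeled here.  */
static uint8_t addhnb_elem (uint16_t a, uint16_t b)
{
  return (uint8_t) ((uint16_t) (a + b) >> 8);
}
```

For example, adr_elem (0x1000, 3, 3) yields 0x1018, and usra_elem (10, 64, 2) yields 26.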

[PATCH v3 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-14 dhruvc
From: Dhruv Chawla This patch modifies the intrinsic expanders to expand svlsl and svlsr to unpredicated forms when the predicate is a ptrue. It also folds an lsl + lsr + orr sequence into a single revb/h/w instruction when the shift amount is equal to half the bitwidth of the reg
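
Within one element, shifting left and right by half the element width and ORing the results swaps the element's two halves. Reversing a sequence of exactly two sub-elements is also a swap, which is why the fold is valid. The scalar C sketch below checks this for each width. It uses hypothetical helper names and is not code from the patch. A 16-bit element corresponds to revb on .h, a 32-bit element to revh on .s, and a 64-bit element to revw on .d. The rev_sub helper is an assumed generic model of "reverse the sub-elements within an element".

```c
#include <stdint.h>

/* The lsl + lsr + orr idiom with a shift of half the element width.  */
static uint16_t swap_halves16 (uint16_t x) { return (uint16_t) ((x << 8) | (x >> 8)); }
static uint32_t swap_halves32 (uint32_t x) { return (x << 16) | (x >> 16); }
static uint64_t swap_halves64 (uint64_t x) { return (x << 32) | (x >> 32); }

/* Hypothetical model of revb/revh/revw: reverse the order of the
   SUB_BITS-wide pieces inside an ELEM_BITS-wide element stored in X.
   SUB_BITS is assumed to be 8, 16 or 32 and to divide ELEM_BITS.  */
static uint64_t rev_sub (uint64_t x, unsigned elem_bits, unsigned sub_bits)
{
  uint64_t result = 0;
  unsigned count = elem_bits / sub_bits;
  uint64_t mask = (UINT64_C (1) << sub_bits) - 1;
  for (unsigned i = 0; i < count; i++)
    {
      uint64_t part = (x >> (i * sub_bits)) & mask;
      result |= part << ((count - 1 - i) * sub_bits);
    }
  return result;
}
```

A full byte reversal of a 32-bit element (rev_sub (x, 32, 8), which corresponds to revb on .s) is not the same as swapping its halves. The identity therefore only holds when the shift is exactly half the element width.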

[PATCH v4 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-21 dhruvc
From: Dhruv Chawla This patch modifies the shift expander to lower constant shifts immediately, without an unspec. It also modifies the ADR, SRA and ADDHNB patterns to match the lowered forms of the shifts, as the predicate register is not required for these instructions. Bootstrapped and regtested

[PATCH v4 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-21 dhruvc
From: Dhruv Chawla This patch folds an lsl + lsr + orr sequence into a single revb/h/w instruction when the shift amount is equal to half the bitwidth of the register. Bootstrapped and regtested on aarch64-linux-gnu. Signed-off-by: Dhruv Chawla Co-authored-by: Richard Sandiford

[PATCH] [MAINTAINERS] Add myself to write after approval and DCO.

2025-05-22 dhruvc
/MAINTAINERS
+++ b/MAINTAINERS
@@ -402,6 +402,7 @@ Stephane Carrez ciceron
 Gabriel Charette gchare
 Arnaud Charlet charlet
 Chandra Chavva -
+Dhruv Chawla dhruvc

[PATCH] widening_mul: Make better use of overflowing operations in codegen of min/max(a, add/sub(a, b))

2025-05-29 dhruvc
From: Dhruv Chawla This patch folds the following patterns:
- max (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? sum : a
- max (a, sub (a, b)) -> [sum, ovf] = subo (a, b); !ovf ? a : sum
- min (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? a : sum
- min (a, sub (a, b)) -> [sum, ovf] = a
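
The following scalar C sketch shows why the first three rewrites hold. It assumes unsigned operands, since the excerpt does not state the operand types. It uses the GCC/Clang builtins __builtin_add_overflow and __builtin_sub_overflow to stand in for addo and subo. The function names are hypothetical, and this is not the pass's implementation. Without overflow, a + b >= a and a - b <= a, so the min/max result is fixed once the overflow flag is known.

```c
#include <stdbool.h>

/* Reference forms, computed with wrapping unsigned arithmetic.  */
static unsigned max_add_ref (unsigned a, unsigned b) { unsigned s = a + b; return s > a ? s : a; }
static unsigned max_sub_ref (unsigned a, unsigned b) { unsigned s = a - b; return s > a ? s : a; }
static unsigned min_add_ref (unsigned a, unsigned b) { unsigned s = a + b; return s < a ? s : a; }

/* Rewritten forms: [sum, ovf] = addo/subo (a, b), then select on ovf.  */
static unsigned max_add_ovf (unsigned a, unsigned b)
{
  unsigned sum;
  bool ovf = __builtin_add_overflow (a, b, &sum);
  return !ovf ? sum : a;   /* no wrap: sum >= a; wrap: sum < a */
}

static unsigned max_sub_ovf (unsigned a, unsigned b)
{
  unsigned sum;
  bool ovf = __builtin_sub_overflow (a, b, &sum);
  return !ovf ? a : sum;   /* no wrap: sum <= a; wrap: sum > a */
}

static unsigned min_add_ovf (unsigned a, unsigned b)
{
  unsigned sum;
  bool ovf = __builtin_add_overflow (a, b, &sum);
  return !ovf ? a : sum;   /* no wrap: sum >= a; wrap: sum < a */
}
```

Only the three cases spelled out in full above are modeled. In each one, a single overflow-flag select replaces the separate compare against a.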

[PATCH] [RFC][AutoFDO] Source filename tracking in GCOV

2025-06-16 dhruvc
From: Dhruv Chawla This patch modifies the auto-profile pass to read file names from GCOV. A function is only annotated with a set of profile counts if its file name matches the file name that the function in the GCOV file was recorded with. It also bumps the GCOV version to 3 as the file format

[PATCH 0/1] [RFC][AutoFDO]: Source filename tracking in GCOV

2025-06-16 dhruvc
From: Dhruv Chawla Introduction: Per PR120229 (gcc.gnu.org/PR120229), the auto-profile pass cannot distinguish profile information for `function_instance's with the same base name once suffixes are removed. To fix this, source file names should be tracked in the GCOV file informatio
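
The following hypothetical C sketch illustrates the collision described above. The struct and function names are invented for illustration and are not GCC's function_instance or the auto-profile API. Two static functions both named foo in different source files share a base name once suffixes are removed, so a lookup keyed only on that name returns the same profile for both. Matching on the recorded source file name as well keeps them apart, mirroring the rule that a function is only annotated when its file name matches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical profile record (not GCC's function_instance).  */
struct profile_record
{
  const char *base_name;  /* function name with any suffixes removed */
  const char *file_name;  /* source file the profile was recorded with */
  long total_count;
};

/* Name-only lookup: returns the first record with a matching base name,
   so same-named static functions from different files collide.  */
static const struct profile_record *
lookup_by_name (const struct profile_record *recs, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (recs[i].base_name, name) == 0)
      return &recs[i];
  return NULL;
}

/* Name-and-file lookup: a function only matches a profile recorded
   for the same source file; otherwise no counts are applied.  */
static const struct profile_record *
lookup_by_name_and_file (const struct profile_record *recs, size_t n,
                         const char *name, const char *file)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (recs[i].base_name, name) == 0
        && strcmp (recs[i].file_name, file) == 0)
      return &recs[i];
  return NULL;
}
```

With records { "foo", "a.c", 100 } and { "foo", "b.c", 5 }, the name-only lookup gives the foo from b.c the counts recorded for a.c. The name-and-file lookup returns the correct record, or none.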

[PATCH 1/1] [RFC][AutoFDO] Propagate information to outline copies if not inlined

2025-06-13 dhruvc
From: Dhruv Chawla This patch modifies afdo_set_bb_count to propagate profile information to outline copies of functions if they are not inlined. This information gets lost otherwise. Signed-off-by: Dhruv Chawla gcc/ChangeLog: * gcc/auto-profile.cc (count_info): Adjust comments.

[PATCH 0/1] [RFC][AutoFDO] Propagate inline information to outline definitions if not inlined

2025-06-12 dhruvc
From: Dhruv Chawla For reasons explained in the patch, this patch prevents the loss of profile information when a function is inlined in the profiled binary but the auto-profile pass decides not to inline it. As an example, for this code:
#define TRIP 10
#ifdef DO_NOINLINE
# define INLINE __attri

[PATCH] [contrib] Add process_make.py

2025-07-16 dhruvc
From: Dhruv Chawla This is a script that makes it easier to visualize the output from make. It filters out most of the output, leaving only (mostly) messages about files being compiled, installed and linked. It is not 100% accurate in the matching, but I feel it does a good enough job. To use it