Re: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-03 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

There is no need to modify your patch. I've just answered the
questions you asked me. The adds are faster on the ARC CPUs
which are still in production, and I suppose we can leverage the LP
instruction together with DBNZ instructions for implementing loops. I'll
come back to you asap, once I have the nightly results :)

Thank you,
Claudiu

On Tue, Oct 3, 2023 at 6:34 PM Roger Sayle  wrote:
>
>
> Hi Claudiu,
> Thanks for the answers to my technical questions.
> If you'd prefer to update arc.md's add3 pattern first,
> I'm happy to update/revise my patch based on this
> and your feedback, for example preferring add over
> asl_s (or controlling this choice with -Os).
>
> Thanks again.
> Roger
> --
>
> > -Original Message-
> > From: Claudiu Zissulescu 
> > Sent: 03 October 2023 15:26
> > To: Roger Sayle ; gcc-patches@gcc.gnu.org
> > Subject: RE: [ARC PATCH] Split SImode shifts pre-reload on
> > !TARGET_BARREL_SHIFTER.
> >
> > Hi Roger,
> >
> > It was nice to meet you too.
> >
> > Thank you for looking into the ARC's non-barrel-shifter configurations.
> > I will dive into your patch asap, but before starting, here are a few
> > of my comments:
> >
> > -Original Message-
> > From: Roger Sayle 
> > Sent: Thursday, September 28, 2023 2:27 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Claudiu Zissulescu 
> > Subject: [ARC PATCH] Split SImode shifts pre-reload on
> > !TARGET_BARREL_SHIFTER.
> >
> >
> > Hi Claudiu,
> > It was great meeting up with you and the Synopsys ARC team at the GNU
> tools
> > Cauldron in Cambridge.
> >
> > This patch is the first in a series to improve SImode and DImode shifts
> > and rotates in the ARC backend.  This first piece splits SImode shifts,
> > for !TARGET_BARREL_SHIFTER targets, after combine and before reload, in
> > the split1 pass, as suggested by the FIXME comment above output_shift
> > in arc.cc.  To do this I've copied the implementation of the
> > x86_pre_reload_split function from the i386 backend, and renamed it
> > arc_pre_reload_split.
> >
> > Although the actual implementations of shifts remain the same (as in
> > output_shift), having them as explicit instructions in the RTL stream
> > allows better scheduling and use of compact forms when available.  The
> > benefits can be seen in two short examples below.
> >
> > For the function:
> > unsigned int foo(unsigned int x, unsigned int y) {
> >   return y << 2;
> > }
> >
> > GCC with -O2 -mcpu=em would previously generate:
> > foo:add r1,r1,r1
> > add r1,r1,r1
> > j_s.d   [blink]
> > mov_s   r0,r1   ;4
> >
> > [CZI] Indeed, the move shouldn't be generated. The use of ADDs is
> > slightly beneficial for older ARCv1 arches.
> >
> > and with this patch now generates:
> > foo:asl_s r0,r1
> > j_s.d   [blink]
> > asl_s r0,r0
> >
> > [CZI] Nice. This new sequence is as fast as we can get for our ARCv2 cpus.
> >
> > Notice the original (from shift_si3's output_shift) requires the shift
> > sequence to be monolithic with the same destination register as the
> > source (requiring an extra mov_s).  The new version can eliminate this
> > move, and schedule the second asl in the branch delay slot of the
> > return.
> >
> > For the function:
> > int x,y,z;
> >
> > void bar()
> > {
> >   x <<= 3;
> >   y <<= 3;
> >   z <<= 3;
> > }
> >
> > GCC -O2 -mcpu=em currently generates:
> > bar:push_s  r13
> > ld.as   r12,[gp,@x@sda] ;23
> > ld.as   r3,[gp,@y@sda]  ;23
> > mov r2,0
> > add3 r12,r2,r12
> > mov r2,0
> > add3 r3,r2,r3
> > ld.as   r2,[gp,@z@sda]  ;23
> > st.as   r12,[gp,@x@sda] ;26
> > mov r13,0
> > add3 r2,r13,r2
> > st.as   r3,[gp,@y@sda]  ;26
> > st.as   r2,[gp,@z@sda]  ;26
> > j_s.d   [blink]
> > pop_s   r13
> >
> > where each shift by 3 uses ARC's add3 instruction, which is similar to
> > x86's lea, implementing x = (y<<3) + z, but requires the value zero to
> > be placed in a temporary register "z".  Splitting this before reload
> > allows these pseudos to be shared/reused.  With this patch, we get
> >
> > bar:ld.as   r2,[gp,@x@sda]  ;23
> > mov_s   r3,0;3
> > add3r2,r3,r2
> > ld.as   r3,[gp,@y@sda]  ;23
> > st.as   r2,[gp,@x@sda]  ;26
> > ld.as   r2,[gp,@z@sda]  ;23
> > mov_s   r12,0   ;3
> > add3r3,r12,r3
> > add3r2,r12,r2
> > st.as   r3,[gp,@y@sda]  ;26
> > st.as   r2,[gp,@z@sda]  ;26
> > j_s [blink]
> >
> > [CZI] Looks great, but it also shows that I forgot to add the
> > Ra,LIMM,RC variant to the ADD3 instruction.  With it, instead of
> > mov_s   r3,0;3
> > add3r2,r3,r2
> > we would get just "add3 r2,0,r2".  Indeed, it is a longer instruction,
> > but faster.
> >
> > Unfortunately, register allocation means that we only share two of the
> > three "mov_s z,0", but this is sufficient to reduce register pressure
> > enough to avoid spilling r13 in the prologue/epilogue.

Re: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-04 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

The patch as-is passed validation, and it is in general OK.
Although it doesn't address the elephant in the room, namely the
output_shift function, it is a welcome cleanup.
I would like you to split the patch in two: one that deals with
improvements to shifts in the absence of a barrel shifter, and one that
addresses the default instruction length, as they can be seen as
separate work. Please feel free to commit the resulting patches to
mainline.

Thank you for your contribution,
Claudiu

On Thu, Sep 28, 2023 at 2:27 PM Roger Sayle  wrote:
>
>
> Hi Claudiu,
> It was great meeting up with you and the Synopsys ARC team at the
> GNU tools Cauldron in Cambridge.
>
> This patch is the first in a series to improve SImode and DImode
> shifts and rotates in the ARC backend.  This first piece splits
> SImode shifts, for !TARGET_BARREL_SHIFTER targets, after combine
> and before reload, in the split1 pass, as suggested by the FIXME
> comment above output_shift in arc.cc.  To do this I've copied the
> implementation of the x86_pre_reload_split function from i386
> backend, and renamed it arc_pre_reload_split.
>
> Although the actual implementations of shifts remain the same
> (as in output_shift), having them as explicit instructions in
> the RTL stream allows better scheduling and use of compact forms
> when available.  The benefits can be seen in two short examples
> below.
>
> For the function:
> unsigned int foo(unsigned int x, unsigned int y) {
>   return y << 2;
> }
>
> GCC with -O2 -mcpu=em would previously generate:
> foo:add r1,r1,r1
> add r1,r1,r1
> j_s.d   [blink]
> mov_s   r0,r1   ;4
> and with this patch now generates:
> foo:asl_s r0,r1
> j_s.d   [blink]
> asl_s r0,r0
>
> Notice the original (from shift_si3's output_shift) requires the
> shift sequence to be monolithic with the same destination register
> as the source (requiring an extra mov_s).  The new version can
> eliminate this move, and schedule the second asl in the branch
> delay slot of the return.
>
> For the function:
> int x,y,z;
>
> void bar()
> {
>   x <<= 3;
>   y <<= 3;
>   z <<= 3;
> }
>
> GCC -O2 -mcpu=em currently generates:
> bar:push_s  r13
> ld.as   r12,[gp,@x@sda] ;23
> ld.as   r3,[gp,@y@sda]  ;23
> mov r2,0
> add3 r12,r2,r12
> mov r2,0
> add3 r3,r2,r3
> ld.as   r2,[gp,@z@sda]  ;23
> st.as   r12,[gp,@x@sda] ;26
> mov r13,0
> add3 r2,r13,r2
> st.as   r3,[gp,@y@sda]  ;26
> st.as   r2,[gp,@z@sda]  ;26
> j_s.d   [blink]
> pop_s   r13
>
> where each shift by 3, uses ARC's add3 instruction, which is similar
> to x86's lea implementing x = (y<<3) + z, but requires the value zero
> to be placed in a temporary register "z".  Splitting this before reload
> allows these pseudos to be shared/reused.  With this patch, we get
>
> bar:ld.as   r2,[gp,@x@sda]  ;23
> mov_s   r3,0;3
> add3r2,r3,r2
> ld.as   r3,[gp,@y@sda]  ;23
> st.as   r2,[gp,@x@sda]  ;26
> ld.as   r2,[gp,@z@sda]  ;23
> mov_s   r12,0   ;3
> add3r3,r12,r3
> add3r2,r12,r2
> st.as   r3,[gp,@y@sda]  ;26
> st.as   r2,[gp,@z@sda]  ;26
> j_s [blink]
>
> Unfortunately, register allocation means that we only share two of the
> three "mov_s z,0", but this is sufficient to reduce register pressure
> enough to avoid spilling r13 in the prologue/epilogue.
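
The add3 semantics described above can be sketched in C (function names are illustrative, not from the GCC sources):

```c
#include <stdint.h>

/* Semantic sketch of ARC's add3 instruction: dst = src1 + (src2 << 3).
   This mirrors the x = (y<<3) + z comparison with x86's lea above. */
uint32_t arc_add3(uint32_t src1, uint32_t src2)
{
    return src1 + (src2 << 3);
}

/* A plain shift-by-3 is add3 with a zero first operand, which is why the
   compiler must first materialize 0 in a temporary register. */
uint32_t shift3_via_add3(uint32_t x)
{
    return arc_add3(0, x);
}
```

Splitting before reload lets several such shifts reuse one zero-valued pseudo instead of each loading its own.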
>
> This patch also contains a (latent?) bug fix.  The implementation of
> the default insn "length" attribute, assumes instructions of type
> "shift" have two input operands and accesses operands[2], hence
> specializations of shifts that don't have a operands[2], need to be
> categorized as type "unary" (which results in the correct length).
>
> This patch has been tested on a cross-compiler to arc-elf (hosted on
> x86_64-pc-linux-gnu), but because I've an incomplete tool chain many
> of the regression test fail, but there are no new failures with new
> test cases added below.  If you can confirm that there are no issues
> from additional testing, is this OK for mainline?
>
> Finally a quick technical question.  ARC's zero overhead loops require
> at least two instructions in the loop, so currently the backend's
> implementation of shr20 pads the loop body with a "nop".
>
> lshr20: mov.f lp_count, 20
> lpnz2f
> lsr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
> could this be more efficiently implemented as:
>
> lshr20: mov lp_count, 10
> lp 2f
> lsr_s r0,r0
> lsr_s r0,r0
> 2:  # end single insn loop
> j_s [blink]
>
> i.e. half the number of iterations, but doing twice as much useful
> work in each iteration?  Or might the nop be free on advanced
> microarchitectures, and/or the consecutive dependent shifts cause
> a pipeline stall?  It would be nice to fuse loops t
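
The iteration-halving idea in the question above can be modeled in C, assuming the two sequences must compute the same value (a sketch, not the backend's actual lowering):

```c
#include <stdint.h>

/* Logical shift right by 20, as a zero-overhead loop with two single-bit
   shifts per iteration would compute it: 10 iterations instead of 20. */
uint32_t lshr20_halved(uint32_t x)
{
    for (int i = 0; i < 10; i++) {
        x >>= 1;   /* first lsr_s in the loop body */
        x >>= 1;   /* second lsr_s, replacing the nop padding */
    }
    return x;
}
```

Whether this is actually faster depends on whether back-to-back dependent shifts stall the pipeline, as the question notes.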

Re: [ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<<x

2023-10-16 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Indeed, I was missing the patch file.

Approved.

Thank you for your contribution,
 Claudiu

On Sun, Oct 15, 2023 at 11:14 AM Roger Sayle  wrote:
>
> I’ve done it again. ENOPATCH.
>
>
>
> From: Roger Sayle 
> Sent: 15 October 2023 09:13
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Claudiu Zissulescu' 
> Subject: [ARC PATCH] Split asl dst,1,src into bset dst,0,src to implement
> 1<<x
>
>
>
>
> This patch adds a pre-reload splitter to arc.md, to use the bset (set
>
> specific bit instruction) to implement 1<<x
> on ARC processors that don't have a barrel shifter.
>
>
>
> Currently,
>
>
>
> int foo(int x) {
>
>   return 1 << x;
>
> }
>
>
>
> when compiled with -O2 -mcpu=em is compiled as a loop:
>
>
>
> foo:mov_s   r2,1;3
>
> and.f lp_count,r0, 0x1f
>
> lpnz2f
>
> add r2,r2,r2
>
> nop
>
> 2:  # end single insn loop
>
> j_s.d   [blink]
>
> mov_s   r0,r2   ;4
>
>
>
> with this patch we instead generate a single instruction:
>
>
>
> foo:bsetr0,0,r0
>
> j_s [blink]
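
The bset semantics assumed by this splitter can be sketched in C (helper name is illustrative); with a zero source operand it computes exactly 1<<x, including the 5-bit masking of the count that the earlier loop sequence performed with `and.f lp_count,r0,0x1f`:

```c
#include <stdint.h>

/* Sketch of ARC's bset dst,src,bitpos: OR a single bit into src.
   bset r0,0,r0 therefore computes 1 << (r0 & 31). */
uint32_t arc_bset(uint32_t src, uint32_t bitpos)
{
    return src | (1u << (bitpos & 31));
}
```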
>
>
>
>
>
> Finger-crossed this passes Claudiu's nightly testing.  This patch
>
> has been minimally tested by building a cross-compiler cc1 to
>
> arc-linux hosted on x86_64-pc-linux-gnu with no additional failures
>
> seen with make -k check.  Ok for mainline?  Thanks in advance.
>
>
>
>
>
> 2023-10-15  Roger Sayle  
>
>
>
> gcc/ChangeLog
>
> * config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to
>
> use bset dst,0,src to implement 1<<x.
>
>
>
>
> Cheers,
>
> Roger
>
> --
>
>


Re: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.

2023-10-24 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Your patch doesn't introduce new regressions. However, before pushing
to the mainline you need to fix some issues:
1. Please fix the trailing spaces and blocks of 8 spaces which should
be replaced with tabs. You can use check_GNU_style.py script to spot
them.
2. Please use capital letters for code iterators (i.e., any_shift_rotate).

Once the above issues are fixed, please proceed with your commit.

Thank you for your contribution,
Claudiu

On Sun, Oct 8, 2023 at 10:07 PM Roger Sayle  wrote:
>
>
> This patch completes the ARC back-end's transition to using pre-reload
> splitters for SImode shifts and rotates on targets without a barrel
> shifter.  The core part is that the shift_si3 define_insn is no longer
> needed, as shifts and rotates that don't require a loop are split
> before reload, and then because shift_si3_loop is the only caller
> of output_shift, both can be significantly cleaned up and simplified.
> The output_shift function (Claudiu's "the elephant in the room") is
> renamed output_shift_loop, which handles just the four instruction
> zero-overhead loop implementations.
>
> Aside from the clean-ups, the user visible changes are much improved
> implementations of SImode shifts and rotates on affected targets.
>
> For the function:
> unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_1: lsr_s r2,r0
> bmsk_s r0,r0,0
> ror r0,r0
> j_s.d   [blink]
> or_sr0,r0,r2
>
> with this patch, we now generate:
>
> j_s.d   [blink]
> ror r0,r0
>
> For the function:
> unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_31:
> mov_s   r2,r0   ;4
> asl_s r0,r0
> add.f 0,r2,r2
> rlc r2,0
> j_s.d   [blink]
> or_sr0,r0,r2
>
> with this patch we now generate an add.f followed by an adc:
>
> rotr_31:
> add.f   r0,r0,r0
> j_s.d   [blink]
> add.cs  r0,r0,1
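
The add.f/adc trick works because doubling a value pushes its top bit into the carry flag, and adding the carry back in completes the rotate. A C sketch of the equivalence (names are illustrative):

```c
#include <stdint.h>

/* Reference: rotate left by one bit. */
uint32_t rotl1_ref(uint32_t x)
{
    return (x << 1) | (x >> 31);
}

/* add.f r0,r0,r0 computes x + x and sets carry to the old bit 31;
   add.cs (conditional add, i.e. the adc) folds that carry back in. */
uint32_t rotl1_adc(uint32_t x)
{
    uint32_t carry = x >> 31;        /* carry-out of x + x */
    return (uint32_t)(x + x) + carry;
}
```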
>
>
> Shifts by constants requiring a loop have been improved for even counts
> by performing two operations in each iteration:
>
> int shl10(int x) { return x >> 10; }
>
> Previously looked like:
>
> shl10:  mov.f lp_count, 10
> lpnz2f
> asr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
>
> And now becomes:
>
> shl10:
> mov lp_count,5
> lp  2f
> asr r0,r0
> asr r0,r0
> 2:  # end single insn loop
> j_s [blink]
>
>
> So emulating ARC's SWAP on architectures that don't have it:
>
> unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }
>
> previously required 10 instructions and ~70 cycles:
>
> rotr_16:
> mov_s   r2,r0   ;4
> mov.f lp_count, 16
> lpnz2f
> add r0,r0,r0
> nop
> 2:  # end single insn loop
> mov.f lp_count, 16
> lpnz2f
> lsr r2,r2
> nop
> 2:  # end single insn loop
> j_s.d   [blink]
> or_sr0,r0,r2
>
> now becomes just 4 instructions and ~18 cycles:
>
> rotr_16:
> mov lp_count,8
> lp  2f
> ror r0,r0
> ror r0,r0
> 2:  # end single insn loop
> j_s [blink]
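
The equivalence being exploited here, that sixteen single-bit rotates (eight loop iterations of two) equal a half-word swap, can be checked with a small C model (function names are illustrative):

```c
#include <stdint.h>

/* Rotate right by one bit. */
static uint32_t ror1(uint32_t x)
{
    return (x >> 1) | (x << 31);
}

/* What ARC's SWAP computes: exchange the two 16-bit halves. */
uint32_t swap_halves(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* The emulation loop: 8 iterations of two ror instructions. */
uint32_t rotr16_loop(uint32_t x)
{
    for (int i = 0; i < 8; i++) {
        x = ror1(x);
        x = ror1(x);
    }
    return x;
}
```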
>
>
> This patch has been tested with a cross-compiler to arc-linux hosted
> on x86_64-pc-linux-gnu and (partially) tested with the compile-only
> portions of the testsuite with no regressions.  Ok for mainline, if
> your own testing shows no issues?
>
>
> 2023-10-07  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc-protos.h (output_shift): Rename to...
> (output_shift_loop): Tweak API to take an explicit rtx_code.
> (arc_split_ashl): Prototype new function here.
> (arc_split_ashr): Likewise.
> (arc_split_lshr): Likewise.
> (arc_split_rotl): Likewise.
> (arc_split_rotr): Likewise.
> * config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
> (output_shift_loop): New function replacing output_shift to output
> a zero-overhead loop for SImode shifts and rotates on ARC targets
> without barrel shifter (i.e. no hardware support for these insns).
> (arc_split_ashl): New helper function to split *ashlsi3_nobs.
> (arc_split_ashr): New helper function to split *ashrsi3_nobs.
> (arc_split_lshr): New helper function to split *lshrsi3_nobs.
> (arc_split_rotl): New helper function to split *rotlsi3_nobs.
> (arc_split_rotr): New helper function to split *rotrsi3_nobs.
> * config/arc/arc.md (any_shift_rotate): New define_code_iterator.
> (define_code_attr insn): New code attribute to map to pattern name.
> (si3): New expander unifying previous ashlsi3,
> ashrsi3 and lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
> (*si3_nobs): New defin

Re: [ARC PATCH] Improved SImode shifts and rotates with -mswap.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

+(define_insn "si2_cnt16"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=w")

Please use "register_operand", and "r" constraint.

+(ANY_ROTATE:SI (match_operand:SI 1 "register_operand" "c")

Please use "r" constraint instead of "c".

+   (const_int 16)))]
+  "TARGET_SWAP"
+  "swap\\t%0,%1"

Otherwise, it looks good to me. Please fix the above and proceed with
your commit.

Thank you for your contribution,
Claudiu


Re: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

You have a block of 8 spaces that needs to be replaced by tabs:
gcc/config/arc/arc.cc:5538:0:   if (n < 4)

Please fix the above, and proceed with your commit.

Thank you,
Claudiu

On Sun, Oct 29, 2023 at 11:16 AM Roger Sayle  wrote:
>
>
> This patch overhauls the ARC backend's insn_cost target hook, and makes
> some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
> goal is to allow the backend to indicate that shifts and rotates are
> slow (discouraged) when the CPU doesn't have a barrel shifter. I should
> also acknowledge Richard Sandiford for inspiring the use of set_cost
> in this rewrite of arc_insn_cost; this implementation borrows heavily
> for the target hooks for AArch64 and ARM.
>
> The motivating example is derived from PR rtl-optimization/110717.
>
> struct S { int a : 5; };
> unsigned int foo (struct S *p) {
>   return p->a;
> }
>
> With a barrel shifter, GCC -O2 generates the reasonable:
>
> foo:ldb_s   r0,[r0]
> asl_s   r0,r0,27
> j_s.d   [blink]
> asr_s   r0,r0,27
>
> What's interesting is that during combine, the middle-end actually
> has two shifts by three bits, and a sign-extension from QI to SI.
>
> Trying 8, 9 -> 11:
> 8: r158:SI=r157:QI#0<<0x3
>   REG_DEAD r157:QI
> 9: r159:SI=sign_extend(r158:SI#0)
>   REG_DEAD r158:SI
>11: r155:SI=r159:SI>>0x3
>   REG_DEAD r159:SI
>
> Whilst it's reasonable to simplify this to two shifts by 27 bits when
> the CPU has a barrel shifter, it's actually a significant pessimization
> when these shifts are implemented by loops.  This combination can be
> prevented if the backend provides accurate-ish estimates for insn_cost.
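
The two forms combine is choosing between can be written out in C; both extract a signed 5-bit field, but the 3/3-shift-plus-extension form is far cheaper when every shift is a loop (a sketch; GCC's actual lowering is the RTL above):

```c
#include <stdint.h>

/* The "two shifts by 27" form combine prefers with a barrel shifter. */
int32_t field5_shift27(uint32_t v)
{
    return (int32_t)(v << 27) >> 27;   /* arithmetic >> assumed, as GCC does */
}

/* The cheaper form without one: shift by 3, byte sign-extend (sexb),
   then three more single-bit arithmetic shifts. */
int32_t field5_shift3(uint32_t v)
{
    int32_t t = (int8_t)(v << 3);      /* sexb after the shift by 3 */
    return t >> 3;
}
```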
>
>
> Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:
>
> foo:ldb_s   r0,[r0]
> mov lp_count,27
> lp  2f
> add r0,r0,r0
> nop
> 2:  # end single insn loop
> mov lp_count,27
> lp  2f
> asr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
> which contains two loops and requires about ~113 cycles to execute.
> With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:
>
> foo:ldb_s   r0,[r0]
> mov_s   r2,0;3
> add3r0,r2,r0
> sexb_s  r0,r0
> asr_s   r0,r0
> asr_s   r0,r0
> j_s.d   [blink]
> asr_s   r0,r0
>
> which requires only ~6 cycles, for the shorter shifts by 3 and sign
> extension.
>
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
>
> 2023-10-29  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
> Provide reasonable values for SHIFTS and ROTATES by constant
> bit counts depending upon TARGET_BARREL_SHIFTER.
> (arc_insn_cost): Use insn attributes if the instruction is
> recognized.  Avoid calling get_attr_length for type "multi",
> i.e. define_insn_and_split patterns without explicit type.
> Fall-back to set_rtx_cost for single_set and pattern_cost
> otherwise.
> * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
> (BRANCH_COST): Improve/correct definition.
> (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
>
>
> Thanks again,
> Roger
> --
>


Re: [ARC PATCH] Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Do you want to say bmsk_s instead of msk_s here:
+/* { dg-final { scan-assembler "msk_s\\s+r0,r0,0" } } */

Anyhow, the patch looks good. Proceed with your commit.

Thank you,
Claudiu

On Mon, Oct 30, 2023 at 5:05 AM Jeff Law  wrote:
>
>
>
> On 10/28/23 10:47, Roger Sayle wrote:
> >
> > This patch optimizes PR middle-end/101955 for the ARC backend.  On ARC
> > CPUs with a barrel shifter, using two shifts is (probably) optimal as:
> >
> >  asl_s   r0,r0,31
> >  asr_s   r0,r0,31
> >
> > but without a barrel shifter, GCC -O2 -mcpu=em currently generates:
> >
> >  and r2,r0,1
> >  ror r2,r2
> >  add.f   0,r2,r2
> >  sbc r0,r0,r0
> >
> > with this patch, we now generate the smaller, faster and non-flags
> > clobbering:
> >
> >  bmsk_s  r0,r0,0
> >  neg_s   r0,r0
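
The identity behind this optimization, that sign-extracting bit 0 is the same as AND-with-1 then negate, is easy to state in C (function name is illustrative):

```c
#include <stdint.h>

/* (x << 31) >> 31 sign-extends bit 0 across the word, yielding 0 or -1.
   Without a barrel shifter, -(x & 1) computes the same thing with an
   AND (bmsk) and a negate, and without clobbering the flags. */
int32_t extv_bit0(int32_t x)
{
    return -(x & 1);
}
```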
> >
> > Tested with a cross-compiler to arc-linux hosted on x86_64,
> > with no new (compile-only) regressions from make -k check.
> > Ok for mainline if this passes Claudiu's nightly testing?
> >
> >
> > 2023-10-28  Roger Sayle  
> >
> > gcc/ChangeLog
> >  PR middle-end/101955
> >  * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
> >  to convert sign extract of the least significant bit into an
> >  AND $1 then a NEG when !TARGET_BARREL_SHIFTER.
> >
> > gcc/testsuite/ChangeLog
> >  PR middle-end/101955
> >  * gcc.target/arc/pr101955.c: New test case.
> Good catch.  Looking to do something very similar on the H8 based on
> your work here.
>
> One the H8 we can use bld to load a bit from an 8 bit register into the
> C flag.  Then we use subtract with carry to get an 8 bit 0/-1 which we
> can then sign extend to 16 or 32 bits.  That covers bit positions 0..15
> of an SImode input.
>
> For bits 16..31 we can move the high half into the low half, the use the
> bld sequence.
>
> For bit zero the and+neg is the same number of clocks and size as bld
> based sequence.  But it'll simulate faster, so it's special cased.
>
>
> Jeff
>


Re: [ARC PATCH] Improve DImode left shift by a single bit.

2023-11-03 Thread Claudiu Zissulescu Ianculescu
Missed this one.

Ok, please proceed with the commit.

Thank you for your contribution,
Claudiu

On Sat, Oct 28, 2023 at 4:05 PM Roger Sayle  wrote:
>
>
> This patch improves the code generated for X << 1 (and for X + X) when
> X is 64-bit DImode, using the same two instruction code sequence used
> for DImode addition.
>
> For the test case:
>
> long long foo(long long x) { return x << 1; }
>
> GCC -O2 currently generates the following code:
>
> foo:lsr r2,r0,31
> asl_s   r1,r1,1
> asl_s   r0,r0,1
> j_s.d   [blink]
> or_sr1,r1,r2
>
> and on CPU without a barrel shifter, i.e. -mcpu=em
>
> foo:add.f   0,r0,r0
> asl_s   r1,r1
> rlc r2,0
> asl_s   r0,r0
> j_s.d   [blink]
> or_sr1,r1,r2
>
> with this patch (both with and without a barrel shifter):
>
> foo:add.f   r0,r0,r0
> j_s.d   [blink]
> adc r1,r1,r1
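
The two-instruction sequence works because a 64-bit left shift by one is just a 64-bit addition of the value to itself, done as add-with-carry over the register pair. A C sketch over explicit 32-bit halves (names are illustrative):

```c
#include <stdint.h>

/* DImode x << 1 on a 32-bit register pair:
   add.f r0,r0,r0  -> low half doubled, carry = old bit 31 of low half
   adc   r1,r1,r1  -> high half doubled plus that carry.  */
uint64_t shl1_pair(uint32_t lo, uint32_t hi)
{
    uint32_t carry  = lo >> 31;       /* carry-out of lo + lo */
    uint32_t new_lo = lo + lo;
    uint32_t new_hi = hi + hi + carry;
    return ((uint64_t)new_hi << 32) | new_lo;
}
```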
>
> [For Jeff Law's benefit a similar optimization is also applicable to
> H8300H, that could also use a two instruction sequence (plus rts) but
> currently GCC generates 16 instructions (plus an rts) for foo above.]
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
> 2023-10-28  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc.md (addsi3): Fix GNU-style code formatting.
> (adddi3): Change define_expand to generate an *adddi3.
> (*adddi3): New define_insn_and_split to lower DImode additions
> during the split1 pass (after combine and before reload).
> (ashldi3): New define_expand to (only) generate *ashldi3_cnt1
> for DImode left shifts by a single bit.
> (*ashldi3_cnt1): New define_insn_and_split to lower DImode
> left shifts by one bit to an *adddi3.
>
> gcc/testsuite/ChangeLog
> * gcc.target/arc/adddi3-1.c: New test case.
> * gcc.target/arc/ashldi3-1.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] [ARC] Use hardware support for double-precision compare instructions.

2020-01-17 Thread Claudiu Zissulescu Ianculescu
It is already ported :)
https://github.com/gcc-mirror/gcc/commit/555e4a053951a0ae24835a266e71819336d7f637#diff-5b8bd26eec6c2b9f560870c205416edc

Cheers,
Claudiu

On Wed, Jan 15, 2020 at 1:49 AM Vineet Gupta  wrote:
>
> On 12/9/19 1:52 AM, Claudiu Zissulescu wrote:
> > Although the FDCMP (the double precision floating point compare 
> > instruction) is added to the compiler, it is not properly used via cstoredi 
> > pattern. Fix it.
> >
> > OK to apply?
> > Claudiu
> >
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc.md (iterator SDF): Check TARGET_FP_DP_BASE.
> >   (cstoredi4): Use TARGET_HARD_FLOAT.
> > ---
> >  gcc/config/arc/arc.md | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> > index b592f25afce..bd44030b409 100644
> > --- a/gcc/config/arc/arc.md
> > +++ b/gcc/config/arc/arc.md
> > @@ -3749,7 +3749,7 @@ archs4x, archs4xd"
> >  })
> >
> >  (define_mode_iterator SDF [(SF "TARGET_FP_SP_BASE || TARGET_OPTFPE")
> > -(DF "TARGET_OPTFPE")])
> > +(DF "TARGET_FP_DP_BASE || TARGET_OPTFPE")])
> >
> >  (define_expand "cstore4"
> >[(set (reg:CC CC_REG)
> > @@ -3759,7 +3759,7 @@ archs4x, archs4xd"
> >   (match_operator:SI 1 "comparison_operator" [(reg CC_REG)
> >   (const_int 0)]))]
> >
> > -  "TARGET_FP_SP_BASE || TARGET_OPTFPE"
> > +  "TARGET_HARD_FLOAT || TARGET_OPTFPE"
> >  {
> >gcc_assert (XEXP (operands[1], 0) == operands[2]);
> >gcc_assert (XEXP (operands[1], 1) == operands[3]);
>
> Can this be backported to gcc-9 please ?
> glibc testing uses gcc-9
>
> Thx,
> -Vineet


Re: [PATCH 3/4] [ARC] Save mlo/mhi registers when ISR.

2020-01-27 Thread Claudiu Zissulescu Ianculescu
Yes, I know :(

Thank you for your help. All four patches pushed.
Claudiu

On Wed, Jan 22, 2020 at 10:31 PM Jeff Law  wrote:
>
> On Wed, 2020-01-22 at 10:14 +0200, Claudiu Zissulescu wrote:
> > ARC600 when configured with mul64 instructions uses mlo and mhi
> > registers to store the 64 result of the multiplication. In the ARC600
> > ISA documentation we have the next register configuration when ARC600
> > is configured only with mul64 extension:
> >
> > Register | Name | Use
> > ---------+------+------------------------------------
> > r57      | mlo  | Multiply low 32 bits, read only
> > r58      | mmid | Multiply middle 32 bits, read only
> > r59      | mhi  | Multiply high 32 bits, read only
> >
> > When used for Co-existence configurations we have for mul64 the next
> > registers used:
> >
> > Register | Name | Use
> > ---------+------+------------------------------------
> > r58      | mlo  | Multiply low 32 bits, read only
> > r59      | mhi  | Multiply high 32 bits, read only
> >
> > Note that mlo/mhi assignment doesn't swap when bigendian CPU
> > configuration is used.
> >
> > The compiler will always use r58 for mlo, regardless of the
> > configuration choosen to ensure mlo/mhi correct splitting. Fixing mlo
> > to the right register number is done at assembly time. The dwarf info
> > is also notified via DBX_... macro. Both mlo/mhi registers needs to
> > saved when ISR happens using a custom sequence.
> >
> > gcc/
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc-protos.h (gen_mlo): Remove.
> >   (gen_mhi): Likewise.
> >   * config/arc/arc.c (AUX_MULHI): Define.
> >   (arc_must_save_register): Special handling for r58/59.
> >   (arc_compute_frame_size): Consider mlo/mhi registers.
> >   (arc_save_callee_saves): Emit fp/sp move only when emit_move
> >   parameter is true.
> >   (arc_conditional_register_usage): Remove TARGET_BIG_ENDIAN from
> >   mlo/mhi name selection.
> >   (arc_restore_callee_saves): Don't early restore blink when ISR.
> >   (arc_expand_prologue): Add mlo/mhi saving.
> >   (arc_expand_epilogue): Add mlo/mhi restoring.
> >   (gen_mlo): Remove.
> >   (gen_mhi): Remove.
> >   * config/arc/arc.h (DBX_REGISTER_NUMBER): Correct register
> >   numbering when MUL64 option is used.
> >   (DWARF2_FRAME_REG_OUT): Define.
> >   * config/arc/arc.md (arc600_stall): New pattern.
> >   (VUNSPEC_ARC_ARC600_STALL): Define.
> >   (mulsi64): Use correct mlo/mhi registers.
> >   (mulsi_600): Clean it up.
> >   * config/arc/predicates.md (mlo_operand): Remove any dependency on
> >   TARGET_BIG_ENDIAN.
> >   (mhi_operand): Likewise.
> >
> > testsuite/
> > -xx-xx  Claudiu Zissulescu  
> >   * gcc.target/arc/code-density-flag.c: Update test.
> >   * gcc.target/arc/interrupt-6.c: Likewise.
> Ugh.  But OK.
>
> jeff
> >
>


Re: [PATCH 2/4] [ARC] Use TARGET_INSN_COST.

2020-02-04 Thread Claudiu Zissulescu Ianculescu
> My only worry would be asking for the length early in the RTL pipeline
> may not be as accurate, but it's supposed to work, so if you're
> comfortable with the end results, then OK.
>
Indeed, the length is not accurate, but the results seem slightly
better than using rtx_costs. Using INSN_COST seems to me a more
manageable way of controlling what the combiner does.
Anyhow, for ARC the instruction size is only accurate quite late in the
compilation process, as it needs register and immediate-value info :(

Thank you for your review,
Claudiu


Re: [PATCH] [ARC] Use hardware support for double-precision compare instructions.

2019-12-12 Thread Claudiu Zissulescu Ianculescu
Thank you for your review. Patch pushed to mainline and gcc9 branch.

//Claudiu

On Wed, Dec 11, 2019 at 8:59 PM Jeff Law  wrote:
>
> On Mon, 2019-12-09 at 11:52 +0200, Claudiu Zissulescu wrote:
> > Although the FDCMP (the double precision floating point compare
> > instruction) is added to the compiler, it is not properly used via
> > cstoredi pattern. Fix it.
> >
> > OK to apply?
> > Claudiu
> >
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc.md (iterator SDF): Check TARGET_FP_DP_BASE.
> >   (cstoredi4): Use TARGET_HARD_FLOAT.
> OK
> jeff
>


Re: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float comparisons

2019-12-12 Thread Claudiu Zissulescu Ianculescu
Pushed. Thank you for your contribution,
Claudiu

On Wed, Dec 11, 2019 at 12:47 AM Vineet Gupta
 wrote:
>
> On 12/10/19 1:12 AM, Claudiu Zissulescu wrote:
> > Hi,
> >
> > Thank you for your contribution, I'll push it asap. As far as I understand, 
> > you need this patch both in gcc9 branch and mainline.
> >
> > Cheers,
> > Claudiu
>
> Indeed both mainline and gcc9
>
> Thx
> -Vineet
>
> >
> >> -Original Message-
> >> From: Vineet Gupta [mailto:vgu...@synopsys.com]
> >> Sent: Monday, December 09, 2019 8:02 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Claudiu Zissulescu ;
> >> andrew.burg...@embecosm.com; linux-snps-...@lists.infradead.org;
> >> Vineet Gupta 
> >> Subject: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float
> >> comparisons
> >>
> >> ARC gcc generates FDCMP instructions which raises Invalid operation for
> >> signaling NaN only. This causes glibc iseqsig() primitives to fail (in
> >> the current ongoing glibc port to ARC)
> >>
> >> So split up the hard float compares into two categories and for unordered
> >> compares generate the FDCMPF instruction (vs. FDCMP) which raises
> >> exception
> >> for either NaNs.
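
The split between CC_FPU and CC_FPUE tracks IEEE 754's distinction between quiet and signaling comparison predicates: the ordered relations (<, <=, >, >=) must raise "invalid" for any NaN operand, while equality tests are quiet. The truth values involved can be sketched in C (the exception-flag behavior itself is what the FDCMPF/FDCMP choice controls in hardware):

```c
/* Ordered predicate: must signal on NaN, and is false whenever either
   operand is NaN. */
int ordered_lt(double a, double b)
{
    return a < b;
}

/* Quiet predicate: != is true for NaN operands and need not signal. */
int unordered_ne(double a, double b)
{
    return a != b;
}
```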
> >>
> >> With this fix testsuite/gcc.dg/torture/pr52451.c passes for ARC.
> >>
> >> Also passes 6 additional tests in glibc testsuite (test*iseqsig) and no
> >> regressions
> >>
> >> gcc/
> >> -xx-xx  Vineet Gupta  
> >>
> >>  * config/arc/arc-modes.def (CC_FPUE): New Mode CC_FPUE which
> >>  helps codegen generate exceptions even for quiet NaN.
> >>  * config/arc/arc.c (arc_init_reg_tables): Handle New CC_FPUE mode.
> >>  (get_arc_condition_code): Likewise.
> >>  (arc_select_cc_mode): LT, LE, GT, GE to use the New CC_FPUE
> >> mode.
> >>  * config/arc/arc.h (REVERSE_CONDITION): Handle New CC_FPUE
> >> mode.
> >>  * config/arc/predicates.md (proper_comparison_operator):
> >> Likewise.
> >>  * config/arc/fpu.md (cmpsf_fpu_trap): New Pattern for CC_FPUE.
> >>  (cmpdf_fpu_trap): Likewise.
> >>
> >> Signed-off-by: Vineet Gupta 
> >> ---
> >>  gcc/config/arc/arc-modes.def |  1 +
> >>  gcc/config/arc/arc.c |  8 ++--
> >>  gcc/config/arc/arc.h |  2 +-
> >>  gcc/config/arc/fpu.md| 24 
> >>  gcc/config/arc/predicates.md |  1 +
> >>  5 files changed, 33 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
> >> index 36a2f4abfb25..d16b6a289a15 100644
> >> --- a/gcc/config/arc/arc-modes.def
> >> +++ b/gcc/config/arc/arc-modes.def
> >> @@ -38,4 +38,5 @@ VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI
> >> */
> >>
> >>  /* FPU condition flags.  */
> >>  CC_MODE (CC_FPU);
> >> +CC_MODE (CC_FPUE);
> >>  CC_MODE (CC_FPU_UNEQ);
> >> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> >> index 28305f459dcd..cbb95d6e9043 100644
> >> --- a/gcc/config/arc/arc.c
> >> +++ b/gcc/config/arc/arc.c
> >> @@ -1564,6 +1564,7 @@ get_arc_condition_code (rtx comparison)
> >>  default : gcc_unreachable ();
> >>  }
> >>  case E_CC_FPUmode:
> >> +case E_CC_FPUEmode:
> >>switch (GET_CODE (comparison))
> >>  {
> >>  case EQ: return ARC_CC_EQ;
> >> @@ -1686,11 +1687,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x,
> >> rtx y)
> >>case UNLE:
> >>case UNGT:
> >>case UNGE:
> >> +return CC_FPUmode;
> >> +
> >>case LT:
> >>case LE:
> >>case GT:
> >>case GE:
> >> -return CC_FPUmode;
> >> +return CC_FPUEmode;
> >>
> >>case LTGT:
> >>case UNEQ:
> >> @@ -1844,7 +1847,7 @@ arc_init_reg_tables (void)
> >>if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode
> >>|| i == (int) CC_Cmode
> >>|| i == CC_FP_GTmode || i == CC_FP_GEmode || i ==
> >> CC_FP_ORDmode
> >> -  || i == CC_FPUmode || i == CC_FPU_UNEQmode)
> >> +  || i == CC_FPUmode || i == CC_FPUEmode || i ==
> >> CC_FPU_UNEQmode)
> >>  arc_mode_class[i] = 1 << (int) C_MODE;
> >>else
> >>  arc_mode_class[i] = 0;
> >> @@ -8401,6 +8404,7 @@ arc_reorg (void)
> >>
> >>/* Avoid FPU instructions.  */
> >>if ((GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUmode)
> >> +  || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUEmode)
> >>|| (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) ==
> >> CC_FPU_UNEQmode))
> >>  continue;
> >>
> >> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> >> index 4d7ac3281b41..c08ca3d0d432 100644
> >> --- a/gcc/config/arc/arc.h
> >> +++ b/gcc/config/arc/arc.h
> >> @@ -1531,7 +1531,7 @@ enum arc_function_type {
> >>(((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode \
> >>  || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode   \
> >>  || (MODE) == CC_FPXmode || (MODE) == CC_FPU_UNEQmode \
> >> -|| (MODE) == CC_FPUmode) \
> >> +|| (MODE) == C

Re: [committed] arc: Remove mlra option [PR113954]

2024-09-24 Thread Claudiu Zissulescu Ianculescu
I'll include your comment in my second patch, where I clean up some
patterns used by reload.

Thank you,
claudiu

On Mon, Sep 23, 2024 at 5:05 PM Andreas Schwab  wrote:
>
> On Sep 23 2024, Claudiu Zissulescu wrote:
>
> > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> > index c800226b179..a225adeff57 100644
> > --- a/gcc/config/arc/arc.cc
> > +++ b/gcc/config/arc/arc.cc
> > @@ -721,7 +721,7 @@ static rtx arc_legitimize_address_0 (rtx, rtx, 
> > machine_mode mode);
> >arc_no_speculation_in_delay_slots_p
> >
> >  #undef TARGET_LRA_P
> > -#define TARGET_LRA_P arc_lra_p
> > +#define TARGET_LRA_P hook_bool_void_true
>
> This is the default for lra_p, so you can remove the override.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH] arc: testsuite: Scan "rlc" instead of "mov.hs".

2025-03-19 Thread Claudiu Zissulescu Ianculescu
LGTM. I'll merge it once stage one is open.

Cheers,
Claudiu

On Tue, Mar 18, 2025 at 6:23 PM Luis Silva  wrote:
>
> Due to the patch by Roger Sayle,
> 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which
> introduces the use of the `rlc rX,0` instruction in place
> of the `mov.hs`, the add overflow test case needs to be
> updated.  The previous test case validated the `mov.hs`
> instruction; it must now validate the `rlc` instruction
> instead.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc.
>
> Signed-off-by: Luis Silva 
> ---
>  gcc/testsuite/gcc.target/arc/overflow-1.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arc/overflow-1.c 
> b/gcc/testsuite/gcc.target/arc/overflow-1.c
> index 01b3e8ad0fa..694c25cfe66 100644
> --- a/gcc/testsuite/gcc.target/arc/overflow-1.c
> +++ b/gcc/testsuite/gcc.target/arc/overflow-1.c
> @@ -31,9 +31,8 @@ bool addi_overflow (int32_t a, int32_t *res)
>  /*
>   * add.f  r0,r0,r1
>   * st_s   r0,[r2]
> - * mov_s  r0,1
>   * j_s.d  [blink]
> - * mov.hs r0,0
> > + * rlc r0,0
>   */
>  bool uadd_overflow (uint32_t a, uint32_t b, uint32_t *res)
>  {
> @@ -75,9 +74,8 @@ bool addi_overflow_p (int32_t a, int32_t res)
>
>  /*
>   * add.f   0,r0,r1
> - * mov_s   r0,1
>   * j_s.d   [blink]
> - * mov.hs  r0,0
> + * rlc r0,0
>   */
>  bool uadd_overflow_p (uint32_t a, uint32_t b, uint32_t res)
>  {
> @@ -95,6 +93,6 @@ bool uaddi_overflow_p (uint32_t a, uint32_t res)
>
>  /* { dg-final { scan-assembler-times "add.f\\s\+"   7 } } */
>  /* { dg-final { scan-assembler-times "mov\.nv\\s\+" 4 } } */
> -/* { dg-final { scan-assembler-times "mov\.hs\\s\+" 2 } } */
> +/* { dg-final { scan-assembler-times "rlc\\s\+" 2 } } */
>  /* { dg-final { scan-assembler-times "seths\\s\+"   2 } } */
>  /* { dg-final { scan-assembler-not   "cmp" } } */
> --
> 2.37.1
>
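
For reference, a C sketch of what the `uadd_overflow` function in that test computes (the helper name here is ours): the carry out of the 32-bit add is the overflow result, and `rlc r0,0` (rotate left through carry of zero) materializes that carry in a register in one instruction, replacing the old `mov_s`/`mov.hs` pair.

```c
#include <stdint.h>
#include <stdbool.h>

/* True iff the 32-bit unsigned add a + b wraps, i.e. produces a
   carry.  This is the value the ARC backend now obtains with
   `rlc r0,0` after the flag-setting `add.f`.  */
static bool uadd_overflows (uint32_t a, uint32_t b)
{
  uint32_t res;
  return __builtin_add_overflow (a, b, &res);
}
```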


Re: [PATCH 1/2] arc: Add commutative multiplication patterns.

2025-03-19 Thread Claudiu Zissulescu Ianculescu
LGTM, I'll merge it once stage 1 is open.

Cheers,
Claudiu

On Tue, Mar 18, 2025 at 6:22 PM Luis Silva  wrote:
>
> This patch introduces two new instruction patterns:
>
> `*mulsi3_cmp0`:  This pattern performs a multiplication
> and sets the CC_Z register based on the result, while
> also storing the result of the multiplication in a
> general-purpose register.
>
> `*mulsi3_cmp0_noout`:  This pattern performs a
> multiplication and sets the CC_Z register based on the
> result without storing the result in a general-purpose
> register.
>
> These patterns generate code using the `mpy.f` instruction in the
> specific case where the multiplication result is compared to zero.
>
> In addition, the previous commutative multiplication implementation
> was removed: it incorrectly took the negative flag into account.
> The new implementation only considers the zero flag.
>
> A test case has been added to verify the correctness of these
> changes.
>
> gcc/ChangeLog:
>
> * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication
> results compared against zero, selecting CC_Zmode.
> * config/arc/arc.md (*mulsi3_cmp0): New define_insn.
> (*mulsi3_cmp0_noout): New define_insn.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/mult-cmp0.c: New test.
>
> Signed-off-by: Luis Silva 
> ---
>  gcc/config/arc/arc.cc|  7 +++
>  gcc/config/arc/arc.md| 34 ++--
>  gcc/testsuite/gcc.target/arc/mult-cmp0.c | 66 
>  3 files changed, 103 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/mult-cmp0.c
>
> diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> index e3d53576768..8ad5649adc0 100644
> --- a/gcc/config/arc/arc.cc
> +++ b/gcc/config/arc/arc.cc
> @@ -1555,6 +1555,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x, rtx y)
>machine_mode mode = GET_MODE (x);
>rtx x1;
>
> +  /* Matches all instructions which can do .f and clobbers only Z flag.  */
> +  if (GET_MODE_CLASS (mode) == MODE_INT
> +  && y == const0_rtx
> +  && GET_CODE (x) == MULT
> +  && (op == EQ || op == NE))
> +return CC_Zmode;
> +
>/* For an operation that sets the condition codes as a side-effect, the
>   C and V flags is not set as for cmp, so we can only use comparisons 
> where
>   this doesn't matter.  (For LT and GE we can use "mi" and "pl"
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 49dfc9d35af..bc2e8fadd91 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -253,7 +253,7 @@
> simd_vcompare, simd_vpermute, simd_vpack, simd_vpack_with_acc,
> simd_valign, simd_valign_with_acc, simd_vcontrol,
> simd_vspecial_3cycle, simd_vspecial_4cycle, simd_dma, mul16_em, div_rem,
> -   fpu, fpu_fuse, fpu_sdiv, fpu_ddiv, fpu_cvt, block"
> +   fpu, fpu_fuse, fpu_sdiv, fpu_ddiv, fpu_cvt, block, mpy"
>(cond [(eq_attr "is_sfunc" "yes")
>  (cond [(match_test "!TARGET_LONG_CALLS_SET && (!TARGET_MEDIUM_CALLS 
> || GET_CODE (PATTERN (insn)) != COND_EXEC)") (const_string "call")
> (match_test "flag_pic") (const_string "sfunc")]
> @@ -1068,11 +1068,37 @@ archs4x, archs4xd"
> (set_attr "cond" "set_zn")
> (set_attr "length" "*,4,4,4,8")])
>
> -;; The next two patterns are for plos, ior, xor, and, and mult.
> +(define_insn "*mulsi3_cmp0"
> +  [(set (reg:CC_Z CC_REG)
> +   (compare:CC_Z
> +(mult:SI
> + (match_operand:SI 1 "register_operand"  "%r,0,r")
> + (match_operand:SI 2 "nonmemory_operand" "rL,I,i"))
> +(const_int 0)))
> +   (set (match_operand:SI 0 "register_operand""=r,r,r")
> +   (mult:SI (match_dup 1) (match_dup 2)))]
> + "TARGET_MPY"
> + "mpy%?.f\\t%0,%1,%2"
> + [(set_attr "length" "4,4,8")
> +  (set_attr "type" "mpy")])
> +
> +(define_insn "*mulsi3_cmp0_noout"
> +  [(set (reg:CC_Z CC_REG)
> +   (compare:CC_Z
> +(mult:SI
> + (match_operand:SI 0 "register_operand"   "%r,r,r")
> + (match_operand:SI 1 "nonmemory_operand"  "rL,I,i"))
> +(const_int 0)))]
> + "TARGET_MPY"
> + "mpy%?.f\\t0,%0,%1"
> + [(set_attr "length" "4,4,8")
> +  (set_attr "type" "mpy")])
> +
> +;; The next two patterns are for plus, ior, xor, and.
>  (define_insn "*commutative_binary_cmp0_noout"
>[(set (match_operand 0 "cc_set_register" "")
> (match_operator 4 "zn_compare_operator"
> - [(match_operator:SI 3 "commutative_operator"
> + [(match_operator:SI 3 "commutative_operator_sans_mult"
>  [(match_operand:SI 1 "register_operand" "%r,r")
>   (match_operand:SI 2 "nonmemory_operand" "rL,Cal")])
>(const_int 0)]))]
> @@ -1085,7 +,7 @@ archs4x, archs4xd"
>  (define_insn "*commutative_binary_cmp0"
>[(set (match_operand 3 "cc_set_register" "")
> (match_operator 5 "zn_compare_operator"
> - [(match_operator:SI 4 "commutative_opera

Re: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()

2025-03-19 Thread Claudiu Zissulescu Ianculescu
LGTM,

Cheers,
Claudiu

On Tue, Mar 18, 2025 at 6:23 PM Luis Silva  wrote:
>
> This patch handles both signed and unsigned
> builtin multiplication overflow.
>
> It uses the "mpy.f" instruction to set the condition
> codes based on the result.  In the event of an
> overflow, the V flag is set, triggering a
> conditional move depending on the V flag status.
>
> For example, "r0" is set to 1 in case of overflow:
>
> mov_s   r0,1
> mpy.f   r0,r0,r1
> j_s.d   [blink]
> mov.nv  r0,0
>
> gcc/ChangeLog:
>
> * config/arc/arc.md (mulvsi4): New define_expand.
> (mulsi3_Vcmp): New define_insn.
>
> Signed-off-by: Luis Silva 
> ---
>  gcc/config/arc/arc.md | 33 +
>  1 file changed, 33 insertions(+)
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index bc2e8fadd91..dd245d1813c 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -842,6 +842,9 @@ archs4x, archs4xd"
>  ; Optab prefix for sign/zero-extending operations
>  (define_code_attr su_optab [(sign_extend "") (zero_extend "u")])
>
> +;; Code iterator for sign/zero extension
> +(define_code_iterator ANY_EXTEND [sign_extend zero_extend])
> +
>  (define_insn "*xt_cmp0_noout"
>[(set (match_operand 0 "cc_set_register" "")
> (compare:CC_ZN (SEZ:SI (match_operand:SQH 1 "register_operand" "r"))
> @@ -1068,6 +1071,36 @@ archs4x, archs4xd"
> (set_attr "cond" "set_zn")
> (set_attr "length" "*,4,4,4,8")])
>
> +(define_expand "mulvsi4"
> +  [(ANY_EXTEND:DI (match_operand:SI 0 "register_operand"))
> +   (ANY_EXTEND:DI (match_operand:SI 1 "register_operand"))
> +   (ANY_EXTEND:DI (match_operand:SI 2 "register_operand"))
> +   (label_ref (match_operand 3 "" ""))]
> +  "TARGET_MPY"
> +  {
> +emit_insn (gen_mulsi3_Vcmp (operands[0], operands[1],
> + operands[2]));
> +arc_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
> +DONE;
> +  })
> +
> +(define_insn "mulsi3_Vcmp"
> +  [(parallel
> +[(set
> +  (reg:CC_V CC_REG)
> +  (compare:CC_V
> +   (mult:DI
> +   (ANY_EXTEND:DI (match_operand:SI 1 "register_operand"  "%0,r,r,r"))
> +   (ANY_EXTEND:DI (match_operand:SI 2 "nonmemory_operand"  "I,L,r,C32")))
> +   (ANY_EXTEND:DI (mult:SI (match_dup 1) (match_dup 2)
> + (set (match_operand:SI 0 "register_operand"  "=r,r,r,r")
> + (mult:SI (match_dup 1) (match_dup 2)))])]
> +  "register_operand (operands[1], SImode)
> +   || register_operand (operands[2], SImode)"
> +  "mpy.f\\t%0,%1,%2"
> +  [(set_attr "length" "4,4,4,8")
> +   (set_attr "type"   "mpy")])
> +
>  (define_insn "*mulsi3_cmp0"
>[(set (reg:CC_Z CC_REG)
> (compare:CC_Z
> --
> 2.37.1
>
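
A C sketch of the source shape the new `mulvsi4` expander targets (the helper name is ours): `__builtin_mul_overflow` on 32-bit operands, with the overflow bit coming straight from the V flag set by `mpy.f`.

```c
#include <stdint.h>
#include <stdbool.h>

/* True iff the signed 32-bit product a * b does not fit in int32_t.
   On ARC with this patch, the flag comes from `mpy.f` setting V.  */
static bool smul_overflows (int32_t a, int32_t b)
{
  int32_t res;
  return __builtin_mul_overflow (a, b, &res);
}
```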


Re: [PATCH] arc: testsuite: Scan "rlc" instead of "mov.hs".

2025-04-24 Thread Claudiu Zissulescu Ianculescu
Hi Jeff,

There is one patch missing; I'll add it to mainline as soon as trunk
is open for commits.

Best,
Claudiu

On Fri, Apr 18, 2025 at 12:10 AM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:23 AM, Luis Silva wrote:
> > Due to the patch by Roger Sayle,
> > 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which
> > introduces the use of the `rlc rX,0` instruction in place
> > of the `mov.hs`, the add overflow test case needs to be
> > updated.  The previous test case was validating the `mov.hs`
> > instruction, but now it must validate the `rlc` instruction
> > as the new behavior.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc.
> I don't see any test named "overflow-1.c" in the arc subdirectory?!?
>
> Is it possible that's a change in your local repo?
>
> jeff


Re: [PATCH 1/2] arc: Add commutative multiplication patterns.

2025-04-24 Thread Claudiu Zissulescu Ianculescu
Hi Jeff,

Indeed, Luis should have used "umulti". The other attributes are
not required. I'll fix it before pushing to mainline.

Thanks,
Claudiu

On Fri, Apr 18, 2025 at 8:41 PM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:22 AM, Luis Silva wrote:
> > This patch introduces two new instruction patterns:
> >
> >  `*mulsi3_cmp0`:  This pattern performs a multiplication
> >  and sets the CC_Z register based on the result, while
> >  also storing the result of the multiplication in a
> >  general-purpose register.
> >
> >  `*mulsi3_cmp0_noout`:  This pattern performs a
> >  multiplication and sets the CC_Z register based on the
> >  result without storing the result in a general-purpose
> >  register.
> >
> > These patterns are optimized to generate code using the `mpy.f`
> > instruction, specifically used where the result is compared to zero.
> >
> > In addition, the previous commutative multiplication implementation
> > was removed.  It incorrectly took into account the negative flag,
> > which is wrong.  This new implementation only considers the zero
> > flag.
> >
> > A test case has been added to verify the correctness of these
> > changes.
> >
> > gcc/ChangeLog:
> >
> >  * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication
> >  results compared against zero, selecting CC_Zmode.
> >  * config/arc/arc.md (*mulsi3_cmp0): New define_insn.
> >  (*mulsi3_cmp0_noout): New define_insn.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/arc/mult-cmp0.c: New test.
> So I'm not well versed in the ARC port, but a couple questions.
>
> First your new patterns use a new type "mpy".  Do you want/need to add
> that to the pipeline descriptions?  It would seem advisable to do so.
>
> Do the new patterns need to set "cond" and "predicable" attributes?
>
> Jeff
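
For context, a minimal C sketch of the code shape the `*mulsi3_cmp0` patterns under discussion match (the function name is ours): a multiply whose result is only compared against zero, so the separate compare can fold into the flag-setting `mpy.f`.

```c
#include <stdbool.h>

/* The combiner can merge the multiply and the zero compare; on ARC
   this becomes a single `mpy.f` (or `mpy.f 0,rA,rB` when the product
   itself is unused, via *mulsi3_cmp0_noout).  */
static bool product_is_zero (int a, int b)
{
  return (a * b) == 0;
}
```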


Fwd: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()

2025-04-24 Thread Claudiu Zissulescu Ianculescu
Adding missing email addresses.

-- Forwarded message -
From: Claudiu Zissulescu Ianculescu 
Date: Thu, Apr 24, 2025 at 8:48 PM
Subject: Re: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()
To: Jeff Law 


Hi Jeff,

The other attributes are not required, as the pattern cannot be used
in predicated execution.  Thus, the default values for the missing
attributes are OK.

Best,
Claudiu

On Fri, Apr 18, 2025 at 8:43 PM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:23 AM, Luis Silva wrote:
> > This patch handles both signed and unsigned
> > builtin multiplication overflow.
> >
> > Uses the "mpy.f" instruction to set the condition
> > codes based on the result.  In the event of an
> > overflow, the V flag is set, triggering a
> > conditional move depending on the V flag status.
> >
> > For example, "r0" is set to 1 in case of overflow:
> >
> >   mov_s   r0,1
> >   mpy.f   r0,r0,r1
> >   j_s.d   [blink]
> >   mov.nv  r0,0
> >
> > gcc/ChangeLog:
> >
> >  * config/arc/arc.md (mulvsi4): New define_expand.
> >  (mulsi3_Vcmp): New define_insn.
> So similar to your other patch, there are other attributes (cond and
> predicable) that you may need to set.  I just don't know the port well
> enough to judge that.
>
> jeff
>


Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-11 Thread Claudiu Zissulescu-Ianculescu
Hi,
> 
> Currently, the data type of sanitizer flags is unsigned int, with
> SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
> enumerator for enum sanitize_code.  Use the 'uint64_t' data type to
> allow more distinct instrumentation modes to be added when needed.
> 
> 
> 
> I have not looked yet but does it make sense to use `unsigned
> HOST_WIDE_INT` instead of uint64_t? HWI should be the same as uint64_t
> but it is more consistent with the rest of gcc.
> Plus since tree_to_uhwi is more consistent there.
> 
That was the case in v2; however, the reviewers suggested using uint64_t.

Best wishes,
Claudiu
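
The motivation is easy to see in a small sketch: with a 32-bit flag word, `1UL << 31` is the last usable bit, so any further mode needs bit 32 and therefore a 64-bit type. `SANITIZE_HYPOTHETICAL_NEXT` below is an illustrative name, not a real GCC enumerator; the predicate mirrors the shape of `sanitize_flags_p` in asan.h.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit 31 is already taken by SANITIZE_SHADOW_CALL_STACK, so the next
   instrumentation mode must live in bit 32, which only fits once the
   flag word is uint64_t.  */
#define SANITIZE_SHADOW_CALL_STACK (UINT64_C (1) << 31)
#define SANITIZE_HYPOTHETICAL_NEXT (UINT64_C (1) << 32)  /* illustrative */

/* Same test sanitize_flags_p performs: is any of FLAG enabled?  */
static bool flags_enabled_p (uint64_t flag_sanitize, uint64_t flag)
{
  return (flag_sanitize & flag) != 0;
}
```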


[PATCH v3 9/9] aarch64: Add memtag-stack tests

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Add basic tests for the memtag-stack sanitizer.  The memtag-stack
sanitizer uses target hooks to emit AArch64-specific MTE instructions.

gcc/testsuite:

* lib/target-supports.exp:
* gcc.target/aarch64/memtag/alloca-1.c: New test.
* gcc.target/aarch64/memtag/alloca-3.c: New test.
* gcc.target/aarch64/memtag/arguments-1.c: New test.
* gcc.target/aarch64/memtag/arguments-2.c: New test.
* gcc.target/aarch64/memtag/arguments-3.c: New test.
* gcc.target/aarch64/memtag/arguments-4.c: New test.
* gcc.target/aarch64/memtag/arguments.c: New test.
* gcc.target/aarch64/memtag/basic-1.c: New test.
* gcc.target/aarch64/memtag/basic-3.c: New test.
* gcc.target/aarch64/memtag/basic-struct.c: New test.
* gcc.target/aarch64/memtag/large-array.c: New test.
* gcc.target/aarch64/memtag/local-no-escape.c: New test.
* gcc.target/aarch64/memtag/memtag.exp: New file.
* gcc.target/aarch64/memtag/no-sanitize-attribute.c: New test.
* gcc.target/aarch64/memtag/value-init.c: New test.
* gcc.target/aarch64/memtag/vararray-gimple.c: New test.
* gcc.target/aarch64/memtag/vararray.c: New test.
* gcc.target/aarch64/memtag/zero-init.c: New test.
* gcc.target/aarch64/memtag/texec-1.c: New test.
* gcc.target/aarch64/memtag/texec-2.c: New test.
* gcc.target/aarch64/memtag/vla-1.c: New test.
* gcc.target/aarch64/memtag/vla-2.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_aarch64_mte): New function.

Co-authored-by: Indu Bhagat 
Signed-off-by: Claudiu Zissulescu 
---
 .../gcc.target/aarch64/memtag/alloca-1.c  | 14 
 .../gcc.target/aarch64/memtag/alloca-3.c  | 27 
 .../gcc.target/aarch64/memtag/arguments-1.c   |  3 +
 .../gcc.target/aarch64/memtag/arguments-2.c   |  3 +
 .../gcc.target/aarch64/memtag/arguments-3.c   |  3 +
 .../gcc.target/aarch64/memtag/arguments-4.c   | 16 +
 .../gcc.target/aarch64/memtag/arguments.c |  3 +
 .../gcc.target/aarch64/memtag/basic-1.c   | 15 +
 .../gcc.target/aarch64/memtag/basic-3.c   | 21 ++
 .../gcc.target/aarch64/memtag/basic-struct.c  | 22 +++
 .../aarch64/memtag/cfi-mte-memtag-frame-1.c   | 11 
 .../gcc.target/aarch64/memtag/large-array.c   | 24 +++
 .../aarch64/memtag/local-no-escape.c  | 20 ++
 .../gcc.target/aarch64/memtag/memtag.exp  | 64 +++
 .../gcc.target/aarch64/memtag/mte-sig.h   | 15 +
 .../aarch64/memtag/no-sanitize-attribute.c| 17 +
 .../gcc.target/aarch64/memtag/texec-1.c   | 27 
 .../gcc.target/aarch64/memtag/texec-2.c   | 22 +++
 .../gcc.target/aarch64/memtag/value-init.c| 14 
 .../aarch64/memtag/vararray-gimple.c  | 17 +
 .../gcc.target/aarch64/memtag/vararray.c  | 14 
 .../gcc.target/aarch64/memtag/vla-1.c | 39 +++
 .../gcc.target/aarch64/memtag/vla-2.c | 48 ++
 .../gcc.target/aarch64/memtag/zero-init.c | 14 
 gcc/testsuite/lib/target-supports.exp | 12 
 25 files changed, 485 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-struct.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/memtag/cfi-mte-memtag-frame-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/large-array.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/local-no-escape.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/memtag.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/mte-sig.h
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/memtag/no-sanitize-attribute.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/texec-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/texec-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/value-init.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray-gimple.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vla-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vla-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/zero-init.c

diff --git a/gcc/testsuite/gcc.target/aa

[PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Currently, the data type of sanitizer flags is unsigned int, with
SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
enumerator for enum sanitize_code.  Use the 'uint64_t' data type to
allow more distinct instrumentation modes to be added when needed.

gcc/ChangeLog:

* asan.h (sanitize_flags_p): Use 'uint64_t' instead of 'unsigned
int'.
* common.opt: Likewise.
* dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
* opts.cc (find_sanitizer_argument): Likewise.
(report_conflicting_sanitizer_options): Likewise.
(parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* opts.h (parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* tree-cfg.cc (print_no_sanitize_attr_value): Likewise.

gcc/c-family/ChangeLog:

* c-attribs.cc (add_no_sanitize_value): Likewise.
(handle_no_sanitize_attribute): Likewise.
(handle_no_sanitize_address_attribute): Likewise.
(handle_no_sanitize_thread_attribute): Likewise.
(handle_no_address_safety_analysis_attribute): Likewise.
* c-common.h (add_no_sanitize_value): Likewise.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Likewise.

gcc/cp/ChangeLog:

* typeck.cc (get_member_function_from_ptrfunc): Likewise.

gcc/d/ChangeLog:

* d-attribs.cc (d_handle_no_sanitize_attribute): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/asan.h|  5 +++--
 gcc/c-family/c-attribs.cc | 16 
 gcc/c-family/c-common.h   |  2 +-
 gcc/c/c-parser.cc |  4 ++--
 gcc/common.opt|  6 +++---
 gcc/cp/typeck.cc  |  2 +-
 gcc/d/d-attribs.cc|  8 
 gcc/dwarf2asm.cc  |  2 +-
 gcc/opts.cc   | 25 +
 gcc/opts.h|  8 
 gcc/tree-cfg.cc   |  2 +-
 11 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/gcc/asan.h b/gcc/asan.h
index 064d4f24823..d4443de4620 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -242,9 +242,10 @@ asan_protect_stack_decl (tree decl)
remove all flags mentioned in "no_sanitize" of DECL_ATTRIBUTES.  */
 
 inline bool
-sanitize_flags_p (unsigned int flag, const_tree fn = current_function_decl)
+sanitize_flags_p (uint64_t flag,
+ const_tree fn = current_function_decl)
 {
-  unsigned int result_flags = flag_sanitize & flag;
+  uint64_t result_flags = flag_sanitize & flag;
   if (result_flags == 0)
 return false;
 
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index ea04ed7f0d4..ddb173e3ccf 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1409,23 +1409,23 @@ handle_cold_attribute (tree *node, tree name, tree 
ARG_UNUSED (args),
 /* Add FLAGS for a function NODE to no_sanitize_flags in DECL_ATTRIBUTES.  */
 
 void
-add_no_sanitize_value (tree node, unsigned int flags)
+add_no_sanitize_value (tree node, uint64_t flags)
 {
   tree attr = lookup_attribute ("no_sanitize", DECL_ATTRIBUTES (node));
   if (attr)
 {
-  unsigned int old_value = tree_to_uhwi (TREE_VALUE (attr));
+  uint64_t old_value = tree_to_uhwi (TREE_VALUE (attr));
   flags |= old_value;
 
   if (flags == old_value)
return;
 
-  TREE_VALUE (attr) = build_int_cst (unsigned_type_node, flags);
+  TREE_VALUE (attr) = build_int_cst (uint64_type_node, flags);
 }
   else
 DECL_ATTRIBUTES (node)
   = tree_cons (get_identifier ("no_sanitize"),
-  build_int_cst (unsigned_type_node, flags),
+  build_int_cst (uint64_type_node, flags),
   DECL_ATTRIBUTES (node));
 }
 
@@ -1436,7 +1436,7 @@ static tree
 handle_no_sanitize_attribute (tree *node, tree name, tree args, int,
  bool *no_add_attrs)
 {
-  unsigned int flags = 0;
+  uint64_t flags = 0;
   *no_add_attrs = true;
   if (TREE_CODE (*node) != FUNCTION_DECL)
 {
@@ -1473,7 +1473,7 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SANITIZE_ADDRESS);
+add_no_sanitize_value (*node, (uint64_t) SANITIZE_ADDRESS);
 
   return NULL_TREE;
 }
@@ -1489,7 +1489,7 @@ handle_no_sanitize_thread_attribute (tree *node, tree 
name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SANITIZE_THREAD);
+add_no_sanitize_value (*node, (uint64_t) SANITIZE_THREAD);
 
   return NULL_TREE;
 }
@@ -1506,7 +1506,7 @@ handle_no_address_safety_analysis_attribute (tree *node, 
tree name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SAN

[PATCH v3 8/9] aarch64: Add support for memetag-stack sanitizer using MTE insns

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

The MEMTAG sanitizer, which is based on the HWASAN sanitizer, invokes
target-specific hooks to create a random tag, add a tag to a memory
address, and finally tag and untag memory.

Implement the target hooks to emit MTE instructions if the MEMTAG
sanitizer is in effect.  Continue to use the default target hooks if
HWASAN is being used.  The following target hooks are implemented:
   - TARGET_MEMTAG_INSERT_RANDOM_TAG
   - TARGET_MEMTAG_ADD_TAG
   - TARGET_MEMTAG_EXTRACT_TAG
   - TARGET_MEMTAG_COMPOSE_OFFSET_TAG

Apart from the target-specific hooks, set the following to values
defined by the Memory Tagging Extension (MTE) in aarch64:
   - TARGET_MEMTAG_TAG_SIZE
   - TARGET_MEMTAG_GRANULE_SIZE

The following instructions were (re)defined:
   - addg/subg (used by the TARGET_MEMTAG_ADD_TAG and
     TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks)
   - stg/st2g: used to tag/untag a memory granule.
   - tag_memory: a target-specific instruction; it will emit MTE
     instructions to tag/untag memory of a given size.

Add documentation in gcc/doc/invoke.texi.

gcc/

* config/aarch64/aarch64.md (addg): Update pattern to use
addg/subg instructions.
(stg): Update pattern.
(st2g): New pattern.
(tag_memory): Likewise.
* config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
Define.
(AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
(AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Likewise.
(aarch64_override_options_internal): Error out if MTE instructions
are not available.
(aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
(aarch64_can_tag_addresses): Add MEMTAG specific handling.
(aarch64_memtag_tag_bitsize): New function
(aarch64_memtag_granule_size): Likewise.
(aarch64_memtag_insert_random_tag): Likwise.
(aarch64_memtag_add_tag): Likewise.
(aarch64_memtag_compose_offset_tag): Likewise.
(aarch64_memtag_extract_tag): Likewise.
(aarch64_granule16_memory_address_p): Likewise.
(aarch64_emit_stxg_insn): Likewise.
(aarch64_gen_tag_memory_postindex): Likewise.
(aarch64_memtag_tag_memory_via_loop): New definition.
(aarch64_expand_tag_memory): Likewise.
(aarch64_check_memtag_ops): Likewise.
(aarch64_gen_tag_memory_postindex): Likewise.
(TARGET_MEMTAG_TAG_SIZE): Define.
(TARGET_MEMTAG_GRANULE_SIZE): Likewise.
(TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
(TARGET_MEMTAG_ADD_TAG): Likewise.
(TARGET_MEMTAG_EXTRACT_TAG): Likewise.
(TARGET_MEMTAG_COMPOSE_OFFSET_TAG): Likewise.
* config/aarch64/aarch64-builtins.cc
(aarch64_expand_builtin_memtag): Update set tag builtin logic.
* config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer
specific options to the linker.
* config/aarch64/aarch64-protos.h
(aarch64_granule16_memory_address_p): New prototype.
(aarch64_check_memtag_ops): Likewise.
(aarch64_expand_tag_memory): Likewise.
* config/aarch64/constraints.md (Umg): New memory constraint.
(Uag): New constraint.
(Ung): Likewise.
(Utg): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset):
Refactor it.
(aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6 and
refactor it.
(aarch64_granule16_memory_operand): New constraint.

doc/
* invoke.texi: Update documentation.

gcc/testsuite:

* gcc.target/aarch64/acle/memtag_1.c: Update test.

Co-authored-by: Indu Bhagat 
Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/aarch64/aarch64-builtins.cc|   7 +-
 gcc/config/aarch64/aarch64-linux.h|   4 +-
 gcc/config/aarch64/aarch64-protos.h   |   4 +
 gcc/config/aarch64/aarch64.cc | 370 +-
 gcc/config/aarch64/aarch64.md |  60 ++-
 gcc/config/aarch64/constraints.md |  26 ++
 gcc/config/aarch64/predicates.md  |  13 +-
 gcc/doc/invoke.texi   |   6 +-
 .../gcc.target/aarch64/acle/memtag_1.c|   2 +-
 9 files changed, 464 insertions(+), 28 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 93f939a9c83..b2427e73880 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -3668,8 +3668,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx 
target)
pat = GEN_FCN (icode) (target, op0, const0_rtx);
break;
   case AARCH64_MEMTAG_BUILTIN_SET_TAG:
-   pat = GEN_FCN (icode) (op0, op0, const0_rtx);
-   break;
+   {
+ rtx mem = gen_rtx_MEM (TImode, op0);
+ pat = GEN_FCN (icode) (mem, mem, op0);
+ break;
+   }
   default:
gcc_unreachable();
 }
diff --git a/gcc/config/aarch64/aarch64-li
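
For readers unfamiliar with MTE: AArch64 keeps a 4-bit logical tag in address bits [59:56] and tags memory in 16-byte granules. The helpers below sketch the pointer arithmetic behind the compose/extract tag hooks; the function names are ours, not GCC's.

```c
#include <stdint.h>

/* Insert a 4-bit tag into address bits [59:56], the MTE logical tag
   field, clearing any tag already present.  */
static uint64_t mte_compose_tag (uint64_t addr, unsigned tag)
{
  return (addr & ~(UINT64_C (0xF) << 56)) | ((uint64_t) (tag & 0xF) << 56);
}

/* Recover the 4-bit logical tag from an address.  */
static unsigned mte_extract_tag (uint64_t addr)
{
  return (unsigned) ((addr >> 56) & 0xF);
}
```

The real insert_random_tag hook additionally uses the IRG instruction so the hardware chooses the tag; these helpers only show where the tag lives in the pointer.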

[PATCH v3 6/9] asan: add new memtag sanitizer

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Add a new command-line option, -fsanitize=memtag-stack, with the
following new param:
--param memtag-instrument-alloca [0,1] (default 1) to use MTE insns
for enabling dynamic checking of stack allocas.

Along with the new SANITIZE_MEMTAG_STACK, define a SANITIZE_MEMTAG
which will be set if any kind of memtag sanitizer is in effect (e.g.,
later we may add -fsanitize=memtag-globals).  Add errors to convey
that the memtag sanitizer does not work with the hwaddress and address
sanitizers.  Also error out if the memtag ISA extension is not enabled.

MEMTAG sanitizer will use the HWASAN machinery, but with a few
differences:
  - The tags are always generated at runtime by the hardware, so
-fsanitize=memtag-stack enforces a --param hwasan-random-frame-tag=1

Add documentation in gcc/doc/invoke.texi.

gcc/
* builtins.def: Adjust the macro to include the new
SANITIZE_MEMTAG_STACK.
* flag-types.h (enum sanitize_code): Add new enumerator for
SANITIZE_MEMTAG and SANITIZE_MEMTAG_STACK.
* opts.cc (finish_options): memtag-stack sanitizer conflicts with
hwaddress and address sanitizers.
(sanitizer_opts): Add new memtag-stack sanitizer.
(parse_sanitizer_options): memtag-stack sanitizer cannot recover.
* params.opt: Add new params for memtag-stack sanitizer.

doc/
* invoke.texi: Update documentation.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/builtins.def|  1 +
 gcc/doc/invoke.texi | 13 -
 gcc/flag-types.h|  4 
 gcc/opts.cc | 22 +-
 gcc/params.opt  |  4 
 5 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index d7b2894bcfa..5f0b1107347 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -257,6 +257,7 @@ along with GCC; see the file COPYING3.  If not see
   true, true, true, ATTRS, true, \
  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
| SANITIZE_HWADDRESS \
+   | SANITIZE_MEMTAG_STACK \
| SANITIZE_UNDEFINED \
| SANITIZE_UNDEFINED_NONDEFAULT) \
   || flag_sanitize_coverage))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 74f5ee26042..d8f11201361 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17261,7 +17261,7 @@ When using stack instrumentation, decide tags for stack 
variables using a
 deterministic sequence beginning at a random tag for each frame.  With this
 parameter unset tags are chosen using the same sequence but beginning from 1.
 This is enabled by default for @option{-fsanitize=hwaddress} and unavailable
-for @option{-fsanitize=kernel-hwaddress}.
+for @option{-fsanitize=kernel-hwaddress} and @option{-fsanitize=memtag-stack}.
 To disable it use @option{--param hwasan-random-frame-tag=0}.
 
 @item hwasan-instrument-allocas
@@ -17294,6 +17294,11 @@ and @option{-fsanitize=kernel-hwaddress}.
 To disable instrumentation of builtin functions use
 @option{--param hwasan-instrument-mem-intrinsics=0}.
 
+@item memtag-instrument-allocas
+Enable hardware-assisted memory tagging of dynamically sized stack-allocated
+variables.  This kind of code generation is enabled by default when using
+@option{-fsanitize=memtag-stack}.
+
 @item use-after-scope-direct-emission-threshold
 If the size of a local variable in bytes is smaller or equal to this
 number, directly poison (or unpoison) shadow memory instead of using
@@ -18225,6 +18230,12 @@ possible by specifying the command-line options
 @option{--param hwasan-instrument-allocas=1} respectively. Using a random frame
 tag is not implemented for kernel instrumentation.
 
+@opindex fsanitize=memtag-stack
+@item -fsanitize=memtag-stack
+Use Memory Tagging Extension instructions instead of instrumentation to allow
+the detection of memory errors.  This option is available only on those AArch64
+architectures that support Memory Tagging Extensions.
+
 @opindex fsanitize=pointer-compare
 @item -fsanitize=pointer-compare
 Instrument comparison operation (<, <=, >, >=) with pointer operands.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 9a3cc4a2e16..0c9c863a654 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -337,6 +337,10 @@ enum sanitize_code {
   SANITIZE_KERNEL_HWADDRESS = 1UL << 30,
   /* Shadow Call Stack.  */
   SANITIZE_SHADOW_CALL_STACK = 1UL << 31,
+  /* Memory Tagging for Stack.  */
+  SANITIZE_MEMTAG_STACK = 1ULL << 32,
+  /* Memory Tagging.  */
+  SANITIZE_MEMTAG = SANITIZE_MEMTAG_STACK,
   SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT,
   SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE
   | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
diff --git a/gcc/opts.cc b/gcc/opts.cc
index d00e05f6321..b4f516fdce6 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1307,6 +1307,24 @@ finish_options (struct gcc_opt

[PATCH v3 7/9] asan: memtag-stack add support for MTE instructions

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Memory tagging is used for detecting memory safety bugs.  On AArch64, the
memory tagging extension (MTE) helps in reducing the overheads of memory
tagging:
 - CPU: MTE instructions for efficiently tagging and untagging memory.
 - Memory: New memory type, Normal Tagged Memory, added to the Arm
   Architecture.

The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as
HWASAN.  MEMTAG and HWASAN are both hardware-assisted solutions, and
rely on the same sanitizer machinery in parts.  So, define new
constructs that allow MEMTAG and HWASAN to share the infrastructure:

  - hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or
SANITIZE_HWASAN is true.
  - hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () holds
and stack variables are to be sanitized.

MEMTAG and HWASAN do have differences, however, hence the need to
conditionalize on memtag_sanitize_p () in the relevant places.  E.g.,

  - Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs
to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag
memory.  A similar approach is taken for handle_builtin_alloca,
where target hooks are used instead of the gimple transformations.

  - Add a new internal function HWASAN_ALLOCA_POISON to handle
dynamically allocated stack memory when the MEMTAG sanitizer is
enabled.  At expansion time, this in turn invokes target hooks to
increment the tag and uses the generated tag to tag the dynamically
allocated memory.

The usual pattern:
irg x0, x0, x0
subg x0, x0, #16, #0
creates a tag in x0 and so on.  For alloca, we need to apply the
generated tag to the new sp.  In the absence of an extract-tag insn, the
implementation in expand_HWASAN_ALLOCA_POISON resorts to invoking irg
again.

gcc/ChangeLog:

* asan.cc (handle_builtin_stack_restore): Accommodate MEMTAG
sanitizer.
(handle_builtin_alloca): Expand differently if MEMTAG sanitizer.
(get_mem_refs_of_builtin_call): Include MEMTAG along with
HWASAN.
(memtag_sanitize_stack_p): New definition.
(memtag_sanitize_allocas_p): Likewise.
(memtag_memintrin): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(report_error_func): Include MEMTAG along with HWASAN.
(build_check_stmt): Likewise.
(instrument_derefs): MEMTAG too does not deal with globals yet.
(instrument_builtin_call):
(maybe_instrument_call): Include MEMTAG along with HWASAN.
(asan_expand_mark_ifn): Likewise.
(asan_expand_check_ifn): Likewise.
(asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer.
(asan_instrument):
(hwasan_frame_base):
(hwasan_record_stack_var):
(hwasan_emit_prologue): Expand differently if MEMTAG sanitizer.
(hwasan_emit_untag_frame): Likewise.
* asan.h (hwasan_record_stack_var):
(memtag_sanitize_stack_p): New declaration.
(memtag_sanitize_allocas_p): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(asan_sanitize_use_after_scope): Include MEMTAG along with
HWASAN.
* cfgexpand.cc (align_local_variable): Likewise.
(expand_one_stack_var_at): Likewise.
(expand_stack_vars): Likewise.
(expand_one_stack_var_1): Likewise.
(init_vars_expansion): Likewise.
(expand_used_vars): Likewise.
(pass_expand::execute): Likewise.
* gimplify.cc (asan_poison_variable): Likewise.
* internal-fn.cc (expand_HWASAN_ALLOCA_POISON): New definition.
(expand_HWASAN_ALLOCA_UNPOISON): Expand differently if MEMTAG
sanitizer.
(expand_HWASAN_MARK): Likewise.
* internal-fn.def (HWASAN_ALLOCA_POISON): Define new.
* params.opt: Document new param. FIXME.
* sanopt.cc (pass_sanopt::execute): Include MEMTAG along with
HWASAN.
* gcc.c (sanitize_spec_function): Add check for memtag-stack.

Co-authored-by: Indu Bhagat 
Signed-off-by: Claudiu Zissulescu 
---
 gcc/asan.cc | 214 +---
 gcc/asan.h  |  10 ++-
 gcc/cfgexpand.cc|  29 +++---
 gcc/gcc.cc  |   2 +
 gcc/gimplify.cc |   5 +-
 gcc/internal-fn.cc  |  68 --
 gcc/internal-fn.def |   1 +
 gcc/params.opt  |   4 +
 gcc/sanopt.cc   |   2 +-
 9 files changed, 258 insertions(+), 77 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 748b289d6f9..711e6a71eee 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -762,14 +762,15 @@ static void
 handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter)
 {
   if (!iter
-  || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()))
+  || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()
+  || memtag_sanitize_alloc

[PATCH v3 1/9] targhooks: i386: rename TAG_SIZE to TAG_BITSIZE

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

gcc/ChangeLog:

* asan.h (HWASAN_TAG_SIZE): Use targetm.memtag.tag_bitsize.
* config/i386/i386.cc (ix86_memtag_tag_size): Rename to
ix86_memtag_tag_bitsize.
(TARGET_MEMTAG_TAG_SIZE): Renamed to TARGET_MEMTAG_TAG_BITSIZE.
* doc/tm.texi (TARGET_MEMTAG_TAG_SIZE): Likewise.
* doc/tm.texi.in (TARGET_MEMTAG_TAG_SIZE): Likewise.
* target.def (tag_size): Rename to tag_bitsize.
* targhooks.cc (default_memtag_tag_size): Rename to
default_memtag_tag_bitsize.
* targhooks.h (default_memtag_tag_size): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/asan.h  | 2 +-
 gcc/config/i386/i386.cc | 8 
 gcc/doc/tm.texi | 2 +-
 gcc/doc/tm.texi.in  | 2 +-
 gcc/target.def  | 4 ++--
 gcc/targhooks.cc| 2 +-
 gcc/targhooks.h | 2 +-
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/asan.h b/gcc/asan.h
index 273d6745c58..064d4f24823 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -103,7 +103,7 @@ extern hash_set  *asan_used_labels;
independently here.  */
 /* How many bits are used to store a tag in a pointer.
The default version uses the entire top byte of a pointer (i.e. 8 bits).  */
-#define HWASAN_TAG_SIZE targetm.memtag.tag_size ()
+#define HWASAN_TAG_SIZE targetm.memtag.tag_bitsize ()
 /* Tag Granule of HWASAN shadow stack.
This is the size in real memory that each byte in the shadow memory refers
to.  I.e. if a variable is X bytes long in memory then its tag in shadow
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b64175d6c93..17faf7ebd24 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -27095,9 +27095,9 @@ ix86_memtag_can_tag_addresses ()
   return ix86_lam_type != lam_none && TARGET_LP64;
 }
 
-/* Implement TARGET_MEMTAG_TAG_SIZE.  */
+/* Implement TARGET_MEMTAG_TAG_BITSIZE.  */
 unsigned char
-ix86_memtag_tag_size ()
+ix86_memtag_tag_bitsize ()
 {
   return IX86_HWASAN_TAG_SIZE;
 }
@@ -28071,8 +28071,8 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MEMTAG_UNTAGGED_POINTER
 #define TARGET_MEMTAG_UNTAGGED_POINTER ix86_memtag_untagged_pointer
 
-#undef TARGET_MEMTAG_TAG_SIZE
-#define TARGET_MEMTAG_TAG_SIZE ix86_memtag_tag_size
+#undef TARGET_MEMTAG_TAG_BITSIZE
+#define TARGET_MEMTAG_TAG_BITSIZE ix86_memtag_tag_bitsize
 
 #undef TARGET_GEN_CCMP_FIRST
 #define TARGET_GEN_CCMP_FIRST ix86_gen_ccmp_first
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5e305643b3a..3f87abf97b2 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12860,7 +12860,7 @@ At preset, this feature does not support address 
spaces.  It also requires
 @code{Pmode} to be the same as @code{ptr_mode}.
 @end deftypefn
 
-@deftypefn {Target Hook} uint8_t TARGET_MEMTAG_TAG_SIZE ()
+@deftypefn {Target Hook} uint8_t TARGET_MEMTAG_TAG_BITSIZE ()
 Return the size of a tag (in bits) for this platform.
 
 The default returns 8.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index eccc4d88493..040d26c40f1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8124,7 +8124,7 @@ maintainer is familiar with.
 
 @hook TARGET_MEMTAG_CAN_TAG_ADDRESSES
 
-@hook TARGET_MEMTAG_TAG_SIZE
+@hook TARGET_MEMTAG_TAG_BITSIZE
 
 @hook TARGET_MEMTAG_GRANULE_SIZE
 
diff --git a/gcc/target.def b/gcc/target.def
index 38903eb567a..db48df9498d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -7457,11 +7457,11 @@ At preset, this feature does not support address 
spaces.  It also requires\n\
  bool, (), default_memtag_can_tag_addresses)
 
 DEFHOOK
-(tag_size,
+(tag_bitsize,
  "Return the size of a tag (in bits) for this platform.\n\
 \n\
 The default returns 8.",
-  uint8_t, (), default_memtag_tag_size)
+  uint8_t, (), default_memtag_tag_bitsize)
 
 DEFHOOK
 (granule_size,
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index c79458e374e..0696f95adeb 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2806,7 +2806,7 @@ default_memtag_can_tag_addresses ()
 }
 
 uint8_t
-default_memtag_tag_size ()
+default_memtag_tag_bitsize ()
 {
   return 8;
 }
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f16b58798c2..c9e57e475dc 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -310,7 +310,7 @@ extern bool speculation_safe_value_not_needed (bool);
 extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
 
 extern bool default_memtag_can_tag_addresses ();
-extern uint8_t default_memtag_tag_size ();
+extern uint8_t default_memtag_tag_bitsize ();
 extern uint8_t default_memtag_granule_size ();
 extern rtx default_memtag_insert_random_tag (rtx, rtx);
 extern rtx default_memtag_add_tag (rtx, poly_int64, uint8_t);
-- 
2.50.0



[PATCH v3 0/9] Add memtag-stack sanitizer using MTE instructions.

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Hi,

Please find a new series of patches that implements the stack sanitizer
using AArch64 MTE instructions.  This new series is based on Indu's
previous patch series.

What is new:
 - Introduces a new target instruction tag_memory.
 - Introduces a new target hook to deal with tag computation
   (TARGET_MEMTAG_COMPOSE_OFFSET_TAG).
 - Simplify the stg/st2g instruction patterns to accept POST/PRE-modify
   addresses.
 - Minimize asan.cc modification.
 - Add execution tests.
 - Improve and fix emitting stg/st2g instructions.
 - Various text improvements.

Thank you,
Claudiu

==
MTE on AArch64 and Memory Tagging

Memory Tagging Extension (MTE) is an AArch64 extension.  This
extension allows coloring of 16-byte memory granules with 4-bit tag
values.  The extension provides additional instructions in ISA and a
new memory type, Normal Tagged Memory, added to the Arm Architecture.
This hardware-assisted mechanism can be used to detect memory bugs
like buffer overrun or use-after-free.  The detection is
probabilistic.

Under the hood, the MTE extension introduces two types of tags:
  - Address Tags, and,
  - Allocation Tags (a.k.a., Memory Tags)

Address Tag: acts as the key.  This adds four bits to the top of
a virtual address.  It is built on the AArch64 'top-byte-ignore' (TBI)
feature.

Allocation Tag: acts as the lock.  Allocation tags also consist
of four bits, linked with every aligned 16-byte region in the physical
memory space.  Arm refers to these 16-byte regions as tag granules.
The way Allocation tags are stored is a hardware implementation
detail.

A subset of the MTE instructions which are relevant in the current
context are:

[Xn, Xd are registers containing addresses].

- irg Xd, Xn
  Copy Xn into Xd, insert a random 4-bit Address Tag into Xd.
- addg Xd, Xn, #immA, #immB
  Xd = Xn + immA, with Address Tag modified by #immB. Similarly, there
  exists a subg.
- stg Xd, [Xn]
  (Store Allocation Tag) updates Allocation Tag for [Xn, Xn + 16) to the
  Address Tag of Xd.

Additionally, note that load and store instructions with SP base
register do not check tags.

MEMTAG sanitizer for stack
Use MTE instructions to instrument stack accesses to detect memory safety
issues.

Detecting stack-related memory bugs requires the compiler to:
  - Ensure that each object on the stack is allocated in its own 16-byte
granule.
  - Tag/Color: put tags into each stack variable pointer.
  - Untag: the function epilogue will untag the (stack) memory.
The above should work with dynamic stack allocation as well.

GCC has HWASAN machinery for coloring stack variables.  Extend the
machinery to emit MTE instructions when MEMTAG sanitizer is in effect.

Deploying and running user space programs built with -fsanitize=memtag-stack
will need the following additional pieces in place.  If there is any existing
work or ideas on any of the following, please send comments to help define
the work.

Additional necessary pieces

* MTE aware exception handling and unwinding routines
The additional stack coloring must work with C++ exceptions and C 
setjmp/longjmp.

* When unwinding the stack for handling C++ exceptions, the unwinder
additionally also needs to untag the stack frame.  As per the
AADWARF64 document: "The character 'G' indicates that associated
frames may modify MTE tags on the stack space they use."

* When restoring the context in longjmp, we need to additionally untag the 
stack.

Claudiu Zissulescu (4):
  target-insns.def: (tag_memory) New pattern.
  targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG
  asan: memtag-stack add support for MTE instructions
  aarch64: Add support for memtag-stack sanitizer using MTE insns

Indu Bhagat (5):
  targhooks: i386: rename TAG_SIZE to TAG_BITSIZE
  opts: use uint64_t for sanitizer flags
  aarch64: add new constants for MTE insns
  asan: add new memtag sanitizer
  aarch64: Add memtag-stack tests

 gcc/asan.cc   | 214 +++---
 gcc/asan.h|  17 +-
 gcc/builtins.def  |   1 +
 gcc/c-family/c-attribs.cc |  16 +-
 gcc/c-family/c-common.h   |   2 +-
 gcc/c/c-parser.cc |   4 +-
 gcc/cfgexpand.cc  |  29 +-
 gcc/common.opt|   6 +-
 gcc/config/aarch64/aarch64-builtins.cc|   7 +-
 gcc/config/aarch64/aarch64-linux.h|   4 +-
 gcc/config/aarch64/aarch64-protos.h   |   4 +
 gcc/config/aarch64/aarch64.cc | 370 +-
 gcc/config/aarch64/aarch64.md |  78 ++--
 gcc/config/aarch64/constraints.md |  26 ++
 gcc/config/aarch64/predicates.md  |  13 +-
 gcc/config/i386/i386.cc   |   8 +-
 gcc/cp/typeck.cc  |   2 +-
 gcc/d/d-attribs.cc|   8 +-
 gcc/doc/invoke.texi   

[PATCH v3 3/9] target-insns.def: (tag_memory) New pattern.

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Add a new target instruction.  Hardware-assisted sanitizers on
architectures providing instructions to tag/untag memory can then
make use of this new instruction pattern.  For example, the
memtag-stack sanitizer uses these instructions to tag and untag a
memory granule.

gcc/doc/

* md.texi (tag_memory): Add documentation.

gcc/

* target-insns.def (tag_memory): New target instruction.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/doc/md.texi  | 5 +
 gcc/target-insns.def | 1 +
 2 files changed, 6 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 28159b2e820..e4c9a472e3f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8578,6 +8578,11 @@ the values were equal.
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
 
+@cindex @code{tag_memory} instruction pattern
+This pattern tags an object that begins at the address specified by
+operand 0, has the size indicated by operand 2, and uses the tag
+from operand 1.
+
 @cindex @code{clear_cache} instruction pattern
 @item @samp{clear_cache}
 This pattern, if defined, flushes the instruction cache for a region of
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 59025a20bf7..16e1d8cf565 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -102,6 +102,7 @@ DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx 
x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
+DEF_TARGET_INSN (tag_memory, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (trap, (void))
 DEF_TARGET_INSN (unique, (void))
 DEF_TARGET_INSN (untyped_call, (rtx x0, rtx x1, rtx x2))
-- 
2.50.0



[PATCH v3 4/9] aarch64: add new constants for MTE insns

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Define new constants to be used by the MTE pattern definitions.

gcc/

* config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define
constant.
(MEMTAG_ADDR_MASK): Likewise.
(irg, subp, ldg): Use new constants.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/aarch64/aarch64.md | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 27efc9155dc..bade8af7997 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -440,6 +440,16 @@ (define_constants
; must not operate on inactive inputs if doing so could induce a fault.
(SVE_STRICT_GP 1)])
 
+;; These constants are used as a const_int in MTE instructions
+(define_constants
+  [; 0xf0ff...
+   ; Tag mask for the 4-bit tag stored in the top 8 bits of a pointer.
+   (MEMTAG_TAG_MASK -1080863910568919041)
+
+   ;  0x00ff...
   ; Address mask for the 56-bit address used by the subp instruction.
+   (MEMTAG_ADDR_MASK 72057594037927935)])
+
 (include "constraints.md")
 (include "predicates.md")
 (include "iterators.md")
@@ -8556,7 +8566,7 @@ (define_insn "irg"
   [(set (match_operand:DI 0 "register_operand" "=rk")
(ior:DI
 (and:DI (match_operand:DI 1 "register_operand" "rk")
-(const_int -1080863910568919041)) ;; 0xf0ff...
+(const_int MEMTAG_TAG_MASK)) ;; 0xf0ff...
 (ashift:DI (unspec:QI [(match_operand:DI 2 "register_operand" "r")]
 UNSPEC_GEN_TAG_RND)
(const_int 56]
@@ -8599,9 +8609,9 @@ (define_insn "subp"
   [(set (match_operand:DI 0 "register_operand" "=r")
(minus:DI
  (and:DI (match_operand:DI 1 "register_operand" "rk")
- (const_int 72057594037927935)) ;; 0x00ff...
+ (const_int MEMTAG_ADDR_MASK)) ;; 0x00ff...
  (and:DI (match_operand:DI 2 "register_operand" "rk")
- (const_int 72057594037927935] ;; 0x00ff...
+ (const_int MEMTAG_ADDR_MASK] ;; 0x00ff...
   "TARGET_MEMTAG"
   "subp\\t%0, %1, %2"
   [(set_attr "type" "memtag")]
@@ -8611,7 +8621,7 @@ (define_insn "subp"
 (define_insn "ldg"
   [(set (match_operand:DI 0 "register_operand" "+r")
(ior:DI
-(and:DI (match_dup 0) (const_int -1080863910568919041)) ;; 0xf0ff...
+(and:DI (match_dup 0) (const_int MEMTAG_TAG_MASK)) ;; 0xf0ff...
 (ashift:DI
  (mem:QI (unspec:DI
   [(and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
-- 
2.50.0



[PATCH v3 5/9] targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Add a new target hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG to perform
addition between two tags.

The default of this hook is to byte add the inputs.

Hardware-assisted sanitizers can make use of it on architectures that
provide instructions to compose (add) two tags, as AArch64 does.

gcc/

* doc/tm.texi: Re-generate.
* doc/tm.texi.in: Add documentation for new target hooks.
* target.def: Add new hook.
* targhooks.cc (default_memtag_compose_offset_tag): New hook.
* targhooks.h (default_memtag_compose_offset_tag): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/doc/tm.texi| 6 ++
 gcc/doc/tm.texi.in | 2 ++
 gcc/target.def | 7 +++
 gcc/targhooks.cc   | 7 +++
 gcc/targhooks.h| 2 +-
 5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 3f87abf97b2..a4fba6d21b3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12917,6 +12917,12 @@ Store the result in @var{target} if convenient.
 The default clears the top byte of the original pointer.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_MEMTAG_COMPOSE_OFFSET_TAG (rtx 
@var{base_tag}, uint8_t @var{tag_offset})
+Return an RTX that represents the result of composing @var{tag_offset} with
+the base tag @var{base_tag}.
+The default of this hook is to byte add @var{tag_offset} to @var{base_tag}.
+@end deftypefn
+
 @deftypevr {Target Hook} bool TARGET_HAVE_SHADOW_CALL_STACK
 This value is true if the target platform supports
 @option{-fsanitize=shadow-call-stack}.  The default value is false.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 040d26c40f1..ff381b486e1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8138,6 +8138,8 @@ maintainer is familiar with.
 
 @hook TARGET_MEMTAG_UNTAGGED_POINTER
 
+@hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG
+
 @hook TARGET_HAVE_SHADOW_CALL_STACK
 
 @hook TARGET_HAVE_LIBATOMIC
diff --git a/gcc/target.def b/gcc/target.def
index db48df9498d..89f96ca73c5 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -7521,6 +7521,13 @@ Store the result in @var{target} if convenient.\n\
 The default clears the top byte of the original pointer.",
   rtx, (rtx tagged_pointer, rtx target), default_memtag_untagged_pointer)
 
+DEFHOOK
+(compose_offset_tag,
+ "Return an RTX that represents the result of composing @var{tag_offset} with\n\
+the base tag @var{base_tag}.\n\
+The default of this hook is to byte add @var{tag_offset} to @var{base_tag}.",
+  rtx, (rtx base_tag, uint8_t tag_offset), default_memtag_compose_offset_tag)
+
 HOOK_VECTOR_END (memtag)
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 0696f95adeb..cfea4a70403 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2904,4 +2904,11 @@ default_memtag_untagged_pointer (rtx tagged_pointer, rtx 
target)
   return untagged_base;
 }
 
+/* The default implementation of TARGET_MEMTAG_COMPOSE_OFFSET_TAG.  */
+rtx
+default_memtag_compose_offset_tag (rtx base_tag, uint8_t tag_offset)
+{
+  return plus_constant (QImode, base_tag, tag_offset);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index c9e57e475dc..76afce71baa 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -317,5 +317,5 @@ extern rtx default_memtag_add_tag (rtx, poly_int64, 
uint8_t);
 extern rtx default_memtag_set_tag (rtx, rtx, rtx);
 extern rtx default_memtag_extract_tag (rtx, rtx);
 extern rtx default_memtag_untagged_pointer (rtx, rtx);
-
+extern rtx default_memtag_compose_offset_tag (rtx, uint8_t);
 #endif /* GCC_TARGHOOKS_H */
-- 
2.50.0



Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-14 Thread Claudiu Zissulescu-Ianculescu
> I see it now from Richard B.. Also I noticed you missed Richard S.'s
> suggestion of using a typedef which will definitely help in the future
> where we could even replace this with an enum class and overload the
> bitwise operators to do the right thing.
> 

Indeed, I've missed that message.  Do you think adding this type to
hwint.h is a good place, and what name shall I use for this new type?

Thank you,
Claudiu


Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Hi Thomas,

This change breaks compilation of the ARC backend:

> +  gcc_assert (in_shutdown || ob);

in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined,
while the gcc_assert is outside any ifdef.  Can you please revisit this
line and change it accordingly?

Thanks,
Claudiu


Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Thanks, I haven't observed it.

Waiting for it,
Claudiu

On Mon, Sep 26, 2022 at 2:49 PM Thomas Neumann  wrote:
>
> Hi Claudiu,
>
> > This change prohibits compiling of ARC backend:
> >
> >> +  gcc_assert (in_shutdown || ob);
> >
> > in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined,
> > while gcc_assert is outside of any ifdef. Please can you revisit this
> > line and change it accordingly.
>
> I have a patch ready, I am waiting for someone to approve my patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602130.html
>
> Best
>
> Thomas


Re: [committed] arc: Fail conditional move expand patterns

2022-02-28 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Hi Robin,

I don't know how I missed your ARC-related patch; I'll bootstrap and test
it asap.

Thanks,
Claudiu


On Fri, Feb 25, 2022 at 3:29 PM Robin Dapp  wrote:

> > If the movcc comparison is not valid it triggers an assert in the
> > current implementation.  This behavior is not needed as we can FAIL
> > the movcc expand pattern.
>
> In case of a MODE_CC comparison you can also just return it as described
> here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154
>
> or here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590639.html
>
> If there already is a "CC comparison" the backend does not need to
> create one and ifcvt can make use of this, creating better sequences.
>
> Regards
>  Robin
>


Re: [PATCH] arc: Fix for new ifcvt behavior [PR104154]

2022-02-28 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Hi Robin,

The patch looks good.  Please go ahead and merge it, and let me know if
you cannot.

Thank you,
Claudiu

On Mon, Feb 21, 2022 at 9:57 AM Robin Dapp via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi,
>
> I figured I'd just go ahead and post this patch as well since it seems
> to have fixed the arc build problems.
>
> It would be nice if someone could bootstrap/regtest if Jeff hasn't
> already done so.  I was able to verify that the two testcases attached
> to the PR build cleanly but not much more.  Thank you.
>
> Regards
>  Robin
>
> --
>
> PR104154
>
> gcc/ChangeLog:
>
> * config/arc/arc.cc (gen_compare_reg):  Return the CC-mode
> comparison ifcvt passed us.
>
> ---
>
> From fa98a40abd55e3a10653f6a8c5b2414a2025103b Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Mon, 7 Feb 2022 08:39:41 +0100
> Subject: [PATCH] arc: Fix for new ifcvt behavior [PR104154]
>
> ifcvt now passes a CC-mode "comparison" to backends.  This patch
> simply returns from gen_compare_reg () in that case since nothing
> needs to be prepared anymore.
>
> PR104154
>
> gcc/ChangeLog:
>
> * config/arc/arc.cc (gen_compare_reg):  Return the CC-mode
> comparison ifcvt passed us.
> ---
>  gcc/config/arc/arc.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> index 8cc173519ab..5e40ec2c04d 100644
> --- a/gcc/config/arc/arc.cc
> +++ b/gcc/config/arc/arc.cc
> @@ -2254,6 +2254,12 @@ gen_compare_reg (rtx comparison, machine_mode omode)
>
>
>cmode = GET_MODE (x);
> +
> +  /* If ifcvt passed us a MODE_CC comparison we can
> + just return it.  It should be in the proper form already.   */
> +  if (GET_MODE_CLASS (cmode) == MODE_CC)
> +return comparison;
> +
>if (cmode == VOIDmode)
>  cmode = GET_MODE (y);
>gcc_assert (cmode == SImode || cmode == SFmode || cmode == DFmode);
> --
> 2.31.1
>
>


Re: [PATCH 1/2] ARC: Use intrinsics for __builtin_add_overflow*()

2023-09-07 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Ok.

Thank you for your contribution,
Claudiu

On Wed, Sep 6, 2023 at 3:50 PM Shahab Vahedi  wrote:
>
> This patch covers signed and unsigned additions.  The generated code
> would be something along these lines:
>
> signed:
>   add.f   r0, r1, r2
>   b.v @label
>
> unsigned:
>   add.f   r0, r1, r2
>   b.c @label
>
> gcc/ChangeLog:
>
> * config/arc/arc-modes.def: Add CC_V mode.
> * config/arc/predicates.md (proper_comparison_operator): Handle
> E_CC_Vmode.
> (equality_comparison_operator): Exclude CC_Vmode from eq/ne.
> (cc_set_register): Handle CC_Vmode.
> (cc_use_register): Likewise.
> * config/arc/arc.md (addsi3_v): New insn.
> (addvsi4): New expand.
> (addsi3_c): New insn.
> (uaddvsi4): New expand.
> * config/arc/arc-protos.h (arc_gen_unlikely_cbranch): New.
> * config/arc/arc.cc (arc_gen_unlikely_cbranch): New.
> (get_arc_condition_code): Handle E_CC_Vmode.
> (arc_init_reg_tables): Handle CC_Vmode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/overflow-1.c: New.
>
> Signed-off-by: Shahab Vahedi 
> ---
>  gcc/config/arc/arc-modes.def  |   1 +
>  gcc/config/arc/arc-protos.h   |   1 +
>  gcc/config/arc/arc.cc |  26 +-
>  gcc/config/arc/arc.md |  49 +++
>  gcc/config/arc/predicates.md  |  14 ++-
>  gcc/testsuite/gcc.target/arc/overflow-1.c | 100 ++
>  6 files changed, 187 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/overflow-1.c
>
> diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
> index 763e880317d..69eeec5935a 100644
> --- a/gcc/config/arc/arc-modes.def
> +++ b/gcc/config/arc/arc-modes.def
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  CC_MODE (CC_ZN);
>  CC_MODE (CC_Z);
> +CC_MODE (CC_V);
>  CC_MODE (CC_C);
>  CC_MODE (CC_FP_GT);
>  CC_MODE (CC_FP_GE);
> diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
> index 4f2db7ffb59..bc78fb0b370 100644
> --- a/gcc/config/arc/arc-protos.h
> +++ b/gcc/config/arc/arc-protos.h
> @@ -50,6 +50,7 @@ extern bool arc_check_mov_const (HOST_WIDE_INT );
>  extern bool arc_split_mov_const (rtx *);
>  extern bool arc_can_use_return_insn (void);
>  extern bool arc_split_move_p (rtx *);
> +extern void arc_gen_unlikely_cbranch (enum rtx_code, machine_mode, rtx);
>  #endif /* RTX_CODE */
>
>  extern bool arc_ccfsm_branch_deleted_p (void);
> diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> index f8c9bf17e2c..ec93d40aeb9 100644
> --- a/gcc/config/arc/arc.cc
> +++ b/gcc/config/arc/arc.cc
> @@ -1538,6 +1538,13 @@ get_arc_condition_code (rtx comparison)
> case GEU : return ARC_CC_NC;
> default : gcc_unreachable ();
> }
> +case E_CC_Vmode:
> +  switch (GET_CODE (comparison))
> +   {
> +   case EQ : return ARC_CC_NV;
> +   case NE : return ARC_CC_V;
> +   default : gcc_unreachable ();
> +   }
>  case E_CC_FP_GTmode:
>if (TARGET_ARGONAUT_SET && TARGET_SPFP)
> switch (GET_CODE (comparison))
> @@ -1868,7 +1875,7 @@ arc_init_reg_tables (void)
>   /* mode_class hasn't been initialized yet for EXTRA_CC_MODES, so
>  we must explicitly check for them here.  */
>   if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode
> - || i == (int) CC_Cmode
> + || i == (int) CC_Cmode || i == (int) CC_Vmode
>   || i == CC_FP_GTmode || i == CC_FP_GEmode || i == CC_FP_ORDmode
>   || i == CC_FPUmode || i == CC_FPUEmode || i == CC_FPU_UNEQmode)
> arc_mode_class[i] = 1 << (int) C_MODE;
> @@ -11852,6 +11859,23 @@ arc_libm_function_max_error (unsigned cfn, 
> machine_mode mode,
>return default_libm_function_max_error (cfn, mode, boundary_p);
>  }
>
> +/* Generate RTL for conditional branch with rtx comparison CODE in mode
> +   CC_MODE.  */
> +
> +void
> +arc_gen_unlikely_cbranch (enum rtx_code cmp, machine_mode cc_mode, rtx label)
> +{
> +  rtx cc_reg, x;
> +
> +  cc_reg = gen_rtx_REG (cc_mode, CC_REG);
> +  label = gen_rtx_LABEL_REF (VOIDmode, label);
> +
> +  x = gen_rtx_fmt_ee (cmp, VOIDmode, cc_reg, const0_rtx);
> +  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, label, pc_rtx);
> +
> +  emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +}
> +
>  #undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
>  #define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index d37ecbf4292..9d011f6b4a9 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -2725,6 +2725,55 @@ archs4x, archs4xd"
>   }
>")
>
> +(define_insn "addsi3_v"
> + [(set (match_operand:SI 0 "register_operand"  "=r,r,r,  r")
> +   (plus:SI (match_operand:SI 1 "register_operand"   "r,r,0,  r")
> +   (match_operand:SI 2 "nonm

Re: [PATCH 2/2] ARC: Use intrinsics for __builtin_sub_overflow*()

2023-09-07 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
OK,

Thank you for your contribution,
Claudiu

On Wed, Sep 6, 2023 at 3:50 PM Shahab Vahedi  wrote:
>
> This patch covers signed and unsigned subtractions.  The generated code
> would be something along these lines:
>
> signed:
>   sub.f   r0, r1, r2
>   b.v @label
>
> unsigned:
>   sub.f   r0, r1, r2
>   b.c @label
>
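As with the addition patch, here is a small C illustration (mine, not from the patch) of code that reaches the new expanders; note that the _p built-in variants only test for overflow and never store a result, which is why the generated code can collapse to a bare flag-setting compare.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed subtraction: overflow is detected via the V flag (b.v).  */
bool sub_overflows (int32_t a, int32_t b, int32_t *res)
{
  return __builtin_sub_overflow (a, b, res);
}

/* Unsigned subtraction: a borrow is detected via the C flag (b.c).
   The _p form only asks whether overflow would occur; no result is
   written anywhere.  */
bool usub_overflows_p (uint32_t a, uint32_t b)
{
  return __builtin_sub_overflow_p (a, b, (uint32_t) 0);
}
```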
> gcc/ChangeLog:
>
> * config/arc/arc.md (subsi3_v): New insn.
> (subvsi4): New expand.
> (subsi3_c): New insn.
> (usubvsi4): New expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/overflow-2.c: New.
>
> Signed-off-by: Shahab Vahedi 
> ---
>  gcc/config/arc/arc.md | 48 +++
>  gcc/testsuite/gcc.target/arc/overflow-2.c | 97 +++
>  2 files changed, 145 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arc/overflow-2.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 9d011f6b4a9..34e9e1a7f1d 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -2973,6 +2973,54 @@ archs4x, archs4xd"
>(set_attr "cpu_facility" "*,cd,*,*,*,*,*,*,*,*")
>])
>
> +(define_insn "subsi3_v"
> +  [(set (match_operand:SI  0 "register_operand"  "=r,r,r,  r")
> +   (minus:SI (match_operand:SI 1 "register_operand"   "r,r,0,  r")
> + (match_operand:SI 2 "nonmemory_operand"  "r,L,I,C32")))
> +   (set (reg:CC_V CC_REG)
> +   (compare:CC_V (sign_extend:DI (minus:SI (match_dup 1)
> +   (match_dup 2)))
> + (minus:DI (sign_extend:DI (match_dup 1))
> +   (sign_extend:DI (match_dup 2)]
> +   ""
> +   "sub.f\\t%0,%1,%2"
> +   [(set_attr "cond"   "set")
> +(set_attr "type"   "compare")
> +(set_attr "length" "4,4,4,8")])
> +
> +(define_expand "subvsi4"
> + [(match_operand:SI 0 "register_operand")
> +  (match_operand:SI 1 "register_operand")
> +  (match_operand:SI 2 "nonmemory_operand")
> +  (label_ref (match_operand 3 "" ""))]
> +  ""
> +  "emit_insn (gen_subsi3_v (operands[0], operands[1], operands[2]));
> +   arc_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
> +   DONE;")
> +
> +(define_insn "subsi3_c"
> +  [(set (match_operand:SI  0 "register_operand"  "=r,r,r,  r")
> +   (minus:SI (match_operand:SI 1 "register_operand"   "r,r,0,  r")
> + (match_operand:SI 2 "nonmemory_operand"  "r,L,I,C32")))
> +   (set (reg:CC_C CC_REG)
> +   (compare:CC_C (match_dup 1)
> + (match_dup 2)))]
> +   ""
> +   "sub.f\\t%0,%1,%2"
> +   [(set_attr "cond"   "set")
> +(set_attr "type"   "compare")
> +(set_attr "length" "4,4,4,8")])
> +
> +(define_expand "usubvsi4"
> +  [(match_operand:SI 0 "register_operand")
> +   (match_operand:SI 1 "register_operand")
> +   (match_operand:SI 2 "nonmemory_operand")
> +   (label_ref (match_operand 3 "" ""))]
> +   ""
> +   "emit_insn (gen_subsi3_c (operands[0], operands[1], operands[2]));
> +arc_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
> +DONE;")
> +
>  (define_expand "subdi3"
>[(set (match_operand:DI 0 "register_operand" "")
> (minus:DI (match_operand:DI 1 "register_operand" "")
> diff --git a/gcc/testsuite/gcc.target/arc/overflow-2.c 
> b/gcc/testsuite/gcc.target/arc/overflow-2.c
> new file mode 100644
> index 000..b4de8c03b22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/overflow-2.c
> @@ -0,0 +1,97 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +#include 
> +#include 
> +
> +/*
> + * sub.f  r0,r0,r1
> + * st_s   r0,[r2]
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool sub_overflow (int32_t a, int32_t b, int32_t *res)
> +{
> +  return __builtin_sub_overflow (a, b, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,-1234
> + * st_s   r0,[r1]
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool subi_overflow (int32_t a, int32_t *res)
> +{
> +  return __builtin_sub_overflow (a, -1234, res);
> +}
> +
> +/*
> + * sub.f  r3,r0,r1
> + * st_s   r3,[r2]
> + * j_s.d  [blink]
> + * setlo  r0,r0,r1
> + */
> +bool usub_overflow (uint32_t a, uint32_t b, uint32_t *res)
> +{
> +  return __builtin_sub_overflow (a, b, res);
> +}
> +
> +/*
> + * sub.f  r2,r0,4321
> + * seths  r0,4320,r0
> + * j_s.d  [blink]
> + * st_s   r2,[r1]
> + */
> +bool usubi_overflow (uint32_t a, uint32_t *res)
> +{
> +  return __builtin_sub_overflow (a, 4321, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,r1
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool sub_overflow_p (int32_t a, int32_t b, int32_t res)
> +{
> +  return __builtin_sub_overflow_p (a, b, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,-1000
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool subi_overflow_p (int32_t a, int32_t res)
> +{
> +  return __builtin_sub_overflow_p (a, -1000, res);
> +}
> +
> +/*
> + * j_s.d  [blink]
> + * setlo  r0,r0,r1
> + */
> +bool usub_overflow_p (uint32_t a, uint32_t b, uint32_t res)
> +{
> + 

Re: [PATCH] [ARC] Allow more ABIs in GLIBC_DYNAMIC_LINKER

2020-03-31 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Pushed.

Thank you,
Claudiu

On Sun, Mar 29, 2020 at 2:05 AM Vineet Gupta via Gcc-patches
 wrote:
>
> Enable big-endian suffixed dynamic linker per glibc multi-abi support.
>
> And to avoid future churn and version-pairing hassles, also allow
> arc700, although glibc for ARC currently doesn't support it.
>
> gcc/
> -xx-xx  Vineet Gupta 
> +
> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/ChangeLog  | 4 
>  gcc/config/arc/linux.h | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 86ad683a6cb0..c26a748fd51b 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,7 @@
> +2020-03-28  Vineet Gupta 
> +
> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> +
>  2020-03-28  Jakub Jelinek  
>
> PR c/93573
> diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
> index 0b99da3fcdaf..1bbeccee7115 100644
> --- a/gcc/config/arc/linux.h
> +++ b/gcc/config/arc/linux.h
> @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
>  }  \
>while (0)
>
> -#define GLIBC_DYNAMIC_LINKER   "/lib/ld-linux-arc.so.2"
> +#define GLIBC_DYNAMIC_LINKER   
> "/lib/ld-linux-arc%{mbig-endian:eb}%{mcpu=arc700:700}.so.2"
>  #define UCLIBC_DYNAMIC_LINKER  "/lib/ld-uClibc.so.0"
>
>  /* Note that the default is to link against dynamic libraries, if they are
> --
> 2.20.1
>
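To make the effect of the new spec string concrete, here is a hypothetical C sketch (my own model, not GCC's actual spec machinery, which lives in the driver) of how the two %{...} conditionals combine into four possible dynamic-linker paths:

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Models "/lib/ld-linux-arc%{mbig-endian:eb}%{mcpu=arc700:700}.so.2":
   each %{opt:text} substitutes TEXT only when OPT was passed on the
   command line, and expands to nothing otherwise.  */
static void
linker_path (bool big_endian, bool arc700, char *buf, size_t len)
{
  snprintf (buf, len, "/lib/ld-linux-arc%s%s.so.2",
            big_endian ? "eb" : "", arc700 ? "700" : "");
}
```

So plain little-endian builds keep /lib/ld-linux-arc.so.2, while -mbig-endian -mcpu=arc700 resolves to /lib/ld-linux-arceb700.so.2, matching glibc's multi-ABI naming.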


Re: [PATCH] [ARC] Allow more ABIs in GLIBC_DYNAMIC_LINKER

2020-04-10 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Done.

Thank you for your support,
Claudiu

On Thu, Apr 9, 2020 at 2:38 AM Vineet Gupta  wrote:
>
> Hi Claudiu,
>
> For glibc needs can this be backported to gcc-9 please !
>
> Thx,
> -Vineet
>
> On 3/31/20 3:06 AM, Claudiu Zissulescu Ianculescu wrote:
> > Pushed.
> >
> > Thank you,
> > Claudiu
> >
> > On Sun, Mar 29, 2020 at 2:05 AM Vineet Gupta via Gcc-patches
> >  wrote:
> >> Enable big-endian suffixed dynamic linker per glibc multi-abi support.
> >>
> >> And to avoid future churn and version-pairing hassles, also allow
> >> arc700, although glibc for ARC currently doesn't support it.
> >>
> >> gcc/
> >> -xx-xx  Vineet Gupta 
> >> +
> >> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> >>
> >> Signed-off-by: Vineet Gupta 
> >> ---
> >>  gcc/ChangeLog  | 4 
> >>  gcc/config/arc/linux.h | 2 +-
> >>  2 files changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> >> index 86ad683a6cb0..c26a748fd51b 100644
> >> --- a/gcc/ChangeLog
> >> +++ b/gcc/ChangeLog
> >> @@ -1,3 +1,7 @@
> >> +2020-03-28  Vineet Gupta 
> >> +
> >> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> >> +
> >>  2020-03-28  Jakub Jelinek  
> >>
> >> PR c/93573
> >> diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
> >> index 0b99da3fcdaf..1bbeccee7115 100644
> >> --- a/gcc/config/arc/linux.h
> >> +++ b/gcc/config/arc/linux.h
> >> @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
> >>  }  \
> >>while (0)
> >>
> >> -#define GLIBC_DYNAMIC_LINKER   "/lib/ld-linux-arc.so.2"
> >> +#define GLIBC_DYNAMIC_LINKER   
> >> "/lib/ld-linux-arc%{mbig-endian:eb}%{mcpu=arc700:700}.so.2"
> >>  #define UCLIBC_DYNAMIC_LINKER  "/lib/ld-uClibc.so.0"
> >>
> >>  /* Note that the default is to link against dynamic libraries, if they are
> >> --
> >> 2.20.1
> >>
> > ___
> > linux-snps-arc mailing list
> > linux-snps-...@lists.infradead.org
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Dsnps-2Darc&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=7FgpX6o3vAhwMrMhLh-4ZJey5kjdNUwOL2CWsFwR4T8&m=MrObyH2ki95_7m_xHpnWX-k9eIMOsxMuSa48qhxYOCY&s=3ggbGwaiJuSFnFECy0ItuwBBMDAcriwCdSc3GA0UFig&e=
>


Re: [PATCH] arc: Use separate predicated patterns for mpyd(u)

2020-10-23 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Gentle PING.

On Wed, Oct 7, 2020 at 12:39 PM Claudiu Zissulescu  wrote:
>
> From: Claudiu Zissulescu 
>
> The compiler can match mpyd.eq r0,r1,r0 as a predicated instruction,
> which is incorrect.  The mpyd(u) instruction takes two 32-bit
> registers as input and returns its result into a 64-bit even-odd
> register pair.  For the predicated case, the ARC instruction decoder
> expects the destination register to be the same as the first input
> register.  In the big-endian case the result is swapped in the
> destination register pair; the instruction encoding, however, remains
> the same.  Refurbish the mpyd(u) patterns to take the above
> observations into account.
>
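For context, a minimal C sketch (my own, not part of the patch) of the widening multiplies that the mpyd/mpydu patterns implement: a 32x32->64 multiply whose 64-bit result lands in an even-odd register pair.

```c
#include <stdint.h>

/* Signed 32x32->64 widening multiply; a candidate for mpyd.  */
int64_t wmul (int32_t a, int32_t b)
{
  return (int64_t) a * b;
}

/* Unsigned variant; a candidate for mpydu.  */
uint64_t wmulu (uint32_t a, uint32_t b)
{
  return (uint64_t) a * b;
}
```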
> May I have permission to apply this patch to the master, gcc10, and
> gcc9 branches?
>
> Cheers,
> Claudiu
>
> -xx-xx  Claudiu Zissulescu  
>
> * testsuite/gcc.target/arc/pmpyd.c: New test.
> * testsuite/gcc.target/arc/tmac-1.c: Update.
> * config/arc/arc.md (mpyd_arcv2hs): New template
> pattern.
> (*pmpyd_arcv2hs): Likewise.
> (*pmpyd_imm_arcv2hs): Likewise.
> (mpyd_arcv2hs): Moved into above template.
> (mpyd_imm_arcv2hs): Moved into above template.
> (mpydu_arcv2hs): Likewise.
> (mpydu_imm_arcv2hs): Likewise.
> (su_optab): New optab prefix for sign/zero-extending operations.
>
> Signed-off-by: Claudiu Zissulescu 
> ---
>  gcc/config/arc/arc.md | 101 +-
>  gcc/testsuite/gcc.target/arc/pmpyd.c  |  15 
>  gcc/testsuite/gcc.target/arc/tmac-1.c |   2 +-
>  3 files changed, 67 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/pmpyd.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 1720e8cd2f6f..d4d9f59a3eac 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -894,6 +894,8 @@ archs4x, archs4xd"
>
>  (define_code_iterator SEZ [sign_extend zero_extend])
>  (define_code_attr SEZ_prefix [(sign_extend "sex") (zero_extend "ext")])
> +; Optab prefix for sign/zero-extending operations
> +(define_code_attr su_optab [(sign_extend "") (zero_extend "u")])
>
>  (define_insn "*xt_cmp0_noout"
>[(set (match_operand 0 "cc_set_register" "")
> @@ -6436,66 +6438,65 @@ archs4x, archs4xd"
> (set_attr "predicable" "no")
> (set_attr "cond" "nocond")])
>
> -(define_insn "mpyd_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand"
> "=Rcr, r")
> -   (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand"  "  
> 0, c"))
> -(sign_extend:DI (match_operand:SI 2 "register_operand"  "  
> c, c"
> +(define_insn "mpyd_arcv2hs"
> +  [(set (match_operand:DI 0 "even_register_operand"   "=r")
> +   (mult:DI (SEZ:DI (match_operand:SI 1 "register_operand" "r"))
> +(SEZ:DI (match_operand:SI 2 "register_operand" "r"
> (set (reg:DI ARCV2_ACC)
> (mult:DI
> - (sign_extend:DI (match_dup 1))
> - (sign_extend:DI (match_dup 2]
> + (SEZ:DI (match_dup 1))
> + (SEZ:DI (match_dup 2]
>"TARGET_PLUS_MACD"
> -  "mpyd%? %0,%1,%2"
> -  [(set_attr "length" "4,4")
> -  (set_attr "iscompact" "false")
> -  (set_attr "type" "multi")
> -  (set_attr "predicable" "yes,no")
> -  (set_attr "cond" "canuse,nocond")])
> -
> -(define_insn "mpyd_imm_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand"
> "=Rcr, r,r,Rcr,  r")
> -   (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand"  "  
> 0, c,0,  0,  c"))
> -(match_operand 2   "immediate_operand"  "  
> L, L,I,Cal,Cal")))
> +  "mpyd%?\\t%0,%1,%2"
> +  [(set_attr "length" "4")
> +   (set_attr "iscompact" "false")
> +   (set_attr "type" "multi")
> +   (set_attr "predicable" "no")])
> +
> +(define_insn "*pmpyd_arcv2hs"
> +  [(set (match_operand:DI 0 "even_register_operand" "=r")
> +   (mult:DI
> +(SEZ:DI (match_operand:SI 1 "even_register_operand" "%0"))
> +(SEZ:DI (match_operand:SI 2 "register_operand"  "r"
> (set (reg:DI ARCV2_ACC)
> -   (mult:DI (sign_extend:DI (match_dup 1))
> -(match_dup 2)))]
> +   (mult:DI
> + (SEZ:DI (match_dup 1))
> + (SEZ:DI (match_dup 2]
>"TARGET_PLUS_MACD"
> -  "mpyd%? %0,%1,%2"
> -  [(set_attr "length" "4,4,4,8,8")
> -  (set_attr "iscompact" "false")
> -  (set_attr "type" "multi")
> -  (set_attr "predicable" "yes,no,no,yes,no")
> -  (set_attr "cond" "canuse,nocond,nocond,canuse_limm,nocond")])
> -
> -(define_insn "mpydu_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand"
> "=Rcr, r")
> -   (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand"  "  
> 0, c"))
> -(zero_extend:DI (match_operand:SI 2 "register_operand" "   
> c, c"
> +  "mpyd%?\\t%0,%1,%2"
> +  [(set_attr "length" "4")
> +   (set_attr "iscompact" "false")
> +   (set_attr "type" "multi")
> +   (set_attr "predicable

Re: [PATCH] arc: Improve/add instruction patterns to better use MAC instructions.

2020-10-23 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Gentle PING.

On Fri, Oct 9, 2020 at 5:24 PM Claudiu Zissulescu  wrote:
>
> From: Claudiu Zissulescu 
>
> The ARC MPY7+ option adds MAC instructions for vector and scalar data
> types.  This patch adds a madd pattern for 16-bit data that uses the
> 32-bit MAC instruction, and dot_prod patterns for v4hi vector
> types.  The 64-bit moves are also improved by using the vadd2
> instruction.
>
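A hedged C illustration (mine, not from the patch) of the loops these new patterns target: 16-bit products accumulated into a 32-bit sum, where the scalar body matches the madd pattern and the loop as a whole is a dot-product candidate for the vectorizer.

```c
#include <stdint.h>

/* 16-bit x 16-bit products accumulated into a 32-bit sum.  The scalar
   multiply-accumulate in the loop body maps onto the new maddhisi4
   pattern, and when vectorized over v4hi the whole reduction can use
   the sdot_prodv4hi pattern.  */
int32_t dot16 (const int16_t *a, const int16_t *b, int n)
{
  int32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += (int32_t) a[i] * b[i];
  return sum;
}
```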
> gcc/
> -xx-xx  Claudiu Zissulescu  
>
> * config/arc/arc.c (arc_split_move): Recognize vadd2 instructions.
> * config/arc/arc.md (movdi_insn): Update pattern to use vadd2
> instructions.
> (movdf_insn): Likewise.
> (maddhisi4): New pattern.
> (umaddhisi4): Likewise.
> * config/arc/simdext.md (mov_int): Update pattern to use
> vadd2.
> (sdot_prodv4hi): New pattern.
> (udot_prodv4hi): Likewise.
> (arc_vec_mac_hi_v4hi): Update/renamed to
> arc_vec_mac_v2hiv2si.
> (arc_vec_mac_v2hiv2si_zero): New pattern.
>
> Signed-off-by: Claudiu Zissulescu 
> ---
>  gcc/config/arc/arc.c  |  8 
>  gcc/config/arc/arc.md | 71 ---
>  gcc/config/arc/constraints.md |  5 ++
>  gcc/config/arc/simdext.md | 90 +++
>  4 files changed, 147 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index ec55cfde87a9..d5b521e75e67 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -10202,6 +10202,14 @@ arc_split_move (rtx *operands)
>return;
>  }
>
> +  if (TARGET_PLUS_QMACW
> +  && even_register_operand (operands[0], mode)
> +  && even_register_operand (operands[1], mode))
> +{
> +  emit_move_insn (operands[0], operands[1]);
> +  return;
> +}
> +
>if (TARGET_PLUS_QMACW
>&& GET_CODE (operands[1]) == CONST_VECTOR)
>  {
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index f9fc11e51a85..1720e8cd2f6f 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -1345,8 +1345,8 @@ archs4x, archs4xd"
>")
>
>  (define_insn_and_split "*movdi_insn"
> -  [(set (match_operand:DI 0 "move_dest_operand"  "=w, w,r,   m")
> -   (match_operand:DI 1 "move_double_src_operand" "c,Hi,m,cCm3"))]
> +  [(set (match_operand:DI 0 "move_dest_operand"  "=r, r,r,   m")
> +   (match_operand:DI 1 "move_double_src_operand" "r,Hi,m,rCm3"))]
>"register_operand (operands[0], DImode)
> || register_operand (operands[1], DImode)
> || (satisfies_constraint_Cm3 (operands[1])
> @@ -1358,6 +1358,13 @@ archs4x, archs4xd"
>  default:
>return \"#\";
>
> +case 0:
> +if (TARGET_PLUS_QMACW
> +   && even_register_operand (operands[0], DImode)
> +   && even_register_operand (operands[1], DImode))
> +  return \"vadd2\\t%0,%1,0\";
> +return \"#\";
> +
>  case 2:
>  if (TARGET_LL64
>  && memory_operand (operands[1], DImode)
> @@ -1374,7 +1381,7 @@ archs4x, archs4xd"
>  return \"#\";
>  }
>  }"
> -  "reload_completed"
> +  "&& reload_completed"
>[(const_int 0)]
>{
> arc_split_move (operands);
> @@ -1420,15 +1427,24 @@ archs4x, archs4xd"
>"if (prepare_move_operands (operands, DFmode)) DONE;")
>
>  (define_insn_and_split "*movdf_insn"
> -  [(set (match_operand:DF 0 "move_dest_operand"  "=D,r,c,c,r,m")
> -   (match_operand:DF 1 "move_double_src_operand" "r,D,c,E,m,c"))]
> -  "register_operand (operands[0], DFmode) || register_operand (operands[1], 
> DFmode)"
> +  [(set (match_operand:DF 0 "move_dest_operand"  "=D,r,r,r,r,m")
> +   (match_operand:DF 1 "move_double_src_operand" "r,D,r,E,m,r"))]
> +  "register_operand (operands[0], DFmode)
> +   || register_operand (operands[1], DFmode)"
>"*
>  {
>   switch (which_alternative)
> {
>  default:
>return \"#\";
> +
> +case 2:
> +if (TARGET_PLUS_QMACW
> +   && even_register_operand (operands[0], DFmode)
> +   && even_register_operand (operands[1], DFmode))
> +  return \"vadd2\\t%0,%1,0\";
> +return \"#\";
> +
>  case 4:
>  if (TARGET_LL64
> && ((even_register_operand (operands[0], DFmode)
> @@ -6177,6 +6193,49 @@ archs4x, archs4xd"
>[(set_attr "length" "0")])
>
>  ;; MAC and DMPY instructions
> +
> +; Use MAC instruction to emulate 16bit mac.
> +(define_expand "maddhisi4"
> +  [(match_operand:SI 0 "register_operand" "")
> +   (match_operand:HI 1 "register_operand" "")
> +   (match_operand:HI 2 "extend_operand"   "")
> +   (match_operand:SI 3 "register_operand" "")]
> +  "TARGET_PLUS_DMPY"
> +  "{
> +   rtx acc_reg = gen_rtx_REG (DImode, ACC_REG_FIRST);
> +   rtx tmp1 = gen_reg_rtx (SImode);
> +   rtx tmp2 = gen_reg_rtx (SImode);
> +   rtx accl = gen_lowpart (SImode, acc_reg);
> +
> +   emit_move_insn (accl, operands[3]);
> +   emit_insn (gen_rtx_SET (tmp1, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
> +   emit_insn (gen_rtx_SET (tmp2, gen_rtx_SIGN_EXTEND (SImode, operand

Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

2021-06-14 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Thanks a lot guys. Patch is pushed.

//Claudiu

On Mon, Jun 14, 2021 at 12:34 AM Jeff Law  wrote:
>
>
>
> On 6/13/2021 4:06 AM, Bernhard Reutner-Fischer wrote:
> > On Fri, 11 Jun 2021 14:25:24 +0300
> > Claudiu Zissulescu  wrote:
> >
> >> Hi Bernhard,
> >>
> >> Please find attached my latest patch, it includes (hopefully) all your
> >> feedback.
> >>
> >> Thank you for comments,
> > concise and clean, i wouldn't know what to remove. LGTM.
> > thanks for your patience!
> Then let's consider it approved at this point.  Thanks for chiming in
> Bernhard and thanks for implementing the suggestions Claudiu!
>
> jeff