Re: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.
Hi Roger, It is not necessary to do any mods on your patch. I've just answered the questions which you asked me. The adds are faster for the ARC CPUs which are still in production, and I suppose we can leverage the LP instruction use with DBNZ instructions for implementing loops. I'll come back to you asap, after I've got the nightly results :) Thank you, Claudiu On Tue, Oct 3, 2023 at 6:34 PM Roger Sayle wrote: > > > Hi Claudiu, > Thanks for the answers to my technical questions. > If you'd prefer to update arc.md's add3 pattern first, > I'm happy to update/revise my patch based on this > and your feedback, for example preferring add over > asl_s (or controlling this choice with -Os). > > Thanks again. > Roger > -- > > > -Original Message- > > From: Claudiu Zissulescu > > Sent: 03 October 2023 15:26 > > To: Roger Sayle ; gcc-patches@gcc.gnu.org > > Subject: RE: [ARC PATCH] Split SImode shifts pre-reload on > > !TARGET_BARREL_SHIFTER. > > > > Hi Roger, > > > > It was nice to meet you too. > > > > Thank you for looking into the ARC's non-Barrel Shifter configurations. I > will dive > > into your patch asap, but before starting here are a few of my comments: > > > > -Original Message- > > From: Roger Sayle > > Sent: Thursday, September 28, 2023 2:27 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Claudiu Zissulescu > > Subject: [ARC PATCH] Split SImode shifts pre-reload on > > !TARGET_BARREL_SHIFTER. > > > > > > Hi Claudiu, > > It was great meeting up with you and the Synopsys ARC team at the GNU > tools > > Cauldron in Cambridge. > > > > This patch is the first in a series to improve SImode and DImode shifts > and rotates > > in the ARC backend. This first piece splits SImode shifts, for > > !TARGET_BARREL_SHIFTER targets, after combine and before reload, in the > split1 > > pass, as suggested by the FIXME comment above output_shift in arc.cc. 
To > do > > this I've copied the implementation of the x86_pre_reload_split function > from > > i386 backend, and renamed it arc_pre_reload_split. > > > > Although the actual implementations of shifts remain the same (as in > > output_shift), having them as explicit instructions in the RTL stream > allows better > > scheduling and use of compact forms when available. The benefits can be > seen in > > two short examples below. > > > > For the function: > > unsigned int foo(unsigned int x, unsigned int y) { > > return y << 2; > > } > > > > GCC with -O2 -mcpu=em would previously generate: > > foo: add r1,r1,r1 > > add r1,r1,r1 > > j_s.d [blink] > > mov_s r0,r1 ;4 > > > > [CZI] The move shouldn't be generated indeed. The use of ADDs is slightly > > beneficial for older ARCv1 arches. > > > > and with this patch now generates: > > foo: asl_s r0,r1 > > j_s.d [blink] > > asl_s r0,r0 > > > > [CZI] Nice. This new sequence is as fast as we can get for our ARCv2 cpus. > > > > Notice the original (from shift_si3's output_shift) requires the shift > sequence to be > > monolithic with the same destination register as the source (requiring an > extra > > mov_s). The new version can eliminate this move, and schedule the second > asl in > > the branch delay slot of the return. > > > > For the function: > > int x,y,z; > > > > void bar() > > { > > x <<= 3; > > y <<= 3; > > z <<= 3; > > } > > > > GCC -O2 -mcpu=em currently generates: > > bar: push_s r13 > > ld.as r12,[gp,@x@sda] ;23 > > ld.as r3,[gp,@y@sda] ;23 > > mov r2,0 > > add3 r12,r2,r12 > > mov r2,0 > > add3 r3,r2,r3 > > ld.as r2,[gp,@z@sda] ;23 > > st.as r12,[gp,@x@sda] ;26 > > mov r13,0 > > add3 r2,r13,r2 > > st.as r3,[gp,@y@sda] ;26 > > st.as r2,[gp,@z@sda] ;26 > > j_s.d [blink] > > pop_s r13 > > > > where each shift by 3, uses ARC's add3 instruction, which is similar to > x86's lea > > implementing x = (y<<3) + z, but requires the value zero to be placed in a > > temporary register "z". 
Splitting this before reload allows these pseudos > to be > > shared/reused. With this patch, we get > > > > bar: ld.as r2,[gp,@x@sda] ;23 > > mov_s r3,0 ;3 > > add3 r2,r3,r2 > > ld.as r3,[gp,@y@sda] ;23 > > st.as r2,[gp,@x@sda] ;26 > > ld.as r2,[gp,@z@sda] ;23 > > mov_s r12,0 ;3 > > add3 r3,r12,r3 > > add3 r2,r12,r2 > > st.as r3,[gp,@y@sda] ;26 > > st.as r2,[gp,@z@sda] ;26 > > j_s [blink] > > > > [CZI] Looks great, but it also shows that I forgot to add the Ra,LIMM,RC variant > to the ADD3 > > instruction, which would let us have, instead of > > mov_s r3,0 ;3 > > add3 r2,r3,r2 > > only add3 r2,0,r2. Indeed it is a longer instruction, but faster. > > > > Unfortunately, register allocation means that we only share two of the > three > > "mov_s z,0", but this is sufficient to reduce register pressure enough to avoid spilling r13 in the prologue/epilogue.
Re: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.
Hi Roger, The patch as it is passed the validation, and it is in general OK. Although it doesn't address the elephant in the room, namely output_shift function, it is a welcome cleanup. I would like you to split the patch in two. One which deals with improvements on shifts in absence of a barrel shifter, and one which addresses the default instruction length, as they can be seen as separate work. Please feel free to commit resulting patches to the mainline. Thank you for your contribution, Claudiu On Thu, Sep 28, 2023 at 2:27 PM Roger Sayle wrote: > > > Hi Claudiu, > It was great meeting up with you and the Synopsys ARC team at the > GNU tools Cauldron in Cambridge. > > This patch is the first in a series to improve SImode and DImode > shifts and rotates in the ARC backend. This first piece splits > SImode shifts, for !TARGET_BARREL_SHIFTER targets, after combine > and before reload, in the split1 pass, as suggested by the FIXME > comment above output_shift in arc.cc. To do this I've copied the > implementation of the x86_pre_reload_split function from i386 > backend, and renamed it arc_pre_reload_split. > > Although the actual implementations of shifts remain the same > (as in output_shift), having them as explicit instructions in > the RTL stream allows better scheduling and use of compact forms > when available. The benefits can be seen in two short examples > below. > > For the function: > unsigned int foo(unsigned int x, unsigned int y) { > return y << 2; > } > > GCC with -O2 -mcpu=em would previously generate: > foo:add r1,r1,r1 > add r1,r1,r1 > j_s.d [blink] > mov_s r0,r1 ;4 > and with this patch now generates: > foo:asl_s r0,r1 > j_s.d [blink] > asl_s r0,r0 > > Notice the original (from shift_si3's output_shift) requires the > shift sequence to be monolithic with the same destination register > as the source (requiring an extra mov_s). The new version can > eliminate this move, and schedule the second asl in the branch > delay slot of the return. 
> > For the function: > int x,y,z; > > void bar() > { > x <<= 3; > y <<= 3; > z <<= 3; > } > > GCC -O2 -mcpu=em currently generates: > bar:push_s r13 > ld.as r12,[gp,@x@sda] ;23 > ld.as r3,[gp,@y@sda] ;23 > mov r2,0 > add3 r12,r2,r12 > mov r2,0 > add3 r3,r2,r3 > ld.as r2,[gp,@z@sda] ;23 > st.as r12,[gp,@x@sda] ;26 > mov r13,0 > add3 r2,r13,r2 > st.as r3,[gp,@y@sda] ;26 > st.as r2,[gp,@z@sda] ;26 > j_s.d [blink] > pop_s r13 > > where each shift by 3, uses ARC's add3 instruction, which is similar > to x86's lea implementing x = (y<<3) + z, but requires the value zero > to be placed in a temporary register "z". Splitting this before reload > allows these pseudos to be shared/reused. With this patch, we get > > bar:ld.as r2,[gp,@x@sda] ;23 > mov_s r3,0;3 > add3r2,r3,r2 > ld.as r3,[gp,@y@sda] ;23 > st.as r2,[gp,@x@sda] ;26 > ld.as r2,[gp,@z@sda] ;23 > mov_s r12,0 ;3 > add3r3,r12,r3 > add3r2,r12,r2 > st.as r3,[gp,@y@sda] ;26 > st.as r2,[gp,@z@sda] ;26 > j_s [blink] > > Unfortunately, register allocation means that we only share two of the > three "mov_s z,0", but this is sufficient to reduce register pressure > enough to avoid spilling r13 in the prologue/epilogue. > > This patch also contains a (latent?) bug fix. The implementation of > the default insn "length" attribute, assumes instructions of type > "shift" have two input operands and accesses operands[2], hence > specializations of shifts that don't have a operands[2], need to be > categorized as type "unary" (which results in the correct length). > > This patch has been tested on a cross-compiler to arc-elf (hosted on > x86_64-pc-linux-gnu), but because I've an incomplete tool chain many > of the regression test fail, but there are no new failures with new > test cases added below. If you can confirm that there are no issues > from additional testing, is this OK for mainline? > > Finally a quick technical question. 
ARC's zero overhead loops require > at least two instructions in the loop, so currently the backend's > implementation of shr20 pads the loop body with a "nop". > > lshr20: mov.f lp_count, 20 > lpnz2f > lsr r0,r0 > nop > 2: # end single insn loop > j_s [blink] > > could this be more efficiently implemented as: > > lshr20: mov lp_count, 10 > lp 2f > lsr_s r0,r0 > lsr_s r0,r0 > 2: # end single insn loop > j_s [blink] > > i.e. half the number of iterations, but doing twice as much useful > work in each iteration? Or might the nop be free on advanced > microarchitectures, and/or the consecutive dependent shifts cause > a pipeline stall? It would be nice to fuse loops t
Re: [ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<<x.
Hi Roger, Indeed, I was missing the patch file. Approved. Thank you for your contribution, Claudiu On Sun, Oct 15, 2023 at 11:14 AM Roger Sayle wrote: > > I’ve done it again. ENOPATCH. > > > > From: Roger Sayle > Sent: 15 October 2023 09:13 > To: 'gcc-patches@gcc.gnu.org' > Cc: 'Claudiu Zissulescu' > Subject: [ARC PATCH] Split asl dst,1,src into bset dst,0,src to implement > 1<<x. > > > > > This patch adds a pre-reload splitter to arc.md, to use the bset (set > > specific bit instruction) to implement 1<<x > > on ARC processors that don't have a barrel shifter. > > > > Currently, > > > > int foo(int x) { > > return 1 << x; > > } > > > > when compiled with -O2 -mcpu=em is compiled as a loop: > > > > foo: mov_s r2,1 ;3 > > and.f lp_count,r0, 0x1f > > lpnz 2f > > add r2,r2,r2 > > nop > > 2: # end single insn loop > > j_s.d [blink] > > mov_s r0,r2 ;4 > > > > with this patch we instead generate a single instruction: > > > > foo: bset r0,0,r0 > > j_s [blink] > > > > > > Finger-crossed this passes Claudiu's nightly testing. This patch > > has been minimally tested by building a cross-compiler cc1 to > > arc-linux hosted on x86_64-pc-linux-gnu with no additional failures > > seen with make -k check. Ok for mainline? Thanks in advance. > > > > > > 2023-10-15 Roger Sayle > > > > gcc/ChangeLog > > * config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to > > use bset dst,0,src to implement 1<<x. > > > > > > Cheers, > > Roger > > -- > >
Re: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.
Hi Roger, Your patch doesn't introduce new regressions. However, before pushing to the mainline you need to fix some issues: 1. Please fix the trailing spaces and blocks of 8 spaces which should be replaced with tabs. You can use check_GNU_style.py script to spot them. 2. Please use capital letters for code iterators (i.e., any_shift_rotate). Once the above issues are fixed, please proceed with your commit. Thank you for your contribution, Claudiu On Sun, Oct 8, 2023 at 10:07 PM Roger Sayle wrote: > > > This patch completes the ARC back-end's transition to using pre-reload > splitters for SImode shifts and rotates on targets without a barrel > shifter. The core part is that the shift_si3 define_insn is no longer > needed, as shifts and rotates that don't require a loop are split > before reload, and then because shift_si3_loop is the only caller > of output_shift, both can be significantly cleaned up and simplified. > The output_shift function (Claudiu's "the elephant in the room") is > renamed output_shift_loop, which handles just the four instruction > zero-overhead loop implementations. > > Aside from the clean-ups, the user visible changes are much improved > implementations of SImode shifts and rotates on affected targets. 
> > For the function: > unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); } > > GCC with -O2 -mcpu=em would previously generate: > > rotr_1: lsr_s r2,r0 > bmsk_s r0,r0,0 > ror r0,r0 > j_s.d [blink] > or_sr0,r0,r2 > > with this patch, we now generate: > > j_s.d [blink] > ror r0,r0 > > For the function: > unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); } > > GCC with -O2 -mcpu=em would previously generate: > > rotr_31: > mov_s r2,r0 ;4 > asl_s r0,r0 > add.f 0,r2,r2 > rlc r2,0 > j_s.d [blink] > or_sr0,r0,r2 > > with this patch we now generate an add.f followed by an adc: > > rotr_31: > add.f r0,r0,r0 > j_s.d [blink] > add.cs r0,r0,1 > > > Shifts by constants requiring a loop have been improved for even counts > by performing two operations in each iteration: > > int shl10(int x) { return x >> 10; } > > Previously looked like: > > shl10: mov.f lp_count, 10 > lpnz2f > asr r0,r0 > nop > 2: # end single insn loop > j_s [blink] > > > And now becomes: > > shl10: > mov lp_count,5 > lp 2f > asr r0,r0 > asr r0,r0 > 2: # end single insn loop > j_s [blink] > > > So emulating ARC's SWAP on architectures that don't have it: > > unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); } > > previously required 10 instructions and ~70 cycles: > > rotr_16: > mov_s r2,r0 ;4 > mov.f lp_count, 16 > lpnz2f > add r0,r0,r0 > nop > 2: # end single insn loop > mov.f lp_count, 16 > lpnz2f > lsr r2,r2 > nop > 2: # end single insn loop > j_s.d [blink] > or_sr0,r0,r2 > > now becomes just 4 instructions and ~18 cycles: > > rotr_16: > mov lp_count,8 > lp 2f > ror r0,r0 > ror r0,r0 > 2: # end single insn loop > j_s [blink] > > > This patch has been tested with a cross-compiler to arc-linux hosted > on x86_64-pc-linux-gnu and (partially) tested with the compile-only > portions of the testsuite with no regressions. Ok for mainline, if > your own testing shows no issues? 
> > > 2023-10-07 Roger Sayle > > gcc/ChangeLog > * config/arc/arc-protos.h (output_shift): Rename to... > (output_shift_loop): Tweak API to take an explicit rtx_code. > (arc_split_ashl): Prototype new function here. > (arc_split_ashr): Likewise. > (arc_split_lshr): Likewise. > (arc_split_rotl): Likewise. > (arc_split_rotr): Likewise. > * config/arc/arc.cc (output_shift): Delete local prototype. Rename. > (output_shift_loop): New function replacing output_shift to output > a zero overheap loop for SImode shifts and rotates on ARC targets > without barrel shifter (i.e. no hardware support for these insns). > (arc_split_ashl): New helper function to split *ashlsi3_nobs. > (arc_split_ashr): New helper function to split *ashrsi3_nobs. > (arc_split_lshr): New helper function to split *lshrsi3_nobs. > (arc_split_rotl): New helper function to split *rotlsi3_nobs. > (arc_split_rotr): New helper function to split *rotrsi3_nobs. > * config/arc/arc.md (any_shift_rotate): New define_code_iterator. > (define_code_attr insn): New code attribute to map to pattern name. > (si3): New expander unifying previous ashlsi3, > ashrsi3 and lshrsi3 define_expands. Adds rotlsi3 and rotrsi3. > (*si3_nobs): New defin
Re: [ARC PATCH] Improved SImode shifts and rotates with -mswap.
Hi Roger, +(define_insn "si2_cnt16" + [(set (match_operand:SI 0 "dest_reg_operand" "=w") Please use "register_operand", and "r" constraint. +(ANY_ROTATE:SI (match_operand:SI 1 "register_operand" "c") Please use "r" constraint instead of "c". + (const_int 16)))] + "TARGET_SWAP" + "swap\\t%0,%1" Otherwise, it looks good to me. Please fix the above and proceed with your commit. Thank you for your contribution, Claudiu
Re: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.
Hi Roger, You have a block of 8 spaces that needs to be replaced by tabs: gcc/config/arc/arc.cc:5538:0: if (n < 4) Please fix the above, and proceed with your commit. Thank you, Claudiu On Sun, Oct 29, 2023 at 11:16 AM Roger Sayle wrote: > > > This patch overhauls the ARC backend's insn_cost target hook, and makes > some related improvements to rtx_costs, BRANCH_COST, etc. The primary > goal is to allow the backend to indicate that shifts and rotates are > slow (discouraged) when the CPU doesn't have a barrel shifter. I should > also acknowledge Richard Sandiford for inspiring the use of set_cost > in this rewrite of arc_insn_cost; this implementation borrows heavily > for the target hooks for AArch64 and ARM. > > The motivating example is derived from PR rtl-optimization/110717. > > struct S { int a : 5; }; > unsigned int foo (struct S *p) { > return p->a; > } > > With a barrel shifter, GCC -O2 generates the reasonable: > > foo:ldb_s r0,[r0] > asl_s r0,r0,27 > j_s.d [blink] > asr_s r0,r0,27 > > What's interesting is that during combine, the middle-end actually > has two shifts by three bits, and a sign-extension from QI to SI. > > Trying 8, 9 -> 11: > 8: r158:SI=r157:QI#0<<0x3 > REG_DEAD r157:QI > 9: r159:SI=sign_extend(r158:SI#0) > REG_DEAD r158:SI >11: r155:SI=r159:SI>>0x3 > REG_DEAD r159:SI > > Whilst it's reasonable to simplify this to two shifts by 27 bits when > the CPU has a barrel shifter, it's actually a significant pessimization > when these shifts are implemented by loops. This combination can be > prevented if the backend provides accurate-ish estimates for insn_cost. > > > Previously, without a barrel shifter, GCC -O2 -mcpu=em generates: > > foo:ldb_s r0,[r0] > mov lp_count,27 > lp 2f > add r0,r0,r0 > nop > 2: # end single insn loop > mov lp_count,27 > lp 2f > asr r0,r0 > nop > 2: # end single insn loop > j_s [blink] > > which contains two loops and requires about ~113 cycles to execute. 
> With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates: > > foo:ldb_s r0,[r0] > mov_s r2,0;3 > add3r0,r2,r0 > sexb_s r0,r0 > asr_s r0,r0 > asr_s r0,r0 > j_s.d [blink] > asr_s r0,r0 > > which requires only ~6 cycles, for the shorter shifts by 3 and sign > extension. > > > Tested with a cross-compiler to arc-linux hosted on x86_64, > with no new (compile-only) regressions from make -k check. > Ok for mainline if this passes Claudiu's nightly testing? > > > 2023-10-29 Roger Sayle > > gcc/ChangeLog > * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates. > Provide reasonable values for SHIFTS and ROTATES by constant > bit counts depending upon TARGET_BARREL_SHIFTER. > (arc_insn_cost): Use insn attributes if the instruction is > recognized. Avoid calling get_attr_length for type "multi", > i.e. define_insn_and_split patterns without explicit type. > Fall-back to set_rtx_cost for single_set and pattern_cost > otherwise. > * config/arc/arc.h (COSTS_N_BYTES): Define helper macro. > (BRANCH_COST): Improve/correct definition. > (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior. > > > Thanks again, > Roger > -- >
Re: [ARC PATCH] Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.
Hi Roger, Do you want to say bmsk_s instead of msk_s here: +/* { dg-final { scan-assembler "msk_s\\s+r0,r0,0" } } */ Anyhow, the patch looks good. Proceed with your commit. Thank you, Claudiu On Mon, Oct 30, 2023 at 5:05 AM Jeff Law wrote: > > > > On 10/28/23 10:47, Roger Sayle wrote: > > > > This patch optimizes PR middle-end/101955 for the ARC backend. On ARC > > CPUs with a barrel shifter, using two shifts is (probably) optimal as: > > > > asl_s r0,r0,31 > > asr_s r0,r0,31 > > > > but without a barrel shifter, GCC -O2 -mcpu=em currently generates: > > > > and r2,r0,1 > > ror r2,r2 > > add.f 0,r2,r2 > > sbc r0,r0,r0 > > > > with this patch, we now generate the smaller, faster and non-flags > > clobbering: > > > > bmsk_s r0,r0,0 > > neg_s r0,r0 > > > > Tested with a cross-compiler to arc-linux hosted on x86_64, > > with no new (compile-only) regressions from make -k check. > > Ok for mainline if this passes Claudiu's nightly testing? > > > > > > 2023-10-28 Roger Sayle > > > > gcc/ChangeLog > > PR middle-end/101955 > > * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split > > to convert sign extract of the least significant bit into an > > AND $1 then a NEG when !TARGET_BARREL_SHIFTER. > > > > gcc/testsuite/ChangeLog > > PR middle-end/101955 > > * gcc.target/arc/pr101955.c: New test case. > Good catch. Looking to do something very similar on the H8 based on > your work here. > > One the H8 we can use bld to load a bit from an 8 bit register into the > C flag. Then we use subtract with carry to get an 8 bit 0/-1 which we > can then sign extend to 16 or 32 bits. That covers bit positions 0..15 > of an SImode input. > > For bits 16..31 we can move the high half into the low half, the use the > bld sequence. > > For bit zero the and+neg is the same number of clocks and size as bld > based sequence. But it'll simulate faster, so it's special cased. > > > Jeff >
Re: [ARC PATCH] Improve DImode left shift by a single bit.
Missed this one. Ok, please proceed with the commit. Thank you for your contribution, Claudiu On Sat, Oct 28, 2023 at 4:05 PM Roger Sayle wrote: > > > This patch improves the code generated for X << 1 (and for X + X) when > X is 64-bit DImode, using the same two instruction code sequence used > for DImode addition. > > For the test case: > > long long foo(long long x) { return x << 1; } > > GCC -O2 currently generates the following code: > > foo:lsr r2,r0,31 > asl_s r1,r1,1 > asl_s r0,r0,1 > j_s.d [blink] > or_sr1,r1,r2 > > and on CPU without a barrel shifter, i.e. -mcpu=em > > foo:add.f 0,r0,r0 > asl_s r1,r1 > rlc r2,0 > asl_s r0,r0 > j_s.d [blink] > or_sr1,r1,r2 > > with this patch (both with and without a barrel shifter): > > foo:add.f r0,r0,r0 > j_s.d [blink] > adc r1,r1,r1 > > [For Jeff Law's benefit a similar optimization is also applicable to > H8300H, that could also use a two instruction sequence (plus rts) but > currently GCC generates 16 instructions (plus an rts) for foo above.] > > Tested with a cross-compiler to arc-linux hosted on x86_64, > with no new (compile-only) regressions from make -k check. > Ok for mainline if this passes Claudiu's nightly testing? > > 2023-10-28 Roger Sayle > > gcc/ChangeLog > * config/arc/arc.md (addsi3): Fix GNU-style code formatting. > (adddi3): Change define_expand to generate an *adddi3. > (*adddi3): New define_insn_and_split to lower DImode additions > during the split1 pass (after combine and before reload). > (ashldi3): New define_expand to (only) generate *ashldi3_cnt1 > for DImode left shifts by a single bit. > (*ashldi3_cnt1): New define_insn_and_split to lower DImode > left shifts by one bit to an *adddi3. > > gcc/testsuite/ChangeLog > * gcc.target/arc/adddi3-1.c: New test case. > * gcc.target/arc/ashldi3-1.c: Likewise. > > > Thanks in advance, > Roger > -- >
Re: [PATCH] [ARC] Use hardware support for double-precision compare instructions.
It is already ported :) https://github.com/gcc-mirror/gcc/commit/555e4a053951a0ae24835a266e71819336d7f637#diff-5b8bd26eec6c2b9f560870c205416edc Cheers, Claudiu On Wed, Jan 15, 2020 at 1:49 AM Vineet Gupta wrote: > > On 12/9/19 1:52 AM, Claudiu Zissulescu wrote: > > Although the FDCMP (the double precision floating point compare > > instruction) is added to the compiler, it is not properly used via cstoredi > > pattern. Fix it. > > > > OK to apply? > > Claudidu > > > > -xx-xx Claudiu Zissulescu > > > > * config/arc/arc.md (iterator SDF): Check TARGET_FP_DP_BASE. > > (cstoredi4): Use TARGET_HARD_FLOAT. > > --- > > gcc/config/arc/arc.md | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md > > index b592f25afce..bd44030b409 100644 > > --- a/gcc/config/arc/arc.md > > +++ b/gcc/config/arc/arc.md > > @@ -3749,7 +3749,7 @@ archs4x, archs4xd" > > }) > > > > (define_mode_iterator SDF [(SF "TARGET_FP_SP_BASE || TARGET_OPTFPE") > > -(DF "TARGET_OPTFPE")]) > > +(DF "TARGET_FP_DP_BASE || TARGET_OPTFPE")]) > > > > (define_expand "cstore4" > >[(set (reg:CC CC_REG) > > @@ -3759,7 +3759,7 @@ archs4x, archs4xd" > > (match_operator:SI 1 "comparison_operator" [(reg CC_REG) > > (const_int 0)]))] > > > > - "TARGET_FP_SP_BASE || TARGET_OPTFPE" > > + "TARGET_HARD_FLOAT || TARGET_OPTFPE" > > { > >gcc_assert (XEXP (operands[1], 0) == operands[2]); > >gcc_assert (XEXP (operands[1], 1) == operands[3]); > > Can this be backported to gcc-9 please ? > glibc testing uses gcc-9 > > Thx, > -Vineet
Re: [PATCH 3/4] [ARC] Save mlo/mhi registers when ISR.
Yes, I know :( Thank you for your help. All four patches pushed. Claudiu On Wed, Jan 22, 2020 at 10:31 PM Jeff Law wrote: > > On Wed, 2020-01-22 at 10:14 +0200, Claudiu Zissulescu wrote: > > ARC600 when configured with mul64 instructions uses mlo and mhi > > registers to store the 64-bit result of the multiplication. In the ARC600 > > ISA documentation we have the next register configuration when ARC600 > > is configured only with mul64 extension: > > > > Register | Name | Use > > -+--+ > > r57 | mlo | Multiply low 32 bits, read only > > r58 | mmid | Multiply middle 32 bits, read only > > r59 | mhi | Multiply high 32 bits, read only > > - > > > > When used for Co-existence configurations we have for mul64 the next > > registers used: > > > > Register | Name | Use > > -+--+ > > r58 | mlo | Multiply low 32 bits, read only > > r59 | mhi | Multiply high 32 bits, read only > > - > > > > Note that mlo/mhi assignment doesn't swap when bigendian CPU > > configuration is used. > > > > The compiler will always use r58 for mlo, regardless of the > > configuration chosen, to ensure correct mlo/mhi splitting. Fixing mlo > > to the right register number is done at assembly time. The dwarf info > > is also notified via DBX_... macro. Both mlo/mhi registers need to be > > saved when an ISR happens, using a custom sequence. > > > > gcc/ > > -xx-xx Claudiu Zissulescu > > > > * config/arc/arc-protos.h (gen_mlo): Remove. > > (gen_mhi): Likewise. > > * config/arc/arc.c (AUX_MULHI): Define. > > (arc_must_save_register): Special handling for r58/59. > > (arc_compute_frame_size): Consider mlo/mhi registers. > > (arc_save_callee_saves): Emit fp/sp move only when emit_move > > parameter is true. > > (arc_conditional_register_usage): Remove TARGET_BIG_ENDIAN from > > mlo/mhi name selection. > > (arc_restore_callee_saves): Don't early restore blink when ISR. > > (arc_expand_prologue): Add mlo/mhi saving. > > (arc_expand_epilogue): Add mlo/mhi restoring. > > (gen_mlo): Remove. > > (gen_mhi): Remove. 
> > * config/arc/arc.h (DBX_REGISTER_NUMBER): Correct register > > numbering when MUL64 option is used. > > (DWARF2_FRAME_REG_OUT): Define. > > * config/arc/arc.md (arc600_stall): New pattern. > > (VUNSPEC_ARC_ARC600_STALL): Define. > > (mulsi64): Use correct mlo/mhi registers. > > (mulsi_600): Clean it up. > > * config/arc/predicates.md (mlo_operand): Remove any dependency on > > TARGET_BIG_ENDIAN. > > (mhi_operand): Likewise. > > > > testsuite/ > > -xx-xx Claudiu Zissulescu > > * gcc.target/arc/code-density-flag.c: Update test. > > * gcc.target/arc/interrupt-6.c: Likewise. > Ugh. But OK. > > jeff > > >
Re: [PATCH 2/4] [ARC] Use TARGET_INSN_COST.
> My only worry would be asking for the length early in the RTL pipeline > may not be as accurate, but it's supposed to work, so if you're > comfortable with the end results, then OK. > Indeed, the length is not accurate, but the results seem slightly better than using COST_RTX. Using INSN_COSTS seems to me a more manageable way of controlling what the combiner does. Anyhow, for ARC the instruction size is accurate quite late in the compilation process as it needs register and immediate value info :( Thank you for your review, Claudiu
Re: [PATCH] [ARC] Use hardware support for double-precision compare instructions.
Thank you for your review. Patch pushed to mainline and gcc9 branch. //Claudiu On Wed, Dec 11, 2019 at 8:59 PM Jeff Law wrote: > > On Mon, 2019-12-09 at 11:52 +0200, Claudiu Zissulescu wrote: > > Although the FDCMP (the double precision floating point compare > > instruction) is added to the compiler, it is not properly used via > > cstoredi pattern. Fix it. > > > > OK to apply? > > Claudiu > > > > -xx-xx Claudiu Zissulescu > > > > * config/arc/arc.md (iterator SDF): Check TARGET_FP_DP_BASE. > > (cstoredi4): Use TARGET_HARD_FLOAT. > OK > jeff >
Re: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float comparisons
Pushed. Thank you for your contribution, Claudiu On Wed, Dec 11, 2019 at 12:47 AM Vineet Gupta wrote: > > On 12/10/19 1:12 AM, Claudiu Zissulescu wrote: > > Hi, > > > > Thank you for your contribution, I'll push it asap. As far as I understand, > > you need this patch both in gcc9 branch and mainline. > > > > Cheers, > > Claudiu > > Indeed both mainline and gcc9 > > Thx > -Vineet > > > > >> -Original Message- > >> From: Vineet Gupta [mailto:vgu...@synopsys.com] > >> Sent: Monday, December 09, 2019 8:02 PM > >> To: gcc-patches@gcc.gnu.org > >> Cc: Claudiu Zissulescu ; > >> andrew.burg...@embecosm.com; linux-snps-...@lists.infradead.org; > >> Vineet Gupta > >> Subject: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float > >> comparisons > >> > >> ARC gcc generates FDCMP instructions which raises Invalid operation for > >> signaling NaN only. This causes glibc iseqsig() primitives to fail (in > >> the current ongoing glibc port to ARC) > >> > >> So split up the hard float compares into two categories and for unordered > >> compares generate the FDCMPF instruction (vs. FDCMP) which raises > >> exception > >> for either NaNs. > >> > >> With this fix testsuite/gcc.dg/torture/pr52451.c passes for ARC. > >> > >> Also passes 6 additional tests in glibc testsuite (test*iseqsig) and no > >> regressions > >> > >> gcc/ > >> -xx-xx Vineet Gupta > >> > >> * config/arc/arc-modes.def (CC_FPUE): New Mode CC_FPUE which > >> helps codegen generate exceptions even for quiet NaN. > >> * config/arc/arc.c (arc_init_reg_tables): Handle New CC_FPUE mode. > >> (get_arc_condition_code): Likewise. > >> (arc_select_cc_mode): LT, LE, GT, GE to use the New CC_FPUE > >> mode. > >> * config/arc/arc.h (REVERSE_CONDITION): Handle New CC_FPUE > >> mode. > >> * config/arc/predicates.md (proper_comparison_operator): > >> Likewise. > >> * config/arc/fpu.md (cmpsf_fpu_trap): New Pattern for CC_FPUE. > >> (cmpdf_fpu_trap): Likewise. 
> >> > >> Signed-off-by: Vineet Gupta > >> --- > >> gcc/config/arc/arc-modes.def | 1 + > >> gcc/config/arc/arc.c | 8 ++-- > >> gcc/config/arc/arc.h | 2 +- > >> gcc/config/arc/fpu.md| 24 > >> gcc/config/arc/predicates.md | 1 + > >> 5 files changed, 33 insertions(+), 3 deletions(-) > >> > >> diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def > >> index 36a2f4abfb25..d16b6a289a15 100644 > >> --- a/gcc/config/arc/arc-modes.def > >> +++ b/gcc/config/arc/arc-modes.def > >> @@ -38,4 +38,5 @@ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI > >> */ > >> > >> /* FPU condition flags. */ > >> CC_MODE (CC_FPU); > >> +CC_MODE (CC_FPUE); > >> CC_MODE (CC_FPU_UNEQ); > >> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c > >> index 28305f459dcd..cbb95d6e9043 100644 > >> --- a/gcc/config/arc/arc.c > >> +++ b/gcc/config/arc/arc.c > >> @@ -1564,6 +1564,7 @@ get_arc_condition_code (rtx comparison) > >> default : gcc_unreachable (); > >> } > >> case E_CC_FPUmode: > >> +case E_CC_FPUEmode: > >>switch (GET_CODE (comparison)) > >> { > >> case EQ: return ARC_CC_EQ; > >> @@ -1686,11 +1687,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x, > >> rtx y) > >>case UNLE: > >>case UNGT: > >>case UNGE: > >> +return CC_FPUmode; > >> + > >>case LT: > >>case LE: > >>case GT: > >>case GE: > >> -return CC_FPUmode; > >> +return CC_FPUEmode; > >> > >>case LTGT: > >>case UNEQ: > >> @@ -1844,7 +1847,7 @@ arc_init_reg_tables (void) > >>if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode > >>|| i == (int) CC_Cmode > >>|| i == CC_FP_GTmode || i == CC_FP_GEmode || i == > >> CC_FP_ORDmode > >> - || i == CC_FPUmode || i == CC_FPU_UNEQmode) > >> + || i == CC_FPUmode || i == CC_FPUEmode || i == > >> CC_FPU_UNEQmode) > >> arc_mode_class[i] = 1 << (int) C_MODE; > >>else > >> arc_mode_class[i] = 0; > >> @@ -8401,6 +8404,7 @@ arc_reorg (void) > >> > >>/* Avoid FPU instructions. 
*/ > >>if ((GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUmode) > >> + || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUEmode) > >>|| (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == > >> CC_FPU_UNEQmode)) > >> continue; > >> > >> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h > >> index 4d7ac3281b41..c08ca3d0d432 100644 > >> --- a/gcc/config/arc/arc.h > >> +++ b/gcc/config/arc/arc.h > >> @@ -1531,7 +1531,7 @@ enum arc_function_type { > >>(((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode \ > >> || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode \ > >> || (MODE) == CC_FPXmode || (MODE) == CC_FPU_UNEQmode \ > >> -|| (MODE) == CC_FPUmode) \ > >> +|| (MODE) == C
Re: [committed] arc: Remove mlra option [PR113954]
I'll include your comment in my second patch where I clean some patterns used by reload. Thank you, claudiu On Mon, Sep 23, 2024 at 5:05 PM Andreas Schwab wrote: > > On Sep 23 2024, Claudiu Zissulescu wrote: > > > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc > > index c800226b179..a225adeff57 100644 > > --- a/gcc/config/arc/arc.cc > > +++ b/gcc/config/arc/arc.cc > > @@ -721,7 +721,7 @@ static rtx arc_legitimize_address_0 (rtx, rtx, > > machine_mode mode); > >arc_no_speculation_in_delay_slots_p > > > > #undef TARGET_LRA_P > > -#define TARGET_LRA_P arc_lra_p > > +#define TARGET_LRA_P hook_bool_void_true > > This is the default for lra_p, so you can remove the override. > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different."
Re: [PATCH] arc: testsuite: Scan "rlc" instead of "mov.hs".
LGTM. I'll merge it once stage one is open. Cheers, Claudiu On Tue, Mar 18, 2025 at 6:23 PM Luis Silva wrote: > > Due to the patch by Roger Sayle, > 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which > introduces the use of the `rlc rX,0` instruction in place > of the `mov.hs`, the add overflow test case needs to be > updated. The previous test case was validating the `mov.hs` > instruction, but now it must validate the `rlc` instruction > as the new behavior. > > gcc/testsuite/ChangeLog: > > * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc. > > Signed-off-by: Luis Silva > --- > gcc/testsuite/gcc.target/arc/overflow-1.c | 8 +++- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/arc/overflow-1.c > b/gcc/testsuite/gcc.target/arc/overflow-1.c > index 01b3e8ad0fa..694c25cfe66 100644 > --- a/gcc/testsuite/gcc.target/arc/overflow-1.c > +++ b/gcc/testsuite/gcc.target/arc/overflow-1.c > @@ -31,9 +31,8 @@ bool addi_overflow (int32_t a, int32_t *res) > /* > * add.f r0,r0,r1 > * st_s r0,[r2] > - * mov_s r0,1 > * j_s.d [blink] > - * mov.hs r0,0 > + * rlcr0,0 > */ > bool uadd_overflow (uint32_t a, uint32_t b, uint32_t *res) > { > @@ -75,9 +74,8 @@ bool addi_overflow_p (int32_t a, int32_t res) > > /* > * add.f 0,r0,r1 > - * mov_s r0,1 > * j_s.d [blink] > - * mov.hs r0,0 > + * rlc r0,0 > */ > bool uadd_overflow_p (uint32_t a, uint32_t b, uint32_t res) > { > @@ -95,6 +93,6 @@ bool uaddi_overflow_p (uint32_t a, uint32_t res) > > /* { dg-final { scan-assembler-times "add.f\\s\+" 7 } } */ > /* { dg-final { scan-assembler-times "mov\.nv\\s\+" 4 } } */ > -/* { dg-final { scan-assembler-times "mov\.hs\\s\+" 2 } } */ > +/* { dg-final { scan-assembler-times "rlc\\s\+" 2 } } */ > /* { dg-final { scan-assembler-times "seths\\s\+" 2 } } */ > /* { dg-final { scan-assembler-not "cmp" } } */ > -- > 2.37.1 >
Re: [PATCH 1/2] arc: Add commutative multiplication patterns.
LGTM, I'll merge it once stage 1 is open. Cheers, Claudiu On Tue, Mar 18, 2025 at 6:22 PM Luis Silva wrote: > > This patch introduces two new instruction patterns: > > `*mulsi3_cmp0`: This pattern performs a multiplication > and sets the CC_Z register based on the result, while > also storing the result of the multiplication in a > general-purpose register. > > `*mulsi3_cmp0_noout`: This pattern performs a > multiplication and sets the CC_Z register based on the > result without storing the result in a general-purpose > register. > > These patterns are optimized to generate code using the `mpy.f` > instruction, specifically used where the result is compared to zero. > > In addition, the previous commutative multiplication implementation > was removed. It incorrectly took into account the negative flag, > which is wrong. This new implementation only considers the zero > flag. > > A test case has been added to verify the correctness of these > changes. > > gcc/ChangeLog: > > * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication > results compared against zero, selecting CC_Zmode. > * config/arc/arc.md (*mulsi3_cmp0): New define_insn. > (*mulsi3_cmp0_noout): New define_insn. > > gcc/testsuite/ChangeLog: > > * gcc.target/arc/mult-cmp0.c: New test. > > Signed-off-by: Luis Silva > --- > gcc/config/arc/arc.cc| 7 +++ > gcc/config/arc/arc.md| 34 ++-- > gcc/testsuite/gcc.target/arc/mult-cmp0.c | 66 > 3 files changed, 103 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/arc/mult-cmp0.c > > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc > index e3d53576768..8ad5649adc0 100644 > --- a/gcc/config/arc/arc.cc > +++ b/gcc/config/arc/arc.cc > @@ -1555,6 +1555,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x, rtx y) >machine_mode mode = GET_MODE (x); >rtx x1; > > + /* Matches all instructions which can do .f and clobbers only Z flag. 
*/ > + if (GET_MODE_CLASS (mode) == MODE_INT > + && y == const0_rtx > + && GET_CODE (x) == MULT > + && (op == EQ || op == NE)) > +return CC_Zmode; > + >/* For an operation that sets the condition codes as a side-effect, the > C and V flags is not set as for cmp, so we can only use comparisons > where > this doesn't matter. (For LT and GE we can use "mi" and "pl" > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md > index 49dfc9d35af..bc2e8fadd91 100644 > --- a/gcc/config/arc/arc.md > +++ b/gcc/config/arc/arc.md > @@ -253,7 +253,7 @@ > simd_vcompare, simd_vpermute, simd_vpack, simd_vpack_with_acc, > simd_valign, simd_valign_with_acc, simd_vcontrol, > simd_vspecial_3cycle, simd_vspecial_4cycle, simd_dma, mul16_em, div_rem, > - fpu, fpu_fuse, fpu_sdiv, fpu_ddiv, fpu_cvt, block" > + fpu, fpu_fuse, fpu_sdiv, fpu_ddiv, fpu_cvt, block, mpy" >(cond [(eq_attr "is_sfunc" "yes") > (cond [(match_test "!TARGET_LONG_CALLS_SET && (!TARGET_MEDIUM_CALLS > || GET_CODE (PATTERN (insn)) != COND_EXEC)") (const_string "call") > (match_test "flag_pic") (const_string "sfunc")] > @@ -1068,11 +1068,37 @@ archs4x, archs4xd" > (set_attr "cond" "set_zn") > (set_attr "length" "*,4,4,4,8")]) > > -;; The next two patterns are for plos, ior, xor, and, and mult. 
> +(define_insn "*mulsi3_cmp0" > + [(set (reg:CC_Z CC_REG) > + (compare:CC_Z > +(mult:SI > + (match_operand:SI 1 "register_operand" "%r,0,r") > + (match_operand:SI 2 "nonmemory_operand" "rL,I,i")) > +(const_int 0))) > + (set (match_operand:SI 0 "register_operand""=r,r,r") > + (mult:SI (match_dup 1) (match_dup 2)))] > + "TARGET_MPY" > + "mpy%?.f\\t%0,%1,%2" > + [(set_attr "length" "4,4,8") > + (set_attr "type" "mpy")]) > + > +(define_insn "*mulsi3_cmp0_noout" > + [(set (reg:CC_Z CC_REG) > + (compare:CC_Z > +(mult:SI > + (match_operand:SI 0 "register_operand" "%r,r,r") > + (match_operand:SI 1 "nonmemory_operand" "rL,I,i")) > +(const_int 0)))] > + "TARGET_MPY" > + "mpy%?.f\\t0,%0,%1" > + [(set_attr "length" "4,4,8") > + (set_attr "type" "mpy")]) > + > +;; The next two patterns are for plus, ior, xor, and. > (define_insn "*commutative_binary_cmp0_noout" >[(set (match_operand 0 "cc_set_register" "") > (match_operator 4 "zn_compare_operator" > - [(match_operator:SI 3 "commutative_operator" > + [(match_operator:SI 3 "commutative_operator_sans_mult" > [(match_operand:SI 1 "register_operand" "%r,r") > (match_operand:SI 2 "nonmemory_operand" "rL,Cal")]) >(const_int 0)]))] > @@ -1085,7 +,7 @@ archs4x, archs4xd" > (define_insn "*commutative_binary_cmp0" >[(set (match_operand 3 "cc_set_register" "") > (match_operator 5 "zn_compare_operator" > - [(match_operator:SI 4 "commutative_opera
Re: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()
LGTM, Cheers, Claudiu On Tue, Mar 18, 2025 at 6:23 PM Luis Silva wrote: > > This patch handles both signed and unsigned > builtin multiplication overflow. > > Uses the "mpy.f" instruction to set the condition > codes based on the result. In the event of an > overflow, the V flag is set, triggering a > conditional move depending on the V flag status. > > For example, set "1" to "r0" in case of overflow: > > mov_s r0,1 > mpy.f r0,r0,r1 > j_s.d [blink] > mov.nv r0,0 > > gcc/ChangeLog: > > * config/arc/arc.md (mulvsi4): New define_expand. > (mulsi3_Vcmp): New define_insn. > > Signed-off-by: Luis Silva > --- > gcc/config/arc/arc.md | 33 + > 1 file changed, 33 insertions(+) > > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md > index bc2e8fadd91..dd245d1813c 100644 > --- a/gcc/config/arc/arc.md > +++ b/gcc/config/arc/arc.md > @@ -842,6 +842,9 @@ archs4x, archs4xd" > ; Optab prefix for sign/zero-extending operations > (define_code_attr su_optab [(sign_extend "") (zero_extend "u")]) > > +;; Code iterator for sign/zero extension > +(define_code_iterator ANY_EXTEND [sign_extend zero_extend]) > + > (define_insn "*xt_cmp0_noout" >[(set (match_operand 0 "cc_set_register" "") > (compare:CC_ZN (SEZ:SI (match_operand:SQH 1 "register_operand" "r")) > @@ -1068,6 +1071,36 @@ archs4x, archs4xd" > (set_attr "cond" "set_zn") > (set_attr "length" "*,4,4,4,8")]) > > +(define_expand "mulvsi4" > + [(ANY_EXTEND:DI (match_operand:SI 0 "register_operand")) > + (ANY_EXTEND:DI (match_operand:SI 1 "register_operand")) > + (ANY_EXTEND:DI (match_operand:SI 2 "register_operand")) > + (label_ref (match_operand 3 "" ""))] > + "TARGET_MPY" > + { > +emit_insn (gen_mulsi3_Vcmp (operands[0], operands[1], > + operands[2])); > +arc_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]); > +DONE; > + }) > + > +(define_insn "mulsi3_Vcmp" > + [(parallel > +[(set > + (reg:CC_V CC_REG) > + (compare:CC_V > + (mult:DI > + (ANY_EXTEND:DI (match_operand:SI 1 "register_operand" "%0,r,r,r")) > + (ANY_EXTEND:DI 
(match_operand:SI 2 "nonmemory_operand" "I,L,r,C32"))) > + (ANY_EXTEND:DI (mult:SI (match_dup 1) (match_dup 2) > + (set (match_operand:SI 0 "register_operand" "=r,r,r,r") > + (mult:SI (match_dup 1) (match_dup 2)))])] > + "register_operand (operands[1], SImode) > + || register_operand (operands[2], SImode)" > + "mpy.f\\t%0,%1,%2" > + [(set_attr "length" "4,4,4,8") > + (set_attr "type" "mpy")]) > + > (define_insn "*mulsi3_cmp0" >[(set (reg:CC_Z CC_REG) > (compare:CC_Z > -- > 2.37.1 >
Re: [PATCH] arc: testsuite: Scan "rlc" instead of "mov.hs".
Hi Jeff, There is one patch missing, I'll add it to mainline as soon as mainline is open for commits. Best, Claudiu On Fri, Apr 18, 2025 at 12:10 AM Jeff Law wrote: > > > > On 3/18/25 10:23 AM, Luis Silva wrote: > > Due to the patch by Roger Sayle, > > 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which > > introduces the use of the `rlc rX,0` instruction in place > > of the `mov.hs`, the add overflow test case needs to be > > updated. The previous test case was validating the `mov.hs` > > instruction, but now it must validate the `rlc` instruction > > as the new behavior. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc. > I don't see any test named "overflow-1.c" in the arc subdirectory?!? > > Is it possible that's a change in your local repo? > > jeff
Re: [PATCH 1/2] arc: Add commutative multiplication patterns.
Hi Jeff, Indeed, Luis should have been using "umulti". The other attributes are not required. I'll fix it before pushing to the mainline. Thanks, Claudiu On Fri, Apr 18, 2025 at 8:41 PM Jeff Law wrote: > > > > On 3/18/25 10:22 AM, Luis Silva wrote: > > This patch introduces two new instruction patterns: > > > > `*mulsi3_cmp0`: This pattern performs a multiplication > > and sets the CC_Z register based on the result, while > > also storing the result of the multiplication in a > > general-purpose register. > > > > `*mulsi3_cmp0_noout`: This pattern performs a > > multiplication and sets the CC_Z register based on the > > result without storing the result in a general-purpose > > register. > > > > These patterns are optimized to generate code using the `mpy.f` > > instruction, specifically used where the result is compared to zero. > > > > In addition, the previous commutative multiplication implementation > > was removed. It incorrectly took into account the negative flag, > > which is wrong. This new implementation only considers the zero > > flag. > > > > A test case has been added to verify the correctness of these > > changes. > > > > gcc/ChangeLog: > > > > * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication > > results compared against zero, selecting CC_Zmode. > > * config/arc/arc.md (*mulsi3_cmp0): New define_insn. > > (*mulsi3_cmp0_noout): New define_insn. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/arc/mult-cmp0.c: New test. > So I'm not well versed in the ARC port, but a couple questions. > > First your new patterns use a new type "mpy". Do you want/need to add > that to the pipeline descriptions? It would seem advisable to do so. > > Do the new patterns need to set "cond" and "predicable" attributes? > > Jeff
Fwd: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()
Adding missing email addresses. -- Forwarded message - From: Claudiu Zissulescu Ianculescu Date: Thu, Apr 24, 2025 at 8:48 PM Subject: Re: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow () To: Jeff Law Hi Jeff, The other attributes are not required as the pattern doesn't allow it to be used in a predicated execution. Thus, the default values for the missing predicates are ok. Best, Claudiu On Fri, Apr 18, 2025 at 8:43 PM Jeff Law wrote: > > > > On 3/18/25 10:23 AM, Luis Silva wrote: > > This patch handles both signed and unsigned > > builtin multiplication overflow. > > > > Uses the "mpy.f" instruction to set the condition > > codes based on the result. In the event of an > > overflow, the V flag is set, triggering a > > conditional move depending on the V flag status. > > > > For example, set "1" to "r0" in case of overflow: > > > > mov_s r0,1 > > mpy.f r0,r0,r1 > > j_s.d [blink] > > mov.nv r0,0 > > > > gcc/ChangeLog: > > > > * config/arc/arc.md (mulvsi4): New define_expand. > > (mulsi3_Vcmp): New define_insn. > So similar to your other patch, there are other attributes (cond and > predicable) that you may need to set. I just don't know the port well > enough to judge that. > > jeff >
Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags
Hi, > > Currently, the data type of sanitizer flags is unsigned int, with > SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual > enumerator for enum sanitize_code. Use the 'uint64_t' data type to allow > for more distinct instrumentation modes to be added when needed. > > > > I have not looked yet but does it make sense to use `unsigned > HOST_WIDE_INT` instead of uint64_t? HWI should be the same as uint64_t > but it is more consistent with the rest of gcc. > Plus since tree_to_uhwi is more consistent there. > That was in v2; however, the reviewers suggested to use uint64_t. Best wishes, Claudiu
[PATCH v3 9/9] aarch64: Add memtag-stack tests
From: Indu Bhagat Add basic tests for the memtag-stack sanitizer. The memtag-stack sanitizer uses target hooks to emit AArch64-specific MTE instructions. gcc/testsuite: * lib/target-supports.exp: * gcc.target/aarch64/memtag/alloca-1.c: New test. * gcc.target/aarch64/memtag/alloca-3.c: New test. * gcc.target/aarch64/memtag/arguments-1.c: New test. * gcc.target/aarch64/memtag/arguments-2.c: New test. * gcc.target/aarch64/memtag/arguments-3.c: New test. * gcc.target/aarch64/memtag/arguments-4.c: New test. * gcc.target/aarch64/memtag/arguments.c: New test. * gcc.target/aarch64/memtag/basic-1.c: New test. * gcc.target/aarch64/memtag/basic-3.c: New test. * gcc.target/aarch64/memtag/basic-struct.c: New test. * gcc.target/aarch64/memtag/large-array.c: New test. * gcc.target/aarch64/memtag/local-no-escape.c: New test. * gcc.target/aarch64/memtag/memtag.exp: New file. * gcc.target/aarch64/memtag/no-sanitize-attribute.c: New test. * gcc.target/aarch64/memtag/value-init.c: New test. * gcc.target/aarch64/memtag/vararray-gimple.c: New test. * gcc.target/aarch64/memtag/vararray.c: New test. * gcc.target/aarch64/memtag/zero-init.c: New test. * gcc.target/aarch64/memtag/texec-1.c: New test. * gcc.target/aarch64/memtag/texec-2.c: New test. * gcc.target/aarch64/memtag/vla-1.c: New test. * gcc.target/aarch64/memtag/vla-2.c: New test. * testsuite/lib/target-supports.exp (check_effective_target_aarch64_mte): New function. 
Co-authored-by: Indu Bhagat Signed-off-by: Claudiu Zissulescu --- .../gcc.target/aarch64/memtag/alloca-1.c | 14 .../gcc.target/aarch64/memtag/alloca-3.c | 27 .../gcc.target/aarch64/memtag/arguments-1.c | 3 + .../gcc.target/aarch64/memtag/arguments-2.c | 3 + .../gcc.target/aarch64/memtag/arguments-3.c | 3 + .../gcc.target/aarch64/memtag/arguments-4.c | 16 + .../gcc.target/aarch64/memtag/arguments.c | 3 + .../gcc.target/aarch64/memtag/basic-1.c | 15 + .../gcc.target/aarch64/memtag/basic-3.c | 21 ++ .../gcc.target/aarch64/memtag/basic-struct.c | 22 +++ .../aarch64/memtag/cfi-mte-memtag-frame-1.c | 11 .../gcc.target/aarch64/memtag/large-array.c | 24 +++ .../aarch64/memtag/local-no-escape.c | 20 ++ .../gcc.target/aarch64/memtag/memtag.exp | 64 +++ .../gcc.target/aarch64/memtag/mte-sig.h | 15 + .../aarch64/memtag/no-sanitize-attribute.c| 17 + .../gcc.target/aarch64/memtag/texec-1.c | 27 .../gcc.target/aarch64/memtag/texec-2.c | 22 +++ .../gcc.target/aarch64/memtag/value-init.c| 14 .../aarch64/memtag/vararray-gimple.c | 17 + .../gcc.target/aarch64/memtag/vararray.c | 14 .../gcc.target/aarch64/memtag/vla-1.c | 39 +++ .../gcc.target/aarch64/memtag/vla-2.c | 48 ++ .../gcc.target/aarch64/memtag/zero-init.c | 14 gcc/testsuite/lib/target-supports.exp | 12 25 files changed, 485 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-3.c create mode 100644 
gcc/testsuite/gcc.target/aarch64/memtag/basic-struct.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/cfi-mte-memtag-frame-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/large-array.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/local-no-escape.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/memtag.exp create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/mte-sig.h create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/no-sanitize-attribute.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/texec-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/texec-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/value-init.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray-gimple.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vla-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vla-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/zero-init.c diff --git a/gcc/testsuite/gcc.target/aa
[PATCH v3 2/9] opts: use uint64_t for sanitizer flags
From: Indu Bhagat Currently, the data type of sanitizer flags is unsigned int, with SANITIZE_SHADOW_CALL_STACK (1UL << 31) being highest individual enumerator for enum sanitize_code. Use 'uint64_t' data type to allow for more distinct instrumentation modes be added when needed. gcc/ChangeLog: * asan.h (sanitize_flags_p): Use 'uint64_t' instead of 'unsigned int'. * common.opt: Likewise. * dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise. * opts.cc (find_sanitizer_argument): Likewise. (report_conflicting_sanitizer_options): Likewise. (parse_sanitizer_options): Likewise. (parse_no_sanitize_attribute): Likewise. * opts.h (parse_sanitizer_options): Likewise. (parse_no_sanitize_attribute): Likewise. * tree-cfg.cc (print_no_sanitize_attr_value): Likewise. gcc/c-family/ChangeLog: * c-attribs.cc (add_no_sanitize_value): Likewise. (handle_no_sanitize_attribute): Likewise. (handle_no_sanitize_address_attribute): Likewise. (handle_no_sanitize_thread_attribute): Likewise. (handle_no_address_safety_analysis_attribute): Likewise. * c-common.h (add_no_sanitize_value): Likewise. gcc/c/ChangeLog: * c-parser.cc (c_parser_declaration_or_fndef): Likewise. gcc/cp/ChangeLog: * typeck.cc (get_member_function_from_ptrfunc): Likewise. gcc/d/ChangeLog: * d-attribs.cc (d_handle_no_sanitize_attribute): Likewise. Signed-off-by: Claudiu Zissulescu --- gcc/asan.h| 5 +++-- gcc/c-family/c-attribs.cc | 16 gcc/c-family/c-common.h | 2 +- gcc/c/c-parser.cc | 4 ++-- gcc/common.opt| 6 +++--- gcc/cp/typeck.cc | 2 +- gcc/d/d-attribs.cc| 8 gcc/dwarf2asm.cc | 2 +- gcc/opts.cc | 25 + gcc/opts.h| 8 gcc/tree-cfg.cc | 2 +- 11 files changed, 41 insertions(+), 39 deletions(-) diff --git a/gcc/asan.h b/gcc/asan.h index 064d4f24823..d4443de4620 100644 --- a/gcc/asan.h +++ b/gcc/asan.h @@ -242,9 +242,10 @@ asan_protect_stack_decl (tree decl) remove all flags mentioned in "no_sanitize" of DECL_ATTRIBUTES. 
*/ inline bool -sanitize_flags_p (unsigned int flag, const_tree fn = current_function_decl) +sanitize_flags_p (uint64_t flag, + const_tree fn = current_function_decl) { - unsigned int result_flags = flag_sanitize & flag; + uint64_t result_flags = flag_sanitize & flag; if (result_flags == 0) return false; diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index ea04ed7f0d4..ddb173e3ccf 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -1409,23 +1409,23 @@ handle_cold_attribute (tree *node, tree name, tree ARG_UNUSED (args), /* Add FLAGS for a function NODE to no_sanitize_flags in DECL_ATTRIBUTES. */ void -add_no_sanitize_value (tree node, unsigned int flags) +add_no_sanitize_value (tree node, uint64_t flags) { tree attr = lookup_attribute ("no_sanitize", DECL_ATTRIBUTES (node)); if (attr) { - unsigned int old_value = tree_to_uhwi (TREE_VALUE (attr)); + uint64_t old_value = tree_to_uhwi (TREE_VALUE (attr)); flags |= old_value; if (flags == old_value) return; - TREE_VALUE (attr) = build_int_cst (unsigned_type_node, flags); + TREE_VALUE (attr) = build_int_cst (uint64_type_node, flags); } else DECL_ATTRIBUTES (node) = tree_cons (get_identifier ("no_sanitize"), - build_int_cst (unsigned_type_node, flags), + build_int_cst (uint64_type_node, flags), DECL_ATTRIBUTES (node)); } @@ -1436,7 +1436,7 @@ static tree handle_no_sanitize_attribute (tree *node, tree name, tree args, int, bool *no_add_attrs) { - unsigned int flags = 0; + uint64_t flags = 0; *no_add_attrs = true; if (TREE_CODE (*node) != FUNCTION_DECL) { @@ -1473,7 +1473,7 @@ handle_no_sanitize_address_attribute (tree *node, tree name, tree, int, if (TREE_CODE (*node) != FUNCTION_DECL) warning (OPT_Wattributes, "%qE attribute ignored", name); else -add_no_sanitize_value (*node, SANITIZE_ADDRESS); +add_no_sanitize_value (*node, (uint64_t) SANITIZE_ADDRESS); return NULL_TREE; } @@ -1489,7 +1489,7 @@ handle_no_sanitize_thread_attribute (tree *node, tree name, tree, int, if 
(TREE_CODE (*node) != FUNCTION_DECL) warning (OPT_Wattributes, "%qE attribute ignored", name); else -add_no_sanitize_value (*node, SANITIZE_THREAD); +add_no_sanitize_value (*node, (uint64_t) SANITIZE_THREAD); return NULL_TREE; } @@ -1506,7 +1506,7 @@ handle_no_address_safety_analysis_attribute (tree *node, tree name, tree, int, if (TREE_CODE (*node) != FUNCTION_DECL) warning (OPT_Wattributes, "%qE attribute ignored", name); else -add_no_sanitize_value (*node, SAN
[PATCH v3 8/9] aarch64: Add support for memtag-stack sanitizer using MTE insns
From: Claudiu Zissulescu MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke the target-specific hooks to create a random tag, add tag to memory address, and finally tag and untag memory. Implement the target hooks to emit MTE instructions if MEMTAG sanitizer is in effect. Continue to use the default target hook if HWASAN is being used. The following target hooks are implemented: - TARGET_MEMTAG_INSERT_RANDOM_TAG - TARGET_MEMTAG_ADD_TAG - TARGET_MEMTAG_EXTRACT_TAG - TARGET_MEMTAG_COMPOSE_OFFSET_TAG Apart from the target-specific hooks, set the following to values defined by the Memory Tagging Extension (MTE) in aarch64: - TARGET_MEMTAG_TAG_SIZE - TARGET_MEMTAG_GRANULE_SIZE The next instructions were (re-)defined: - addg/subg (used by TARGET_MEMTAG_ADD_TAG and TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks) - stg/st2g Used to tag/untag a memory granule. - tag_memory A target-specific instruction that will emit MTE instructions to tag/untag memory of a given size. Add documentation in gcc/doc/invoke.texi. (AARCH64_MEMTAG_TAG_SIZE): Define. gcc/ * config/aarch64/aarch64.md (addg): Update pattern to use addg/subg instructions. (stg): Update pattern. (st2g): New pattern. (tag_memory): Likewise. * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE): Define. (AARCH64_MEMTAG_TAG_BITSIZE): Likewise. (AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Likewise. (aarch64_override_options_internal): Error out if MTE instructions are not available. (aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame. (aarch64_can_tag_addresses): Add MEMTAG specific handling. (aarch64_memtag_tag_bitsize): New function. (aarch64_memtag_granule_size): Likewise. (aarch64_memtag_insert_random_tag): Likewise. (aarch64_memtag_add_tag): Likewise. (aarch64_memtag_compose_offset_tag): Likewise. (aarch64_memtag_extract_tag): Likewise. (aarch64_granule16_memory_address_p): Likewise. (aarch64_emit_stxg_insn): Likewise. (aarch64_gen_tag_memory_postindex): Likewise. 
(aarch64_memtag_tag_memory_via_loop): New definition. (aarch64_expand_tag_memory): Likewise. (aarch64_check_memtag_ops): Likewise. (aarch64_gen_tag_memory_postindex): Likewise. (TARGET_MEMTAG_TAG_SIZE): Define. (TARGET_MEMTAG_GRANULE_SIZE): Likewise. (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise. (TARGET_MEMTAG_ADD_TAG): Likewise. (TARGET_MEMTAG_EXTRACT_TAG): Likewise. (TARGET_MEMTAG_COMPOSE_OFFSET_TAG): Likewise. * config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_memtag): Update set tag builtin logic. * config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer specific options to the linker. * config/aarch64/aarch64-protos.h (aarch64_granule16_memory_address_p): New prototype. (aarch64_check_memtag_ops): Likewise. (aarch64_expand_tag_memory): Likewise. * config/aarch64/constraints.md (Umg): New memory constraint. (Uag): New constraint. (Ung): Likewise. (Utg): Likewise. * config/aarch64/predicates.md (aarch64_memtag_tag_offset): Refactor it. (aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6 and refactor it. (aarch64_granule16_memory_operand): New constraint. doc/ * invoke.texi: Update documentation. gcc/testsuite: * gcc.target/aarch64/acle/memtag_1.c: Update test. 
Co-authored-by: Indu Bhagat Signed-off-by: Claudiu Zissulescu --- gcc/config/aarch64/aarch64-builtins.cc| 7 +- gcc/config/aarch64/aarch64-linux.h| 4 +- gcc/config/aarch64/aarch64-protos.h | 4 + gcc/config/aarch64/aarch64.cc | 370 +- gcc/config/aarch64/aarch64.md | 60 ++- gcc/config/aarch64/constraints.md | 26 ++ gcc/config/aarch64/predicates.md | 13 +- gcc/doc/invoke.texi | 6 +- .../gcc.target/aarch64/acle/memtag_1.c| 2 +- 9 files changed, 464 insertions(+), 28 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 93f939a9c83..b2427e73880 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -3668,8 +3668,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx target) pat = GEN_FCN (icode) (target, op0, const0_rtx); break; case AARCH64_MEMTAG_BUILTIN_SET_TAG: - pat = GEN_FCN (icode) (op0, op0, const0_rtx); - break; + { + rtx mem = gen_rtx_MEM (TImode, op0); + pat = GEN_FCN (icode) (mem, mem, op0); + break; + } default: gcc_unreachable(); } diff --git a/gcc/config/aarch64/aarch64-li
[PATCH v3 6/9] asan: add new memtag sanitizer
From: Indu Bhagat Add new command line option -fsanitize=memtag-stack with the following new params: --param memtag-instrument-alloca [0,1] (default 1) to use MTE insns for enabling dynamic checking of stack allocas. Along with the new SANITIZE_MEMTAG_STACK, define a SANITIZE_MEMTAG which will be set if any kind of memtag sanitizer is in effect (e.g., later we may add -fsanitize=memtag-globals). Add errors to convey that memtag sanitizer does not work with hwaddress and address sanitizers. Also error out if memtag ISA extension is not enabled. MEMTAG sanitizer will use the HWASAN machinery, but with a few differences: - The tags are always generated at runtime by the hardware, so -fsanitize=memtag-stack enforces a --param hwasan-random-frame-tag=1 Add documentation in gcc/doc/invoke.texi. gcc/ * builtins.def: Adjust the macro to include the new SANITIZE_MEMTAG_STACK. * flag-types.h (enum sanitize_code): Add new enumerator for SANITIZE_MEMTAG and SANITIZE_MEMTAG_STACK. * opts.cc (finish_options): memtag-stack sanitizer conflicts with hwaddress and address sanitizers. (sanitizer_opts): Add new memtag-stack sanitizer. (parse_sanitizer_options): memtag-stack sanitizer cannot recover. * params.opt: Add new params for memtag-stack sanitizer. doc/ * invoke.texi: Update documentation. Signed-off-by: Claudiu Zissulescu --- gcc/builtins.def| 1 + gcc/doc/invoke.texi | 13 - gcc/flag-types.h| 4 gcc/opts.cc | 22 +- gcc/params.opt | 4 5 files changed, 42 insertions(+), 2 deletions(-) diff --git a/gcc/builtins.def b/gcc/builtins.def index d7b2894bcfa..5f0b1107347 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -257,6 +257,7 @@ along with GCC; see the file COPYING3. 
If not see true, true, true, ATTRS, true, \ (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \ | SANITIZE_HWADDRESS \ + | SANITIZE_MEMTAG_STACK \ | SANITIZE_UNDEFINED \ | SANITIZE_UNDEFINED_NONDEFAULT) \ || flag_sanitize_coverage)) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 74f5ee26042..d8f11201361 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -17261,7 +17261,7 @@ When using stack instrumentation, decide tags for stack variables using a deterministic sequence beginning at a random tag for each frame. With this parameter unset tags are chosen using the same sequence but beginning from 1. This is enabled by default for @option{-fsanitize=hwaddress} and unavailable -for @option{-fsanitize=kernel-hwaddress}. +for @option{-fsanitize=kernel-hwaddress} and @option{-fsanitize=memtag-stack}. To disable it use @option{--param hwasan-random-frame-tag=0}. @item hwasan-instrument-allocas @@ -17294,6 +17294,11 @@ and @option{-fsanitize=kernel-hwaddress}. To disable instrumentation of builtin functions use @option{--param hwasan-instrument-mem-intrinsics=0}. +@item memtag-instrument-allocas +Enable hardware-assisted memory tagging of dynamically sized stack-allocated +variables. This kind of code generation is enabled by default when using +@option{-fsanitize=memtag-stack}. + @item use-after-scope-direct-emission-threshold If the size of a local variable in bytes is smaller or equal to this number, directly poison (or unpoison) shadow memory instead of using @@ -18225,6 +18230,12 @@ possible by specifying the command-line options @option{--param hwasan-instrument-allocas=1} respectively. Using a random frame tag is not implemented for kernel instrumentation. +@opindex fsanitize=memtag-stack +@item -fsanitize=memtag-stack +Use Memory Tagging Extension instructions instead of instrumentation to allow +the detection of memory errors. This option is available only on those AArch64 +architectures that support Memory Tagging Extensions. 
+ @opindex fsanitize=pointer-compare @item -fsanitize=pointer-compare Instrument comparison operation (<, <=, >, >=) with pointer operands. diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 9a3cc4a2e16..0c9c863a654 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -337,6 +337,10 @@ enum sanitize_code { SANITIZE_KERNEL_HWADDRESS = 1UL << 30, /* Shadow Call Stack. */ SANITIZE_SHADOW_CALL_STACK = 1UL << 31, + /* Memory Tagging for Stack. */ + SANITIZE_MEMTAG_STACK = 1ULL << 32, + /* Memory Tagging. */ + SANITIZE_MEMTAG = SANITIZE_MEMTAG_STACK, SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT, SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN diff --git a/gcc/opts.cc b/gcc/opts.cc index d00e05f6321..b4f516fdce6 100644 --- a/gcc/opts.cc +++ b/gcc/opts.cc @@ -1307,6 +1307,24 @@ finish_options (struct gcc_opt
[PATCH v3 7/9] asan: memtag-stack add support for MTE instructions
From: Claudiu Zissulescu Memory tagging is used for detecting memory safety bugs. On AArch64, the memory tagging extension (MTE) helps in reducing the overheads of memory tagging: - CPU: MTE instructions for efficiently tagging and untagging memory. - Memory: New memory type, Normal Tagged Memory, added to the Arm Architecture. The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as HWASAN. MEMTAG and HWASAN are both hardware-assisted solutions, and rely on the same sanitizer machinery in parts. So, define new constructs that allow MEMTAG and HWASAN to share the infrastructure: - hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or SANITIZE_HWASAN is true. - hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () and stack variables are to be sanitized. MEMTAG and HWASAN do have differences, however, and hence, the need to conditionalize using memtag_sanitize_p () in the relevant places. E.g., - Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag memory. A similar approach can be seen for handling handle_builtin_alloca, where instead of doing the gimple transformations, target hooks are used. - Add a new internal function HWASAN_ALLOCA_POISON to handle dynamically allocated stack when the MEMTAG sanitizer is enabled. At expansion, this allows us, in turn, to invoke target hooks to increment the tag, and use the generated tag to finally tag the dynamically allocated memory. The usual pattern: irg x0, x0, x0 subg x0, x0, #16, #0 creates a tag in x0 and so on. For alloca, we need to apply the generated tag to the new sp. In the absence of an extract-tag insn, the implementation in expand_HWASAN_ALLOCA_POISON resorts to invoking irg again. gcc/ChangeLog: * asan.cc (handle_builtin_stack_restore): Accommodate MEMTAG sanitizer. (handle_builtin_alloca): Expand differently if MEMTAG sanitizer. (get_mem_refs_of_builtin_call): Include MEMTAG along with HWASAN. 
(memtag_sanitize_stack_p): New definition. (memtag_sanitize_allocas_p): Likewise. (memtag_memintrin): Likewise. (hwassist_sanitize_p): Likewise. (hwassist_sanitize_stack_p): Likewise. (report_error_func): Include MEMTAG along with HWASAN. (build_check_stmt): Likewise. (instrument_derefs): MEMTAG too does not deal with globals yet. (instrument_builtin_call): (maybe_instrument_call): Include MEMTAG along with HWASAN. (asan_expand_mark_ifn): Likewise. (asan_expand_check_ifn): Likewise. (asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer. (asan_instrument): (hwasan_frame_base): (hwasan_record_stack_var): (hwasan_emit_prologue): Expand differently if MEMTAG sanitizer. (hwasan_emit_untag_frame): Likewise. * asan.h (hwasan_record_stack_var): (memtag_sanitize_stack_p): New declaration. (memtag_sanitize_allocas_p): Likewise. (hwassist_sanitize_p): Likewise. (hwassist_sanitize_stack_p): Likewise. (asan_sanitize_use_after_scope): Include MEMTAG along with HWASAN. * cfgexpand.cc (align_local_variable): Likewise. (expand_one_stack_var_at): Likewise. (expand_stack_vars): Likewise. (expand_one_stack_var_1): Likewise. (init_vars_expansion): Likewise. (expand_used_vars): Likewise. (pass_expand::execute): Likewise. * gimplify.cc (asan_poison_variable): Likewise. * internal-fn.cc (expand_HWASAN_ALLOCA_POISON): New definition. (expand_HWASAN_ALLOCA_UNPOISON): Expand differently if MEMTAG sanitizer. (expand_HWASAN_MARK): Likewise. * internal-fn.def (HWASAN_ALLOCA_POISON): Define new. * params.opt: Document new param. FIXME. * sanopt.cc (pass_sanopt::execute): Include MEMTAG along with HWASAN. * gcc.c (sanitize_spec_function): Add check for memtag-stack. 
Co-authored-by: Indu Bhagat Signed-off-by: Claudiu Zissulescu --- gcc/asan.cc | 214 +--- gcc/asan.h | 10 ++- gcc/cfgexpand.cc| 29 +++--- gcc/gcc.cc | 2 + gcc/gimplify.cc | 5 +- gcc/internal-fn.cc | 68 -- gcc/internal-fn.def | 1 + gcc/params.opt | 4 + gcc/sanopt.cc | 2 +- 9 files changed, 258 insertions(+), 77 deletions(-) diff --git a/gcc/asan.cc b/gcc/asan.cc index 748b289d6f9..711e6a71eee 100644 --- a/gcc/asan.cc +++ b/gcc/asan.cc @@ -762,14 +762,15 @@ static void handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter) { if (!iter - || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ())) + || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p () + || memtag_sanitize_alloc
[PATCH v3 1/9] targhooks: i386: rename TAG_SIZE to TAG_BITSIZE
From: Indu Bhagat gcc/ChangeLog: * asan.h (HWASAN_TAG_SIZE): Use targetm.memtag.tag_bitsize. * config/i386/i386.cc (ix86_memtag_tag_size): Rename to ix86_memtag_tag_bitsize. (TARGET_MEMTAG_TAG_SIZE): Renamed to TARGET_MEMTAG_TAG_BITSIZE. * doc/tm.texi (TARGET_MEMTAG_TAG_SIZE): Likewise. * doc/tm.texi.in (TARGET_MEMTAG_TAG_SIZE): Likewise. * target.def (tag_size): Rename to tag_bitsize. * targhooks.cc (default_memtag_tag_size): Rename to default_memtag_tag_bitsize. * targhooks.h (default_memtag_tag_size): Likewise. Signed-off-by: Claudiu Zissulescu --- gcc/asan.h | 2 +- gcc/config/i386/i386.cc | 8 gcc/doc/tm.texi | 2 +- gcc/doc/tm.texi.in | 2 +- gcc/target.def | 4 ++-- gcc/targhooks.cc| 2 +- gcc/targhooks.h | 2 +- 7 files changed, 11 insertions(+), 11 deletions(-) diff --git a/gcc/asan.h b/gcc/asan.h index 273d6745c58..064d4f24823 100644 --- a/gcc/asan.h +++ b/gcc/asan.h @@ -103,7 +103,7 @@ extern hash_set *asan_used_labels; independently here. */ /* How many bits are used to store a tag in a pointer. The default version uses the entire top byte of a pointer (i.e. 8 bits). */ -#define HWASAN_TAG_SIZE targetm.memtag.tag_size () +#define HWASAN_TAG_SIZE targetm.memtag.tag_bitsize () /* Tag Granule of HWASAN shadow stack. This is the size in real memory that each byte in the shadow memory refers to. I.e. if a variable is X bytes long in memory then its tag in shadow diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index b64175d6c93..17faf7ebd24 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -27095,9 +27095,9 @@ ix86_memtag_can_tag_addresses () return ix86_lam_type != lam_none && TARGET_LP64; } -/* Implement TARGET_MEMTAG_TAG_SIZE. */ +/* Implement TARGET_MEMTAG_TAG_BITSIZE. 
*/ unsigned char -ix86_memtag_tag_size () +ix86_memtag_tag_bitsize () { return IX86_HWASAN_TAG_SIZE; } @@ -28071,8 +28071,8 @@ ix86_libgcc_floating_mode_supported_p #undef TARGET_MEMTAG_UNTAGGED_POINTER #define TARGET_MEMTAG_UNTAGGED_POINTER ix86_memtag_untagged_pointer -#undef TARGET_MEMTAG_TAG_SIZE -#define TARGET_MEMTAG_TAG_SIZE ix86_memtag_tag_size +#undef TARGET_MEMTAG_TAG_BITSIZE +#define TARGET_MEMTAG_TAG_BITSIZE ix86_memtag_tag_bitsize #undef TARGET_GEN_CCMP_FIRST #define TARGET_GEN_CCMP_FIRST ix86_gen_ccmp_first diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 5e305643b3a..3f87abf97b2 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -12860,7 +12860,7 @@ At preset, this feature does not support address spaces. It also requires @code{Pmode} to be the same as @code{ptr_mode}. @end deftypefn -@deftypefn {Target Hook} uint8_t TARGET_MEMTAG_TAG_SIZE () +@deftypefn {Target Hook} uint8_t TARGET_MEMTAG_TAG_BITSIZE () Return the size of a tag (in bits) for this platform. The default returns 8. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index eccc4d88493..040d26c40f1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8124,7 +8124,7 @@ maintainer is familiar with. @hook TARGET_MEMTAG_CAN_TAG_ADDRESSES -@hook TARGET_MEMTAG_TAG_SIZE +@hook TARGET_MEMTAG_TAG_BITSIZE @hook TARGET_MEMTAG_GRANULE_SIZE diff --git a/gcc/target.def b/gcc/target.def index 38903eb567a..db48df9498d 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -7457,11 +7457,11 @@ At preset, this feature does not support address spaces. 
It also requires\n\ bool, (), default_memtag_can_tag_addresses) DEFHOOK -(tag_size, +(tag_bitsize, "Return the size of a tag (in bits) for this platform.\n\ \n\ The default returns 8.", - uint8_t, (), default_memtag_tag_size) + uint8_t, (), default_memtag_tag_bitsize) DEFHOOK (granule_size, diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc index c79458e374e..0696f95adeb 100644 --- a/gcc/targhooks.cc +++ b/gcc/targhooks.cc @@ -2806,7 +2806,7 @@ default_memtag_can_tag_addresses () } uint8_t -default_memtag_tag_size () +default_memtag_tag_bitsize () { return 8; } diff --git a/gcc/targhooks.h b/gcc/targhooks.h index f16b58798c2..c9e57e475dc 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -310,7 +310,7 @@ extern bool speculation_safe_value_not_needed (bool); extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx); extern bool default_memtag_can_tag_addresses (); -extern uint8_t default_memtag_tag_size (); +extern uint8_t default_memtag_tag_bitsize (); extern uint8_t default_memtag_granule_size (); extern rtx default_memtag_insert_random_tag (rtx, rtx); extern rtx default_memtag_add_tag (rtx, poly_int64, uint8_t); -- 2.50.0
[PATCH v3 0/9] Add memtag-stack sanitizer using MTE instructions.
From: Claudiu Zissulescu Hi, Please find a new series of patches that implements a stack sanitizer using AArch64 MTE instructions. This new series is based on Indu's previous patch series. What is new: - Introduces a new target instruction tag_memory. - Introduces a new target hook to deal with tag computation (TARGET_MEMTAG_COMPOSE_OFFSET_TAG). - Simplify the stg/st2g instruction patterns to accept POST/PRE modify type of addresses. - Minimize asan.cc modification. - Add execution tests. - Improve and fix emitting stg/st2g instructions. - Various text improvements. Thank you, Claudiu == MTE on AArch64 and Memory Tagging Memory Tagging Extension (MTE) is an AArch64 extension. This extension allows coloring of 16-byte memory granules with 4-bit tag values. The extension provides additional instructions in the ISA and a new memory type, Normal Tagged Memory, added to the Arm Architecture. This hardware-assisted mechanism can be used to detect memory bugs like buffer overrun or use-after-free. The detection is probabilistic. Under the hood, the MTE extension introduces two types of tags: - Address Tags, and, - Allocation Tags (a.k.a., Memory Tags) Address Tag: which acts as the key. This adds four bits to the top of a virtual address. It is built on the AArch64 'top-byte-ignore' (TBI) feature. Allocation Tag: which acts as the lock. Allocation tags also consist of four bits, linked with every aligned 16-byte region in the physical memory space. Arm refers to these 16-byte regions as tag granules. The way Allocation tags are stored is a hardware implementation detail. A subset of the MTE instructions which are relevant in the current context are: [Xn, Xd are registers containing addresses]. - irg Xd, Xn Copy Xn into Xd, insert a random 4-bit Address Tag into Xd. - addg Xd, Xn, #immA, #immB Xd = Xn + immA, with Address Tag modified by #immB. Similarly, there exists a subg. - stg Xd, [Xn] (Store Allocation Tag) updates Allocation Tag for [Xn, Xn + 16) to the Address Tag of Xd. 
Additionally, note that load and store instructions with the SP base register do not check tags. MEMTAG sanitizer for stack Use MTE instructions to instrument stack accesses to detect memory safety issues. Detecting stack-related memory bugs requires the compiler to: - ensure that each object on the stack is allocated in its own 16-byte granule. - Tag/Color: put tags into each stack variable pointer. - Untag: the function epilogue will untag the (stack) memory. The above should work with dynamic stack allocation as well. GCC has HWASAN machinery for coloring stack variables. Extend the machinery to emit MTE instructions when the MEMTAG sanitizer is in effect. Deploying and running user space programs built with -fsanitize=memtag-stack will need the following additional pieces in place. If there is any existing work / ideas on any of the following, please send comments to help define the work. Additional necessary pieces * MTE-aware exception handling and unwinding routines The additional stack coloring must work with C++ exceptions and C setjmp/longjmp. * When unwinding the stack for handling C++ exceptions, the unwinder additionally needs to untag the stack frame. As per the AADWARF64 document: "The character 'G' indicates that associated frames may modify MTE tags on the stack space they use." * When restoring the context in longjmp, we need to additionally untag the stack. Claudiu Zissulescu (4): target-insns.def: (tag_memory) New pattern. 
targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG asan: memtag-stack add support for MTE instructions aarch64: Add support for memtag-stack sanitizer using MTE insns Indu Bhagat (5): targhooks: i386: rename TAG_SIZE to TAG_BITSIZE opts: use uint64_t for sanitizer flags aarch64: add new constants for MTE insns asan: add new memtag sanitizer aarch64: Add memtag-stack tests gcc/asan.cc | 214 +++--- gcc/asan.h| 17 +- gcc/builtins.def | 1 + gcc/c-family/c-attribs.cc | 16 +- gcc/c-family/c-common.h | 2 +- gcc/c/c-parser.cc | 4 +- gcc/cfgexpand.cc | 29 +- gcc/common.opt| 6 +- gcc/config/aarch64/aarch64-builtins.cc| 7 +- gcc/config/aarch64/aarch64-linux.h| 4 +- gcc/config/aarch64/aarch64-protos.h | 4 + gcc/config/aarch64/aarch64.cc | 370 +- gcc/config/aarch64/aarch64.md | 78 ++-- gcc/config/aarch64/constraints.md | 26 ++ gcc/config/aarch64/predicates.md | 13 +- gcc/config/i386/i386.cc | 8 +- gcc/cp/typeck.cc | 2 +- gcc/d/d-attribs.cc| 8 +- gcc/doc/invoke.texi
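As a mental model of the instruction semantics sketched in the cover letter (4-bit address tags in bits 59:56 of the pointer, 16-byte tag granules), here is an illustrative simulation in Python. This is not GCC code and the fixed tag value stands in for the hardware's random tag generation; it merely makes the irg/addg/stg description executable:

```python
TAG_SHIFT = 56           # the 4-bit Address Tag lives in bits 59:56
ADDR_MASK = 0x00FFFFFFFFFFFFFF  # low 56 address bits

def address_tag(ptr):
    return (ptr >> TAG_SHIFT) & 0xF

def set_address_tag(ptr, tag):
    return (ptr & ~(0xF << TAG_SHIFT)) | ((tag & 0xF) << TAG_SHIFT)

def irg(ptr, tag=0x7):
    # "irg Xd, Xn": copy Xn, insert a (normally random, here fixed) tag.
    return set_address_tag(ptr, tag)

def addg(ptr, imm_addr, imm_tag):
    # "addg Xd, Xn, #immA, #immB": add immA to the address part and
    # immB to the tag; tag arithmetic wraps modulo 16.
    new_tag = (address_tag(ptr) + imm_tag) & 0xF
    return set_address_tag(ptr + imm_addr, new_tag)

# Allocation Tags ("locks") per 16-byte granule, modelled with a dict.
allocation_tags = {}

def stg(ptr):
    # "stg Xd, [Xn]": record the Address Tag of the pointer as the
    # Allocation Tag of the granule it addresses.
    allocation_tags[(ptr & ADDR_MASK) // 16] = address_tag(ptr)

def check_access(ptr):
    """A checked load/store faults when lock and key disagree."""
    granule = (ptr & ADDR_MASK) // 16
    return allocation_tags.get(granule) == address_tag(ptr)

p = irg(0x1000)
stg(p)
assert check_access(p)                             # matching key: OK
assert not check_access(set_address_tag(p, 0x3))   # wrong key: would fault
```

The last two assertions show the probabilistic detection described above: an access only faults when the 4-bit Address Tag in the pointer differs from the granule's Allocation Tag.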
[PATCH v3 3/9] target-insns.def: (tag_memory) New pattern.
From: Claudiu Zissulescu Add a new target instruction. Hardware-assisted sanitizers on architectures providing instructions to tag/untag memory can then make use of this new instruction pattern. For example, the memtag-stack sanitizer uses these instructions to tag and untag a memory granule. gcc/doc/ * md.texi (tag_memory): Add documentation. gcc/ * target-insns.def (tag_memory): New target instruction. Signed-off-by: Claudiu Zissulescu --- gcc/doc/md.texi | 5 + gcc/target-insns.def | 1 + 2 files changed, 6 insertions(+) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 28159b2e820..e4c9a472e3f 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8578,6 +8578,11 @@ the values were equal. If this pattern is not defined, then a plain compare pattern and conditional branch pattern is used. +@cindex @code{tag_memory} instruction pattern +This pattern tags an object that begins at the address specified by +operand 0, has the size indicated by operand 2, and uses the tag +from operand 1. + @cindex @code{clear_cache} instruction pattern @item @samp{clear_cache} This pattern, if defined, flushes the instruction cache for a region of diff --git a/gcc/target-insns.def b/gcc/target-insns.def index 59025a20bf7..16e1d8cf565 100644 --- a/gcc/target-insns.def +++ b/gcc/target-insns.def @@ -102,6 +102,7 @@ DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1)) +DEF_TARGET_INSN (tag_memory, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (trap, (void)) DEF_TARGET_INSN (unique, (void)) DEF_TARGET_INSN (untyped_call, (rtx x0, rtx x1, rtx x2)) -- 2.50.0
[PATCH v3 4/9] aarch64: add new constants for MTE insns
From: Indu Bhagat Define new constants to be used by the MTE pattern definitions. gcc/ * config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define constant. (MEMTAG_ADDR_MASK): Likewise. (irg, subp, ldg): Use new constants. Signed-off-by: Claudiu Zissulescu --- gcc/config/aarch64/aarch64.md | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 27efc9155dc..bade8af7997 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -440,6 +440,16 @@ (define_constants ; must not operate on inactive inputs if doing so could induce a fault. (SVE_STRICT_GP 1)]) +;; These constants are used as a const_int in MTE instructions +(define_constants + [; 0xf0ff... + ; Tag mask for the 4-bit tag stored in the top 8 bits of a pointer. + (MEMTAG_TAG_MASK -1080863910568919041) + + ; 0x00ff... + ; Tag mask 56-bit address used by subp instruction. + (MEMTAG_ADDR_MASK 72057594037927935)]) + (include "constraints.md") (include "predicates.md") (include "iterators.md") @@ -8556,7 +8566,7 @@ (define_insn "irg" [(set (match_operand:DI 0 "register_operand" "=rk") (ior:DI (and:DI (match_operand:DI 1 "register_operand" "rk") -(const_int -1080863910568919041)) ;; 0xf0ff... +(const_int MEMTAG_TAG_MASK)) ;; 0xf0ff... (ashift:DI (unspec:QI [(match_operand:DI 2 "register_operand" "r")] UNSPEC_GEN_TAG_RND) (const_int 56] @@ -8599,9 +8609,9 @@ (define_insn "subp" [(set (match_operand:DI 0 "register_operand" "=r") (minus:DI (and:DI (match_operand:DI 1 "register_operand" "rk") - (const_int 72057594037927935)) ;; 0x00ff... + (const_int MEMTAG_ADDR_MASK)) ;; 0x00ff... (and:DI (match_operand:DI 2 "register_operand" "rk") - (const_int 72057594037927935] ;; 0x00ff... + (const_int MEMTAG_ADDR_MASK] ;; 0x00ff... 
"TARGET_MEMTAG" "subp\\t%0, %1, %2" [(set_attr "type" "memtag")] @@ -8611,7 +8621,7 @@ (define_insn "subp" (define_insn "ldg" [(set (match_operand:DI 0 "register_operand" "+r") (ior:DI -(and:DI (match_dup 0) (const_int -1080863910568919041)) ;; 0xf0ff... +(and:DI (match_dup 0) (const_int MEMTAG_TAG_MASK)) ;; 0xf0ff... (ashift:DI (mem:QI (unspec:DI [(and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk") -- 2.50.0
[PATCH v3 5/9] targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG
From: Claudiu Zissulescu Add a new target hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG to perform addition between two tags. The default of this hook is to byte add the inputs. Hardware-assisted sanitizers on architectures that provide instructions to compose (add) two tags, as is the case on AArch64, can override it. gcc/ * doc/tm.texi: Re-generate. * doc/tm.texi.in: Add documentation for new target hooks. * target.def: Add new hook. * targhooks.cc (default_memtag_compose_offset_tag): New hook. * targhooks.h (default_memtag_compose_offset_tag): Likewise. Signed-off-by: Claudiu Zissulescu --- gcc/doc/tm.texi| 6 ++ gcc/doc/tm.texi.in | 2 ++ gcc/target.def | 7 +++ gcc/targhooks.cc | 7 +++ gcc/targhooks.h| 2 +- 5 files changed, 23 insertions(+), 1 deletion(-) diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 3f87abf97b2..a4fba6d21b3 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -12917,6 +12917,12 @@ Store the result in @var{target} if convenient. The default clears the top byte of the original pointer. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_MEMTAG_COMPOSE_OFFSET_TAG (rtx @var{base_tag}, uint8_t @var{tag_offset}) +Return an RTX that represents the result of composing @var{tag_offset} with +the base tag @var{base_tag}. +The default of this hook is to byte add @var{tag_offset} to @var{base_tag}. +@end deftypefn + @deftypevr {Target Hook} bool TARGET_HAVE_SHADOW_CALL_STACK This value is true if the target platform supports @option{-fsanitize=shadow-call-stack}. The default value is false. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 040d26c40f1..ff381b486e1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8138,6 +8138,8 @@ maintainer is familiar with. 
@hook TARGET_MEMTAG_UNTAGGED_POINTER +@hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG + @hook TARGET_HAVE_SHADOW_CALL_STACK @hook TARGET_HAVE_LIBATOMIC diff --git a/gcc/target.def b/gcc/target.def index db48df9498d..89f96ca73c5 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -7521,6 +7521,13 @@ Store the result in @var{target} if convenient.\n\ The default clears the top byte of the original pointer.", rtx, (rtx tagged_pointer, rtx target), default_memtag_untagged_pointer) +DEFHOOK +(compose_offset_tag, + "Return an RTX that represents the result of composing @var{tag_offset} with\n\ +the base tag @var{base_tag}.\n\ +The default of this hook is to byte add @var{tag_offset} to @var{base_tag}.", + rtx, (rtx base_tag, uint8_t tag_offset), default_memtag_compose_offset_tag) + HOOK_VECTOR_END (memtag) #undef HOOK_PREFIX #define HOOK_PREFIX "TARGET_" diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc index 0696f95adeb..cfea4a70403 100644 --- a/gcc/targhooks.cc +++ b/gcc/targhooks.cc @@ -2904,4 +2904,11 @@ default_memtag_untagged_pointer (rtx tagged_pointer, rtx target) return untagged_base; } +/* The default implementation of TARGET_MEMTAG_COMPOSE_OFFSET_TAG. */ +rtx +default_memtag_compose_offset_tag (rtx base_tag, uint8_t tag_offset) +{ + return plus_constant (QImode, base_tag, tag_offset); +} + #include "gt-targhooks.h" diff --git a/gcc/targhooks.h b/gcc/targhooks.h index c9e57e475dc..76afce71baa 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -317,5 +317,5 @@ extern rtx default_memtag_add_tag (rtx, poly_int64, uint8_t); extern rtx default_memtag_set_tag (rtx, rtx, rtx); extern rtx default_memtag_extract_tag (rtx, rtx); extern rtx default_memtag_untagged_pointer (rtx, rtx); - +extern rtx default_memtag_compose_offset_tag (rtx, uint8_t); #endif /* GCC_TARGHOOKS_H */ -- 2.50.0
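The default hook does a plain QImode (byte-wide) addition via plus_constant; a target such as AArch64 would instead keep the result within the 4-bit tag space, matching the modulo-16 tag arithmetic of addg/subg. An illustrative comparison of the two behaviors (assumptions: 8-bit default tags, 4-bit MTE tags; not GCC code):

```python
def default_compose_offset_tag(base_tag, tag_offset):
    # Mirrors the default hook: byte-wide addition, wraps at 256.
    return (base_tag + tag_offset) & 0xFF

def mte_compose_offset_tag(base_tag, tag_offset):
    # A hypothetical AArch64-style override: wraps within the 4-bit
    # tag space, as addg/subg tag offsets do.
    return (base_tag + tag_offset) & 0xF

assert default_compose_offset_tag(0xFE, 3) == 0x01  # wraps at 8 bits
assert mte_compose_offset_tag(0xE, 3) == 0x1        # wraps at 4 bits
# For small tags and offsets the two agree, which is why the byte-add
# default is a reasonable fallback.
assert default_compose_offset_tag(5, 2) == mte_compose_offset_tag(5, 2) == 7
```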
Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags
> I see it now from Richard B.. Also I noticed you missed Richard S.'s > suggestion of using a typedef which will definitely help in the future > where we could even replace this with an enum class and overload the > bitwise operators to do the right thing. > Indeed, I've missed that message. Do you think hwint.h is a good place to add this type, and what name shall I use for it? Thank you, Claudiu
Re: [PATCH] Avoid depending on destructor order
Hi Thomas, This change breaks compilation of the ARC backend: > + gcc_assert (in_shutdown || ob); in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined, while the gcc_assert is outside of any ifdef. Could you please revisit this line and change it accordingly? Thanks, Claudiu
Re: [PATCH] Avoid depending on destructor order
Thanks, I haven't observed it. Waiting for it, Claudiu On Mon, Sep 26, 2022 at 2:49 PM Thomas Neumann wrote: > > Hi Claudiu, > > > This change prohibits compiling of ARC backend: > > > >> + gcc_assert (in_shutdown || ob); > > > > in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined, > > while gcc_assert is outside of any ifdef. Please can you revisit this > > line and change it accordingly. > > I have a patch ready, I am waiting for someone to approve my patch: > > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602130.html > > Best > > Thomas
Re: [committed] arc: Fail conditional move expand patterns
Hi Robin, I don't know how I missed your arc related patch, I'll bootstrap and test your patch asap. Thanks, Claudiu On Fri, Feb 25, 2022 at 3:29 PM Robin Dapp wrote: > > If the movcc comparison is not valid it triggers an assert in the > > current implementation. This behavior is not needed as we can FAIL > > the movcc expand pattern. > > In case of a MODE_CC comparison you can also just return it as described > here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154 > > or here: > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590639.html > > If there already is a "CC comparison" the backend does not need to > create one and ifcvt can make use of this, creating better sequences. > > Regards > Robin >
Re: [PATCH] arc: Fix for new ifcvt behavior [PR104154]
Hi Robin, The patch looks good. Please go ahead and merge it, please let me know if you cannot. Thank you, Claudiu On Mon, Feb 21, 2022 at 9:57 AM Robin Dapp via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > Hi, > > I figured I'd just go ahead and post this patch as well since it seems > to have fixed the arc build problems. > > It would be nice if someone could bootstrap/regtest if Jeff hasn't > already done so. I was able to verify that the two testcases attached > to the PR build cleanly but not much more. Thank you. > > Regards > Robin > > -- > > PR104154 > > gcc/ChangeLog: > > * config/arc/arc.cc (gen_compare_reg): Return the CC-mode > comparison ifcvt passed us. > > --- > > From fa98a40abd55e3a10653f6a8c5b2414a2025103b Mon Sep 17 00:00:00 2001 > From: Robin Dapp > Date: Mon, 7 Feb 2022 08:39:41 +0100 > Subject: [PATCH] arc: Fix for new ifcvt behavior [PR104154] > > ifcvt now passes a CC-mode "comparison" to backends. This patch > simply returns from gen_compare_reg () in that case since nothing > needs to be prepared anymore. > > PR104154 > > gcc/ChangeLog: > > * config/arc/arc.cc (gen_compare_reg): Return the CC-mode > comparison ifcvt passed us. > --- > gcc/config/arc/arc.cc | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc > index 8cc173519ab..5e40ec2c04d 100644 > --- a/gcc/config/arc/arc.cc > +++ b/gcc/config/arc/arc.cc > @@ -2254,6 +2254,12 @@ gen_compare_reg (rtx comparison, machine_mode omode) > > >cmode = GET_MODE (x); > + > + /* If ifcvt passed us a MODE_CC comparison we can > + just return it. It should be in the proper form already. */ > + if (GET_MODE_CLASS (cmode) == MODE_CC) > +return comparison; > + >if (cmode == VOIDmode) > cmode = GET_MODE (y); >gcc_assert (cmode == SImode || cmode == SFmode || cmode == DFmode); > -- > 2.31.1 > >
Re: [PATCH 1/2] ARC: Use intrinsics for __builtin_add_overflow*()
Ok. Thank you for your contribution, Claudiu On Wed, Sep 6, 2023 at 3:50 PM Shahab Vahedi wrote: > > This patch covers signed and unsigned additions. The generated code > would be something along these lines: > > signed: > add.f r0, r1, r2 > b.v @label > > unsigned: > add.f r0, r1, r2 > b.c @label > > gcc/ChangeLog: > > * config/arc/arc-modes.def: Add CC_V mode. > * config/arc/predicates.md (proper_comparison_operator): Handle > E_CC_Vmode. > (equality_comparison_operator): Exclude CC_Vmode from eq/ne. > (cc_set_register): Handle CC_Vmode. > (cc_use_register): Likewise. > * config/arc/arc.md (addsi3_v): New insn. > (addvsi4): New expand. > (addsi3_c): New insn. > (uaddvsi4): New expand. > * config/arc/arc-protos.h (arc_gen_unlikely_cbranch): New. > * config/arc/arc.cc (arc_gen_unlikely_cbranch): New. > (get_arc_condition_code): Handle E_CC_Vmode. > (arc_init_reg_tables): Handle CC_Vmode. > > gcc/testsuite/ChangeLog: > > * gcc.target/arc/overflow-1.c: New. > > Signed-off-by: Shahab Vahedi > --- > gcc/config/arc/arc-modes.def | 1 + > gcc/config/arc/arc-protos.h | 1 + > gcc/config/arc/arc.cc | 26 +- > gcc/config/arc/arc.md | 49 +++ > gcc/config/arc/predicates.md | 14 ++- > gcc/testsuite/gcc.target/arc/overflow-1.c | 100 ++ > 6 files changed, 187 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/arc/overflow-1.c > > diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def > index 763e880317d..69eeec5935a 100644 > --- a/gcc/config/arc/arc-modes.def > +++ b/gcc/config/arc/arc-modes.def > @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. 
If not see > > CC_MODE (CC_ZN); > CC_MODE (CC_Z); > +CC_MODE (CC_V); > CC_MODE (CC_C); > CC_MODE (CC_FP_GT); > CC_MODE (CC_FP_GE); > diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h > index 4f2db7ffb59..bc78fb0b370 100644 > --- a/gcc/config/arc/arc-protos.h > +++ b/gcc/config/arc/arc-protos.h > @@ -50,6 +50,7 @@ extern bool arc_check_mov_const (HOST_WIDE_INT ); > extern bool arc_split_mov_const (rtx *); > extern bool arc_can_use_return_insn (void); > extern bool arc_split_move_p (rtx *); > +extern void arc_gen_unlikely_cbranch (enum rtx_code, machine_mode, rtx); > #endif /* RTX_CODE */ > > extern bool arc_ccfsm_branch_deleted_p (void); > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc > index f8c9bf17e2c..ec93d40aeb9 100644 > --- a/gcc/config/arc/arc.cc > +++ b/gcc/config/arc/arc.cc > @@ -1538,6 +1538,13 @@ get_arc_condition_code (rtx comparison) > case GEU : return ARC_CC_NC; > default : gcc_unreachable (); > } > +case E_CC_Vmode: > + switch (GET_CODE (comparison)) > + { > + case EQ : return ARC_CC_NV; > + case NE : return ARC_CC_V; > + default : gcc_unreachable (); > + } > case E_CC_FP_GTmode: >if (TARGET_ARGONAUT_SET && TARGET_SPFP) > switch (GET_CODE (comparison)) > @@ -1868,7 +1875,7 @@ arc_init_reg_tables (void) > /* mode_class hasn't been initialized yet for EXTRA_CC_MODES, so > we must explicitly check for them here. */ > if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode > - || i == (int) CC_Cmode > + || i == (int) CC_Cmode || i == (int) CC_Vmode > || i == CC_FP_GTmode || i == CC_FP_GEmode || i == CC_FP_ORDmode > || i == CC_FPUmode || i == CC_FPUEmode || i == CC_FPU_UNEQmode) > arc_mode_class[i] = 1 << (int) C_MODE; > @@ -11852,6 +11859,23 @@ arc_libm_function_max_error (unsigned cfn, > machine_mode mode, >return default_libm_function_max_error (cfn, mode, boundary_p); > } > > +/* Generate RTL for conditional branch with rtx comparison CODE in mode > + CC_MODE. 
*/ > + > +void > +arc_gen_unlikely_cbranch (enum rtx_code cmp, machine_mode cc_mode, rtx label) > +{ > + rtx cc_reg, x; > + > + cc_reg = gen_rtx_REG (cc_mode, CC_REG); > + label = gen_rtx_LABEL_REF (VOIDmode, label); > + > + x = gen_rtx_fmt_ee (cmp, VOIDmode, cc_reg, const0_rtx); > + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, label, pc_rtx); > + > + emit_unlikely_jump (gen_rtx_SET (pc_rtx, x)); > +} > + > #undef TARGET_USE_ANCHORS_FOR_SYMBOL_P > #define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p > > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md > index d37ecbf4292..9d011f6b4a9 100644 > --- a/gcc/config/arc/arc.md > +++ b/gcc/config/arc/arc.md > @@ -2725,6 +2725,55 @@ archs4x, archs4xd" > } >") > > +(define_insn "addsi3_v" > + [(set (match_operand:SI 0 "register_operand" "=r,r,r, r") > + (plus:SI (match_operand:SI 1 "register_operand" "r,r,0, r") > + (match_operand:SI 2 "nonm
Re: [PATCH 2/2] ARC: Use intrinsics for __builtin_sub_overflow*()
OK, Thank you for your contribution,
Claudiu

On Wed, Sep 6, 2023 at 3:50 PM Shahab Vahedi wrote:
>
> This patch covers signed and unsigned subtractions. The generated code
> would be something along these lines:
>
> signed:
>   sub.f  r0, r1, r2
>   b.v    @label
>
> unsigned:
>   sub.f  r0, r1, r2
>   b.c    @label
>
> gcc/ChangeLog:
>
>         * config/arc/arc.md (subsi3_v): New insn.
>         (subvsi4): New expand.
>         (subsi3_c): New insn.
>         (usubvsi4): New expand.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/arc/overflow-2.c: New.
>
> Signed-off-by: Shahab Vahedi
> ---
>  gcc/config/arc/arc.md                     | 48 +++
>  gcc/testsuite/gcc.target/arc/overflow-2.c | 97 +++
>  2 files changed, 145 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arc/overflow-2.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 9d011f6b4a9..34e9e1a7f1d 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -2973,6 +2973,54 @@ archs4x, archs4xd"
>     (set_attr "cpu_facility" "*,cd,*,*,*,*,*,*,*,*")
>    ])
>
> +(define_insn "subsi3_v"
> +  [(set (match_operand:SI 0 "register_operand" "=r,r,r,  r")
> +       (minus:SI (match_operand:SI 1 "register_operand"  "r,r,0,  r")
> +                 (match_operand:SI 2 "nonmemory_operand" "r,L,I,C32")))
> +   (set (reg:CC_V CC_REG)
> +       (compare:CC_V (sign_extend:DI (minus:SI (match_dup 1)
> +                                               (match_dup 2)))
> +                     (minus:DI (sign_extend:DI (match_dup 1))
> +                               (sign_extend:DI (match_dup 2]
> +  ""
> +  "sub.f\\t%0,%1,%2"
> +  [(set_attr "cond" "set")
> +   (set_attr "type" "compare")
> +   (set_attr "length" "4,4,4,8")])
> +
> +(define_expand "subvsi4"
> +  [(match_operand:SI 0 "register_operand")
> +   (match_operand:SI 1 "register_operand")
> +   (match_operand:SI 2 "nonmemory_operand")
> +   (label_ref (match_operand 3 "" ""))]
> +  ""
> +  "emit_insn (gen_subsi3_v (operands[0], operands[1], operands[2]));
> +   arc_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
> +   DONE;")
> +
> +(define_insn "subsi3_c"
> +  [(set (match_operand:SI 0 "register_operand" "=r,r,r,  r")
> +       (minus:SI (match_operand:SI 1 "register_operand"  "r,r,0,  r")
> +                 (match_operand:SI 2 "nonmemory_operand" "r,L,I,C32")))
> +   (set (reg:CC_C CC_REG)
> +       (compare:CC_C (match_dup 1)
> +                     (match_dup 2)))]
> +  ""
> +  "sub.f\\t%0,%1,%2"
> +  [(set_attr "cond" "set")
> +   (set_attr "type" "compare")
> +   (set_attr "length" "4,4,4,8")])
> +
> +(define_expand "usubvsi4"
> +  [(match_operand:SI 0 "register_operand")
> +   (match_operand:SI 1 "register_operand")
> +   (match_operand:SI 2 "nonmemory_operand")
> +   (label_ref (match_operand 3 "" ""))]
> +  ""
> +  "emit_insn (gen_subsi3_c (operands[0], operands[1], operands[2]));
> +   arc_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
> +   DONE;")
> +
>  (define_expand "subdi3"
>    [(set (match_operand:DI 0 "register_operand" "")
>         (minus:DI (match_operand:DI 1 "register_operand" "")
> diff --git a/gcc/testsuite/gcc.target/arc/overflow-2.c b/gcc/testsuite/gcc.target/arc/overflow-2.c
> new file mode 100644
> index 000..b4de8c03b22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/overflow-2.c
> @@ -0,0 +1,97 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +#include <stdbool.h>
> +#include <stdint.h>
> +
> +/*
> + * sub.f  r0,r0,r1
> + * st_s   r0,[r2]
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool sub_overflow (int32_t a, int32_t b, int32_t *res)
> +{
> +  return __builtin_sub_overflow (a, b, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,-1234
> + * st_s   r0,[r1]
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool subi_overflow (int32_t a, int32_t *res)
> +{
> +  return __builtin_sub_overflow (a, -1234, res);
> +}
> +
> +/*
> + * sub.f  r3,r0,r1
> + * st_s   r3,[r2]
> + * j_s.d  [blink]
> + * setlo  r0,r0,r1
> + */
> +bool usub_overflow (uint32_t a, uint32_t b, uint32_t *res)
> +{
> +  return __builtin_sub_overflow (a, b, res);
> +}
> +
> +/*
> + * sub.f  r2,r0,4321
> + * seths  r0,4320,r0
> + * j_s.d  [blink]
> + * st_s   r2,[r1]
> + */
> +bool usubi_overflow (uint32_t a, uint32_t *res)
> +{
> +  return __builtin_sub_overflow (a, 4321, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,r1
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool sub_overflow_p (int32_t a, int32_t b, int32_t res)
> +{
> +  return __builtin_sub_overflow_p (a, b, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,-1000
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool subi_overflow_p (int32_t a, int32_t res)
> +{
> +  return __builtin_sub_overflow_p (a, -1000, res);
> +}
> +
> +/*
> + * j_s.d  [blink]
> + * setlo  r0,r0,r1
> + */
> +bool usub_overflow_p (uint32_t a, uint32_t b, uint32_t res)
> +{
> +
Re: [PATCH] [ARC] Allow more ABIs in GLIBC_DYNAMIC_LINKER
Pushed.

Thank you,
Claudiu

On Sun, Mar 29, 2020 at 2:05 AM Vineet Gupta via Gcc-patches wrote:
>
> Enable big-endian suffixed dynamic linker per glibc multi-abi support.
>
> And to avoid future churn and version-pairing hassles, also allow
> arc700 although glibc for ARC currently doesn't support it.
>
> gcc/
> -xx-xx  Vineet Gupta
> +
> +       * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
>
> Signed-off-by: Vineet Gupta
> ---
>  gcc/ChangeLog          | 4
>  gcc/config/arc/linux.h | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 86ad683a6cb0..c26a748fd51b 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,7 @@
> +2020-03-28  Vineet Gupta
> +
> +       * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> +
>  2020-03-28  Jakub Jelinek
>
>         PR c/93573
> diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
> index 0b99da3fcdaf..1bbeccee7115 100644
> --- a/gcc/config/arc/linux.h
> +++ b/gcc/config/arc/linux.h
> @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
>      } \
>    while (0)
>
> -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-arc.so.2"
> +#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-arc%{mbig-endian:eb}%{mcpu=arc700:700}.so.2"
>  #define UCLIBC_DYNAMIC_LINKER "/lib/ld-uClibc.so.0"
>
>  /* Note that the default is to link against dynamic libraries, if they are
> --
> 2.20.1
>
Re: [PATCH] [ARC] Allow more ABIs in GLIBC_DYNAMIC_LINKER
Done.

Thank you for your support,
Claudiu

On Thu, Apr 9, 2020 at 2:38 AM Vineet Gupta wrote:
>
> Hi Claudiu,
>
> For glibc needs can this be backported to gcc-9 please!
>
> Thx,
> -Vineet
>
> On 3/31/20 3:06 AM, Claudiu Zissulescu Ianculescu wrote:
> > Pushed.
> >
> > Thank you,
> > Claudiu
> >
> > On Sun, Mar 29, 2020 at 2:05 AM Vineet Gupta via Gcc-patches wrote:
> >> Enable big-endian suffixed dynamic linker per glibc multi-abi support.
> >>
> >> And to avoid future churn and version pairing hassles, also allow
> >> arc700 although glibc for ARC currently doesn't support it.
> >>
> >> gcc/
> >> -xx-xx  Vineet Gupta
> >> +
> >> +       * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> >>
> >> Signed-off-by: Vineet Gupta
> >> ---
> >>  gcc/ChangeLog          | 4
> >>  gcc/config/arc/linux.h | 2 +-
> >>  2 files changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> >> index 86ad683a6cb0..c26a748fd51b 100644
> >> --- a/gcc/ChangeLog
> >> +++ b/gcc/ChangeLog
> >> @@ -1,3 +1,7 @@
> >> +2020-03-28  Vineet Gupta
> >> +
> >> +       * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> >> +
> >>  2020-03-28  Jakub Jelinek
> >>
> >>         PR c/93573
> >> diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
> >> index 0b99da3fcdaf..1bbeccee7115 100644
> >> --- a/gcc/config/arc/linux.h
> >> +++ b/gcc/config/arc/linux.h
> >> @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
> >>      } \
> >>    while (0)
> >>
> >> -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-arc.so.2"
> >> +#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-arc%{mbig-endian:eb}%{mcpu=arc700:700}.so.2"
> >>  #define UCLIBC_DYNAMIC_LINKER "/lib/ld-uClibc.so.0"
> >>
> >>  /* Note that the default is to link against dynamic libraries, if they are
> >> --
> >> 2.20.1
> >>
> > ___
> > linux-snps-arc mailing list
> > linux-snps-...@lists.infradead.org
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Dsnps-2Darc&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=7FgpX6o3vAhwMrMhLh-4ZJey5kjdNUwOL2CWsFwR4T8&m=MrObyH2ki95_7m_xHpnWX-k9eIMOsxMuSa48qhxYOCY&s=3ggbGwaiJuSFnFECy0ItuwBBMDAcriwCdSc3GA0UFig&e=
>
Re: [PATCH] arc: Use separate predicated patterns for mpyd(u)
Gentle PING.

On Wed, Oct 7, 2020 at 12:39 PM Claudiu Zissulescu wrote:
>
> From: Claudiu Zissulescu
>
> The compiler can match mpyd.eq r0,r1,r0 as a predicated instruction,
> which is incorrect. The mpyd(u) instruction takes as input two 32-bit
> registers, returning into a double 64-bit even-odd register pair. For
> the predicated case, the ARC instruction decoder expects the
> destination register to be the same as the first input register. In
> the big-endian case the result is swapped in the destination register
> pair; however, the instruction encoding remains the same. Refurbish
> the mpyd(u) patterns to take into account the above observation.
>
> Permission to apply this patch to master, gcc10 and gcc9 branches.
>
> Cheers,
> Claudiu
>
> -xx-xx  Claudiu Zissulescu
>
>         * testsuite/gcc.target/arc/pmpyd.c: New test.
>         * testsuite/gcc.target/arc/tmac-1.c: Update.
>         * config/arc/arc.md (mpyd_arcv2hs): New template pattern.
>         (*pmpyd_arcv2hs): Likewise.
>         (*pmpyd_imm_arcv2hs): Likewise.
>         (mpyd_arcv2hs): Moved into above template.
>         (mpyd_imm_arcv2hs): Moved into above template.
>         (mpydu_arcv2hs): Likewise.
>         (mpydu_imm_arcv2hs): Likewise.
>         (su_optab): New optab prefix for sign/zero-extending operations.
>
> Signed-off-by: Claudiu Zissulescu
> ---
>  gcc/config/arc/arc.md                 | 101 +-
>  gcc/testsuite/gcc.target/arc/pmpyd.c  |  15 
>  gcc/testsuite/gcc.target/arc/tmac-1.c |   2 +-
>  3 files changed, 67 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/pmpyd.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 1720e8cd2f6f..d4d9f59a3eac 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -894,6 +894,8 @@ archs4x, archs4xd"
>
>  (define_code_iterator SEZ [sign_extend zero_extend])
>  (define_code_attr SEZ_prefix [(sign_extend "sex") (zero_extend "ext")])
> +; Optab prefix for sign/zero-extending operations
> +(define_code_attr su_optab [(sign_extend "") (zero_extend "u")])
>
>  (define_insn "*xt_cmp0_noout"
>    [(set (match_operand 0 "cc_set_register" "")
> @@ -6436,66 +6438,65 @@ archs4x, archs4xd"
>     (set_attr "predicable" "no")
>     (set_attr "cond" "nocond")])
>
> -(define_insn "mpyd_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand" "=Rcr, r")
> -       (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "0, c"))
> -                (sign_extend:DI (match_operand:SI 2 "register_operand" "c, c"
> +(define_insn "mpyd_arcv2hs"
> +  [(set (match_operand:DI 0 "even_register_operand" "=r")
> +       (mult:DI (SEZ:DI (match_operand:SI 1 "register_operand" "r"))
> +                (SEZ:DI (match_operand:SI 2 "register_operand" "r"
>     (set (reg:DI ARCV2_ACC)
>         (mult:DI
> -         (sign_extend:DI (match_dup 1))
> -         (sign_extend:DI (match_dup 2]
> +         (SEZ:DI (match_dup 1))
> +         (SEZ:DI (match_dup 2]
>    "TARGET_PLUS_MACD"
> -  "mpyd%? %0,%1,%2"
> -  [(set_attr "length" "4,4")
> -   (set_attr "iscompact" "false")
> -   (set_attr "type" "multi")
> -   (set_attr "predicable" "yes,no")
> -   (set_attr "cond" "canuse,nocond")])
> -
> -(define_insn "mpyd_imm_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand" "=Rcr, r,r,Rcr, r")
> -       (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "0, c,0, 0, c"))
> -                (match_operand 2 "immediate_operand" "L, L,I,Cal,Cal")))
> + "mpyd%?\\t%0,%1,%2"
> + [(set_attr "length" "4")
> +  (set_attr "iscompact" "false")
> +  (set_attr "type" "multi")
> +  (set_attr "predicable" "no")])
> +
> +(define_insn "*pmpyd_arcv2hs"
> +  [(set (match_operand:DI 0 "even_register_operand" "=r")
> +       (mult:DI
> +        (SEZ:DI (match_operand:SI 1 "even_register_operand" "%0"))
> +        (SEZ:DI (match_operand:SI 2 "register_operand" "r"
>     (set (reg:DI ARCV2_ACC)
> -       (mult:DI (sign_extend:DI (match_dup 1))
> -                (match_dup 2)))]
> +       (mult:DI
> +        (SEZ:DI (match_dup 1))
> +        (SEZ:DI (match_dup 2]
>    "TARGET_PLUS_MACD"
> -  "mpyd%? %0,%1,%2"
> -  [(set_attr "length" "4,4,4,8,8")
> -   (set_attr "iscompact" "false")
> -   (set_attr "type" "multi")
> -   (set_attr "predicable" "yes,no,no,yes,no")
> -   (set_attr "cond" "canuse,nocond,nocond,canuse_limm,nocond")])
> -
> -(define_insn "mpydu_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand" "=Rcr, r")
> -       (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand" "0, c"))
> -                (zero_extend:DI (match_operand:SI 2 "register_operand" "c, c"
> + "mpyd%?\\t%0,%1,%2"
> + [(set_attr "length" "4")
> +  (set_attr "iscompact" "false")
> +  (set_attr "type" "multi")
> +  (set_attr "predicable
Re: [PATCH] arc: Improve/add instruction patterns to better use MAC instructions.
Gentle PING.

On Fri, Oct 9, 2020 at 5:24 PM Claudiu Zissulescu wrote:
>
> From: Claudiu Zissulescu
>
> ARC MYP7+ instructions add MAC instructions for vector and scalar data
> types. This patch adds a madd pattern for 16-bit data that uses the
> 32-bit MAC instruction, and dot_prod patterns for v4hi vector
> types. The 64-bit moves are also upgraded by using the vadd2 instruction.
>
> gcc/
> -xx-xx  Claudiu Zissulescu
>
>         * config/arc/arc.c (arc_split_move): Recognize vadd2 instructions.
>         * config/arc/arc.md (movdi_insn): Update pattern to use vadd2
>         instructions.
>         (movdf_insn): Likewise.
>         (maddhisi4): New pattern.
>         (umaddhisi4): Likewise.
>         * config/arc/simdext.md (mov_int): Update pattern to use
>         vadd2.
>         (sdot_prodv4hi): New pattern.
>         (udot_prodv4hi): Likewise.
>         (arc_vec_mac_hi_v4hi): Update/renamed to
>         arc_vec_mac_v2hiv2si.
>         (arc_vec_mac_v2hiv2si_zero): New pattern.
>
> Signed-off-by: Claudiu Zissulescu
> ---
>  gcc/config/arc/arc.c          |  8 
>  gcc/config/arc/arc.md         | 71 ---
>  gcc/config/arc/constraints.md |  5 ++
>  gcc/config/arc/simdext.md     | 90 +++
>  4 files changed, 147 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index ec55cfde87a9..d5b521e75e67 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -10202,6 +10202,14 @@ arc_split_move (rtx *operands)
>        return;
>      }
>
> +  if (TARGET_PLUS_QMACW
> +      && even_register_operand (operands[0], mode)
> +      && even_register_operand (operands[1], mode))
> +    {
> +      emit_move_insn (operands[0], operands[1]);
> +      return;
> +    }
> +
>    if (TARGET_PLUS_QMACW
>        && GET_CODE (operands[1]) == CONST_VECTOR)
>      {
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index f9fc11e51a85..1720e8cd2f6f 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -1345,8 +1345,8 @@ archs4x, archs4xd"
>    ")
>
>  (define_insn_and_split "*movdi_insn"
> -  [(set (match_operand:DI 0 "move_dest_operand"       "=w, w,r, m")
> -       (match_operand:DI 1 "move_double_src_operand" "c,Hi,m,cCm3"))]
> +  [(set (match_operand:DI 0 "move_dest_operand"       "=r, r,r, m")
> +       (match_operand:DI 1 "move_double_src_operand" "r,Hi,m,rCm3"))]
>    "register_operand (operands[0], DImode)
>     || register_operand (operands[1], DImode)
>     || (satisfies_constraint_Cm3 (operands[1])
> @@ -1358,6 +1358,13 @@ archs4x, archs4xd"
>      default:
>        return \"#\";
>
> +    case 0:
> +      if (TARGET_PLUS_QMACW
> +         && even_register_operand (operands[0], DImode)
> +         && even_register_operand (operands[1], DImode))
> +       return \"vadd2\\t%0,%1,0\";
> +      return \"#\";
> +
>      case 2:
>        if (TARGET_LL64
>           && memory_operand (operands[1], DImode)
> @@ -1374,7 +1381,7 @@
>      return \"#\";
>      }
>  }"
> -  "reload_completed"
> +  "&& reload_completed"
>    [(const_int 0)]
>    {
>     arc_split_move (operands);
> @@ -1420,15 +1427,24 @@ archs4x, archs4xd"
>    "if (prepare_move_operands (operands, DFmode)) DONE;")
>
>  (define_insn_and_split "*movdf_insn"
> -  [(set (match_operand:DF 0 "move_dest_operand"       "=D,r,c,c,r,m")
> -       (match_operand:DF 1 "move_double_src_operand" "r,D,c,E,m,c"))]
> -  "register_operand (operands[0], DFmode) || register_operand (operands[1], DFmode)"
> +  [(set (match_operand:DF 0 "move_dest_operand"       "=D,r,r,r,r,m")
> +       (match_operand:DF 1 "move_double_src_operand" "r,D,r,E,m,r"))]
> +  "register_operand (operands[0], DFmode)
> +   || register_operand (operands[1], DFmode)"
>    "*
>  {
>   switch (which_alternative)
>     {
>     default:
>       return \"#\";
> +
> +    case 2:
> +      if (TARGET_PLUS_QMACW
> +         && even_register_operand (operands[0], DFmode)
> +         && even_register_operand (operands[1], DFmode))
> +       return \"vadd2\\t%0,%1,0\";
> +      return \"#\";
> +
>     case 4:
>       if (TARGET_LL64
>           && ((even_register_operand (operands[0], DFmode)
> @@ -6177,6 +6193,49 @@ archs4x, archs4xd"
>    [(set_attr "length" "0")])
>
>  ;; MAC and DMPY instructions
> +
> +; Use MAC instruction to emulate 16bit mac.
> +(define_expand "maddhisi4"
> +  [(match_operand:SI 0 "register_operand" "")
> +   (match_operand:HI 1 "register_operand" "")
> +   (match_operand:HI 2 "extend_operand" "")
> +   (match_operand:SI 3 "register_operand" "")]
> +  "TARGET_PLUS_DMPY"
> +  "{
> +   rtx acc_reg = gen_rtx_REG (DImode, ACC_REG_FIRST);
> +   rtx tmp1 = gen_reg_rtx (SImode);
> +   rtx tmp2 = gen_reg_rtx (SImode);
> +   rtx accl = gen_lowpart (SImode, acc_reg);
> +
> +   emit_move_insn (accl, operands[3]);
> +   emit_insn (gen_rtx_SET (tmp1, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
> +   emit_insn (gen_rtx_SET (tmp2, gen_rtx_SIGN_EXTEND (SImode, operand
Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus
Thanks a lot guys. Patch is pushed.

//Claudiu

On Mon, Jun 14, 2021 at 12:34 AM Jeff Law wrote:
>
>
> On 6/13/2021 4:06 AM, Bernhard Reutner-Fischer wrote:
> > On Fri, 11 Jun 2021 14:25:24 +0300
> > Claudiu Zissulescu wrote:
> >
> >> Hi Bernhard,
> >>
> >> Please find attached my latest patch, it includes (hopefully) all your
> >> feedback.
> >>
> >> Thank you for comments,
> > concise and clean, i wouldn't know what to remove. LGTM.
> > thanks for your patience!
> Then let's consider it approved at this point. Thanks for chiming in
> Bernhard and thanks for implementing the suggestions Claudiu!
>
> jeff