Re: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-03 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

It is not necessary to make any modifications to your patch; I've just
answered the questions which you asked me. The adds are faster for the ARC
CPUs which are still in production, and I suppose we can leverage the LP
instruction together with DBNZ instructions for implementing loops. I'll
come back to you asap, after I've got the nightly results :)

Thank you,
Claudiu

On Tue, Oct 3, 2023 at 6:34 PM Roger Sayle  wrote:
>
>
> Hi Claudiu,
> Thanks for the answers to my technical questions.
> If you'd prefer to update arc.md's add3 pattern first,
> I'm happy to update/revise my patch based on this
> and your feedback, for example preferring add over
> asl_s (or controlling this choice with -Os).
>
> Thanks again.
> Roger
> --
>
> > -Original Message-
> > From: Claudiu Zissulescu 
> > Sent: 03 October 2023 15:26
> > To: Roger Sayle ; gcc-patches@gcc.gnu.org
> > Subject: RE: [ARC PATCH] Split SImode shifts pre-reload on
> > !TARGET_BARREL_SHIFTER.
> >
> > Hi Roger,
> >
> > It was nice to meet you too.
> >
> > Thank you for looking into the ARC's non-barrel-shifter configurations.  I
> > will dive into your patch asap, but before starting, here are a few of my
> > comments:
> >
> > -Original Message-
> > From: Roger Sayle 
> > Sent: Thursday, September 28, 2023 2:27 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Claudiu Zissulescu 
> > Subject: [ARC PATCH] Split SImode shifts pre-reload on
> > !TARGET_BARREL_SHIFTER.
> >
> >
> > Hi Claudiu,
> > It was great meeting up with you and the Synopsys ARC team at the GNU
> > tools Cauldron in Cambridge.
> >
> > This patch is the first in a series to improve SImode and DImode shifts
> > and rotates in the ARC backend.  This first piece splits SImode shifts,
> > for !TARGET_BARREL_SHIFTER targets, after combine and before reload, in
> > the split1 pass, as suggested by the FIXME comment above output_shift in
> > arc.cc.  To do this I've copied the implementation of the
> > x86_pre_reload_split function from i386 backend, and renamed it
> > arc_pre_reload_split.
> >
> > Although the actual implementations of shifts remain the same (as in
> > output_shift), having them as explicit instructions in the RTL stream
> > allows better scheduling and use of compact forms when available.  The
> > benefits can be seen in two short examples below.
> >
> > For the function:
> > unsigned int foo(unsigned int x, unsigned int y) {
> >   return y << 2;
> > }
> >
> > GCC with -O2 -mcpu=em would previously generate:
> > foo:add r1,r1,r1
> > add r1,r1,r1
> > j_s.d   [blink]
> > mov_s   r0,r1   ;4
> >
> > [CZI] Indeed, the move shouldn't be generated. The use of ADDs is slightly
> > beneficial for older ARCv1 arches.
> >
> > and with this patch now generates:
> > foo:asl_s r0,r1
> > j_s.d   [blink]
> > asl_s r0,r0
> >
> > [CZI] Nice. This new sequence is as fast as we can get for our ARCv2 cpus.
> >
> > Notice the original (from shift_si3's output_shift) requires the shift
> > sequence to be monolithic with the same destination register as the
> > source (requiring an extra mov_s).  The new version can eliminate this
> > move, and schedule the second asl in the branch delay slot of the return.
> >
> > For the function:
> > int x,y,z;
> >
> > void bar()
> > {
> >   x <<= 3;
> >   y <<= 3;
> >   z <<= 3;
> > }
> >
> > GCC -O2 -mcpu=em currently generates:
> > bar:push_s  r13
> > ld.as   r12,[gp,@x@sda] ;23
> > ld.as   r3,[gp,@y@sda]  ;23
> > mov r2,0
> > add3 r12,r2,r12
> > mov r2,0
> > add3 r3,r2,r3
> > ld.as   r2,[gp,@z@sda]  ;23
> > st.as   r12,[gp,@x@sda] ;26
> > mov r13,0
> > add3 r2,r13,r2
> > st.as   r3,[gp,@y@sda]  ;26
> > st.as   r2,[gp,@z@sda]  ;26
> > j_s.d   [blink]
> > pop_s   r13
> >
> > where each shift by 3 uses ARC's add3 instruction, which is similar to
> > x86's lea implementing x = (y<<3) + z, but requires the value zero to be
> > placed in a temporary register "z".  Splitting this before reload allows
> > these pseudos to be shared/reused.  With this patch, we get
> >
> > bar:    ld.as   r2,[gp,@x@sda]  ;23
> >         mov_s   r3,0    ;3
> >         add3    r2,r3,r2
> >         ld.as   r3,[gp,@y@sda]  ;23
> >         st.as   r2,[gp,@x@sda]  ;26
> >         ld.as   r2,[gp,@z@sda]  ;23
> >         mov_s   r12,0   ;3
> >         add3    r3,r12,r3
> >         add3    r2,r12,r2
> >         st.as   r3,[gp,@y@sda]  ;26
> >         st.as   r2,[gp,@z@sda]  ;26
> >         j_s     [blink]
> >
> > [CZI] Looks great, but it also shows that I forgot to add the Ra,LIMM,RC
> > variant to the ADD3 instruction.  With it, instead of
> > mov_s   r3,0    ;3
> > add3    r2,r3,r2
> > we would get a single add3 r2,0,r2.  It is a longer instruction, but faster.
> >
> > Unfortunately, register allocation means that we only share two of the
> > three "mov_s z,0", but this is sufficient to reduce register pressure
> > enough to avoid spilling r13 in the prologue/epilogue.

Re: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-04 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

The patch as it is passed validation, and it is in general OK.
Although it doesn't address the elephant in the room, namely the
output_shift function, it is a welcome cleanup.
I would like you to split the patch in two: one which deals with
improvements to shifts in the absence of a barrel shifter, and one which
addresses the default instruction length, as they can be seen as
separate pieces of work. Please feel free to commit the resulting patches
to mainline.

Thank you for your contribution,
Claudiu

On Thu, Sep 28, 2023 at 2:27 PM Roger Sayle  wrote:
>
>
> Hi Claudiu,
> It was great meeting up with you and the Synopsys ARC team at the
> GNU tools Cauldron in Cambridge.
>
> This patch is the first in a series to improve SImode and DImode
> shifts and rotates in the ARC backend.  This first piece splits
> SImode shifts, for !TARGET_BARREL_SHIFTER targets, after combine
> and before reload, in the split1 pass, as suggested by the FIXME
> comment above output_shift in arc.cc.  To do this I've copied the
> implementation of the x86_pre_reload_split function from i386
> backend, and renamed it arc_pre_reload_split.
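
For reference, the i386 helper mentioned here is essentially a two-line
predicate.  A minimal sketch of the same shape (assuming the ARC copy keeps
the i386 logic; the exact committed code may differ) is:

/* True while new pseudos can still be created and the split1 pass
   (post-combine, pre-reload) has not yet run.  */
static bool
arc_pre_reload_split (void)
{
  return (can_create_pseudo_p ()
          && !(cfun->curr_properties & PROP_rtl_split_insns));
}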
>
> Although the actual implementations of shifts remain the same
> (as in output_shift), having them as explicit instructions in
> the RTL stream allows better scheduling and use of compact forms
> when available.  The benefits can be seen in two short examples
> below.
>
> For the function:
> unsigned int foo(unsigned int x, unsigned int y) {
>   return y << 2;
> }
>
> GCC with -O2 -mcpu=em would previously generate:
> foo:add r1,r1,r1
> add r1,r1,r1
> j_s.d   [blink]
> mov_s   r0,r1   ;4
> and with this patch now generates:
> foo:asl_s r0,r1
> j_s.d   [blink]
> asl_s r0,r0
>
> Notice the original (from shift_si3's output_shift) requires the
> shift sequence to be monolithic with the same destination register
> as the source (requiring an extra mov_s).  The new version can
> eliminate this move, and schedule the second asl in the branch
> delay slot of the return.
>
> For the function:
> int x,y,z;
>
> void bar()
> {
>   x <<= 3;
>   y <<= 3;
>   z <<= 3;
> }
>
> GCC -O2 -mcpu=em currently generates:
> bar:push_s  r13
> ld.as   r12,[gp,@x@sda] ;23
> ld.as   r3,[gp,@y@sda]  ;23
> mov r2,0
> add3 r12,r2,r12
> mov r2,0
> add3 r3,r2,r3
> ld.as   r2,[gp,@z@sda]  ;23
> st.as   r12,[gp,@x@sda] ;26
> mov r13,0
> add3 r2,r13,r2
> st.as   r3,[gp,@y@sda]  ;26
> st.as   r2,[gp,@z@sda]  ;26
> j_s.d   [blink]
> pop_s   r13
>
> where each shift by 3, uses ARC's add3 instruction, which is similar
> to x86's lea implementing x = (y<<3) + z, but requires the value zero
> to be placed in a temporary register "z".  Splitting this before reload
> allows these pseudos to be shared/reused.  With this patch, we get
>
> bar:    ld.as   r2,[gp,@x@sda]  ;23
>         mov_s   r3,0    ;3
>         add3    r2,r3,r2
>         ld.as   r3,[gp,@y@sda]  ;23
>         st.as   r2,[gp,@x@sda]  ;26
>         ld.as   r2,[gp,@z@sda]  ;23
>         mov_s   r12,0   ;3
>         add3    r3,r12,r3
>         add3    r2,r12,r2
>         st.as   r3,[gp,@y@sda]  ;26
>         st.as   r2,[gp,@z@sda]  ;26
>         j_s     [blink]
>
> Unfortunately, register allocation means that we only share two of the
> three "mov_s z,0", but this is sufficient to reduce register pressure
> enough to avoid spilling r13 in the prologue/epilogue.
>
> This patch also contains a (latent?) bug fix.  The implementation of
> the default insn "length" attribute assumes instructions of type
> "shift" have two input operands and accesses operands[2], hence
> specializations of shifts that don't have an operands[2] need to be
> categorized as type "unary" (which results in the correct length).
>
> This patch has been tested on a cross-compiler to arc-elf (hosted on
> x86_64-pc-linux-gnu), but because I've an incomplete tool chain many
> of the regression test fail, but there are no new failures with new
> test cases added below.  If you can confirm that there are no issues
> from additional testing, is this OK for mainline?
>
> Finally a quick technical question.  ARC's zero overhead loops require
> at least two instructions in the loop, so currently the backend's
> implementation of shr20 pads the loop body with a "nop".
>
> lshr20: mov.f lp_count, 20
> lpnz    2f
> lsr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
> could this be more efficiently implemented as:
>
> lshr20: mov lp_count, 10
> lp 2f
> lsr_s r0,r0
> lsr_s r0,r0
> 2:  # end single insn loop
> j_s [blink]
>
> i.e. half the number of iterations, but doing twice as much useful
> work in each iteration?  Or might the nop be free on advanced
> microarchitectures, and/or the consecutive dependent shifts cause
> a pipeline stall?  It would be nice to fuse loops t

Re: [ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<<x.

2023-10-16 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Indeed, I was missing the patch file.

Approved.

Thank you for your contribution,
 Claudiu

On Sun, Oct 15, 2023 at 11:14 AM Roger Sayle  wrote:
>
> I’ve done it again. ENOPATCH.
>
> From: Roger Sayle 
> Sent: 15 October 2023 09:13
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Claudiu Zissulescu' 
> Subject: [ARC PATCH] Split asl dst,1,src into bset dst,0,src to implement 1<<x.
>
> This patch adds a pre-reload splitter to arc.md, to use the bset (set
> specific bit instruction) to implement 1<<x
> on ARC processors that don't have a barrel shifter.
>
> Currently,
>
> int foo(int x) {
>   return 1 << x;
> }
>
> when compiled with -O2 -mcpu=em is compiled as a loop:
>
> foo:    mov_s   r2,1    ;3
>         and.f   lp_count,r0, 0x1f
>         lpnz    2f
>         add     r2,r2,r2
>         nop
> 2:      # end single insn loop
>         j_s.d   [blink]
>         mov_s   r0,r2   ;4
>
> with this patch we instead generate a single instruction:
>
> foo:    bset    r0,0,r0
>         j_s     [blink]
>
> Fingers crossed this passes Claudiu's nightly testing.  This patch
> has been minimally tested by building a cross-compiler cc1 to
> arc-linux hosted on x86_64-pc-linux-gnu with no additional failures
> seen with make -k check.  Ok for mainline?  Thanks in advance.
>
> 2023-10-15  Roger Sayle  
>
> gcc/ChangeLog
>         * config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to
>         use bset dst,0,src to implement 1<<x.
>
> Cheers,
> Roger
> --
>


Re: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.

2023-10-24 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Your patch doesn't introduce new regressions. However, before pushing
to the mainline you need to fix some issues:
1. Please fix the trailing spaces and blocks of 8 spaces which should
be replaced with tabs. You can use check_GNU_style.py script to spot
them.
2. Please use capital letters for code iterators (i.e., any_shift_rotate).

Once the above issues are fixed, please proceed with your commit.

Thank you for your contribution,
Claudiu

On Sun, Oct 8, 2023 at 10:07 PM Roger Sayle  wrote:
>
>
> This patch completes the ARC back-end's transition to using pre-reload
> splitters for SImode shifts and rotates on targets without a barrel
> shifter.  The core part is that the shift_si3 define_insn is no longer
> needed, as shifts and rotates that don't require a loop are split
> before reload, and then because shift_si3_loop is the only caller
> of output_shift, both can be significantly cleaned up and simplified.
> The output_shift function (Claudiu's "the elephant in the room") is
> renamed output_shift_loop, which handles just the four instruction
> zero-overhead loop implementations.
>
> Aside from the clean-ups, the user visible changes are much improved
> implementations of SImode shifts and rotates on affected targets.
>
> For the function:
> unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_1: lsr_s r2,r0
> bmsk_s r0,r0,0
> ror r0,r0
> j_s.d   [blink]
> or_s    r0,r0,r2
>
> with this patch, we now generate:
>
> j_s.d   [blink]
> ror r0,r0
>
> For the function:
> unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_31:
> mov_s   r2,r0   ;4
> asl_s r0,r0
> add.f 0,r2,r2
> rlc r2,0
> j_s.d   [blink]
> or_s    r0,r0,r2
>
> with this patch we now generate an add.f followed by an adc:
>
> rotr_31:
> add.f   r0,r0,r0
> j_s.d   [blink]
> add.cs  r0,r0,1
>
>
> Shifts by constants requiring a loop have been improved for even counts
> by performing two operations in each iteration:
>
> int shl10(int x) { return x >> 10; }
>
> Previously looked like:
>
> shl10:  mov.f lp_count, 10
> lpnz    2f
> asr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
>
> And now becomes:
>
> shl10:
> mov lp_count,5
> lp  2f
> asr r0,r0
> asr r0,r0
> 2:  # end single insn loop
> j_s [blink]
>
>
> So emulating ARC's SWAP on architectures that don't have it:
>
> unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }
>
> previously required 10 instructions and ~70 cycles:
>
> rotr_16:
> mov_s   r2,r0   ;4
> mov.f lp_count, 16
> lpnz    2f
> add r0,r0,r0
> nop
> 2:  # end single insn loop
> mov.f lp_count, 16
> lpnz    2f
> lsr r2,r2
> nop
> 2:  # end single insn loop
> j_s.d   [blink]
> or_s    r0,r0,r2
>
> now becomes just 4 instructions and ~18 cycles:
>
> rotr_16:
> mov lp_count,8
> lp  2f
> ror r0,r0
> ror r0,r0
> 2:  # end single insn loop
> j_s [blink]
>
>
> This patch has been tested with a cross-compiler to arc-linux hosted
> on x86_64-pc-linux-gnu and (partially) tested with the compile-only
> portions of the testsuite with no regressions.  Ok for mainline, if
> your own testing shows no issues?
>
>
> 2023-10-07  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc-protos.h (output_shift): Rename to...
> (output_shift_loop): Tweak API to take an explicit rtx_code.
> (arc_split_ashl): Prototype new function here.
> (arc_split_ashr): Likewise.
> (arc_split_lshr): Likewise.
> (arc_split_rotl): Likewise.
> (arc_split_rotr): Likewise.
> * config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
> (output_shift_loop): New function replacing output_shift to output
> a zero overhead loop for SImode shifts and rotates on ARC targets
> without barrel shifter (i.e. no hardware support for these insns).
> (arc_split_ashl): New helper function to split *ashlsi3_nobs.
> (arc_split_ashr): New helper function to split *ashrsi3_nobs.
> (arc_split_lshr): New helper function to split *lshrsi3_nobs.
> (arc_split_rotl): New helper function to split *rotlsi3_nobs.
> (arc_split_rotr): New helper function to split *rotrsi3_nobs.
> * config/arc/arc.md (any_shift_rotate): New define_code_iterator.
> (define_code_attr insn): New code attribute to map to pattern name.
> (<insn>si3): New expander unifying previous ashlsi3,
> ashrsi3 and lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
> (*<insn>si3_nobs): New defin

Re: [ARC PATCH] Improved SImode shifts and rotates with -mswap.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

+(define_insn "si2_cnt16"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=w")

Please use "register_operand", and "r" constraint.

+(ANY_ROTATE:SI (match_operand:SI 1 "register_operand" "c")

Please use "r" constraint instead of "c".

+   (const_int 16)))]
+  "TARGET_SWAP"
+  "swap\\t%0,%1"

Otherwise, it looks good to me. Please fix the above and proceed with
your commit.

Thank you for your contribution,
Claudiu


Re: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

You have a block of 8 spaces that needs to be replaced by tabs:
gcc/config/arc/arc.cc:5538:0:   if (n < 4)

Please fix the above, and proceed with your commit.

Thank you,
Claudiu

On Sun, Oct 29, 2023 at 11:16 AM Roger Sayle  wrote:
>
>
> This patch overhauls the ARC backend's insn_cost target hook, and makes
> some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
> goal is to allow the backend to indicate that shifts and rotates are
> slow (discouraged) when the CPU doesn't have a barrel shifter. I should
> also acknowledge Richard Sandiford for inspiring the use of set_cost
> in this rewrite of arc_insn_cost; this implementation borrows heavily
> from the target hooks for AArch64 and ARM.
>
> The motivating example is derived from PR rtl-optimization/110717.
>
> struct S { int a : 5; };
> unsigned int foo (struct S *p) {
>   return p->a;
> }
>
> With a barrel shifter, GCC -O2 generates the reasonable:
>
> foo:ldb_s   r0,[r0]
> asl_s   r0,r0,27
> j_s.d   [blink]
> asr_s   r0,r0,27
>
> What's interesting is that during combine, the middle-end actually
> has two shifts by three bits, and a sign-extension from QI to SI.
>
> Trying 8, 9 -> 11:
> 8: r158:SI=r157:QI#0<<0x3
>   REG_DEAD r157:QI
> 9: r159:SI=sign_extend(r158:SI#0)
>   REG_DEAD r158:SI
>11: r155:SI=r159:SI>>0x3
>   REG_DEAD r159:SI
>
> Whilst it's reasonable to simplify this to two shifts by 27 bits when
> the CPU has a barrel shifter, it's actually a significant pessimization
> when these shifts are implemented by loops.  This combination can be
> prevented if the backend provides accurate-ish estimates for insn_cost.
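
A minimal sketch of that idea (an illustration only, not the committed
arc_insn_cost; the exact cost formula is an assumption): make constant
shifts look expensive when there is no barrel shifter, so combine keeps
the cheaper small-count shifts.

/* Sketch: charge roughly one instruction per shifted bit for constant
   left shifts when a zero-overhead loop would be needed; fall back to
   the generic pattern cost otherwise.  */
static int
arc_insn_cost_sketch (rtx_insn *insn, bool speed)
{
  rtx set = single_set (insn);
  if (set
      && !TARGET_BARREL_SHIFTER
      && GET_CODE (SET_SRC (set)) == ASHIFT
      && CONST_INT_P (XEXP (SET_SRC (set), 1)))
    return COSTS_N_INSNS (INTVAL (XEXP (SET_SRC (set), 1)));
  return pattern_cost (PATTERN (insn), speed);
}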
>
>
> Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:
>
> foo:ldb_s   r0,[r0]
> mov lp_count,27
> lp  2f
> add r0,r0,r0
> nop
> 2:  # end single insn loop
> mov lp_count,27
> lp  2f
> asr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
> which contains two loops and requires about ~113 cycles to execute.
> With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:
>
> foo:ldb_s   r0,[r0]
> mov_s   r2,0;3
> add3    r0,r2,r0
> sexb_s  r0,r0
> asr_s   r0,r0
> asr_s   r0,r0
> j_s.d   [blink]
> asr_s   r0,r0
>
> which requires only ~6 cycles, for the shorter shifts by 3 and sign
> extension.
>
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
>
> 2023-10-29  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
> Provide reasonable values for SHIFTS and ROTATES by constant
> bit counts depending upon TARGET_BARREL_SHIFTER.
> (arc_insn_cost): Use insn attributes if the instruction is
> recognized.  Avoid calling get_attr_length for type "multi",
> i.e. define_insn_and_split patterns without explicit type.
> Fall-back to set_rtx_cost for single_set and pattern_cost
> otherwise.
> * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
> (BRANCH_COST): Improve/correct definition.
> (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
>
>
> Thanks again,
> Roger
> --
>


Re: [ARC PATCH] Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Do you want to say bmsk_s instead of msk_s here:
+/* { dg-final { scan-assembler "msk_s\\s+r0,r0,0" } } */

Anyhow, the patch looks good. Proceed with your commit.

Thank you,
Claudiu

On Mon, Oct 30, 2023 at 5:05 AM Jeff Law  wrote:
>
>
>
> On 10/28/23 10:47, Roger Sayle wrote:
> >
> > This patch optimizes PR middle-end/101955 for the ARC backend.  On ARC
> > CPUs with a barrel shifter, using two shifts is (probably) optimal as:
> >
> >  asl_s   r0,r0,31
> >  asr_s   r0,r0,31
> >
> > but without a barrel shifter, GCC -O2 -mcpu=em currently generates:
> >
> >  and r2,r0,1
> >  ror r2,r2
> >  add.f   0,r2,r2
> >  sbc r0,r0,r0
> >
> > with this patch, we now generate the smaller, faster and non-flags
> > clobbering:
> >
> >  bmsk_s  r0,r0,0
> >  neg_s   r0,r0
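
The C idiom being transformed (reconstructed from the subject line, not
copied from the patch's test case) is a sign-extraction of bit zero:

/* (x << 31) >> 31 sign-extends the least significant bit of x and is
   equivalent to -(x & 1), which needs no shifts at all.  */
int sext_lsb (int x)
{
  return (x << 31) >> 31;
}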
> >
> > Tested with a cross-compiler to arc-linux hosted on x86_64,
> > with no new (compile-only) regressions from make -k check.
> > Ok for mainline if this passes Claudiu's nightly testing?
> >
> >
> > 2023-10-28  Roger Sayle  
> >
> > gcc/ChangeLog
> >  PR middle-end/101955
> >  * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
> >  to convert sign extract of the least significant bit into an
> >  AND $1 then a NEG when !TARGET_BARREL_SHIFTER.
> >
> > gcc/testsuite/ChangeLog
> >  PR middle-end/101955
> >  * gcc.target/arc/pr101955.c: New test case.
> Good catch.  Looking to do something very similar on the H8 based on
> your work here.
>
> One the H8 we can use bld to load a bit from an 8 bit register into the
> C flag.  Then we use subtract with carry to get an 8 bit 0/-1 which we
> can then sign extend to 16 or 32 bits.  That covers bit positions 0..15
> of an SImode input.
>
> For bits 16..31 we can move the high half into the low half, then use the
> bld sequence.
>
> For bit zero the and+neg is the same number of clocks and size as bld
> based sequence.  But it'll simulate faster, so it's special cased.
>
>
> Jeff
>


Re: [ARC PATCH] Improve DImode left shift by a single bit.

2023-11-03 Thread Claudiu Zissulescu Ianculescu
Missed this one.

Ok, please proceed with the commit.

Thank you for your contribution,
Claudiu

On Sat, Oct 28, 2023 at 4:05 PM Roger Sayle  wrote:
>
>
> This patch improves the code generated for X << 1 (and for X + X) when
> X is 64-bit DImode, using the same two instruction code sequence used
> for DImode addition.
>
> For the test case:
>
> long long foo(long long x) { return x << 1; }
>
> GCC -O2 currently generates the following code:
>
> foo:lsr r2,r0,31
> asl_s   r1,r1,1
> asl_s   r0,r0,1
> j_s.d   [blink]
> or_s    r1,r1,r2
>
> and on CPU without a barrel shifter, i.e. -mcpu=em
>
> foo:add.f   0,r0,r0
> asl_s   r1,r1
> rlc r2,0
> asl_s   r0,r0
> j_s.d   [blink]
> or_s    r1,r1,r2
>
> with this patch (both with and without a barrel shifter):
>
> foo:add.f   r0,r0,r0
> j_s.d   [blink]
> adc r1,r1,r1
>
> [For Jeff Law's benefit a similar optimization is also applicable to
> H8300H, that could also use a two instruction sequence (plus rts) but
> currently GCC generates 16 instructions (plus an rts) for foo above.]
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
> 2023-10-28  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc.md (addsi3): Fix GNU-style code formatting.
> (adddi3): Change define_expand to generate an *adddi3.
> (*adddi3): New define_insn_and_split to lower DImode additions
> during the split1 pass (after combine and before reload).
> (ashldi3): New define_expand to (only) generate *ashldi3_cnt1
> for DImode left shifts by a single bit.
> (*ashldi3_cnt1): New define_insn_and_split to lower DImode
> left shifts by one bit to an *adddi3.
>
> gcc/testsuite/ChangeLog
> * gcc.target/arc/adddi3-1.c: New test case.
> * gcc.target/arc/ashldi3-1.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] [ARC] Use hardware support for double-precision compare instructions.

2020-01-17 Thread Claudiu Zissulescu Ianculescu
It is already ported :)
https://github.com/gcc-mirror/gcc/commit/555e4a053951a0ae24835a266e71819336d7f637#diff-5b8bd26eec6c2b9f560870c205416edc

Cheers,
Claudiu

On Wed, Jan 15, 2020 at 1:49 AM Vineet Gupta  wrote:
>
> On 12/9/19 1:52 AM, Claudiu Zissulescu wrote:
> > Although the FDCMP (the double precision floating point compare 
> > instruction) is added to the compiler, it is not properly used via cstoredi 
> > pattern. Fix it.
> >
> > OK to apply?
> > Claudiu
> >
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc.md (iterator SDF): Check TARGET_FP_DP_BASE.
> >   (cstoredi4): Use TARGET_HARD_FLOAT.
> > ---
> >  gcc/config/arc/arc.md | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> > index b592f25afce..bd44030b409 100644
> > --- a/gcc/config/arc/arc.md
> > +++ b/gcc/config/arc/arc.md
> > @@ -3749,7 +3749,7 @@ archs4x, archs4xd"
> >  })
> >
> >  (define_mode_iterator SDF [(SF "TARGET_FP_SP_BASE || TARGET_OPTFPE")
> > -(DF "TARGET_OPTFPE")])
> > +(DF "TARGET_FP_DP_BASE || TARGET_OPTFPE")])
> >
> >  (define_expand "cstore<mode>4"
> >[(set (reg:CC CC_REG)
> > @@ -3759,7 +3759,7 @@ archs4x, archs4xd"
> >   (match_operator:SI 1 "comparison_operator" [(reg CC_REG)
> >   (const_int 0)]))]
> >
> > -  "TARGET_FP_SP_BASE || TARGET_OPTFPE"
> > +  "TARGET_HARD_FLOAT || TARGET_OPTFPE"
> >  {
> >gcc_assert (XEXP (operands[1], 0) == operands[2]);
> >gcc_assert (XEXP (operands[1], 1) == operands[3]);
>
> Can this be backported to gcc-9 please ?
> glibc testing uses gcc-9
>
> Thx,
> -Vineet


Re: [PATCH 3/4] [ARC] Save mlo/mhi registers when ISR.

2020-01-27 Thread Claudiu Zissulescu Ianculescu
Yes, I know :(

Thank you for your help. All four patches pushed.
Claudiu

On Wed, Jan 22, 2020 at 10:31 PM Jeff Law  wrote:
>
> On Wed, 2020-01-22 at 10:14 +0200, Claudiu Zissulescu wrote:
> > ARC600 when configured with mul64 instructions uses mlo and mhi
> > registers to store the 64 result of the multiplication. In the ARC600
> > ISA documentation we have the next register configuration when ARC600
> > is configured only with mul64 extension:
> >
> > Register | Name | Use
> > ---------+------+------------------------------------
> > r57      | mlo  | Multiply low 32 bits, read only
> > r58      | mmid | Multiply middle 32 bits, read only
> > r59      | mhi  | Multiply high 32 bits, read only
> > ---------+------+------------------------------------
> >
> > When used for Co-existence configurations we have for mul64 the next
> > registers used:
> >
> > Register | Name | Use
> > ---------+------+------------------------------------
> > r58      | mlo  | Multiply low 32 bits, read only
> > r59      | mhi  | Multiply high 32 bits, read only
> > ---------+------+------------------------------------
> >
> > Note that mlo/mhi assignment doesn't swap when bigendian CPU
> > configuration is used.
> >
> > The compiler will always use r58 for mlo, regardless of the
> > configuration choosen to ensure mlo/mhi correct splitting. Fixing mlo
> > to the right register number is done at assembly time. The dwarf info
> > is also notified via DBX_... macro. Both mlo/mhi registers needs to
> > saved when ISR happens using a custom sequence.
> >
> > gcc/
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc-protos.h (gen_mlo): Remove.
> >   (gen_mhi): Likewise.
> >   * config/arc/arc.c (AUX_MULHI): Define.
> >   (arc_must_save_register): Special handling for r58/59.
> >   (arc_compute_frame_size): Consider mlo/mhi registers.
> >   (arc_save_callee_saves): Emit fp/sp move only when the emit_move
> >   parameter is true.
> >   (arc_conditional_register_usage): Remove TARGET_BIG_ENDIAN from
> >   mlo/mhi name selection.
> >   (arc_restore_callee_saves): Don't early restore blink when ISR.
> >   (arc_expand_prologue): Add mlo/mhi saving.
> >   (arc_expand_epilogue): Add mlo/mhi restoring.
> >   (gen_mlo): Remove.
> >   (gen_mhi): Remove.
> >   * config/arc/arc.h (DBX_REGISTER_NUMBER): Correct register
> >   numbering when MUL64 option is used.
> >   (DWARF2_FRAME_REG_OUT): Define.
> >   * config/arc/arc.md (arc600_stall): New pattern.
> >   (VUNSPEC_ARC_ARC600_STALL): Define.
> >   (mulsi64): Use correct mlo/mhi registers.
> >   (mulsi_600): Clean it up.
> >   * config/arc/predicates.md (mlo_operand): Remove any dependency on
> >   TARGET_BIG_ENDIAN.
> >   (mhi_operand): Likewise.
> >
> > testsuite/
> > -xx-xx  Claudiu Zissulescu  
> >   * gcc.target/arc/code-density-flag.c: Update test.
> >   * gcc.target/arc/interrupt-6.c: Likewise.
> Ugh.  But OK.
>
> jeff
> >
>


Re: [PATCH 2/4] [ARC] Use TARGET_INSN_COST.

2020-02-04 Thread Claudiu Zissulescu Ianculescu
> My only worry would be asking for the length early in the RTL pipeline
> may not be as accurate, but it's supposed to work, so if you're
> comfortable with the end results, then OK.
>
Indeed, the length is not accurate, but the results seem slightly
better than using COST_RTX. Using INSN_COSTS seems to me a more
manageable way of controlling what the combiner does.
Anyhow, for ARC the instruction size is accurate quite late in the
compilation process as it needs register and immediate value info :(

Thank you for your review,
Claudiu


Re: [PATCH] [ARC] Use hardware support for double-precision compare instructions.

2019-12-12 Thread Claudiu Zissulescu Ianculescu
Thank you for your review. Patch pushed to mainline and gcc9 branch.

//Claudiu

On Wed, Dec 11, 2019 at 8:59 PM Jeff Law  wrote:
>
> On Mon, 2019-12-09 at 11:52 +0200, Claudiu Zissulescu wrote:
> > Although the FDCMP (the double precision floating point compare
> > instruction) is added to the compiler, it is not properly used via
> > cstoredi pattern. Fix it.
> >
> > OK to apply?
> > Claudiu
> >
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc.md (iterator SDF): Check TARGET_FP_DP_BASE.
> >   (cstoredi4): Use TARGET_HARD_FLOAT.
> OK
> jeff
>


Re: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float comparisons

2019-12-12 Thread Claudiu Zissulescu Ianculescu
Pushed. Thank you for your contribution,
Claudiu

On Wed, Dec 11, 2019 at 12:47 AM Vineet Gupta
 wrote:
>
> On 12/10/19 1:12 AM, Claudiu Zissulescu wrote:
> > Hi,
> >
> > Thank you for your contribution, I'll push it asap. As far as I understand, 
> > you need this patch both in gcc9 branch and mainline.
> >
> > Cheers,
> > Claudiu
>
> Indeed both mainline and gcc9
>
> Thx
> -Vineet
>
> >
> >> -Original Message-
> >> From: Vineet Gupta [mailto:vgu...@synopsys.com]
> >> Sent: Monday, December 09, 2019 8:02 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Claudiu Zissulescu ;
> >> andrew.burg...@embecosm.com; linux-snps-...@lists.infradead.org;
> >> Vineet Gupta 
> >> Subject: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float
> >> comparisons
> >>
> >> ARC gcc generates FDCMP instructions which raise Invalid operation for
> >> signaling NaN only. This causes glibc iseqsig() primitives to fail (in
> >> the current ongoing glibc port to ARC)
> >>
> >> So split up the hard float compares into two categories and for unordered
> >> compares generate the FDCMPF instruction (vs. FDCMP) which raises an
> >> exception for either kind of NaN.
> >>
> >> With this fix testsuite/gcc.dg/torture/pr52451.c passes for ARC.
> >>
> >> Also passes 6 additional tests in glibc testsuite (test*iseqsig) and no
> >> regressions
> >>
> >> gcc/
> >> -xx-xx  Vineet Gupta  
> >>
> >>  * config/arc/arc-modes.def (CC_FPUE): New Mode CC_FPUE which
> >>  helps codegen generate exceptions even for quiet NaN.
> >>  * config/arc/arc.c (arc_init_reg_tables): Handle New CC_FPUE mode.
> >>  (get_arc_condition_code): Likewise.
> >>  (arc_select_cc_mode): LT, LE, GT, GE to use the New CC_FPUE
> >> mode.
> >>  * config/arc/arc.h (REVERSE_CONDITION): Handle New CC_FPUE
> >> mode.
> >>  * config/arc/predicates.md (proper_comparison_operator):
> >> Likewise.
> >>  * config/arc/fpu.md (cmpsf_fpu_trap): New Pattern for CC_FPUE.
> >>  (cmpdf_fpu_trap): Likewise.
> >>
> >> Signed-off-by: Vineet Gupta 
> >> ---
> >>  gcc/config/arc/arc-modes.def |  1 +
> >>  gcc/config/arc/arc.c |  8 ++--
> >>  gcc/config/arc/arc.h |  2 +-
> >>  gcc/config/arc/fpu.md| 24 
> >>  gcc/config/arc/predicates.md |  1 +
> >>  5 files changed, 33 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
> >> index 36a2f4abfb25..d16b6a289a15 100644
> >> --- a/gcc/config/arc/arc-modes.def
> >> +++ b/gcc/config/arc/arc-modes.def
> >> @@ -38,4 +38,5 @@ VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI
> >> */
> >>
> >>  /* FPU condition flags.  */
> >>  CC_MODE (CC_FPU);
> >> +CC_MODE (CC_FPUE);
> >>  CC_MODE (CC_FPU_UNEQ);
> >> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> >> index 28305f459dcd..cbb95d6e9043 100644
> >> --- a/gcc/config/arc/arc.c
> >> +++ b/gcc/config/arc/arc.c
> >> @@ -1564,6 +1564,7 @@ get_arc_condition_code (rtx comparison)
> >>  default : gcc_unreachable ();
> >>  }
> >>  case E_CC_FPUmode:
> >> +case E_CC_FPUEmode:
> >>switch (GET_CODE (comparison))
> >>  {
> >>  case EQ: return ARC_CC_EQ;
> >> @@ -1686,11 +1687,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x,
> >> rtx y)
> >>case UNLE:
> >>case UNGT:
> >>case UNGE:
> >> +return CC_FPUmode;
> >> +
> >>case LT:
> >>case LE:
> >>case GT:
> >>case GE:
> >> -return CC_FPUmode;
> >> +return CC_FPUEmode;
> >>
> >>case LTGT:
> >>case UNEQ:
> >> @@ -1844,7 +1847,7 @@ arc_init_reg_tables (void)
> >>if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode
> >>|| i == (int) CC_Cmode
> >>|| i == CC_FP_GTmode || i == CC_FP_GEmode || i ==
> >> CC_FP_ORDmode
> >> -  || i == CC_FPUmode || i == CC_FPU_UNEQmode)
> >> +  || i == CC_FPUmode || i == CC_FPUEmode || i ==
> >> CC_FPU_UNEQmode)
> >>  arc_mode_class[i] = 1 << (int) C_MODE;
> >>else
> >>  arc_mode_class[i] = 0;
> >> @@ -8401,6 +8404,7 @@ arc_reorg (void)
> >>
> >>/* Avoid FPU instructions.  */
> >>if ((GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUmode)
> >> +  || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUEmode)
> >>|| (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) ==
> >> CC_FPU_UNEQmode))
> >>  continue;
> >>
> >> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> >> index 4d7ac3281b41..c08ca3d0d432 100644
> >> --- a/gcc/config/arc/arc.h
> >> +++ b/gcc/config/arc/arc.h
> >> @@ -1531,7 +1531,7 @@ enum arc_function_type {
> >>(((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode \
> >>  || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode   \
> >>  || (MODE) == CC_FPXmode || (MODE) == CC_FPU_UNEQmode \
> >> -|| (MODE) == CC_FPUmode) \
> >> +|| (MODE) == C

Re: [committed] arc: Remove mlra option [PR113954]

2024-09-24 Thread Claudiu Zissulescu Ianculescu
I'll include your comment in my second patch where I clean some
patterns used by reload.

Thank you,
claudiu

On Mon, Sep 23, 2024 at 5:05 PM Andreas Schwab  wrote:
>
> On Sep 23 2024, Claudiu Zissulescu wrote:
>
> > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> > index c800226b179..a225adeff57 100644
> > --- a/gcc/config/arc/arc.cc
> > +++ b/gcc/config/arc/arc.cc
> > @@ -721,7 +721,7 @@ static rtx arc_legitimize_address_0 (rtx, rtx, 
> > machine_mode mode);
> >arc_no_speculation_in_delay_slots_p
> >
> >  #undef TARGET_LRA_P
> > -#define TARGET_LRA_P arc_lra_p
> > +#define TARGET_LRA_P hook_bool_void_true
>
> This is the default for lra_p, so you can remove the override.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH] arc: testsuite: Scan "rlc" instead of "mov.hs".

2025-03-19 Thread Claudiu Zissulescu Ianculescu
LGTM. I'll merge it once stage one is open.

Cheers,
Claudiu

On Tue, Mar 18, 2025 at 6:23 PM Luis Silva  wrote:
>
> Due to the patch by Roger Sayle,
> 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which
> introduces the use of the `rlc rX,0` instruction in place
> of the `mov.hs`, the add overflow test case needs to be
> updated.  The previous test case was validating the `mov.hs`
> instruction, but now it must validate the `rlc` instruction
> as the new behavior.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc.
>
> Signed-off-by: Luis Silva 
> ---
>  gcc/testsuite/gcc.target/arc/overflow-1.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arc/overflow-1.c 
> b/gcc/testsuite/gcc.target/arc/overflow-1.c
> index 01b3e8ad0fa..694c25cfe66 100644
> --- a/gcc/testsuite/gcc.target/arc/overflow-1.c
> +++ b/gcc/testsuite/gcc.target/arc/overflow-1.c
> @@ -31,9 +31,8 @@ bool addi_overflow (int32_t a, int32_t *res)
>  /*
>   * add.f  r0,r0,r1
>   * st_s   r0,[r2]
> - * mov_s  r0,1
>   * j_s.d  [blink]
> - * mov.hs r0,0
> + * rlcr0,0
>   */
>  bool uadd_overflow (uint32_t a, uint32_t b, uint32_t *res)
>  {
> @@ -75,9 +74,8 @@ bool addi_overflow_p (int32_t a, int32_t res)
>
>  /*
>   * add.f   0,r0,r1
> - * mov_s   r0,1
>   * j_s.d   [blink]
> - * mov.hs  r0,0
> + * rlc r0,0
>   */
>  bool uadd_overflow_p (uint32_t a, uint32_t b, uint32_t res)
>  {
> @@ -95,6 +93,6 @@ bool uaddi_overflow_p (uint32_t a, uint32_t res)
>
>  /* { dg-final { scan-assembler-times "add.f\\s\+"   7 } } */
>  /* { dg-final { scan-assembler-times "mov\.nv\\s\+" 4 } } */
> -/* { dg-final { scan-assembler-times "mov\.hs\\s\+" 2 } } */
> +/* { dg-final { scan-assembler-times "rlc\\s\+" 2 } } */
>  /* { dg-final { scan-assembler-times "seths\\s\+"   2 } } */
>  /* { dg-final { scan-assembler-not   "cmp" } } */
> --
> 2.37.1
>


Re: [PATCH 1/2] arc: Add commutative multiplication patterns.

2025-03-19 Thread Claudiu Zissulescu Ianculescu
LGTM, I'll merge it once stage 1 is open.

Cheers,
Claudiu

On Tue, Mar 18, 2025 at 6:22 PM Luis Silva  wrote:
>
> This patch introduces two new instruction patterns:
>
> `*mulsi3_cmp0`:  This pattern performs a multiplication
> and sets the CC_Z register based on the result, while
> also storing the result of the multiplication in a
> general-purpose register.
>
> `*mulsi3_cmp0_noout`:  This pattern performs a
> multiplication and sets the CC_Z register based on the
> result without storing the result in a general-purpose
> register.
>
> These patterns are optimized to generate code using the `mpy.f`
> instruction, specifically used where the result is compared to zero.
>
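An illustrative caller (not the test case from the patch): a product that
is only tested against zero, so the flag-setting mpy.f can replace a
separate multiply-then-compare sequence.

/* With *mulsi3_cmp0, the multiply and the == 0 test should combine
   into a single mpy.f that sets the Z flag.  */
int product_is_zero (int a, int b)
{
  return (a * b) == 0;
}
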
> In addition, the previous commutative multiplication implementation
> was removed: it incorrectly took the negative flag into account.
> This new implementation only considers the zero flag.
>
> A test case has been added to verify the correctness of these
> changes.
>
> gcc/ChangeLog:
>
> * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication
> results compared against zero, selecting CC_Zmode.
> * config/arc/arc.md (*mulsi3_cmp0): New define_insn.
> (*mulsi3_cmp0_noout): New define_insn.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/mult-cmp0.c: New test.
>
> Signed-off-by: Luis Silva 
> ---
>  gcc/config/arc/arc.cc|  7 +++
>  gcc/config/arc/arc.md| 34 ++--
>  gcc/testsuite/gcc.target/arc/mult-cmp0.c | 66 
>  3 files changed, 103 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/mult-cmp0.c
>
> diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> index e3d53576768..8ad5649adc0 100644
> --- a/gcc/config/arc/arc.cc
> +++ b/gcc/config/arc/arc.cc
> @@ -1555,6 +1555,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x, rtx y)
>machine_mode mode = GET_MODE (x);
>rtx x1;
>
> +  /* Matches all instructions which can do .f and clobbers only Z flag.  */
> +  if (GET_MODE_CLASS (mode) == MODE_INT
> +  && y == const0_rtx
> +  && GET_CODE (x) == MULT
> +  && (op == EQ || op == NE))
> +return CC_Zmode;
> +
>/* For an operation that sets the condition codes as a side-effect, the
>   C and V flags is not set as for cmp, so we can only use comparisons 
> where
>   this doesn't matter.  (For LT and GE we can use "mi" and "pl"
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 49dfc9d35af..bc2e8fadd91 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -253,7 +253,7 @@
> simd_vcompare, simd_vpermute, simd_vpack, simd_vpack_with_acc,
> simd_valign, simd_valign_with_acc, simd_vcontrol,
> simd_vspecial_3cycle, simd_vspecial_4cycle, simd_dma, mul16_em, div_rem,
> -   fpu, fpu_fuse, fpu_sdiv, fpu_ddiv, fpu_cvt, block"
> +   fpu, fpu_fuse, fpu_sdiv, fpu_ddiv, fpu_cvt, block, mpy"
>(cond [(eq_attr "is_sfunc" "yes")
>  (cond [(match_test "!TARGET_LONG_CALLS_SET && (!TARGET_MEDIUM_CALLS 
> || GET_CODE (PATTERN (insn)) != COND_EXEC)") (const_string "call")
> (match_test "flag_pic") (const_string "sfunc")]
> @@ -1068,11 +1068,37 @@ archs4x, archs4xd"
> (set_attr "cond" "set_zn")
> (set_attr "length" "*,4,4,4,8")])
>
> -;; The next two patterns are for plos, ior, xor, and, and mult.
> +(define_insn "*mulsi3_cmp0"
> +  [(set (reg:CC_Z CC_REG)
> +   (compare:CC_Z
> +(mult:SI
> + (match_operand:SI 1 "register_operand"  "%r,0,r")
> + (match_operand:SI 2 "nonmemory_operand" "rL,I,i"))
> +(const_int 0)))
> +   (set (match_operand:SI 0 "register_operand""=r,r,r")
> +   (mult:SI (match_dup 1) (match_dup 2)))]
> + "TARGET_MPY"
> + "mpy%?.f\\t%0,%1,%2"
> + [(set_attr "length" "4,4,8")
> +  (set_attr "type" "mpy")])
> +
> +(define_insn "*mulsi3_cmp0_noout"
> +  [(set (reg:CC_Z CC_REG)
> +   (compare:CC_Z
> +(mult:SI
> + (match_operand:SI 0 "register_operand"   "%r,r,r")
> + (match_operand:SI 1 "nonmemory_operand"  "rL,I,i"))
> +(const_int 0)))]
> + "TARGET_MPY"
> + "mpy%?.f\\t0,%0,%1"
> + [(set_attr "length" "4,4,8")
> +  (set_attr "type" "mpy")])
> +
> +;; The next two patterns are for plus, ior, xor, and.
>  (define_insn "*commutative_binary_cmp0_noout"
>[(set (match_operand 0 "cc_set_register" "")
> (match_operator 4 "zn_compare_operator"
> - [(match_operator:SI 3 "commutative_operator"
> + [(match_operator:SI 3 "commutative_operator_sans_mult"
>  [(match_operand:SI 1 "register_operand" "%r,r")
>   (match_operand:SI 2 "nonmemory_operand" "rL,Cal")])
>(const_int 0)]))]
> @@ -1085,7 +,7 @@ archs4x, archs4xd"
>  (define_insn "*commutative_binary_cmp0"
>[(set (match_operand 3 "cc_set_register" "")
> (match_operator 5 "zn_compare_operator"
> - [(match_operator:SI 4 "commutative_opera

Re: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()

2025-03-19 Thread Claudiu Zissulescu Ianculescu
LGTM,

Cheers,
Claudiu

On Tue, Mar 18, 2025 at 6:23 PM Luis Silva  wrote:
>
> This patch handles both signed and unsigned
> builtin multiplication overflow.
>
> Uses the "mpy.f" instruction to set the condition
> codes based on the result.  In the event of an
> overflow, the V flag is set, triggering a
> conditional move depending on the V flag status.
>
> For example, set "1" to "r0" in case of overflow:
>
> mov_s   r0,1
> mpy.f   r0,r0,r1
> j_s.d   [blink]
> mov.nv  r0,0
>
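For reference, a small caller of the builtin (illustrative only; the
function names are made up) that should map onto the mpy.f sequence above:

#include <stdbool.h>
#include <stdint.h>

/* Signed 32-bit multiply; returns true if a * b overflowed.  */
bool smul_overflows (int32_t a, int32_t b, int32_t *res)
{
  return __builtin_mul_overflow (a, b, res);
}

/* Unsigned variant, expected to use the same mpy.f-based sequence.  */
bool umul_overflows (uint32_t a, uint32_t b, uint32_t *res)
{
  return __builtin_mul_overflow (a, b, res);
}
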
> gcc/ChangeLog:
>
> * config/arc/arc.md (mulvsi4): New define_expand.
> (mulsi3_Vcmp): New define_insn.
>
> Signed-off-by: Luis Silva 
> ---
>  gcc/config/arc/arc.md | 33 +
>  1 file changed, 33 insertions(+)
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index bc2e8fadd91..dd245d1813c 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -842,6 +842,9 @@ archs4x, archs4xd"
>  ; Optab prefix for sign/zero-extending operations
>  (define_code_attr su_optab [(sign_extend "") (zero_extend "u")])
>
> +;; Code iterator for sign/zero extension
> +(define_code_iterator ANY_EXTEND [sign_extend zero_extend])
> +
>  (define_insn "*xt_cmp0_noout"
>[(set (match_operand 0 "cc_set_register" "")
> (compare:CC_ZN (SEZ:SI (match_operand:SQH 1 "register_operand" "r"))
> @@ -1068,6 +1071,36 @@ archs4x, archs4xd"
> (set_attr "cond" "set_zn")
> (set_attr "length" "*,4,4,4,8")])
>
> +(define_expand "mulvsi4"
> +  [(ANY_EXTEND:DI (match_operand:SI 0 "register_operand"))
> +   (ANY_EXTEND:DI (match_operand:SI 1 "register_operand"))
> +   (ANY_EXTEND:DI (match_operand:SI 2 "register_operand"))
> +   (label_ref (match_operand 3 "" ""))]
> +  "TARGET_MPY"
> +  {
> +emit_insn (gen_mulsi3_Vcmp (operands[0], operands[1],
> + operands[2]));
> +arc_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
> +DONE;
> +  })
> +
> +(define_insn "mulsi3_Vcmp"
> +  [(parallel
> +[(set
> +  (reg:CC_V CC_REG)
> +  (compare:CC_V
> +   (mult:DI
> +   (ANY_EXTEND:DI (match_operand:SI 1 "register_operand"  "%0,r,r,r"))
> +   (ANY_EXTEND:DI (match_operand:SI 2 "nonmemory_operand"  "I,L,r,C32")))
> +   (ANY_EXTEND:DI (mult:SI (match_dup 1) (match_dup 2)
> + (set (match_operand:SI 0 "register_operand"  "=r,r,r,r")
> + (mult:SI (match_dup 1) (match_dup 2)))])]
> +  "register_operand (operands[1], SImode)
> +   || register_operand (operands[2], SImode)"
> +  "mpy.f\\t%0,%1,%2"
> +  [(set_attr "length" "4,4,4,8")
> +   (set_attr "type"   "mpy")])
> +
>  (define_insn "*mulsi3_cmp0"
>[(set (reg:CC_Z CC_REG)
> (compare:CC_Z
> --
> 2.37.1
>


Re: [PATCH] arc: testsuite: Scan "rlc" instead of "mov.hs".

2025-04-24 Thread Claudiu Zissulescu Ianculescu
Hi Jeff,

There is one patch missing; I'll add it to the mainline as soon as it is
open for commits.

Best,
Claudiu

On Fri, Apr 18, 2025 at 12:10 AM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:23 AM, Luis Silva wrote:
> > Due to the patch by Roger Sayle,
> > 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which
> > introduces the use of the `rlc rX,0` instruction in place
> > of the `mov.hs`, the add overflow test case needs to be
> > updated.  The previous test case was validating the `mov.hs`
> > instruction, but now it must validate the `rlc` instruction
> > as the new behavior.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc.
> I don't see any test named "overflow-1.c" in the arc subdirectory?!?
>
> Is it possible that's a change in your local repo?
>
> jeff


Re: [PATCH 1/2] arc: Add commutative multiplication patterns.

2025-04-24 Thread Claudiu Zissulescu Ianculescu
Hi Jeff,

Indeed, Luis should have been using "umulti". The other attributes are
not required. I'll fix it before pushing to the mainline.

Thanks,
Claudiu

On Fri, Apr 18, 2025 at 8:41 PM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:22 AM, Luis Silva wrote:
> > This patch introduces two new instruction patterns:
> >
> >  `*mulsi3_cmp0`:  This pattern performs a multiplication
> >  and sets the CC_Z register based on the result, while
> >  also storing the result of the multiplication in a
> >  general-purpose register.
> >
> >  `*mulsi3_cmp0_noout`:  This pattern performs a
> >  multiplication and sets the CC_Z register based on the
> >  result without storing the result in a general-purpose
> >  register.
> >
> > These patterns are optimized to generate code using the `mpy.f`
> > instruction, specifically used where the result is compared to zero.
> >
> > In addition, the previous commutative multiplication implementation
> > was removed.  It incorrectly took into account the negative flag,
> > which is wrong.  This new implementation only considers the zero
> > flag.
> >
> > A test case has been added to verify the correctness of these
> > changes.
> >
> > gcc/ChangeLog:
> >
> >  * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication
> >  results compared against zero, selecting CC_Zmode.
> >  * config/arc/arc.md (*mulsi3_cmp0): New define_insn.
> >  (*mulsi3_cmp0_noout): New define_insn.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/arc/mult-cmp0.c: New test.
> So I'm not well versed in the ARC port, but a couple questions.
>
> First your new patterns use a new type "mpy".  Do you want/need to add
> that to the pipeline descriptions?  It would seem advisable to do so.
>
> Do the new patterns need to set "cond" and "predicable" attributes?
>
> Jeff


Fwd: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()

2025-04-24 Thread Claudiu Zissulescu Ianculescu
Adding missing email addresses.

-- Forwarded message -
From: Claudiu Zissulescu Ianculescu 
Date: Thu, Apr 24, 2025 at 8:48 PM
Subject: Re: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()
To: Jeff Law 


Hi Jeff,

The other attributes are not required, as the pattern doesn't allow it
to be used in predicated execution.  Thus, the default values for
the missing attributes are OK.

Best,
Claudiu

On Fri, Apr 18, 2025 at 8:43 PM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:23 AM, Luis Silva wrote:
> > This patch handles both signed and unsigned
> > builtin multiplication overflow.
> >
> > Uses the "mpy.f" instruction to set the condition
> > codes based on the result.  In the event of an
> > overflow, the V flag is set, triggering a
> > conditional move depending on the V flag status.
> >
> > For example, set "1" to "r0" in case of overflow:
> >
> >   mov_s   r0,1
> >   mpy.f   r0,r0,r1
> >   j_s.d   [blink]
> >   mov.nv  r0,0
> >
> > gcc/ChangeLog:
> >
> >  * config/arc/arc.md (mulvsi4): New define_expand.
> >  (mulsi3_Vcmp): New define_insn.
> So similar to your other patch, there are other attributes (cond and
> predicable) that you may need to set.  I just don't know the port well
> enough to judge that.
>
> jeff
>


Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Hi Thomas,

This change breaks compilation of the ARC backend:

> +  gcc_assert (in_shutdown || ob);

in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined,
while gcc_assert is outside of any ifdef. Please can you revisit this
line and change it accordingly.
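
One possible shape for the fix (an assumption on my side, not necessarily
the actual follow-up patch) is to keep the in_shutdown reference under the
same conditional that defines it:

/* Hypothetical sketch: only reference in_shutdown when it exists.  */
#ifdef ATOMIC_FDE_FAST_PATH
  gcc_assert (in_shutdown || ob);
#else
  gcc_assert (ob);
#endif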

Thanks,
Claudiu


Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Thanks, I haven't observed it.

Waiting for it,
Claudiu

On Mon, Sep 26, 2022 at 2:49 PM Thomas Neumann  wrote:
>
> Hi Claudiu,
>
> > This change prohibits compiling of ARC backend:
> >
> >> +  gcc_assert (in_shutdown || ob);
> >
> > in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined,
> > while gcc_assert is outside of any ifdef. Please can you revisit this
> > line and change it accordingly.
>
> I have a patch ready, I am waiting for someone to approve my patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602130.html
>
> Best
>
> Thomas


Re: [committed] arc: Fail conditional move expand patterns

2022-02-28 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Hi Robin,

I don't know how I missed your arc related patch, I'll bootstrap and test
your patch asap.

Thanks,
Claudiu


On Fri, Feb 25, 2022 at 3:29 PM Robin Dapp  wrote:

> > If the movcc comparison is not valid it triggers an assert in the
> > current implementation.  This behavior is not needed as we can FAIL
> > the movcc expand pattern.
>
> In case of a MODE_CC comparison you can also just return it as described
> here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154
>
> or here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590639.html
>
> If there already is a "CC comparison" the backend does not need to
> create one and ifcvt can make use of this, creating better sequences.
>
> Regards
>  Robin
>


Re: [PATCH] arc: Fix for new ifcvt behavior [PR104154]

2022-02-28 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Hi Robin,

The patch looks good. Please go ahead and merge it; let me know if
you cannot.

Thank you,
Claudiu

On Mon, Feb 21, 2022 at 9:57 AM Robin Dapp via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi,
>
> I figured I'd just go ahead and post this patch as well since it seems
> to have fixed the arc build problems.
>
> It would be nice if someone could bootstrap/regtest if Jeff hasn't
> already done so.  I was able to verify that the two testcases attached
> to the PR build cleanly but not much more.  Thank you.
>
> Regards
>  Robin
>
> --
>
> PR104154
>
> gcc/ChangeLog:
>
> * config/arc/arc.cc (gen_compare_reg):  Return the CC-mode
> comparison ifcvt passed us.
>
> ---
>
> From fa98a40abd55e3a10653f6a8c5b2414a2025103b Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Mon, 7 Feb 2022 08:39:41 +0100
> Subject: [PATCH] arc: Fix for new ifcvt behavior [PR104154]
>
> ifcvt now passes a CC-mode "comparison" to backends.  This patch
> simply returns from gen_compare_reg () in that case since nothing
> needs to be prepared anymore.
>
> PR104154
>
> gcc/ChangeLog:
>
> * config/arc/arc.cc (gen_compare_reg):  Return the CC-mode
> comparison ifcvt passed us.
> ---
>  gcc/config/arc/arc.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> index 8cc173519ab..5e40ec2c04d 100644
> --- a/gcc/config/arc/arc.cc
> +++ b/gcc/config/arc/arc.cc
> @@ -2254,6 +2254,12 @@ gen_compare_reg (rtx comparison, machine_mode omode)
>
>
>cmode = GET_MODE (x);
> +
> +  /* If ifcvt passed us a MODE_CC comparison we can
> + just return it.  It should be in the proper form already.   */
> +  if (GET_MODE_CLASS (cmode) == MODE_CC)
> +return comparison;
> +
>if (cmode == VOIDmode)
>  cmode = GET_MODE (y);
>gcc_assert (cmode == SImode || cmode == SFmode || cmode == DFmode);
> --
> 2.31.1
>
>


Re: [PATCH 1/2] ARC: Use intrinsics for __builtin_add_overflow*()

2023-09-07 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Ok.

Thank you for your contribution,
Claudiu

On Wed, Sep 6, 2023 at 3:50 PM Shahab Vahedi  wrote:
>
> This patch covers signed and unsigned additions.  The generated code
> would be something along these lines:
>
> signed:
>   add.f   r0, r1, r2
>   b.v @label
>
> unsigned:
>   add.f   r0, r1, r2
>   b.c @label
>
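Illustrative callers (not the patch's overflow-1.c test; overflow_handler
is a made-up placeholder) whose overflow checks feed a branch, the shape
the new addvsi4/uaddvsi4 expanders turn into add.f followed by b.v or b.c:

#include <stdint.h>

extern void overflow_handler (void);

/* Signed addition: overflow is expected to become add.f + b.v.  */
int32_t checked_add (int32_t a, int32_t b)
{
  int32_t res;
  if (__builtin_add_overflow (a, b, &res))
    overflow_handler ();
  return res;
}

/* Unsigned addition: expected to become add.f + b.c instead.  */
uint32_t checked_uadd (uint32_t a, uint32_t b)
{
  uint32_t res;
  if (__builtin_add_overflow (a, b, &res))
    overflow_handler ();
  return res;
}
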
> gcc/ChangeLog:
>
> * config/arc/arc-modes.def: Add CC_V mode.
> * config/arc/predicates.md (proper_comparison_operator): Handle
> E_CC_Vmode.
> (equality_comparison_operator): Exclude CC_Vmode from eq/ne.
> (cc_set_register): Handle CC_Vmode.
> (cc_use_register): Likewise.
> * config/arc/arc.md (addsi3_v): New insn.
> (addvsi4): New expand.
> (addsi3_c): New insn.
> (uaddvsi4): New expand.
> * config/arc/arc-protos.h (arc_gen_unlikely_cbranch): New.
> * config/arc/arc.cc (arc_gen_unlikely_cbranch): New.
> (get_arc_condition_code): Handle E_CC_Vmode.
> (arc_init_reg_tables): Handle CC_Vmode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/overflow-1.c: New.
>
> Signed-off-by: Shahab Vahedi 
> ---
>  gcc/config/arc/arc-modes.def  |   1 +
>  gcc/config/arc/arc-protos.h   |   1 +
>  gcc/config/arc/arc.cc |  26 +-
>  gcc/config/arc/arc.md |  49 +++
>  gcc/config/arc/predicates.md  |  14 ++-
>  gcc/testsuite/gcc.target/arc/overflow-1.c | 100 ++
>  6 files changed, 187 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/overflow-1.c
>
> diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
> index 763e880317d..69eeec5935a 100644
> --- a/gcc/config/arc/arc-modes.def
> +++ b/gcc/config/arc/arc-modes.def
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  CC_MODE (CC_ZN);
>  CC_MODE (CC_Z);
> +CC_MODE (CC_V);
>  CC_MODE (CC_C);
>  CC_MODE (CC_FP_GT);
>  CC_MODE (CC_FP_GE);
> diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
> index 4f2db7ffb59..bc78fb0b370 100644
> --- a/gcc/config/arc/arc-protos.h
> +++ b/gcc/config/arc/arc-protos.h
> @@ -50,6 +50,7 @@ extern bool arc_check_mov_const (HOST_WIDE_INT );
>  extern bool arc_split_mov_const (rtx *);
>  extern bool arc_can_use_return_insn (void);
>  extern bool arc_split_move_p (rtx *);
> +extern void arc_gen_unlikely_cbranch (enum rtx_code, machine_mode, rtx);
>  #endif /* RTX_CODE */
>
>  extern bool arc_ccfsm_branch_deleted_p (void);
> diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> index f8c9bf17e2c..ec93d40aeb9 100644
> --- a/gcc/config/arc/arc.cc
> +++ b/gcc/config/arc/arc.cc
> @@ -1538,6 +1538,13 @@ get_arc_condition_code (rtx comparison)
> case GEU : return ARC_CC_NC;
> default : gcc_unreachable ();
> }
> +case E_CC_Vmode:
> +  switch (GET_CODE (comparison))
> +   {
> +   case EQ : return ARC_CC_NV;
> +   case NE : return ARC_CC_V;
> +   default : gcc_unreachable ();
> +   }
>  case E_CC_FP_GTmode:
>if (TARGET_ARGONAUT_SET && TARGET_SPFP)
> switch (GET_CODE (comparison))
> @@ -1868,7 +1875,7 @@ arc_init_reg_tables (void)
>   /* mode_class hasn't been initialized yet for EXTRA_CC_MODES, so
>  we must explicitly check for them here.  */
>   if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode
> - || i == (int) CC_Cmode
> + || i == (int) CC_Cmode || i == (int) CC_Vmode
>   || i == CC_FP_GTmode || i == CC_FP_GEmode || i == CC_FP_ORDmode
>   || i == CC_FPUmode || i == CC_FPUEmode || i == CC_FPU_UNEQmode)
> arc_mode_class[i] = 1 << (int) C_MODE;
> @@ -11852,6 +11859,23 @@ arc_libm_function_max_error (unsigned cfn, 
> machine_mode mode,
>return default_libm_function_max_error (cfn, mode, boundary_p);
>  }
>
> +/* Generate RTL for conditional branch with rtx comparison CODE in mode
> +   CC_MODE.  */
> +
> +void
> +arc_gen_unlikely_cbranch (enum rtx_code cmp, machine_mode cc_mode, rtx label)
> +{
> +  rtx cc_reg, x;
> +
> +  cc_reg = gen_rtx_REG (cc_mode, CC_REG);
> +  label = gen_rtx_LABEL_REF (VOIDmode, label);
> +
> +  x = gen_rtx_fmt_ee (cmp, VOIDmode, cc_reg, const0_rtx);
> +  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, label, pc_rtx);
> +
> +  emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +}
> +
>  #undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
>  #define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index d37ecbf4292..9d011f6b4a9 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -2725,6 +2725,55 @@ archs4x, archs4xd"
>   }
>")
>
> +(define_insn "addsi3_v"
> + [(set (match_operand:SI 0 "register_operand"  "=r,r,r,  r")
> +   (plus:SI (match_operand:SI 1 "register_operand"   "r,r,0,  r")
> +   (match_operand:SI 2 "nonm

Re: [PATCH 2/2] ARC: Use intrinsics for __builtin_sub_overflow*()

2023-09-07 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
OK,

Thank you for your contribution,
Claudiu

On Wed, Sep 6, 2023 at 3:50 PM Shahab Vahedi  wrote:
>
> This patch covers signed and unsigned subtractions.  The generated code
> would be something along these lines:
>
> signed:
>   sub.f   r0, r1, r2
>   b.v @label
>
> unsigned:
>   sub.f   r0, r1, r2
>   b.c @label
>
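> For illustration, a caller along these lines (function names made up) goes
> through the new subvsi4/usubvsi4 expanders; the overflow check becomes the
> sub.f plus the conditional branch shown above:
> 
>   #include <stdint.h>
> 
>   /* Hypothetical example: clamp to zero when the subtraction overflows.  */
>   int32_t ssub_or_zero (int32_t a, int32_t b)
>   {
>     int32_t res;
>     if (__builtin_sub_overflow (a, b, &res))   /* sub.f ... ; b.v ...  */
>       return 0;
>     return res;
>   }
> 
>   uint32_t usub_or_zero (uint32_t a, uint32_t b)
>   {
>     uint32_t res;
>     if (__builtin_sub_overflow (a, b, &res))   /* sub.f ... ; b.c ...  */
>       return 0;
>     return res;
>   }
> 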
> gcc/ChangeLog:
>
> * config/arc/arc.md (subsi3_v): New insn.
> (subvsi4): New expand.
> (subsi3_c): New insn.
> (usubvsi4): New expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/overflow-2.c: New.
>
> Signed-off-by: Shahab Vahedi 
> ---
>  gcc/config/arc/arc.md | 48 +++
>  gcc/testsuite/gcc.target/arc/overflow-2.c | 97 +++
>  2 files changed, 145 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arc/overflow-2.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 9d011f6b4a9..34e9e1a7f1d 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -2973,6 +2973,54 @@ archs4x, archs4xd"
>(set_attr "cpu_facility" "*,cd,*,*,*,*,*,*,*,*")
>])
>
> +(define_insn "subsi3_v"
> +  [(set (match_operand:SI  0 "register_operand"  "=r,r,r,  r")
> +   (minus:SI (match_operand:SI 1 "register_operand"   "r,r,0,  r")
> + (match_operand:SI 2 "nonmemory_operand"  "r,L,I,C32")))
> +   (set (reg:CC_V CC_REG)
> +   (compare:CC_V (sign_extend:DI (minus:SI (match_dup 1)
> +   (match_dup 2)))
> + (minus:DI (sign_extend:DI (match_dup 1))
> +   (sign_extend:DI (match_dup 2)]
> +   ""
> +   "sub.f\\t%0,%1,%2"
> +   [(set_attr "cond"   "set")
> +(set_attr "type"   "compare")
> +(set_attr "length" "4,4,4,8")])
> +
> +(define_expand "subvsi4"
> + [(match_operand:SI 0 "register_operand")
> +  (match_operand:SI 1 "register_operand")
> +  (match_operand:SI 2 "nonmemory_operand")
> +  (label_ref (match_operand 3 "" ""))]
> +  ""
> +  "emit_insn (gen_subsi3_v (operands[0], operands[1], operands[2]));
> +   arc_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
> +   DONE;")
> +
> +(define_insn "subsi3_c"
> +  [(set (match_operand:SI  0 "register_operand"  "=r,r,r,  r")
> +   (minus:SI (match_operand:SI 1 "register_operand"   "r,r,0,  r")
> + (match_operand:SI 2 "nonmemory_operand"  "r,L,I,C32")))
> +   (set (reg:CC_C CC_REG)
> +   (compare:CC_C (match_dup 1)
> + (match_dup 2)))]
> +   ""
> +   "sub.f\\t%0,%1,%2"
> +   [(set_attr "cond"   "set")
> +(set_attr "type"   "compare")
> +(set_attr "length" "4,4,4,8")])
> +
> +(define_expand "usubvsi4"
> +  [(match_operand:SI 0 "register_operand")
> +   (match_operand:SI 1 "register_operand")
> +   (match_operand:SI 2 "nonmemory_operand")
> +   (label_ref (match_operand 3 "" ""))]
> +   ""
> +   "emit_insn (gen_subsi3_c (operands[0], operands[1], operands[2]));
> +arc_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
> +DONE;")
> +
>  (define_expand "subdi3"
>[(set (match_operand:DI 0 "register_operand" "")
> (minus:DI (match_operand:DI 1 "register_operand" "")
> diff --git a/gcc/testsuite/gcc.target/arc/overflow-2.c b/gcc/testsuite/gcc.target/arc/overflow-2.c
> new file mode 100644
> index 000..b4de8c03b22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/overflow-2.c
> @@ -0,0 +1,97 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +#include 
> +#include 
> +
> +/*
> + * sub.f  r0,r0,r1
> + * st_s   r0,[r2]
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool sub_overflow (int32_t a, int32_t b, int32_t *res)
> +{
> +  return __builtin_sub_overflow (a, b, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,-1234
> + * st_s   r0,[r1]
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool subi_overflow (int32_t a, int32_t *res)
> +{
> +  return __builtin_sub_overflow (a, -1234, res);
> +}
> +
> +/*
> + * sub.f  r3,r0,r1
> + * st_s   r3,[r2]
> + * j_s.d  [blink]
> + * setlo  r0,r0,r1
> + */
> +bool usub_overflow (uint32_t a, uint32_t b, uint32_t *res)
> +{
> +  return __builtin_sub_overflow (a, b, res);
> +}
> +
> +/*
> + * sub.f  r2,r0,4321
> + * seths  r0,4320,r0
> + * j_s.d  [blink]
> + * st_s   r2,[r1]
> + */
> +bool usubi_overflow (uint32_t a, uint32_t *res)
> +{
> +  return __builtin_sub_overflow (a, 4321, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,r1
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool sub_overflow_p (int32_t a, int32_t b, int32_t res)
> +{
> +  return __builtin_sub_overflow_p (a, b, res);
> +}
> +
> +/*
> + * sub.f  r0,r0,-1000
> + * mov_s  r0,1
> + * j_s.d  [blink]
> + * mov.nv r0,0
> + */
> +bool subi_overflow_p (int32_t a, int32_t res)
> +{
> +  return __builtin_sub_overflow_p (a, -1000, res);
> +}
> +
> +/*
> + * j_s.d  [blink]
> + * setlo  r0,r0,r1
> + */
> +bool usub_overflow_p (uint32_t a, uint32_t b, uint32_t res)
> +{
> + 

Re: [PATCH] [ARC] Allow more ABIs in GLIBC_DYNAMIC_LINKER

2020-03-31 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Pushed.

Thank you,
Claudiu

On Sun, Mar 29, 2020 at 2:05 AM Vineet Gupta via Gcc-patches
 wrote:
>
> Enable the big-endian suffixed dynamic linker per glibc multi-ABI support.
>
> And to avoid future churn and version-pairing hassles, also allow
> arc700, although glibc for ARC currently doesn't support it.
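>
> For example, with this spec -mbig-endian selects /lib/ld-linux-arceb.so.2,
> -mcpu=arc700 selects /lib/ld-linux-arc700.so.2, and the combination selects
> /lib/ld-linux-arceb700.so.2.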
>
> gcc/
> -xx-xx  Vineet Gupta 
> +
> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/ChangeLog  | 4 
>  gcc/config/arc/linux.h | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 86ad683a6cb0..c26a748fd51b 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,7 @@
> +2020-03-28  Vineet Gupta 
> +
> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> +
>  2020-03-28  Jakub Jelinek  
>
> PR c/93573
> diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
> index 0b99da3fcdaf..1bbeccee7115 100644
> --- a/gcc/config/arc/linux.h
> +++ b/gcc/config/arc/linux.h
> @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
>  }  \
>while (0)
>
> -#define GLIBC_DYNAMIC_LINKER   "/lib/ld-linux-arc.so.2"
> +#define GLIBC_DYNAMIC_LINKER   "/lib/ld-linux-arc%{mbig-endian:eb}%{mcpu=arc700:700}.so.2"
>  #define UCLIBC_DYNAMIC_LINKER  "/lib/ld-uClibc.so.0"
>
>  /* Note that the default is to link against dynamic libraries, if they are
> --
> 2.20.1
>


Re: [PATCH] [ARC] Allow more ABIs in GLIBC_DYNAMIC_LINKER

2020-04-10 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Done.

Thank you for your support,
Claudiu

On Thu, Apr 9, 2020 at 2:38 AM Vineet Gupta  wrote:
>
> Hi Claudiu,
>
> For glibc's needs, can this be backported to gcc-9 please?
>
> Thx,
> -Vineet
>
> On 3/31/20 3:06 AM, Claudiu Zissulescu Ianculescu wrote:
> > Pushed.
> >
> > Thank you,
> > Claudiu
> >
> > On Sun, Mar 29, 2020 at 2:05 AM Vineet Gupta via Gcc-patches
> >  wrote:
> >> Enable the big-endian suffixed dynamic linker per glibc multi-ABI support.
> >>
> >> And to avoid future churn and version-pairing hassles, also allow
> >> arc700, although glibc for ARC currently doesn't support it.
> >>
> >> gcc/
> >> -xx-xx  Vineet Gupta 
> >> +
> >> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> >>
> >> Signed-off-by: Vineet Gupta 
> >> ---
> >>  gcc/ChangeLog  | 4 
> >>  gcc/config/arc/linux.h | 2 +-
> >>  2 files changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> >> index 86ad683a6cb0..c26a748fd51b 100644
> >> --- a/gcc/ChangeLog
> >> +++ b/gcc/ChangeLog
> >> @@ -1,3 +1,7 @@
> >> +2020-03-28  Vineet Gupta 
> >> +
> >> +   * config/arc/linux.h: GLIBC_DYNAMIC_LINKER support BE/arc700
> >> +
> >>  2020-03-28  Jakub Jelinek  
> >>
> >> PR c/93573
> >> diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
> >> index 0b99da3fcdaf..1bbeccee7115 100644
> >> --- a/gcc/config/arc/linux.h
> >> +++ b/gcc/config/arc/linux.h
> >> @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
> >>  }  \
> >>while (0)
> >>
> >> -#define GLIBC_DYNAMIC_LINKER   "/lib/ld-linux-arc.so.2"
> >> +#define GLIBC_DYNAMIC_LINKER   "/lib/ld-linux-arc%{mbig-endian:eb}%{mcpu=arc700:700}.so.2"
> >>  #define UCLIBC_DYNAMIC_LINKER  "/lib/ld-uClibc.so.0"
> >>
> >>  /* Note that the default is to link against dynamic libraries, if they are
> >> --
> >> 2.20.1
> >>
> > ___
> > linux-snps-arc mailing list
> > linux-snps-...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-snps-arc
>


Re: [PATCH] arc: Use separate predicated patterns for mpyd(u)

2020-10-23 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Gentle PING.

On Wed, Oct 7, 2020 at 12:39 PM Claudiu Zissulescu  wrote:
>
> From: Claudiu Zissulescu 
>
> The compiler can match mpyd.eq r0,r1,r0 as a predicated instruction,
> which is incorrect.  The mpyd(u) instruction takes two 32-bit
> registers as input and returns its result in a 64-bit even-odd
> register pair.  For the predicated case, the ARC instruction decoder
> expects the destination register to be the same as the first input
> register.  In the big-endian case the result is swapped within the
> destination register pair; the instruction encoding, however, remains
> the same.  Rework the mpyd(u) patterns to take the above into account.
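> 
> For illustration, widening 32x32->64 multiplies like the sketch below
> (function names made up) are what map to mpyd/mpydu, with the result
> landing in an even-odd register pair:
> 
>   /* Hypothetical sketch of source that maps to mpyd / mpydu.  */
>   long long smul64 (int a, int b)
>   {
>     return (long long) a * b;
>   }
> 
>   unsigned long long umul64 (unsigned int a, unsigned int b)
>   {
>     return (unsigned long long) a * b;
>   }
> 
> The predicated forms (e.g. mpyd.eq) are therefore only emitted when the
> destination pair starts at the first source register, which the new
> *pmpyd patterns enforce by tying operand 1 to operand 0.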
>
> May I have permission to apply this patch to the master, gcc10 and gcc9 branches?
>
> Cheers,
> Claudiu
>
> -xx-xx  Claudiu Zissulescu  
>
> * testsuite/gcc.target/arc/pmpyd.c: New test.
> * testsuite/gcc.target/arc/tmac-1.c: Update.
> * config/arc/arc.md (mpyd_arcv2hs): New template
> pattern.
> (*pmpyd_arcv2hs): Likewise.
> (*pmpyd_imm_arcv2hs): Likewise.
> (mpyd_arcv2hs): Moved into above template.
> (mpyd_imm_arcv2hs): Moved into above template.
> (mpydu_arcv2hs): Likewise.
> (mpydu_imm_arcv2hs): Likewise.
> (su_optab): New optab prefix for sign/zero-extending operations.
>
> Signed-off-by: Claudiu Zissulescu 
> ---
>  gcc/config/arc/arc.md | 101 +-
>  gcc/testsuite/gcc.target/arc/pmpyd.c  |  15 
>  gcc/testsuite/gcc.target/arc/tmac-1.c |   2 +-
>  3 files changed, 67 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/pmpyd.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 1720e8cd2f6f..d4d9f59a3eac 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -894,6 +894,8 @@ archs4x, archs4xd"
>
>  (define_code_iterator SEZ [sign_extend zero_extend])
>  (define_code_attr SEZ_prefix [(sign_extend "sex") (zero_extend "ext")])
> +; Optab prefix for sign/zero-extending operations
> +(define_code_attr su_optab [(sign_extend "") (zero_extend "u")])
>
>  (define_insn "*xt_cmp0_noout"
>[(set (match_operand 0 "cc_set_register" "")
> @@ -6436,66 +6438,65 @@ archs4x, archs4xd"
> (set_attr "predicable" "no")
> (set_attr "cond" "nocond")])
>
> -(define_insn "mpyd_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand"
> "=Rcr, r")
> -   (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand"  "  
> 0, c"))
> -(sign_extend:DI (match_operand:SI 2 "register_operand"  "  
> c, c"
> +(define_insn "mpyd_arcv2hs"
> +  [(set (match_operand:DI 0 "even_register_operand"   "=r")
> +   (mult:DI (SEZ:DI (match_operand:SI 1 "register_operand" "r"))
> +(SEZ:DI (match_operand:SI 2 "register_operand" "r"
> (set (reg:DI ARCV2_ACC)
> (mult:DI
> - (sign_extend:DI (match_dup 1))
> - (sign_extend:DI (match_dup 2]
> + (SEZ:DI (match_dup 1))
> + (SEZ:DI (match_dup 2]
>"TARGET_PLUS_MACD"
> -  "mpyd%? %0,%1,%2"
> -  [(set_attr "length" "4,4")
> -  (set_attr "iscompact" "false")
> -  (set_attr "type" "multi")
> -  (set_attr "predicable" "yes,no")
> -  (set_attr "cond" "canuse,nocond")])
> -
> -(define_insn "mpyd_imm_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand"
> "=Rcr, r,r,Rcr,  r")
> -   (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand"  "  
> 0, c,0,  0,  c"))
> -(match_operand 2   "immediate_operand"  "  
> L, L,I,Cal,Cal")))
> +  "mpyd%?\\t%0,%1,%2"
> +  [(set_attr "length" "4")
> +   (set_attr "iscompact" "false")
> +   (set_attr "type" "multi")
> +   (set_attr "predicable" "no")])
> +
> +(define_insn "*pmpyd_arcv2hs"
> +  [(set (match_operand:DI 0 "even_register_operand" "=r")
> +   (mult:DI
> +(SEZ:DI (match_operand:SI 1 "even_register_operand" "%0"))
> +(SEZ:DI (match_operand:SI 2 "register_operand"  "r"
> (set (reg:DI ARCV2_ACC)
> -   (mult:DI (sign_extend:DI (match_dup 1))
> -(match_dup 2)))]
> +   (mult:DI
> + (SEZ:DI (match_dup 1))
> + (SEZ:DI (match_dup 2]
>"TARGET_PLUS_MACD"
> -  "mpyd%? %0,%1,%2"
> -  [(set_attr "length" "4,4,4,8,8")
> -  (set_attr "iscompact" "false")
> -  (set_attr "type" "multi")
> -  (set_attr "predicable" "yes,no,no,yes,no")
> -  (set_attr "cond" "canuse,nocond,nocond,canuse_limm,nocond")])
> -
> -(define_insn "mpydu_arcv2hs"
> -  [(set (match_operand:DI 0 "even_register_operand"
> "=Rcr, r")
> -   (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand"  "  
> 0, c"))
> -(zero_extend:DI (match_operand:SI 2 "register_operand" "   
> c, c"
> +  "mpyd%?\\t%0,%1,%2"
> +  [(set_attr "length" "4")
> +   (set_attr "iscompact" "false")
> +   (set_attr "type" "multi")
> +   (set_attr "predicable

Re: [PATCH] arc: Improve/add instruction patterns to better use MAC instructions.

2020-10-23 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Gentle PING.

On Fri, Oct 9, 2020 at 5:24 PM Claudiu Zissulescu  wrote:
>
> From: Claudiu Zissulescu 
>
> ARC MYP7+ configurations add MAC instructions for vector and scalar data
> types.  This patch adds a madd pattern for 16-bit data that uses the
> 32-bit MAC instruction, and dot_prod patterns for the v4hi vector
> type.  The 64-bit moves are also improved by using the vadd2 instruction.
>
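> For illustration, loops along these lines (function names made up) are what
> the new dot_prod patterns target once the vectorizer runs; the scalar
> reduction can likewise use the maddhisi4/umaddhisi4 forms:
> 
>   /* Hypothetical 16-bit dot-product kernels.  */
>   int sdot (const short *a, const short *b, int n)
>   {
>     int acc = 0;
>     for (int i = 0; i < n; i++)
>       acc += a[i] * b[i];
>     return acc;
>   }
> 
>   unsigned int udot (const unsigned short *a, const unsigned short *b, int n)
>   {
>     unsigned int acc = 0;
>     for (int i = 0; i < n; i++)
>       acc += (unsigned int) a[i] * b[i];
>     return acc;
>   }
> 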
> gcc/
> -xx-xx  Claudiu Zissulescu  
>
> * config/arc/arc.c (arc_split_move): Recognize vadd2 instructions.
> * config/arc/arc.md (movdi_insn): Update pattern to use vadd2
> instructions.
> (movdf_insn): Likewise.
> (maddhisi4): New pattern.
> (umaddhisi4): Likewise.
> * config/arc/simdext.md (mov_int): Update pattern to use
> vadd2.
> (sdot_prodv4hi): New pattern.
> (udot_prodv4hi): Likewise.
> (arc_vec_mac_hi_v4hi): Update/renamed to
> arc_vec_mac_v2hiv2si.
> (arc_vec_mac_v2hiv2si_zero): New pattern.
>
> Signed-off-by: Claudiu Zissulescu 
> ---
>  gcc/config/arc/arc.c  |  8 
>  gcc/config/arc/arc.md | 71 ---
>  gcc/config/arc/constraints.md |  5 ++
>  gcc/config/arc/simdext.md | 90 +++
>  4 files changed, 147 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index ec55cfde87a9..d5b521e75e67 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -10202,6 +10202,14 @@ arc_split_move (rtx *operands)
>return;
>  }
>
> +  if (TARGET_PLUS_QMACW
> +  && even_register_operand (operands[0], mode)
> +  && even_register_operand (operands[1], mode))
> +{
> +  emit_move_insn (operands[0], operands[1]);
> +  return;
> +}
> +
>if (TARGET_PLUS_QMACW
>&& GET_CODE (operands[1]) == CONST_VECTOR)
>  {
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index f9fc11e51a85..1720e8cd2f6f 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -1345,8 +1345,8 @@ archs4x, archs4xd"
>")
>
>  (define_insn_and_split "*movdi_insn"
> -  [(set (match_operand:DI 0 "move_dest_operand"  "=w, w,r,   m")
> -   (match_operand:DI 1 "move_double_src_operand" "c,Hi,m,cCm3"))]
> +  [(set (match_operand:DI 0 "move_dest_operand"  "=r, r,r,   m")
> +   (match_operand:DI 1 "move_double_src_operand" "r,Hi,m,rCm3"))]
>"register_operand (operands[0], DImode)
> || register_operand (operands[1], DImode)
> || (satisfies_constraint_Cm3 (operands[1])
> @@ -1358,6 +1358,13 @@ archs4x, archs4xd"
>  default:
>return \"#\";
>
> +case 0:
> +if (TARGET_PLUS_QMACW
> +   && even_register_operand (operands[0], DImode)
> +   && even_register_operand (operands[1], DImode))
> +  return \"vadd2\\t%0,%1,0\";
> +return \"#\";
> +
>  case 2:
>  if (TARGET_LL64
>  && memory_operand (operands[1], DImode)
> @@ -1374,7 +1381,7 @@ archs4x, archs4xd"
>  return \"#\";
>  }
>  }"
> -  "reload_completed"
> +  "&& reload_completed"
>[(const_int 0)]
>{
> arc_split_move (operands);
> @@ -1420,15 +1427,24 @@ archs4x, archs4xd"
>"if (prepare_move_operands (operands, DFmode)) DONE;")
>
>  (define_insn_and_split "*movdf_insn"
> -  [(set (match_operand:DF 0 "move_dest_operand"  "=D,r,c,c,r,m")
> -   (match_operand:DF 1 "move_double_src_operand" "r,D,c,E,m,c"))]
> -  "register_operand (operands[0], DFmode) || register_operand (operands[1], 
> DFmode)"
> +  [(set (match_operand:DF 0 "move_dest_operand"  "=D,r,r,r,r,m")
> +   (match_operand:DF 1 "move_double_src_operand" "r,D,r,E,m,r"))]
> +  "register_operand (operands[0], DFmode)
> +   || register_operand (operands[1], DFmode)"
>"*
>  {
>   switch (which_alternative)
> {
>  default:
>return \"#\";
> +
> +case 2:
> +if (TARGET_PLUS_QMACW
> +   && even_register_operand (operands[0], DFmode)
> +   && even_register_operand (operands[1], DFmode))
> +  return \"vadd2\\t%0,%1,0\";
> +return \"#\";
> +
>  case 4:
>  if (TARGET_LL64
> && ((even_register_operand (operands[0], DFmode)
> @@ -6177,6 +6193,49 @@ archs4x, archs4xd"
>[(set_attr "length" "0")])
>
>  ;; MAC and DMPY instructions
> +
> +; Use MAC instruction to emulate 16bit mac.
> +(define_expand "maddhisi4"
> +  [(match_operand:SI 0 "register_operand" "")
> +   (match_operand:HI 1 "register_operand" "")
> +   (match_operand:HI 2 "extend_operand"   "")
> +   (match_operand:SI 3 "register_operand" "")]
> +  "TARGET_PLUS_DMPY"
> +  "{
> +   rtx acc_reg = gen_rtx_REG (DImode, ACC_REG_FIRST);
> +   rtx tmp1 = gen_reg_rtx (SImode);
> +   rtx tmp2 = gen_reg_rtx (SImode);
> +   rtx accl = gen_lowpart (SImode, acc_reg);
> +
> +   emit_move_insn (accl, operands[3]);
> +   emit_insn (gen_rtx_SET (tmp1, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
> +   emit_insn (gen_rtx_SET (tmp2, gen_rtx_SIGN_EXTEND (SImode, operand

Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

2021-06-14 Thread Claudiu Zissulescu Ianculescu via Gcc-patches
Thanks a lot guys. Patch is pushed.

//Claudiu

On Mon, Jun 14, 2021 at 12:34 AM Jeff Law  wrote:
>
>
>
> On 6/13/2021 4:06 AM, Bernhard Reutner-Fischer wrote:
> > On Fri, 11 Jun 2021 14:25:24 +0300
> > Claudiu Zissulescu  wrote:
> >
> >> Hi Bernhard,
> >>
> >> Please find attached my latest patch, it includes (hopefully) all your
> >> feedback.
> >>
> >> Thank you for comments,
> > Concise and clean, I wouldn't know what to remove. LGTM.
> > thanks for your patience!
> Then let's consider it approved at this point.  Thanks for chiming in
> Bernhard and thanks for implementing the suggestions Claudiu!
>
> jeff