from:"Oleg Endo"

[SH][committed] Fix PR 101469

2023-07-13 Thread Oleg Endo

Hi,

The attached patch fixes PR 101469.
Tested by the original reporter Rin Okuyama on NetBSD with GCC 10.5.
Applied to master, GCC 11, GCC 12, GCC 13 after 'make all' sanity check.

Cheers,
Oleg


gcc/ChangeLog:

PR target/101469
* config/sh/sh.md (peephole2): Handle case where eliminated reg
is also used by the address of the following memory
operand.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 4622dba0121..76e7774cef3 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -10680,6 +10680,45 @@
&& peep2_reg_dead_p (2, operands[1]) && peep2_reg_dead_p (3, operands[0])"
   [(const_int 0)]
 {
+  if (MEM_P (operands[3]) && reg_overlap_mentioned_p (operands[0], operands[3]))
+    {
+      // Take care when the eliminated operand[0] register is part of
+      // the destination memory address.
+      rtx addr = XEXP (operands[3], 0);
+
+      if (REG_P (addr))
+        operands[3] = replace_equiv_address (operands[3], operands[1]);
+
+      else if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0))
+               && CONST_INT_P (XEXP (addr, 1))
+               && REGNO (operands[0]) == REGNO (XEXP (addr, 0)))
+        operands[3] = replace_equiv_address (operands[3],
+                        gen_rtx_PLUS (SImode, operands[1], XEXP (addr, 1)));
+
+      else if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0))
+               && REG_P (XEXP (addr, 1)))
+        {
+          // register + register address  @(R0, Rn)
+          // can change only the Rn in the address, not R0.
+          if (REGNO (operands[0]) == REGNO (XEXP (addr, 0))
+              && REGNO (XEXP (addr, 0)) != 0)
+            {
+              operands[3] = replace_equiv_address (operands[3],
+                              gen_rtx_PLUS (SImode, operands[1], XEXP (addr, 1)));
+            }
+          else if (REGNO (operands[0]) == REGNO (XEXP (addr, 1))
+                   && REGNO (XEXP (addr, 1)) != 0)
+            {
+              operands[3] = replace_equiv_address (operands[3],
+                              gen_rtx_PLUS (SImode, XEXP (addr, 0), operands[1]));
+            }
+          else
+            FAIL;
+        }
+      else
+        FAIL;
+    }
+
   emit_insn (gen_addsi3 (operands[1], operands[1], operands[2]));
   sh_peephole_emit_move_insn (operands[3], operands[1]);
 })

[SH][committed] Fix PR 101177

2023-10-20 Thread Oleg Endo

The attached patch fixes PR 101177.

Committed to master, cherry-picked to GCC-13, GCC-12 and GCC-11.
Sanity tested with 'make all-gcc'.

Cheers,
Oleg

gcc/ChangeLog:

PR target/101177
* config/sh/sh.md (unnamed split pattern): Fix comparison of
find_regno_note result.

From 3ce4e99303d01d348229cca22bf8d3dd63004e01 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Fri, 20 Oct 2023 18:48:34 +0900
Subject: [PATCH] SH: Fix PR 101177

Fix accidentally inverted comparison.

gcc/ChangeLog:

	PR target/101177
	* config/sh/sh.md (unnamed split pattern): Fix comparison of
	find_regno_note result.
---
 gcc/config/sh/sh.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 76e7774..93374c6 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -841,9 +841,9 @@
   rtx reg = operands[0];
   if (SUBREG_P (reg))
     reg = SUBREG_REG (reg);
   gcc_assert (REG_P (reg));
-  if (find_regno_note (curr_insn, REG_DEAD, REGNO (reg)) != NULL_RTX)
+  if (find_regno_note (curr_insn, REG_DEAD, REGNO (reg)) == NULL_RTX)
     FAIL;
 
   /* FIXME: Maybe also search the predecessor basic blocks to catch
      more cases.  */
--
libgit2 1.3.2

[SH][committed] Fix PR 111001

2023-10-23 Thread Oleg Endo

The attached patch fixes PR 111001.

Committed to master, cherry-picked to GCC-13, GCC-12 and GCC-11.
Sanity tested with 'make all-gcc'.
Bootstrapped on GCC-13 sh4-linux by Adrian.

Cheers,
Oleg

gcc/ChangeLog:

PR target/111001
* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
Skip over nop move insns.

From 4414818f4e5de54ea3c353e2ebb2e79a89ae211b Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Mon, 23 Oct 2023 22:08:37 +0900
Subject: [PATCH] SH: Fix PR 111001

gcc/ChangeLog:

	PR target/111001
	* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
	Skip over nop move insns.
---
 gcc/config/sh/sh_treg_combine.cc | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh_treg_combine.cc b/gcc/config/sh/sh_treg_combine.cc
index f6553c0..685ca54 100644
--- a/gcc/config/sh/sh_treg_combine.cc
+++ b/gcc/config/sh/sh_treg_combine.cc
@@ -731,9 +731,16 @@ sh_treg_combine::record_set_of_reg (rtx reg, rtx_insn *start_insn,
           new_entry.cstore_type = cstore_inverted;
         }
       else if (REG_P (new_entry.cstore.set_src ()))
         {
-          // If it's a reg-reg copy follow the copied reg.
+          // If it's a reg-reg copy follow the copied reg, but ignore
+          // nop copies of the reg onto itself.
+          if (REGNO (new_entry.cstore.set_src ()) == REGNO (reg))
+            {
+              i = prev_nonnote_nondebug_insn_bb (i);
+              continue;
+            }
+
           new_entry.cstore_reg_reg_copies.push_back (new_entry.cstore);
           reg = new_entry.cstore.set_src ();
           i = new_entry.cstore.insn;
 
--
libgit2 1.3.2

Re: RFA: crc builtin functions & optimizations

2022-03-14 Thread Oleg Endo

On Mon, 2022-03-14 at 18:04 -0700, Andrew Pinski via Gcc-patches wrote:
> On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
>  wrote:
> > 
> > Most microprocessors have efficient ways to perform CRC operations, be
> > that with lookup tables, rotates, or even special instructions.
> > However, because we lack a representation for CRC in the compiler, we
> > can't do proper instruction selection.  With this patch I seek out to
> > rectify this,
> > I've avoided using a mode name for the built-in functions because that
> > would tie the semantics to the size of the addressable unit.  We
> > generally use abbreviations like s/l/ll for type names, which is all
> > right when the type can be widened without changing semantics.  For
> > the data input, however, we also have to consider the shift count that
> > is tied to it.  That is why I used a number to designate the width of
> > the data input and shift.
> > 
> > For machine support, I made a start with 8 and 16 bit little-endian
> > CRC for RISCV using a
> > lookup table.  I am sure once we have the basic infrastructure in the
> > tree, we'll get more
> > contributions of suitable named patterns for various ports.
> 
> 
> A few points.
> There are at least 9 different polynomials for the CRC-8 in common use today.
> For CRC-32 there are 5 different polynomials used.
> You don't have a patch to invoke.texi adding the descriptions of the builtins.
> How is your polynom 3rd argument described? Is it similar to how it is
> done on the wiki for the CRC?
> Does it make sense to have to list the most common polynomials in the
> documentation?
> 
> Also I am sorry but micro-optimizing coremarks is just wrong. Maybe it
> is better to pick the CRC32 that is inside zip instead for a testcase
> and benchmarking against?
> Or even the CRC32C for iSCSI/ext4.
> 
> I see you also don't optimize the case where you have three other
> variants of polynomials that are reversed, reciprocal and reversed
> reciprocal.

In my own CRC library I've got ~30 'commonly used' CRC types, based on
the following generic definition:

template <
  // number of crc result bits (polynomial order in bits)
  unsigned int BitCount,

  // normal polynomial without the leading 1 bit.
  typename crc_impl_detail::select_int::type TruncPoly,

  // initial remainder
  typename crc_impl_detail::select_int::type InitRem = 0,

  // final xor value
  typename crc_impl_detail::select_int::type FinalXor = 0,

  // input data byte reflected before processing (LSB / MSB first)
  bool ReflectInput = false,

  // output CRC reflected before the xor
  bool ReflectRemainder = false >
class crc
{
...
};


and then it goes like ...

// CRC-1 (most hardware; also known as parity bit)
// x + 1
typedef crc < 1, 0x01 > crc_1;

// CRC-3
typedef crc < 3, 0x03, 0x07, 0x00, true, true> crc_3;

...

// CRC-32 (ISO 3309, ANSI X3.66, FIPS PUB 71, FED-STD-1003, ITU-T V.42,
// Ethernet, SATA, MPEG-2, Gzip, PKZIP, POSIX cksum, PNG, ZMODEM)
// x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 +
// x^4 + x^2 + x + 1
typedef crc < 32, 0x04C11DB7, 0x, 0x, true, true > crc_32;

typedef crc < 32, 0x04C11DB7, 0x7FFF, 0x7FFF, false, false > crc_32_mpeg2;
typedef crc < 32, 0x04C11db7, 0x, 0x, false, false > crc_32_posix;

...


It then generates the lookup tables at compile time into .rodata for
the types that are used in the program, which is great for MCUs with
more flash/ROM than RAM.
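
A minimal sketch of the compile-time table idea, not the library described above.  It assumes C++14 or later and covers only the reflected (LSB-first) 32-bit case; the names crc32_table, make_crc32_table, crc32_tab and crc32 are made up for illustration.  0xEDB88320 is the polynomial 0x04C11DB7 with its 32 bits reversed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only; none of these names come from the
// library described in the mail.
struct crc32_table
{
  std::uint32_t entry[256];
};

// Build the byte-wise lookup table for a reflected (LSB-first) CRC.
// 'reflected_poly' is the bit-reversed truncated polynomial.
constexpr crc32_table
make_crc32_table (std::uint32_t reflected_poly)
{
  crc32_table t = {};
  for (std::uint32_t i = 0; i < 256; ++i)
    {
      std::uint32_t r = i;
      for (int bit = 0; bit < 8; ++bit)
        r = (r & 1u) ? (r >> 1) ^ reflected_poly : (r >> 1);
      t.entry[i] = r;
    }
  return t;
}

// The table is computed by the compiler.  A constexpr object is const,
// so a typical toolchain can place it in a read-only section.
constexpr crc32_table crc32_tab = make_crc32_table (0xEDB88320u);

static_assert (crc32_tab.entry[1] == 0x77073096u, "table sanity check");

// CRC-32 as used by Ethernet/gzip/PNG: initial remainder and final xor
// are both all-ones, input and output are reflected.
inline std::uint32_t
crc32 (const unsigned char *data, std::size_t len)
{
  assert (data != nullptr || len == 0);
  std::uint32_t rem = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    rem = (rem >> 8) ^ crc32_tab.entry[(rem ^ data[i]) & 0xFFu];
  return rem ^ 0xFFFFFFFFu;
}
```

Only tables that are actually instantiated end up in the binary, which is the property the mail relies on for small MCUs.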

Specific CRC types can be overridden if the system has a better way to
calculate the CRC, e.g. as hardware peripheral.

This being a library makes it relatively easy to tune and customize for
various systems.

How would that work together with your proposal?

Cheers,
Oleg

Re: [PATCH] sh-linux fix target cpu

2022-01-30 Thread Oleg Endo

On Fri, 2022-01-28 at 15:18 -0700, Jeff Law via Gcc-patches wrote:
> 
> On 1/12/2022 2:02 AM, Yoshinori Sato wrote:
> > sh-linux does not support any SH1 or SH2a little-endian.
> > Add exceptions for them.
> > 
> > gcc/ChangeLog:
> > 
> > * config/sh/t-linux (MULTILIB_EXCEPTIONS): Add m1, mb/m1 and m2a.
> Thanks.  Technically this is probably too late to make gcc-12 as we're 
> in stage4 (regression fixes only).  But it was posted during stage3 
> (general bugfixing) and is very very low risk.
> 
> I went ahead and committed it for you.
> 
> Thanks, and sorry for the delays.


Thanks, Jeff!

Cheers,
Oleg

Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-26 Thread Oleg Endo

On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
> On 10/25/19 11:39 AM, Craig Blackmore wrote:
> > This patch aims to allow more load/store instructions to be
> > compressed by
> > replacing a load/store of 'base register + large offset' with a new
> > load/store
> > of 'new base + small offset'. If the new base gets stored in a
> > compressed
> > register, then the new load/store can be compressed. Since there is
> > an overhead
> > in creating the new base, this change is only attempted when 'base
> > register' is
> > referenced in at least 4 load/stores in a basic block.
> > 
> > The optimization is implemented in a new RISC-V specific pass
> > called
> > shorten_memrefs which is enabled for RVC targets. It has been
> > developed for the
> > 32-bit lw/sw instructions but could also be extended to 64-bit
> > ld/sd in future.
> > 
> > Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc,
> > rv64imac,
> > rv64imafdc via QEMU. No regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config.gcc: Add riscv-shorten-memrefs.o to extra_objs for
> > riscv.
> > * config/riscv/riscv-passes.def: New file.
> > * config/riscv/riscv-protos.h (make_pass_shorten_memrefs):
> > Declare.
> > * config/riscv/riscv-shorten-memrefs.c: New file.
> > * config/riscv/riscv.c (tree-pass.h): New include.
> > (riscv_compressed_reg_p): New Function
> > (riscv_compressed_lw_offset_p): Likewise.
> > (riscv_compressed_lw_address_p): Likewise.
> > (riscv_shorten_lw_offset): Likewise.
> > (riscv_legitimize_address): Attempt to convert base +
> > large_offset
> > to compressible new_base + small_offset.
> > (riscv_address_cost): Make anticipated compressed load/stores
> > cheaper for code size than uncompressed load/stores.
> > (riscv_register_priority): Move compressed register check to
> > riscv_compressed_reg_p.
> > * config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET):
> > Define.
> > * config/riscv/riscv.opt (mshorten-memefs): New option.
> > * config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> > (PASSES_EXTRA): Add riscv-passes.def.
> > * doc/invoke.texi: Document -mshorten-memrefs.
> 
> This has traditionally been done via the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
> 
> The hook, IIRC, is called out of explow.c.
> 

This sounds like some of my addressing mode selection (AMS) attempts on
SH.  Haven't looked at the patch (sorry), but I'm sure the problem is
pretty much the same.

On SH legitimize_address is used to do ... "something" ... to the
address in order to make the displacement fit.  The issue is,
legitimize_address doesn't get any context so it can't even try to find
a local optimal base address or something like that.

Cheers,
Oleg

Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-26 Thread Oleg Endo

On Sat, 2019-10-26 at 14:04 -0600, Jeff Law wrote:
> On 10/26/19 1:33 PM, Andrew Waterman wrote:
> > I don't know enough to say whether the legitimize_address hook is
> > sufficient for the intended purpose, but I am sure that RISC-V's
> > concerns are not unique: other GCC targets have to cope with
> > offset-size constraints that are coupled to register-allocation
> > constraints.
> 
> Yup.  I think every risc port in the 90s faces this problem.  I
> always
> wished for a generic mechanism for ports to handle this problem.
> 
> Regardless, it's probably worth investigating.
> 

What we tried to do with the address mode selection (AMS) optimization
some time ago was the following:

  - Extract memory accesses from the insn stream and put them in
    "access sequences".  Also analyze the address expression and try
    to find effective base addresses by tracing back address
    calculations.

  - For each memory access, get a set of address mode alternatives and
    the corresponding costs from the backend.  The full context of each
    access is provided, so the backend can detect things like
    "in this access sequence, FP loads dominate" and use this
    information to tune the alternative costs.

  - Given the alternatives and costs for each memory access, the pass
    would then try to minimize the costs of the whole memory access
    sequence, taking costs of address modification insns into account.

I think this is quite generic, but of course also complex.  The
optimization problem itself is hard.  There was some research done by
others using CPLEX or PBQP solvers.  To keep things simple we used a
backtracking algorithm and handled only a limited set of scenarios.
For example, I think we could not get loop constructs to work nicely to
improve post-inc address mode utilization.
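
To make the backtracking idea concrete, here is a toy sketch.  It is not the AMS pass: it assumes a single base register, unsigned displacements in [0, max_disp], and a cost of 1 per insn.  The three alternatives per access and all names (toy_target, search, min_sequence_cost) are invented for illustration; in the real pass the alternatives and costs come from the backend.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Toy cost model, not the real AMS pass: one base register, unsigned
// displacements in [0, max_disp], every insn costs 1.
struct toy_target
{
  int max_disp;
};

// Try every alternative for access 'idx' and recurse; 'best' is the
// cheapest complete sequence found so far.
static int
search (const std::vector<int> &offsets, std::size_t idx, int base,
        const toy_target &t, int cost_so_far, int best)
{
  if (cost_so_far >= best)
    return best;                // prune: this branch cannot improve
  if (idx == offsets.size ())
    return cost_so_far;         // complete sequence, cheaper than 'best'

  const int target = offsets[idx];
  const int disp = target - base;

  // Alternative 1: reg+disp addressing, if the displacement fits.
  if (disp >= 0 && disp <= t.max_disp)
    best = search (offsets, idx + 1, base, t, cost_so_far + 1, best);

  // Alternative 2: add to the base so it points at the access (disp 0).
  best = search (offsets, idx + 1, target, t, cost_so_far + 2, best);

  // Alternative 3: add to the base so the access uses the largest
  // displacement, which keeps lower addresses reachable afterwards.
  best = search (offsets, idx + 1, target - t.max_disp, t,
                 cost_so_far + 2, best);
  return best;
}

// Minimal total cost for accessing 'offsets' in order, base starts at 0.
int
min_sequence_cost (const std::vector<int> &offsets, const toy_target &t)
{
  assert (t.max_disp >= 0);
  return search (offsets, 0, 0, t, 0, INT_MAX);
}
```

With max_disp = 60 and accesses at offsets 100, 40, 100, 40, always moving the base onto the access costs 6 insns, while the search finds 5 by first moving the base to 40.  The search is exponential in the sequence length, which is one reason to handle only a limited set of scenarios.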

The SH ISA has very similar properties to ARM thumb and RVC, and
perhaps others.  Advantages would not be limited to RISC only, even
CISC ISAs like M68K, RX, ... can benefit from it, as the "proper
optimization" can reduce the instruction sizes by shortening the
addresses in the instruction stream.

If anyone is interested, here is the AMS optimization pass class:

https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

It's using a different style to call back into the backend code.  Not
GCC's "hooks" but a delegate pattern.  The SH backend's delegate
implementation is here:

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11897

We were getting some improvements in the generated code, but it started
putting pressure on register allocation related issues in the SH
backend (the R0 problem), so we could not do more proper testing.

Maybe somebody can get some ideas out of it.

Cheers,
Oleg

Re: [PATCH][RFC] C++-style iterators for FOR_EACH_IMM_USE_STMT

2019-10-29 Thread Oleg Endo

On Tue, 2019-10-29 at 11:26 +0100, Richard Biener wrote:
> While I converted other iterators requiring special BREAK_FROM_XYZ
> a few years ago FOR_EACH_IMM_USE_STMT is remaining.  I've pondered
> a bit but cannot arrive at a "nice" solution here with just one
> iterator as the macros happen to use.  For reference, the macro use
> is
> 
>   imm_use_iterator iter;
>   gimple *use_stmt;
>   FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
> {
>   use_operand_p use_p;
>   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> ;
> }
> 
> which expands to (w/o macros)
> 
>imm_use_iterator iter; 
>for (gimple *use_stmt = first_imm_use_stmt (&iter, name);
> !end_imm_use_stmt_p (&iter);
> use_stmt = next_imm_use_stmt (&iter))
>  for (use_operand_p use_p = first_imm_use_on_stmt (&iter);
>   !end_imm_use_on_stmt_p (&iter);
>   use_p = next_imm_use_on_stmt (&iter))
>;
> 
> and my foolish C++ attempt results in
> 
>  for (imm_use_stmt_iter it = SSAVAR; !it.end_p (); ++it)
>for (imm_use_stmt_iter::use_on_stmt it2 = it; !it2.end_p ();
> ++it2)
>  ;
> 
> with *it providing the gimple * USE_STMT and *it2 the use_operand_p.
> The complication here is to map the two-level iteration to "the C++
> way".
> Are there any STL examples mimicking this?  Of course with C++11 we
> could
> do
> 
>   for (imm_use_stmt_iter it = SSAVAR; !it.end_p (); ++it)
> for (auto it2 = it.first_use_on_stmt (); !it2.end_p (); ++it2)
>   ;
> 
> but that's not much nicer either.

Is there a way to put it in such a way that the iterators follow
standard concepts for iterators?  It would increase chances of it
becoming nicer by utilizing range based for loops.

Cheers,
Oleg

Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-11-03 Thread Oleg Endo

On Fri, 2019-10-11 at 23:27 +0900, Oleg Endo wrote:
> On Thu, 2019-10-03 at 19:34 -0600, Jeff Law wrote:
> > 
> > So probably the most interesting target for this test is v850-elf
> > as
> > it's got a reasonably well functioning simulator, hard and soft FP
> > targets, little endian, and I'm familiar with its current set of
> > failures.
> > 
> > I can confirm that your patch makes no difference in the test
> > results
> > (which includes execution results).
> > 
> > In fact, there haven't been any problems on any target in my tester
> > that
> > I can tie back to this change.
> > 
> > At this point I'd say let's go for it.
> > 
> 
> Thanks, Jeff.  I'll commit it to trunk if there are no further
> objections some time next week.
> 

I've just committed it as r277752.

Personally I'd like to install it on GCC 8 and 9 branches as well.
Any thoughts on that?

Cheers,
Oleg

Re: [PATCH][RFC] C++-style iterators for FOR_EACH_IMM_USE_STMT

2019-11-03 Thread Oleg Endo

On Wed, 2019-10-30 at 10:27 +0100, Richard Biener wrote:
> 
> Hmm, not sure - I'd like to write
> 
>  for (gimple *use_stmt : imm_stmt_uses (SSAVAR))
>for (use_operand_p use_p :  from 
> above>)
>  ...
> 
> I don't see how that's possible.  It would need to be "awkward" like
> 
>  for (auto it : imm_stmt_uses (SSAVAR))
>{
>  gimple *use_stmt = *it;
>  for (use_operand_p use_p : it)
>...
>}
> 
> so the first loops iteration object are the actual iterator and you'd
> have to do extra indirection to get at the actual stmt you iterated
> to.
> 
> So I'd extend C++ (hah) to allow
> 
>   for (gimple *use_stmt : imm_stmt_uses (SSAVAR))
> for (use_operand_p use_p : auto)
>   ...
> 
> where 'auto' magically selects the next iterator object in scope
> [that matches].
> 
> ;)

Have you applied for a patent yet? :D

How about this one?

for (gimple* use_stmt : imm_stmt_uses (SSAVAR))
  for (use_operand_p use_p : imm_uses_on_stmt (*use_stmt))

... where helper function "imm_uses_on_stmt" returns a range object
that offers a begin and end function and its own iterator type.
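
A rough sketch of such a range object, assuming C++11.  The types toy_use and toy_stmt and the names toy_uses_on_stmt, uses_on_stmt and count_uses are stand-ins invented here; the real gimple and use_operand_p types and the immediate-use lists work differently.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins, not GCC's data structures.
struct toy_use { int operand_index; };
struct toy_stmt { std::vector<toy_use> uses; };

// Range object: all a range-based for loop needs is begin () and end ()
// returning something that supports !=, prefix ++ and unary *.
class toy_uses_on_stmt
{
public:
  class iterator
  {
  public:
    explicit iterator (toy_use *p) : m_p (p) {}
    toy_use *operator* () const { return m_p; }
    iterator &operator++ () { ++m_p; return *this; }
    bool operator!= (const iterator &o) const { return m_p != o.m_p; }

  private:
    toy_use *m_p;
  };

  explicit toy_uses_on_stmt (toy_stmt &s) : m_stmt (s) {}
  iterator begin () const { return iterator (m_stmt.uses.data ()); }
  iterator end () const
  { return iterator (m_stmt.uses.data () + m_stmt.uses.size ()); }

private:
  toy_stmt &m_stmt;
};

// Helper in the role of "imm_uses_on_stmt" from the mail.
inline toy_uses_on_stmt
uses_on_stmt (toy_stmt &s)
{
  return toy_uses_on_stmt (s);
}

// Nested iteration in the shape suggested above.
inline int
count_uses (std::vector<toy_stmt *> &stmts)
{
  int n = 0;
  for (toy_stmt *stmt : stmts)
    for (toy_use *use : uses_on_stmt (*stmt))
      {
        assert (use != nullptr);
        ++n;
      }
  return n;
}
```

The inner loop needs nothing from the outer iterator, only the statement it yielded, which is what lets both levels be plain range-based for loops.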


Another concept that could be interesting is filter iterators.

We used a simplistic re-implementation (C++03) to avoid dragging in
Boost when working on AMS:
https://github.com/erikvarga/gcc/blob/master/gcc/filter_iterator.h

Example uses are
https://github.com/erikvarga/gcc/blob/master/gcc/ams.h#L845
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc#L3715


I think there are also some places in RTL where filter iterators could
be used, e.g. "iterate over all MEMs in an RTL" could be made to look
something like this:

  for (auto&& i : filter_rtl (my_rtl_obj, MEM_P))
   ...


Anyway, maybe it can plant some ideas.
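
For the filter iterator idea mentioned above, a generic sketch assuming C++11.  It is unrelated to the filter_iterator.h linked earlier; filter_range, make_filter_range, toy_code, toy_mem_p and count_mems are hypothetical names, and a flat vector of codes stands in for walking an RTL expression.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// A range that only yields the elements accepted by a predicate.
template <typename Iter, typename Pred>
class filter_range
{
public:
  class iterator
  {
  public:
    iterator (Iter cur, Iter end, Pred pred)
      : m_cur (cur), m_end (end), m_pred (pred)
    { skip (); }

    typename std::iterator_traits<Iter>::reference
    operator* () const { return *m_cur; }
    iterator &operator++ () { ++m_cur; skip (); return *this; }
    bool operator!= (const iterator &o) const { return m_cur != o.m_cur; }

  private:
    // Advance to the next element accepted by the predicate, if any.
    void skip ()
    {
      while (m_cur != m_end && !m_pred (*m_cur))
        ++m_cur;
    }

    Iter m_cur, m_end;
    Pred m_pred;
  };

  filter_range (Iter b, Iter e, Pred p)
    : m_begin (b), m_end (e), m_pred (p) {}
  iterator begin () const { return iterator (m_begin, m_end, m_pred); }
  iterator end () const { return iterator (m_end, m_end, m_pred); }

private:
  Iter m_begin, m_end;
  Pred m_pred;
};

template <typename Container, typename Pred>
filter_range<typename Container::const_iterator, Pred>
make_filter_range (const Container &c, Pred p)
{
  return filter_range<typename Container::const_iterator, Pred>
    (c.begin (), c.end (), p);
}

// Toy stand-in for "iterate over all MEMs in an RTL".
enum toy_code { TOY_REG, TOY_MEM, TOY_CONST };
inline bool toy_mem_p (toy_code c) { return c == TOY_MEM; }

inline int
count_mems (const std::vector<toy_code> &codes)
{
  int n = 0;
  for (auto &&c : make_filter_range (codes, toy_mem_p))
    {
      assert (c == TOY_MEM);
      ++n;
    }
  return n;
}
```

The predicate is evaluated lazily while iterating, so no temporary container of matches is built.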

Cheers,
Oleg

Re: [PATCH 2/4] MSP430: Disable exception handling by default for C++

2019-11-07 Thread Oleg Endo

On Thu, 2019-11-07 at 21:37 +, Jozef Lawrynowicz wrote:
> The code size bloat added by building C++ programs using libraries containing
> support for exceptions is significant. When using simple constructs such as
> static variables, sometimes many kB from the libraries are unnecessarily
> pulled in.
> 
> So this patch disables exceptions by default for MSP430 when compiling for C++,
> by implicitly passing -fno-exceptions unless -fexceptions is passed.

It is extremely annoying when GCC's default standard behavior differs
across different targets.  And as a consequence, you have to add a load
of workarounds and disable other things, like fiddling with the
testsuite.  It's the same thing as setting "double = float" to get more
"speed" by default.

I would strongly advise against making such non-standard behaviors the
default in the vanilla compiler.  C++ normally has exceptions enabled.
If a user doesn't want them and is willing to deal with all the
consequences, then we already have a mechanism to do that:
 -fno-exceptions

Perhaps it's generally more useful to add a global configure option for
GCC to disable exception handling by default.  Then you can provide a
turn-key toolchain to your customers as well -- just add an option to
the configure line.

Cheers,
Oleg

Re: [PATCH 0/4][MSP430] Tweaks to default configuration to reduce code size

2019-11-08 Thread Oleg Endo

On Thu, 2019-11-07 at 21:31 +, Jozef Lawrynowicz wrote:
> When building small programs for MSP430, the impact of the unused
> functions pulled in from the CRT libraries is quite noticeable. Most of these
> relate to features that will never be used for MSP430 (Transactional memory,
> supporting shared objects and dynamic linking), or rarely used (exception
> handling).

There's a magic switch, which does the business, at least for me, most
of the time:

   -flto

If you're trying to bring down the executable size as much as possible,
but don't use -flto, I think something is wrong.

Cheers,
Oleg

Re: [PATCH 0/4][MSP430] Tweaks to default configuration to reduce code size

2019-11-08 Thread Oleg Endo

On Fri, 2019-11-08 at 13:27 +, Jozef Lawrynowicz wrote:
> 
> Yes, I should have used -flto in my examples. But it doesn't help remove these
> CRT library functions which are normally either directly added to the
> list of functions to run before main (via .init, .ctors or .init_array) or 
> used
> in functions which are themselves added to this list.
> 
> The unnecessary functions we want to remove are:
>   deregister_tm_clones
>   register_tm_clones
>   __do_global_dtors_aux
>   frame_dummy
> LTO can't remove any of them.
> 

Ah, right, good point.  That's not MSP430 specific actually.  For those
things I usually have custom init code, which also does other things
occasionally.  Stripping off global dtors is then an option in the
build system which takes care of it (in my case, I do it by modifying
the generated linker script).

But again, as with the exceptions, it might be better to implement
this kind of thing outside of the compiler, e.g. by building the app
with -nostartfiles -nodefaultlibs and providing your own substitutes.

Another option is to patch those things in using the OS part of the
target triplet.

Cheers,
Oleg

Re: Add a new combine pass

2019-12-04 Thread Oleg Endo

On Tue, 2019-12-03 at 12:05 -0600, Segher Boessenkool wrote:
> On Tue, Dec 03, 2019 at 10:33:48PM +0900, Oleg Endo wrote:
> > On Mon, 2019-11-25 at 16:47 -0600, Segher Boessenkool wrote:
> > > 
> > > > > - sh (that's sh4-linux):
> > > > > 
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c: In function 
> > > > > 'snmp_get_cpu_field':
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to 
> > > > > find a register to spill in class 'R0_REGS'
> > > > >  1638 | }
> > > > >   | ^
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the 
> > > > > insn:
> > > > > (insn 18 17 19 2 (set (reg:SI 0 r0)
> > > > > (mem:SI (plus:SI (reg:SI 4 r4 [178])
> > > > > (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> > > > > "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> > > > >  (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> > > > > (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> > > > > (nil
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier 
> > > > > errors, bailing out
> > > > 
> > > > Would have to look more at this one.  Seems odd that it can't allocate
> > > > R0 when it's already the destination and when R0 can't be live before
> > > > the insn.  But there again, this is reload, so my enthusiasm for
> > > > looking
> > > > is a bit limited :-)
> > > 
> > > It wants to use r0 in some other insn, so it needs to spill it here, but
> > > cannot.  This is what class_likely_spilled is for.
> > 
> > Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
> > explicitly, but it's got a R0_REGS class with only one said reg in it. 
> > So the default impl of class_likely_spilled should do its thing.
> 
> Yes, good point.  So what happened here?

"Something, somewhere, went terribly wrong"...

insn 18 wants to do

mov.l @(r4,r6),r0

But it can't because the reg+reg address mode has a R0 constraint
itself.  So it needs to be changed to

mov   r4,r0
mov.l @(r0,r6),r0

And it can't handle that.  Or only sometimes?  Don't remember.


>   Is it just RA messing things
> up, unrelated to the new pass?
> 

Yep, I think so.  The additional pass seems to create "tougher" code so
reload passes out earlier than usual.  We've had the same issue when
trying address mode selection optimization.  In fact that was one huge
showstopper.

Cheers,
Oleg

Re: Add a new combine pass

2019-12-06 Thread Oleg Endo

On Fri, 2019-12-06 at 16:51 -0600, Segher Boessenkool wrote:
> On Wed, Dec 04, 2019 at 07:43:30PM +0900, Oleg Endo wrote:
> > On Tue, 2019-12-03 at 12:05 -0600, Segher Boessenkool wrote:
> > > > Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
> > > > explicitly, but it's got a R0_REGS class with only one said reg in it. 
> > > > So the default impl of class_likely_spilled should do its thing.
> > > 
> > > Yes, good point.  So what happened here?
> > 
> > "Something, somewhere, went terribly wrong"...
> > 
> > insn 18 wants to do
> > 
> > mov.l @(r4,r6),r0
> > 
> > But it can't because the reg+reg address mode has a R0 constraint
> > itself.  So it needs to be changed to
> > 
> > mov   r4,r0
> > mov.l @(r0,r6),r0
> > 
> > And it can't handle that.  Or only sometimes?  Don't remember.
> > 
> > >   Is it just RA messing things
> > > up, unrelated to the new pass?
> > 
> > Yep, I think so.  The additional pass seems to create "tougher" code so
> > reload passes out earlier than usual.  We've had the same issue when
> > trying address mode selection optimization.  In fact that was one huge
> > showstopper.
> 
> So maybe you should have a define_insn_and_split that allows any two
> regs and replaces one by r0 if neither is (and a move to r0 before the
> load)?  Split after reload of course.
> 
> It may be admitting defeat, but it may even result in better code as
> well ;-)
> 

AFAIR I've tried that already and it was just like running in circles.
That is, it didn't help.  Perhaps if R0_REGS was hidden from RA altogether
it might work.  But that sounds like opening a whole other can of
worms.  Another idea I was entertaining was to do a custom RTL pass to
pre-allocate all R0 constraints before the real full RA.  But then the
whole reload stuff would still have the same issue as above.  So all
the wallpapering is just moot.  A proper fix of the actual problem would
be more appropriate.

Cheers,
Oleg

Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-07 Thread Oleg Endo

On Tue, 2019-11-26 at 07:38 +0100, ste...@franke.ms wrote:
> > On 11/21/19 10:30 AM, ste...@franke.ms wrote:
> > > Hi there,
> > > 
> > > here is mc68k's patch to switch the m68k architecture over to ccmode and
> > > lra. See https://github.com/mc68kghost/gcc 68k-ccmode branch.
> > 
> > Bernd Schmidt posted a conversion of the m68k port to ccmode a couple
> > weeks before yours.  We've already ACK'd it for installing onto the trunk.
> > 
> > Jeff
> 
> To be honest:
> - 8 days is hardly "a couple weeks before"
> - ccmode is not the same as ccmode+lra
> 
> The paperwork for contributing to fsf is on the way and the repo at
> https://github.com/mc68kghost/gcc got an update. Tests are not yet at 100%
> (master branch fails too many tests) but it's closer to master branch now.
> The code is to 50% identical, a fair amount has swapped cmp/bcc, few are a
> tad worse and some yield surprisingly better code.
> 

You can still submit patches for further improvements, like adding
support for LRA.  Now that the main CCmode conversion is on trunk and
has been confirmed and tested, it should be much easier for you to
pinpoint problems in your changes.

Cheers,
Oleg

Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo

On Fri, 2019-12-13 at 05:03 -0600, Segher Boessenkool wrote:
> On Thu, Dec 12, 2019 at 09:32:27AM +, Richard Sandiford wrote:
> > I doubt it will be long before we deprecate
> > all targets that require old reload.)
> 
> Do we wait until GCC 12 (to remove old reload completely)?  If not, we
> should deprecate it now.
> 

Segher, could you please re-run your tests on SH with -mlra as
mentioned here?
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00133.html

I'm thinking of making -mlra the default on SH.

Cheers,
Oleg

Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo

On Fri, 2019-12-13 at 08:09 -0600, Segher Boessenkool wrote:
> On Fri, Dec 13, 2019 at 10:06:20PM +0900, Oleg Endo wrote:
> > On Fri, 2019-12-13 at 05:03 -0600, Segher Boessenkool wrote:
> > > On Thu, Dec 12, 2019 at 09:32:27AM +, Richard Sandiford
> > > wrote:
> > > > I doubt it will be long before we deprecate
> > > > all targets that require old reload.)
> > > 
> > > Do we wait until GCC 12 (to remove old reload completely)?  If
> > > not, we
> > > should deprecate it now.
> > > 
> > 
> > Segher, could you please re-run your tests on SH with -mlra as
> > mentioned here?
> > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00133.html
> > 
> > I'm thinking to make -mlra the default on SH.
> 
> With LRA, sh builds fine (with the combine2 patches).  I have no idea
> if correct code is generated, but it doesn't ICE anymore.
> 

Great, thanks for checking.  I'll try to run some more tests.

Cheers,
Oleg

Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo

On Fri, 2019-12-13 at 15:57 +0100, John Paul Adrian Glaubitz wrote:
> Hello Segher!
> 
> > With LRA, sh builds fine (with the combine2 patches).  I have no idea
> > if correct code is generated, but it doesn't ICE anymore.
> 
> What are the combine2 patches?

See the other thread that I've linked in my message.

>  And I would support switching SH to LRA as
> there are a few cases (Debian packages) where GCC fails with an internal
> compiler error which I reported to the GCC bugzilla.

Have you tried rebuilding debian on/for SH with -mlra enabled for
*everything*?  Do you have an easy way of doing that?  It would be
interesting to see how it goes.

Cheers,
Oleg

Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo

On Fri, 2019-12-13 at 16:09 +0100, John Paul Adrian Glaubitz wrote:
> Hi!
> 
> On 12/13/19 4:06 PM, Oleg Endo wrote:
> > > What are the combine2 patches?
> > 
> > See the other thread that I've linked in my message.
> 
> I don't see any patch there.

You'd have to crawl up the discussion or so.
And I think there were a couple of versions.  Anyway, I don't think it
made it into trunk yet. 

> > 
> > Have you tried rebuilding debian on/for SH with -mlra enabled for
> > *everything*?  Do you have an easy way of doing that?  It would be
> > interesting to see how it goes.
> 
> Yes, that would be possible. We would have to enable -mlra in gcc by
> default and then trigger a rebuild for 10.000 source packages. But
> that would take a while to finish.
> 

Better start now then :)
No hurry though.

Cheers,
Oleg

Re: [PATCH 10/11] sh: Update unexpected empty split condition

2021-06-01 Thread Oleg Endo

On Wed, 2021-06-02 at 00:05 -0500, Kewen Lin wrote:
> gcc/ChangeLog:
> 
>   * config/sh/sh.md (doloop_end_split): Fix empty split condition.
> ---
>  gcc/config/sh/sh.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index e3af9ae21c1..93ee7c9a7de 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -6424,7 +6424,7 @@ (define_insn_and_split "doloop_end_split"
> (clobber (reg:SI T_REG))]
>"TARGET_SH2"
>"#"
> -  ""
> +  "&& 1"
>[(parallel [(set (reg:SI T_REG)
>  (eq:SI (match_dup 2) (const_int 1)))
> (set (match_dup 0) (plus:SI (match_dup 2) (const_int -1)))])

This is OK (obvious).

Cheers,
Oleg

[committed][SH] Fix 101737

2024-03-02 Thread Oleg Endo

Hi,

The attached patch should fix PR 101737.  It's a rather obvious oversight. 
Sanity tested with 'make all-gcc'.  Committed to master, gcc-13, gcc-12,
gcc-11.

Cheers,
Oleg


gcc/ChangeLog:
PR target/101737
* config/sh/sh.cc (sh_is_nott_insn): Handle case where the input
is not an insn, but e.g. a code label.
From 4ff8ffe7331cf174668cf5c729fd68ff327ab014 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Sun, 3 Mar 2024 14:58:58 +0900
Subject: [PATCH] SH: Fix 101737

gcc/ChangeLog:
	PR target/101737
	* config/sh/sh.cc (sh_is_nott_insn): Handle case where the input
	is not an insn, but e.g. a code label.
---
 gcc/config/sh/sh.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 2c4..ef3c2e6 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -11766,9 +11766,10 @@ sh_insn_operands_modified_between_p (rtx_insn* operands_insn,
negates the T bit and stores the result in the T bit.  */
 bool
 sh_is_nott_insn (const rtx_insn* i)
 {
-  return i != NULL && GET_CODE (PATTERN (i)) == SET
+  return i != NULL_RTX && PATTERN (i) != NULL_RTX
+	 && GET_CODE (PATTERN (i)) == SET
 	 && t_reg_operand (XEXP (PATTERN (i), 0), VOIDmode)
 	 && negt_reg_operand (XEXP (PATTERN (i), 1), VOIDmode);
 }
 
--
libgit2 1.6.4
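
The effect of the extra guard can be modelled outside of GCC.  What follows
is only a minimal sketch: `pattern_model`, `insn_model` and `is_nott_insn`
are hypothetical stand-ins, not GCC types, and it assumes that a non-insn
such as a code label simply has no pattern to look at.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for GCC's RTL objects.  None of
// these types exist in GCC; they only model "has a pattern or not".
struct pattern_model
{
  bool is_set;            // models GET_CODE (PATTERN (i)) == SET
  bool dest_is_t_reg;     // models t_reg_operand on the SET destination
  bool src_is_negated_t;  // models negt_reg_operand on the SET source
};

struct insn_model
{
  const pattern_model* pattern;  // nullptr models a non-insn, e.g. a code label
};

// Every dereference is guarded, and && short-circuits left to right, so a
// missing pattern ends the evaluation before anything is read through it.
bool is_nott_insn (const insn_model* i)
{
  return i != nullptr && i->pattern != nullptr
         && i->pattern->is_set
         && i->pattern->dest_is_t_reg
         && i->pattern->src_is_negated_t;
}

// Example queries, kept in a helper so the block does not need its own main.
inline void nott_examples ()
{
  const pattern_model nott = { true, true, true };
  const insn_model real_insn = { &nott };
  const insn_model code_label = { nullptr };

  assert (is_nott_insn (&real_insn));
  assert (!is_nott_insn (&code_label));
  assert (!is_nott_insn (nullptr));
}
```

Without the second null check, the `code_label` query would read through a
null pointer, which is the kind of failure the patch avoids.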

Re: [PATCH] m68k: restore bootstrap

2024-02-18 Thread Oleg Endo

On Sun, 2024-02-18 at 08:42 -0700, Jeff Law wrote:
> 
> On 2/18/24 02:18, Mikael Pettersson wrote:
> > m68k fails to bootstrap since -ffold-mem-offsets was introduced,
> > in what looks like wrong-code during stage2.
> > 
> > To restore bootstrap this disables -ffold-mem-offsets on m68k.
> > It's not ideal, but better than keeping bootstraps broken until
> > the root cause is debugged and fixed.
> > 
> > Tested with a bootstrap and regression test run on m68k-linux-gnu.
> > 
> > Ok for master? (I'll need help getting it committed.)
> > 
> > gcc/
> > PR target/113357
> > * config/m68k/m68k.cc (m68k_option_override): Disable
> > -ffold-mem-offsets.  Fix typo in comment.
> Definitely not OK.  This needs to be debugged further, just disabling 
> the pass is not the right solution here.
> 
> It is also worth noting I'm bootstrapping and regression testing the 
> m68k weekly.
> 
> 

Jeff, could you please consider sharing your test setup so that others can
reproduce it as well?

It'd be really better if more people had access to a unified test setup and
methodology.

Best regards,
Oleg Endo

Re: [committed] Adjust expectations for pr59533-1.c

2024-01-21 Thread Oleg Endo



On Sun, 2024-01-21 at 19:14 -0700, Jeff Law wrote:
> The change for pr111267 twiddled code generation for sh/pr59533-1.c
> 
> We end up eliminating two comparisons, but require two shll instructions 
> to do so.  And in a couple places we're using an addc sequence rather 
> than a subc sequence.   This patch adjusts the expected codegen for the 
> test as all those are either a wash or a
> 
> The fwprop change does cause some code regressions on the same test. 
> I'll file a distinct bug for that issue.
> 
> Pushed to the trunk,
> 
> Jeff

Thanks for keeping an eye on this.

Note that on SH4 the comparison insns are of MT type, which increases
likelihood of parallel execution.  So it's better to use those e.g. to shift
out the MSB into T bit than shll.

Cheers,
Oleg

Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo



On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> This is work originally started by Joern @ Embecosm.
> 
> There's been a long standing sense that we're generating too many 
> sign/zero extensions on the RISC-V port.  REE is useful, but it's really 
> focused on a relatively narrow part of the extension problem.
> 
> What Joern's patch does is introduce a new pass which tracks liveness of 
> chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31 
> and 32..63.
> 
> If it encounters a sign/zero extend that sets bits that are never read, 
> then it replaces the sign/zero extension with a narrowing subreg.  The 
> narrowing subreg usually gets eliminated by subsequent passes (it's just 
> a copy after all).
> 

Have you tried it on SH, too?  (and if so any numbers?)

It sounds like this one would be great to remove some of the sign/zero
extension removal hackery that I've accumulated in the SH backend.
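
To make the quoted description a bit more concrete, here is a minimal sketch
of the bit-group idea.  It is not the pass itself: the enum, the function
names and the fixed 64-bit register width are assumptions made for the
example only.

```cpp
#include <cassert>

// Bit groups of a 64-bit pseudo register, following the split in the quoted
// description: 0..7, 8..15, 16..31 and 32..63.
enum bit_group : unsigned
{
  BITS_0_7   = 1u << 0,
  BITS_8_15  = 1u << 1,
  BITS_16_31 = 1u << 2,
  BITS_32_63 = 1u << 3,
  ALL_GROUPS = 0xFu
};

// Groups written by a sign/zero extension from SRC_BITS to 64 bits, i.e. all
// groups above the source width.  Unknown widths are treated conservatively
// as writing every group.
constexpr unsigned groups_set_by_extension (unsigned src_bits)
{
  return src_bits == 8  ? unsigned (BITS_8_15 | BITS_16_31 | BITS_32_63)
       : src_bits == 16 ? unsigned (BITS_16_31 | BITS_32_63)
       : src_bits == 32 ? unsigned (BITS_32_63)
       : unsigned (ALL_GROUPS);
}

// The extension can become a narrowing subreg when none of the groups it
// sets are read later.  LIVE_GROUPS is a mask of the groups still being read.
constexpr bool extension_is_removable (unsigned src_bits, unsigned live_groups)
{
  return (groups_set_by_extension (src_bits) & live_groups) == 0;
}

// Example queries, kept in a helper so the block does not need its own main.
inline void extension_examples ()
{
  // Byte extended to 64 bits, but only the low byte is read afterwards.
  assert (extension_is_removable (8, BITS_0_7));
  // Halfword extended and bits 16..31 are read later: the extension stays.
  assert (!extension_is_removable (16, BITS_0_7 | BITS_16_31));
}
```

On a 32-bit target like SH the 32..63 group would presumably never be live,
but the check itself stays the same.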

Cheers,
Oleg

Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo

On Sun, 2023-11-19 at 19:51 -0700, Jeff Law wrote:
> 
> On 11/19/23 18:22, Oleg Endo wrote:
> > 
> > On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> > > This is work originally started by Joern @ Embecosm.
> > > 
> > > There's been a long standing sense that we're generating too many
> > > sign/zero extensions on the RISC-V port.  REE is useful, but it's really
> > > focused on a relatively narrow part of the extension problem.
> > > 
> > > What Joern's patch does is introduce a new pass which tracks liveness of
> > > chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31
> > > and 32..63.
> > > 
> > > If it encounters a sign/zero extend that sets bits that are never read,
> > > then it replaces the sign/zero extension with a narrowing subreg.  The
> > > narrowing subreg usually gets eliminated by subsequent passes (it's just
> > > a copy after all).
> > > 
> > 
> > Have you tried it on SH, too?  (and if so any numbers?)


> Just bootstrap with C regression testing on sh4/sh4eb.  No data on 
> improvements.
> 

Alright.  I'll check what it does for SH once it's in.

Cheers,
Oleg

Re: RISC-V: Added support for CRC.

2023-09-26 Thread Oleg Endo

On Sun, 2023-09-24 at 00:05 +0100, Joern Rennecke wrote:
> 
> Although maybe Oleg Endo's library, as mentioned in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591748.html ,
> might be suitable?  What is the license for that?
> 
> 

I haven't published the library, but I think I could do that.

It's a C++14 header-only thing and uses templates + constexpr to generate
the .rodata lookup tables.  It's convenient for an application project, as
it doesn't require any generator tool in the build.  This might not be a big
advantage in the context of GCC.

Since the tables are computed during compile-time, there is no particular
optimization implemented.  The run-time function is also nothing fancy:

static constexpr uint8_t table_index (value_type rem, uint8_t x)
{
  if (ReflectInput)
    return x ^ rem;
  else
    return x ^ (BitCount > 8 ? (rem >> (BitCount - 8))
                             : (rem << (8 - BitCount)));
}

static constexpr value_type shift (value_type rem)
{
  return ReflectInput ? rem >> 8 : rem << 8;
}

static value_type
default_process_bytes (value_type rem, const uint8_t* in, const uint8_t* in_end)
{
  for (; in != in_end; ++in)
    {
      auto i = table_index (rem, *in);
      rem = table[i] ^ shift (rem);
    }
  return rem;
}
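
For readers who want something that compiles on its own, the following is a
minimal sketch of the same idea fixed to one configuration.  It is not the
library: the choice of the reflected CRC-32 polynomial 0xEDB88320, the
0xFFFFFFFF initial value and final XOR, and the names `crc32_table`,
`make_crc32_table` and `crc32` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the lookup table type.  A plain struct with a raw
// array is used because std::array's non-const operator[] is not constexpr
// before C++17.
struct crc32_table
{
  std::uint32_t entry[256];
};

// Build the 256-entry table for the reflected polynomial 0xEDB88320.  In
// C++14 the loops are allowed inside a constexpr function.
constexpr crc32_table make_crc32_table ()
{
  crc32_table t = {};
  for (std::uint32_t i = 0; i < 256; ++i)
    {
      std::uint32_t rem = i;
      for (int bit = 0; bit < 8; ++bit)
        rem = (rem & 1u) ? (rem >> 1) ^ 0xEDB88320u : rem >> 1;
      t.entry[i] = rem;
    }
  return t;
}

// Constant-initialized, so no generator tool and no start-up code is needed.
constexpr crc32_table table = make_crc32_table ();
static_assert (table.entry[1] == 0x77073096u, "table is built at compile time");

// Byte-wise processing for the reflected case: index with the low byte of the
// remainder, then shift the remainder right by 8.
std::uint32_t crc32 (const std::uint8_t* in, const std::uint8_t* in_end)
{
  assert (in <= in_end);
  std::uint32_t rem = 0xFFFFFFFFu;
  for (; in != in_end; ++in)
    {
      const std::uint8_t i = static_cast<std::uint8_t> (rem ^ *in);
      rem = table.entry[i] ^ (rem >> 8);
    }
  return rem ^ 0xFFFFFFFFu;
}
```

Calling `crc32 (buf, buf + len)` on a byte buffer returns the checksum.  The
loop has the same shape as `default_process_bytes` above with ReflectInput
being true.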

Anyway, let me know if anyone is interested.

Cheers,
Oleg

Re: [committed] Remove compromised sh test

2024-06-26 Thread Oleg Endo




On Wed, 2024-06-26 at 07:22 -0600, Jeff Law wrote:
> Surya's recent patch to IRA improves the code for sh/pr54602-1.c 
> slightly.  Specifically it's able to eliminate a save/restore in the 
> prologue/epilogue and a bit of register shuffling.
> 
> As a result there literally aren't any insns that can be used to fill 
> the delay slot of the return, so a nop gets emitted and the test fails.
> 
> Given there literally aren't any insns to move into the delay slot, the 
> best course of action is to just drop the test.
> 
> Pushed to the trunk.
> 
> Jeff

I can't reproduce what you are saying.
Which triplet and flags is your test setup using?

For this test case, GCC 13 with -m4 -ml -O1 -fno-pic:

_test01:
        mov.l   r8,@-r15
        sts.l   pr,@-r15
        mov.l   .L3,r0
        jsr     @r0
        mov     r6,r8
        add     r8,r0
        lds.l   @r15+,pr
        rts
        mov.l   @r15+,r8
.L3:
        .long   _test00


current GCC master branch with -m4 -ml -O1 -fno-pic:

_test01:
        mov.l   r8,@-r15
        sts.l   pr,@-r15
        mov.l   .L3,r0
        jsr     @r0
        mov     r6,r8
        add     r8,r0
        lds.l   @r15+,pr
        rts
        mov.l   @r15+,r8
.L4:
        .align 2
.L3:
        .long   _test00


Best regards,
Oleg Endo

Re: [committed] Remove compromised sh test

2024-06-26 Thread Oleg Endo




On Wed, 2024-06-26 at 16:39 -0600, Jeff Law wrote:
> 
> On 6/26/24 4:12 PM, Oleg Endo wrote:
> > 
> > 
> > On Wed, 2024-06-26 at 07:22 -0600, Jeff Law wrote:
> > > Surya's recent patch to IRA improves the code for sh/pr54602-1.c
> > > slightly.  Specifically it's able to eliminate a save/restore in the
> > > prologue/epilogue and a bit of register shuffling.
> > > 
> > > As a result there literally aren't any insns that can be used to fill
> > > the delay slot of the return, so a nop gets emitted and the test fails.
> > > 
> > > Given there literally aren't any insns to move into the delay slot, the
> > > best course of action is to just drop the test.
> > > 
> > > Pushed to the trunk.
> > > 
> > > Jeff
> > 
> > I can't reproduce what you are saying.
> > Which triplet and flags is your test setup using?
> > 
> > For this test case, GCC 13 with -m4 -ml -O1 -fno-pic:
> No -m flags at all.   As plain of a testrun as you can do.
> 

OK, then what's the default config of your test setup / triplet?
Can you please show the generated code that you get?  Because - like I said
- I can't reproduce it.

Best regards,
Oleg Endo

Re: [committed] Remove compromised sh test

2024-06-26 Thread Oleg Endo

On Wed, 2024-06-26 at 18:30 -0600, Jeff Law wrote:
> > > 
> > 
> > OK, then what's the default config of your test setup / triplet?
> > Can you please show the generated code that you get?  Because - like I said
> > - I can't reproduce it.
> test01:
>         sts.l   pr,@-r15        ! 31    [c=4 l=2]  movsi_i/10
>         add     #-4,r15         ! 32    [c=4 l=2]  *addsi3/0
>         mov.l   .L3,r0          ! 26    [c=10 l=2]  movsi_i/0
>         jsr     @r0             ! 12    [c=5 l=2]  call_valuei
>         mov.l   r6,@r15         ! 4     [c=4 l=2]  movsi_i/8
>         mov.l   @r15,r1         ! 29    [c=1 l=2]  movsi_i/5
>         add     r1,r0           ! 30    [c=4 l=2]  *addsi3/0
>         add     #4,r15          ! 36    [c=4 l=2]  *addsi3/0
>         lds.l   @r15+,pr        ! 38    [c=1 l=2]  movsi_i/14
>         rts
>         nop                     ! 40    [c=0 l=4]  *return_i
> 
> 
> Note that there's a scheduling barrier in the RTL between insns 30 and 
> 36.  So instructions prior to insn 36 can't be used to fill the delay slot.
> 

Thanks.  Now I'm also seeing the same result.  Needed to specify -O2 to get
that.  -O1 was not enough it seems.

I don't know why you said that the code for this case improved -- it has
not?!

I think the test is still valid.  The reason for the failure might be
different from the original one (the scheduling barrier for whatever
reason), but the end result is the same -- the last delay slot is not
stuffed, although the 'add r1,r0' could go in there.

I'd like to revert the removal of this test case, as it catches a valid
issue.

Best regards,
Oleg Endo

Re: [RFC PATCH] cse: Add another CSE pass after split1

2024-06-28 Thread Oleg Endo

Hi,

On Thu, 2024-06-27 at 14:56 -0700, Palmer Dabbelt wrote:
> This is really more of a question than a patch.
> 
> Looking at PR/115687 I managed to convince myself there's a general
> class of problems here: splitting might produce constant subexpressions,
> but as far as I can tell there's nothing to eliminate those constant
> subexpressions.  So I very quickly threw together a CSE that doesn't
> fold expressions, and it does eliminate the high-part constants in
> question.

Maybe this is somewhat relevant ... 

On SH there was/is a need to hoist constant loads outside of loops, which
might form as part of combine/split1 optimization.

https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543&action=diff

Don't know about others, but maybe it would make sense to have those passes
permanently added for everyone, with conditional opt-in/opt-out to keep the
compile times down.

Best regards,
Oleg Endo

Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-03 Thread Oleg Endo

Hi!

On Wed, 2024-07-03 at 19:28 +0200, Sébastien Michelland wrote:
> On 2024-07-03 17:59, Jeff Law wrote:
> > On 7/3/24 3:59 AM, Sébastien Michelland wrote:
> > > libgcc's fp-bit.c is quite slow and most modern/developed architectures
> > > have switched to using the soft-fp library. This patch does so for
> > > free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
> > > for the most part, most notably no exceptions.
> > > 
> > > A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
> > > about x3 speedup (~320 -> 1050 Kwhets/s).
> > > 
> > > I'm sending this as RFC because I'm quite unsure about testing. I built
> > > the compiler and ran the benchmark, but I don't know if GCC has a test
> > > for soft-fp correctness and whether I can run that in my non-hosted
> > > environment. Any advice?
> > > 
> > > Cheers,
> > > Sébastien
> > > 
> > > libgcc/ChangeLog:
> > > 
> > >  * config.host: Use soft-fp library for non-hosted SH3/SH4
> > >  instead of fpdbit.
> > >  * config/sh/sfp-machine.h: New.

> > I'd really like to hear from Oleg on this, though given we're using the 
> > soft-fp library on other targets it seems reasonable at a high level.

I don't understand why this is being limited to SH3 and SH4 only?
Almost all SH4 systems out there have an FPU (unless special configurations
are used).  So I'd say if switching to soft-fp, then for SH-anything, not
just SH3/SH4.

If it yields some improvements for some users, I'm all for it.

> > As far as testing, the GCC testsuite has some FP components which would 
> > implicitly test soft fp on any target that doesn't have hardware 
> > floating point.
> 
> Thank you. I went this route, following the guide [1] and the 
> instructions for cross-compiling [2] before hitting "Newlib does not 
> support CPU sh3eb" which I should have seen coming.
> 
> There are plenty of random ports lying around but just grabbing one 
> doesn't feel right (and I don't have a canonical one to go to as I 
> usually run a custom libc for... mostly bad reasons).
> 
> Deferring maybe again to the few SH users... how do you usually do it?
> 
> 

I think it would make sense to test it using sh-sim on SH2 big-endian and
little endian at least, as that doesn't have an FPU and hence would run
tests utilizing soft-fp.

After building the toolchain for --target=sh-elf, you can use this to run
the testsuite in the simulator:

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"

(add make -j parameter according to your needs -- it will be slow)

Let me know if you have any further questions.

Best regards,
Oleg Endo

Re: [committed] Fix various sh define_insn_and_split predicates

2024-07-06 Thread Oleg Endo




On Sat, 2024-07-06 at 06:40 -0600, Jeff Law wrote:
> The sh4-linux-gnu port has failed to bootstrap since the introduction of 
> late combine due to failures to split certain insns.
> 
> This is caused by incorrect predicates in various define_insn_and_split 
> patterns.  Essentially the insn's predicate is something like 
> "TARGET_SH1".  The split predicate is "&& can_create_pseudo_p ()".  So 
> these patterns will match post-reload, but be un-splittable.  So at 
> assembly output time, we get the failure as the output template is "#".
> 
> This patch fixes the most obvious & egregious cases by bringing the 
> split condition into the insn's predicate and leaving "&& 1" as the 
> split condition.  That's enough to get sh4-linux-gnu bootstrapping again 
> and I'm hoping it does the same for sh4eb-linux-gnu.
> 
> Pushing to the trunk.
> 

Thanks, Jeff!
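
The predicate problem described in the quoted text can be written down as
plain boolean logic.  This is only a sketch under assumptions: `state` and
the function names are made up for the example, and the only GCC behaviour
it relies on is that a split condition starting with "&&" is combined with
the insn condition.

```cpp
#include <cassert>

// Hypothetical model of the two facts the conditions depend on.
struct state
{
  bool target_sh1;         // models TARGET_SH1
  bool can_create_pseudo;  // models can_create_pseudo_p (), false after reload
};

// Old form: insn condition "TARGET_SH1", split condition
// "&& can_create_pseudo_p ()".
bool old_insn_matches (state s) { return s.target_sh1; }
bool old_split_applies (state s)
{
  return old_insn_matches (s) && s.can_create_pseudo;
}

// New form: the extra test moves into the insn condition and the split
// condition becomes "&& 1".
bool new_insn_matches (state s) { return s.target_sh1 && s.can_create_pseudo; }
bool new_split_applies (state s) { return new_insn_matches (s) && true; }

// An insn whose output template is "#" has to be split before assembly
// output.  It is stuck if it can be matched but cannot be split.
bool stuck (bool matches, bool splits) { return matches && !splits; }

// Example queries, kept in a helper so the block does not need its own main.
inline void predicate_examples ()
{
  const state after_reload = { true, false };

  // Old form: something like late combine can still create the insn, but
  // nothing can split it any more.
  assert (stuck (old_insn_matches (after_reload),
                 old_split_applies (after_reload)));
  // New form: the insn is simply not matched after reload.
  assert (!stuck (new_insn_matches (after_reload),
                  new_split_applies (after_reload)));
}
```

With the new form, matching and splitting always agree, so the "#" template
is never reached at output time.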

Best regards,
Oleg Endo

Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-06 Thread Oleg Endo

Hi,

( For some weird reason I keep losing Sebastien's messages ... )

On Sat, 2024-07-06 at 07:35 -0600, Jeff Law wrote:
> 
> On 7/5/24 1:28 AM, Sébastien Michelland wrote:
> > Hi Oleg!
> > 
> > > I don't understand why this is being limited to SH3 and SH4 only?
> > > > Almost all SH4 systems out there have an FPU (unless special configurations
> > > are used).  So I'd say if switching to soft-fp, then for SH-anything, not
> > > just SH3/SH4.
> > > 
> > > If it yields some improvements for some users, I'm all for it.
> > 
> > Yeah I just defaulted to SH3/SH4 conservatively because that's the only 
> > hardware I have. (My main platform also happens to be one of these SH4 
> > without an FPU, the SH4AL-DSP.)

Oh, wow, especially rare type!

> > 
> > Once this is tested/validated on simulator, I'll happily simplify the 
> > patch to apply to all SH.

The default sh-elf configuration has no multi-libs for SH3 and SH4 variants
without FPU (from what I can see).  So it won't use soft-fp so much during
sim testing.  So please change to soft-fp for sh*, not just SH3/SH4.

> > 
> > > I think it would make sense to test it using sh-sim on SH2 big-endian and
> > > little endian at least, as that doesn't have an FPU and hence would run
> > > tests utilizing soft-fp.
> > > 
> > > After building the toolchain for --target=sh-elf, you can use this to run
> > > the testsuite in the simulator:
> > > 
> > > make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
> > > 
> > > > (add make -j parameter according to your needs -- it will be slow)
> > 
> > Alright, it might take a little bit.
> > 
> > Building the combined tree of gcc/binutils/newlib masters (again 
> > following [1])
> > 

I have never built the toolchain using a combined tree.  Like you said, it's
difficult to debug and so on.  I've only built it separately and never had
any issues with this approach on multiple platforms/targets.

Here's an old proposed change to the simtest instructions to not use
combined trees:

https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin

> 

> This is almost certainly a poorly written pattern.  I just fixed a bunch 
> of these, but not this one.  Essentially a recent change in the generic 
> parts of the compiler is exposing some bugs in the SH backend. 

The patterns were written and tested to the best of our knowledge at that
time many years ago.  Nobody thought that we'd get a 2nd combine pass after
RA.  Anyway, I'll have a look at the remaining patterns.

Sebastien, in the meantime you could also try out and test your changes on
the latest GCC 14 branch, which shouldn't have those issues.

Best regards,
Oleg Endo

Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-08 Thread Oleg Endo



Hi,

> > > > The default sh-elf configuration has no multi-libs for SH3 and SH4 variants
> > > > without FPU (from what I can see).  So it won't use soft-fp so much during
> > > > sim testing.  So please change to soft-fp for sh*, not just SH3/SH4.
> > 
> > Got it, done that locally, and will update patch once tested.
> > 
> > > > Here's an old proposed change to the simtest instructions to not use
> > > > combined trees:
> > > > 
> > > > https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin
> > 
> > Thanks for the instructions. Apologies for the back-and-forth as I'm 
> > pretty new with this infrastructure (I usually do research stuff on LLVM).

No need to apologize.  I know this is a tedious and annoying thing to go
through and there is only very little useful information out there.

> > The split-tree build goes better, still fails with GCC 15 (as expected, 
> > though somehow my custom toolchain did build originally) and sort of 
> > works with GCC 14.

> > The binutils/gdb repos have been merged since that attachment, and 
> > while I can build binutils only with --disable-gdb, building gdb (in 
> > another build folder, reconfiguring from scratch) seems iffy. The global 
> > CFLAGS/CXXFLAGS to switch to 32-bit affects at least parts of binutils, 
> > resulting in a broken toolchain due to architecture mixup:

It shouldn't be needed to build GDB separately or to specify the -m32 flags.
Not sure why you have to do that.

I've just tried the following configure lines:

binutils-gdb (binutils-2_41-release)
<..>/configure --target=sh-elf --prefix=/usr/local --disable-nls \
    --disable-werror --enable-initfini-array

gcc (any version)
<..>/configure --target=sh-elf --prefix=/usr/local \
    --enable-languages=c,c++,lto --disable-nls --disable-werror \
    --with-newlib --enable-lto --enable-multilib --with-system-zlib \
    --disable-libstdcxx-verbose --disable-symvers

newlib (latest)
CFLAGS_FOR_TARGET="-Wno-error=implicit-function-declaration -Wno-implicit-int -ffunction-sections -fdata-sections -flto" \
    <..>/newlib/configure --host=sh-elf --target=sh-elf --prefix=/usr/local \
    --enable-multilib --enable-newlib-io-c99-formats

Note that the latest newlib version will try to create multilib directories
one directory above its current build directory for some reason.  So just
create another sub-directory in the build directory and do the config and
build from there.

Other than that, the build steps are the same as before.


I could reproduce the issue with the latest GCC when building libstdc++. 
I'm working on a fix for it.


Unfortunately I'm also getting the SIGBUS error when running a C++ program
that uses std::cout / std::cerr.

To be honest, I don't remember what the issue was/is, whether this has ever
worked at all or not.  I've tried rewinding everything back ~10 years ago
but was still getting the same error.  Using printf from the simulator seems
to work fine though.  So I guess a bunch of C++ tests of the GCC testsuite
will fail on the simulator, but that could be tolerable -- it never passed
all the tests on the simulator anyway.  It's still a good way to test for
regressions that could be introduced by a patch.

> > How active are the main types? Like are there still new products 
> > designed with these (maybe the J2)?

There is some activity on the software side which mainly stems from folks
using old parts and systems.  I'd say the biggest activity is now people
hammering on Sega 32X (SH2), Saturn (SH2) and Dreamcast (SH4), but I might
be biased here.

As for new hardware, I'm not sure.  Apparently it's still possible to
license SH4A(+FPU) and SH4AL-DSP IP cores from Renesas, but I doubt anybody
is really doing that.  Some parts are still being manufactured, like SH2A
for some niche applications. Don't know what j-core people are up to these
days.  Some of the SH MCUs have been re-implemented as open source gateware
for the MisTer FPGA project.

> > 
> > I'd be interested to learn more about the history of the SH backend, if 
> > anyone wrote that up somewhere...
> > 

From what I know it started during the earlier cygwin days in the 90s,
originally contracted by Hitachi to complement their own in-house C compiler
and also to allow sh-linux to happen at some point.  It was entertained by
Renesas for a while through further contracted support work but eventually
they have abandoned it.  STmicro was also a licensee of the SH4 CPU for
their TV set top boxes and had a few guys submitting patches now and then
for a while.  But the whole thing basically went on life support about 10
years ago.

Perhaps Jeff or others can give more insight on the historical parts.


Best regards,
Oleg Endo

Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Oleg Endo



On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > 
> > > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis wrote:
> > > 
> > > If we consider code like:
> > > 
> > > if (bar1 == x)
> > >   return foo();
> > > if (bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > We would like the ifcombine pass to convert this to:
> > > 
> > > if (bar1 == x || bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > The ifcombine pass can handle this transformation but it is run very early and
> > > it misses the opportunity because there are two separate blocks for foo().
> > > The pre pass is good at removing duplicate code and blocks and due to that
> > > running ifcombine again after it can increase the number of successful
> > > conversions.
> > 
> > I do think we should have something similar to re-running
> > ssa-ifcombine but I think it should be much later, like after the loop
> > optimizations are done.
> > Maybe just a simplified version of it (that does the combining and not
> > the optimizations part) included in isel or pass_optimize_widening_mul
> > (which itself should most likely become part of isel or renamed since
> > it handles more than just widening multiply these days).
> 
> I've long wished we had a (late?) pass that can also undo if-conversion
> (basically do what RTL expansion would later do).  Maybe
> gimple-predicate-analysis.cc (what's used by uninit analysis) can
> represent mixed CFG + if-converted conditions so we can optimize
> it and code-gen the condition in a more optimal manner much like
> we have if-to-switch, switch-conversion and switch-expansion.
> 
> That said, I agree that re-running ifcombine should be later.  And there's
> still the old task of splitting tail-merging from PRE (and possibly making
> it more effective).

Sorry to butt in, but it might be a little bit relevant and caught my
attention.

I've got this SH patch sitting around
https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543

The idea is basically to run an additional loop pass after combine and
split1.  The main purpose is to hoist constant loads out of loops. Such
constant loads might be formed (in this particular case) during combine
transformations.

The patch adds a new file gcc/config/sh/sh_loop.cc, which has some
boilerplate code copy-pasted from other places to get the loop pass set up and
going.

Any thoughts on this way of doing it?


Best regards,
Oleg Endo

Re: [PATCH 45/52] sh: New hook implementation sh_c_mode_for_floating_type

2024-06-02 Thread Oleg Endo



Hi!

On Sun, 2024-06-02 at 22:01 -0500, Kewen Lin wrote:
> This is to remove macro LONG_DOUBLE_TYPE_SIZE define in
> sh port, and add new port specific hook implementation
> sh_c_mode_for_floating_type.
> 

The SH parts look OK to me.

Best regards,
Oleg Endo


> gcc/ChangeLog:
> 
>   * config/sh/sh.cc (sh_c_mode_for_floating_type): New function.
>   (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
>   * config/sh/sh.h (LONG_DOUBLE_TYPE_SIZE): Remove.
> ---
>  gcc/config/sh/sh.cc | 18 ++
>  gcc/config/sh/sh.h  | 10 --
>  2 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> index ef3c2e6791d..bc017420381 100644
> --- a/gcc/config/sh/sh.cc
> +++ b/gcc/config/sh/sh.cc
> @@ -328,6 +328,7 @@ static unsigned int sh_hard_regno_nregs (unsigned int, machine_mode);
>  static bool sh_hard_regno_mode_ok (unsigned int, machine_mode);
>  static bool sh_modes_tieable_p (machine_mode, machine_mode);
>  static bool sh_can_change_mode_class (machine_mode, machine_mode, reg_class_t);
> +static machine_mode sh_c_mode_for_floating_type (enum tree_index);
>  
>  TARGET_GNU_ATTRIBUTES (sh_attribute_table,
>  {
> @@ -664,6 +665,9 @@ TARGET_GNU_ATTRIBUTES (sh_attribute_table,
>  #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
>  #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
>  
> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
> +#define TARGET_C_MODE_FOR_FLOATING_TYPE sh_c_mode_for_floating_type
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  
> @@ -10674,6 +10678,20 @@ sh_can_change_mode_class (machine_mode from, machine_mode to,
>return true;
>  }
>  
> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return SFmode or DFmode
> +   for TI_DOUBLE_TYPE which is for double type, go with the default one
> +   for the others.  */
> +
> +static machine_mode
> +sh_c_mode_for_floating_type (enum tree_index ti)
> +{
> +  /* Since the SH2e has only `float' support, it is desirable to make all
> + floating point types equivalent to `float'.  */
> +  if (ti == TI_DOUBLE_TYPE)
> +return TARGET_FPU_SINGLE_ONLY ? SFmode : DFmode;
> +  return default_mode_for_floating_type (ti);
> +}
> +
>  /* Return true if registers in machine mode MODE will likely be
> allocated to registers in small register classes.  */
>  bool
> diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
> index 7d3a3f08338..53cad85d122 100644
> --- a/gcc/config/sh/sh.h
> +++ b/gcc/config/sh/sh.h
> @@ -425,9 +425,6 @@ extern const sh_atomic_model& selected_atomic_model (void);
>  /* Width in bits of a `long long'.  */
>  #define LONG_LONG_TYPE_SIZE 64
>  
> -/* Width in bits of a `long double'.  */
> -#define LONG_DOUBLE_TYPE_SIZE 64
> -
>  /* Width of a word, in units (bytes).  */
>  #define UNITS_PER_WORD   (4)
>  #define MIN_UNITS_PER_WORD 4
> @@ -1433,13 +1430,6 @@ extern bool current_function_interrupt;
> Do not define this if the table should contain absolute addresses.  */
>  #define CASE_VECTOR_PC_RELATIVE 1
>  
> -/* Define it here, so that it doesn't get bumped to 64-bits on SHmedia.  */
> -#define FLOAT_TYPE_SIZE 32
> -
> -/* Since the SH2e has only `float' support, it is desirable to make all
> -   floating point types equivalent to `float'.  */
> -#define DOUBLE_TYPE_SIZE (TARGET_FPU_SINGLE_ONLY ? 32 : 64)
> -
>  /* 'char' is signed by default.  */
>  #define DEFAULT_SIGNED_CHAR  1
>  
> -- 
> 2.43.0
>
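
As a quick cross-check of the quoted patch, the mode selection can be
modelled in isolation.  This is a sketch only: `c_float_type`, `fp_mode`,
`default_mode` and `sh_mode` are hypothetical names, and the default mapping
is an assumption that mirrors the removed size macros (float 32 bits, long
double 64 bits).

```cpp
#include <cassert>

// Hypothetical stand-ins for the tree indices and machine modes involved.
enum class c_float_type { flt, dbl, long_dbl };
enum class fp_mode { SF, DF };

// Assumed default mapping, mirroring the removed macros in sh.h:
// FLOAT_TYPE_SIZE 32 and LONG_DOUBLE_TYPE_SIZE 64.
fp_mode default_mode (c_float_type t)
{
  return t == c_float_type::flt ? fp_mode::SF : fp_mode::DF;
}

// Models the new hook: only `double' depends on the FPU configuration, the
// way DOUBLE_TYPE_SIZE used to depend on TARGET_FPU_SINGLE_ONLY.
fp_mode sh_mode (c_float_type t, bool fpu_single_only)
{
  if (t == c_float_type::dbl)
    return fpu_single_only ? fp_mode::SF : fp_mode::DF;
  return default_mode (t);
}

// Example queries, kept in a helper so the block does not need its own main.
inline void mode_examples ()
{
  assert (sh_mode (c_float_type::dbl, true) == fp_mode::SF);
  assert (sh_mode (c_float_type::dbl, false) == fp_mode::DF);
  assert (sh_mode (c_float_type::long_dbl, true) == fp_mode::DF);
}
```

The last query shows that `long double' stays 64 bits wide even on a
single-precision-only FPU, which matches the old LONG_DOUBLE_TYPE_SIZE of 64.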

Re: [PATCH 4/6] sh: Make *minus_plus_one work after RA

2024-06-20 Thread Oleg Endo



On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
> *minus_plus_one had no constraints, which meant that it could be
> matched after RA with operands 0, 1 and 2 all being different.
> The associated split instead requires operand 0 to be tied to
> operand 1.

Thanks for spotting this.  Makes sense, please install.

Best regards,
Oleg Endo

> 
> gcc/
>   * config/sh/sh.md (*minus_plus_one): Add constraints.
> ---
>  gcc/config/sh/sh.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index 92a1efeb811..9491b49e55b 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -1642,9 +1642,9 @@ (define_insn_and_split "*addc"
>  ;; matched.  Split this up into a simple sub add sequence, as this will save
>  ;; us one sett insn.
>  (define_insn_and_split "*minus_plus_one"
> -  [(set (match_operand:SI 0 "arith_reg_dest" "")
> - (plus:SI (minus:SI (match_operand:SI 1 "arith_reg_operand" "")
> -(match_operand:SI 2 "arith_reg_operand" ""))
> +  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> + (plus:SI (minus:SI (match_operand:SI 1 "arith_reg_operand" "0")
> +(match_operand:SI 2 "arith_reg_operand" "r"))
>(const_int 1)))]
>"TARGET_SH1"
>"#"
> -- 
> 2.25.1
>

Re: [PATCH 6/6] Add a late-combine pass [PR106594]

2024-06-20 Thread Oleg Endo



On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
> 
> I tried compiling at least one target per CPU directory and comparing
> the assembly output for parts of the GCC testsuite.  This is just a way
> of getting a flavour of how the pass performs; it obviously isn't a
> meaningful benchmark.  All targets seemed to improve on average:
> 
> Target                 Tests   Good   Bad   %Good    Delta  Median
> ======                 =====   ====   ===   =====    =====  ======
> aarch64-linux-gnu       2215   1975   240  89.16%    -4159      -1
> aarch64_be-linux-gnu    1569   1483    86  94.52%   -10117      -1
> alpha-linux-gnu         1454   1370    84  94.22%    -9502      -1
> amdgcn-amdhsa           5122   4671   451  91.19%   -35737      -1
> arc-elf                 2166   1932   234  89.20%   -37742      -1
> arm-linux-gnueabi       1953   1661   292  85.05%   -12415      -1
> arm-linux-gnueabihf     1834   1549   285  84.46%   -11137      -1
> avr-elf                 4789   4330   459  90.42%  -441276      -4
> bfin-elf                2795   2394   401  85.65%   -19252      -1
> bpf-elf                 3122   2928   194  93.79%    -8785      -1
> c6x-elf                 2227   1929   298  86.62%   -17339      -1
> cris-elf                3464   3270   194  94.40%   -23263      -2
> csky-elf                2915   2591   324  88.89%   -22146      -1
> epiphany-elf            2399   2304    95  96.04%   -28698      -2
> fr30-elf                7712   7299   413  94.64%   -99830      -2
> frv-linux-gnu           3332   2877   455  86.34%   -25108      -1
> ft32-elf                2775   2667   108  96.11%   -25029      -1
> h8300-elf               3176   2862   314  90.11%   -29305      -2
> hppa64-hp-hpux11.23     4287   4247    40  99.07%   -45963      -2
> ia64-linux-gnu          2343   1946   397  83.06%    -9907      -2
> iq2000-elf              9684   9637    47  99.51%  -126557      -2
> lm32-elf                2681   2608    73  97.28%   -59884      -3
> loongarch64-linux-gnu   1303   1218    85  93.48%   -13375      -2
> m32r-elf                1626   1517   109  93.30%    -9323      -2
> m68k-linux-gnu          3022   2620   402  86.70%   -21531      -1
> mcore-elf               2315   2085   230  90.06%   -24160      -1
> microblaze-elf          2782   2585   197  92.92%   -16530      -1
> mipsel-linux-gnu        1958   1827   131  93.31%   -15462      -1
> mipsisa64-linux-gnu     1655   1488   167  89.91%   -16592      -2
> mmix                    4914   4814   100  97.96%   -63021      -1
> mn10300-elf             3639   3320   319  91.23%   -34752      -2
> moxie-rtems             3497   3252   245  92.99%   -87305      -3
> msp430-elf              4353   3876   477  89.04%   -23780      -1
> nds32le-elf             3042   2780   262  91.39%   -27320      -1
> nios2-linux-gnu         1683   1355   328  80.51%    -8065      -1
> nvptx-none              2114   1781   333  84.25%   -12589      -2
> or1k-elf                3045   2699   346  88.64%   -14328      -2
> pdp11                   4515   4146   369  91.83%   -26047      -2
> pru-elf                 1585   1245   340  78.55%    -5225      -1
> riscv32-elf             2122   2000   122  94.25%  -101162      -2
> riscv64-elf             1841   1726   115  93.75%   -49997      -2
> rl78-elf                2823   2530   293  89.62%   -40742      -4
> rx-elf                  2614   2480   134  94.87%   -18863      -1
> s390-linux-gnu          1591   1393   198  87.55%   -16696      -1
> s390x-linux-gnu         2015   1879   136  93.25%   -21134      -1
> sh-linux-gnu            1870   1507   363  80.59%    -9491      -1
> sparc-linux-gnu         1123   1075    48  95.73%   -14503      -1
> sparc-wrs-vxworks       1121   1073    48  95.72%   -14578      -1
> sparc64-linux-gnu       1096   1021    75  93.16%   -15003      -1
> v850-elf                1897   1728   169  91.09%   -11078      -1
> vax-netbsdelf           3035   2995    40  98.68%   -27642      -1
> visium-elf              1392   1106   286  79.45%    -7984      -2
> xstormy16-elf           2577   2071   506  80.36%   -13061      -1
> 
> 

Since you have already briefly compared some of the code, can you share
those cases which get worse and might require some potential follow up
patches?

Best regards,
Oleg Endo
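
As a side note on reading the quoted table: the %Good column appears to be simply Good / Tests, with Good + Bad adding up to Tests in every row.  A minimal sketch of that arithmetic follows.  The struct and function names are made up for illustration and are not part of GCC or of the original mail.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical record for one row of the quoted table (not a GCC type).
struct target_result
{
  const char *target;
  int tests;   // number of test files compared
  int good;    // files whose assembly output improved
  int bad;     // files whose assembly output got worse
};

// Percentage of compared tests that improved, rounded to two decimals.
// Assumes that every compared test is counted as either good or bad.
double good_percent (const target_result &r)
{
  assert (r.tests > 0 && r.good + r.bad == r.tests);
  return std::round (10000.0 * r.good / r.tests) / 100.0;
}
```

For the sh-linux-gnu row (1870 tests, 1507 good, 363 bad) this gives 80.59, which matches the table.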

[SH, committed]: Fix outage caused by secondary combine pass (was: Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4)

2024-07-20 Thread Oleg Endo

Hi,

I've committed the attached patch to fix the full gcc + libstdc++ build on
sh-elf.

Best regards,
Oleg Endo



On Sat, 2024-07-06 at 07:35 -0600, Jeff Law wrote:
> 
> On 7/5/24 1:28 AM, Sébastien Michelland wrote:
> > Hi Oleg!
> > 
> > > I don't understand why this is being limited to SH3 and SH4 only?
> > > > Almost all SH4 systems out there have an FPU (unless special
> > > > configurations are used).  So I'd say if switching to soft-fp, then
> > > > for SH-anything, not just SH3/SH4.
> > > 
> > > If it yields some improvements for some users, I'm all for it.
> > 
> > Yeah I just defaulted to SH3/SH4 conservatively because that's the only 
> > hardware I have. (My main platform also happens to be one of these SH4 
> > without an FPU, the SH4AL-DSP.)
> > 
> > Once this is tested/validated on simulator, I'll happily simplify the 
> > patch to apply to all SH.
> > 
> > > I think it would make sense to test it using sh-sim on SH2 big-endian and
> > > little endian at least, as that doesn't have an FPU and hence would run
> > > tests utilizing soft-fp.
> > > 
> > > After building the toolchain for --target=sh-elf, you can use this to run
> > > the testsuite in the simulator:
> > > 
> > > make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
> > > 
> > > > (add make -j parameter according to your needs -- it will be slow)
> > 
> > Alright, it might take a little bit.
> > 
> > Building the combined tree of gcc/binutils/newlib masters (again 
> > following [1]) gives me an ICE in libstdc++v3/src/libbacktrace, 
> > irrespective of my libgcc change:
> This is almost certainly a poorly written pattern.  I just fixed a bunch 
> of these, but not this one.  Essentially a recent change in the generic 
> parts of the compiler is exposing some bugs in the SH backend. 
> Specifically:
> 
> > ;; Store (negated) T bit as all zeros or ones in a reg.  
> > ;;  subc  Rn,Rn   ! Rn = Rn - Rn - T; T = T
> > ;;  not   Rn,Rn   ! Rn = 0 - Rn
> > ;; 
> > ;; Note the call to sh_split_treg_set_expr may clobber
> > ;; the T reg.  We must express this, even though it's
> > ;; not immediately obvious this pattern changes the
> > ;; T register.
> > (define_insn_and_split "mov_neg_si_t"
> >   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> > (neg:SI (match_operand 1 "treg_set_expr")))
> >(clobber (reg:SI T_REG))] 
> >   "TARGET_SH1" 
> > {
> >   gcc_assert (t_reg_operand (operands[1], VOIDmode));
> >   return "subc  %0,%0";
> > }
> >   "&& can_create_pseudo_p () && !t_reg_operand (operands[1], VOIDmode)"
> >   [(const_int 0)]
> > {
> >   sh_treg_insns ti = sh_split_treg_set_expr (operands[1], curr_insn);
> >   emit_insn (gen_mov_neg_si_t (operands[0], get_t_reg_rtx ()));
> > 
> >   if (ti.remove_trailing_nott ())
> > emit_insn (gen_one_cmplsi2 (operands[0], operands[0]));
> > 
> >   DONE; 
> > }
> >   [(set_attr "type" "arith")])
> 
> 
> As written this pattern could match after register allocation is 
> complete and thus we can't create new pseudos (the condition TARGET_SH1 
> controls that behavior).  operands[1] won't necessarily be the T 
> register in that case.
> 
> The split condition fails because we can't create new pseudos, so it's 
> left as-is.  At final assembly time the assertion triggers.
> 
> The "&& can_create_pseudo_p ()" part of the split condition should be 
> moved into the main condition.  I think that's all that's necessary to 
> fix this problem.  It'd probably be best if Oleg went through the 
> various define_insn_and_split patterns that utilize can_create_pseudo_p in 
> their split condition and evaluated them.
> 
> I only fixed the most obvious cases in my change from this morning.  I 
> don't typically work on the SH port and for changes which aren't 
> obviously correct, Oleg is in a better position to evaluate the proper fix.
> 
> jeff
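
The failure mode described above can be written down as a small decision table.  What follows is only a toy model under stated assumptions: the type and function names are hypothetical, TARGET_SH1 is taken as true, and the second function follows the quoted suggestion literally, which is not necessarily what the committed patch does.

```cpp
#include <cassert>

// Toy model only; these names are hypothetical and are not GCC APIs.
struct insn_state
{
  bool can_create_pseudo;  // false once register allocation has finished
  bool operand_is_t_reg;   // true if operands[1] is literally the T register
};

enum class outcome { not_matched, split, emitted_subc, assertion_failure };

// Pattern as quoted: the insn condition is only TARGET_SH1 (assumed true),
// the split condition is "can_create_pseudo_p () && !t_reg_operand (...)".
outcome as_quoted (insn_state s)
{
  if (s.can_create_pseudo && !s.operand_is_t_reg)
    return outcome::split;
  // Not split, so the output code runs and asserts t_reg_operand.
  return s.operand_is_t_reg ? outcome::emitted_subc
                            : outcome::assertion_failure;
}

// Literal reading of the suggestion: can_create_pseudo_p () becomes part
// of the insn condition, so nothing can form the pattern after RA.
outcome with_suggestion (insn_state s)
{
  if (!s.can_create_pseudo)
    return outcome::not_matched;
  return s.operand_is_t_reg ? outcome::emitted_subc : outcome::split;
}
```

In this model the combination "after register allocation, operand is not the T register" hits the assertion with the pattern as quoted and is rejected up front with the suggestion applied.  The literal reading also rejects the plain T register form after register allocation, so a real fix may need to keep accepting that form.
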
From 9e740e7d71d02369774e1380902bddd9681c463f Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Sun, 21 Jul 2024 14:11:21 +0900
Subject: [PATCH] SH: Fix outage caused by recently added 2nd combine pass after reg alloc

I've also confirmed on the CSiBE set that the secondary combine pass is
actually beneficial on SH.  It does result in some code size reductions.

gcc/ChangeLog:
	* config/sh/sh.md (mov_neg_s

Re: [PATCH] sh: Don't call make_insn_raw in sh_recog_treg_set_expr [PR116189]

2024-08-05 Thread Oleg Endo



On Mon, 2024-08-05 at 14:15 -0700, Andrew Pinski wrote:
> This was an interesting compare debug failure to debug. The first symptom
> was in gcse, which would create pseudo-registers in a different order.
> This was caused by a different order of a hash table walk, due to the hash
> table having a different number of entries, which in turn was due to the
> max insn uid being different between the 2 runs. The max insn uid was
> changed in sh_recog_treg_set_expr, which is called via rtx_costs, and
> fwprop would call rtx_costs in some cases for debug insn related stuff.
> 
> Build and tested for sh4-linux-gnu.


Thanks so much!
I think it should be safe to install this on all open branches.


Best regards,
Oleg Endo
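
To illustrate the chain of effects described in the quoted text, here is a toy model of why a different table size changes the walk order.  It assumes a simple chained table whose bucket count is chosen by the caller; GCC's real hash_table and the way gcse sizes it are different, and walk_order is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy chained hash table; GCC's real hash_table works differently.
// Returns the keys in the order a bucket-by-bucket walk would visit them.
std::vector<unsigned> walk_order (const std::vector<unsigned> &keys,
                                  std::size_t n_buckets)
{
  assert (n_buckets > 0);
  std::vector<std::vector<unsigned> > buckets (n_buckets);
  for (unsigned k : keys)
    buckets[k % n_buckets].push_back (k);   // trivial hash: key modulo size

  std::vector<unsigned> order;
  for (const std::vector<unsigned> &b : buckets)
    order.insert (order.end (), b.begin (), b.end ());
  return order;
}
```

With the keys 9, 14 and 3, a table of 7 buckets is walked as 14, 9, 3 while a table of 8 buckets is walked as 9, 3, 14.  If the bucket count depends on something like the max insn uid, an extra uid consumed only in the debug run is enough to change the order in which later work is done.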


> 
>   PR target/116189
> 
> gcc/ChangeLog:
> 
>   * config/sh/sh.cc (sh_recog_treg_set_expr): Don't call make_insn_raw,
>   make the insn with a fake uid.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/torture/pr116189-1.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/sh/sh.cc   | 12 +++-
>  .../c-c++-common/torture/pr116189-1.c | 30 +++
>  2 files changed, 41 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/pr116189-1.c
> 
> diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> index bc017420381..7391b8df583 100644
> --- a/gcc/config/sh/sh.cc
> +++ b/gcc/config/sh/sh.cc
> @@ -12297,7 +12297,17 @@ sh_recog_treg_set_expr (rtx op, machine_mode mode)
>   have to capture its current state and restore it afterwards.  */
>recog_data_d prev_recog_data = recog_data;
>  
> -  rtx_insn* i = make_insn_raw (gen_rtx_SET (get_t_reg_rtx (), op));
> +  /* Note we can't use insn_raw here since that increases the uid
> + and could cause debug compare differences; this insn never leaves
> + this function so create a dummy one. */
> +  rtx_insn* i = as_a <rtx_insn *> (rtx_alloc (INSN));
> +
> +  INSN_UID (i) = 1;
> +  PATTERN (i) = gen_rtx_SET (get_t_reg_rtx (), op);
> +  INSN_CODE (i) = -1;
> +  REG_NOTES (i) = NULL;
> +  INSN_LOCATION (i) = curr_insn_location ();
> +  BLOCK_FOR_INSN (i) = NULL;
>SET_PREV_INSN (i) = NULL;
>SET_NEXT_INSN (i) = NULL;
>  
> diff --git a/gcc/testsuite/c-c++-common/torture/pr116189-1.c b/gcc/testsuite/c-c++-common/torture/pr116189-1.c
> new file mode 100644
> index 000..055c563f43e
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/torture/pr116189-1.c
> @@ -0,0 +1,30 @@
> +/* { dg-additional-options "-fcompare-debug" } */
> +
> +/* PR target/116189 */
> +
> +/* In the sh backend, we used to create insn in the path of rtx_costs.
> +   This means sometimes the max uid for insns would be different between
> +   debugging and non debugging which then would cause gcse's hashtable
> +   to have different number of slots which would cause a different walk
> +   for that hash table.  */
> +
> +extern void ff(void);
> +extern short nn[8][4];
> +typedef unsigned short move_table[4];
> +extern signed long long ira_overall_cost;
> +extern signed long long ira_load_cost;
> +extern move_table *x_ira_register_move_cost[1];
> +struct move { struct move *next; };
> +unsigned short t;
> +void emit_move_list(struct move * list, int freq, unsigned char mode, int regno) {
> +  int cost;
> +  for (; list != 0; list = list->next)
> +  {
> +ff();
> +unsigned short aclass = t;
> +cost = (nn)[mode][aclass] ;
> +ira_load_cost = cost;
> +cost = x_ira_register_move_cost[mode][aclass][aclass] * freq ;
> +ira_overall_cost = cost;
> +  }
> +}
> -- 
> 2.43.0
>

Add overload for register_pass

2013-08-24 Thread Oleg Endo

Hi,

I've been working on a SH specific RTL pass and just adapted it to the
new pass handling.  One thing that bugged me was pass registration.  How
about adding an overload for 'register_pass' as in the attached patch?
Registering a pass is then as simple as:

  register_pass (make_new_ifcvt_sh (g, false, "ifcvt1_sh"),
 PASS_POS_INSERT_AFTER, "ce1", 1);

Tested with make all-gcc.

Cheers,
Oleg
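
For readers without a GCC tree at hand, the shape of the proposed overload can be tried out with stand-in types.  This is only a sketch: the types below model just the fields used by the patch, and last_registered is a hypothetical recorder that stands in for g->get_passes ()->register_pass ().

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the GCC types; only what the patch touches is modelled.
struct opt_pass { const char *name; };

enum pass_positioning_ops
{
  PASS_POS_INSERT_AFTER,
  PASS_POS_INSERT_BEFORE,
  PASS_POS_REPLACE
};

struct register_pass_info
{
  opt_pass *pass;
  const char *reference_pass_name;
  int ref_pass_instance_number;
  pass_positioning_ops pos_op;
};

// Hypothetical recorder instead of the real pass manager.
static register_pass_info last_registered;

void register_pass (register_pass_info *info)
{
  assert (info != NULL && info->pass != NULL);
  last_registered = *info;
}

// The overload: fill out a temporary info struct and forward it.
void register_pass (opt_pass *pass, pass_positioning_ops pos,
                    const char *ref_pass_name, int ref_pass_inst_number)
{
  register_pass_info i;
  i.pass = pass;
  i.reference_pass_name = ref_pass_name;
  i.ref_pass_instance_number = ref_pass_inst_number;
  i.pos_op = pos;
  register_pass (&i);
}
```

The point of the overload is only convenience: the caller no longer needs a named register_pass_info variable, and both entry points end up in the same place.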

gcc/ChangeLog:
* passes.c (register_pass): Add overload.
* tree-pass.h (register_pass): Forward declare it.
Add comment.
Index: gcc/tree-pass.h
===
--- gcc/tree-pass.h	(revision 201967)
+++ gcc/tree-pass.h	(working copy)
@@ -91,7 +91,8 @@
   virtual opt_pass *clone ();
 
   /* If has_gate is set, this pass and all sub-passes are executed only if
- the function returns true.  */
+ the function returns true.
+ The default implementation returns true.  */
   virtual bool gate ();
 
   /* This is the code to run.  If has_execute is false, then there should
@@ -330,6 +331,14 @@
   enum pass_positioning_ops pos_op; /* how to insert the new pass.  */
 };
 
+/* Registers a new pass.  Either fill out the register_pass_info or specify
+   the individual parameters.  The pass object is expected to have been
+   allocated using operator new and the pass manager takes the ownership of
+   the pass object.  */
+extern void register_pass (register_pass_info *);
+extern void register_pass (opt_pass* pass, pass_positioning_ops pos,
+			   const char* ref_pass_name, int ref_pass_inst_number);
+
 extern gimple_opt_pass *make_pass_mudflap_1 (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_mudflap_2 (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_asan (gcc::context *ctxt);
@@ -594,7 +603,6 @@
 extern void ipa_read_optimization_summaries (void);
 extern void register_one_dump_file (struct opt_pass *);
 extern bool function_called_by_processed_nodes_p (void);
-extern void register_pass (struct register_pass_info *);
 
 /* Set to true if the pass is called the first time during compilation of the
current function.  Note that using this information in the optimization
Index: gcc/passes.c
===
--- gcc/passes.c	(revision 201967)
+++ gcc/passes.c	(working copy)
@@ -1365,7 +1365,19 @@
 register_pass (struct register_pass_info *pass_info)
 {
   g->get_passes ()->register_pass (pass_info);
+}
 
+void
+register_pass (opt_pass* pass, pass_positioning_ops pos,
+	   const char* ref_pass_name, int ref_pass_inst_number)
+{
+  register_pass_info i;
+  i.pass = pass;
+  i.reference_pass_name = ref_pass_name;
+  i.ref_pass_instance_number = ref_pass_inst_number;
+  i.pos_op = pos;
+
+  g->get_passes ()->register_pass (&i);
 }
 
 void

Re: [PATCH] Fix illegal cast to rtx (*insn_gen_fn) (rtx, ...)

2013-08-29 Thread Oleg Endo

On Wed, 2013-08-07 at 21:24 +0200, Oleg Endo wrote:
> On Wed, 2013-08-07 at 15:08 -0400, Michael Meissner wrote:
> > On Tue, Aug 06, 2013 at 11:45:40PM +0200, Oleg Endo wrote:
> > > On Mon, 2013-08-05 at 13:25 -1000, Richard Henderson wrote:
> > > > On 08/05/2013 12:32 PM, Oleg Endo wrote:
> > > > > Thanks, committed as rev 201513.
> > > > > > 4.8 also has the same problem.  The patch applies on 4.8 branch
> > > > > > without problems and make all-gcc works.
> > > > > OK for 4.8, too?
> > > > 
> > > > Hum.  I suppose so, since it's relatively self-contained.  I suppose the
> > > > out-of-tree openrisc port will thank us...
> > > 
> > > Maybe it's better to wait for a while and collect follow up patches such
> > > as the rs6000 one.
> > 
> > The tree right now is broken for the powerpc.  I would prefer to get patches
> > installed ASAP rather than waiting for additional ports.
> 
> I've just committed the PPC fix for trunk.  Sorry for the delay.
> I haven't committed anything related to this issue on the 4.8 branch
> yet.  I'll do that next week if nothing else comes up.

Sorry for the delay.  I've just backported the 2 patches to 4.8.
Tested with 'make all-gcc' for SH and PPC cross compilers.
Committed as rev 202083.

Cheers,
Oleg
Index: gcc/expr.c
===
--- gcc/expr.c	(revision 202080)
+++ gcc/expr.c	(working copy)
@@ -119,7 +119,7 @@
   int reverse;
 };
 
-static void move_by_pieces_1 (rtx (*) (rtx, ...), enum machine_mode,
+static void move_by_pieces_1 (insn_gen_fn, machine_mode,
 			  struct move_by_pieces_d *);
 static bool block_move_libcall_safe_for_call_parm (void);
 static bool emit_block_move_via_movmem (rtx, rtx, rtx, unsigned, unsigned, HOST_WIDE_INT);
@@ -128,7 +128,7 @@
 static rtx clear_by_pieces_1 (void *, HOST_WIDE_INT, enum machine_mode);
 static void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
 static void store_by_pieces_1 (struct store_by_pieces_d *, unsigned int);
-static void store_by_pieces_2 (rtx (*) (rtx, ...), enum machine_mode,
+static void store_by_pieces_2 (insn_gen_fn, machine_mode,
 			   struct store_by_pieces_d *);
 static tree clear_storage_libcall_fn (int);
 static rtx compress_float_constant (rtx, rtx);
@@ -1043,7 +1043,7 @@
to make a move insn for that mode.  DATA has all the other info.  */
 
 static void
-move_by_pieces_1 (rtx (*genfun) (rtx, ...), enum machine_mode mode,
+move_by_pieces_1 (insn_gen_fn genfun, machine_mode mode,
 		  struct move_by_pieces_d *data)
 {
   unsigned int size = GET_MODE_SIZE (mode);
@@ -2657,7 +2657,7 @@
to make a move insn for that mode.  DATA has all the other info.  */
 
 static void
-store_by_pieces_2 (rtx (*genfun) (rtx, ...), enum machine_mode mode,
+store_by_pieces_2 (insn_gen_fn genfun, machine_mode mode,
 		   struct store_by_pieces_d *data)
 {
   unsigned int size = GET_MODE_SIZE (mode);
Index: gcc/recog.h
===
--- gcc/recog.h	(revision 202080)
+++ gcc/recog.h	(working copy)
@@ -256,8 +256,58 @@
 
 typedef int (*insn_operand_predicate_fn) (rtx, enum machine_mode);
 typedef const char * (*insn_output_fn) (rtx *, rtx);
-typedef rtx (*insn_gen_fn) (rtx, ...);
 
+struct insn_gen_fn
+{
+  typedef rtx (*f0) (void);
+  typedef rtx (*f1) (rtx);
+  typedef rtx (*f2) (rtx, rtx);
+  typedef rtx (*f3) (rtx, rtx, rtx);
+  typedef rtx (*f4) (rtx, rtx, rtx, rtx);
+  typedef rtx (*f5) (rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f6) (rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f7) (rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f8) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f9) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f10) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f11) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f12) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f13) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f14) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f15) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  typedef rtx (*f16) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+
+  typedef f0 stored_funcptr;
+
+  rtx operator () (void) const { return ((f0)func) (); }
+  rtx operator () (rtx a0) const { return ((f1)func) (a0); }
+  rtx operator () (rtx a0, rtx a1) const { return ((f2)func) (a0, a1); }
+  rtx operator () (rtx a0, rtx a1, rtx a2) const { return ((f3)func) (a0, a1, a2); }
+  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3) const { return ((f4)func) (

Re: [PATCH] Fix illegal cast to rtx (*insn_gen_fn) (rtx, ...)

2013-08-29 Thread Oleg Endo

On Thu, 2013-08-29 at 20:51 +0200, Jakub Jelinek wrote:
> On Thu, Aug 29, 2013 at 08:45:33PM +0200, Oleg Endo wrote:
> > Sorry for the delay.  I've just backported the 2 patches to 4.8.
> > Tested with 'make all-gcc' for SH and PPC cross compilers.
> > Committed as rev 202083.
> 
> Please fix the overly long lines as a follow-up.
> 

In my original mail
http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01315.html
I wrote:

* I don't know whether it's really needed to properly format the code of
class insn_gen_fn.  After reading the first two or three overloads
(which do fit into 80 columns) one gets the idea and so I guess nobody
is going to read that stuff completely anyway.

Nobody commented on it and after Richard's OK to the patch I assumed
it's fine that way as an exception.

Of course I'll do it if you insist :)

Cheers,
Oleg
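
For context, the idea behind the quoted struct can be reduced to three arities.  The sketch below uses a stand-in rtx type and made-up generator functions; the names insn_gen_fn_sketch, wrap, gen_copy and gen_pick_second are hypothetical.  The function pointer is stored under one fixed pointer type and each operator () overload casts it back to the signature that matches its argument count before calling.

```cpp
#include <cassert>

// Stand-in for GCC's rtx, only here to make the sketch self-contained.
struct rtx_def { int value; };
typedef rtx_def *rtx;

struct insn_gen_fn_sketch
{
  typedef rtx (*f0) ();
  typedef rtx (*f1) (rtx);
  typedef rtx (*f2) (rtx, rtx);

  f0 func;  // stored with a single pointer type, whatever the real arity

  // Each overload casts back to the signature matching its arguments.
  rtx operator () () const { return func (); }
  rtx operator () (rtx a0) const
  { return reinterpret_cast<f1> (func) (a0); }
  rtx operator () (rtx a0, rtx a1) const
  { return reinterpret_cast<f2> (func) (a0, a1); }
};

// Hypothetical generator functions, analogous to the gen_* functions.
rtx gen_copy (rtx a0) { return a0; }
rtx gen_pick_second (rtx a0, rtx a1) { assert (a0 != 0); return a1; }

// Helpers that store a generator under the common pointer type.
insn_gen_fn_sketch wrap (insn_gen_fn_sketch::f1 f)
{
  insn_gen_fn_sketch g = { reinterpret_cast<insn_gen_fn_sketch::f0> (f) };
  return g;
}

insn_gen_fn_sketch wrap (insn_gen_fn_sketch::f2 f)
{
  insn_gen_fn_sketch g = { reinterpret_cast<insn_gen_fn_sketch::f0> (f) };
  return g;
}
```

Calling through the wrapper is only valid when the number of arguments matches the function that was stored, which is the same contract the old varargs-style typedef had implicitly.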


> > +struct insn_gen_fn
> > +{
> > +  typedef rtx (*f0) (void);
> > +  typedef rtx (*f1) (rtx);
> > +  typedef rtx (*f2) (rtx, rtx);
> > +  typedef rtx (*f3) (rtx, rtx, rtx);
> > +  typedef rtx (*f4) (rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f5) (rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f6) (rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f7) (rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f8) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f9) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f10) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f11) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f12) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f13) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f14) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f15) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +  typedef rtx (*f16) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
> > +
> > +  typedef f0 stored_funcptr;
> > +
> > +  rtx operator () (void) const { return ((f0)func) (); }
> > +  rtx operator () (rtx a0) const { return ((f1)func) (a0); }
> > +  rtx operator () (rtx a0, rtx a1) const { return ((f2)func) (a0, a1); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2) const { return ((f3)func) (a0, a1, a2); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3) const { return ((f4)func) (a0, a1, a2, a3); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4) const { return ((f5)func) (a0, a1, a2, a3, a4); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5) const { return ((f6)func) (a0, a1, a2, a3, a4, a5); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6) const { return ((f7)func) (a0, a1, a2, a3, a4, a5, a6); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, rtx a7) const { return ((f8)func) (a0, a1, a2, a3, a4, a5, a6, a7); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, rtx a7, rtx a8) const { return ((f9)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, rtx a7, rtx a8, rtx a9) const { return ((f10)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, rtx a7, rtx a8, rtx a9, rtx a10) const { return ((f11)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, rtx a7, rtx a8, rtx a9, rtx a10, rtx a11) const { return ((f12)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, rtx a7, rtx a8, rtx a9, rtx a10, rtx a11, rtx a12) const { return ((f13)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, rtx a7, rtx a8, rtx a9, rtx a10, rtx a11, rtx a12, rtx a13) const { return ((f14)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13); }
> > +  rtx operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx a6, 
> > rtx a7, rtx a8, rtx a9, rtx a10, rtx a11, rtx a12, rtx a13, rtx a14) const 
> > { return ((f15)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, 
> &

Re: [PATCH, SH4] Fix PR58314 (unsatisfied constraints)

2013-09-12 Thread Oleg Endo

On Thu, 2013-09-12 at 15:37 +0200, Christian Bruel wrote:
> The attached patch fixes an ICE while building the linux kernel. Reduced
> in the included testcase.
> 
> The problem is that we are generating a movhi_reg_reg insn that accepts
> only registers as operands. Spilling a pseudo on the stack results in an
> invalid memory load/store constraints.
> 
> The attached patch allows memory for reload.
> Tested with the testsuite on sh4-linux and sh-superh-gcc.
> No performance impact on a large number of benchmarks (EEMBC, CSIBe,
> spec2006, ...)
> 
> Oleg, since you moved out the r,r constraints from *movhi into
> movhi_reg_reg, do you agree ?

Yep.  Just a few nits:

- the comment block above the "*mov<mode>_reg_reg" pattern is partially
invalidated by your fix and should be updated, too.

- although the original failure popped up with -Os, I think the test
should go into gcc/testsuite/gcc.target/sh/torture/

Cheers,
Oleg

Re: [PATCH, committed] SH: Fix PR58314 (unsatisfied constraints)

2013-09-18 Thread Oleg Endo

On Wed, 2013-09-18 at 09:55 +0200, Christian Bruel wrote:
> Hi Richard,
> 
> On 09/16/2013 07:10 PM, Richard Sandiford wrote:
> > Hi Christian,
> >
> > Christian Bruel  writes:
> >> @@ -6893,11 +6894,14 @@ label:
> >>  ;; reloading MAC subregs otherwise.  For that probably special patterns
> >>  ;; would be required.
> >>  (define_insn "*mov<mode>_reg_reg"
> >> -  [(set (match_operand:QIHI 0 "arith_reg_dest" "=r")
> >> -  (match_operand:QIHI 1 "register_operand" "r"))]
> >> +  [(set (match_operand:QIHI 0 "arith_reg_dest" "=r,m,*z")
> >> +  (match_operand:QIHI 1 "register_operand" "r,*z,m"))]
> > If the constraints allow "m", the predicates need to accept memories too.
> > (It'd be worth having an insn condition that rejects both operands
> > being memories though.)
> >
> > Thanks,
> > Richard
> Thanks for your comment,
> 
> I was wondering this too when doing the fix. I felt that a memory
> operand would be matched by the "*movhi" patterns below.  Since I wanted
> to fix only the spilling case, the original operand is a pseudo reg
> that has already matched the register predicate.
> Given that the predicate does not accept memory, I wonder how I never hit
> a kind of "insn not found" error.  Well, I'll give a try to adding a
> memory condition in the predicate, but I fear that the movhi patterns
> will stop matching,

Yes, this will be the case.  The order of the movhi and movqi patterns
in the md file is important.  To address the predicates vs. constraints
issue, the following seems to work:

(define_insn "*mov<mode>_reg_reg"
  [(set (match_operand:QIHI 0 "general_movsrc_operand" "=r,m,*z")
	(match_operand:QIHI 1 "general_movdst_operand" "r,*z,m"))]
  "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)
   && (arith_reg_operand (operands[0], <MODE>mode)
       || arith_reg_operand (operands[1], <MODE>mode))
   && (!can_create_pseudo_p () && REG_P (operands[0]) && REG_P (operands[1]))"
  "@
	mov	%1,%0
	mov.<bw>	%1,%0
	mov.<bw>	%1,%0"
  [(set_attr "type" "move,store,load")])

.. at least it survives the test case for this PR.  I haven't done
further tests.

BTW, in the test case (gcc.target/sh/torture/pr58314.c), this 

/* { dg-options "-Os" } */

defeats the purpose of the torture tests.

Cheers,
Oleg

Re: [PATCH, committed] SH: Fix PR58314 (unsatisfied constraints)

2013-09-18 Thread Oleg Endo

On Thu, 2013-09-19 at 08:15 +0900, Kaz Kojima wrote:
> Christian Bruel  wrote:
> > && (!can_create_pseudo_p () && REG_P (operands[0]) && REG_P (operands[1]))"
> > 
> > is necessary ?
> 
> It looks like another hack to allow the 2nd and 3rd alternatives only
> when reloading.  If so, it might be a bit cleaner to use a special
> predicate like

Yes, that's the idea behind it.  Although I must say, I haven't tried it
without the hack, i.e. allowing memories in the insn also before reload.
If it doesn't cause any regressions, it's probably better to put the
reg-reg alternative back to the "*movhi" and "*movqi" insns and move
those above the displacement addressing patterns.

> ;; Returns 1 if OP can be a source of a mov*_reg_reg insn. Same as
> ;; general_movsrc_operand, but mem allowed only when reload in progress.
> (define_predicate "movsrc_reg_reg_operand"
>   (match_code "subreg,reg,mem")
> {
>   if (reload_in_progress && MEM_P (op))
> return general_movsrc_operand (op, mode);
> 
>   return register_operand (op, mode);
> })
> 
> and its dst version for that purpose.

Yes, sorry for suggesting the lazy version.

Cheers,
Oleg

Re: [PATCH, committed] SH: Fix PR58314 (unsatisfied constraints)

2013-09-19 Thread Oleg Endo

Hi,

On Thu, 2013-09-19 at 10:44 +0200, Christian Bruel wrote:
> Hi Kaz, Oleg,
> 
> On 09/19/2013 01:15 AM, Kaz Kojima wrote:
> > Christian Bruel  wrote:
> >> && (!can_create_pseudo_p () && REG_P (operands[0]) && REG_P (operands[1]))"
> >>
> >> is necessary ?
> > It looks like another hack to allow the 2nd and 3rd alternatives only
> > when reloading.  If so, it might be a bit cleaner to use a special
> > predicate like
> >
> >
> This still looks complicated to me. I have tested for sh-superh-elf and
> sh-linux the attached patch that just "fixes" the issue reported by
> Richard with no regression and absolutely no differences in code
> generation for CSIBe and a few other benches (eembc, coremark, ...). 
> The spill alternatives are correctly selected and the original PR still
> passes.
> 
> If OK I'd like to apply it to trunk/4.8. If there is the need for an
> additional hack, How about sending it separately ?

Yeah, the move patterns probably could use some cleanup / refactoring
anyway.  I also wonder what is going to happen if LRA is used ... but
that's another story.  Have you also checked the patch for SH2A?

Cheers,
Oleg

Re: [PATCH]Fix computation of offset in ivopt

2013-09-24 Thread Oleg Endo

On Tue, 2013-09-24 at 12:31 +0200, Richard Biener wrote:
> On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng  wrote:
> > Hi,
> > This patch fixes two minor bugs when computing offset in IVOPT.
> > 1) Consider the example below:
> > #define MAX 100
> > struct tag
> > {
> >   int i;
> >   int j;
> > };
> > struct tag arr[MAX];
> >
> > int foo (int len)
> > {
> >   int i = 0;
> >   for (; i < len; i++)
> >   {
> > access arr[i].j;
> >   }
> > }
> >
> > Without this patch, the offset computed by strip_offset_1 for the address
> > of arr[i].j is zero, which is apparently wrong.
> >
> > 2) Consider the example below:
> > //...
> >   :
> >   KeyIndex_66 = KeyIndex_194 + 4294967295;
> >   if (KeyIndex_66 != 0)
> > goto ;
> >   else
> > goto ;
> >
> >   :
> >
> >   :
> >   # KeyIndex_194 = PHI 
> >   _62 = KeyIndex_194 + 1073741823;
> >   _63 = _62 * 4;
> >   _64 = pretmp_184 + _63;
> >   _65 = *_64;
> >   if (_65 == 0)
> > goto ;
> >   else
> > goto ;
> > //...
> >
> > There are iv use and candidate like:
> >
> > use 1
> >   address
> >   in statement _65 = *_64;
> >
> >   at position *_64
> >   type handletype *
> >   base pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4
> >   step 4294967292
> >   base object (void *) pretmp_184
> >   related candidates
> >
> > candidate 6
> >   var_before ivtmp.16
> >   var_after ivtmp.16
> >   incremented before use 1
> >   type unsigned int
> >   base (unsigned int) (pretmp_184 + (sizetype) KeyIndex_180 * 4)
> >   step 4294967292
> >   base object (void *) pretmp_184
> > Candidate 6 is related to use 1
> >
> > In function get_computation_cost_at for use 1 using cand 6, ubase and cbase
> > are:
> > pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4
> > pretmp_184 + (sizetype) KeyIndex_180 * 4
> >
> > The cstepi computed in HOST_WIDE_INT is 0xfffffffffffffffc, while the
> > offset computed in TYPE(utype) is 0xfffffffc.  Though they both stand for
> > value "-4" in different precision, statement "offset -= ratio * cstepi"
> > returns 0x100000000, which is wrong.
> > Tested on x86_64 and arm.  Is it OK?
> 
> +   field = TREE_OPERAND (expr, 1);
> +   if (DECL_FIELD_BIT_OFFSET (field)
> +   && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
> + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
> +
> +   tmp = component_ref_field_offset (expr);
> +   if (top_compref
> +   && cst_and_fits_in_hwi (tmp))
> + {
> +   /* Strip the component reference completely.  */
> +   op0 = TREE_OPERAND (expr, 0);
> +   op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
> +   *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
> +   return op0;
> + }
> 
> the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false
> for either offset part you may end up doing half accounting and not
> stripping.
> 
> Btw, DECL_FIELD_BIT_OFFSET is always non-NULL.  I suggest to
> rewrite to
> 
>  if (!inside_addr)
>return orig_expr;
> 
>  tmp = component_ref_field_offset (expr);
>  field = TREE_OPERAND (expr, 1);
>  if (top_compref
>  && cst_and_fits_in_hwi (tmp)
>  && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
> {
>   ...
> }
> 
> note that this doesn't really handle overflows correctly as
> 
> +   *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
> 
> may still overflow.
> 
> @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data,
>  bitmap_clear (*depends_on);
>  }
> 
> +  /* Sign-extend offset if utype has lower precision than HOST_WIDE_INT.  */
> +  offset = sext_hwi (offset, TYPE_PRECISION (utype));
> +
> 
> offset is computed elsewhere in difference_cost and the bug to me seems that
> it is unsigned.  sign-extending it here is odd at least (and the extension
> should probably happen at sizetype precision, not that of utype).
> 

After reading "overflow" and "ivopt", I was wondering whether
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190 is somehow related.

Cheers,
Oleg
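
An aside on the sign-extension point in the quoted review. The hex values in the quoted report are truncated in this archive, so the sketch below does not try to reproduce them. It only illustrates, assuming a 32-bit utype and a 64-bit HOST_WIDE_INT, why a value has to be sign-extended from its own precision before it is mixed with wider arithmetic. The helper name sext64 is made up for this note; it stands in for GCC's sext_hwi and is not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low PREC bits of SRC to 64 bits.
   Hypothetical stand-in for GCC's sext_hwi, written with unsigned
   arithmetic so that no signed shift is needed.  */
static int64_t
sext64 (int64_t src, unsigned prec)
{
  assert (prec >= 1 && prec <= 64);

  uint64_t sign = (uint64_t) 1 << (prec - 1);   /* sign bit at PREC */
  uint64_t mask = (sign << 1) - 1;              /* low PREC bits; all ones for 64 */
  uint64_t low = (uint64_t) src & mask;

  /* Flipping the sign bit and subtracting it again propagates it upwards.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

With these assumptions, the 32-bit pattern 0xfffffffc becomes -4 once extended, whereas the same bits read as a plain 64-bit value would be a large positive number.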

[SH, committed] Fix minor formatting nits

2013-09-24 Thread Oleg Endo

Hello,

The attached patch fixes a few formatting nits in sh.md.
Committed as rev. 202876.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.md: Fix formatting.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 202873)
+++ gcc/config/sh/sh.md	(working copy)
@@ -783,7 +783,7 @@
 	tst	%0,%0
 	cmp/eq	%1,%0
 	cmp/eq	%1,%0"
-   [(set_attr "type" "mt_group")])
+  [(set_attr "type" "mt_group")])
 
 ;; FIXME: For some reason, on SH4A and SH2A combine fails to simplify this
 ;; pattern by itself.  What this actually does is:
@@ -809,7 +809,7 @@
   "@
 	cmp/pl	%0
 	cmp/gt	%1,%0"
-   [(set_attr "type" "mt_group")])
+  [(set_attr "type" "mt_group")])
 
 (define_insn "cmpgesi_t"
   [(set (reg:SI T_REG)
@@ -819,7 +819,7 @@
   "@
 	cmp/pz	%0
 	cmp/ge	%1,%0"
-   [(set_attr "type" "mt_group")])
+  [(set_attr "type" "mt_group")])
 
 ;; FIXME: This is actually wrong.  There is no way to literally move a
 ;; general reg to t reg.  Luckily, it seems that this pattern will be only
@@ -831,7 +831,7 @@
   [(set (reg:SI T_REG) (match_operand:SI 0 "arith_reg_operand" "r"))]
   "TARGET_SH1"
   "cmp/pl	%0"
-   [(set_attr "type" "mt_group")])
+  [(set_attr "type" "mt_group")])
 
 ;; Some integer sign comparison patterns can be realized with the div0s insn.
 ;;	div0s	Rm,Rn		T = (Rm >> 31) ^ (Rn >> 31)
@@ -6898,9 +6898,9 @@
 	(match_operand:QIHI 1 "register_operand" "r,*z,m"))]
   "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)"
   "@
-mov		%1,%0
-mov.	%1,%0
-mov.	%1,%0"
+	mov	%1,%0
+	mov.	%1,%0
+	mov.	%1,%0"
   [(set_attr "type" "move,store,load")])
 
 ;; FIXME: The non-SH2A and SH2A variants should be combined by adding

Re: [v3] fixup --enable-cxx-flags

2012-09-29 Thread Oleg Endo

On Fri, 2012-09-28 at 21:12 -0700, Benjamin De Kosnik wrote:
> ... found while working on arm-eabisim cross, using --enable-cxx-flags
> was not working as AM_CXXFLAGS was being over-ridden by CXXFLAGS.
> (Despite the comments warning about this.)
> 
> Fixed.
> 
> Also patchlet for the last commit, forgot to edit PARALELL_FLAGS, so
> --disable-thread compiles were failing. 
> 
> -benjamin
> 
> tested x86/linux
> tested x86/linux --enable-cxx-flags="-g0"

Hm, this doesn't help PR 53579, does it?

Cheers,
Oleg

[SH] PR 50457 - Add additional atomic models

2012-09-30 Thread Oleg Endo

Hello,

This implements the changes as proposed in the PR, albeit with some small
differences:

* I decided to go for a more verbose option name '-matomic-model'
instead of just '-matomic'.  

* In addition to the soft-tcb model I've also added a soft-imask model.
Interrupt-flipping atomics might not be the best choice but it is easy
to set up and get started with.

* There is a new atomic model parameter 'strict' to prohibit mixing of
atomic model sequences on SH4A.

There are no functional changes to the already existing soft and hard
atomics, except that '-msoft-atomic' is now mapped to
'-matomic-model=soft-gusa' and '-mhard-atomic' has been removed.

Tested on rev 191865 with 'make all' and 'make info dvi pdf' and by
compiling a couple of functions that use atomics and eyeballing the asm
output.

OK?

Cheers,
Oleg

ChangeLog:

PR target/50457
* config/sh/sh.opt (matomic-model): New option.
(msoft-atomic): Mark as deprecated and alias to 
matomic-model=soft-gusa.
(mhard-atomic): Delete.
* config/sh/predicates.md (gbr_displacement): New predicate.
* config/sh/sh-protos.h (sh_atomic_model): New struct.
(selected_atomic_model): New declaration.
(TARGET_ATOMIC_ANY, TARGET_ATOMIC_STRICT, 
TARGET_ATOMIC_SOFT_GUSA, TARGET_ATOMIC_HARD_LLCS, 
TARGET_ATOMIC_SOFT_TCB, TARGET_ATOMIC_SOFT_TCB_GBR_OFFSET_RTX, 
TARGET_ATOMIC_SOFT_IMASK): New macros.
* config/sh/linux.h (SUBTARGET_OVERRIDE_OPTIONS): Adapt setting 
to default atomic model.
* config/sh/sh.c (selected_atomic_model_): New global variable.
(selected_atomic_model, parse_validate_atomic_model_option): New
functions.
(sh_option_override): Replace atomic selection checks with call 
to parse_validate_atomic_model_option.
* config/sh/sh.h (TARGET_ANY_ATOMIC, UNSUPPORTED_ATOMIC_OPTIONS,
UNSUPPORTED_HARD_ATOMIC_CPU): Delete.
(DRIVER_SELF_SPECS): Remove atomic checks.
* config/sh/sync.md: Update documentation comments.
(atomic_compare_and_swap, atomic_exchange, 
atomic_fetch_, atomic_fetch_nand, 
atomic__fetch, atomic_nand_fetch): Use
TARGET_ATOMIC_ANY as condition.  Add TARGET_ATOMIC_STRICT check
for SH4A case.  Handle new TARGET_ATOMIC_SOFT_TCB and
TARGET_ATOMIC_SOFT_IMASK cases.
(atomic_test_and_set): Handle new TARGET_ATOMIC_SOFT_TCB and 
TARGET_ATOMIC_SOFT_IMASK cases.
(atomic_compare_and_swapsi_hard, atomic_exchangesi_hard, 
atomic_fetch_si_hard, atomic_fetch_nandsi_hard, 
atomic__fetchsi_hard, atomic_nand_fetchsi_hard): 
Add TARGET_ATOMIC_STRICT check.
(atomic_compare_and_swap_hard, atomic_exchange_hard,
atomic_fetch__hard, 
atomic_fetch_nand_hard, 
atomic__fetch_hard, 
atomic_nand_fetch_hard, atomic_test_and_set_hard): Use
TARGET_ATOMIC_HARD_LLCS condition.
(atomic_compare_and_swap_soft, atomic_exchange_soft,
atomic_fetch__soft,
atomic_fetch_nand_soft, 
atomic__fetch_soft,
atomic_nand_fetch_soft, atomic_test_and_set_soft): Append 
_gusa to the insn names and use TARGET_ATOMIC_SOFT_GUSA as 
condition.
(atomic_compare_and_swap_soft_tcb, 
atomic_exchange_soft_tcb, 
atomic_fetch__soft_tcb, 
atomic_fetch_nand_soft_tcb, 
atomic__fetch_soft_tcb, 
atomic_nand_fetch_soft_tcb, atomic_test_and_set_soft_tcb):
New insns.
(atomic_compare_and_swap_soft_imask, 
atomic_exchange_soft_imask, 
atomic_fetch__soft_imask, 
atomic_fetch_nand_soft_imask, 
atomic__fetch_soft_imask, 
atomic_nand_fetch_soft_imask, 
atomic_test_and_set_soft_imask): New insns.
* doc/invoke.texi (SH Options): Document new matomic-model 
option.  Remove msoft-atomic and mhard-atomic options.
Index: gcc/config/sh/sh.opt
===
--- gcc/config/sh/sh.opt	(revision 191865)
+++ gcc/config/sh/sh.opt	(working copy)
@@ -320,12 +320,12 @@
 Follow Renesas (formerly Hitachi) / SuperH calling conventions
 
 msoft-atomic
-Target Report Var(TARGET_SOFT_ATOMIC)
-Use gUSA software atomic sequences
+Target Undocumented Alias(matomic-model=, soft-gusa, none)
+Deprecated.  Use -matomic= instead to select the atomic model
 
-mhard-atomic
-Target Report Var(TARGET_HARD_ATOMIC)
-Use hardware atomic sequences
+matomic-model=
+Target Report RejectNegative Joined Var(sh_atomic_model_str)
+Specify the model for atomic operations
 
 mtas
 Target Report RejectNegative Var(TARGET_ENABLE_TAS)
Index: gcc/config/sh/predicates.md
===
--- gcc/config/sh/predicates.md	(revision 191865)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -1071,3 +1071,19 @@
 
   return false;
 })
+
+;; A predicate that determines whether a given c

Re: [SH] PR 50457 - Add additional atomic models

2012-09-30 Thread Oleg Endo

On Mon, 2012-10-01 at 08:38 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > This implements the changes as proposed PR, albeit with some small
> > differences:
> 
> > --- gcc/config/sh/sh.c  (revision 191865)
> > +++ gcc/config/sh/sh.c  (working copy)
> [snip]
> > +  std::vector tokens;
> > +  for (std::stringstream ss (str); ss.good (); )
> > +  {
> > +tokens.push_back (std::string ());
> > +std::getline (ss, tokens.back (), ',');
> > +  }
> 
> Can we use C++ in .c files already?  I couldn't find other examples
> in the current gcc.
> 

The existing .c files are compiled as C++ already.  There was a
discussion not long ago about whether the .c files should be renamed to
.cc.  If I remember correctly, the conclusion was that existing .c files
remain .c, while newly added files should be .cc.  gcc/double-int.c
would probably be one of the recent examples.

Cheers,
Oleg
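
For readers who do not have the full patch at hand, the tokenizing loop quoted above can be sketched in plain C.  This is only an illustration of the comma-splitting step, not the code from sh.c.  The function name split_option, the fixed buffer sizes and the example string "soft-gusa,strict" are assumptions made for this note; the thread itself only names the models and the 'strict' parameter.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 8
#define MAX_TOKEN_LEN 32

/* Split STR at commas into TOKENS.  Returns the number of tokens stored,
   or -1 if there are too many tokens or one of them is too long.
   Like the quoted std::getline loop, N commas yield N + 1 tokens, so a
   trailing comma produces an empty last token.  */
static int
split_option (const char *str, char tokens[MAX_TOKENS][MAX_TOKEN_LEN])
{
  assert (str != NULL);

  int count = 0;
  const char *start = str;

  for (;;)
    {
      const char *end = strchr (start, ',');
      size_t len = end ? (size_t) (end - start) : strlen (start);

      if (count == MAX_TOKENS || len >= MAX_TOKEN_LEN)
        return -1;

      memcpy (tokens[count], start, len);
      tokens[count][len] = '\0';
      count++;

      if (end == NULL)
        break;
      start = end + 1;   /* continue after the comma */
    }

  return count;
}
```

Under these assumptions, splitting "soft-gusa,strict" gives the two tokens "soft-gusa" and "strict", which a validation step could then check one by one.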

[SH] PR 51244 - Handle T bit -> 0x7FFFFFFF / 0x80000000

2012-10-01 Thread Oleg Endo

Hello,

This handles the case where the T bit is stored to a reg as the value
0x7FFFFFFF or 0x80000000.
Tested on rev 191894 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg
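
The arithmetic behind this can be checked on the host with a small model.  The sketch below assumes the usual description of negc (Rn = 0 - Rm - T in 32-bit wrapping arithmetic) and ignores the borrow that negc writes back to T.  The function name negc_model is made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the value computed by negc: 0 - RM - T, wrapping at 32 bits.
   The updated T bit (borrow) is not modeled here.  */
static uint32_t
negc_model (uint32_t rm, unsigned t)
{
  assert (t <= 1);
  return (uint32_t) (0u - rm - t);
}
```

Feeding 0x80000000 into this model gives 0x80000000 when T is 0 and 0x7FFFFFFF when T is 1, because 0 - 0x80000000 wraps back to 0x80000000.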

gcc/ChangeLog:

PR target/51244
* config/sh/sh.md (*mov_t_msb_neg): New insn and two 
accompanying unnamed split patterns.

testsuite/ChangeLog:

PR target/51244
* gcc.target/sh/pr51244-12.c: New.
Index: gcc/testsuite/gcc.target/sh/pr51244-12.c
===
--- gcc/testsuite/gcc.target/sh/pr51244-12.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr51244-12.c	(revision 0)
@@ -0,0 +1,68 @@
+/* Check that the negc instruction is generated as expected for the cases
+   below.  If we see a movrt or #-1 negc sequence it means that the pattern
+   which handles the inverted case does not work properly.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } } */
+/* { dg-final { scan-assembler-times "negc" 10 } } */
+/* { dg-final { scan-assembler-not "movrt|#-1|add|sub" } } */
+
+int
+test00 (int a, int b, int* x)
+{
+  return (a == b) ? 0x7FFFFFFF : 0x80000000;
+}
+
+int
+test00_inv (int a, int b)
+{
+  return (a != b) ? 0x80000000 : 0x7FFFFFFF;
+}
+
+int
+test01 (int a, int b)
+{
+  return (a >= b) ? 0x7FFFFFFF : 0x80000000;
+}
+
+int
+test01_inv (int a, int b)
+{
+  return (a < b) ? 0x80000000 : 0x7FFFFFFF;
+}
+
+int
+test02 (int a, int b)
+{
+  return (a > b) ? 0x7FFFFFFF : 0x80000000;
+}
+
+int
+test02_inv (int a, int b)
+{
+  return (a <= b) ? 0x80000000 : 0x7FFFFFFF;
+}
+
+int
+test03 (int a, int b)
+{
+  return ((a & b) == 0) ? 0x7FFFFFFF : 0x80000000;
+}
+
+int
+test03_inv (int a, int b)
+{
+  return ((a & b) != 0) ? 0x80000000 : 0x7FFFFFFF;
+}
+
+int
+test04 (int a)
+{
+  return ((a & 0x55) == 0) ? 0x7FFFFFFF : 0x80000000;
+}
+
+int
+test04_inv (int a)
+{
+  return ((a & 0x55) != 0) ? 0x80000000 : 0x7FFFFFFF;
+}
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 191894)
+++ gcc/config/sh/sh.md	(working copy)
@@ -10769,6 +10769,51 @@
 	(set (reg:SI T_REG) (const_int 1))
 	(use (match_dup 2))])])
 
+;; Use negc to store the T bit in a MSB of a reg in the following way:
+;;	T = 1: 0x80000000 -> reg
+;;	T = 0: 0x7FFFFFFF -> reg
+;; This works because 0 - 0x80000000 = 0x80000000.
+(define_insn_and_split "*mov_t_msb_neg"
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(minus:SI (const_int -2147483648)  ;; 0x80000000
+		  (match_operand 1 "t_reg_operand")))
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 2) (const_int -2147483648))
+   (parallel [(set (match_dup 0) (minus:SI (neg:SI (match_dup 2))
+ (reg:SI T_REG)))
+	  (clobber (reg:SI T_REG))])]
+{
+  operands[2] = gen_reg_rtx (SImode);
+})
+
+;; These are essentially the same as above, but with the inverted T bit.
+;; Combine recognizes the split patterns, but does not take them sometimes
+;; if the T_REG clobber is specified.  Instead it tries to split out the
+;; T bit negation.  Since these splits are supposed to be taken only by
+;; combine, it will see the T_REG clobber of the *mov_t_msb_neg insn, so this
+;; should be fine.
+(define_split
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(plus:SI (match_operand 1 "negt_reg_operand")
+		 (const_int 2147483647)))]  ;; 0x7fffffff
+  "TARGET_SH1 && can_create_pseudo_p ()"
+  [(parallel [(set (match_dup 0)
+		   (minus:SI (const_int -2147483648) (reg:SI T_REG)))
+	  (clobber (reg:SI T_REG))])])
+
+(define_split
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(if_then_else:SI (match_operand 1 "t_reg_operand")
+			 (const_int 2147483647)  ;; 0x7fffffff
+			 (const_int -2147483648)))]  ;; 0x80000000
+  "TARGET_SH1 && can_create_pseudo_p ()"
+  [(parallel [(set (match_dup 0)
+		   (minus:SI (const_int -2147483648) (reg:SI T_REG)))
+	  (clobber (reg:SI T_REG))])])
+
 ;; The *negnegt pattern helps the combine pass to figure out how to fold 
 ;; an explicit double T bit negation.
 (define_insn_and_split "*negnegt"

[SH] PR 50457 - Cleanup linux-atomic

2012-10-02 Thread Oleg Endo

Hello,

This is the patch as proposed in the PR to make
libgcc/config/sh/linux-atomic use the appropriate compiler generated
atomic built-in functions depending on the currently selected
atomic-model.

Tested on 191894 with 'make all-gcc' and by compiling code to see if the
__SH_ATOMIC_MODEL_*__ defines work as expected.  The new file
linux-atomic.c was tested by compiling it separately and eyeballing the
generated code.

OK?

Cheers,
Oleg
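
To give an idea of what "use the compiler generated atomic built-in functions" can look like, here is a minimal sketch of a __sync-style compare-and-swap routine written on top of the __atomic built-ins.  It is not the contents of the new linux-atomic.c; the function name val_compare_and_swap_4 and the choice of sequentially consistent ordering are assumptions made for this note.

```c
#include <assert.h>

/* Hypothetical 4-byte compare-and-swap in the style of the old
   __sync_val_compare_and_swap_4 routine.  Returns the value that was in
   *PTR before the operation, whether or not the swap happened.  */
static int
val_compare_and_swap_4 (int *ptr, int oldval, int newval)
{
  assert (ptr != 0);

  int expected = oldval;

  /* On failure the built-in stores the current value of *PTR into
     EXPECTED; on success EXPECTED still equals the previous value.  */
  __atomic_compare_exchange_n (ptr, &expected, newval, 0 /* strong */,
                               __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
}
```

The compiler then expands the built-in according to whatever atomic model is selected, which is the point of replacing the hand-written assembly.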

gcc/ChangeLog:

PR target/50457
* config/sh/sh.c (parse_validate_atomic_model_option): Handle 
name strings in sh_atomic_model.
* config/sh/sh.h (TARGET_CPU_CPP_BUILTINS): Move macro 
implementation to ...
* config/sh/sh-c.c (sh_cpu_cpp_builtins): ... this new function.
Add __SH1__ and __SH2__ defines.  Add __SH_ATOMIC_MODEL_*__ 
define.
* config/sh/sh-protos.h (sh_atomic_model): Add name and 
cdef_name variables.
(sh_cpu_cpp_builtins): Declare new function.

libgcc/ChangeLog:

PR target/50457
* config/sh/linux-atomic.S: Delete.
* config/sh/linux-atomic.c: New.
* config/sh/t-linux (LIB2ADD): Replace linux-atomic.S with 
linux-atomic.c.  Add cflags to disable warnings.
Index: libgcc/config/sh/t-linux
===
--- libgcc/config/sh/t-linux	(revision 191894)
+++ libgcc/config/sh/t-linux	(working copy)
@@ -1,9 +1,13 @@
 LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
 
-LIB2ADD = $(srcdir)/config/sh/linux-atomic.S
+LIB2ADD = $(srcdir)/config/sh/linux-atomic.c
 
 HOST_LIBGCC2_CFLAGS += -mieee -DNO_FPSCR_VALUES
 
+# Silence atomic built-in related warnings in linux-atomic.c.
+# Unfortunately the conflicting types warning can't be disabled selectively.
+HOST_LIBGCC2_CFLAGS += -w -Wno-sync-nand
+
 # Override t-slibgcc-elf-ver to export some libgcc symbols with
 # the symbol versions that glibc used, and hide some lib1func
 # routines which should not be called via PLT.  We have to create
Index: libgcc/config/sh/linux-atomic.S
===
--- libgcc/config/sh/linux-atomic.S	(revision 191894)
+++ libgcc/config/sh/linux-atomic.S	(working copy)
@@ -1,223 +0,0 @@
-/* Copyright (C) 2006, 2008, 2009 Free Software Foundation, Inc.
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   GCC is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   .  */
-
-
-!! Linux specific atomic routines for the Renesas / SuperH SH CPUs.
-!! Linux kernel for SH3/4 has implemented the support for software
-!! atomic sequences.
-
-#define FUNC(X)		.type X,@function
-#define HIDDEN_FUNC(X)	FUNC(X); .hidden X
-#define ENDFUNC0(X)	.Lfe_##X: .size X,.Lfe_##X-X
-#define ENDFUNC(X)	ENDFUNC0(X)
-
-#if ! __SH5__
-
-#define ATOMIC_TEST_AND_SET(N,T,EXT) \
-	.global	__sync_lock_test_and_set_##N; \
-	HIDDEN_FUNC(__sync_lock_test_and_set_##N); \
-	.align	2; \
-__sync_lock_test_and_set_##N:; \
-	mova	1f, r0; \
-	nop; \
-	mov	r15, r1; \
-	mov	#(0f-1f), r15; \
-0:	mov.##T	@r4, r2; \
-	mov.##T	r5, @r4; \
-1:	mov	r1, r15; \
-	rts; \
-	 EXT	r2, r0; \
-	ENDFUNC(__sync_lock_test_and_set_##N)
-
-ATOMIC_TEST_AND_SET (1,b,extu.b)
-ATOMIC_TEST_AND_SET (2,w,extu.w)
-ATOMIC_TEST_AND_SET (4,l,mov)
-
-#define ATOMIC_COMPARE_AND_SWAP(N,T,EXTS,EXT) \
-	.global	__sync_val_compare_and_swap_##N; \
-	HIDDEN_FUNC(__sync_val_compare_and_swap_##N); \
-	.align	2; \
-__sync_val_compare_and_swap_##N:; \
-	mova	1f, r0; \
-	EXTS	r5, r5; \
-	mov	r15, r1; \
-	mov	#(0f-1f), r15; \
-0:	mov.##T	@r4, r2; \
-	cmp/eq	r2, r5; \
-	bf	1f; \
-	mov.##T	r6, @r4; \
-1:	mov	r1, r15; \
-	rts; \
-	 EXT	r2, r0; \
-	ENDFUNC(__sync_val_compare_and_swap_##N)
-
-ATOMIC_COMPARE_AND_SWAP (1,b,exts.b,extu.b)
-ATOMIC_COMPARE_AND_SWAP (2,w,exts.w,extu.w)
-ATOMIC_COMPARE_AND_SWAP (4,l,mov,mov)
-
-#define ATOMIC_BOOL_COMPARE_AND_SWAP(N,T,EXTS) \
-	.global	__sync_bool_compare_and_swap_##N; \
-	HIDDEN_FUNC(__sync_bool_compare_and_swap_##N); \
-	.align	2; \
-__sync_bool_compare_and_swap_##N:; \
-	mova	1f, r0; \
-	EXTS	r5, r5; \
-	mov	r15, r1; \
-	mov	#(0f-1f), r15; \
-0:	mov.##T	@r4, r2; \
-	cmp/eq	r2, r

[SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing

2012-10-03 Thread Oleg Endo

Hello,

This adds the two common built-in functions __builtin_thread_pointer and
__builtin_set_thread_pointer to the SH port.
I've done it in a way so that hopefully it can be transitioned to target
independent thread pointer built-ins easily, as suggested by Richard a
while ago:
http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00946.html

Originally I wanted to wait until the target independent bits are in,
but somehow the thread mentioned above died and I got impatient.

I've also added support for SH's GBR based displacement addressing
modes.  They are not used for general purpose mem loads/stores by the
compiler, but rather when code accesses data behind the thread pointer
(thread control block or something).
The way GBR displacement address opportunities are discovered might not
be the best way of doing this sort of thing, but it works.  The insn
walking could potentially slow down compile times, but it is only
enabled for functions where the GBR is referenced, so it shouldn't be so
bad.  Alternatives and suggestions are highly appreciated.  :)

Tested on rev 191894 with 'make all' (c,c++) and
'make -k check-gcc RUNTESTFLAGS="sh.exp --target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"'

For code that doesn't reference GBR there are no functional changes.
TLS code that references GBR might trigger the 'sh_find_equiv_gbr_addr'
function.  Unfortunately TLS tests don't seem to work on sh-sim, so I
could not test this part.

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.md (define_constants): Add UNSPECV_GBR.
(get_thread_pointer, set_thread_pointer): New expanders.
(load_gbr): Rename to store_gbr.  Remove GBR_REG use.
(store_gbr): New insn.
(*mov_gbr_load, *mov_gbr_store): New insns and 
accompanying unnamed splits.
* config/sh/predicates.md (general_movsrc_operand, 
general_movdst_operand): Reject GBR addresses.
* config/sh/sh-protos.h (sh_find_equiv_gbr_addr): New 
declaration.
* config/sh/sh.c (prepare_move_operands): Use gen_store_gbr 
instead of gen_load_gbr in TLS_MODEL_LOCAL_EXEC case.
(sh_address_cost, sh_legitimate_address_p, sh_secondary_reload):
Handle GBR addresses.
(builtin_description): Add is_enabled member.
(shmedia_builtin, sh1_builtin): New functions.
(signature_args): Add SH_BLTIN_VP.
(bdesc): Use shmedia_builtin for existing built-ins.  Add 
__builtin_thread_pointer and __builtin_set_thread_pointer as 
sh1_builtin.
(sh_media_init_builtins, sh_init_builtins): Merge into single 
function sh_init_builtins.  Add is_enabled checking.
(sh_media_builtin_decl, sh_builtin_decl): Merge into single 
function sh_builtin_decl.  Add is_enabled checking.
(base_reg_disp): New class.
(sh_find_base_reg_disp, sh_find_equiv_gbr_addr): New functions.

testsuite/ChangeLog:

PR target/54760
* gcc.target/sh/pr54706-1.c: New.
* gcc.target/sh/pr54706-2.c: New.
* gcc.target/sh/pr54706-3.c: New.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 191894)
+++ gcc/config/sh/sh.md	(working copy)
@@ -175,6 +175,7 @@
   (UNSPECV_WINDOW_END	10)
   (UNSPECV_CONST_END	11)
   (UNSPECV_EH_RETURN	12)
+  (UNSPECV_GBR		13)
 ])
 
 ;; -
@@ -10029,13 +10030,165 @@
   DONE;
 })
 
-(define_insn "load_gbr"
-  [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))
-   (use (reg:SI GBR_REG))]
+;;--
+;; Thread pointer getter and setter.
+;;
+;; On SH the thread pointer is kept in the GBR.
+;; These patterns are usually expanded from the respective built-in functions.
+(define_expand "get_thread_pointer"
+  [(set (match_operand:SI 0 "register_operand") (reg:SI GBR_REG))]
+  "TARGET_SH1")
+
+(define_insn "store_gbr"
+  [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))]
   ""
   "stc	gbr,%0"
   [(set_attr "type" "tls_load")])
 
+(define_expand "set_thread_pointer"
+  [(set (reg:SI GBR_REG)
+	(unspec_volatile:SI [(match_operand:SI 0 "register_operand")]
+	 UNSPECV_GBR))]
+  "TARGET_SH1")
+
+(define_insn "load_gbr"
+  [(set (reg:SI GBR_REG)
+	(unspec_volatile:SI [(match_operand:SI 0 "register_operand" "r")]
+	 UNSPECV_GBR))]
+  "TARGET_SH1"
+  "ldc	%0,gbr"
+  [(set_attr "type" "move")])
+
+;;--
+;; Thread pointer relative memory loads and stores.
+;;
+;; On SH there are GBR displacement address modes which can be utilized to
+;; access memory behind the thread pointer.
+;; Since we do not allow using GBR for general purpose memory accesses, these
+;; GBR addressing modes are formed by the combine pass.
+;; This could b

Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing

2012-10-03 Thread Oleg Endo

On Wed, 2012-10-03 at 23:21 +0200, Oleg Endo wrote:
> testsuite/ChangeLog:
> 
>   PR target/54760
>   * gcc.target/sh/pr54706-1.c: New.
>   * gcc.target/sh/pr54706-2.c: New.
>   * gcc.target/sh/pr54706-3.c: New.

Obviously there is a typo in the file names for the test cases.
Attached is the corrected patch + changelog.

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.md (define_constants): Add UNSPECV_GBR.
(get_thread_pointer, set_thread_pointer): New expanders.
(load_gbr): Rename to store_gbr.  Remove GBR_REG use.
(store_gbr): New insn.
(*mov_gbr_load, *mov_gbr_store): New insns and 
accompanying unnamed splits.
* config/sh/predicates.md (general_movsrc_operand, 
general_movdst_operand): Reject GBR addresses.
* config/sh/sh-protos.h (sh_find_equiv_gbr_addr): New 
declaration.
* config/sh/sh.c (prepare_move_operands): Use gen_store_gbr 
instead of gen_load_gbr in TLS_MODEL_LOCAL_EXEC case.
(sh_address_cost, sh_legitimate_address_p, sh_secondary_reload):
Handle GBR addresses.
(builtin_description): Add is_enabled member.
(shmedia_builtin, sh1_builtin): New functions.
(signature_args): Add SH_BLTIN_VP.
(bdesc): Use shmedia_builtin for existing built-ins.  Add 
__builtin_thread_pointer and __builtin_set_thread_pointer as 
sh1_builtin.
(sh_media_init_builtins, sh_init_builtins): Merge into single 
function sh_init_builtins.  Add is_enabled checking.
(sh_media_builtin_decl, sh_builtin_decl): Merge into single 
function sh_builtin_decl.  Add is_enabled checking.
(base_reg_disp): New class.
(sh_find_base_reg_disp, sh_find_equiv_gbr_addr): New functions.

testsuite/ChangeLog:

PR target/54760
* gcc.target/sh/pr54760-1.c: New.
* gcc.target/sh/pr54760-2.c: New.
* gcc.target/sh/pr54760-3.c: New.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 191894)
+++ gcc/config/sh/sh.md	(working copy)
@@ -175,6 +175,7 @@
   (UNSPECV_WINDOW_END	10)
   (UNSPECV_CONST_END	11)
   (UNSPECV_EH_RETURN	12)
+  (UNSPECV_GBR		13)
 ])

 ;; -
@@ -10029,13 +10030,165 @@
   DONE;
 })

-(define_insn "load_gbr"
-  [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))
-   (use (reg:SI GBR_REG))]
+;;--
+;; Thread pointer getter and setter.
+;;
+;; On SH the thread pointer is kept in the GBR.
+;; These patterns are usually expanded from the respective built-in functions.
+(define_expand "get_thread_pointer"
+  [(set (match_operand:SI 0 "register_operand") (reg:SI GBR_REG))]
+  "TARGET_SH1")
+
+(define_insn "store_gbr"
+  [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))]
   ""
   "stc	gbr,%0"
   [(set_attr "type" "tls_load")])

+(define_expand "set_thread_pointer"
+  [(set (reg:SI GBR_REG)
+	(unspec_volatile:SI [(match_operand:SI 0 "register_operand")]
+	 UNSPECV_GBR))]
+  "TARGET_SH1")
+
+(define_insn "load_gbr"
+  [(set (reg:SI GBR_REG)
+	(unspec_volatile:SI [(match_operand:SI 0 "register_operand" "r")]
+	 UNSPECV_GBR))]
+  "TARGET_SH1"
+  "ldc	%0,gbr"
+  [(set_attr "type" "move")])
+
+;;--
+;; Thread pointer relative memory loads and stores.
+;;
+;; On SH there are GBR displacement address modes which can be utilized to
+;; access memory behind the thread pointer.
+;; Since we do not allow using GBR for general purpose memory accesses, these
+;; GBR addressing modes are formed by the combine pass.
+;; This could be done with fewer patterns than below by using a mem predicate
+;; for the GBR mem, but then reload would try to reload addresses with a
+;; zero displacement for some strange reason.
+
+(define_insn "*mov_gbr_load"
+  [(set (match_operand:QIHISI 0 "register_operand" "=z")
+	(mem:QIHISI (plus:SI (reg:SI GBR_REG)
+			 (match_operand:QIHISI 1 "gbr_displacement"]
+  "TARGET_SH1"
+  "mov.	@(%O1,gbr),%0"
+  [(set_attr "type" "load")])
+
+(define_insn "*mov_gbr_load"
+  [(set (match_operand:QIHISI 0 "register_operand" "=z")
+	(mem:QIHISI (reg:SI GBR_REG)))]
+  "TARGET_SH1"
+  "mov.	@(0,gbr),%0"
+  [(set_attr "type" "load")])
+
+(define_insn "*mov_gbr_load"
+  [(set (match_operand:SI 0 "register_operand" "=z

[SH] PR 33135 - Remove mieee option in libgcc

2012-10-03 Thread Oleg Endo

Hello,

Since the -mieee behavior has been fixed and is now enabled by default on
SH, the additional flags in libgcc can be removed.

OK?

Cheers,
Oleg

libgcc/ChangeLog:

PR target/33135
* config/sh/t-sh (HOST_LIBGCC2_CFLAGS): Delete.
* config/sh/t-netbsd (HOST_LIBGCC2_CFLAGS): Delete.
* config/sh/t-linux (HOST_LIBGCC2_CFLAGS): Remove mieee option.
Index: libgcc/config/sh/t-sh
===
--- libgcc/config/sh/t-sh	(revision 192050)
+++ libgcc/config/sh/t-sh	(working copy)
@@ -59,5 +59,3 @@
 libgcc-4-300.a: div_table-4-300.o
 	$(AR_CREATE_FOR_TARGET) $@ div_table-4-300.o
 
-HOST_LIBGCC2_CFLAGS += -mieee
-
Index: libgcc/config/sh/t-netbsd
===
--- libgcc/config/sh/t-netbsd	(revision 192050)
+++ libgcc/config/sh/t-netbsd	(working copy)
@@ -1,3 +1,2 @@
 LIB1ASMFUNCS_CACHE = _ic_invalidate
 
-HOST_LIBGCC2_CFLAGS += -mieee
Index: libgcc/config/sh/t-linux
===
--- libgcc/config/sh/t-linux	(revision 192051)
+++ libgcc/config/sh/t-linux	(working copy)
@@ -2,7 +2,7 @@
 
 LIB2ADD = $(srcdir)/config/sh/linux-atomic.c
 
-HOST_LIBGCC2_CFLAGS += -mieee -DNO_FPSCR_VALUES
+HOST_LIBGCC2_CFLAGS += -DNO_FPSCR_VALUES
 
 # Silence atomic built-in related warnings in linux-atomic.c.
 # Unfortunately the conflicting types warning can't be disabled selectively.

[wwwdocs] SH 4.8 changes update

2012-10-04 Thread Oleg Endo

Hello,

The atomic options of SH have been changed recently.  The attached patch
updates the 4.8 changes.html accordingly, plus some minor wording fixes.

OK?

Cheers,
Oleg
Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.35
diff -u -r1.35 changes.html
--- htdocs/gcc-4.8/changes.html	3 Oct 2012 22:33:46 -	1.35
+++ htdocs/gcc-4.8/changes.html	4 Oct 2012 21:47:18 -
@@ -255,19 +255,45 @@
 
 Improved support for the __atomic built-in functions:
 
-  Minor improvements to code generated for software atomic sequences
-  that are enabled by -msoft-atomic.
+  A new option -matomic-model=model selects the
+  model for the generated atomic sequences.  The following models are
+  supported:
+  
+soft-gusa
+Software gUSA sequences (SH3* and SH4* only).  On SH4A targets this
+will now also partially utilize the movco.l and
+movli.l instructions.  This is the default when the target
+is sh3*-*-linux* or sh4*-*-linux*.
+
+hard-llcs
+Hardware movco.l / movli.l sequences
+(SH4A only).
+
+soft-tcb
+Software thread control block sequences.
+
+soft-imask
+Software interrupt flipping sequences (privileged mode only).  This is
+the default when the target is sh1*-*-linux* or
+sh2*-*-linux*.
+
+none
+Generates function calls to the respective __atomic
+built-in functions.  This is the default for SH64 targets or when the
+target is not sh*-*-linux*.
+  
+
+  The option -msoft-atomic has been deprecated.  It is
+  now an alias for -matomic-model=soft-gusa.
 
   A new option -mtas makes the compiler generate
   the tas.b instruction for the
-  __atomic_test_and_set built-in function.
+  __atomic_test_and_set built-in function regardless of the
+  selected atomic model.
+
+  The __sync functions in libgcc now reflect
+  the selected atomic model when building the toolchain.
 
-  The SH4A instructions movco.l and
-  movli.l are now supported.  They are used to implement some
-  software atomic sequences that are enabled by -msoft-atomic.
-  In addition to that, pure movco.l / movli.l
-  atomic sequences can be enabled with the new option
-  -mhard-atomic.
 
 
 Added support for the mov.b and mov.w
@@ -280,11 +306,11 @@
 
 Improvements to conditional branches and code that involves the T bit.
 A new option -mzdcbranch tells the compiler to favor
-zero-displacement branches.  This is enabled by default for SH4 and
-SH4A.
+zero-displacement branches.  This is enabled by default for SH4* targets.
+
 
 The pref instruction will now be emitted by the
-__builtin_prefetch built-in function for SH3.
+__builtin_prefetch built-in function for SH3* targets.
 
 The fmac instruction will now be emitted by the
 fmaf standard function and the __builtin_fmaf
@@ -298,7 +324,7 @@
 
 Added new options -mfsrra and -mfsca to allow
 the compiler using the fsrra and fsca
-instructions on CPUs other than SH4A (where they are already enabled by
+instructions on targets other than SH4A (where they are already enabled by
 default).
 
 Added support for the __builtin_bswap32 built-in function.

Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing

2012-10-04 Thread Oleg Endo

On Thu, 2012-10-04 at 11:52 +0900, Kaz Kojima wrote:

> sh4-unknown-linux-gnu build failed during compiling libmudflap:
> 
> /exp/ldroot/dodes/xsh-gcc/./gcc/xgcc -B/exp/ldroot/dodes/xsh-gcc/./gcc/ 
> -B/usr/local/sh4-unknown-linux-gnu/bin/ 
> -B/usr/local/sh4-unknown-linux-gnu/lib/ -isystem 
> /usr/local/sh4-unknown-linux-gnu/include -isystem 
> /usr/local/sh4-unknown-linux-gnu/sys-include -DHAVE_CONFIG_H -I. 
> -I../../../LOCAL/trunk/libmudflap -DLIBMUDFLAPTH -g -O2 -MT 
> libmudflapth_la-mf-runtime.lo -MD -MP -MF 
> .deps/libmudflapth_la-mf-runtime.Tpo -c 
> ../../../LOCAL/trunk/libmudflap/mf-runtime.c -o libmudflapth_la-mf-runtime.o
> ../../../LOCAL/trunk/libmudflap/mf-runtime.c: In function 
> 'begin_recursion_protect1':
> ../../../LOCAL/trunk/libmudflap/mf-runtime.c:152:1: internal compiler error: 
> Segmentation fault
>  }
>  ^
> 0x8529c60 crash_signal
>   ../../LOCAL/trunk/gcc/toplev.c:335
> 0x8771a87 sh_find_base_reg_disp
>   ../../LOCAL/trunk/gcc/config/sh/sh.c:13344
> 0x8791554 sh_find_equiv_gbr_addr(rtx_def*, rtx_def*)
>   ../../LOCAL/trunk/gcc/config/sh/sh.c:13395
> 0x87ce6cf gen_split_1029(rtx_def*, rtx_def**)
>   ../../LOCAL/trunk/gcc/config/sh/sh.md:10184
> 0x87ea6e0 split_1
>   ../../LOCAL/trunk/gcc/config/sh/sh.md:10183
> 0x87ea6e0 split_3
>   ../../LOCAL/trunk/gcc/config/sh/sh.md:7082
> 0x82bf8a1 try_split(rtx_def*, rtx_def*, int)
>   ../../LOCAL/trunk/gcc/emit-rtl.c:3503
> 0x849c642 split_insn
>   ../../LOCAL/trunk/gcc/recog.c:2809
> 0x84a08b5 split_all_insns()
>   ../../LOCAL/trunk/gcc/recog.c:2899
> 0x84a09a7 rest_of_handle_split_all_insns
>   ../../LOCAL/trunk/gcc/recog.c:3751
> 
> Looks prev_nonnote_insn returns a barrier there:
> 
> (gdb) fr 0
> #0  0x08771a87 in sh_find_base_reg_disp (insn=, x=0xb79ddd40, 
> base_reg=0x0, disp=0) at ../../LOCAL/trunk/gcc/config/sh/sh.c:13344
> 13344   if (p != NULL && GET_CODE (p) == SET && REG_P (XEXP (p, 0))
> (gdb) p p
> $1 = (rtx) 0xafafafaf
> (gdb) p i
> $2 = (rtx_def *) 0xb79ecc3c
> (gdb) call debug_rtx(i)
> (barrier 216 215 217)

Oops.  Thanks for tracing this.
It should have been:
for (...)
  {
    if (!NONJUMP_INSN_P (i))
      continue;
    rtx p = PATTERN (i);
    ...
  }
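
To illustrate why the guard matters, here is a toy sketch. The types and the
function name are made up for illustration and are not GCC's rtx API; the
point is only that a backwards walk must skip entries such as barriers, which
carry no pattern, before looking at the pattern.

```c
#include <stddef.h>

/* Hypothetical stand-ins for insn kinds; not the real GCC rtx codes.  */
enum toy_code { TOY_INSN, TOY_BARRIER, TOY_NOTE };

struct toy_insn
{
  enum toy_code code;
  int pattern;              /* Only meaningful when code == TOY_INSN.  */
  struct toy_insn *prev;
};

/* Walk backwards from I and return the pattern of the nearest plain insn.
   Barriers and notes are skipped instead of being inspected.
   Returns -1 if no plain insn is found.  */
static int
toy_find_prev_pattern (const struct toy_insn *i)
{
  for (; i != NULL; i = i->prev)
    {
      if (i->code != TOY_INSN)
        continue;           /* Plays the role of !NONJUMP_INSN_P (i).  */
      return i->pattern;
    }
  return -1;
}
```

Without the check, the walk would read the pattern field of a barrier, which
in the real compiler is the garbage pointer seen in the gdb session above.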

> > (builtin_description): Add is_enabled member.
> > (shmedia_builtin, sh1_builtin): New functions.
> > (signature_args): Add SH_BLTIN_VP.
> > (bdesc): Use shmedia_builtin for existing built-ins.  Add 
> > __builtin_thread_pointer and __builtin_set_thread_pointer as 
> > sh1_builtin.
> > (sh_media_init_builtins, sh_init_builtins): Merge into single 
> > function sh_init_builtins.  Add is_enabled checking.
> > (sh_media_builtin_decl, sh_builtin_decl): Merge into single 
> > function sh_builtin_decl.  Add is_enabled checking.
> 
> It would be better to separate this part except new thread pointer
> builtins into an independent patch which should be tested also with
> sh64-elf build, though now unified sh64-elf build is failing.
> I'd like to commit a quick fix for sh64-elf build failure.

Do you mean something like the attached patch as a preparation step?
(checked with 'make all')

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.c (builtin_description): Add is_enabled member.
(shmedia_builtin_p): New function.
(bdesc): Use shmedia_builtin_p for existing built-ins.
(sh_media_init_builtins, sh_init_builtins): Merge into single
function sh_init_builtins.  Add is_enabled checking.  Move
variable declarations to where they are actually used.
(sh_media_builtin_decl, sh_builtin_decl): Merge into single 
function sh_builtin_decl.  Add is_enabled checking.
(sh_expand_builtin): Move variable declarations to where they
are actually used.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192107)
+++ gcc/config/sh/sh.c	(working copy)
@@ -243,8 +243,6 @@
 
 static void sh_init_builtins (void);
 static tree sh_builtin_decl (unsigned, bool);
-static void sh_media_init_builtins (void);
-static tree sh_media_builtin_decl (unsigned, bool);
 static rtx sh_expand_builtin (tree, rtx, rtx, enum machine_mode, int);
 static void sh_output_mi_thunk (FILE *, tree, HOST_WIDE_INT, HOST_WIDE_INT, tree);
 static void sh_file_start (void);
@@ -11510,12 +11508,19 @@
 
 struct builtin_description
 {
+  bool (* const is_enabled) (void);
   const enum insn_code icode;
   const char *const name;
   int signature;
   tree fndecl;
 };
 
+static bool
+shmedia_builtin_p (void)
+{
+  return TARGET_SHMEDIA;
+}
+
 /* describe number and signedness of arguments; arg[0] == result
(1: unsigned, 2: signed, 4: don't care, 8: pointer 0: no argument */
 /* 9: 64-bit pointer, 10: 32-bit pointer */
@@ -11582,103 +11587,189 @@
 /* nsb: takes long long arg, returns unsigned char.  */
 static struct builtin_description bdesc[] =
 {
-  { CODE_FOR_absv2si2,

Re: [wwwdocs] SH 4.8 changes update

2012-10-05 Thread Oleg Endo

After committing the last SH changes updates for 4.8 I was kindly
informed that the br tags were left open.  I've committed the attached
fix as obvious.

Cheers,
Oleg
? www_4_8_sh_changes_2_close_br.patch
Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.36
diff -u -r1.36 changes.html
--- htdocs/gcc-4.8/changes.html	5 Oct 2012 19:21:00 -	1.36
+++ htdocs/gcc-4.8/changes.html	5 Oct 2012 19:28:48 -
@@ -259,25 +259,25 @@
   model for the generated atomic sequences.  The following models are
   supported:
   
-soft-gusa
+soft-gusa
 Software gUSA sequences (SH3* and SH4* only).  On SH4A targets this
 will now also partially utilize the movco.l and
 movli.l instructions.  This is the default when the target
 is sh3*-*-linux* or sh4*-*-linux*.
 
-hard-llcs
+hard-llcs
 Hardware movco.l / movli.l sequences
 (SH4A only).
 
-soft-tcb
+soft-tcb
 Software thread control block sequences.
 
-soft-imask
+soft-imask
 Software interrupt flipping sequences (privileged mode only).  This is
 the default when the target is sh1*-*-linux* or
 sh2*-*-linux*.
 
-none
+none
 Generates function calls to the respective __atomic
 built-in functions.  This is the default for SH64 targets or when the
 target is not sh*-*-linux*.

Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing

2012-10-05 Thread Oleg Endo

On Fri, 2012-10-05 at 21:55 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > Do you mean something like the attached patch as a preparation step?
> > (checked with 'make all')
> 
> Yes.  The patch is OK with removing the first line of the ChangeLog
> entry for PR number.

Done.
The attached patch is the next step that adds the thread pointer
builtins.  The GBR address mode stuff will follow afterwards separately.
Tested on rev 192142 with 'make all' and
'make -k check-gcc RUNTESTFLAGS="sh.exp --target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"'

Cheers,
Oleg


gcc/ChangeLog:

PR target/54760
* config/sh/sh.md (define_constants): Add UNSPECV_GBR.
(get_thread_pointer, set_thread_pointer): New expanders.
(load_gbr): Rename to store_gbr.  Remove GBR_REG use.
(store_gbr): New insn.
* config/sh/sh.c (prepare_move_operands): Use gen_store_gbr 
instead of gen_load_gbr in TLS_MODEL_LOCAL_EXEC case.
(sh1_builtin_p): New function.
(signature_args): Add SH_BLTIN_VP.
(bdesc): Add __builtin_thread_pointer and 
__builtin_set_thread_pointer.

testsuite/ChangeLog:

PR target/54760
* gcc.target/sh/pr54760-1.c: New.
Index: gcc/testsuite/gcc.target/sh/pr54760-1.c
===
--- gcc/testsuite/gcc.target/sh/pr54760-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54760-1.c	(revision 0)
@@ -0,0 +1,20 @@
+/* Check that the __builtin_thread_pointer and __builtin_set_thread_pointer
+   built-in functions result in gbr store / load instructions.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
+/* { dg-final { scan-assembler-times "ldc" 1 } } */
+/* { dg-final { scan-assembler-times "stc" 1 } } */
+/* { dg-final { scan-assembler-times "gbr" 2 } } */
+
+void*
+test00 (void)
+{
+  return __builtin_thread_pointer ();
+}
+
+void
+test01 (void* p)
+{
+  __builtin_set_thread_pointer (p);
+}
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192142)
+++ gcc/config/sh/sh.md	(working copy)
@@ -175,6 +175,7 @@
   (UNSPECV_WINDOW_END	10)
   (UNSPECV_CONST_END	11)
   (UNSPECV_EH_RETURN	12)
+  (UNSPECV_GBR		13)
 ])
 
 ;; -
@@ -10029,13 +10030,37 @@
   DONE;
 })
 
-(define_insn "load_gbr"
-  [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))
-   (use (reg:SI GBR_REG))]
+;;--
+;; Thread pointer getter and setter.
+;;
+;; On SH the thread pointer is kept in the GBR.
+;; These patterns are usually expanded from the respective built-in functions.
+(define_expand "get_thread_pointer"
+  [(set (match_operand:SI 0 "register_operand") (reg:SI GBR_REG))]
+  "TARGET_SH1")
+
+;; The store_gbr insn can also be used on !TARGET_SH1 for doing TLS accesses.
+(define_insn "store_gbr"
+  [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))]
   ""
   "stc	gbr,%0"
   [(set_attr "type" "tls_load")])
 
+(define_expand "set_thread_pointer"
+  [(set (reg:SI GBR_REG)
+	(unspec_volatile:SI [(match_operand:SI 0 "register_operand")]
+	 UNSPECV_GBR))]
+  "TARGET_SH1")
+
+(define_insn "load_gbr"
+  [(set (reg:SI GBR_REG)
+	(unspec_volatile:SI [(match_operand:SI 0 "register_operand" "r")]
+	 UNSPECV_GBR))]
+  "TARGET_SH1"
+  "ldc	%0,gbr"
+  [(set_attr "type" "move")])
+
+;;--
 ;; case instruction for switch statements.
 
 ;; Operand 0 is index
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192142)
+++ gcc/config/sh/sh.c	(working copy)
@@ -1887,7 +1887,7 @@
 
 	case TLS_MODEL_LOCAL_EXEC:
 	  tmp2 = gen_reg_rtx (Pmode);
-	  emit_insn (gen_load_gbr (tmp2));
+	  emit_insn (gen_store_gbr (tmp2));
 	  tmp = gen_reg_rtx (Pmode);
 	  emit_insn (gen_symTPOFF2reg (tmp, op1));
 
@@ -11521,6 +11521,12 @@
   return TARGET_SHMEDIA;
 }
 
+static bool
+sh1_builtin_p (void)
+{
+  return TARGET_SH1;
+}
+
 /* describe number and signedness of arguments; arg[0] == result
(1: unsigned, 2: signed, 4: don't care, 8: pointer 0: no argument */
 /* 9: 64-bit pointer, 10: 32-bit pointer */
@@ -11578,6 +11584,8 @@
   { 1, 1, 1, 1 },
 #define SH_BLTIN_PV 23
   { 0, 8 },
+#define SH_BLTIN_VP 24
+  { 8, 0 },

[SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-06 Thread Oleg Endo

Hello,

The attached patch improves comparisons such as
'unsigned int <= 0x7FFFFFFF' on SH.
As mentioned in the PR, for some reason, those comparisons do not go
through the cstore expander.  As a consequence the comparison doesn't
get the chance to be canonicalized by the target code and ends up as
'(~x) >> 31'.
I've not investigated this further and just fixed the symptoms on SH.  I
don't know whether it's also an issue on other targets.
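
For reference, a small host-side sketch of why the three forms are
interchangeable. The function names are made up for illustration; the last
one mirrors what cmp/pz tests, namely that the sign bit is clear.

```c
#include <stdint.h>

/* The comparison as written in the source.  */
static int
toy_le_int_max (uint32_t a)
{
  return a <= 0x7FFFFFFFu;
}

/* The form the comparison ends up in: '(~x) >> 31'.  */
static int
toy_not_shift (uint32_t a)
{
  return (int) (((uint32_t) ~a) >> 31);
}

/* Sign bit clear, i.e. the value is >= 0 when viewed as signed.  */
static int
toy_sign_clear (uint32_t a)
{
  return (a & 0x80000000u) == 0;
}
```

All three return 1 exactly when bit 31 of the input is zero, so the not/shift
pair can be replaced by a single sign test followed by reading the T bit.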

Tested on rev 192142 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54685
* config/sh/sh.md (one_cmplsi2): Make insn_and_split.  Add 
manual combine matching for an insn sequence where a ge:SI
pattern can be used.

testsuite/ChangeLog:

PR target/54685
* gcc.target/sh/pr54685.c: New.
Index: gcc/testsuite/gcc.target/sh/pr54685.c
===
--- gcc/testsuite/gcc.target/sh/pr54685.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54685.c	(revision 0)
@@ -0,0 +1,58 @@
+/* Check that a comparison 'unsigned int <= 0x7FFFFFFF' results in code
+   utilizing the cmp/pz instruction.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
+/* { dg-final { scan-assembler-not "not" } } */
+/* { dg-final { scan-assembler-times "cmp/pz" 7 } } */
+/* { dg-final { scan-assembler-times "shll" 1 } } */
+/* { dg-final { scan-assembler-times "movt" 4 } } */
+
+int
+test_00 (unsigned int a)
+{
+  return !(a > 0x7FFFFFFF);
+}
+
+int
+test_01 (unsigned int a)
+{
+  return !(a > 0x7FFFFFFF) ? -5 : 10;
+}
+
+int
+test_02 (unsigned int a)
+{
+  /* 1x shll, 1x movt  */
+  return a >= 0x80000000;
+}
+
+int
+test_03 (unsigned int a)
+{
+  return a >= 0x80000000 ? -5 : 10;
+}
+
+int
+test_04 (unsigned int a)
+{
+  return a <= 0x7FFFFFFF;
+}
+
+int
+test_05 (unsigned int a)
+{
+  return a <= 0x7FFFFFFF ? -5 : 10;
+}
+
+int
+test_06 (unsigned int a)
+{
+  return a < 0x80000000;
+}
+
+int
+test_07 (unsigned int a)
+{
+  return a < 0x80000000 ? -5 : 10;
+}
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192142)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5188,11 +5188,61 @@
   "neg	%1,%0"
   [(set_attr "type" "arith")])
 
-(define_insn "one_cmplsi2"
+(define_insn_and_split "one_cmplsi2"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
 	(not:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
   "TARGET_SH1"
   "not	%1,%0"
+  "&& can_create_pseudo_p ()"
+  [(set (reg:SI T_REG) (ge:SI (match_dup 1) (const_int 0)))
+   (set (match_dup 0) (reg:SI T_REG))]
+{
+/* PR 54685
+   If the result of 'unsigned int <= 0x7FFFFFFF' ends up as the following
+   sequence:
+
+ (set (reg0) (not:SI (reg0) (reg1)))
+ (parallel [(set (reg2) (lshiftrt:SI (reg0) (const_int 31)))
+		(clobber (reg:SI T_REG))])
+
+   ... match and combine the sequence manually in the split pass after the
+   combine pass.  Notice that combine does try the target pattern of this
+   split, but if the pattern is added it interferes with other patterns, in
+   particular with the div0s comparisons.
+   This could also be done with a peephole but doing it here before register
+   allocation can save one temporary.
+   When we're here, the not:SI pattern obviously has been matched already
+   and we only have to see whether the following insn is the logical
+   right shift.  */
+
+  rtx i = next_nonnote_insn_bb (curr_insn);
+  if (i == NULL_RTX || !NONJUMP_INSN_P (i))
+FAIL;
+
+  rtx p = PATTERN (i);
+  if (GET_CODE (p) != PARALLEL || XVECLEN (p, 0) != 2)
+FAIL;
+
+  rtx p0 = XVECEXP (p, 0, 0);
+  rtx p1 = XVECEXP (p, 0, 1);
+
+  if (/* (set (reg2) (lshiftrt:SI (reg0) (const_int 31)))  */
+  GET_CODE (p0) == SET
+  && GET_CODE (XEXP (p0, 1)) == LSHIFTRT
+  && REG_P (XEXP (XEXP (p0, 1), 0))
+  && REGNO (XEXP (XEXP (p0, 1), 0)) == REGNO (operands[0])
+  && CONST_INT_P (XEXP (XEXP (p0, 1), 1))
+  && INTVAL (XEXP (XEXP (p0, 1), 1)) == 31
+
+  /* (clobber (reg:SI T_REG))  */
+  && GET_CODE (p1) == CLOBBER && REG_P (XEXP (p1, 0))
+  && REGNO (XEXP (p1, 0)) == T_REG)
+{
+  operands[0] = XEXP (p0, 0);
+  set_insn_deleted (i);
+}
+  else
+FAIL;
+}
   [(set_attr "type" "arith")])
 
 (define_expand "one_cmpldi2"

Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing

2012-10-06 Thread Oleg Endo

On Sat, 2012-10-06 at 12:31 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > The attached patch is the next step that adds the thread pointer
> > builtins.  The GBR address mode stuff will follow afterwards separately.
> > Tested on rev 192142 with 'make all' and
> > 'make -k check-gcc RUNTESTFLAGS="sh.exp --target_board=sh-sim
> > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"'
> 
> The patch is OK.

Installed.
Attached patch adds the GBR addressing mode stuff.  I've added the 
'!NONJUMP_INSN_P (i)' check which should fix the crash.  Could you
please test again?

Tested on rev 192154 with 'make all' and
make -k check-gcc RUNTESTFLAGS="sh.exp=pr54760* --target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
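
As a rough guide to which accesses can use the GBR displacement forms, here
is a hypothetical helper. It is not taken from the patch; it assumes the SH
encoding where @(disp,GBR) carries an 8-bit unsigned displacement that is
scaled by the access size of 1, 2 or 4 bytes.

```c
#include <stdbool.h>

/* Hypothetical check, not part of the patch: can BYTE_OFFSET from the
   thread pointer be encoded as @(disp,GBR) for an access of ACCESS_SIZE
   bytes?  Assumes an 8-bit unsigned displacement scaled by the size.  */
static bool
toy_gbr_disp_ok (long byte_offset, int access_size)
{
  if (access_size != 1 && access_size != 2 && access_size != 4)
    return false;
  if (byte_offset < 0 || byte_offset % access_size != 0)
    return false;
  return byte_offset / access_size <= 255;
}
```

Under that assumption the element indices 0, 5 and 255 used in the new test
cover the lowest, a middle and the highest encodable displacement for each
access size.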

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.md (*mov_gbr_load, *mov_gbr_store):
New insns and accompanying unnamed splits.
* config/sh/predicates.md (general_movsrc_operand,
general_movdst_operand): Reject GBR addresses.
* config/sh/sh-protos.h (sh_find_equiv_gbr_addr): New
declaration.
* config/sh/sh.c (sh_address_cost, sh_legitimate_address_p, 
sh_secondary_reload): Handle GBR addresses.
(base_reg_disp): New class.
(sh_find_base_reg_disp, sh_find_equiv_gbr_addr): New functions.

testsuite/ChangeLog:

PR target/54760
* gcc.target/sh/pr54760-2.c: New.
* gcc.target/sh/pr54760-3.c: New.

Index: gcc/testsuite/gcc.target/sh/pr54760-2.c
===
--- gcc/testsuite/gcc.target/sh/pr54760-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54760-2.c	(revision 0)
@@ -0,0 +1,223 @@
+/* Check that thread pointer relative memory accesses are converted to
+   gbr displacement address modes.  If we see a gbr register store
+   instruction something is not working properly.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
+/* { dg-final { scan-assembler-times "stc\tgbr" 0 } } */
+
+/* ---
+  Simple GBR load.
+*/
+#define func(name, type, disp)\
+  int \
+  name ## _tp_load (void) \
+  { \
+type* tp = (type*)__builtin_thread_pointer (); \
+return tp[disp]; \
+  }
+
+func (test00, int, 0)
+func (test01, int, 5)
+func (test02, int, 255)
+
+func (test03, short, 0)
+func (test04, short, 5)
+func (test05, short, 255)
+
+func (test06, char, 0)
+func (test07, char, 5)
+func (test08, char, 255)
+
+func (test09, unsigned int, 0)
+func (test10, unsigned int, 5)
+func (test11, unsigned int, 255)
+
+func (test12, unsigned short, 0)
+func (test13, unsigned short, 5)
+func (test14, unsigned short, 255)
+
+func (test15, unsigned char, 0)
+func (test16, unsigned char, 5)
+func (test17, unsigned char, 255)
+
+#undef func
+
+/* ---
+  Simple GBR store.
+*/
+#define func(name, type, disp)\
+  void \
+  name ## _tp_store (int a) \
+  { \
+type* tp = (type*)__builtin_thread_pointer (); \
+tp[disp] = (type)a; \
+  }
+
+func (test00, int, 0)
+func (test01, int, 5)
+func (test02, int, 255)
+
+func (test03, short, 0)
+func (test04, short, 5)
+func (test05, short, 255)
+
+func (test06, char, 0)
+func (test07, char, 5)
+func (test08, char, 255)
+
+func (test09, unsigned int, 0)
+func (test10, unsigned int, 5)
+func (test11, unsigned int, 255)
+
+func (test12, unsigned short, 0)
+func (test13, unsigned short, 5)
+func (test14, unsigned short, 255)
+
+func (test15, unsigned char, 0)
+func (test16, unsigned char, 5)
+func (test17, unsigned char, 255)
+
+#undef func
+
+/* ---
+  Arithmetic on the result of a GBR load.
+*/
+#define func(name, type, disp, op, opname)\
+  int \
+  name ## _tp_load_arith_ ##opname (int a) \
+  { \
+type* tp = (type*)__builtin_thread_pointer (); \
+return tp[disp] op a; \
+  }
+
+#define funcs(op, opname) \
+  func (test00, int, 0, op, opname) \
+  func (test01, int, 5, op, opname) \
+  func (test02, int, 255, op, opname) \
+  func (test03, short, 0, op, opname) \
+  func (test04, short, 5, op, opname) \
+  func (test05, short, 255, op, opname) \
+  func (test06, char, 0, op, opname) \
+  func (test07, char, 5, op, opname) \
+  func (test08, char, 255, op, opname) \
+  func (test09, unsigned int, 0, op, opname) \
+  func (test10, unsigned int, 5, op, opname) \
+  func (test11, unsigned int, 255, op, opname) \
+  func (test12, unsigned short, 0, op, opname) \
+  func (test13, unsigned short, 5, op, opname) \
+  func (test14, unsigned short, 255, op, opname) \
+  func (test15, unsigned char, 0,

Re: [wwwdocs] SH 4.8 changes update

2012-10-06 Thread Oleg Endo

On Sat, 2012-10-06 at 17:57 +0200, Gerald Pfeifer wrote:
> Hi Oleg,
> 
> have you considered also documenting the new __builtin_thread_pointer
> and __builtin_set_thread_pointer built-ins?  (I just noticed this by
> chance.)

Yes, sure.  The documentation and www changes updates will follow soon.

> 
> On Thu, 4 Oct 2012, Oleg Endo wrote:
> > The atomic options of SH have been changed recently.  The attached patch 
> > updates the 4.8 changes.html accordingly, plus some minor wording fixes.
> 
> Looking at the list of parameters to -matomic-model=, it appears
> that a definition list (<dl>) is better suitable than a regular
> list, so I went ahead and made the change below.
> 

Thanks!  Looks way better now.

Cheers,
Oleg

RE: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Oleg Endo

On Sun, 2012-10-07 at 18:01 +, Weddington, Eric wrote:
> 
> > -Original Message-
> > From: Georg-Johann Lay []
> > Sent: Friday, October 05, 2012 9:55 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Denis Chertykov; Weddington, Eric; Joerg Wunsch
> > Subject: [Patch,avr]: Remove -mshort-calls option
> > 
> > As already discussed, this patch removes the -mshort-calls command option
> > from
> > avr-gcc.
> > 
> > Ok to apply?
> 
> Ok to apply, but...
> 
> > 
> > If the change is on order, changes to wwwdocs will follow, i.e. deprecate
> > the
> > option in 4.7 and tell it is removed in the 4.8 caveats.
> >
> 
> ... but where do we notify the user that the switch is deprecated?
> 

Maybe it would be better to first make the option a no-op that prints a
warning in GCC, remove it from all documentation and mention the
deprecation in the wwwdocs changes.  Then, in the next GCC release,
remove it completely.

Cheers,
Oleg

Re: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Oleg Endo

On Sun, 2012-10-07 at 21:37 +0200, Georg-Johann Lay wrote:
> Oleg Endo wrote:
> > Maybe would be better to first make the option a no-op that prints a
> > warning in GCC, remove it from all documentations and mention the
> > deprecation in the wwwdocs changes.  Then, in the next GCC release,
> > remove it completely.
> 
> IMHO it's not reasonable to put effort into a declining feature.  Why 
> shouldn't we follow common GCC practice here?

"Effort" in this case is adding a "Warn(...)" to the existing option and
then removing it some time later.  I think it's more user friendly to
first warn and then do.  But that's just my opinion, feel free to
ignore :)

Cheers,
Oleg

Re: [SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-07 Thread Oleg Endo

On Mon, 2012-10-08 at 09:45 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > The attached patch improves comparisons such as
> > 'unsigned int <= 0x7FFFFFFF' on SH.
> > As mentioned in the PR, for some reason, those comparisons do not go
> > through the cstore expander.  As a consequence the comparison doesn't
> > get the chance to be canonicalized by the target code and ends up as
> > '(~x) >> 31'.
> > I've not investigated this further and just fixed the symptoms on SH.  I
> > don't know whether it's also an issue on other targets.
> > 
> > Tested on rev 192142 with
> > make -k check RUNTESTFLAGS="--target_board=sh-sim
> > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> > 
> > and no new failures.
> > OK?
> 
> I've run CSiBE with and without the patch for sh4-unknown-linux-gnu
> at -O2.  Only one difference in the resulted sizes: jpeg-6b/jcphuff
> increases 5336 bytes to 5340 bytes with the patch.  Could you look
> into it?

Yep, that's actually the only place in the CSiBE set where this case
hits.  The function in question is encode_mcu_AC_refine.  The increase
seems to be due to different register allocation and different spill
code :T
I've attached the asm diff.

Cheers,
Oleg
--- CSiBE/m4-single-ml-O2-trunk/jpeg-6b/jcphuff.s
+++ CSiBE/m4-single-ml-O2/jpeg-6b/jcphuff.s
@@ -2147,7 +2147,7 @@
 	bt/s	.L611
 	mov.l	r2,@(24,r15)
 	bra	.L612
-	mov.l	@(44,r15),r0
+	mov.l	@(44,r15),r3
 .L611:
 	mov.l	.L565,r4
 	mov	r2,r5
@@ -2513,21 +2513,21 @@
 	mov	r0,r1
 	mov	r9,r0
 	and	r2,r1
-	mov.l	@(24,r15),r3
+	mov.l	@(28,r15),r3
 	mov.b	r1,@(r0,r8)
 	mov	r9,r11
-	mov.l	@(28,r15),r0
-	add	#1,r3
-	mov.l	@(36,r15),r1
+	mov.l	@(24,r15),r2
+	add	#4,r3
+	mov.l	@(36,r15),r0
 	add	#1,r11
-	mov.l	@(40,r15),r2
+	mov.l	@(40,r15),r1
+	add	#1,r2
 	add	#4,r0
-	add	#4,r1
-	mov.l	r3,@(24,r15)
-	mov.l	r0,@(28,r15)
-	cmp/ge	r3,r2
+	mov.l	r2,@(24,r15)
+	mov.l	r3,@(28,r15)
+	cmp/ge	r2,r1
 	bt/s	.L599
-	mov.l	r1,@(36,r15)
+	mov.l	r0,@(36,r15)
 	tst	r11,r11
 	bt/s	.L555
 	mov	r12,r14
@@ -2545,21 +2545,23 @@
 	mov.w	.L578,r3
 	cmp/hi	r3,r2
 	bf/s	.L612
-	mov.l	@(44,r15),r0
+	mov.l	@(44,r15),r3
 .L515:
 	mov.l	.L582,r2
 	jsr	@r2
 	mov	r14,r4
 .L459:
+	mov.l	@(44,r15),r3
+.L612:
 	mov.l	@(44,r15),r0
-.L612:
+	mov.l	@(24,r3),r2
 	mov.l	@(16,r14),r3
-	mov.l	@(24,r0),r2
 	mov.l	r3,@r2
 	mov.l	@(20,r14),r3
 	mov.l	r3,@(4,r2)
 	mov.w	.L580,r2
-	mov.l	@(r0,r2),r2
+	add	r0,r2
+	mov.l	@(8,r2),r2
 	tst	r2,r2
 	bt	.L544
 	add	#64,r14
@@ -2594,18 +2596,18 @@
 	add	#1,r2
 	mov.l	r2,@(16,r15)
 .L467:
-	mov.l	@(24,r15),r3
-	mov.l	@(28,r15),r0
-	mov.l	@(36,r15),r1
-	add	#1,r3
-	mov.l	@(40,r15),r2
+	mov.l	@(24,r15),r2
+	mov.l	@(28,r15),r3
+	mov.l	@(36,r15),r0
+	add	#1,r2
+	mov.l	@(40,r15),r1
+	add	#4,r3
 	add	#4,r0
-	add	#4,r1
-	mov.l	r3,@(24,r15)
-	mov.l	r0,@(28,r15)
-	cmp/ge	r3,r2
+	mov.l	r2,@(24,r15)
+	mov.l	r3,@(28,r15)
+	cmp/ge	r2,r1
 	bf/s	.L603
-	mov.l	r1,@(36,r15)
+	mov.l	r0,@(36,r15)
 .L599:
 	bra	.L617
 	mov.l	@(28,r15),r1
@@ -2614,8 +2616,8 @@
 	bf/s	.L523
 	mov	r12,r14
 .L555:
-	mov.l	@(16,r15),r3
-	cmp/pl	r3
+	mov.l	@(16,r15),r2
+	cmp/pl	r2
 	bf	.L459
 	mov.l	@(56,r14),r3
 	bra	.L625
@@ -2642,13 +2644,13 @@
 	add	#1,r2
 	mov.l	r2,@r3
 .L511:
-	mov.l	@(20,r15),r1
+	mov.l	@(20,r15),r0
 .L620:
-	mov	#0,r2
+	mov	#0,r1
 	mov	#0,r11
-	mov.l	r2,@(16,r15)
+	mov.l	r1,@(16,r15)
 	bra	.L467
-	mov.l	@(0,r1),r8
+	mov.l	@(0,r0),r8
 	.align 1
 .L522:
 	bra	.L619
@@ -2659,7 +2661,7 @@
 .L578:
 	.short	937
 .L580:
-	.short	196
+	.short	188
 .L581:
 	.short	312
 .L583:
@@ -2728,16 +2730,15 @@
 	tst	r3,r3
 	mov.l	r14,@(28,r12)
 	mov.l	@r1,r0
-	mov.l	@(52,r15),r2
+	mov.l	@(52,r15),r1
 	add	r0,r0
 	mov.l	r11,@(24,r12)
 	bf/s	.L511
-	mov.w	@(r0,r2),r1
-	not	r1,r1
-	mov	r14,r10
-	shll	r1
+	mov.w	@(r0,r1),r2
+	cmp/pz	r2
 	neg	r14,r3
 	movt	r1
+	mov	r14,r10
 	add	#23,r3
 	shld	r3,r1
 	add	#1,r10
@@ -2784,7 +2785,7 @@
 	mov	r9,r6
 .L601:
 	bra	.L620
-	mov.l	@(20,r15),r1
+	mov.l	@(20,r15),r0
 	.align 1
 .L556:
 	mov.l	.L589,r1
@@ -2812,9 +2813,9 @@
 	add	#-8,r10
 	.align 1
 .L558:
-	mov.l	.L589,r3
+	mov.l	.L589,r2
 	mov	r12,r4
-	jsr	@r3
+	jsr	@r2
 	mov.l	r1,@(4,r15)
 	mov.l	@(4,r15),r1
 	cmp/eq	r13,r1
@@ -2830,8 +2831,8 @@
 	dt	r2
 	bf/s	.L507
 	mov.l	r2,@(20,r12)
-	mov.l	.L589,r0
-	jsr	@r0
+	mov.l	.L589,r3
+	jsr	@r3
 	mov	r12,r4
 	bra	.L622
 	add	#-8,r10

Re: [SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-07 Thread Oleg Endo

On Mon, 2012-10-08 at 03:53 +0200, Oleg Endo wrote:
> On Mon, 2012-10-08 at 09:45 +0900, Kaz Kojima wrote:
> > Oleg Endo  wrote:
> > > The attached patch improves comparisons such as
> > > > 'unsigned int <= 0x7FFFFFFF' on SH.
> > > As mentioned in the PR, for some reason, those comparisons do not go
> > > through the cstore expander.  As a consequence the comparison doesn't
> > > get the chance to be canonicalized by the target code and ends up as
> > > '(~x) >> 31'.
> > > I've not investigated this further and just fixed the symptoms on SH.  I
> > > don't know whether it's also an issue on other targets.
> > > 
> > > Tested on rev 192142 with
> > > make -k check RUNTESTFLAGS="--target_board=sh-sim
> > > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> > > 
> > > and no new failures.
> > > OK?
> > 
> > I've run CSiBE with and without the patch for sh4-unknown-linux-gnu
> > at -O2.  Only one difference in the resulted sizes: jpeg-6b/jcphuff
> > increases 5336 bytes to 5340 bytes with the patch.  Could you look
> > into it?
> 
> Yep, that's actually the only place in the CSiBE set where this case
> hits.  The function in question is encode_mcu_AC_refine.  The increase
> seems to be due to different register allocation and different spill
> code :T
> I've attached the asm diff.

I've just checked this against current trunk (rev 192193).  The problem
seems to be gone.  There's also a total decrease of 152 bytes on this
file without the patch.  So it seems it was a different issue.  The diff
is now:

--- CSiBE/m4-single-ml-O2/jpeg-6b/jcphuff_.s
+++ CSiBE/m4-single-ml-O2/jpeg-6b/jcphuff.s
@@ -2626,13 +2626,12 @@
 	add	r0,r0
 	mov.l	r3,@(24,r13)
 	bf/s	.L502
-	mov.w	@(r0,r1),r11
+	mov.w	@(r0,r1),r7
 	mov.l	@(12,r15),r0
-	not	r11,r11
-	shll	r11
+	cmp/pz	r7
 	mov.l	@(12,r15),r10
+	movt	r11
 	neg	r0,r2
-	movt	r11
 	add	#23,r2
 	shld	r2,r11
 	add	#1,r10

Cheers,
Oleg

[SH] PR 54760 - Document new thread pointer built-ins

2012-10-08 Thread Oleg Endo

Hello,

This adds documentation on the new thread pointer built-ins that were
added recently to the SH target.
Tested with 'make info dvi pdf'.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* doc/extend.texi (Target Builtins): Add SH built-in section.
Document __builtin_thread_pointer and 
__builtin_set_thread_pointer.
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 192200)
+++ gcc/doc/extend.texi	(working copy)
@@ -8651,6 +8651,7 @@
 * PowerPC Built-in Functions::
 * PowerPC AltiVec/VSX Built-in Functions::
 * RX Built-in Functions::
+* SH Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
 * TI C6X Built-in Functions::
@@ -13687,6 +13688,41 @@
 @samp{vec_vsx_st} builtins will always generate the VSX @samp{LXVD2X},
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
+@node SH Built-in Functions
+@subsection SH Built-in Functions
+The following built-in functions are supported on the SH1, SH2, SH3 and SH4
+families of processors:
+
+@deftypefn {Built-in Function} {void} __builtin_set_thread_pointer (void *@var{ptr})
+Sets the @samp{GBR} register to the specified value @var{ptr}.  This is usually
+used by system code that manages threads and execution contexts.  The compiler
+normally will not generate code that modifies the contents of @samp{GBR} and
+thus the value is preserved across function calls.  Changing the @samp{GBR}
+value in user code must be done with caution, since the compiler might use
+@samp{GBR} in order to access thread local variables.
+
+@end deftypefn
+
+@deftypefn {Built-in Function} {void *} __builtin_thread_pointer (void)
+Returns the value that is currently set in the @samp{GBR} register.
+Memory loads and stores that use the thread pointer as a base address will be
+turned into @samp{GBR} based displacement loads and stores, if possible.
+For example:
+@smallexample
+struct my_tcb
+@{
+   int a, b, c, d, e;
+@};
+
+int get_tcb_value (void)
+@{
+  // Generate @samp{mov.l @@(8,gbr),r0} instruction
+  return ((struct my_tcb*)__builtin_thread_pointer ())->c;
+@}
+
+@end smallexample
+@end deftypefn
+
 @node RX Built-in Functions
 @subsection RX Built-in Functions
 GCC supports some of the RX instructions which cannot be expressed in

[wwwdocs] SH 4.8 changes - document thread pointer built-ins

2012-10-08 Thread Oleg Endo

Hello,

This documents the new thread pointer built-ins in the SH www changes
for 4.8.
OK?

Cheers,
Oleg
? www_4_8_sh_changes_3.patch
Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.43
diff -u -r1.43 changes.html
--- htdocs/gcc-4.8/changes.html	8 Oct 2012 19:22:43 -	1.43
+++ htdocs/gcc-4.8/changes.html	8 Oct 2012 22:01:06 -
@@ -377,6 +377,15 @@
 is now enabled and the option -ffinite-math-only implicitly
 sets -mno-ieee.
 
+Added support for the built-in functions
+__builtin_thread_pointer and
+__builtin_set_thread_pointer.  This assumes that
+GBR is used to hold the thread pointer of the current thread,
+which has been the case for a while already.  Memory loads and stores
+relative to the address returned by __builtin_thread_pointer
+will now also utilize GBR based displacement address modes.
+
+
   
 
 SPARC

[SH] PR 34777 - Add test case

2012-10-08 Thread Oleg Endo

Hello,

This adds the reduced test case as mentioned by Kaz in the PR to the
test suite.
Tested with
make -k check-gcc RUNTESTFLAGS="compile.exp=pr34777*
--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

OK?

Cheers,
Oleg

testsuite/ChangeLog:

PR target/34777
* gcc.c-torture/compile/pr34777.c: New.
Index: gcc/testsuite/gcc.c-torture/compile/pr34777.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr34777.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr34777.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-additional-options "-fPIC" }  */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } }  */
+
+static __inline __attribute__ ((__always_inline__)) void *
+_dl_mmap (void * start, int length, int prot, int flags, int fd,
+	  int offset)
+{
+  register long __sc3 __asm__ ("r3") = 90;
+  register long __sc4 __asm__ ("r4") = (long) start;
+  register long __sc5 __asm__ ("r5") = (long) length;
+  register long __sc6 __asm__ ("r6") = (long) prot;
+  register long __sc7 __asm__ ("r7") = (long) flags;
+  register long __sc0 __asm__ ("r0") = (long) fd;
+  register long __sc1 __asm__ ("r1") = (long) offset;
+  __asm__ __volatile__ ("trapa	%1"
+			: "=z" (__sc0)
+			: "i" (0x10 + 6), "0" (__sc0), "r" (__sc4),
+			  "r" (__sc5), "r" (__sc6), "r" (__sc7),
+			  "r" (__sc3), "r" (__sc1)
+			: "memory" );
+}
+
+extern int _dl_pagesize;
+void
+_dl_dprintf(int fd, const char *fmt, ...)
+{
+  static char *buf;
+  buf = _dl_mmap ((void *) 0, _dl_pagesize, 0x1 | 0x2, 0x02 | 0x20, -1, 0);
+}

Re: [SH] PR 34777 - Add test case

2012-10-09 Thread Oleg Endo

On Tue, 2012-10-09 at 18:33 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > This adds the reduced test case as mentioned by Kaz in the PR to the
> > test suite.
> > Tested with
> > make -k check-gcc RUNTESTFLAGS="compile.exp=pr34777*
> > --target_board=sh-sim
> > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> > 
> > OK?
> 
> It should be put into gcc.target/sh instead of gcc.c-torture/compile
> and tested with -Os -fschedule-insns -fPIC -mprefergot, shouldn't it?
> 

Uhm, yes, I forgot to add the -fschedule-insns and -mprefergot options.
Regarding the -Os option, I think it's better to test this one at
multiple optimization levels, just in case.  I've looked through
gcc.c-torture/compile and found some target specific test cases there,
so I thought it would be OK to do the same :)
Some targets also have their own torture subdir.  If it's better, I
could also create gcc.target/sh/torture.

Cheers,
Oleg

[SH] PR 52480 - fix movua.l for big endian

2012-10-09 Thread Oleg Endo

Hello,

This is the same patch I posted in the PR.  It seems to fix the issue.
Tested on rev 192200 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/52480
* config/sh/sh.md (extv, extzv): Check that operands[3] is zero,
regardless of the endianness.

testsuite/ChangeLog:

PR target/52480
* gcc.target/sh/sh4a-bitmovua.c: Compact skip-if list.  Add 
runtime tests.
Index: gcc/testsuite/gcc.target/sh/sh4a-bitmovua.c
===
--- gcc/testsuite/gcc.target/sh/sh4a-bitmovua.c	(revision 192200)
+++ gcc/testsuite/gcc.target/sh/sh4a-bitmovua.c	(working copy)
@@ -1,7 +1,7 @@
 /* Verify that we generate movua to load unaligned 32-bit values on SH4A.  */
-/* { dg-do compile { target "sh*-*-*" } } */
-/* { dg-options "-O" } */
-/* { dg-skip-if "" { "sh*-*-*" } { "*" } { "-m4a" "-m4a-single" "-m4a-single-only" "-m4a-nofpu" } }  */
+/* { dg-do run { target "sh*-*-*" } } */
+/* { dg-options "-O1 -save-temps -fno-inline" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "*" } { "-m4a*" } }  */
 /* { dg-final { scan-assembler-times "movua.l" 6 } } */
 
 /* Aligned.  */
@@ -64,4 +64,28 @@
   return y4.d;
 }
 
+#include <assert.h>
 
+int
+main (void)
+{
+  x1.d = 0x12345678;
+  assert (f1 () == 0x12345678);
+
+  x2.d = 0x12345678;
+  assert (f2 () == 0x12345678);
+
+  x3.d = 0x12345678;
+  assert (f3 () == 0x12345678);
+
+  y_1.d = 0x12345678;
+  assert (g1 () == 0x12345678);
+
+  y2.d = 0x12345678;
+  assert (g2 () == 0x12345678);
+
+  y3.d = 0x12345678;
+  assert (g3 () == 0x12345678);
+
+  return 0;
+}
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192200)
+++ gcc/config/sh/sh.md	(working copy)
@@ -12706,7 +12706,7 @@
}
   if (TARGET_SH4A_ARCH
   && INTVAL (operands[2]) == 32
-  && INTVAL (operands[3]) == -24 * (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+  && INTVAL (operands[3]) == 0
   && MEM_P (operands[1]) && MEM_ALIGN (operands[1]) < 32)
 {
   rtx src = adjust_address (operands[1], BLKmode, 0);
@@ -12738,7 +12738,7 @@
 }
   if (TARGET_SH4A_ARCH
   && INTVAL (operands[2]) == 32
-  && INTVAL (operands[3]) == -24 * (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+  && INTVAL (operands[3]) == 0
   && MEM_P (operands[1]) && MEM_ALIGN (operands[1]) < 32)
 {
   rtx src = adjust_address (operands[1], BLKmode, 0);
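
For readers who do not have sh4a-bitmovua.c at hand, the kind of access the
new runtime checks exercise can be sketched roughly as follows.  This is only
an illustration: the struct and function names are hypothetical and not taken
from the test file, and it assumes GCC's packed attribute is available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the leading char pushes 'd' to an odd offset,
   so reading it is an unaligned 32-bit load.  */
struct __attribute__ ((packed)) unaligned_word
{
  char pad;
  uint32_t d;
};

/* A load like this is the kind of access the test expects to become
   movua.l on SH4A; other targets use whatever unaligned access
   sequence they support.  */
static uint32_t
load_unaligned_word (const struct unaligned_word *p)
{
  assert (p != 0);
  return p->d;
}
```

Storing a known pattern and reading it back, as the added main () does, catches
a load that picks up the wrong bytes regardless of endianness.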

[SH] PR 51244 - Improve T bit store and cbranch

2012-10-11 Thread Oleg Endo

Hello,

This one further improves T bit stores and conditional branches on SH
for cases like those described in comment #53 in the PR.
Tested on rev 192200 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.

Cheers,
Oleg

gcc/ChangeLog:

PR target/51244
* config/sh/sh.md (negsi_cond, negdi_cond, stack_protect_test): 
Remove get_t_reg_rtx when invoking gen_branch_true or 
gen_branch_false.
(*zero_extendsi2_compact): Convert to insn_and_split.  
Convert zero extensions of T bit stores to reg moves in 
splitter.  Remove obsolete unnamed peephole2 that caught zero 
extensions after negc T bit stores.
(*branch_true_eq, *branch_false_ne): Delete.
(branch_true, branch_false): Convert insn to expander.  Move 
actual insn logic to...
(*cbranch_t): ...this new insn_and_split.  Try to find 
preceding redundant T bit stores and tests and combine them 
with the conditional branch if possible in the splitter.
(movrt_xor, *movt_movrt): New insn_and_split.
* config/sh/predicates.md (cbranch_treg_value): New predicate.
* config/sh/sh-protos.h (sh_eval_treg_value): Forward declare...
* config/sh/sh.c (sh_eval_treg_value): ...this new function.
(expand_cbranchsi4, expand_cbranchdi4): Remove get_t_reg_rtx 
when invoking gen_branch_true or gen_branch_false.

testsuite/ChangeLog:

PR target/51244
* gcc.target/sh/pr51244-13.c: New.
* gcc.target/sh/pr51244-14.c: New.
* gcc.target/sh/pr51244-15.c: New.
* gcc.target/sh/pr51244-16.c: New.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192200)
+++ gcc/config/sh/sh.c	(working copy)
@@ -2059,7 +2059,7 @@
 void
 expand_cbranchsi4 (rtx *operands, enum rtx_code comparison, int probability)
 {
-  rtx (*branch_expander) (rtx, rtx) = gen_branch_true;
+  rtx (*branch_expander) (rtx) = gen_branch_true;
   comparison = prepare_cbranch_operands (operands, SImode, comparison);
   switch (comparison)
 {
@@ -2071,7 +2071,7 @@
   emit_insn (gen_rtx_SET (VOIDmode, get_t_reg_rtx (),
   gen_rtx_fmt_ee (comparison, SImode,
   operands[1], operands[2])));
-  rtx jump = emit_jump_insn (branch_expander (operands[3], get_t_reg_rtx ()));
+  rtx jump = emit_jump_insn (branch_expander (operands[3]));
   if (probability >= 0)
 add_reg_note (jump, REG_BR_PROB, GEN_INT (probability));
 }
@@ -2123,7 +2123,7 @@
   if (TARGET_CMPEQDI_T)
 	{
 	  emit_insn (gen_cmpeqdi_t (operands[1], operands[2]));
-	  emit_jump_insn (gen_branch_true (operands[3], get_t_reg_rtx ()));
+	  emit_jump_insn (gen_branch_true (operands[3]));
 	  return true;
 	}
   msw_skip = NE;
@@ -2150,7 +2150,7 @@
   if (TARGET_CMPEQDI_T)
 	{
 	  emit_insn (gen_cmpeqdi_t (operands[1], operands[2]));
-	  emit_jump_insn (gen_branch_false (operands[3], get_t_reg_rtx ()));
+	  emit_jump_insn (gen_branch_false (operands[3]));
 	  return true;
 	}
   msw_taken = NE;
@@ -2281,6 +2281,43 @@
   return true;
 }
 
+/* Given an operand, return 1 if the evaluated operand plugged into an
+   if_then_else will result in a branch_true, 0 if branch_false, or
+   -1 if neither applies.  The truth table goes like this:
+
+       op   | cmpval |   code  | result
+   ---------+--------+---------+--------------------
+      T (0) |   0    |  EQ (1) |  0 = 0 ^ (0 == 1)
+      T (0) |   1    |  EQ (1) |  1 = 0 ^ (1 == 1)
+      T (0) |   0    |  NE (0) |  1 = 0 ^ (0 == 0)
+      T (0) |   1    |  NE (0) |  0 = 0 ^ (1 == 0)
+     !T (1) |   0    |  EQ (1) |  1 = 1 ^ (0 == 1)
+     !T (1) |   1    |  EQ (1) |  0 = 1 ^ (1 == 1)
+     !T (1) |   0    |  NE (0) |  0 = 1 ^ (0 == 0)
+     !T (1) |   1    |  NE (0) |  1 = 1 ^ (1 == 0)  */
+int
+sh_eval_treg_value (rtx op)
+{
+  enum rtx_code code = GET_CODE (op);
+  if ((code != EQ && code != NE) || !CONST_INT_P (XEXP (op, 1)))
+return -1;
+
+  int cmpop = code == EQ ? 1 : 0;
+  int cmpval = INTVAL (XEXP (op, 1));
+  if (cmpval != 0 && cmpval != 1)
+return -1;
+
+  int t;
+  if (t_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0))))
+t = 0;
+  else if (negt_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0))))
+t = 1;
+  else
+return -1;
+  
+  return t ^ (cmpval == cmpop);
+}
+
 /* Emit INSN, possibly in a PARALLEL with an USE of fpscr for SH4.  */
 
 static void
@@ -2485,9 +2522,9 @@
 sh_emit_set_t_insn (gen_ieee_ccmpeqsf_t (op0, op1), mode);
 
   if (branch_code == code)
-emit_jump_insn (gen_branch_true (operands[3], get_t_reg_rtx ()));
+emit_jump_insn (gen_branch_true (operands[3]));
   else
-emit_jump_insn (gen_branch_false (operands[3], get_t_reg_rtx ()));
+emit_jump_insn (gen_branch_false (operands[3]));
 }
 
 void
@@ -2521,7 +2558,7 @@
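
The truth table in the sh_eval_treg_value comment can be checked outside of
the compiler with a small standalone sketch.  The enum and the function name
below are stand-ins invented for illustration; only the XOR expression is
taken from the patch.

```c
#include <assert.h>

/* Stand-ins for the rtx codes; the numeric values mirror the 'code'
   column of the truth table (EQ = 1, NE = 0).  */
enum cmp_code { CMP_NE = 0, CMP_EQ = 1 };

/* negated_t is 0 for T and 1 for !T.  Returns 1 for branch_true,
   0 for branch_false, or -1 if cmpval is neither 0 nor 1.  */
static int
eval_treg_value (int negated_t, int cmpval, enum cmp_code code)
{
  assert (negated_t == 0 || negated_t == 1);
  if (cmpval != 0 && cmpval != 1)
    return -1;
  return negated_t ^ (cmpval == (int) code);
}
```

For example, eval_treg_value (0, 1, CMP_EQ) corresponds to the "T == 1" row
and yields 1 (branch_true), while eval_treg_value (1, 1, CMP_EQ) is the
"!T == 1" row and yields 0 (branch_false).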

Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760

2012-10-11 Thread Oleg Endo

On Thu, 2012-10-11 at 23:11 +0800, Chung-Lin Tang wrote:
> This patch set has been committed, thanks to all maintainers who
> reviewed the respective parts.
> 
> Thanks,
> Chung-Lin
> 

This broke the recently added thread pointer built-ins on SH, but I was
prepared for that, so no problem here.  The attached patch is a
straightforward fix.

However, with the patch applied I get an ICE on one of the SH thread
pointer tests:  gcc/testsuite/gcc.target/sh/pr54760-3.c, function
test04:

internal compiler error: in expand_insn, at optabs.c:8208
__builtin_set_thread_pointer (xx[i]);
 ^
0x8478872 expand_insn
../../gcc-trunk2/gcc/optabs.c:8208
0x8478872 expand_insn
../../gcc-trunk2/gcc/optabs.c:8204
0x81ded5a expand_builtin_set_thread_pointer
../../gcc-trunk2/gcc/builtins.c:5780
0x81e6b18 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode,
int)
../../gcc-trunk2/gcc/builtins.c:6855
0x82eeaf9 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**)
../../gcc-trunk2/gcc/expr.c:10143
0x8216a7b expand_call_stmt
../../gcc-trunk2/gcc/cfgexpand.c:2012
0x8216a7b expand_gimple_stmt_1
../../gcc-trunk2/gcc/cfgexpand.c:2050
0x8216a7b expand_gimple_stmt
../../gcc-trunk2/gcc/cfgexpand.c:2202
0x8218406 expand_gimple_basic_block
../../gcc-trunk2/gcc/cfgexpand.c:3956
0x821a417 gimple_expand_cfg
../../gcc-trunk2/gcc/cfgexpand.c:4475
Please submit a full bug report,


All the other test cases seem to produce code as expected though.
Could you please try out the failing test case mentioned above?
As mentioned in the file gcc/testsuite/gcc.target/sh/pr54760-3.c it
should be moved to C torture tests.

Anyway, regardless of this failure, the attached patch for SH should be
applicable. OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.c (bdesc): Remove thread pointer built-ins.
* config/sh/sh.md (get_thread_pointer, set_thread_pointer): 
Append mode name 'si'.

Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192378)
+++ gcc/config/sh/sh.c	(working copy)
@@ -11778,12 +11778,6 @@
 CODE_FOR_byterev,	"__builtin_sh_media_BYTEREV", SH_BLTIN_2, 0 },
   { shmedia_builtin_p,
 CODE_FOR_prefetch,	"__builtin_sh_media_PREFO", SH_BLTIN_PSSV, 0 },
-
-  { sh1_builtin_p,
-CODE_FOR_get_thread_pointer, "__builtin_thread_pointer", SH_BLTIN_VP, 0 },
-  { sh1_builtin_p,
-CODE_FOR_set_thread_pointer, "__builtin_set_thread_pointer",
-SH_BLTIN_PV, 0 },
 };
 
 static void
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192378)
+++ gcc/config/sh/sh.md	(working copy)
@@ -10085,7 +10085,7 @@
 ;;
 ;; On SH the thread pointer is kept in the GBR.
 ;; These patterns are usually expanded from the respective built-in functions.
-(define_expand "get_thread_pointer"
+(define_expand "get_thread_pointersi"
   [(set (match_operand:SI 0 "register_operand") (reg:SI GBR_REG))]
   "TARGET_SH1")
 
@@ -10096,7 +10096,7 @@
   "stc	gbr,%0"
   [(set_attr "type" "tls_load")])
 
-(define_expand "set_thread_pointer"
+(define_expand "set_thread_pointersi"
   [(set (reg:SI GBR_REG)
 	(unspec_volatile:SI [(match_operand:SI 0 "register_operand")]
 	 UNSPECV_GBR))]

[SH] PR 54680

2012-10-12 Thread Oleg Endo

Hello,

The attached patch fixes PR 54680.
Tested on rev 192200 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54680
* config/sh/sh.c (sh_fsca_sf2int, sh_fsca_int2sf): Fix swapped
comments.
* config/sh/predicates.md (fpul_operand): Add comment.
(fpul_fsca_operand, fsca_scale_factor): New predicates.
* config/sh/sh.md (fsca): Move below sincossf3 expander.  
Convert to insn_and_split.  Use fpul_fsca_operand and 
fsca_scale_factor predicates. Simplify fpul operand in splitter.

testsuite/ChangeLog:

PR target/54680
* gcc.target/sh/pr54680.c: New.
Index: gcc/testsuite/gcc.target/sh/pr54680.c
===
--- gcc/testsuite/gcc.target/sh/pr54680.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54680.c	(revision 0)
@@ -0,0 +1,66 @@
+/* Verify that the fsca input value is not converted to float and then back
+   to int.  Notice that we can't count just "lds" insns because mode switches
+   use "lds.l".  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O2 -mfsca -funsafe-math-optimizations" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m1" "-m2*" "-m3*" "-m4al" "*nofpu" "-m4-340*" "-m4-400*" "-m4-500*" "-m5*" } { "" } }  */
+/* { dg-final { scan-assembler-times "fsca" 7 } } */
+/* { dg-final { scan-assembler-times "shad" 1 } } */
+/* { dg-final { scan-assembler-times "lds\t" 6 } } */
+/* { dg-final { scan-assembler-times "fmul" 2 } } */
+/* { dg-final { scan-assembler-times "ftrc" 1 } } */
+
+#include <math.h>
+
+static const float pi = 3.14159265359f;
+
+float
+test00 (int x)
+{
+  /* 1x shad, 1x lds, 1x fsca  */
+  return sinf ( (x >> 8) * (2*pi) / (1 << 16));
+}
+
+float
+test01 (int x)
+{
+  /* 1x lds, 1x fsca  */
+  return sinf (x * (2*pi) / 65536);
+}
+
+float
+test02 (int x)
+{
+  /* 1x lds, 1x fsca  */
+  return sinf (x * (2*pi / 65536));
+}
+
+float
+test03 (int x)
+{
+  /* 1x lds, 1x fsca  */
+  float scale = 2*pi / 65536;
+  return sinf (x * scale);
+}
+
+float
+test04 (int x)
+{
+  /* 1x lds, 1x fsca  */
+  return cosf (x / 65536.0f * 2*pi);
+}
+
+float
+test05 (int x)
+{
+  /* 1x lds, 1x fsca, 1x fmul  */
+  float scale = 2*pi / 65536;
+  return sinf (x * scale) * cosf (x * scale);
+}
+
+float
+test_06 (float x)
+{
+  /* 1x fmul, 1x ftrc, 1x fsca  */
+  return sinf (x);
+}
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192200)
+++ gcc/config/sh/sh.c	(working copy)
@@ -12628,11 +12628,9 @@
   gcc_unreachable ();
 }
 
-/* This function returns a constant rtx that represents pi / 2**15 in
-   SFmode.  it's used to scale SFmode angles, in radians, to a
-   fixed-point signed 16.16-bit fraction of a full circle, i.e., 2*pi
-   maps to 0x1).  */
-
+/* This function returns a constant rtx that represents 2**15 / pi in
+   SFmode.  It's used to scale a fixed-point signed 16.16-bit fraction
+   of a full circle back to an SFmode value, i.e. 0x1 maps to 2*pi.  */
 static GTY(()) rtx sh_fsca_sf2int_rtx;
 
 rtx
@@ -12649,11 +12647,10 @@
   return sh_fsca_sf2int_rtx;
 }
 
-/* This function returns a constant rtx that represents 2**15 / pi in
-   SFmode.  it's used to scale a fixed-point signed 16.16-bit fraction
-   of a full circle back to a SFmode value, i.e., 0x1 maps to
-   2*pi).  */
-
+/* This function returns a constant rtx that represents pi / 2**15 in
+   SFmode.  It's used to scale SFmode angles, in radians, to a
+   fixed-point signed 16.16-bit fraction of a full circle, i.e. 2*pi
+   maps to 0x1.  */
 static GTY(()) rtx sh_fsca_int2sf_rtx;
 
 rtx
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192200)
+++ gcc/config/sh/sh.md	(working copy)
@@ -12055,22 +12055,6 @@
   [(set_attr "type" "fsrra")
(set_attr "fp_mode" "single")])
 
-(define_insn "fsca"
-  [(set (match_operand:V2SF 0 "fp_arith_reg_operand" "=f")
-	(vec_concat:V2SF
-	 (unspec:SF [(mult:SF
-		  (float:SF (match_operand:SI 1 "fpul_operand" "y"))
-		  (match_operand:SF 2 "immediate_operand" "i"))
-		] UNSPEC_FSINA)
-	 (unspec:SF [(mult:SF (float:SF (match_dup 1)) (match_dup 2))
-		] UNSPEC_FCOSA)))
-   (use (match_operand:PSI 3 "fpscr_operand" "c"))]
-  "TARGET_FPU_ANY && TARGET_FSCA
-   && operands[2] == sh_fsca_int2sf ()"
-  "fsca	fpul,%d0"
-  [(set_attr "type" "fsca")
-   (set_attr "fp_mode" "single")])
-
 ;; When the sincos pattern is defined, the builtin functions sin and cos
 ;; will be expanded to the sincos pattern and one of the output values will
 ;; remain unused.
@@ -12097,6 +12081,38 @@
   DONE;
 })
 
+(define_insn_and_split "fsca"
+  [(set (match_operand:V2SF 0 "fp_arith_reg_operand" "=f")
+	(vec_concat:V2SF
+	 (unspec:SF [(mult:SF
+		  (float:SF (match_opera
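
The two scale factors mentioned in the sh.c comments (2**15 / pi and
pi / 2**15) can be illustrated with plain C arithmetic.  This is only a
sketch of the math, not of the fsca pattern itself; the macro and function
names are made up for the example, and rounding to nearest is an assumption
(the test case comment only says ftrc is used for float inputs).

```c
#include <assert.h>

#define FSCA_PI 3.14159265359f

/* Radians -> 16.16 fixed-point fraction of a full circle: multiply by
   2**15 / pi, so that 2*pi maps to 1 << 16.  Rounds to nearest.  */
static int
radians_to_circle_fraction (float radians)
{
  float scaled = radians * (32768.0f / FSCA_PI);
  /* Guard against float -> int conversion overflow.  */
  assert (scaled > -2147483648.0f && scaled < 2147483648.0f);
  return (int) (scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Inverse direction: multiply by pi / 2**15.  */
static float
circle_fraction_to_radians (int fraction)
{
  return (float) fraction * (FSCA_PI / 32768.0f);
}
```

With these helpers a quarter turn (pi / 2) maps to 16384, and 32768 maps
back to roughly pi.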

[SH] PR 54602

2012-10-12 Thread Oleg Endo

Hello,

This fixes the issue of PR 54602 as proposed in the PR.
Tested on rev 192200 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54602
* config/sh/sh.md: Correct define_delay for return insns.
(*movsi_pop): Delete.

testsuite/ChangeLog:

PR target/54602
* gcc.target/sh/pr54602-1.c: New.
* gcc.target/sh/pr54602-2.c: New.
* gcc.target/sh/pr54602-3.c: New.
* gcc.target/sh/pr54602-4.c: New.
Index: gcc/testsuite/gcc.target/sh/pr54602-1.c
===
--- gcc/testsuite/gcc.target/sh/pr54602-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54602-1.c	(revision 0)
@@ -0,0 +1,15 @@
+/* Verify that the delay slot is stuffed with register pop insns for normal
+   (i.e. not interrupt handler) function returns.  If everything goes as
+   expected we won't see any nop insns.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
+/* { dg-final { scan-assembler-not "nop" } } */
+
+int test00 (int a, int b);
+
+int
+test01 (int a, int b, int c, int d)
+{
+  return test00 (a, b) + c;
+}
Index: gcc/testsuite/gcc.target/sh/pr54602-3.c
===
--- gcc/testsuite/gcc.target/sh/pr54602-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54602-3.c	(revision 0)
@@ -0,0 +1,12 @@
+/* Verify that the rte delay slot is not stuffed with register pop insns
+   which touch the banked registers r0..r7 on SH3* and SH4* targets.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "*" } { "-m3*" "-m4*" } }  */
+/* { dg-final { scan-assembler-times "nop" 1 } } */
+
+int __attribute__ ((interrupt_handler))
+test00 (int a, int b, int c, int d)
+{
+  return a + b;
+}
Index: gcc/testsuite/gcc.target/sh/pr54602-2.c
===
--- gcc/testsuite/gcc.target/sh/pr54602-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54602-2.c	(revision 0)
@@ -0,0 +1,15 @@
+/* Verify that the delay slot is not stuffed with register pop insns for
+   interrupt handler function returns on SH1* and SH2* targets, where the
+   rte insn uses the stack pointer.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "*" } { "-m1*" "-m2*" } }  */
+/* { dg-final { scan-assembler-times "nop" 1 } } */
+
+int test00 (int a, int b);
+
+int __attribute__ ((interrupt_handler))
+test01 (int a, int b, int c, int d)
+{
+  return test00 (a, b) + c;
+}
Index: gcc/testsuite/gcc.target/sh/pr54602-4.c
===
--- gcc/testsuite/gcc.target/sh/pr54602-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54602-4.c	(revision 0)
@@ -0,0 +1,15 @@
+/* Verify that the delay slot is stuffed with register pop insns on SH3* and
+   SH4* targets, where the stack pointer is not used by the rte insn.  If
+   everything works out, we won't see a nop insn.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "*" } { "-m3*" "-m4*" } }  */
+/* { dg-final { scan-assembler-not "nop" } } */
+
+int test00 (int a, int b);
+
+int __attribute__ ((interrupt_handler))
+test01 (int a, int b, int c, int d)
+{
+  return test00 (a, b) + c;
+}
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192200)
+++ gcc/config/sh/sh.md	(working copy)
@@ -541,22 +541,22 @@
   (eq_attr "needs_delay_slot" "yes")
   [(eq_attr "in_delay_slot" "yes") (nil) (nil)])
 
-;; On the SH and SH2, the rte instruction reads the return pc from the stack,
-;; and thus we can't put a pop instruction in its delay slot.
-;; On the SH3 and SH4, the rte instruction does not use the stack, so a pop
-;; instruction can go in the delay slot.
 ;; Since a normal return (rts) implicitly uses the PR register,
 ;; we can't allow PR register loads in an rts delay slot.
+;; On the SH1* and SH2*, the rte instruction reads the return pc from the
+;; stack, and thus we can't put a pop instruction in its delay slot.
+;; On the SH3* and SH4*, the rte instruction does not use the stack, so a
+;; pop instruction can go in the delay slot, unless it references a banked
+;; register (the register bank is switched by rte).
 (define_delay
   (eq_attr "type" "return")
   [(and (eq_attr "in_delay_slot" "yes")
 	(ior (and (eq_attr "interrupt_function" "no")
 		  (eq_attr "type" "!pload,prset"))
 	 (and (eq_attr "interrupt_function" "yes")
-		  (ior
-		   (not (match_test "TARGET_SH3"))
-		   (eq_attr "hit_stack" "no")
-		   (eq_attr "banked" "no") (nil) (nil)])
+		  (ior (match_test "TARGET_SH3") (eq_attr "hit_

Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760

2012-10-13 Thread Oleg Endo

On Sat, 2012-10-13 at 17:33 +0800, Chung-Lin Tang wrote:
> On 2012/10/12 06:55 AM, Oleg Endo wrote:
> > This broke the recently added thread pointer built-ins on SH, but I was
> > prepared for that, so no problem here.  The attached patch is a
> > straightforward fix.
> > 
> > However, with the patch applied I get an ICE on one of the SH thread
> > pointer tests:  gcc/testsuite/gcc.target/sh/pr54760-3.c, function
> > test04:
> > 
> > internal compiler error: in expand_insn, at optabs.c:8208
> > __builtin_set_thread_pointer (xx[i]);
> 
> Looks like I was supposed to use create_input_operand() there instead.
> I've committed the attached patch as obvious. This should be fixed now.

Yep, confirmed.  Thanks!

Cheers,
Oleg

Re: [SH] PR 34777 - Add test case

2012-10-14 Thread Oleg Endo

On Wed, 2012-10-10 at 07:46 +0900, Kaz Kojima wrote:
> Oleg Endo wrote:
> > Uhm, yes, I forgot to add the -fschedule-insns and -mprefergot options.
> > Regarding the -Os option, I think it's better to test this one at
> > multiple optimization levels, just in case.  I've looked through
> > gcc.c-torture/compile and found some target specific test cases there,
> > so I thought it would be OK to do the same :)
> > Some targets also have their own torture subdir.  If it's better, I
> > could also create gcc.target/sh/torture.
> 
> Maybe.  For this specific test, I thought that "-Os -fschedule-insns
> -fPIC -mprefergot" would be enough because empirically these options
> will give high R0 register pressure which had caused that PR.
> 

Sorry for the delayed reply.
The attached patch adds gcc.target/sh/torture and puts the test there.
The torture subdir might also be useful in the future.
Tested on rev 192417 with
make -k check-gcc RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml}"

OK?

Cheers,
Oleg

testsuite/ChangeLog:

PR target/34777
* gcc.target/sh/torture/sh-torture.exp: New.
* gcc.target/sh/torture/pr34777.c: New.
Index: gcc/testsuite/gcc.target/sh/torture/pr34777.c
===
--- gcc/testsuite/gcc.target/sh/torture/pr34777.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/torture/pr34777.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-additional-options "-fschedule-insns -fPIC -mprefergot" }  */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } }  */
+
+static __inline __attribute__ ((__always_inline__)) void *
+_dl_mmap (void * start, int length, int prot, int flags, int fd,
+	  int offset)
+{
+  register long __sc3 __asm__ ("r3") = 90;
+  register long __sc4 __asm__ ("r4") = (long) start;
+  register long __sc5 __asm__ ("r5") = (long) length;
+  register long __sc6 __asm__ ("r6") = (long) prot;
+  register long __sc7 __asm__ ("r7") = (long) flags;
+  register long __sc0 __asm__ ("r0") = (long) fd;
+  register long __sc1 __asm__ ("r1") = (long) offset;
+  __asm__ __volatile__ ("trapa	%1"
+			: "=z" (__sc0)
+			: "i" (0x10 + 6), "0" (__sc0), "r" (__sc4),
+			  "r" (__sc5), "r" (__sc6), "r" (__sc7),
+			  "r" (__sc3), "r" (__sc1)
+			: "memory" );
+}
+
+extern int _dl_pagesize;
+void
+_dl_dprintf(int fd, const char *fmt, ...)
+{
+  static char *buf;
+  buf = _dl_mmap ((void *) 0, _dl_pagesize, 0x1 | 0x2, 0x02 | 0x20, -1, 0);
+}
Index: gcc/testsuite/gcc.target/sh/torture/sh-torture.exp
===
--- gcc/testsuite/gcc.target/sh/torture/sh-torture.exp	(revision 0)
+++ gcc/testsuite/gcc.target/sh/torture/sh-torture.exp	(revision 0)
@@ -0,0 +1,41 @@
+#   Copyright (C) 2012 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `gcc-dg.exp' driver, looping over
+# optimization options.
+
+# Exit immediately if this isn't a SH target.
+if { ![istarget sh*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] $DEFAULT_CFLAGS
+
+# All done.
+dg-finish

[SH] Document function attributes

2012-10-14 Thread Oleg Endo

Hello,

The attached patch adds documentation for SH specific function
attributes which haven't been documented yet.
Tested with 'make info dvi pdf'.
OK?

Cheers,
Oleg

gcc/ChangeLog:

* config/sh/sh.c: Update function attribute comments.
* doc/extend.texi (function_vector): Rephrase SH2A specific 
part.
(nosave_low_regs, renesas, trapa_handler): Document SH specific 
attributes.
(sp_switch, trap_exit): Add to index.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192417)
+++ gcc/config/sh/sh.c	(working copy)
@@ -9451,30 +9451,42 @@
   return;
 }
 
-/* Supported attributes:
+/*--
+/* Target specific attributes
+  Supported attributes are:
 
-   interrupt_handler -- specifies this function is an interrupt handler.
+   * interrupt_handler
+	Specifies this function is an interrupt handler.
 
-   trapa_handler - like above, but don't save all registers.
+   * trapa_handler
+	Like interrupt_handler, but don't save all registers.
 
-   sp_switch -- specifies an alternate stack for an interrupt handler
-   to run on.
+   * sp_switch
+	Specifies an alternate stack for an interrupt handler to run on.
 
-   trap_exit -- use a trapa to exit an interrupt function instead of
-   an rte instruction.
+   * trap_exit
+	Use a trapa to exit an interrupt function instead of rte.
 
-   nosave_low_regs - don't save r0..r7 in an interrupt handler.
- This is useful on the SH3 and upwards,
- which has a separate set of low regs for User and Supervisor modes.
- This should only be used for the lowest level of interrupts.  Higher levels
- of interrupts must save the registers in case they themselves are
- interrupted.
+   * nosave_low_regs
+	Don't save r0..r7 in an interrupt handler function.
+	This is useful on SH3* and SH4*, which have a separate set of low
+	regs for user and privileged modes.
+	This is mainly to be used for non-reentrant interrupt handlers (i.e.
+	those that run with interrupts disabled and thus can't be
+	interrupted themselves).
 
-   renesas -- use Renesas calling/layout conventions (functions and
-   structures).
+   * renesas
+	Use Renesas calling/layout conventions (functions and structures).
 
-   resbank -- In case of an ISR, use a register bank to save registers
-   R0-R14, MACH, MACL, GBR and PR.  This is useful only on SH2A targets.
+   * resbank
+	In case of an interrupt handler function, use a register bank to
+	save registers R0-R14, MACH, MACL, GBR and PR.
+	This is available only on SH2A targets.
+
+   * function_vector
+	Declares a function to be called using the TBR relative addressing
+	mode.  Takes an argument that specifies the slot number in the table
+	where this function can be looked up by the JSR/N @@(disp8,TBR) insn.
 */
 
 /* Handle a 'resbank' attribute.  */
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 192417)
+++ gcc/doc/extend.texi	(working copy)
@@ -2682,17 +2682,16 @@
 the function vector has a limited size (maximum 128 entries on the H8/300
 and 64 entries on the H8/300H and H8S) and shares space with the interrupt vector.
 
-In SH2A target, this attribute declares a function to be called using the
+On SH2A targets, this attribute declares a function to be called using the
 TBR relative addressing mode.  The argument to this attribute is the entry
 number of the same function in a vector table containing all the TBR
-relative addressable functions.  For the successful jump, register TBR
-should contain the start address of this TBR relative vector table.
-In the startup routine of the user application, user needs to care of this
-TBR register initialization.  The TBR relative vector table can have at
-max 256 function entries.  The jumps to these functions will be generated
-using a SH2A specific, non delayed branch instruction JSR/N @@(disp8,TBR).
-You must use GAS and GLD from GNU binutils version 2.7 or later for
-this attribute to work correctly.
+relative addressable functions.  For correct operation the TBR must be set up
+accordingly to point to the start of the vector table before any functions with
+this attribute are invoked.  Usually a good place to do the initialization is
+the startup routine.  The TBR relative vector table can have at max 256 function
+entries.  The jumps to these functions will be generated using a SH2A specific,
+non delayed branch instruction JSR/N @@(disp8,TBR).  You must use GAS and GLD
+from GNU binutils version 2.7 or later for this attribute to work correctly.
 
 Please refer the example of M16C target, to see the use of this
 attribute while declaring a function,
@@ -3251,6 +3250,13 @@
 take function pointer arguments.  The @code{nothrow} attribute is not
 implemented in GCC versions earlier than 3.3.
 
+@item nosave_low_regs
+@cindex @co

[SH] PR 54760 - Add DImode GBR loads/stores, fix optimization

2012-10-15 Thread Oleg Endo

Hello,

I somehow initially forgot to implement DImode GBR based loads/stores.
The attached patch does that and also fixes a problem with the GBR address
mode optimization.
Tested on rev 192417 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.c (sh_find_base_reg_disp): Stop searching insns 
when hitting a call insn if GBR is marked as call used.
* config/sh/iterators.md (QIHISIDI): New mode iterator.
* config/sh/predicates.md (gbr_address_mem): New predicate.
* config/sh/sh.md (*movdi_gbr_load, *movdi_gbr_store): New 
insn_and_split.
Use QIHISIDI instead of QIHISI in unnamed GBR addressing splits.


testsuite/ChangeLog:

PR target/54760
* gcc.target/sh/pr54760-2.c: Add long long and unsigned long 
long test functions.
* gcc.target/sh/pr54760-4.c: New.   
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192417)
+++ gcc/config/sh/sh.c	(working copy)
@@ -13383,6 +13383,10 @@
   for (rtx i = prev_nonnote_insn (insn); i != NULL;
 	   i = prev_nonnote_insn (i))
 	{
+	  if (REGNO_REG_SET_P (regs_invalidated_by_call_regset, GBR_REG)
+	  && CALL_P (i))
+	break;
+
 	  if (!NONJUMP_INSN_P (i))
 	continue;
 
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192417)
+++ gcc/config/sh/sh.md	(working copy)
@@ -10277,6 +10277,47 @@
   "mov.	%0,@(0,gbr)"
   [(set_attr "type" "store")])
 
+;; DImode memory accesses have to be split in two SImode accesses.
+;; Split them before reload, so that it gets a better chance to figure out
+;; how to deal with the R0 restriction for the individual SImode accesses.
+;; Do not match this insn during or after reload because it can't be split
+;; afterwards.
+(define_insn_and_split "*movdi_gbr_load"
+  [(set (match_operand:DI 0 "register_operand")
+	(match_operand:DI 1 "gbr_address_mem"))]
+  "TARGET_SH1 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 6))]
+{
+  /* Swap low/high part load order on little endian, so that the result reg
+ of the second load can be used better.  */
+  int off = TARGET_LITTLE_ENDIAN ? 1 : 0;
+  operands[3 + off] = gen_lowpart (SImode, operands[0]);
+  operands[5 + off] = gen_lowpart (SImode, operands[1]);
+  operands[4 - off] = gen_highpart (SImode, operands[0]);
+  operands[6 - off] = gen_highpart (SImode, operands[1]);
+})
+
+(define_insn_and_split "*movdi_gbr_store"
+  [(set (match_operand:DI 0 "gbr_address_mem")
+	(match_operand:DI 1 "register_operand"))]
+  "TARGET_SH1 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 6))]
+{
+  /* Swap low/high part store order on big endian, so that stores of function
+ call results can save a reg copy.  */
+  int off = TARGET_LITTLE_ENDIAN ? 0 : 1;
+  operands[3 + off] = gen_lowpart (SImode, operands[0]);
+  operands[5 + off] = gen_lowpart (SImode, operands[1]);
+  operands[4 - off] = gen_highpart (SImode, operands[0]);
+  operands[6 - off] = gen_highpart (SImode, operands[1]);
+})
+
 ;; Sometimes memory accesses do not get combined with the store_gbr insn,
 ;; in particular when the displacements are in the range of the regular move
 ;; insns.  Thus, in the first split pass after the combine pass we search
@@ -10287,15 +10328,15 @@
 ;; other operand) and there's no point of doing it if the GBR is not
 ;; referenced in a function at all.
 (define_split
-  [(set (match_operand:QIHISI 0 "register_operand")
-	(match_operand:QIHISI 1 "memory_operand"))]
+  [(set (match_operand:QIHISIDI 0 "register_operand")
+	(match_operand:QIHISIDI 1 "memory_operand"))]
   "TARGET_SH1 && !reload_in_progress && !reload_completed
&& df_regs_ever_live_p (GBR_REG)"
   [(set (match_dup 0) (match_dup 1))]
 {
   rtx gbr_mem = sh_find_equiv_gbr_addr (curr_insn, operands[1]);
   if (gbr_mem != NULL_RTX)
-operands[1] = change_address (operands[1], GET_MODE (operands[1]), gbr_mem);
+operands[1] = replace_equiv_address (operands[1], gbr_mem);
   else
 FAIL;
 })
@@ -10309,7 +10350,7 @@
 {
   rtx gbr_mem = sh_find_equiv_gbr_addr (curr_insn, operands[1]);
   if (gbr_mem != NULL_RTX)
-operands[1] = change_address (operands[1], GET_MODE (operands[1]), gbr_mem);
+operands[1] = replace_equiv_address (operands[1], gbr_mem);
   else
 FAIL;
 })
@@ -10328,23 +10369,22 @@
   if (gbr_mem != NULL_RTX)
 {
   operands[2] = gen_reg_rtx (GET_MODE (operands[1]));
-  operands[1] = change_address (operands[1], GET_MODE (operands[1]),
-gbr_mem);
+  operands[1] = replace_equiv_address (operands[1], gbr_mem);
 }
   else
 FAIL;
 })
 
 (define_split
-  [(set

[SH] PR 51244 - Catch more unnecessary sign/zero extensions

2012-10-15 Thread Oleg Endo

Hello,

This one refactors some copy pasta that my previous patch regarding this
matter introduced and catches more unnecessary sign/zero extensions of T
bit stores.  It also fixes the bug reported in PR 54925 which popped up
after the last patch for PR 51244.
Tested on rev 192417 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/51244
* config/sh/sh-protos.h (set_of_reg): New struct.
(sh_find_set_of_reg, sh_is_logical_t_store_expr, 
sh_try_omit_signzero_extend):  Declare...
* config/sh/sh.c (sh_find_set_of_reg, 
sh_is_logical_t_store_expr, 
sh_try_omit_signzero_extend): ...these new functions.
* config/sh/sh.md (*logical_op_t): New insn_and_split.
(*zero_extendsi2_compact): Use sh_try_omit_signzero_extend
in splitter.
(*extendsi2_compact_reg): Convert to insn_and_split.  Use 
sh_try_omit_signzero_extend in splitter.
(*mov_reg_reg): Disallow t_reg_operand as operand 1.
(*cbranch_t): Rewrite combine part in splitter using new 
sh_find_set_of_reg function.

testsuite/ChangeLog:

PR target/51244
* gcc.target/sh/pr51244-17.c: New.
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 192417)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -163,6 +163,25 @@
 	enum machine_mode mode = VOIDmode);
 extern rtx sh_find_equiv_gbr_addr (rtx cur_insn, rtx mem);
 extern int sh_eval_treg_value (rtx op);
+
+/* Result value of sh_find_set_of_reg.  */
+struct set_of_reg
+{
+  /* The insn where sh_find_set_of_reg stopped looking.
+ Can be NULL_RTX if the end of the insn list was reached.  */
+  rtx insn;
+
+  /* The set rtx of the specified reg if found, NULL_RTX otherwise.  */
+  const_rtx set_rtx;
+
+  /* The set source rtx of the specified reg if found, NULL_RTX otherwise.
+ Usually, this is the most interesting return value.  */
+  rtx set_src;
+};
+
+extern set_of_reg sh_find_set_of_reg (rtx reg, rtx insn, rtx(*stepfunc)(rtx));
+extern bool sh_is_logical_t_store_expr (rtx op, rtx insn);
+extern rtx sh_try_omit_signzero_extend (rtx extended_op, rtx insn);
 #endif /* RTX_CODE */
 
 extern void sh_cpu_cpp_builtins (cpp_reader* pfile);
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192417)
+++ gcc/config/sh/sh.c	(working copy)
@@ -13450,4 +13450,114 @@
   return NULL_RTX;
 }
 
+/*--
+  Manual insn combine support code.
+*/
+
+/* Given a reg rtx and a start insn, try to find the insn that sets the
+   specified reg by using the specified insn stepping function, such as 
+   'prev_nonnote_insn_bb'.  When the insn is found, try to extract the rtx
+   of the reg set.  */
+set_of_reg
+sh_find_set_of_reg (rtx reg, rtx insn, rtx(*stepfunc)(rtx))
+{
+  set_of_reg result;
+  result.insn = insn;
+  result.set_rtx = NULL_RTX;
+  result.set_src = NULL_RTX;
+
+  if (!REG_P (reg) || insn == NULL_RTX)
+return result;
+
+  for (result.insn = stepfunc (insn); result.insn != NULL_RTX;
+   result.insn = stepfunc (result.insn))
+{
+  if (LABEL_P (result.insn) || BARRIER_P (result.insn))
+	return result;
+  if (!NONJUMP_INSN_P (result.insn))
+	continue;
+  if (reg_set_p (reg, result.insn))
+	{
+	  result.set_rtx = set_of (reg, result.insn);
+
+	  if (result.set_rtx == NULL_RTX || GET_CODE (result.set_rtx) != SET)
+	return result;
+
+	  result.set_src = XEXP (result.set_rtx, 1);
+	  return result;
+	}
+}
+
+  return result;
+}
+
+/* Given an op rtx and an insn, try to find out whether the result of the
+   specified op consists only of logical operations on T bit stores.  */
+bool
+sh_is_logical_t_store_expr (rtx op, rtx insn)
+{
+  if (!logical_operator (op, SImode))
+return false;
+
+  rtx ops[2] = { XEXP (op, 0), XEXP (op, 1) };
+  int op_is_t_count = 0;
+
+  for (int i = 0; i < 2; ++i)
+{
+  if (t_reg_operand (ops[i], VOIDmode)
+	  || negt_reg_operand (ops[i], VOIDmode))
+	op_is_t_count++;
+
+  else
+	{
+	  set_of_reg op_set = sh_find_set_of_reg (ops[i], insn,
+		  prev_nonnote_insn_bb);
+	  if (op_set.set_src == NULL_RTX)
+	continue;
+
+	  if (t_reg_operand (op_set.set_src, VOIDmode)
+	  || negt_reg_operand (op_set.set_src, VOIDmode)
+	  || sh_is_logical_t_store_expr (op_set.set_src, op_set.insn))
+	  op_is_t_count++;
+	}
+}
+  
+  return op_is_t_count == 2;
+}
+
+/* Given the operand that is extended in a sign/zero extend insn, and the
+   insn, try to figure out whether the sign/zero extension can be replaced
+   by a simple reg-reg copy.  If so, the replacement reg rtx is returned,
+   NULL_RTX otherwise.  */
+rtx
+sh_try_omit_signzero_extend (rtx ext
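
For readers who want to play with the idea outside of GCC, the backwards search done by sh_find_set_of_reg can be sketched roughly as below.  This is only a toy model under my own assumptions, not the code from the patch: ToyInsn, ToyResult, prev_insn and find_set_of_reg are hypothetical stand-ins for rtx insns, set_of_reg, prev_nonnote_insn_bb and the real function.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for an insn: either a label (which stops the search)
// or "dst = src" on small integer register numbers.
struct ToyInsn
{
  bool is_label;
  int dst;   // register written, -1 if none
  int src;   // register read, -1 if none
};

// Toy counterpart of set_of_reg: where the walk stopped and what it found.
struct ToyResult
{
  std::ptrdiff_t stopped_at;  // index where the walk stopped, -1 = start of list passed
  bool found;
  int set_src;                // source register of the set, valid only if found
};

// Stepping function: maps the current index to the previous one.
typedef std::ptrdiff_t (*StepFn) (std::ptrdiff_t);

inline std::ptrdiff_t prev_insn (std::ptrdiff_t i) { return i - 1; }

inline ToyResult
find_set_of_reg (const std::vector<ToyInsn>& insns, int reg,
                 std::ptrdiff_t start, StepFn step)
{
  ToyResult r = { start, false, -1 };
  for (r.stopped_at = step (start); r.stopped_at >= 0;
       r.stopped_at = step (r.stopped_at))
    {
      const ToyInsn& i = insns[r.stopped_at];
      if (i.is_label)
        return r;             // do not look past labels
      if (i.dst == reg)
        {
          r.found = true;     // this insn sets the reg we are looking for
          r.set_src = i.src;
          return r;
        }
    }
  return r;
}
```

Passing the stepping function as a parameter is what lets the same search run in either direction or with different skipping rules.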

[SH] PR 54925 - Add test case

2012-10-15 Thread Oleg Endo

Hello,

This adds the test case from the PR.
Tested together with the patch posted here
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01380.html

OK?

Cheers,
Oleg

testsuite/ChangeLog:

PR target/54925
* gcc.c-torture/compile/pr54925.c: New.
Index: gcc/testsuite/gcc.c-torture/compile/pr54925.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr54925.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr54925.c	(revision 0)
@@ -0,0 +1,24 @@
+/* PR target/54925  */
+extern int bar;
+static unsigned char *
+nr_memcpy (unsigned char *, unsigned char *, unsigned short);
+
+void 
+baz (char *buf, unsigned short len)
+{
+  unsigned char data[10];
+  if (len == 0)
+return;
+  nr_memcpy (data, (unsigned char *) buf, len);
+  foo (&bar);
+}
+
+static unsigned char *
+nr_memcpy (unsigned char * to, unsigned char * from, unsigned short len)
+{
+  while (len > 0)
+{
+  len--;
+  *to++ = *from++;
+}
+}

Re: [SH] PR 54925 - Add test case

2012-10-15 Thread Oleg Endo

On Mon, 2012-10-15 at 20:37 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > This adds the test case from the PR.
> > Tested together with the patch posted here
> > http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01380.html
> > 
> > OK?
> 
> It would be better to make it a valid C program.  I've checked
> that the test case with the change below also ICEs on revision
> 192446 for sh-linux and your other patch fixes it.  OK with
> that change.
> 
> Regards,
>   kaz
> --
> --- gcc.c-torture/compile/pr54925.c~  2012-10-15 20:00:50.0 +0900
> +++ gcc.c-torture/compile/pr54925.c   2012-10-15 20:01:03.0 +0900
> @@ -1,5 +1,6 @@
>  /* PR target/54925  */
>  extern int bar;
> +extern void foo (int *);
>  static unsigned char *
>  nr_memcpy (unsigned char *, unsigned char *, unsigned short);
>  
> @@ -16,9 +17,11 @@ baz (char *buf, unsigned short len)
>  static unsigned char *
>  nr_memcpy (unsigned char * to, unsigned char * from, unsigned short len)
>  {
> +  unsigned char *p = to;
>    while (len > 0)
>      {
>        len--;
>        *to++ = *from++;
>      }
> +  return p;
>  }

Thanks for checking it!  Committed with the change as rev 192482.

Cheers,
Oleg

Re: [wwwdocs] SH 4.8 changes - document thread pointer built-ins

2012-10-16 Thread Oleg Endo

On Wed, 2012-10-17 at 00:45 +0200, Gerald Pfeifer wrote:
> On Tue, 9 Oct 2012, Oleg Endo wrote:
> > This documents the new thread pointer built-ins in the SH www changes
> > for 4.8.
> 
> Thanks, Oleg.
> 
> I've got one change and one question:
> 
> +Added support for the built-in functions
> +__builtin_thread_pointer and
> +__builtin_set_thread_pointer.  This assumes that
> +GBR is used to hold the thread pointer of the current thread,
> +which has been the case since a while already. 
> 
> "since a while" -> "for a while", and I made that change.
> That said, why is this important, and is there a fixed date or version?

It might be important for some embedded systems software that does not
use the GBR for storing the thread pointer, but for something else (like
a pointer to some global table of frequently used stuff or something
like that).  I just thought it might be better to mention this.  But
you're right, the last "for a while" part sounds strange, and should
probably just be removed, reducing it to "This assumes that
GBR is used to hold the thread pointer of the current
thread."

> 
> +Memory loads and stores
> +relative to the address returned by __builtin_thread_pointer
> +will now also utilize GBR based displacement address modes.
> 
> Why do these _now_ utilize these address modes, when per the above
> __builtin_thread_pointer was just added?  This last sentence implies
> a change when there does not seem to be one?

Because previously GCC did not utilize GBR addressing modes on SH at all.
Now it can do that, if the base address is obtained via
__builtin_thread_pointer.  Does that make sense? :)
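
To make it a bit more concrete, here is a minimal sketch of the kind of access I mean.  The thread_block layout and the two helper functions are made up for illustration.  On SH one would pass __builtin_thread_pointer () as 'tp'; the sketch takes the base address as a parameter so that it also builds on other targets.

```cpp
// Hypothetical per-thread block; the layout is an assumption for illustration.
struct thread_block
{
  int id;
  int last_error;
};

// Load a field at a fixed displacement from the thread pointer.
inline int
read_last_error (const void* tp)
{
  return static_cast<const thread_block*> (tp)->last_error;
}

// Store a field at a fixed displacement from the thread pointer.
inline void
write_last_error (void* tp, int value)
{
  static_cast<thread_block*> (tp)->last_error = value;
}
```

Both accesses are "base + small constant" memory references, which is the shape that can be turned into a GBR displacement access when the base is the thread pointer.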

Cheers,
Oleg

[SH, committed] PR 55042

2012-10-27 Thread Oleg Endo

Hello,

I've committed the obvious fix for PR 55042 as rev 192877.

Cheers,
Oleg

gcc/ChangeLog:

PR target/55042
* config/sh/sh.c (sh1_builtin_p): Comment out unused function.

Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192482)
+++ gcc/config/sh/sh.c	(working copy)
@@ -11587,11 +11587,14 @@
   return TARGET_SHMEDIA;
 }
 
+/* This function can be used if there are any built-ins that are not for
+   SHmedia.  It's commented out to avoid the defined-but-unused warning.
 static bool
 sh1_builtin_p (void)
 {
   return TARGET_SH1;
 }
+*/
 
 /* describe number and signedness of arguments; arg[0] == result
(1: unsigned, 2: signed, 4: don't care, 8: pointer 0: no argument */

[SH] PR 54988

2012-10-29 Thread Oleg Endo

Hello,

This fixes the issues of PR 54988.
Tested on rev 192482 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.

Cheers,
Oleg

gcc/ChangeLog:

PR target/54988
* config/sh/sh.md (tstqi_t_zero): Rename to *tstqi_t_zero.
(*tst<mode>_t_zero): New insns.
* config/sh/iterators.md (lowpart_be, lowpart_le): New mode 
attributes.

testsuite/ChangeLog:

PR target/54988
* gcc.target/sh/pr53988.c: New.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192482)
+++ gcc/config/sh/sh.md	(working copy)
@@ -633,13 +633,39 @@
 ;; Test low QI subreg against zero.
 ;; This avoids unnecessary zero extension before the test.
 
-(define_insn "tstqi_t_zero"
+(define_insn "*tstqi_t_zero"
   [(set (reg:SI T_REG)
 	(eq:SI (match_operand:QI 0 "logical_operand" "z") (const_int 0)))]
   "TARGET_SH1"
   "tst	#255,%0"
   [(set_attr "type" "mt_group")])
 
+;; This pattern might be risky because it also tests the upper bits and not
+;; only the subreg.  However, it seems that combine will get to this only
+;; when testing sign/zero extended values.  In this case the extended upper
+;; bits do not matter.
+(define_insn "*tst<mode>_t_zero"
+  [(set (reg:SI T_REG)
+	(eq:SI
+	  (subreg:QIHI
+	(and:SI (match_operand:SI 0 "arith_reg_operand" "%r")
+		(match_operand:SI 1 "arith_reg_operand" "r")) <lowpart_le>)
+	  (const_int 0)))]
+  "TARGET_SH1 && TARGET_LITTLE_ENDIAN"
+  "tst	%0,%1"
+  [(set_attr "type" "mt_group")])
+
+(define_insn "*tst<mode>_t_zero"
+  [(set (reg:SI T_REG)
+	(eq:SI
+	  (subreg:QIHI
+	(and:SI (match_operand:SI 0 "arith_reg_operand" "%r")
+		(match_operand:SI 1 "arith_reg_operand" "r")) <lowpart_be>)
+	  (const_int 0)))]
+  "TARGET_SH1 && !TARGET_LITTLE_ENDIAN"
+  "tst	%0,%1"
+  [(set_attr "type" "mt_group")])
+
 ;; Extract LSB, negate and store in T bit.
 
 (define_insn "tstsi_t_and_not"
@@ -3514,7 +3540,7 @@
   /* If it is possible to turn the and insn into a zero extension
  already, redundant zero extensions will be folded, which results
  in better code.  
- Ideally the splitter of *andsi_compact would be enough, if reundant
+ Ideally the splitter of *andsi_compact would be enough, if redundant
  zero extensions were detected after the combine pass, which does not
  happen at the moment.  */
   if (TARGET_SH1)
Index: gcc/config/sh/iterators.md
===
--- gcc/config/sh/iterators.md	(revision 192482)
+++ gcc/config/sh/iterators.md	(working copy)
@@ -38,3 +38,6 @@
 ;; Return codes.
 (define_code_iterator any_return [return simple_return])
 
+;; Lowpart subreg byte position code attributes for big and little endian.
+(define_mode_attr lowpart_be [(QI "3") (HI "2")])
+(define_mode_attr lowpart_le [(QI "0") (HI "0")])
Index: gcc/testsuite/gcc.target/sh/pr53988.c
===
--- gcc/testsuite/gcc.target/sh/pr53988.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr53988.c	(revision 0)
@@ -0,0 +1,74 @@
+/* Check that the tst Rm,Rn instruction is generated for QImode and HImode
+   values loaded from memory.  If everything goes as expected we won't see
+   any sign/zero extensions or and ops.  On SH2A we don't expect to see the
+   movu insn.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
+/* { dg-final { scan-assembler-times "tst\tr" 8 } } */
+/* { dg-final { scan-assembler-not "tst\t#255" } } */
+/* { dg-final { scan-assembler-not "exts|extu|and|movu" } } */
+
+int
+test00 (char* a, char* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
+
+int
+test01 (unsigned char* a, unsigned char* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
+
+int
+test02 (short* a, short* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
+
+int
+test03 (unsigned short* a, unsigned short* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
+
+int
+test04 (char* a, short* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
+
+int
+test05 (short* a, char* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
+
+int
+test06 (int* a, char* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
+
+int
+test07 (int* a, short* b, int c, int d)
+{
+  if (*a & *b)
+return c;
+  return d;
+}
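
Regarding the "might be risky" comment on the new tst patterns: the claim that the upper bits do not matter for sign/zero extended values can be checked with a small standalone program.  The sketch below only covers the QImode case, and all function names in it are mine, not something from the patch.

```cpp
#include <cstdint>

// What "tst Rm,Rn" computes on the full 32-bit registers: T = ((Rm & Rn) == 0).
inline bool tst_full (std::uint32_t a, std::uint32_t b)
{
  return (a & b) == 0;
}

// What the source code asks for: test only the low byte of the AND.
inline bool tst_low_byte (std::uint32_t a, std::uint32_t b)
{
  return ((a & b) & 0xFFu) == 0;
}

inline std::uint32_t zext8 (std::uint8_t v) { return v; }

inline std::uint32_t sext8 (std::uint8_t v)
{
  return static_cast<std::uint32_t> (
    static_cast<std::int32_t> (static_cast<std::int8_t> (v)));
}

// Returns true if both tests agree for every pair of 8-bit values under
// every mix of sign and zero extension.
inline bool extended_tests_agree ()
{
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      {
        const std::uint8_t bx = static_cast<std::uint8_t> (x);
        const std::uint8_t by = static_cast<std::uint8_t> (y);
        const std::uint32_t xs[2] = { zext8 (bx), sext8 (bx) };
        const std::uint32_t ys[2] = { zext8 (by), sext8 (by) };
        for (int i = 0; i < 2; ++i)
          for (int j = 0; j < 2; ++j)
            if (tst_full (xs[i], ys[j]) != tst_low_byte (xs[i], ys[j]))
              return false;
      }
  return true;
}
```

For arbitrary register contents the two tests can differ (e.g. 0x100 & 0x100), which is the risk the comment refers to.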

[SH, committed] PR 54963

2012-10-30 Thread Oleg Endo

Hello,

This is the latest proposed patch from the PR.
Tested on rev 192482 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
Pre-approved by Kaz in the PR.
Committed as rev 192983.

Cheers,
Oleg

gcc/ChangeLog:

PR target/54963
* config/sh/iterators.md (SIDI): New mode iterator.
* config/sh/sh.md (negdi2): Use parallel around operation and
T_REG clobber in expander.
(*negdi2): Mark output operand as early clobbered.
Add T_REG clobber.  Split after reload.  Simplify split code.
(abssi2, absdi2): Fold expanders into abs<mode>2.
(*abssi2, *absdi2): Fold into *abs<mode>2 insn_and_split.
Split insns before reload.
(*negabssi2, *negabsdi2): Fold into *negabs<mode>2.
Add T_REG clobber.  Split insns before reload.
(negsi_cond): Reformat.  Use emit_move_insn instead of
gen_movsi.
(negdi_cond): Reformat.  Use emit_move_insn instead of a pair
of gen_movsi.  Split insn before reload.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192482)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5177,28 +5177,25 @@
 ;; Don't expand immediately because otherwise neg:DI (abs:DI) will not be
 ;; combined.
 (define_expand "negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "")))
-   (clobber (reg:SI T_REG))]
-  "TARGET_SH1"
-  "")
+  [(parallel [(set (match_operand:DI 0 "arith_reg_dest")
+		   (neg:DI (match_operand:DI 1 "arith_reg_operand")))
+	  (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")
 
 (define_insn_and_split "*negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "=r")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))]
+  [(set (match_operand:DI 0 "arith_reg_dest" "=&r")
+	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& reload_completed"
   [(const_int 0)]
 {
-  rtx low_src = gen_lowpart (SImode, operands[1]);
-  rtx high_src = gen_highpart (SImode, operands[1]);
-  rtx low_dst = gen_lowpart (SImode, operands[0]);
-  rtx high_dst = gen_highpart (SImode, operands[0]);
-
   emit_insn (gen_clrt ());
-  emit_insn (gen_negc (low_dst, low_src));
-  emit_insn (gen_negc (high_dst, high_src));
+  emit_insn (gen_negc (gen_lowpart (SImode, operands[0]),
+		   gen_lowpart (SImode, operands[1])));
+  emit_insn (gen_negc (gen_highpart (SImode, operands[0]),
+		   gen_highpart (SImode, operands[1])));
   DONE;
 })
 
@@ -5272,38 +5269,53 @@
 		(const_int -1)))]
   "TARGET_SHMEDIA" "")
 
-(define_expand "abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "")
-  	(abs:SI (match_operand:SI 1 "arith_reg_operand" "")))
+(define_expand "abs<mode>2"
+  [(parallel [(set (match_operand:SIDI 0 "arith_reg_dest")
+		   (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
+	  (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")
+
+(define_insn_and_split "*abs<mode>2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+  	(abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
   "TARGET_SH1"
-  "")
-
-(define_insn_and_split "*abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-  	(abs:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
-  "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-		 const1_rtx));
+  if (<MODE>mode == SImode)
+emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+{
+  rtx high_src = gen_highpart (SImode, operands[1]);
+  emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+}
+
+  emit_insn (gen_neg<mode>_cond (operands[0], operands[1], operands[1],
+ const1_rtx));
   DONE;
 })
 
-(define_insn_and_split "*negabssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-  	(neg:SI (abs:SI (match_operand:SI 1 "arith_reg_operand" "r"))))]
+(define_insn_and_split "*negabs<mode>2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+	(neg:SIDI (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand"))))
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-		 const0_rtx));
+  if (<MODE>mode == SImode)
+emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+{
+  rtx high_src = gen_highpart (SImode, operands[1]);
+  emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+}
+
+  emit_insn (gen_neg<mode>_cond (operands[0], operands[1], operands[1],
+ const0_rtx));
   DONE;
 })
 
@@ -5316,10 +5328,10 @@
 
 (define_insn_and_split "negsi_cond"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else:SI (eq:SI (reg
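
As a side note on the *negdi2 split above (clrt, then negc on the low word, then negc on the high word), the arithmetic can be modelled in a few lines.  This is just a sketch: negc and negate64 here are my own model functions, written from my understanding that the negc insn computes 0 - Rm - T and leaves the borrow in T.

```cpp
#include <cstdint>

// Model of the SH "negc Rm,Rn" insn: Rn = 0 - Rm - T, T = borrow.
inline std::uint32_t negc (std::uint32_t rm, bool& t)
{
  const std::uint32_t borrow_in = t ? 1u : 0u;
  const std::uint32_t result = 0u - rm - borrow_in;
  // A borrow occurs whenever anything non-zero is subtracted from zero.
  t = (rm != 0u) || (borrow_in != 0u);
  return result;
}

// 64-bit negation built from two 32-bit negc steps.
inline std::uint64_t negate64 (std::uint64_t x)
{
  bool t = false;                                                   // clrt
  const std::uint32_t lo = negc (static_cast<std::uint32_t> (x), t);        // low word
  const std::uint32_t hi = negc (static_cast<std::uint32_t> (x >> 32), t);  // high word
  return (static_cast<std::uint64_t> (hi) << 32) | lo;
}
```

Note that the low destination word is written before the high source word is read, which is presumably why the output operand of *negdi2 needs the early clobber.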

Re: Fix bugs introduced by switch-case profile propagation

2012-10-31 Thread Oleg Endo

On Fri, 2012-10-26 at 17:05 +0200, Jan Hubicka wrote:
> > Hi,
> > 
> > On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka  wrote:
> > >> Ping.
> > >>
> > >>
> > >> On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman  
> > >> wrote:
> > >> > Hi,
> > >> >  This patch fixes bugs introduced by my previous patch to propagate
> > >> > profiles during switch expansion. Bootstrap and profiledbootstrap
> > >> > successful on x86_64. Confirmed that it fixes the crashes reported in
> > >> > PR middle-end/54957. OK for trunk?
> > >> >
> > >> > - Easwaran
> > >> >
> > >> > 2012-10-17   Easwaran Raman  
> > >> >
> > >> > PR target/54938
> > >> > PR middle-end/54957
> > >> > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note
> > >> > only if it doesn't already exist.
> > >> > * except.c (sjlj_emit_function_enter): Remove unused variable.
> > >> > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.
> > >
> > > Seems fine, but under what conditions you get NULL here?
> > Wasn't sure if this is an OK for the patch or if I need to address
> > anything else.
> 
> Actually I think you should make the except.c to setcurrent_bb when expanding
> the switch instead.
> OK with this change.

Is there any progress regarding this issue?
It makes testing on SH against current trunk difficult.  Would it be OK
to at least commit this hunk?

Index: gcc/optabs.c
===
--- gcc/optabs.c(revision 192963)
+++ gcc/optabs.c(working copy)
@@ -4270,8 +4270,8 @@
   && JUMP_P (insn)
   && any_condjump_p (insn))
 {
-  gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0));
-  add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
+  if (!find_reg_note (insn, REG_BR_PROB, 0))
+add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
 }
 }
 

I've tested this on rev 192983 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and it looks OK.

Cheers,
Oleg

[SH] PR 51244 - Fix defects introduced in 4.8

2013-10-04 Thread Oleg Endo

Hello,

Some of the things I've done in 4.8 to improve SH T bit handling turned
out to produce wrong code.  The attached patch fixes that by introducing
an SH specific RTL pass.

Tested on rev 202876 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
Additional test cases will follow.
OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:

PR target/51244
* config/sh/ifcvt_sh.cc: New SH specific RTL pass.
* config.gcc (SH extra_objs): Add ifcvt_sh.o.
* config/sh/t-sh (ifcvt_sh.o): New entry.
* config/sh/sh.c (sh_fixed_condition_code_regs): New function 
that implements the target hook TARGET_FIXED_CONDITION_CODE_REGS.
(register_sh_passes): New function.  Register ifcvt_sh pass.
(sh_option_override): Invoke it.
(sh_canonicalize_comparison): Handle op0_preserve_value.
* sh.md (*cbranch_t): Do not try to optimize missed test and
branch opportunities.  Canonicalize branch condition.
(nott): Allow only if pseudos can be created for non-SH2A.
Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 202876)
+++ gcc/config.gcc	(working copy)
@@ -462,6 +462,7 @@
 	cpu_type=sh
 	need_64bit_hwint=yes
 	extra_options="${extra_options} fused-madd.opt"
+	extra_objs="${extra_objs} ifcvt_sh.o"
 	;;
 v850*-*-*)
 	cpu_type=v850
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 202876)
+++ gcc/config/sh/sh.c	(working copy)
@@ -53,6 +53,9 @@
 #include "alloc-pool.h"
 #include "tm-constrs.h"
 #include "opts.h"
+#include "tree-pass.h"
+#include "pass_manager.h"
+#include "context.h"
 
 #include 
 #include 
@@ -311,6 +314,7 @@
 static void sh_canonicalize_comparison (int *, rtx *, rtx *, bool);
 static void sh_canonicalize_comparison (enum rtx_code&, rtx&, rtx&,
 	enum machine_mode, bool);
+static bool sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2);
 
 static void sh_init_sync_libfuncs (void) ATTRIBUTE_UNUSED;
 
@@ -587,6 +591,9 @@
 #undef TARGET_CANONICALIZE_COMPARISON
 #define TARGET_CANONICALIZE_COMPARISON	sh_canonicalize_comparison
 
+#undef TARGET_FIXED_CONDITION_CODE_REGS
+#define TARGET_FIXED_CONDITION_CODE_REGS sh_fixed_condition_code_regs
+
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION	(SYMBOL_FLAG_MACH_DEP << 0)
 
@@ -710,6 +717,34 @@
 #undef err_ret
 }
 
+/* Register SH specific RTL passes.  */
+extern opt_pass* make_pass_ifcvt_sh (gcc::context* ctx, bool split_insns,
+ const char* name);
+static void
+register_sh_passes (void)
+{
+  if (!TARGET_SH1)
+return;
+
+/* Running the ifcvt_sh pass after ce1 generates better code when
+   comparisons are combined and reg-reg moves are introduced, because
+   reg-reg moves will be eliminated afterwards.  However, there are quite
+   some cases where combine will be unable to fold comparison related insns,
+   thus for now don't do it.
+  register_pass (make_pass_ifcvt_sh (g, false, "ifcvt1_sh"),
+		 PASS_POS_INSERT_AFTER, "ce1", 1);
+*/
+
+  /*  Run ifcvt_sh pass after combine but before register allocation.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt2_sh"),
+		 PASS_POS_INSERT_AFTER, "split1", 1);
+
+  /* Run ifcvt_sh pass after register allocation and basic block reordering
+ as this sometimes creates new opportunities.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt3_sh"),
+		 PASS_POS_INSERT_AFTER, "split4", 1);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE macro.  Validate and override 
various options, and do some machine dependent initialization.  */
 static void
@@ -1022,6 +1057,8 @@
  target CPU.  */
   selected_atomic_model_
 = parse_validate_atomic_model_option (sh_atomic_model_str);
+
+  register_sh_passes ();
 }
 
 /* Print the operand address in x to the stream.  */
@@ -1908,7 +1945,7 @@
 static void
 sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
 			enum machine_mode mode,
-			bool op0_preserve_value ATTRIBUTE_UNUSED)
+			bool op0_preserve_value)
 {
   /* When invoked from within the combine pass the mode is not specified,
  so try to get it from one of the operands.  */
@@ -1928,6 +1965,9 @@
   // Make sure that the constant operand is the second operand.
   if (CONST_INT_P (op0) && !CONST_INT_P (op1))
 {
+  if (op0_preserve_value)
+	return;
+
   std::swap (op0, op1);
   cmp = swap_condition (cmp);
 }
@@ -2016,6 +2056,14 @@
   *code = (int)tmp_code;
 }
 
+bool
+sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2)
+{
+  *p1 = T_REG;
+  *p2 = INVALID_REGNUM;
+  return true;
+}
+
 enum rtx_code
 prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
 			  enum rtx_code comparison)
Index: gcc/config/sh/sh.md
===
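
A short illustration of the op0_preserve_value handling in sh_canonicalize_comparison above: moving a constant into the second operand requires mirroring the comparison code, and the whole step has to be skipped when the caller asks for op0 to stay untouched.  The sketch below is a toy model under my own assumptions; ToyCmp, ToyOperand, swap_cmp, eval_cmp and canonicalize are hypothetical names, not GCC functions.

```cpp
#include <utility>

enum ToyCmp { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

// Mirror a comparison so that "a CMP b" is equivalent to "b swap_cmp (CMP) a".
inline ToyCmp swap_cmp (ToyCmp c)
{
  switch (c)
    {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    default:     return c;      // EQ and NE are symmetric
    }
}

inline bool eval_cmp (ToyCmp c, int a, int b)
{
  switch (c)
    {
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    }
  return false;
}

struct ToyOperand
{
  bool is_const;
  int value;
};

// Make sure the constant operand is the second operand, unless the caller
// requires op0 to be left as it is.
inline void
canonicalize (ToyCmp& cmp, ToyOperand& op0, ToyOperand& op1,
              bool op0_preserve_value)
{
  if (op0.is_const && !op1.is_const)
    {
      if (op0_preserve_value)
        return;

      std::swap (op0, op1);
      cmp = swap_cmp (cmp);
    }
}
```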

Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS

2013-10-05 Thread Oleg Endo

On Fri, 2013-09-27 at 11:38 -0700, Mike Stump wrote:
> Can the sh people weigh in on this?  Are the PSI and PDI precisions 32 and 64?

PSI is used for representing FPSCR (floating point control register),
which has only max. 22 bits (as far as I know).

PDI is used on SH-5 for representing target address registers, which can
be anything between 32 and 64 bits (implementation defined, as far as I
understand).

> 
> On Sep 17, 2013, at 10:24 AM, Mike Stump  wrote:
> > On Sep 16, 2013, at 8:41 PM, DJ Delorie  wrote:
> >> m32c's PSImode is 24-bits, why does it have "32" in the macro?
> >> 
> >> /* 24-bit pointers, in 32-bit units */
> >> -PARTIAL_INT_MODE (SI);
> >> +PARTIAL_INT_MODE_NAME (SI, 32, PSI);
> > 
> > Sorry, fingers copied the wrong number.  Thanks for the catch.
> > 
> > 
>

Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Oleg Endo

On Mon, 2013-10-07 at 10:30 +0200, Christian Bruel wrote:
> Hi Oleg,
> 
> +/*
> +This pass tries to optimize for example this:
> + mov.l   @(4,r4),r1
> + tst     r1,r1
> + movt    r1
> + tst     r1,r1
> + bt/s    .L5
> +
> +into something simpler:
> + mov.l   @(4,r4),r1
> + tst     r1,r1
> + bf/s    .L5
> +
> +Such sequences can be identified by looking for conditional branches and
> +checking whether the ccreg is set before the conditional branch
> +by testing another register for != 0, which was set by a ccreg store.
> +This can be optimized by eliminating the redundant comparison and
> +inverting the branch condition.  There can be multiple comparisons in
> +different basic blocks that all end up in the redundant test insn before the
> +conditional branch.  Some example RTL ...
> +
> 
> Nice things to optimize the sequences when t-bit values are not
> recognized due to branches direction, I have 2 questions
> 
> 1) I find the name "if-conversion" for this pass a little bit forced,
> since you don't aim to remove branches. It looks more like some kind of
> extension value numbering. 

To be honest, I had some difficulty picking the name.
Maybe something like 'sh_tbit_combine' or 'sh_treg_combine' would be
better, or at least less confusing?  Suggestions are highly appreciated.

> 2) I'm wondering to what extent this case could be handled by a more
> global generic target value numbering to handle boolean operations.
> Maybe just a phasing problem as the branch directions are not yet
> computed in gimple-ssa, which would mean reworking in RTL ?

I don't know.  What the pass currently does looks like a highly
specialized combine (try_eliminate_cstores) and a little bit of CSE
(try_combine_comparisons), I think.
The problems on SH are due to the way insns are expanded and the T bit
handling (SImode hardreg, etc), I guess.  Moreover, some of the SH insns
might split out cstores and inverted cstores after combine, which then
would result in those sequences.  Combine itself sometimes can catch
some of the cases, but not all, or at least not with a finite amount of
combine patterns ;)

Cheers,
Oleg
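
To illustrate the single-block case from the quoted comment (tst, movt, tst, conditional branch), here is a rough standalone sketch of the rewrite.  It is nowhere near what the actual pass has to do: it only looks at one straight-line window and simply assumes that the register written by movt is dead afterwards.  ToyOp, TInsn and fold_redundant_tests are made-up names.

```cpp
#include <cstddef>
#include <vector>

enum ToyOp { OP_TST, OP_MOVT, OP_BT, OP_BF, OP_OTHER };

struct TInsn
{
  ToyOp op;
  int reg;   // register tested (tst r,r) or written (movt r); unused otherwise
};

// Rewrite every "tst rA,rA; movt rB; tst rB,rB; bt|bf" window into
// "tst rA,rA; bf|bt".  The second tst computes the inverse of the first T
// value, so dropping it means the branch condition has to be inverted.
// Assumes rB is dead after the window, which a real pass would have to prove.
inline std::vector<TInsn>
fold_redundant_tests (const std::vector<TInsn>& in)
{
  std::vector<TInsn> out;
  std::size_t i = 0;
  while (i < in.size ())
    {
      if (i + 3 < in.size ()
          && in[i].op == OP_TST
          && in[i + 1].op == OP_MOVT
          && in[i + 2].op == OP_TST && in[i + 2].reg == in[i + 1].reg
          && (in[i + 3].op == OP_BT || in[i + 3].op == OP_BF))
        {
          out.push_back (in[i]);
          TInsn br = { in[i + 3].op == OP_BT ? OP_BF : OP_BT, 0 };
          out.push_back (br);
          i += 4;
        }
      else
        out.push_back (in[i++]);
    }
  return out;
}
```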

Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Oleg Endo

On Mon, 2013-10-07 at 07:44 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > Forgot to handle a case in function can_remove_cstore, thanks for
> > catching it.  Fixed in the attached patch and also added test cases.
> > Retested as before without new failures.
> 
> Ok for trunk.
> 
> > Yeah, right.  I've changed 'ifcvt_sh' to 'sh_ifcvt'.
> 
> >+  register_pass (make_pass_sh_ifcvt (g, false, "ifcvt1_sh"),
> >+ PASS_POS_INSERT_AFTER, "ce1", 1);
> >+*/
> 
> s/ifcvt1_sh/sh_ifcvt1/ might be better even in a comment.

Sorry, I missed one.  Will fix and resend the committed patch after we
have agreed on the pass name (see Christian's message).

Cheers,
Oleg

Re: [PATCH, SH] Add support for inlined builtin-strcmp (1/2)

2013-10-17 Thread Oleg Endo

Hi,

On Thu, 2013-10-17 at 16:13 +0200, Christian Bruel wrote:
> Hello,
> 
> This patch just reorganizes the SH code used for memory builtins into
> its own file, in preparation of the RTL strcmp hoisting in the next part.
> 

Since GCC is now being compiled as C++, it's probably better to name
newly added source files .cc instead of .c.  Could you please rename the
new file to sh-mem.cc?

Thanks,
Oleg

Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-17 Thread Oleg Endo

Hi,

On Thu, 2013-10-17 at 16:15 +0200, Christian Bruel wrote:
> Hello,
> 
> This patch adds support to inline an optimized version of strcmp when
> not optimizing for size. The generated code makes use of the cmp/str
> instruction to test 4 bytes at a time when correctly aligned.
> 
> note that a new pattern was added to match the cmp/str instruction, but
> no attempt was made to catch it from combine.
> 
> This results in general cycles improvements (against both newlib and
> glibc implementations), one of which is a 10%  cycle improvement for a
> famous strcmp-biased "benchmark" starting with a D , but still standard.

Nice.

> This optimization  can be disabled with -fno-builtin-strcmp.
> 
> No regressions on sh4 in big and little endian, and sh2 (sh3, and sh4a
> are still running for big and little endian for sanity)
> 

I was wondering, in file sh-mem.c, the new function
'sh4_expand_cmpstr' ... why is it SH4-something?  It's a bit confusing,
since cmp/str has been around since ever (i.e. since SH1).  Maybe just
rename it to 'sh_expand_cmpstr' instead?  The function always returns
'true', so maybe just make it return 'void'?

Also, in the expander ...

+  [(set (match_operand:SI 0 "register_operand" "")
+   (compare:SI (match_operand:BLK 1 "memory_operand" "")

... no need to use empty "" constraints.

Cheers,
Oleg
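
For readers unfamiliar with cmp/str: the instruction sets the T bit when any of the four byte lanes of its two register operands are equal. Comparing a word against zero therefore detects a NUL terminator anywhere in that word, which is what makes a 4-bytes-at-a-time strcmp possible. Below is a minimal C sketch of that idea. It is not the code from the patch; the names cmp_str and strcmp_words are hypothetical, and the sketch assumes both buffers are padded to a multiple of 4 bytes so that whole-word reads stay in bounds.

```c
#include <stdint.h>
#include <string.h>

/* Emulate the T-bit result of "cmp/str Rm,Rn": 1 if any of the four
   byte lanes of a and b hold the same value, 0 otherwise.  */
static int
cmp_str (uint32_t a, uint32_t b)
{
  uint32_t x = a ^ b;	/* An equal byte lane becomes 0x00.  */

  return (x & 0xff) == 0
	 || ((x >> 8) & 0xff) == 0
	 || ((x >> 16) & 0xff) == 0
	 || (x >> 24) == 0;
}

/* Hypothetical word-at-a-time strcmp.  Assumes both buffers are padded
   to a multiple of 4 bytes.  memcpy is used for the word loads so the
   sketch does not depend on pointer alignment.  */
static int
strcmp_words (const char *a, const char *b)
{
  for (;;)
    {
      uint32_t wa, wb;

      memcpy (&wa, a, 4);
      memcpy (&wb, b, 4);

      /* Stop at the first word that differs or contains a NUL byte.  */
      if (wa != wb || cmp_str (wa, 0))
	break;

      a += 4;
      b += 4;
    }

  /* Resolve the final word one byte at a time.  */
  while (*a != '\0' && *a == *b)
    {
      a++;
      b++;
    }

  return (unsigned char) *a - (unsigned char) *b;
}
```

The word loop only skips ahead while both words are identical and NUL-free, so the byte loop at the end never has to look at more than one word.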

Re: [PATCH, SH] Add support for inlined builtin-strcmp (1/2)

2013-10-19 Thread Oleg Endo

On Fri, 2013-10-18 at 09:38 +0200, Christian Bruel wrote:
> On 10/18/2013 12:53 AM, Oleg Endo wrote:
> > Hi,
> >
> > On Thu, 2013-10-17 at 16:13 +0200, Christian Bruel wrote:
> >> Hello,
> >>
> >> This patch just reorganizes the SH code used for memory builtins into
> >> its own file, in preparation of the RTL strcmp hoisting in the next part.
> >>
> > Since GCC is now being compiled as C++, it's probably better to name
> > newly added source files .cc instead of .c.  Could you please rename the
> > new file to sh-mem.cc?
> >
> > Thanks,
> > Oleg
> Hello Oleg,
> 
> I have no objection to rename a pure C file to a c++ suffixed file. 
> I'll conform to whatever
>  the general guidelines for pure C code is.
> 
> For now it doesn't seem to be the tendency.
> 
> grep -i "ew File" ChangeLog | grep .c:
> * gimple-builder.c: New File.
> * config/winnt-c.c: New file
> * ipa-profile.c: New file.
> * ubsan.c: New file.
> * ipa-devirt.c: New file.
> * vtable-verify.c: New file.
> * config/arm/aarch-common.c: ... here.  New file.
> * diagnostic-color.c: New file.
> * config/linux-android.c: New file.
> 

Yeah, the thing is that pure C files (.c) are also compiled as C++.
There is no distinction of how .c and .cc files are compiled.  Thus it's
quite easy for C++ code to "slip in", e.g. by follow up changes.  So I
think it's better to have new files checked in as .cc in the first place
in order to avoid more confusion.

> I haven't seen any reference to this in the GCC coding guidelines,
> should we prefer .cc, cxx, C,  cpp., c++.. ?

There was some discussion on the GCC list regarding this.
gcc/ChangeLog can be used as a summarizing conclusion:

2013-03-27  Gabriel Dos Reis  

* Makefile.in (.SUFFIXES): Add .cc.
(.c.o): Apply same recipe for implicit rule .cc.o.

2013-09-20  Basile Starynkevitch  

* gengtype.c (file_rules): Added rule for *.cc files.

2013-09-25  Tom Tromey  

* Makefile.in (CCDEPMODE, DEPDIR, depcomp, COMPILE.base)
(COMPILE, POSTCOMPILE): New variables.
(.cc.o .c.o): Use COMPILE, POSTCOMPILE.

> Also I'm wondering if there is any plan to rename all files in the tree
> so we have a consistent source tree.

It was discussed a while ago on the GCC list.  The decision was not
to do it in order to avoid SVN mass renames for now.  See also:

http://gcc.gnu.org/ml/gcc/2012-08/msg00311.html
http://gcc.gnu.org/ml/gcc/2012-08/msg00312.html
http://gcc.gnu.org/ml/gcc/2012-08/msg00315.html

Cheers,
Oleg

Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-19 Thread Oleg Endo

On Fri, 2013-10-18 at 09:59 +0200, Christian Bruel wrote:
> On 10/18/2013 01:05 AM, Oleg Endo wrote:
> > I was wondering, in file sh-mem.c, the new function
> > 'sh4_expand_cmpstr' ... why is it SH4-something?  It's a bit confusing,
> > since cmp/str has been around since ever (i.e. since SH1). Maybe just
> > rename it to 'sh_expand_cmpstr' instead?
> 
> Just historical. (SH4* are our primary SH platforms). The code is
> enabled/tested for all SH1 of course, I will  rename. Thanks .
> 
> >  Maybe just
> > rename it to 'sh_expand_cmpstr' instead?  The function always returns
> > 'true', so maybe just make it return 'void'?
> 
> yes, it's for genericity as I plan to reuse/specialize the code based on
> the count parameter for strncmp to be contributed next.

I already assumed so :)

> >
> > Also, in the expander ...
> >
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +   (compare:SI (match_operand:BLK 1 "memory_operand" "")
> >
> > ... no need to use empty "" constraints
> 
> OK, thanks

Could you also please remove the quotes around the preparation block:
  "
{
   if (! optimize_insn_for_size_p () && sh4_expand_cmpstr(operands))
  DONE;
   else FAIL;
}")

I've attached two test cases, tested with 
make -k check-gcc RUNTESTFLAGS="sh.exp=strcmp* --target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Could you please include them?

Cheers,
Oleg



Index: gcc/testsuite/gcc.target/sh/strcmp-2.c
===
--- gcc/testsuite/gcc.target/sh/strcmp-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/strcmp-2.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Check that the __builtin_strcmp function is not inlined when optimizing
+   for size.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-Os" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } } */
+/* { dg-final { scan-assembler-not "cmp/str" } } */
+/* { dg-final { scan-assembler "jsr|jmp|bsr|bra" } } */
+
+int
+test00 (const char* a, const char* b)
+{
+  return __builtin_strcmp (a, b);
+}
Index: gcc/testsuite/gcc.target/sh/strcmp-1.c
===
--- gcc/testsuite/gcc.target/sh/strcmp-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/strcmp-1.c	(revision 0)
@@ -0,0 +1,12 @@
+/* Check that the __builtin_strcmp function is inlined utilizing cmp/str insn
+   when optimizing for speed.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O2" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } } */
+/* { dg-final { scan-assembler "cmp/str" } } */
+
+int
+test00 (const char* a, const char* b)
+{
+  return __builtin_strcmp (a, b);
+}

[SH, committed] Fix test pr54089-3.c

2013-10-19 Thread Oleg Endo

Hello,

The test case pr54089-3.c started to fail a while ago, because of the
broken test for load of constant 31.  Committed as obvious to trunk and
4.8 branch.

Cheers,
Oleg

testsuite/ChangeLog:
* gcc.target/sh/pr54089-3.c: Fix test for load of constant 31.

Index: gcc/testsuite/gcc.target/sh/pr54089-3.c
===
--- gcc/testsuite/gcc.target/sh/pr54089-3.c	(revision 203000)
+++ gcc/testsuite/gcc.target/sh/pr54089-3.c	(working copy)
@@ -5,7 +5,7 @@
 /* { dg-options "-O1" } */
 /* { dg-skip-if "" { "sh*-*-*" } { "*" } { "-m1*" "-m2" "-m2e*" } } */
 /* { dg-final { scan-assembler-not "and" } } */
-/* { dg-final { scan-assembler-not "31" } } */
+/* { dg-final { scan-assembler-not "#31" } } */
 
 int
 test00 (unsigned int a, int* b, int c, int* d, unsigned int e)
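
A note on why the narrower pattern helps: scan-assembler-not looks for its pattern anywhere in the generated assembly, so a bare "31" can also hit unrelated text such as a label number. The "#" prefix restricts the match to an immediate operand. The sketch below illustrates this with a plain substring search in C. The helper asm_contains and the sample assembly strings are made up for illustration, and the actual cause of the failure in pr54089-3.c is not stated in the message beyond the test being broken.

```c
#include <string.h>

/* Return 1 if NEEDLE occurs anywhere in ASM_TEXT, else 0.  For patterns
   without regex metacharacters this behaves like the search that a
   scan-assembler directive performs.  */
static int
asm_contains (const char *asm_text, const char *needle)
{
  return strstr (asm_text, needle) != NULL;
}

/* Hypothetical output that never loads the constant 31 but happens to
   reference a local label numbered 31.  */
static const char sample_label[] = "\tmov.l\t.L31,r1\n\trts\n";

/* Hypothetical output that does load the constant 31.  */
static const char sample_imm[] = "\tmov\t#31,r1\n\trts\n";
```

With these samples, "31" is found in both strings, while "#31" is found only in the one that actually loads the immediate.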

[SH] PR 52483 - Fix volatile mem stores

2013-10-22 Thread Oleg Endo

Hello,

The attached patch fixes volatile mem stores on SH so that they won't
result in redundant sign/zero extensions and will utilize available
addressing modes.  This is similar to what has been done to fix memory
loads in http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01315.html

Tested on rev 203909 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
However, the test result summary showed a bunch of "WARNING: program
timed out."  Kaz, could you please add it to your test queue and let me
know if it's OK for trunk?

Cheers,
Oleg
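
For illustration, these are the kinds of volatile stores the description above is about. This is a sketch only: the function names are hypothetical and are not the test cases from the patch. The first store uses a constant offset from the pointer, which can map to displacement addressing; the second computes the address from a pointer and a variable index, which is where reg+reg addressing can apply.

```c
/* Store at a constant offset from the base pointer.  */
void
store_disp (volatile char *x, char y)
{
  x[5] = y;
}

/* Store at a variable index, so the address is the sum of two values
   that are typically held in registers.  */
void
store_index (volatile short *x, unsigned int i, short y)
{
  x[i] = y;
}
```

Compiling such functions for an SH target and inspecting the assembly for exts/extu instructions is the same kind of check the pr52483 tests perform.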

gcc/ChangeLog:
PR target/52483
* config/sh/predicates.md (general_movdst_operand): Allow
reg+reg addressing, do not use general_operand for memory 
operands.

testsuite/ChangeLog:
PR target/52483
* gcc.target/sh/pr52483-1.c: Add tests for memory stores.
* gcc.target/sh/pr52483-2.c: Likewise.
* gcc.target/sh/pr52483-3.c: Likewise.
* gcc.target/sh/pr52483-4.c: Likewise.
Index: gcc/config/sh/predicates.md
===
--- gcc/config/sh/predicates.md	(revision 203857)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -550,17 +550,36 @@
   && ! (reload_in_progress || reload_completed))
 return 0;
 
-  if ((mode == QImode || mode == HImode)
-  && mode == GET_MODE (op)
-  && (MEM_P (op)
-	  || (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))))
+  if (mode == GET_MODE (op)
+  && (MEM_P (op) || (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))))
 {
-  rtx x = XEXP ((MEM_P (op) ? op : SUBREG_REG (op)), 0);
+  rtx mem_rtx = MEM_P (op) ? op : SUBREG_REG (op);
+  rtx x = XEXP (mem_rtx, 0);
 
-  if (GET_CODE (x) == PLUS
+  if ((mode == QImode || mode == HImode)
+	  && GET_CODE (x) == PLUS
 	  && REG_P (XEXP (x, 0))
 	  && CONST_INT_P (XEXP (x, 1)))
 	return sh_legitimate_index_p (mode, XEXP (x, 1), TARGET_SH2A, false);
+
+  /* Allow reg+reg addressing here without validating the register
+	 numbers.  Usually one of the regs must be R0 or a pseudo reg.
+	 In some cases it can happen that arguments from hard regs are
+	 propagated directly into address expressions.  In this cases reload
+	 will have to fix it up later.  However, allow this only for native
+	 1, 2 or 4 byte addresses.  */
+  if (can_create_pseudo_p () && GET_CODE (x) == PLUS
+	  && GET_MODE_SIZE (mode) <= 4
+	  && REG_P (XEXP (x, 0)) && REG_P (XEXP (x, 1)))
+	return true;
+
+  /* 'general_operand' does not allow volatile mems during RTL expansion to
+	 avoid matching arithmetic that operates on mems, it seems.
+	 On SH this leads to redundant sign extensions for QImode or HImode
+	 stores.  Thus we mimic the behavior but allow volatile mems.  */
+if (memory_address_addr_space_p (GET_MODE (mem_rtx), x,
+	 MEM_ADDR_SPACE (mem_rtx)))
+	  return true;
 }
 
   return general_operand (op, mode);
Index: gcc/testsuite/gcc.target/sh/pr52483-1.c
===
--- gcc/testsuite/gcc.target/sh/pr52483-1.c	(revision 203857)
+++ gcc/testsuite/gcc.target/sh/pr52483-1.c	(working copy)
@@ -1,9 +1,9 @@
-/* Check that loads from volatile mems don't result in redundant sign
-   extensions.  */
+/* Check that loads/stores from/to volatile mems don't result in redundant
+   sign/zero extensions.  */
 /* { dg-do compile { target "sh*-*-*" } } */
-/* { dg-options "-O1" } */
+/* { dg-options "-O2" } */
 /* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
-/* { dg-final { scan-assembler-not "exts" } } */
+/* { dg-final { scan-assembler-not "exts|extu" } } */
 
 int
 test_00 (volatile char* x)
@@ -11,20 +11,44 @@
   return *x;
 }
 
+void
+test_100 (volatile char* x, char y)
+{
+  *x = y;
+}
+
 int
 test_01 (volatile short* x)
 {
   return *x;
 }
 
+void
+test_101 (volatile unsigned char* x, unsigned char y)
+{
+  *x = y;
+}
+
 int
 test_02 (volatile unsigned char* x)
 {
   return *x == 0x80;
 }
 
+void
+test_102 (volatile short* x, short y)
+{
+  *x = y;
+}
+
 int
 test_03 (volatile unsigned short* x)
 {
   return *x == 0xFF80;
 }
+
+void
+test_103 (volatile unsigned short* x, unsigned short y)
+{
+  *x = y;
+}
Index: gcc/testsuite/gcc.target/sh/pr52483-2.c
===
--- gcc/testsuite/gcc.target/sh/pr52483-2.c	(revision 203857)
+++ gcc/testsuite/gcc.target/sh/pr52483-2.c	(working copy)
@@ -1,14 +1,15 @@
-/* Check that loads from volatile mems utilize displacement addressing
-   modes and do not result in redundant sign extensions. */
+/* Check that loads/stores from/to volatile mems utilize displacement
+   addressing modes and do not result in redundant sign/zero extensions. */
 /* { dg-do compile { target "sh*-*-*" } } */
 /* { dg-options "-O1" } */
 /* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
-/* { dg-final { scan-assembler-times "@\\(5,"
